Deploying OTEL Collector as Daemonset
December 02, 2023
The OpenTelemetry Collector is one of the most popular CNCF projects; it aims to standardise telemetry data collection, processing, and export. At Pixxel, we use the OTEL Collector to discover targets in a k8s cluster, scrape metrics from the configured endpoints, and export the metrics to a Grafana Mimir server. This is a mix of both push and pull approaches: the pull part is discovering targets and scraping their endpoints, while the push part is remote-writing the metrics to a monitoring server.
An alternative way to achieve this setup is to run Prometheus in agent mode, which can scrape targets and remote-write to the desired endpoint. However, a significant drawback is fault tolerance: one Prometheus pod per cluster (agent or normal mode) is a single point of failure. If that pod becomes unhealthy, you may lose metrics for the whole cluster. These limitations can be mitigated with proper resource request/limit configuration to signal the pod's priority to the Kubernetes scheduler, or with the right retry/buffer configuration, but that means more ops work, and you may need to revisit those settings as the workload in the cluster grows. It feels like more work than a setup that distributes the scraping workload across many replicas (a daemon set). Since Prometheus doesn't support a daemon set deployment, the OTEL Collector was a no-brainer choice, although other collectors such as Grafana Agent also support daemonset deployment.
The following section describes configuring the OTEL collector as a daemon set. The fun part is leveraging Kubernetes pod service discovery with a node name filter. Since there weren’t many good tutorials available on this deployment mode, I decided to write one :P
Setup
Let’s use the OTEL Collector Helm chart to deploy it as a Daemonset. I’ll be using Minikube for the demonstration below. Let’s start a local Kubernetes cluster with two nodes.
minikube start --nodes 2 -p local-k8s-cluster
Add the OpenTelemetry repo to Helm:
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
Service Discovery
For each OTEL collector pod, we want to scrape only the pods running on the same node. We can achieve that in 2 steps:
- Discover all pods using kubernetes_sd_config. We will need to provide cluster roles to our daemon set so it can discover pods.
- Filter using the nodeName field selector in the pod spec (ref).
scrape_configs:
  - job_name: otel-daemon-scraping
    scheme: http
    kubernetes_sd_configs:
      - role: pod
        selectors:
          - role: pod
            # only scrape data from pods running on the same node as the collector
            # assuming KUBE_NODE_NAME env will be set in collector pods env
            field: 'spec.nodeName=${env:KUBE_NODE_NAME}'
With the nodeName field selector, we pick only the targets on the same node. The node name is read from an environment variable (assuming it is set in the pod's env during provisioning). Prometheus doesn't support environment variables in its config file, which limits its usage as a daemonset.
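For reference, here is a minimal sketch of how that environment variable can be injected into a pod spec via the Kubernetes Downward API; the Helm override shown later does exactly this through the chart's extraEnvs value:
# sketch: expose the node name to the container via the Downward API
env:
  - name: KUBE_NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName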
After the pods have been discovered per node, we need a way to find the port number and path on each pod to scrape metrics from. We can leverage pod labels and annotations for this. The OTEL Collector exposes a pod's labels and annotations as meta labels on that target, which we can read and act on using relabelling (relabel_configs):
relabel_configs:
  # scrape pods annotated with "prometheus.io/scrape: true"
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    regex: true
    action: keep
  # read the port from "prometheus.io/port: <port>" annotation and update scraping address accordingly
  - source_labels:
      [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    target_label: __address__
    regex: ([^:]+)(?::\d+)?;(\d+)
    # escaped $1:$2
    replacement: $$1:$$2
The first item in the relabel_configs array checks for the presence of the annotation prometheus.io/scrape: true on the discovered target. Only those targets that match the regex specified in the keep block are kept. The keep and drop actions let us filter targets and metrics based on whether the label values match the provided regex. The second item in the array configures the host and port of the target. The scheme defaults to http and the path to /metrics. These defaults can also be overridden by relabelling, but we will go with them for brevity; a sketch of such an override follows.
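For completeness, here is a sketch of what such an override could look like, honouring hypothetical prometheus.io/path and prometheus.io/scheme pod annotations; these rules are not used in the rest of this article:
# optional: read the metrics path from a "prometheus.io/path: <path>" annotation
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
  action: replace
  target_label: __metrics_path__
  regex: (.+)
# optional: read the scheme from a "prometheus.io/scheme: <http|https>" annotation
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scheme]
  action: replace
  target_label: __scheme__
  regex: (https?)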
So far, we have established the receiver configuration of the OTEL Collector. To complete the demonstration end to end, I will run a Prometheus server to which our collector pods will write the metrics. We can set up a Grafana instance to query from this server afterwards.
Metric storage with Prometheus
Here is a Kubernetes manifest to create a Prometheus deployment enabled with remote write, exposed via a K8s Service.
# prometheus.yml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-server-conf
data:
  prometheus.yml: |-
    global:
      scrape_interval: 5s
      evaluation_interval: 5s
    scrape_configs:
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  labels:
    app: prometheus-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus-server
  template:
    metadata:
      labels:
        app: prometheus-server
    spec:
      containers:
        - name: prometheus
          image: prom/prometheus
          args:
            - '--storage.tsdb.retention.time=12h'
            - '--config.file=/etc/prometheus/prometheus.yml'
            - '--storage.tsdb.path=/prometheus/'
            - '--enable-feature=remote-write-receiver'
          ports:
            - containerPort: 9090
          volumeMounts:
            - name: prometheus-config-volume
              mountPath: /etc/prometheus/
            - name: prometheus-storage-volume
              mountPath: /prometheus/
      volumes:
        - name: prometheus-config-volume
          configMap:
            defaultMode: 420
            name: prometheus-server-conf
        - name: prometheus-storage-volume
          emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus
spec:
  selector:
    app: prometheus-server
  ports:
    - protocol: TCP
      port: 9090
      targetPort: 9090
Apply the above manifest to our cluster.
kubectl apply -f prometheus.yml
After applying the above file to our cluster, we will have a working Prometheus deployment ready to accept remote writes. It is exposed via a Kubernetes Service that can be reached at this DNS name:
http://prometheus.default.svc.cluster.local:9090
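If you want to sanity-check the deployment before wiring up the collector, you can hit Prometheus's readiness endpoint from inside the cluster with a throwaway curl pod (the pod name here is arbitrary):
kubectl run prom-check --rm -it --restart=Never --image=curlimages/curl \
  --command -- curl -s http://prometheus.default.svc.cluster.local:9090/-/ready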
Collector config
Now is the time to push some data to Prometheus. Let’s deploy our OTEL collector with the proper configuration using the Helm override file below.
# otel-override.yaml
mode: 'daemonset'

extraEnvs:
  - name: KUBE_NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName

config:
  exporters:
    logging:
      verbosity: detailed
    prometheusremotewrite:
      endpoint: http://prometheus.default.svc.cluster.local:9090/api/v1/write
      external_labels:
        collector: otel-collector
  receivers:
    prometheus:
      config:
        scrape_configs:
          - job_name: metrics-exporter
            scheme: http
            kubernetes_sd_configs:
              - role: pod
                selectors:
                  - role: pod
                    # only scrape data from pods running on the same node as the collector
                    field: 'spec.nodeName=${env:KUBE_NODE_NAME}'
            relabel_configs:
              # scrape pods annotated with "prometheus.io/scrape: true"
              - source_labels:
                  [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
                regex: true
                action: keep
              # read the port from "prometheus.io/port: <port>" annotation and update scraping address accordingly
              - source_labels:
                  [
                    __address__,
                    __meta_kubernetes_pod_annotation_prometheus_io_port
                  ]
                action: replace
                target_label: __address__
                regex: ([^:]+)(?::\d+)?;(\d+)
                # escaped $1:$2
                replacement: $$1:$$2
  service:
    pipelines:
      metrics:
        receivers: [prometheus]
        exporters: [prometheusremotewrite]

clusterRole:
  create: true
  rules:
    - apiGroups: ['']
      resources:
        - pods
      verbs: ['get', 'list', 'watch']
  clusterRoleBinding:
    name: 'otel-discoverer'

ports:
  otlp:
    enabled: false
  otlp-http:
    enabled: false
  jaeger-compact:
    enabled: false
  jaeger-thrift:
    enabled: false
  jaeger-grpc:
    enabled: false
  zipkin:
    enabled: false
  metrics:
    enabled: false
This can be applied to the cluster with:
helm install otel-collector open-telemetry/opentelemetry-collector -f otel-override.yaml
Let’s go through some important aspects of this Helm override:
- We set the deployment to daemon set using mode and added an extra env var to capture the node name on each daemon pod of the OTEL Collector.
- The metrics pipeline is built using the prometheus receiver and prometheusremotewrite exporter defined in their respective sections. If you are unfamiliar with receivers and exporters, here is a reference.
- The receiver has rules for pod discovery using pod annotations. It picks only those pods that are running on the same node using the nodeName field selector. The node name is specified in a templated way, spec.nodeName=${env:KUBE_NODE_NAME}, which is resolved to the actual node name at runtime.
- The exporter writes to the Prometheus server at its remote-write endpoint. It also places a label collector: otel-collector on all the series. Such “global” labels can be crucial when running multiple Kubernetes clusters in the same or different cloud accounts (AWS account/GCP project).
- The required cluster role and binding are specified so the collector can discover pods.
- We also don’t open the OTEL container ports that are not needed in the scope of this article.
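With the chart installed, it's worth confirming that it created a DaemonSet and that one collector pod was scheduled on each node; the resource and label names below assume the release name otel-collector used above:
# expect one desired/ready collector pod per node
kubectl get daemonset otel-collector-opentelemetry-collector-agent
kubectl get pods -o wide -l app.kubernetes.io/name=opentelemetry-collector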
Node Exporter workload
Let’s run some actual workload and scrape metrics from it. Node Exporter is a utility that exposes machine metrics. For our demonstration, we will run it inside the Kubernetes cluster as a daemonset, which ensures there is one pod per Minikube node. These pods can then be scraped by our OTEL Collector pods.
We will install Node Exporter using its Helm chart with all defaults plus a few additions: pod annotations will be added so these pods get discovered. Since Node Exporter by default exposes metrics over the http scheme at the /metrics path, we won’t need to specify those explicitly; our collectors already assume them.
Let’s first add the repo:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
To scrape the node exporter pods, we need the proper pod annotations, which can be specified in a Helm override file.
# node-exporter-override.yml
podAnnotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "9100"
Install Node Exporter using Helm with the above-mentioned override file:
helm install node-exporter prometheus-community/prometheus-node-exporter -f node-exporter-override.yml
With all components (Node Exporter daemons, OTEL Collector daemons and the Prometheus pod) up and running, this is how the cluster should look:
➜ kubectl get pods
NAME READY STATUS RESTARTS AGE
node-exporter-prometheus-node-exporter-h2vxr 1/1 Running 0 7m
node-exporter-prometheus-node-exporter-kcds4 1/1 Running 0 7m
otel-collector-opentelemetry-collector-agent-9rxm7 1/1 Running 0 5m1s
otel-collector-opentelemetry-collector-agent-mbq6x 1/1 Running 0 5m1s
prometheus-5b54c7c696-4kzjj 1/1 Running 0 6m4h
Querying data
If everything goes right, you should see data from the node_exporter daemons in Prometheus. The simplest way to check is to open the Prometheus UI in your browser and run a query.
You can access the Prometheus UI in your browser by port-forwarding the Prometheus Kubernetes service (make sure port 9090 on your local machine is free):
kubectl port-forward svc/prometheus 9090:9090
Navigate to http://localhost:9090 and execute the query shown in the image below. Verify that you get one time series per node_exporter pod.
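For example, counting any node_exporter metric by instance should return exactly one row per node; the metric below is just one convenient choice:
count by (instance) (node_exporter_build_info)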
We can also spin up a Grafana instance for better visualisation: port-forward the Prometheus service locally and add the localhost:port address as a data source (also called Connections) in that Grafana instance.
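One quick way to do that, assuming Docker is available locally, is to run the Grafana OSS image and point a Prometheus data source at the port-forwarded address:
# run a throwaway local Grafana (default login: admin/admin)
docker run --rm -d -p 3000:3000 --name grafana grafana/grafana-oss
# with "kubectl port-forward svc/prometheus 9090:9090" running, add a Prometheus
# data source pointing at http://host.docker.internal:9090 (Docker Desktop) or
# http://localhost:9090 when Grafana runs with --network=host on Linux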
Debugging
If something doesn’t work as expected, you can use debug logs to see the errors.
- OTEL collector debug log: modify the “service” section to add the additional debug config below. A command to tail these logs follows this list.
service:
  telemetry:
    logs:
      level: "debug"
  pipelines:
    metrics:
      receivers: [prometheus]
      exporters: [prometheusremotewrite]
- Prometheus debug logs can be enabled via the flag "--log.level=debug" in the spec.containers.args list of the Prometheus deployment.
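To actually read those logs, tail the collector DaemonSet and the Prometheus deployment; the DaemonSet name below is taken from the pod listing above:
# follow the (debug) logs of one collector pod
kubectl logs ds/otel-collector-opentelemetry-collector-agent -f
# and of the Prometheus server
kubectl logs deploy/prometheus -f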
To recap, we looked into how to deploy OTEL collectors as daemons on a Kubernetes cluster and discover the pods local to each collector’s node, how to scrape metrics from the discovered pods, and how to export those metrics to a remote server. We ran node exporter pods, scraped metrics from them and visualised them in the Prometheus UI. In the end, we looked into how to debug in case things don’t work as expected. That’s all I had to document for OTEL Collector usage. Stay tuned for more such blogs!