
Kafka Resource Monitoring

Prerequisites

  • A Kafka service is available in the environment (newer Kafka versions no longer require ZooKeeper).

Monitoring Scope

  • Broker
    • kafka_brokers: the number of broker nodes in the Kafka cluster. Brokers are the servers of a Kafka cluster; they store data and handle client requests, receiving, persisting, and serving data to producers and consumers.
  • Producer
    • sum(rate(kafka_topic_partition_current_offset[1m])) by (topic): the rate at which each topic's offsets grew over the last minute. current_offset is the latest offset of a topic partition, i.e. how much data has been written to that topic; this shows how fast data is being produced to each topic.
    • sum(increase(kafka_topic_partition_current_offset[5m])/5) by (topic): the average per-topic offset growth over a 5-minute window. Compared with the previous metric, this gives a smoother view of the produce rate over a longer (5-minute) period.
    • sum by(topic) (kafka_topic_partitions): the total number of partitions per topic. Knowing each topic's partition count is important for understanding data distribution and scalability within the Kafka cluster.
  • Consumer
    • sum(kafka_consumergroup_lag) by (consumergroup, topic): the lag of each consumer group on each topic. Lag is the gap between the latest message in a topic partition and the latest message the consumer group has read. Monitoring consumer lag is key to understanding how far consumers have progressed, diagnosing performance bottlenecks, and spotting under-performing consumer groups.
    • sum(delta(kafka_consumergroup_current_offset[5m])/5) by (consumergroup, topic): the average rate at which a consumer group advances its offsets over a 5-minute window, i.e. how fast the group is processing messages.
    • sum(increase(kafka_topic_partition_current_offset[5m])) by(topic) - sum(increase(kafka_consumergroup_current_offset[5m])) by(topic) > 0: checks, per topic, whether message processing is falling behind. It flags topics where consumer groups lag behind production so you can intervene before it becomes a performance problem (see the rule-file sketch after this list).
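
The produce-rate and lag expressions above can also be loaded into Prometheus as rules. Below is a minimal sketch of a rule file; the group names, the recording-rule name, the alert name, the 10-minute "for" duration, and the severity label are illustrative assumptions, while the expressions themselves are the ones listed above.

groups:
  - name: kafka.records
    rules:
      # Pre-compute the per-topic produce rate used on dashboards.
      - record: topic:kafka_messages_in:rate1m
        expr: sum(rate(kafka_topic_partition_current_offset[1m])) by (topic)
  - name: kafka.alerts
    rules:
      # Fire when consumption keeps falling behind production on a topic.
      - alert: KafkaConsumerGroupLagging
        expr: |
          sum(increase(kafka_topic_partition_current_offset[5m])) by (topic)
            - sum(increase(kafka_consumergroup_current_offset[5m])) by (topic) > 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Consumer groups are falling behind on topic {{ $labels.topic }}"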

Installing the Exporter

The manifests below create a Secret holding the Kafka client credentials, a kafka_exporter Deployment pointed at the three brokers, and a Service that exposes the metrics port for Prometheus to scrape.

apiVersion: v1
kind: Secret
type: Opaque
metadata:
  labels:
    app: kafka
  name: kafka-jaas
  namespace: monitoring  # must be in the same namespace as the exporter Deployment below, or its secretKeyRef cannot resolve

data:
  client-password: ""
  inter-broker-password: ""
  zookeeper-password: ""

---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/component: metrics
    app.kubernetes.io/instance: kafka
    app.kubernetes.io/name: kafka
  name: kafka-exporter
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: metrics
      app.kubernetes.io/instance: kafka
      app.kubernetes.io/name: kafka
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app.kubernetes.io/component: metrics
        app.kubernetes.io/instance: kafka
        app.kubernetes.io/name: kafka
    spec:
      containers:
      - command:
        - /bin/bash
        - -ec
        - |
          kafka_exporter \
          --kafka.server=kafka-0.kafka-headless.infra.svc.cluster.local:9092 \
          --kafka.server=kafka-1.kafka-headless.infra.svc.cluster.local:9092 \
          --kafka.server=kafka-2.kafka-headless.infra.svc.cluster.local:9092 \
          --web.listen-address=:9308
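        # NOTE: the client password from the kafka-jaas Secret is exposed as SASL_USER_PASSWORD
        # below but is not passed to kafka_exporter above, so this setup assumes the listener on
        # port 9092 accepts unauthenticated connections; enable the exporter's SASL options if
        # the brokers enforce SASL.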
        env:
        - name: SASL_USER_PASSWORD
          valueFrom:
            secretKeyRef:
              key: client-password
              name: kafka-jaas
        image: docker.io/bitnami/kafka-exporter:1.2.0-debian-10-r131
        imagePullPolicy: IfNotPresent
        name: kafka-exporter
        ports:
        - containerPort: 9308
          name: metrics
          protocol: TCP
      dnsPolicy: ClusterFirst
      restartPolicy: Always

---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: metrics
    app.kubernetes.io/instance: kafka
    app.kubernetes.io/name: kafka
  name: kafka-metrics
  namespace: monitoring
spec:
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: http-metrics
    port: 9308
    protocol: TCP
    targetPort: metrics
  selector:
    app.kubernetes.io/component: metrics
    app.kubernetes.io/instance: kafka
    app.kubernetes.io/name: kafka

Prometheus Scrape Configuration

The job below discovers endpoints through the Kubernetes API and keeps only those that belong to the kafka-metrics Service in the monitoring namespace.

    - job_name: 'Kafka'
      kubernetes_sd_configs:
        - role: endpoints
      relabel_configs:
      - source_labels:
          [
            __meta_kubernetes_namespace,
            __meta_kubernetes_service_name,
          ]
        action: keep
        regex: monitoring;kafka-metrics

Monitoring Dashboard

Dashboard JSON (screenshots: img.png, img_1.png)
