Integrating jvm-exporter with k8s + Prometheus for monitoring and alerting

Author: Foghost | Published: 2020-11-09 16:57

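    Background: the article "Monitoring the JVM with Prometheus + Grafana" described how to monitor a Java application with jvm-exporter. In our case the JVMs run inside a k8s cluster, so this article covers how to extend the k8s and Prometheus integration to reach them. It assumes Prometheus is already deployed in the cluster (see "Integrating Prometheus + Grafana monitoring with Kubernetes"). kube-prometheus does not include a jvm-exporter integration, so we have to set it up ourselves.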

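    1. Integrate jvm-exporter into the application

    Integration is simple: add jvm-exporter (the Prometheus JMX exporter agent, jmx_prometheus_javaagent) as a javaagent to the Java startup command. See "Monitoring the JVM with Prometheus + Grafana" for details.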
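
    As a rough sketch of what this can look like inside a Pod, the fragment below passes the agent through the JAVA_TOOL_OPTIONS environment variable, which the HotSpot JVM reads at startup. The jar path, config path and the assumption that both files already exist in the image are illustrative and not taken from the original article; port 9093 and the port name http-metrics match the manifests used later.

    ```yaml
    # Sketch only: file paths and image contents are assumptions.
    apiVersion: v1
    kind: Pod
    metadata:
      name: jmx-metrics
      namespace: my-namespace
      labels:
        metrics: jmx-metrics
    spec:
      containers:
      - name: tomcat
        image: tomcat:9.0 # assumed to also contain the agent jar and its config file
        env:
        - name: JAVA_TOOL_OPTIONS # picked up by the JVM without changing the start script
          # agent argument format: <port>:<path to exporter config>
          value: "-javaagent:/opt/jmx_exporter/jmx_prometheus_javaagent.jar=9093:/opt/jmx_exporter/config.yaml"
        ports:
        - containerPort: 9093
          name: http-metrics
    ```

    Using the environment variable keeps the image's own start command untouched; adding the same -javaagent option directly to the java command line works just as well.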

    2. Configure Prometheus service discovery

    For workloads exposed through a Service, we can configure discovery with the ServiceMonitor CRD defined by the prometheus-operator project. A template:

    --- # ServiceMonitor: service auto-discovery rule
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor # CRD defined by prometheus-operator
    metadata:
      name: jmx-metrics
      namespace: monitoring
      labels:
        k8s-apps: jmx-metrics
    spec:
      jobLabel: metrics # use the value of the Service's "metrics" label as the job label, i.e. samples get job=jmx-metrics
      selector:
        matchLabels:
          metrics: jmx-metrics # discover Services labeled metrics: jmx-metrics
      namespaceSelector:
        matchNames: # namespaces to discover in; more than one may be listed
        - my-namespace
      endpoints:
      - port: http-metrics # port to scrape; this is the Service port name, i.e. spec.ports[].name in the Service YAML
        interval: 15s # scrape interval
    
    --- # Service template
    apiVersion: v1
    kind: Service
    metadata:
      labels:
        metrics: jmx-metrics # the label the ServiceMonitor selects on
      name: jmx-metrics
      namespace: my-namespace
    spec:
      ports:
      - name: http-metrics # matches spec.endpoints[].port in the ServiceMonitor
        port: 9093 # port on which jmx-exporter serves metrics
        targetPort: http-metrics # name of the port exposed in the Pod YAML
      selector:
        metrics: jmx-metrics # the Service's own label selector for Pods
    

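    The configuration above sets up discovery for the jmx-metrics Service in the my-namespace namespace. Prometheus adds every Pod behind this Service to monitoring and keeps the Pod list up to date through the apiserver, so new replicas are picked up automatically when the workload scales out.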
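
    Whether Prometheus picks up a ServiceMonitor (or PodMonitor) at all also depends on the selectors set on the Prometheus custom resource. The fragment below is a sketch of what a kube-prometheus installation typically contains; the resource name k8s and the empty selectors are assumptions about the default setup, so compare it with the actual resource in your cluster rather than applying it as-is.

    ```yaml
    # Fragment of the Prometheus custom resource (assumed kube-prometheus defaults).
    apiVersion: monitoring.coreos.com/v1
    kind: Prometheus
    metadata:
      name: k8s
      namespace: monitoring
    spec:
      serviceAccountName: prometheus-k8s
      serviceMonitorSelector: {}          # empty selector: match every ServiceMonitor
      serviceMonitorNamespaceSelector: {} # empty selector: look in every namespace
      podMonitorSelector: {}
      podMonitorNamespaceSelector: {}
    ```

    If these selectors are restricted to specific labels in your installation, the ServiceMonitor needs matching labels in its metadata before any target shows up.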

    For workloads without a Service, such as instances exposed outside the cluster through HostPort, we can use a PodMonitor for discovery instead. An example:

    --- # PodMonitor: pod auto-discovery rule; supported by recent prometheus-operator versions, older versions may not support it
    apiVersion: monitoring.coreos.com/v1
    kind: PodMonitor # CRD defined by prometheus-operator
    metadata:
      name: jmx-metrics
      namespace: monitoring
      labels:
        k8s-apps: jmx-metrics
    spec:
      jobLabel: metrics # use the value of the Pod's "metrics" label as the job label, i.e. samples get job=jmx-metrics
      selector:
        matchLabels:
          metrics: jmx-metrics # discover Pods labeled metrics: jmx-metrics
      namespaceSelector:
        matchNames: # namespaces to discover in; more than one may be listed
        - my-namespace
      podMetricsEndpoints:
      - port: http-metrics # name of the metrics port in the Pod YAML, i.e. spec.containers[].ports[].name
        interval: 15s # scrape interval
    --- # Template of the Pod to be monitored
    apiVersion: v1
    kind: Pod
    metadata:
      labels:
        metrics: jmx-metrics
      name: jmx-metrics
      namespace: my-namespace
    spec:
      containers:
      - image: tomcat:9.0
        name: tomcat
        ports:
        - containerPort: 9093
          name: http-metrics
    
    3. Grant the Prometheus serviceAccount access to the target namespace
    --- # Create a Role in the target namespace
    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      name: prometheus-k8s
      namespace: my-namespace
    rules:
    - apiGroups:
      - ""
      resources:
      - services
      - endpoints
      - pods
      verbs:
      - get
      - list
      - watch
    --- # Bind the prometheus-k8s ServiceAccount to the Role
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: prometheus-k8s
      namespace: my-namespace
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: Role
      name: prometheus-k8s
    subjects:
    - kind: ServiceAccount
      name: prometheus-k8s # ServiceAccount used by the Prometheus Pods; kube-prometheus uses prometheus-k8s by default
      namespace: monitoring
    
    4. Check service discovery in the Prometheus web UI

    Once service discovery is configured correctly, the new targets show up in the Prometheus web UI.
    5. Add alert rules

    Create an alert rule file named jvm-alert-rules.yaml with the following content:

    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      labels:
        prometheus: k8s
        role: alert-rules
      name: jvm-metrics-rules
      namespace: monitoring
    spec:
      groups:
      - name: jvm-metrics-rules
        rules:
        # GC time exceeds 10% of the last 5 minutes (30s out of 300s)
        - alert: GcTimeTooMuch
          expr: increase(jvm_gc_collection_seconds_sum[5m]) / (5 * 60) * 100 > 10
          for: 5m
          labels:
            severity: red
          annotations:
            summary: "{{ $labels.app }} GC time ratio above 10%"
            message: "ns:{{ $labels.namespace }} pod:{{ $labels.pod }} GC time ratio above 10%, current value ({{ $value }}%)"
        # Too many GCs
        - alert: GcCountTooMuch
          expr: increase(jvm_gc_collection_seconds_count[1m]) > 30
          for: 1m
          labels:
            severity: red
          annotations:
            summary: "{{ $labels.app }} more than 30 GCs in 1 minute"
            message: "ns:{{ $labels.namespace }} pod:{{ $labels.pod }} more than 30 GCs in 1 minute, current value ({{ $value }})"
        # Too many full GCs (this selector only matches the CMS collector)
        - alert: FgcCountTooMuch
          expr: increase(jvm_gc_collection_seconds_count{gc="ConcurrentMarkSweep"}[1h]) > 3
          for: 1m
          labels:
            severity: red
          annotations:
            summary: "{{ $labels.app }} more than 3 full GCs in 1 hour"
            message: "ns:{{ $labels.namespace }} pod:{{ $labels.pod }} more than 3 full GCs in 1 hour, current value ({{ $value }})"
        # Non-heap memory usage above 80%
        - alert: NonheapUsageTooMuch
          expr: jvm_memory_bytes_used{job="jmx-metrics", area="nonheap"} / jvm_memory_bytes_max * 100 > 80
          for: 1m
          labels:
            severity: red
          annotations:
            summary: "{{ $labels.app }} non-heap memory usage above 80%"
            message: "ns:{{ $labels.namespace }} pod:{{ $labels.pod }} non-heap memory usage above 80%, current value ({{ $value }}%)"
        # Memory usage warning
        - alert: HighMemUsage
          expr: process_resident_memory_bytes{job="jmx-metrics"} / os_total_physical_memory_bytes * 100 > 85
          for: 1m
          labels:
            severity: red
          annotations:
            summary: "{{ $labels.app }} RSS memory usage above 85%"
            message: "ns:{{ $labels.namespace }} pod:{{ $labels.pod }} RSS memory usage above 85%, current value ({{ $value }}%)"
    
    

    Run kubectl apply -f jvm-alert-rules.yaml to apply the rules.
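
    The labels prometheus: k8s and role: alert-rules on the PrometheusRule matter because the Prometheus custom resource selects rule objects by label. The fragment below sketches the selector that kube-prometheus installations of this period typically shipped; treat it as an assumption and check the ruleSelector of your own Prometheus resource, since a mismatch means the rules are never loaded.

    ```yaml
    # Fragment of the Prometheus custom resource (assumed kube-prometheus defaults).
    apiVersion: monitoring.coreos.com/v1
    kind: Prometheus
    metadata:
      name: k8s
      namespace: monitoring
    spec:
      ruleSelector:
        matchLabels:
          prometheus: k8s   # must match metadata.labels of the PrometheusRule
          role: alert-rules
    ```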

    6. Add alert receivers

    Edit the receiver configuration:

    global:
      resolve_timeout: 5m
    route:
      group_by: ['job', 'alertname', 'pod']
      group_interval: 2m
      receiver: my-alert-receiver
      routes:
      - match: 
          job: jmx-metrics
        receiver: my-alert-receiver
        repeat_interval: 3h
    receivers:
    - name: my-alert-receiver
      webhook_configs:
      - url: http://mywebhook.com/
        max_alerts: 1
        send_resolved: true
    

    Base64-encode this configuration with any tool and put the result into the Secret that holds the Alertmanager configuration:
    kubectl edit -n monitoring Secret alertmanager-main

    apiVersion: v1
    data:
      alertmanager.yaml: KICAgICJyZWNlaXZlciI6ICJudWxsIg== # put the base64 string here
    kind: Secret
    metadata:
      name: alertmanager-main
      namespace: monitoring
    type: Opaque
    

    After you save and exit the editor, the change takes effect after a short while.
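
    As an alternative to encoding by hand, a Kubernetes Secret also accepts plain text under stringData, which the apiserver encodes into data on write. The sketch below reuses the receiver configuration from this step and assumes the Secret name alertmanager-main and the key alertmanager.yaml shown above; applying it overwrites that key with the new configuration.

    ```yaml
    # Sketch: same Alertmanager configuration, supplied as plain text.
    apiVersion: v1
    kind: Secret
    metadata:
      name: alertmanager-main
      namespace: monitoring
    type: Opaque
    stringData:
      alertmanager.yaml: |
        global:
          resolve_timeout: 5m
        route:
          group_by: ['job', 'alertname', 'pod']
          group_interval: 2m
          receiver: my-alert-receiver
          routes:
          - match:
              job: jmx-metrics
            receiver: my-alert-receiver
            repeat_interval: 3h
        receivers:
        - name: my-alert-receiver
          webhook_configs:
          - url: http://mywebhook.com/
            max_alerts: 1
            send_resolved: true
    ```

    Keeping this manifest in version control makes later changes to the routing easier to review than an opaque base64 string.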

    This completes the setup of the JVM monitoring system.

    Appendix: a sample response from the jvm-exporter metrics endpoint. Pick whichever metrics you need from it.

    # HELP jvm_threads_current Current thread count of a JVM
    # TYPE jvm_threads_current gauge
    jvm_threads_current 218.0
    # HELP jvm_threads_daemon Daemon thread count of a JVM
    # TYPE jvm_threads_daemon gauge
    jvm_threads_daemon 40.0
    # HELP jvm_threads_peak Peak thread count of a JVM
    # TYPE jvm_threads_peak gauge
    jvm_threads_peak 219.0
    # HELP jvm_threads_started_total Started thread count of a JVM
    # TYPE jvm_threads_started_total counter
    jvm_threads_started_total 249.0
    # HELP jvm_threads_deadlocked Cycles of JVM-threads that are in deadlock waiting to acquire object monitors or ownable synchronizers
    # TYPE jvm_threads_deadlocked gauge
    jvm_threads_deadlocked 0.0
    # HELP jvm_threads_deadlocked_monitor Cycles of JVM-threads that are in deadlock waiting to acquire object monitors
    # TYPE jvm_threads_deadlocked_monitor gauge
    jvm_threads_deadlocked_monitor 0.0
    # HELP jvm_threads_state Current count of threads by state
    # TYPE jvm_threads_state gauge
    jvm_threads_state{state="NEW",} 0.0
    jvm_threads_state{state="RUNNABLE",} 49.0
    jvm_threads_state{state="TIMED_WAITING",} 141.0
    jvm_threads_state{state="TERMINATED",} 0.0
    jvm_threads_state{state="WAITING",} 28.0
    jvm_threads_state{state="BLOCKED",} 0.0
    # HELP jvm_info JVM version info
    # TYPE jvm_info gauge
    jvm_info{version="1.8.0_261-b12",vendor="Oracle Corporation",runtime="Java(TM) SE Runtime Environment",} 1.0
    # HELP jvm_memory_bytes_used Used bytes of a given JVM memory area.
    # TYPE jvm_memory_bytes_used gauge
    jvm_memory_bytes_used{area="heap",} 1.553562144E9
    jvm_memory_bytes_used{area="nonheap",} 6.5181496E7
    # HELP jvm_memory_bytes_committed Committed (bytes) of a given JVM memory area.
    # TYPE jvm_memory_bytes_committed gauge
    jvm_memory_bytes_committed{area="heap",} 4.08027136E9
    jvm_memory_bytes_committed{area="nonheap",} 6.8747264E7
    # HELP jvm_memory_bytes_max Max (bytes) of a given JVM memory area.
    # TYPE jvm_memory_bytes_max gauge
    jvm_memory_bytes_max{area="heap",} 4.08027136E9
    jvm_memory_bytes_max{area="nonheap",} 1.317011456E9
    # HELP jvm_memory_bytes_init Initial bytes of a given JVM memory area.
    # TYPE jvm_memory_bytes_init gauge
    jvm_memory_bytes_init{area="heap",} 4.294967296E9
    jvm_memory_bytes_init{area="nonheap",} 2555904.0
    # HELP jvm_memory_pool_bytes_used Used bytes of a given JVM memory pool.
    # TYPE jvm_memory_pool_bytes_used gauge
    jvm_memory_pool_bytes_used{pool="Code Cache",} 2.096832E7
    jvm_memory_pool_bytes_used{pool="Metaspace",} 3.9320064E7
    jvm_memory_pool_bytes_used{pool="Compressed Class Space",} 4893112.0
    jvm_memory_pool_bytes_used{pool="Par Eden Space",} 1.71496168E8
    jvm_memory_pool_bytes_used{pool="Par Survivor Space",} 7.1602832E7
    jvm_memory_pool_bytes_used{pool="CMS Old Gen",} 1.310463144E9
    # HELP jvm_memory_pool_bytes_committed Committed bytes of a given JVM memory pool.
    # TYPE jvm_memory_pool_bytes_committed gauge
    jvm_memory_pool_bytes_committed{pool="Code Cache",} 2.3396352E7
    jvm_memory_pool_bytes_committed{pool="Metaspace",} 4.0239104E7
    jvm_memory_pool_bytes_committed{pool="Compressed Class Space",} 5111808.0
    jvm_memory_pool_bytes_committed{pool="Par Eden Space",} 1.718091776E9
    jvm_memory_pool_bytes_committed{pool="Par Survivor Space",} 2.14695936E8
    jvm_memory_pool_bytes_committed{pool="CMS Old Gen",} 2.147483648E9
    # HELP jvm_memory_pool_bytes_max Max bytes of a given JVM memory pool.
    # TYPE jvm_memory_pool_bytes_max gauge
    jvm_memory_pool_bytes_max{pool="Code Cache",} 2.5165824E8
    jvm_memory_pool_bytes_max{pool="Metaspace",} 5.36870912E8
    jvm_memory_pool_bytes_max{pool="Compressed Class Space",} 5.28482304E8
    jvm_memory_pool_bytes_max{pool="Par Eden Space",} 1.718091776E9
    jvm_memory_pool_bytes_max{pool="Par Survivor Space",} 2.14695936E8
    jvm_memory_pool_bytes_max{pool="CMS Old Gen",} 2.147483648E9
    # HELP jvm_memory_pool_bytes_init Initial bytes of a given JVM memory pool.
    # TYPE jvm_memory_pool_bytes_init gauge
    jvm_memory_pool_bytes_init{pool="Code Cache",} 2555904.0
    jvm_memory_pool_bytes_init{pool="Metaspace",} 0.0
    jvm_memory_pool_bytes_init{pool="Compressed Class Space",} 0.0
    jvm_memory_pool_bytes_init{pool="Par Eden Space",} 1.718091776E9
    jvm_memory_pool_bytes_init{pool="Par Survivor Space",} 2.14695936E8
    jvm_memory_pool_bytes_init{pool="CMS Old Gen",} 2.147483648E9
    # HELP jmx_config_reload_failure_total Number of times configuration have failed to be reloaded.
    # TYPE jmx_config_reload_failure_total counter
    jmx_config_reload_failure_total 0.0
    # HELP os_free_physical_memory_bytes FreePhysicalMemorySize (java.lang<type=OperatingSystem><>FreePhysicalMemorySize)
    # TYPE os_free_physical_memory_bytes gauge
    os_free_physical_memory_bytes 9.1234304E8
    # HELP os_committed_virtual_memory_bytes CommittedVirtualMemorySize (java.lang<type=OperatingSystem><>CommittedVirtualMemorySize)
    # TYPE os_committed_virtual_memory_bytes gauge
    os_committed_virtual_memory_bytes 2.2226296832E10
    # HELP os_total_swap_space_bytes TotalSwapSpaceSize (java.lang<type=OperatingSystem><>TotalSwapSpaceSize)
    # TYPE os_total_swap_space_bytes gauge
    os_total_swap_space_bytes 0.0
    # HELP os_max_file_descriptor_count MaxFileDescriptorCount (java.lang<type=OperatingSystem><>MaxFileDescriptorCount)
    # TYPE os_max_file_descriptor_count gauge
    os_max_file_descriptor_count 1048576.0
    # HELP os_system_load_average SystemLoadAverage (java.lang<type=OperatingSystem><>SystemLoadAverage)
    # TYPE os_system_load_average gauge
    os_system_load_average 4.97
    # HELP os_total_physical_memory_bytes TotalPhysicalMemorySize (java.lang<type=OperatingSystem><>TotalPhysicalMemorySize)
    # TYPE os_total_physical_memory_bytes gauge
    os_total_physical_memory_bytes 1.073741824E10
    # HELP os_system_cpu_load SystemCpuLoad (java.lang<type=OperatingSystem><>SystemCpuLoad)
    # TYPE os_system_cpu_load gauge
    os_system_cpu_load 1.0
    # HELP os_free_swap_space_bytes FreeSwapSpaceSize (java.lang<type=OperatingSystem><>FreeSwapSpaceSize)
    # TYPE os_free_swap_space_bytes gauge
    os_free_swap_space_bytes 0.0
    # HELP os_available_processors AvailableProcessors (java.lang<type=OperatingSystem><>AvailableProcessors)
    # TYPE os_available_processors gauge
    os_available_processors 6.0
    # HELP os_process_cpu_load ProcessCpuLoad (java.lang<type=OperatingSystem><>ProcessCpuLoad)
    # TYPE os_process_cpu_load gauge
    os_process_cpu_load 0.14194299011052938
    # HELP os_open_file_descriptor_count OpenFileDescriptorCount (java.lang<type=OperatingSystem><>OpenFileDescriptorCount)
    # TYPE os_open_file_descriptor_count gauge
    os_open_file_descriptor_count 717.0
    # HELP jmx_scrape_duration_seconds Time this JMX scrape took, in seconds.
    # TYPE jmx_scrape_duration_seconds gauge
    jmx_scrape_duration_seconds 0.004494197
    # HELP jmx_scrape_error Non-zero if this scrape failed.
    # TYPE jmx_scrape_error gauge
    jmx_scrape_error 0.0
    # HELP jmx_scrape_cached_beans Number of beans with their matching rule cached
    # TYPE jmx_scrape_cached_beans gauge
    jmx_scrape_cached_beans 0.0
    # HELP jvm_buffer_pool_used_bytes Used bytes of a given JVM buffer pool.
    # TYPE jvm_buffer_pool_used_bytes gauge
    jvm_buffer_pool_used_bytes{pool="direct",} 2.3358974E7
    jvm_buffer_pool_used_bytes{pool="mapped",} 0.0
    # HELP jvm_buffer_pool_capacity_bytes Bytes capacity of a given JVM buffer pool.
    # TYPE jvm_buffer_pool_capacity_bytes gauge
    jvm_buffer_pool_capacity_bytes{pool="direct",} 2.3358974E7
    jvm_buffer_pool_capacity_bytes{pool="mapped",} 0.0
    # HELP jvm_buffer_pool_used_buffers Used buffers of a given JVM buffer pool.
    # TYPE jvm_buffer_pool_used_buffers gauge
    jvm_buffer_pool_used_buffers{pool="direct",} 61.0
    jvm_buffer_pool_used_buffers{pool="mapped",} 0.0
    # HELP jvm_gc_collection_seconds Time spent in a given JVM garbage collector in seconds.
    # TYPE jvm_gc_collection_seconds summary
    jvm_gc_collection_seconds_count{gc="ParNew",} 77259.0
    jvm_gc_collection_seconds_sum{gc="ParNew",} 2399.831
    jvm_gc_collection_seconds_count{gc="ConcurrentMarkSweep",} 1.0
    jvm_gc_collection_seconds_sum{gc="ConcurrentMarkSweep",} 0.29
    # HELP jmx_config_reload_success_total Number of times configuration have successfully been reloaded.
    # TYPE jmx_config_reload_success_total counter
    jmx_config_reload_success_total 0.0
    # HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
    # TYPE process_cpu_seconds_total counter
    process_cpu_seconds_total 1759604.89
    # HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
    # TYPE process_start_time_seconds gauge
    process_start_time_seconds 1.608630226597E9
    # HELP process_open_fds Number of open file descriptors.
    # TYPE process_open_fds gauge
    process_open_fds 717.0
    # HELP process_max_fds Maximum number of open file descriptors.
    # TYPE process_max_fds gauge
    process_max_fds 1048576.0
    # HELP process_virtual_memory_bytes Virtual memory size in bytes.
    # TYPE process_virtual_memory_bytes gauge
    process_virtual_memory_bytes 2.2226292736E10
    # HELP process_resident_memory_bytes Resident memory size in bytes.
    # TYPE process_resident_memory_bytes gauge
    process_resident_memory_bytes 4.644765696E9
    # HELP jmx_exporter_build_info A metric with a constant '1' value labeled with the version of the JMX exporter.
    # TYPE jmx_exporter_build_info gauge
    jmx_exporter_build_info{version="0.14.0",name="jmx_prometheus_javaagent",} 1.0
    # HELP jvm_memory_pool_allocated_bytes_total Total bytes allocated in a given JVM memory pool. Only updated after GC, not continuously.
    # TYPE jvm_memory_pool_allocated_bytes_total counter
    jvm_memory_pool_allocated_bytes_total{pool="Par Survivor Space",} 1.42928399936E11
    jvm_memory_pool_allocated_bytes_total{pool="CMS Old Gen",} 2.862731656E9
    jvm_memory_pool_allocated_bytes_total{pool="Code Cache",} 2.8398656E7
    jvm_memory_pool_allocated_bytes_total{pool="Compressed Class Space",} 4912848.0
    jvm_memory_pool_allocated_bytes_total{pool="Metaspace",} 3.9438872E7
    jvm_memory_pool_allocated_bytes_total{pool="Par Eden Space",} 1.32737951722432E14
    # HELP jvm_classes_loaded The number of classes that are currently loaded in the JVM
    # TYPE jvm_classes_loaded gauge
    jvm_classes_loaded 7282.0
    # HELP jvm_classes_loaded_total The total number of classes that have been loaded since the JVM has started execution
    # TYPE jvm_classes_loaded_total counter
    jvm_classes_loaded_total 7317.0
    # HELP jvm_classes_unloaded_total The total number of classes that have been unloaded since the JVM has started execution
    # TYPE jvm_classes_unloaded_total counter
    jvm_classes_unloaded_total 35.0
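
    As an illustration of building further rules from the metrics listed above, the sketch below adds a heap usage alert and a deadlock alert. The object name jvm-extra-rules, the alert names and the 90% threshold are hypothetical choices for this example; the metric names and the labels on the PrometheusRule come from this article.

    ```yaml
    # Sketch only: names and thresholds are illustrative.
    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      labels:
        prometheus: k8s
        role: alert-rules
      name: jvm-extra-rules
      namespace: monitoring
    spec:
      groups:
      - name: jvm-extra-rules
        rules:
        # Heap usage above 90% of the configured maximum
        - alert: HeapUsageTooMuch
          expr: jvm_memory_bytes_used{job="jmx-metrics", area="heap"} / jvm_memory_bytes_max{area="heap"} * 100 > 90
          for: 5m
          labels:
            severity: red
          annotations:
            summary: "{{ $labels.pod }} heap usage above 90%"
        # Any deadlocked threads reported by the JVM
        - alert: JvmThreadsDeadlocked
          expr: jvm_threads_deadlocked{job="jmx-metrics"} > 0
          for: 1m
          labels:
            severity: red
          annotations:
            summary: "{{ $labels.pod }} has {{ $value }} deadlocked threads"
    ```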
    
    

    Article link: https://www.haomeiwen.com/subject/bpilbktx.html