Prometheus Troubleshooting for Beginners


Author: 带肥肉的羊肉串 | Published: 2022-07-11 13:36

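Plenty of experts have already written about Prometheus, so I won't repeat them. I only started learning Prometheus recently, so this post takes a different angle and briefly shares how I troubleshoot a problem.
Experienced readers can skip this.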

Issue 1
Prometheus scrapes the data, so why doesn't the Grafana dashboard show it? (Spring Boot monitoring)

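Problem description:
Our company needs to monitor five services. All five appear on the Prometheus targets page:
http://ip:9090/targets

But the dashboard shows only two of them.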
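
The same target list is also available from the Prometheus HTTP API at /api/v1/targets, which is handy when you want the health of each target as text rather than a screenshot. The following is a minimal sketch, not what the original investigation used; the job names and hosts in `sample` are hypothetical, and `http://ip:9090` is the placeholder address from above.

```python
import json
from urllib.request import urlopen


def fetch_targets(base_url):
    """GET /api/v1/targets from a Prometheus server and decode the JSON body."""
    with urlopen(f"{base_url}/api/v1/targets", timeout=5) as resp:
        return json.load(resp)


def summarize_targets(payload):
    """Return sorted (job, scrape URL, health) tuples for all active targets."""
    rows = []
    for target in payload["data"]["activeTargets"]:
        job = target["labels"].get("job", "")
        rows.append((job, target["scrapeUrl"], target["health"]))
    return sorted(rows)


# Hypothetical, trimmed response used for illustration only.
sample = {
    "status": "success",
    "data": {
        "activeTargets": [
            {
                "labels": {"job": "svc-a", "instance": "a:8080"},
                "scrapeUrl": "http://a:8080/actuator/prometheus",
                "health": "up",
            },
            {
                "labels": {"job": "svc-b", "instance": "b:8080"},
                "scrapeUrl": "http://b:8080/actuator/prometheus",
                "health": "up",
            },
        ]
    },
}

for job, url, health in summarize_targets(sample):
    print(health, job, url)
```

Against a real server, `summarize_targets(fetch_targets("http://ip:9090"))` would give the same kind of list. If every target reports "up", as in this case, the problem is in what the services expose, not in whether Prometheus can reach them.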

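Troubleshooting process

  1. What we can confirm: at least two services work, and every Spring Boot service in the company uses the same io.micrometer setup, so this is not a dependency problem.
  2. Prometheus scrapes all five services, so start by looking at the raw scraped data.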


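Open the endpoint directly in a browser:
(http://ip:port/actuator/prometheus)
Pick one service that works and one that does not.

Save each response and open it in a text editor.
Working service: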
# TYPE jvm_gc_max_data_size_bytes gauge
# HELP jvm_gc_max_data_size_bytes Max size of long-lived heap memory pool
jvm_gc_max_data_size_bytes 1.71966464E9
# TYPE jvm_classes_unloaded_classes counter
# HELP jvm_classes_unloaded_classes The total number of classes unloaded since the Java virtual machine has started execution
jvm_classes_unloaded_classes_total 200.0
# TYPE jvm_buffer_count_buffers gauge
# HELP jvm_buffer_count_buffers An estimate of the number of buffers in the pool
jvm_buffer_count_buffers{id="direct"} 12.0
jvm_buffer_count_buffers{id="mapped"} 0.0
# TYPE log4j2_events counter
# HELP log4j2_events Number of fatal level log events
log4j2_events_total{level="warn"} 686.0
log4j2_events_total{level="debug"} 0.0
log4j2_events_total{level="error"} 59010.0
log4j2_events_total{level="trace"} 0.0
log4j2_events_total{level="fatal"} 0.0
log4j2_events_total{level="info"} 3.2804589E7
# TYPE jvm_memory_committed_bytes gauge
# HELP jvm_memory_committed_bytes The amount of memory in bytes that is committed for the Java virtual machine to use
jvm_memory_committed_bytes{area="heap",id="G1 Survivor Space"} 7340032.0
jvm_memory_committed_bytes{area="heap",id="G1 Old Gen"} 8.76609536E8
jvm_memory_committed_bytes{area="nonheap",id="Metaspace"} 1.55648E8
jvm_memory_committed_bytes{area="heap",id="G1 Eden Space"} 8.35715072E8
jvm_memory_committed_bytes{area="nonheap",id="Code Cache"} 1.26222336E8
jvm_memory_committed_bytes{area="nonheap",id="Compressed Class Space"} 1.933312E7
# TYPE system_cpu_count gauge
# HELP system_cpu_count The number of processors available to the Java virtual machine
system_cpu_count 48.0

Failing service:

# TYPE grpc_client_requests_sent_messages counter
# HELP grpc_client_requests_sent_messages The total number of requests sent
grpc_client_requests_sent_messages_total{method="",methodType="",service="com"} 6793741.0
# TYPE zipkin_reporter_spans counter
# HELP zipkin_reporter_spans Spans reported
zipkin_reporter_spans_total 312568.0
# TYPE zipkin_reporter_messages_bytes counter
# HELP zipkin_reporter_messages_bytes Total bytes of messages reported
zipkin_reporter_messages_bytes_total 2.07240049E8
# TYPE grpc_server_requests_received_messages counter
# HELP grpc_server_requests_received_messages The total number of requests received
grpc_server_requests_received_messages_total{method="",methodType="SERVER_STREAMING",service=""} 0.0
grpc_server_requests_received_messages_total{method="add",methodType="UNARY",service=""} 0.0

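Note: everything company-specific has been removed from the output above.
The failing service exposes no jvm_* metrics. I don't yet understand how Grafana interprets the scraped data; I'll analyze that in detail once I have learned more.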
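
Comparing two scrape outputs by eye gets tedious once they run to hundreds of lines. Here is a minimal sketch that extracts the metric names from each text dump and reports which name prefixes exist only in the working one. The helper names (`metric_names`, `prefixes`) are my own, and the two strings are shortened excerpts of the outputs above.

```python
import re

# A sample line looks like: jvm_buffer_count_buffers{id="direct"} 12.0
_SAMPLE_NAME = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)")


def metric_names(exposition_text):
    """Collect metric names from sample lines, skipping comments and blanks."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE_NAME.match(line)
        if match:
            names.add(match.group(1))
    return names


def prefixes(names):
    """Reduce metric names to their first underscore-separated segment."""
    return {name.split("_", 1)[0] for name in names}


good = """\
# TYPE jvm_gc_max_data_size_bytes gauge
jvm_gc_max_data_size_bytes 1.71966464E9
jvm_buffer_count_buffers{id="direct"} 12.0
log4j2_events_total{level="warn"} 686.0
system_cpu_count 48.0
"""

bad = """\
# TYPE zipkin_reporter_spans counter
zipkin_reporter_spans_total 312568.0
grpc_server_requests_received_messages_total{method="add",methodType="UNARY",service=""} 0.0
"""

missing = prefixes(metric_names(good)) - prefixes(metric_names(bad))
print(sorted(missing))  # ['jvm', 'log4j2', 'system']
```

On these excerpts the jvm prefix is not the only one missing from the failing service.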

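  3. Check jvm_memory_used_bytes

    URL:
    http://ip:9090/graph
    Enter jvm_memory_used_bytes in the expression box.
    JVM series come back for only two services, which confirms that the other three have no jvm_memory_used_bytes data.
  4. At this point we can conclude:

  • The Spring Boot services do expose data.
  • The data is missing jvm_memory_used_bytes.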
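
The check in step 3 can also be expressed as one PromQL query and sent to the /api/v1/query endpoint. This is a sketch under assumptions: it assumes the `up` series and the JVM series carry the usual `instance` and `job` target labels, and the instance value in `sample` is hypothetical. The query returns targets that are up but have no jvm_memory_used_bytes series.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Targets that are scraped successfully but expose no jvm_memory_used_bytes.
QUERY = "up == 1 unless on (instance, job) jvm_memory_used_bytes"


def query_url(base_url, promql):
    """Build the instant-query URL for the Prometheus HTTP API."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


def run_query(base_url, promql):
    """Send the instant query and decode the JSON response."""
    with urlopen(query_url(base_url, promql), timeout=5) as resp:
        return json.load(resp)


def instances(payload):
    """Extract the sorted, distinct `instance` labels from a query response."""
    result = payload["data"]["result"]
    return sorted({series["metric"].get("instance", "") for series in result})


# Hypothetical response shape, for illustration only.
sample = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {
                "metric": {"__name__": "up", "instance": "c:8080", "job": "svc-c"},
                "value": [1657517760, "1"],
            }
        ],
    },
}

print(instances(sample))  # ['c:8080']
```

Against a real server, `instances(run_query("http://ip:9090", QUERY))` would list the services to look at next.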
  5. Check the Spring Boot code, starting with the pom for clues:
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-actuator</artifactId>
        </dependency>
        <dependency>
            <groupId>io.micrometer</groupId>
            <artifactId>micrometer-registry-prometheus</artifactId>
        </dependency>

This part is fine. Next, the configuration:

management:
  # https://docs.spring.io/spring-boot/docs/current/reference/html/actuator.html
  endpoints:
    web:
      exposure:
        include: '*'
  endpoint:
    info:
      enabled: true
    metrics:
      enabled: true
    prometheus:
      enabled: true
    health:
      show-details: always

This looks fine too.
Going through the whole application.yml, the only recent change is the addition of gRPC:

grpc:
  server:
    port: 
    security:
      enabled: false
  client:

http://ip:port/actuator/metrics for a working service:

{"names":["http.server.requests","jvm.buffer.count","jvm.buffer.memory.used","jvm.buffer.total.capacity","jvm.classes.loaded","jvm.classes.unloaded","jvm.gc.live.data.size","jvm.gc.max.data.size","jvm.gc.memory.allocated","jvm.gc.memory.promoted","jvm.gc.pause","jvm.memory.committed","jvm.memory.max","jvm.memory.used","jvm.threads.daemon","jvm.threads.live","jvm.threads.peak","jvm.threads.states","log4j2.events","process.cpu.usage","process.files.max","process.files.open","process.start.time","process.uptime","spring.data.repository.invocations","system.cpu.count","system.cpu.usage","system.load.average.1m","tomcat.sessions.active.current","tomcat.sessions.active.max","tomcat.sessions.alive.max","tomcat.sessions.created","tomcat.sessions.expired","tomcat.sessions.rejected"]}

http://ip:port/actuator/metrics for a failing service:

{"names":["grpc.client.processing.duration","grpc.client.requests.sent","grpc.client.responses.received","grpc.server.processing.duration","grpc.server.requests.received","grpc.server.responses.sent","http.server.requests","spring.data.repository.invocations","tomcat.sessions.active.current","tomcat.sessions.active.max","tomcat.sessions.alive.max","tomcat.sessions.created","tomcat.sessions.expired","tomcat.sessions.rejected","zipkin.reporter.messages","zipkin.reporter.messages.total","zipkin.reporter.queue.bytes","zipkin.reporter.queue.spans","zipkin.reporter.spans","zipkin.reporter.spans.dropped","zipkin.reporter.spans.total"]}

It looks as if gRPC is affecting the metrics that Spring Boot exposes. The exact cause is still under investigation.
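
The two /actuator/metrics responses can be compared mechanically as well. A minimal sketch, assuming the response bodies have been saved as strings; the helper names are my own and the two JSON strings are shortened versions of the lists above.

```python
import json


def meter_names(actuator_metrics_json):
    """Parse the body of GET /actuator/metrics and return its meter names."""
    return set(json.loads(actuator_metrics_json)["names"])


def group(name):
    """Top-level group of a meter name: the text before the first dot."""
    return name.split(".", 1)[0]


def missing_groups(good, bad):
    """Groups that appear in the working service but not in the failing one."""
    return sorted({group(n) for n in good} - {group(n) for n in bad})


good = meter_names(
    '{"names":["http.server.requests","jvm.memory.used","log4j2.events",'
    '"process.uptime","system.cpu.count","tomcat.sessions.created"]}'
)
bad = meter_names(
    '{"names":["grpc.client.requests.sent","http.server.requests",'
    '"tomcat.sessions.created","zipkin.reporter.spans"]}'
)

print(missing_groups(good, bad))  # ['jvm', 'log4j2', 'process', 'system']
```

The full lists above give the same result: the failing service has no jvm, log4j2, process, or system meters at all, so more than the JVM metrics is missing.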

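I will update this post once I have figured it out.

  6. Further investigation of the failing services: two of them were configured with gRPC, but the third does not use gRPC at all and its scraped data also lacks the JVM metrics, so gRPC alone does not explain the problem.

Going through application.yml, the Prometheus-related configuration turned out to be wrong. I won't go into the specifics here. If you run into a similar configuration problem, I suggest redoing the configuration by following a setup that someone more experienced has published.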

Here is a very good one:
Spring Boot (19): Monitoring Applications with Spring Boot Actuator
https://blog.csdn.net/ityouknow/article/details/102693719

The Actuator endpoints can be used to troubleshoot problems like this one.

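This post only aims to give beginners (myself included) an approach to troubleshooting, in the hope that it prompts better write-ups from others.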

Article link: https://www.haomeiwen.com/subject/acrnbrtx.html