Microservice Monitoring - Monitoring Your Own Services

Author: CatchZeng | Published: 2021-05-17 13:57


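    The previous article covered using Exporters to monitor applications in a Kubernetes cluster. This article explains how to monitor your own services.

    For a service to be monitored, it must expose its runtime metrics so that Prometheus can scrape them. The client libraries provided by Prometheus let a service expose its own runtime information.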


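    Prometheus officially provides client libraries for Go, Java or Scala, Python, and Ruby. Third parties support most other languages; see the client libraries documentation for details.

    Before looking at how to expose metrics from a service with a client library, let's first go over the metric types that the Prometheus libraries provide.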


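    The Prometheus client libraries offer four core metric types: counter, gauge, histogram, and summary.

    A counter is a cumulative metric whose value can only increase, or be reset to zero when the process restarts. Counters suit totals such as the number of requests served or errors encountered.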

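    A gauge is a metric that represents a single numerical value that can go up or down. Gauges are typically used for measured values such as temperature or current memory usage, and also for "counts" that can rise and fall, such as the number of running goroutines.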


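    A histogram samples observations (such as request latencies or response sizes) and counts them in configurable buckets. It also provides the sum of all observed values.

    A histogram with a base metric name of <basename> exposes multiple time series during a scrape:

    • cumulative counters for the observation buckets, exposed as <basename>_bucket{le="<upper inclusive bound>"}
    • the total sum of all observed values, exposed as <basename>_sum
    • the count of events that have been observed, exposed as <basename>_count (identical to <basename>_bucket{le="+Inf"} above)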
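
    To make the cumulative bucket layout concrete, here is a minimal sketch in plain Go that uses only the standard library. It is a hand-rolled illustration, not the client_golang implementation; the histogram type, its methods, and the metric name demo_request_duration_seconds are all hypothetical names chosen for this example.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// histogram is a hand-rolled illustration of cumulative buckets.
// It is not the client_golang implementation.
type histogram struct {
	bounds []float64 // upper inclusive bounds, ascending, without +Inf
	counts []uint64  // per-bucket counts; the last slot is the +Inf bucket
	sum    float64
	count  uint64
}

func newHistogram(bounds ...float64) *histogram {
	return &histogram{bounds: bounds, counts: make([]uint64, len(bounds)+1)}
}

// Observe records v in the first bucket whose upper bound is >= v.
func (h *histogram) Observe(v float64) {
	i := 0
	for i < len(h.bounds) && v > h.bounds[i] {
		i++
	}
	h.counts[i]++
	h.sum += v
	h.count++
}

// cumulative returns the running totals that a scrape would show as
// <basename>_bucket values, ending with the +Inf bucket.
func (h *histogram) cumulative() []uint64 {
	out := make([]uint64, len(h.counts))
	var running uint64
	for i, c := range h.counts {
		running += c
		out[i] = running
	}
	return out
}

// expose renders the histogram's series in the text exposition format.
func (h *histogram) expose(basename string) string {
	var b strings.Builder
	for i, c := range h.cumulative() {
		le := math.Inf(1)
		if i < len(h.bounds) {
			le = h.bounds[i]
		}
		fmt.Fprintf(&b, "%s_bucket{le=%q} %d\n", basename, strconv.FormatFloat(le, 'g', -1, 64), c)
	}
	fmt.Fprintf(&b, "%s_sum %s\n", basename, strconv.FormatFloat(h.sum, 'g', -1, 64))
	fmt.Fprintf(&b, "%s_count %d\n", basename, h.count)
	return b.String()
}

func main() {
	h := newHistogram(0.25, 0.5, 1)
	for _, v := range []float64{0.25, 0.5, 0.5, 2} {
		h.Observe(v)
	}
	fmt.Print(h.expose("demo_request_duration_seconds"))
}
```

    Because the bounds are inclusive, the observation 0.25 lands in the le="0.25" bucket, and every later bucket repeats the counts of the buckets before it. The +Inf bucket therefore always equals the _count series.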

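    The histogram_quantile() function computes quantiles from a histogram, or even from an aggregation of histograms. Histograms are also suitable for calculating an Apdex score. When operating on buckets, remember that the histogram is cumulative. For details on histogram usage and the differences from summaries, see "Histograms and summaries" in the Prometheus documentation.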
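
    The idea behind estimating a quantile from cumulative buckets can be sketched in plain Go as follows. This is a simplified illustration of linear interpolation inside the bucket that contains the target rank, not the actual histogram_quantile() implementation. It assumes non-negative observations (so the first bucket starts at 0), and the names bucket, estimateQuantile, and demoQuantiles are hypothetical.

```go
package main

import "fmt"

// bucket mirrors one <basename>_bucket sample: the cumulative number of
// observations less than or equal to upperBound.
type bucket struct {
	upperBound float64 // value of the le label (finite buckets only)
	count      float64 // cumulative count
}

// estimateQuantile approximates the q-quantile (0 <= q <= 1) from finite
// cumulative buckets sorted by upperBound. total is the count of the +Inf
// bucket. The boolean is false when q is out of range or there is no data.
// Assumption: observations are non-negative, so the first bucket starts at 0.
func estimateQuantile(q float64, buckets []bucket, total float64) (float64, bool) {
	if q < 0 || q > 1 || total == 0 || len(buckets) == 0 {
		return 0, false
	}
	rank := q * total
	for i, b := range buckets {
		if b.count < rank {
			continue // the target rank lies in a later bucket
		}
		lower, below := 0.0, 0.0
		if i > 0 {
			lower, below = buckets[i-1].upperBound, buckets[i-1].count
		}
		inBucket := b.count - below
		if inBucket == 0 {
			return lower, true
		}
		// Interpolate linearly between the bucket's lower and upper bound.
		return lower + (b.upperBound-lower)*(rank-below)/inBucket, true
	}
	// The rank falls into the +Inf bucket: the best available answer is
	// the highest finite upper bound.
	return buckets[len(buckets)-1].upperBound, true
}

// demoQuantiles evaluates three quantiles on a small fixed data set.
func demoQuantiles() (p25, p50, p90 float64) {
	buckets := []bucket{{0.25, 1}, {0.5, 3}, {1, 3}}
	const total = 4 // count of the +Inf bucket
	p25, _ = estimateQuantile(0.25, buckets, total)
	p50, _ = estimateQuantile(0.5, buckets, total)
	p90, _ = estimateQuantile(0.9, buckets, total)
	return p25, p50, p90
}

func main() {
	p25, p50, p90 := demoQuantiles()
	fmt.Println(p25, p50, p90)
}
```

    The result is only as precise as the bucket layout: a quantile that falls beyond the last finite bucket cannot be estimated better than that bucket's upper bound, which is why bucket boundaries should bracket the latencies you care about.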


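    Like a histogram, a summary samples observations (such as request latencies or response sizes). It also provides a total count of observations and the sum of all observed values, and it calculates configurable quantiles over a sliding time window.

    A summary with a base metric name of <basename> exposes multiple time series during a scrape:

    • streaming φ-quantiles (0 ≤ φ ≤ 1) of observed events, exposed as <basename>{quantile="<φ>"}
    • the total sum of all observed values, exposed as <basename>_sum
    • the count of events that have been observed, exposed as <basename>_count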

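    For a detailed explanation of φ-quantiles, summary usage, and the differences from histograms, see "Histograms and summaries" in the Prometheus documentation.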


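    The following uses Go as an example to show how to monitor your own service with the Prometheus client.

    Providing the metrics endpoint

    The first step in integrating Prometheus into a service is to provide a /metrics endpoint. The service should listen on an internal port that is reachable only within your infrastructure, usually in the 9xxx range. The Prometheus team maintains a list of default port allocations that you can consult when choosing a port.

    The code below creates a new HTTP service (demo1) that exposes the default metrics of a Prometheus Go application at http://localhost:9001/metrics: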

    // demo1.go
    package main

    import (
        "net/http"

        "github.com/prometheus/client_golang/prometheus/promhttp"
    )

    func main() {
        http.Handle("/metrics", promhttp.Handler())
        http.ListenAndServe(":9001", nil)
    }


    go run demo1.go


    ❯ curl http://localhost:9001/metrics
    # HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
    # TYPE go_gc_duration_seconds summary
    go_gc_duration_seconds{quantile="0"} 0
    go_gc_duration_seconds{quantile="0.25"} 0
    go_gc_duration_seconds{quantile="0.5"} 0
    go_gc_duration_seconds{quantile="0.75"} 0
    go_gc_duration_seconds{quantile="1"} 0
    go_gc_duration_seconds_sum 0
    go_gc_duration_seconds_count 0
    # HELP go_goroutines Number of goroutines that currently exist.
    # TYPE go_goroutines gauge
    go_goroutines 9
    # HELP go_info Information about the Go environment.
    # TYPE go_info gauge
    go_info{version="go1.13.1"} 1
    # HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
    # TYPE go_memstats_alloc_bytes gauge
    go_memstats_alloc_bytes 1.499288e+06
    # HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.
    # TYPE go_memstats_alloc_bytes_total counter
    go_memstats_alloc_bytes_total 1.499288e+06
    # HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table.
    # TYPE go_memstats_buck_hash_sys_bytes gauge
    go_memstats_buck_hash_sys_bytes 1.443808e+06
    # HELP go_memstats_frees_total Total number of frees.
    # TYPE go_memstats_frees_total counter
    go_memstats_frees_total 151
    # HELP go_memstats_gc_cpu_fraction The fraction of this program's available CPU time used by the GC since the program started.
    # TYPE go_memstats_gc_cpu_fraction gauge
    go_memstats_gc_cpu_fraction 0
    # HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata.
    # TYPE go_memstats_gc_sys_bytes gauge
    go_memstats_gc_sys_bytes 2.240512e+06
    # HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and still in use.
    # TYPE go_memstats_heap_alloc_bytes gauge
    go_memstats_heap_alloc_bytes 1.499288e+06
    # HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be used.
    # TYPE go_memstats_heap_idle_bytes gauge
    go_memstats_heap_idle_bytes 6.4118784e+07
    # HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use.
    # TYPE go_memstats_heap_inuse_bytes gauge
    go_memstats_heap_inuse_bytes 2.531328e+06
    # HELP go_memstats_heap_objects Number of allocated objects.
    # TYPE go_memstats_heap_objects gauge
    go_memstats_heap_objects 2806
    # HELP go_memstats_heap_released_bytes Number of heap bytes released to OS.
    # TYPE go_memstats_heap_released_bytes gauge
    go_memstats_heap_released_bytes 6.4118784e+07
    # HELP go_memstats_heap_sys_bytes Number of heap bytes obtained from system.
    # TYPE go_memstats_heap_sys_bytes gauge
    go_memstats_heap_sys_bytes 6.6650112e+07
    # HELP go_memstats_last_gc_time_seconds Number of seconds since 1970 of last garbage collection.
    # TYPE go_memstats_last_gc_time_seconds gauge
    go_memstats_last_gc_time_seconds 0
    # HELP go_memstats_lookups_total Total number of pointer lookups.
    # TYPE go_memstats_lookups_total counter
    go_memstats_lookups_total 0
    # HELP go_memstats_mallocs_total Total number of mallocs.
    # TYPE go_memstats_mallocs_total counter
    go_memstats_mallocs_total 2957
    # HELP go_memstats_mcache_inuse_bytes Number of bytes in use by mcache structures.
    # TYPE go_memstats_mcache_inuse_bytes gauge
    go_memstats_mcache_inuse_bytes 13888
    # HELP go_memstats_mcache_sys_bytes Number of bytes used for mcache structures obtained from system.
    # TYPE go_memstats_mcache_sys_bytes gauge
    go_memstats_mcache_sys_bytes 16384
    # HELP go_memstats_mspan_inuse_bytes Number of bytes in use by mspan structures.
    # TYPE go_memstats_mspan_inuse_bytes gauge
    go_memstats_mspan_inuse_bytes 23936
    # HELP go_memstats_mspan_sys_bytes Number of bytes used for mspan structures obtained from system.
    # TYPE go_memstats_mspan_sys_bytes gauge
    go_memstats_mspan_sys_bytes 32768
    # HELP go_memstats_next_gc_bytes Number of heap bytes when next garbage collection will take place.
    # TYPE go_memstats_next_gc_bytes gauge
    go_memstats_next_gc_bytes 4.473924e+06
    # HELP go_memstats_other_sys_bytes Number of bytes used for other system allocations.
    # TYPE go_memstats_other_sys_bytes gauge
    go_memstats_other_sys_bytes 1.050904e+06
    # HELP go_memstats_stack_inuse_bytes Number of bytes in use by the stack allocator.
    # TYPE go_memstats_stack_inuse_bytes gauge
    go_memstats_stack_inuse_bytes 458752
    # HELP go_memstats_stack_sys_bytes Number of bytes obtained from system for stack allocator.
    # TYPE go_memstats_stack_sys_bytes gauge
    go_memstats_stack_sys_bytes 458752
    # HELP go_memstats_sys_bytes Number of bytes obtained from system.
    # TYPE go_memstats_sys_bytes gauge
    go_memstats_sys_bytes 7.189324e+07
    # HELP go_threads Number of OS threads created.
    # TYPE go_threads gauge
    go_threads 9
    # HELP promhttp_metric_handler_requests_in_flight Current number of scrapes being served.
    # TYPE promhttp_metric_handler_requests_in_flight gauge
    promhttp_metric_handler_requests_in_flight 1
    # HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code.
    # TYPE promhttp_metric_handler_requests_total counter
    promhttp_metric_handler_requests_total{code="200"} 1
    promhttp_metric_handler_requests_total{code="500"} 0
    promhttp_metric_handler_requests_total{code="503"} 0


    demo1 exposes only the default metrics. Next, we add a counter metric named myapp_processed_ops_total, which counts the operations processed so far. The counter increases by 1 every 2 seconds.

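    // demo2.go
    package main

    import (
        "net/http"
        "time"

        "github.com/prometheus/client_golang/prometheus"
        "github.com/prometheus/client_golang/prometheus/promauto"
        "github.com/prometheus/client_golang/prometheus/promhttp"
    )

    func recordMetrics() {
        go func() {
            for {
                opsProcessed.Inc() // count one processed operation
                time.Sleep(2 * time.Second)
            }
        }()
    }

    var (
        opsProcessed = promauto.NewCounter(prometheus.CounterOpts{
            Name: "myapp_processed_ops_total",
            Help: "The total number of processed events",
        })
    )

    func main() {
        recordMetrics() // start the background counter before serving
        http.Handle("/metrics", promhttp.Handler())
        http.ListenAndServe(":9001", nil)
    }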


    go run demo2.go


    ❯ curl http://localhost:9001/metrics
    # HELP myapp_processed_ops_total The total number of processed events
    # TYPE myapp_processed_ops_total counter
    myapp_processed_ops_total 5
    # HELP promhttp_metric_handler_requests_in_flight Current number of scrapes being served.
    # TYPE promhttp_metric_handler_requests_in_flight gauge
    promhttp_metric_handler_requests_in_flight 1

    Checking repeatedly shows that the value of myapp_processed_ops_total keeps increasing.

    ❯ curl http://localhost:9001/metrics
    # HELP myapp_processed_ops_total The total number of processed events
    # TYPE myapp_processed_ops_total counter
    myapp_processed_ops_total 26
    # HELP promhttp_metric_handler_requests_in_flight Current number of scrapes being served.
    # TYPE promhttp_metric_handler_requests_in_flight gauge
    promhttp_metric_handler_requests_in_flight 1
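
    The same check can be done programmatically. The sketch below, in plain Go, pulls one unlabelled sample out of text in the exposition format shown above. The helper sampleValue is a hypothetical name, and it is intentionally minimal: it skips comment lines and ignores samples that carry labels. The two constants reuse the scrape outputs from this article.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

const firstScrape = `# HELP myapp_processed_ops_total The total number of processed events
# TYPE myapp_processed_ops_total counter
myapp_processed_ops_total 5
# HELP promhttp_metric_handler_requests_in_flight Current number of scrapes being served.
# TYPE promhttp_metric_handler_requests_in_flight gauge
promhttp_metric_handler_requests_in_flight 1
`

const secondScrape = `# HELP myapp_processed_ops_total The total number of processed events
# TYPE myapp_processed_ops_total counter
myapp_processed_ops_total 26
`

// sampleValue returns the value of the first unlabelled sample called name.
// Lines starting with '#' (HELP and TYPE metadata) are skipped.
func sampleValue(exposition, name string) (float64, bool) {
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		fields := strings.Fields(line)
		if len(fields) < 2 || fields[0] != name {
			continue
		}
		v, err := strconv.ParseFloat(fields[1], 64)
		if err != nil {
			return 0, false
		}
		return v, true
	}
	return 0, false
}

func main() {
	before, _ := sampleValue(firstScrape, "myapp_processed_ops_total")
	after, _ := sampleValue(secondScrape, "myapp_processed_ops_total")
	fmt.Printf("counter grew by %v between scrapes\n", after-before)
}
```

    Against a running demo2, the two strings could be replaced by the bodies of two http.Get calls to http://localhost:9001/metrics made a few seconds apart.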


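    This article used a counter to show how to add metrics to your own service. You can also expose the other metric types; see the client_golang documentation for usage.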

    The next article will be a tutorial on using Grafana.

    Note: see https://github.com/MakeOptim/service-mesh/prometheus for the yaml files referenced in this chapter.



