- install
- configuration
- rules
- PromQL
- Federation
- Pushgateway
- Remote write
install
wget https://github.com/prometheus/prometheus/releases/download/v2.3.2/prometheus-2.3.2.linux-amd64.tar.gz -O prometheus-2.3.2.tar.gz
tar xf prometheus-2.3.2.tar.gz -C /data/
ln -s /data/prometheus-2.3.2.linux-amd64 /data/prometheus
nohup /data/prometheus/prometheus --config.file="/data/prometheus/prometheus.yml" --storage.tsdb.retention=1d --web.enable-lifecycle &
# validate config and rules
promtool check rules rules/host.yml
promtool check config prometheus.yml
# reload
curl -XPOST http://localhost:9090/-/reload
# UI
http://127.0.0.1:9090
configuration
command-line
./prometheus --config.file=prometheus.yml --storage.tsdb.path=/data --web.enable-lifecycle
-h # help
--web.enable-lifecycle # allow reloading the config via HTTP POST to /-/reload
--web.console.templates=consoles/ # path to the console templates
--storage.tsdb.retention.time=15d # delete data older than 15 days
configure-file
global:
  scrape_interval: 15s # By default, scrape targets every 15 seconds.
  scrape_timeout: 10s
  evaluation_interval: 15s # Evaluate rules every 15 seconds.
  # Attach these extra labels to all timeseries collected by this Prometheus instance.
  external_labels:
    monitor: 'codelab-monitor'
rule_files:
  - "first.rules"
  - "my/*.rules"
scrape_configs:
  - job_name: 'prometheus'
    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
        labels:
          group: 'localhost'
# remote write
remote_write:
  - url: http://remote1/push
    name: drop_expensive
    write_relabel_configs:
      - source_labels: [__name__]
        regex: expensive.*
        action: drop
# remote read
remote_read:
  - url: http://remote1/read
    read_recent: true
    name: default
alerting:
  alertmanagers:
    - scheme: https
      static_configs:
        - targets:
            - "1.2.3.4:9093"
            - "1.2.3.5:9093"
rules
go get github.com/prometheus/prometheus/cmd/promtool
promtool check rules /path/to/example.rules.yml
recording rules
Purpose: precompute expressions ahead of time and save the results as new time series. Queries become much faster, which suits dashboards; mind the evaluation interval.
groups:
  - name: example-recording-rules
    rules:
      - record: job:http_inprogress_requests:sum # recording rule names use colons by convention
        expr: sum(http_inprogress_requests) by (job)
alerting rules
Purpose: define alert conditions and send alerts to a third party (e.g. Alertmanager).
groups:
  - name: example-alerting-rules
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 5m # how long expr must keep matching before the alert goes from pending to firing
        labels: # extra labels to attach; existing keys are overwritten
          severity: page
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."
PromQL
Query expression data types (query examples)
- Instant vector: a set of time series, each with a single sample at one point in time.
  Label matchers:
    =  : Select labels that are exactly equal to the provided string.
    != : Select labels that are not equal to the provided string.
    =~ : Select labels that regex-match the provided string.
    !~ : Select labels that do not regex-match the provided string.
  http_requests_total{environment=~"staging|testing|development",method!="GET"}
- Range vector: a set of time series with a range of samples over a time window; an offset can be applied.
  Duration units:
    s - seconds
    m - minutes
    h - hours
    d - days
    w - weeks
    y - years
  http_requests_total{job="prometheus"}[5m]
  sum(http_requests_total{method="GET"} offset 5m) // GOOD.
example
# counter metrics
increase(node_cpu[2m]) / 120 # increase over two minutes divided by 120s = average per-second growth
rate(node_cpu[2m]) # average per-second growth over two minutes; "long tail" problem: a brief 100% CPU spike is averaged away
irate(node_cpu[2m]) # instantaneous per-second growth, based on the last two samples in the window
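A quick way to see the difference between rate and irate is to recompute them by hand on synthetic counter samples (a Python sketch, not Prometheus code; timestamps and values are made up):

```python
# Synthetic counter samples as (timestamp_seconds, counter_value).
# Steady growth of 1/s, then a spike of 330 in the final 30s window.
samples = [(0, 0), (30, 30), (60, 60), (90, 90), (120, 420)]

def prom_rate(samples):
    # rate(): average per-second increase over the whole range.
    (t0, v0), (tn, vn) = samples[0], samples[-1]
    return (vn - v0) / (tn - t0)

def prom_irate(samples):
    # irate(): per-second increase from the last two samples only.
    (t0, v0), (t1, v1) = samples[-2], samples[-1]
    return (v1 - v0) / (t1 - t0)

print(prom_rate(samples))   # 3.5  -> the spike is diluted across the window
print(prom_irate(samples))  # 11.0 -> the spike is visible
```

This is why irate is preferred for fast-moving counters on detailed graphs, while rate suits longer-term trends and alerting.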
# aggregation
sum(container_memory_rss{instance="10.51.1.126:10250"}) by (pod_name) # memory RSS per pod
sum(http_requests_total) # total HTTP requests
topk(5,http_requests_total) # top 5 series by request count
sum by (handler)(topk(5,http_requests_total)) # top 5 request counts, keeping only the handler label
# dynamic label rewriting
# 1. derive a new label with a regex
label_replace(v instant-vector, dst_label string, replacement string, src_label string, regex string)
label_replace(up, "host", "$1", "instance", "(.*):.*")
up{host="localhost",instance="localhost:8080",job="cadvisor"} # result gains a host label
# 2. derive a new label by joining existing ones
label_join(v instant-vector, dst_label string, separator string, src_label_1 string, src_label_2 string, ...)
label_join(up,"info","&","instance","job")
up{instance="localhost:8080",job="cadvisor",info="localhost:8080&cadvisor"}
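The semantics of the two label functions can be mimicked in a few lines of Python to make them concrete (a sketch; it assumes the regex is matched against the full source-label value, as PromQL's anchored regexes do):

```python
import re

def label_replace(labels, dst, replacement, src, regex):
    # PromQL anchors the regex against the whole value of src;
    # $1, $2... refer to capture groups (Python's re uses \1, so convert).
    m = re.fullmatch(regex, labels.get(src, ""))
    if not m:
        return dict(labels)  # no match: series unchanged
    return {**labels, dst: m.expand(replacement.replace("$", "\\"))}

def label_join(labels, dst, sep, *srcs):
    # Concatenate the values of the source labels with the separator.
    return {**labels, dst: sep.join(labels.get(s, "") for s in srcs)}

up = {"instance": "localhost:8080", "job": "cadvisor"}
print(label_replace(up, "host", "$1", "instance", "(.*):.*"))  # adds host="localhost"
print(label_join(up, "info", "&", "instance", "job"))          # adds info="localhost:8080&cadvisor"
```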
Federation
Purpose: one Prometheus server scrapes selected data from other Prometheus servers.
Scenarios:
- Hierarchical federation: delegation. Tree-like: a top-level server collects pre-aggregated data from subordinate servers.
- Cross-service federation: division of labor. Data collected by each server ends up in the same database, so a single server can query what all the others scrape.
scrape_configs:
  - job_name: 'federate'
    scrape_interval: 15s
    # do not overwrite any labels exposed by the source server
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job="prometheus"}'
        - '{__name__=~"job:.*"}'
    static_configs:
      - targets:
        - 'source-prometheus-1:9090'
        - 'source-prometheus-2:9090'
        - 'source-prometheus-3:9090'
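The '{__name__=~"job:.*"}' matcher only pulls series whose names start with "job:"; such series are typically produced on the source servers by recording rules. A sketch (the group and rule names here are made up):

```yaml
groups:
  - name: federation-aggregates
    rules:
      # aggregate away instance-level detail before federation
      - record: job:http_requests_total:sum
        expr: sum(http_requests_total) by (job)
```

Federating only these pre-aggregated "job:" series keeps the top-level server's ingestion volume small.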
Pushgateway
Use case: collecting metrics from service-level batch jobs, which push their metrics instead of being scraped.
# command-line test
cat <<EOF | curl --data-binary @- http://127.0.0.1:9091/metrics/job/some_job/instance/some_instance
# TYPE some_metric counter
some_metric{label="val1"} 42
# TYPE another_metric gauge
# HELP another_metric Just an example.
another_metric 2398.283
EOF
# query
some_metric
some_metric{exported_instance="some_instance",exported_job="some_job",instance="localhost:9091",job="pushgateway",label="val1"} 42
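The exported_instance/exported_job labels in the query result appear because Prometheus renames conflicting target labels by default; for a Pushgateway you usually want the pushed job/instance kept instead, via honor_labels. A config sketch (target address assumed):

```yaml
scrape_configs:
  - job_name: 'pushgateway'
    honor_labels: true # keep pushed job/instance instead of exported_job/exported_instance
    static_configs:
      - targets: ['127.0.0.1:9091']
```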
Remote write
How it works: Prometheus sends samples over HTTP to an adapter, which writes them into third-party storage (here Elasticsearch, via prometheusbeat).
# adapter config (writes to Elasticsearch): prometheusbeat.yml
prometheusbeat:
  listen: ":8080"
  context: "/prometheus"
......
output.elasticsearch:
  hosts: ["localhost:9200"]
# prometheus config
remote_write:
  - url: "http://localhost:8080/prometheus"