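Prometheus is a tool for monitoring.
Download and install
I used the fairly simple Docker installation method. Make sure Docker is installed on your machine.
- Pull the image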
docker pull prom/prometheus
List the images
docker images
- Start the container
docker run --name prometheus -d -p 127.0.0.1:9090:9090 prom/prometheus
Inspect the container
docker inspect <container ID or NAME>
Download and installation are now complete.
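Before going further, it can help to confirm that the container is actually serving requests. Below is a minimal sketch, assuming the port mapping 127.0.0.1:9090 from the docker run command above and Prometheus's `/-/ready` management endpoint. The function name `prometheus_is_ready` is my own and not part of any library.

```python
import urllib.error
import urllib.request


def prometheus_is_ready(base_url: str = "http://127.0.0.1:9090", timeout: float = 3.0) -> bool:
    """Return True if GET <base_url>/-/ready answers with HTTP 200."""
    url = base_url.rstrip("/") + "/-/ready"
    # The target is local, so ignore any HTTP proxy set in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx answer: treat as "not ready".
        return False
```

Calling `prometheus_is_ready()` on the Docker host should return True once the container has finished starting. If it returns False, `docker logs prometheus` is the next place to look.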
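The machine runs Ubuntu. You can set up a reverse proxy in nginx so that the web UI can be viewed from your local machine. Next, I wanted to write a script with the Prometheus Python client that generates data I can see in the web page.
- Download and install the Prometheus Python package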
pip install prometheus_client
- Write a demo
test_prom.py
from prometheus_client import CollectorRegistry, Counter, start_http_server
import random
import time

app = 'uploader'
uploader_metrics = {}
downloader_metrics = {}


def init_uploader_metrics(env, category):
    # Use a dedicated registry and expose it over HTTP on port 50054.
    registry = CollectorRegistry()
    start_http_server(50054, '0.0.0.0', registry)
    # The counter is exposed as app_start_total; bind its label values once.
    uploader_metrics['app_start'] = Counter(
        'app_start', 'uploader starts',
        ['app', 'env', 'store'], registry=registry,
    ).labels(app, env, category)


if __name__ == '__main__':
    env = 'prod'
    category = 'ctf_beijing_cytj'
    init_uploader_metrics(env, category)
    # Increment the counter at random sub-second intervals, forever.
    while True:
        uploader_metrics['app_start'].inc()
        time.sleep(random.random())
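To check that the demo really exposes data before involving Prometheus, you can read its metrics endpoint directly. The sketch below uses only the standard library and assumes the demo is running on port 50054 on the same machine. It also assumes sample lines carry no trailing timestamp, which is my understanding of what prometheus_client emits by default. `fetch_metrics` and `parse_samples` are hypothetical helper names.

```python
import urllib.request


def fetch_metrics(url: str = "http://127.0.0.1:50054/metrics", timeout: float = 3.0) -> str:
    """Download the text exposition served by the demo script."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_samples(exposition_text: str, metric_name: str) -> dict:
    """Map each 'name{labels}' series of metric_name to its float value."""
    samples = {}
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # Assumption: the value is the last space-separated field (no timestamp).
        series, _, value = line.rpartition(" ")
        name = series.split("{", 1)[0]
        if name == metric_name:
            samples[series] = float(value)
    return samples
```

With the demo running, `parse_samples(fetch_metrics(), "app_start_total")` should return one entry whose value grows between calls. Note the `_total` suffix: the counter declared as `app_start` is exposed under that name.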
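- Enter the container and modify the configuration file. The container info (from docker inspect) shows where the configuration file is.
Enter the container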
docker exec -it <container ID> /bin/sh
Change prometheus.yml to the following configuration
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  scrape_timeout: 10s # Same as the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']

  - job_name: 'test'
    static_configs:
    - targets: ['172.17.0.1:50054'] # 172.17.0.1 is the IP of the docker0 interface
- Restart the container
docker restart <NAME or ID>
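After the restart, one way to see whether Prometheus picked up both jobs is its HTTP API, which lists scrape targets under `/api/v1/targets`. This is a rough sketch, assuming the API is reachable at 127.0.0.1:9090 as mapped above. The helper names are mine, and the fields read here (`activeTargets`, `labels`, `scrapeUrl`, `health`, `lastError`) are the ones I understand that endpoint to return.

```python
import json
import urllib.request


def fetch_targets(base_url: str = "http://127.0.0.1:9090", timeout: float = 3.0) -> dict:
    """Fetch the target list from the Prometheus HTTP API."""
    url = base_url.rstrip("/") + "/api/v1/targets"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def summarize_targets(payload: dict) -> list:
    """Return (job, scrape URL, health, last error) for each active target."""
    rows = []
    for target in payload.get("data", {}).get("activeTargets", []):
        rows.append((
            target.get("labels", {}).get("job", ""),
            target.get("scrapeUrl", ""),
            target.get("health", "unknown"),
            target.get("lastError", ""),
        ))
    return rows
```

Running `for row in summarize_targets(fetch_targets()): print(row)` should show one row per job. A `test` job whose health is `down` usually comes with a `lastError` that explains why.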
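Problems encountered
- The container failed to start
Check the logs by running docker logs <ID or NAME>. It turned out that reading the configuration file had failed. I had copied the configuration file out with docker cp and then copied it back in, which seems to have left the file read-only. I did not dig any deeper.
- A job was configured, but no metrics showed up in the dashboard
tcpdump showed that no connection was being made at all. At that point the target IP in the configuration above was localhost, not 172.17.0.1. The container and the host are effectively different machines, so localhost does not work: the service on port 50054 is started on the host, outside the container. So how do you reach the host's IP and port from inside the container? Run ifconfig and you will see a docker0 interface. The host can be reached through that interface's IP.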
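A quicker check than tcpdump is to try a plain TCP connection to the target address. The sketch below assumes nothing beyond the standard library, and `can_connect` is a hypothetical helper name. Run on the host, it only shows that the exporter is listening on the docker0 address. To test from the container's point of view it would have to run inside a container that has Python, which I am not assuming the Prometheus image does.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out.
        return False
```

For example, `can_connect("172.17.0.1", 50054)` should be True while the demo script is running. Inside the container, `can_connect("localhost", 50054)` would be expected to fail for the reason described above.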
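Integrating Prometheus with Grafana
The built-in UI is not very pretty. Since I already had Grafana installed, the integration was simple. Log in to Grafana with the default username and password, admin/admin. In the settings, add a Data Source and choose Prometheus.
The interface looks like this:
[Image: grafana-prom]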
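The same data source can probably be added without clicking through the UI, since Grafana offers an HTTP API for data sources (`POST /api/datasources`). This is only a sketch under assumptions: the Grafana address (its default port is 3000), basic auth with the default admin/admin credentials, and the data source name are all mine rather than taken from the setup above.

```python
import base64
import json
import urllib.request


def build_datasource_request(grafana_url: str, prometheus_url: str,
                             user: str = "admin", password: str = "admin") -> urllib.request.Request:
    """Build (but do not send) a request that registers Prometheus in Grafana."""
    payload = {
        "name": "Prometheus",       # display name, chosen for illustration
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",          # Grafana's backend queries Prometheus
    }
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        grafana_url.rstrip("/") + "/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": "Basic " + token},
        method="POST",
    )
```

To send it, pass the result to `urllib.request.urlopen`, for example `urllib.request.urlopen(build_datasource_request("http://127.0.0.1:3000", "http://127.0.0.1:9090"))`. Building and sending are kept separate so the request can be inspected first.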
alertmanager
- Download and install alertmanager
`docker pull prom/alertmanager`
- Run alertmanager
docker run --name alertmanager -d -p 0.0.0.0:9093:9093 prom/alertmanager
- Modify the prometheus.yml configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets: ['<ALERTMANAGER IP>:9093']

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "alert.rules.yml"
- Create an alert.rules.yml file in the same directory as prometheus.yml
groups:
- name: example
  rules:
  - alert: AppHigh
    expr: app_start_total >= 100
    for: 10m
    labels:
      severity: page
    annotations:
      summary: High request latency
      description: description info
  - alert: Up2
    expr: count(up) == 2
    for: 2m
    labels:
      severity: page
    annotations:
      summary: "up summary"
      description: up desc info
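Note: the alert.rules.yml file must be placed in the correct directory.
Open the URL and check. If the page looks like the screenshot, the configuration is OK.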
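To see whether the AppHigh expression would currently match, the same condition can be checked by hand through the instant-query endpoint of the Prometheus HTTP API (`/api/v1/query`). This is a minimal sketch with helper names of my own. It only tests the threshold at one instant and does not model the `for: 10m` waiting period of the rule.

```python
import json
import urllib.parse
import urllib.request


def instant_query_url(base_url: str, expr: str) -> str:
    """Build the URL for an instant query of the given PromQL expression."""
    return base_url.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def run_query(base_url: str, expr: str, timeout: float = 3.0) -> dict:
    """Send the instant query and return the decoded JSON response."""
    with urllib.request.urlopen(instant_query_url(base_url, expr), timeout=timeout) as resp:
        return json.load(resp)


def series_at_or_above(payload: dict, threshold: float) -> list:
    """Return (labels, value) for each instant-vector sample >= threshold."""
    hits = []
    for sample in payload.get("data", {}).get("result", []):
        _timestamp, raw_value = sample["value"]  # value is sent as a string
        value = float(raw_value)
        if value >= threshold:
            hits.append((sample["metric"], value))
    return hits
```

For example, `series_at_or_above(run_query("http://127.0.0.1:9090", "app_start_total"), 100)` lists the series that already satisfy the AppHigh condition.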
- Configure the alertmanager.yml file
```
global:
  resolve_timeout: 1m
  smtp_smarthost: 'smtp.gmail.com:587'
  smtp_hello: 'smtp.gmail.com'
  smtp_from: 'user@gmail.com'
  smtp_auth_username: 'user@gmail.com'
  smtp_auth_password: 'password'
route:
  repeat_interval: 30s
  receiver: 'team'
receivers:
- name: 'team'
  email_configs:
  - to: 'receiver1@qq.com'
  - to: 'receiver2@qq.com'
```
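Waiting for a real rule to fire is a slow way to test the mail settings. A hand-made alert can be pushed straight to Alertmanager instead. The sketch below assumes the v2 API (`POST /api/v2/alerts`) is available in the Alertmanager version you run and that it listens on port 9093 as started above. The alert name and annotation text are made up for the test.

```python
import json
import urllib.request


def build_test_alert_request(alertmanager_url: str, alertname: str = "ManualTest",
                             severity: str = "page") -> urllib.request.Request:
    """Build (but do not send) a request that posts one test alert."""
    alerts = [{
        "labels": {"alertname": alertname, "severity": severity},
        "annotations": {"summary": "manual test alert"},  # illustrative text
    }]
    return urllib.request.Request(
        alertmanager_url.rstrip("/") + "/api/v2/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it with `urllib.request.urlopen(build_test_alert_request("http://127.0.0.1:9093"))` should make the alert appear in the Alertmanager UI. If the SMTP settings work, an email to the `team` receiver should follow.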
References
https://songjiayang.gitbooks.io/prometheus/content/promql/summary.html
Official documentation