Download and install

I used the simpler Docker-based installation. Make sure Docker is installed on your machine.

Pull the image
docker pull prom/prometheus
List the images
docker images
Start the container
docker run --name prometheus -d -p 127.0.0.1:9090:9090 prom/prometheus
Inspect the container
docker inspect <container ID or NAME>
The download and installation are now complete.
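
As a quick check that the container really is serving requests, here is a minimal sketch that polls Prometheus's `/-/healthy` endpoint from the host. The helper names `health_url` and `is_healthy` are made up for this illustration, and the base URL assumes the `127.0.0.1:9090` port mapping used in the `docker run` command above.

```python
import urllib.request


def health_url(base_url):
    """Build the URL of the Prometheus health endpoint."""
    return base_url.rstrip('/') + '/-/healthy'


def is_healthy(base_url, timeout=3.0):
    """Return True if the health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection refused, timeouts, and HTTP error responses.
        return False
```

Calling `is_healthy('http://127.0.0.1:9090')` on the host should return True once the container is up; if it returns False, `docker logs` is the next place to look.
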
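The machine runs Ubuntu. You can set up a reverse proxy in nginx so that the Prometheus UI is reachable from your local machine. Next, I wanted to use the Prometheus Python client to write a script that generates data I could then see on the web page.
Install the Prometheus Python package
pip install prometheus_client
Write a demo, test_prom.py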

from prometheus_client import CollectorRegistry, Counter, start_http_server
import random
import time

app = 'uploader'
uploader_metrics = {}


def init_uploader_metrics(env, category):
    # Use a dedicated registry so only the metrics defined here are exposed.
    registry = CollectorRegistry()

    # Serve /metrics on every interface so the Prometheus container can reach it.
    start_http_server(50054, '0.0.0.0', registry)

    # Counter with three labels; it is exposed as app_start_total.
    uploader_metrics['app_start'] = Counter(
        'app_start', 'uploader starts',
        ['app', 'env', 'store'], registry=registry,
    ).labels(app, env, category)


if __name__ == '__main__':
    env = 'prod'
    category = 'ctf_beijing_cytj'
    init_uploader_metrics(env, category)
    while True:
        # Increment the counter at random intervals of up to one second.
        uploader_metrics['app_start'].inc()
        time.sleep(random.random())
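
Before pointing Prometheus at the script, it can help to confirm what the script actually exposes. The following is a rough sketch that downloads the text exposition from port 50054 and picks out one metric. `fetch_metrics` and `parse_samples` are hypothetical helpers written for this post, and the parser assumes sample lines carry no trailing timestamp, which I believe is the default output of prometheus_client. Note that the counter defined as `app_start` shows up as `app_start_total`.

```python
import urllib.request


def fetch_metrics(url, timeout=3.0):
    """Download the plain-text exposition from a /metrics endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode('utf-8')


def parse_samples(text, metric_name):
    """Return {series: value} for every sample line of metric_name."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith('#'):
            continue
        # The value is the last space-separated field (no timestamp assumed).
        series, _, value = line.rpartition(' ')
        if series == metric_name or series.startswith(metric_name + '{'):
            samples[series] = float(value)
    return samples
```

With the demo running, `parse_samples(fetch_metrics('http://localhost:50054/metrics'), 'app_start_total')` should return a single series whose value grows between calls.
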

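Enter the container and modify the configuration file. The container information shows where the configuration file is located.
Enter the container
docker exec -it <container ID> /bin/sh
Change prometheus.yml to the following configuration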

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  scrape_timeout: 10s     # is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']
  - job_name: 'test'
    static_configs:
    - targets: ['172.17.0.1:50054']  # 172.17.0.1 is the IP of the docker0 interface

Restart the container
docker restart <NAME or ID>
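
After the restart, one way to see whether both jobs are being scraped is to ask Prometheus itself through its `/api/v1/targets` HTTP endpoint. This is only a sketch: `fetch_targets` and `summarize_targets` are names invented here, and the base URL assumes Prometheus is reachable at `http://127.0.0.1:9090` as in the `docker run` command earlier.

```python
import json
import urllib.request


def fetch_targets(base_url, timeout=3.0):
    """Fetch the JSON document served by /api/v1/targets."""
    url = base_url.rstrip('/') + '/api/v1/targets'
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def summarize_targets(payload):
    """Return a list of (job, scrape URL, health, last error) tuples."""
    rows = []
    for target in payload.get('data', {}).get('activeTargets', []):
        rows.append((
            target.get('labels', {}).get('job', ''),
            target.get('scrapeUrl', ''),
            target.get('health', ''),
            target.get('lastError', ''),
        ))
    return rows
```

Printing `summarize_targets(fetch_targets('http://127.0.0.1:9090'))` gives one row per target; a health of `down` together with a non-empty last error usually points at the cause.
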

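Problems encountered

The container failed to start
Run docker logs <ID or NAME> to check the logs. It turned out that reading the configuration file had failed. At the time I had copied the configuration file out with docker cp and then copied it back in, which seems to have left the file read-only. I did not dig any further.
The job was configured, but no metrics showed up in the dashboard
tcpdump showed that no connection was being made at all. At the time, the target IP in the configuration above was localhost rather than 172.17.0.1. The container and the host behave like two different machines, so localhost does not work: the service on port 50054 is started on the host, outside the container. So how do you reach an IP and port on the host from inside the container? Run ifconfig and you will see a docker0 interface. The host can be reached through that interface's IP.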
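
A lighter-weight check than tcpdump is to simply try opening a TCP connection to the address Prometheus is configured to scrape. Below is a minimal sketch; `can_connect` is a made-up helper. Since the Prometheus image may not ship Python, I assume it is run on the host, where it at least confirms that the demo script is listening on the docker0 address.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, unreachable network, or timeout.
        return False
```

For the setup in this post, `can_connect('172.17.0.1', 50054)` is the call to try. If it fails, check that the script binds to `0.0.0.0` rather than only to `127.0.0.1`.
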

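Integrating Prometheus with Grafana

The built-in UI is not very attractive. Since I already had Grafana installed, the integration was simple. Log in to Grafana with the default username and password, admin/admin. In the configuration, add a Data Source and select Prometheus.
The interface looks like this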

grafana-prom
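
The same data source can also be created without clicking through the UI, via Grafana's HTTP API (`POST /api/datasources`). The sketch below only builds the request. `datasource_payload` and `build_request` are hypothetical helpers, and the Grafana address `http://localhost:3000`, the data source name, and the Prometheus URL you pass in are all assumptions that depend on where your Grafana is running.

```python
import base64
import json
import urllib.request


def datasource_payload(name, prometheus_url):
    """Body for creating a Prometheus data source that Grafana proxies to."""
    return {
        'name': name,
        'type': 'prometheus',
        'url': prometheus_url,
        'access': 'proxy',
    }


def build_request(grafana_url, user, password, payload):
    """Build (but do not send) the POST request with HTTP basic auth."""
    token = base64.b64encode(f'{user}:{password}'.encode('utf-8')).decode('ascii')
    return urllib.request.Request(
        grafana_url.rstrip('/') + '/api/datasources',
        data=json.dumps(payload).encode('utf-8'),
        headers={
            'Content-Type': 'application/json',
            'Authorization': 'Basic ' + token,
        },
        method='POST',
    )
```

To actually create the data source, pass the result of `build_request('http://localhost:3000', 'admin', 'admin', datasource_payload('prometheus', 'http://127.0.0.1:9090'))` to `urllib.request.urlopen`, adjusting both URLs to your environment.
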

alertmanager

Download and install alertmanager
docker pull prom/alertmanager
Run alertmanager
docker run --name alertmanager -d -p 0.0.0.0:9093:9093 prom/alertmanager
Modify the prometheus.yml configuration

alerting:
  alertmanagers:
  - static_configs:
    - targets: ['ALERTMANAGER IP:9093']

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "alert.rules.yml"

In the same directory as the Prometheus configuration, create a new alert.rules.yml file

groups:
- name: example
  rules:
  - alert: AppHigh
    expr: app_start_total >= 100
    for: 10m
    labels:
      severity: page
    annotations:
      summary: High request latency
      description: description info
  - alert: Up2
    expr: count(up) == 2
    for: 2m
    labels:
      severity: page
    annotations:
      summary: "up summary"
      description: up desc  info

Note: the alert.rules.yml file must be placed in the correct directory
Open the URL and take a look; it should appear as in the image below

image.png

This indicates that the configuration is OK
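
To see whether a rule expression such as `app_start_total >= 100` currently matches anything, the expression can be sent to Prometheus's instant query API (`/api/v1/query`). This is a sketch under the assumption that Prometheus is reachable at `http://127.0.0.1:9090`; `query_url`, `run_query`, and `vector_values` are helper names invented for this example.

```python
import json
import urllib.parse
import urllib.request


def query_url(base_url, expr):
    """Build the instant-query URL for a PromQL expression."""
    params = urllib.parse.urlencode({'query': expr})
    return base_url.rstrip('/') + '/api/v1/query?' + params


def run_query(base_url, expr, timeout=3.0):
    """Send the query and return the decoded JSON response."""
    with urllib.request.urlopen(query_url(base_url, expr), timeout=timeout) as resp:
        return json.load(resp)


def vector_values(payload):
    """Return [(labels, value)] from an instant-vector response."""
    data = payload.get('data', {})
    if payload.get('status') != 'success' or data.get('resultType') != 'vector':
        return []
    # Each result carries its labels and a [timestamp, "value"] pair.
    return [(item['metric'], float(item['value'][1]))
            for item in data.get('result', [])]
```

An empty list from `vector_values(run_query('http://127.0.0.1:9090', 'app_start_total >= 100'))` means the expression matches no series right now. Even when it does match, the `for: 10m` clause means the condition has to hold for ten minutes before the alert fires.
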

Configure the alertmanager.yml file
···
global:
  resolve_timeout: 1m
  smtp_smarthost: 'smtp.gmail.com:587'
  smtp_hello: 'smtp.gmail.com'
  smtp_from: 'user@gmail.com'
  smtp_auth_username: 'user@gmail.com'
  smtp_auth_password: 'password'

route:
  repeat_interval: 30s
  receiver: 'team'
receivers: