Installing Prometheus
Environment: CentOS 7
wget https://github.com/prometheus/prometheus/releases/download/v2.6.0/prometheus-2.6.0.linux-amd64.tar.gz
tar -zxvf prometheus-2.6.0.linux-amd64.tar.gz -C /usr/local/
cd /usr/local/
ln -s prometheus-2.6.0.linux-amd64 prometheus
Starting Prometheus
cd prometheus
./prometheus
By default Prometheus listens on port 9090, as the startup log shows:
level=info ts=2018-12-29T08:21:57.607765047Z caller=main.go:243 msg="Starting Prometheus" version="(version=2.6.0, branch=HEAD, revision=dbd1d58c894775c0788470944b818cc724f550fb)"
level=info ts=2018-12-29T08:21:57.607833708Z caller=main.go:244 build_context="(go=go1.11.3, user=root@bf5760470f13, date=20181217-15:14:46)"
level=info ts=2018-12-29T08:21:57.607852464Z caller=main.go:245 host_details="(Linux 3.10.0-862.el7.x86_64 #1 SMP Wed Mar 21 18:14:51 EDT 2018 x86_64 in017001 (none))"
level=info ts=2018-12-29T08:21:57.60787107Z caller=main.go:246 fd_limits="(soft=800000, hard=800000)"
level=info ts=2018-12-29T08:21:57.607885984Z caller=main.go:247 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2018-12-29T08:21:57.608396975Z caller=main.go:561 msg="Starting TSDB ..."
level=info ts=2018-12-29T08:21:57.608460109Z caller=web.go:429 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2018-12-29T08:21:57.617174962Z caller=main.go:571 msg="TSDB started"
level=info ts=2018-12-29T08:21:57.617209928Z caller=main.go:631 msg="Loading configuration file" filename=prometheus.yml
level=info ts=2018-12-29T08:21:57.617932637Z caller=main.go:657 msg="Completed loading of configuration file" filename=prometheus.yml
level=info ts=2018-12-29T08:21:57.617945451Z caller=main.go:530 msg="Server is ready to receive web requests."
Once it is running, open the web UI at http://localhost:9090/
Prometheus Expression Browser
Target status page
Prometheus's own metrics
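Prometheus is itself instrumented with Prometheus metrics, which are served at http://localhost:9090/metrics in a human-readable text format.
Note that the output includes not only metrics from the Prometheus code itself but also Go runtime and process metrics (Prometheus is written in Go).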
# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 6.3878e-05
go_gc_duration_seconds{quantile="0.25"} 7.5231e-05
go_gc_duration_seconds{quantile="0.5"} 8.3594e-05
go_gc_duration_seconds{quantile="0.75"} 0.000104345
go_gc_duration_seconds{quantile="1"} 0.000146722
go_gc_duration_seconds_sum 0.000832191
go_gc_duration_seconds_count 9
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 35
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
go_info{version="go1.11.3"} 1
# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 1.0968928e+07
# HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.
# TYPE go_memstats_alloc_bytes_total counter
go_memstats_alloc_bytes_total 4.654416e+07
# HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table.
# TYPE go_memstats_buck_hash_sys_bytes gauge
go_memstats_buck_hash_sys_bytes 1.453834e+06
# HELP go_memstats_frees_total Total number of frees.
# TYPE go_memstats_frees_total counter
go_memstats_frees_total 152693
# HELP go_memstats_gc_cpu_fraction The fraction of this program's available CPU time used by the GC since the program started.
# TYPE go_memstats_gc_cpu_fraction gauge
go_memstats_gc_cpu_fraction 5.805854134243609e-06
...
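To make the text format concrete, here is a minimal Python sketch that reads lines like the ones above into (name, labels, value) tuples. It is an illustration, not the official parser: the function name `parse_metrics` is hypothetical, and the sketch assumes no timestamps and no commas or escaped quotes inside label values.

```python
def parse_metrics(text):
    """Parse simple Prometheus text-format lines into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The value is the last whitespace-separated field (no timestamps assumed).
        series, value = line.rsplit(None, 1)
        name, labels = series, {}
        if "{" in series:
            name, _, rest = series.partition("{")
            body = rest.rstrip("}")
            # Naive split: assumes label values contain no commas or escaped quotes.
            for pair in body.split(","):
                if pair:
                    key, _, raw = pair.partition("=")
                    labels[key.strip()] = raw.strip().strip('"')
        samples.append((name, labels, float(value)))
    return samples


sample = '''# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 35
go_gc_duration_seconds{quantile="0.5"} 8.3594e-05
'''

for name, labels, value in parse_metrics(sample):
    print(name, labels, value)
```

The `# TYPE` lines are skipped here for brevity; a fuller parser would keep them, since they say whether a metric is a gauge, counter, or summary.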
Using the Expression Browser
The Expression Browser is useful for running ad hoc queries and for debugging PromQL.
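The same ad hoc queries can also be sent to the Prometheus HTTP API at `/api/v1/query`. The sketch below is one possible way to do that from Python using only the standard library. The helper names (`query_url`, `extract_values`, `instant_query`) are hypothetical, and it assumes the server from the previous steps is listening on localhost:9090.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

PROMETHEUS_URL = "http://localhost:9090"  # assumption: the server started above


def query_url(expr, base=PROMETHEUS_URL):
    """Build the instant-query URL for a PromQL expression."""
    params = urlencode({"query": expr})
    return f"{base}/api/v1/query?{params}"


def extract_values(payload):
    """Turn an instant-vector response into {instance: float_value}.

    Series sharing an instance label would overwrite each other; good enough
    for a sketch, not for general use.
    """
    results = payload["data"]["result"]
    return {r["metric"].get("instance", ""): float(r["value"][1]) for r in results}


def instant_query(expr):
    """Run the query against a live server (requires Prometheus to be running)."""
    with urlopen(query_url(expr), timeout=5) as resp:
        return extract_values(json.load(resp))


print(query_url("up"))
# print(instant_query("up"))  # uncomment when Prometheus is reachable
```

Sample values arrive as strings inside a `[timestamp, "value"]` pair, which is why `extract_values` converts them with `float`.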
The up metric
The process_resident_memory_bytes metric
Memory used by Prometheus.
prometheus_tsdb_head_samples_appended_total
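The number of samples Prometheus has ingested.
Metric types:
gauge: a value that can go up or down, such as the memory currently in use
counter: a value that only increases (and starts again from zero when the process restarts), such as the total number of samples ingested
The rate function
The rate function computes how fast a counter is increasing per second.
# Average number of samples Prometheus ingested per second over the last minute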
rate(prometheus_tsdb_head_samples_appended_total[1m])
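As a rough illustration of what rate does with the samples in a window, the Python sketch below divides the counter's increase by the elapsed time and treats any drop as a counter reset. It is a simplification under stated assumptions: the function name `simple_rate` and the sample values are made up, and the real rate() additionally extrapolates to the window boundaries, which this sketch does not.

```python
def simple_rate(samples):
    """Per-second increase of a counter over a window.

    samples: list of (unix_timestamp_seconds, counter_value), oldest first.
    Returns None when the rate cannot be computed.
    """
    if len(samples) < 2:
        return None  # at least two samples are needed to measure growth
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        if cur >= prev:
            increase += cur - prev
        else:
            # A drop means the counter was reset (e.g. a process restart),
            # so the whole current value counts as new growth.
            increase += cur
    return increase / elapsed


# Made-up counter values scraped every 15 seconds over one minute.
window = [(0, 1000), (15, 1150), (30, 1300), (45, 1450), (60, 1600)]
print(simple_rate(window))  # 600 samples over 60 seconds -> 10.0
```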
Running the Node Exporter
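The Node exporter exposes kernel-level and machine-level metrics for Linux hosts.
It provides all the standard metrics such as CPU, memory, disk space, disk I/O, and network bandwidth, plus a large number of additional metrics exposed by the kernel, from load average to motherboard temperature.
Download it from the official site: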
https://prometheus.io/download/
tar -xzf node_exporter-*.linux-amd64.tar.gz
cd node_exporter-*.linux-amd64/
./node_exporter
For Prometheus to monitor node_exporter, update prometheus.yml with an additional scrape config:
global:
  scrape_interval: 10s
scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets:
          - localhost:9090
  - job_name: node
    static_configs:
      - targets:
          - localhost:9100
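To show what this configuration means in practice, the Python sketch below lists the URL each static target resolves to, using the Prometheus defaults of the `http` scheme and the `/metrics` path. The function name `scrape_urls` and the dict representation of the config are hypothetical; Prometheus itself reads the YAML file directly.

```python
def scrape_urls(config):
    """List (job_name, url) for every static target in a scrape configuration."""
    urls = []
    for job in config.get("scrape_configs", []):
        scheme = job.get("scheme", "http")          # Prometheus default
        path = job.get("metrics_path", "/metrics")  # Prometheus default
        for static in job.get("static_configs", []):
            for target in static.get("targets", []):
                urls.append((job["job_name"], f"{scheme}://{target}{path}"))
    return urls


# The same configuration written as a plain dict, for illustration only.
config = {
    "global": {"scrape_interval": "10s"},
    "scrape_configs": [
        {"job_name": "prometheus",
         "static_configs": [{"targets": ["localhost:9090"]}]},
        {"job_name": "node",
         "static_configs": [{"targets": ["localhost:9100"]}]},
    ],
}

for job, url in scrape_urls(config):
    print(job, url)
```

With a scrape_interval of 10s, each of these two URLs is fetched every ten seconds.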
After Prometheus is restarted with the new configuration, the Node exporter appears on the Targets page.
Label matching
process_resident_memory_bytes{job="node"}
This shows the memory used by the Node exporter.
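The effect of an equality matcher such as job="node" can be sketched in Python as a filter over labelled samples. This is only an illustration of the idea: the function name `select` is hypothetical, the sample values are made up, and PromQL also supports negative and regular-expression matchers that the sketch leaves out.

```python
def select(samples, name, **matchers):
    """Return samples with the given metric name whose labels equal every matcher."""
    return [
        s for s in samples
        if s["name"] == name
        and all(s["labels"].get(k) == v for k, v in matchers.items())
    ]


# Made-up sample values for illustration only.
samples = [
    {"name": "process_resident_memory_bytes",
     "labels": {"job": "prometheus", "instance": "localhost:9090"},
     "value": 7.3e7},
    {"name": "process_resident_memory_bytes",
     "labels": {"job": "node", "instance": "localhost:9100"},
     "value": 1.2e7},
]

for s in select(samples, "process_resident_memory_bytes", job="node"):
    print(s["labels"]["instance"], s["value"])
```

Without any matcher, both series are returned, which is why the bare metric name in the Expression Browser shows one line per job.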
node_network_receive_bytes_total
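node_network_receive_bytes_total is a counter of the bytes received on each network interface. Applying rate to it gives the receive throughput in bytes per second: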
rate(node_network_receive_bytes_total[1m])
Alerting
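Alerting has two parts:
- alerting rules: define the logic that constitutes an alert
- Alertmanager: turns firing alerts into notifications, such as email or DingTalk messages
Defining alerting rules
Edit rules.yml to define the alerting rules. Prometheus loads this file only if it is listed under rule_files in prometheus.yml.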
groups:
  - name: example
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 1m
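The `for: 1m` clause means the alert fires only after `up == 0` has held continuously for one minute. Until then it is pending. The Python sketch below models that behaviour in a simplified way: the function name `alert_state` and the evaluation timestamps are hypothetical, and real rule evaluation tracks this state separately for each time series.

```python
def alert_state(evaluations, hold_seconds=60):
    """Return 'inactive', 'pending' or 'firing' after the last evaluation.

    evaluations: list of (unix_timestamp_seconds, condition_is_true), oldest first.
    """
    active_since = None
    state = "inactive"
    for ts, is_true in evaluations:
        if not is_true:
            # Any false evaluation resets the timer.
            active_since = None
            state = "inactive"
            continue
        if active_since is None:
            active_since = ts  # the condition has just become true
        if ts - active_since >= hold_seconds:
            state = "firing"
        else:
            state = "pending"
    return state


# The target has been down for 45 seconds: not long enough to fire.
print(alert_state([(0, True), (15, True), (30, True), (45, True)]))
# Down for a full minute: the alert fires.
print(alert_state([(0, True), (30, True), (60, True)]))
```

The waiting period keeps a single failed scrape from producing a notification.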
Alertmanager
Download Alertmanager from https://prometheus.io/download/
tar -xzf alertmanager-*.linux-amd64.tar.gz
cd alertmanager-*.linux-amd64/
Edit alertmanager.yml to configure email alerts:
global:
  smtp_smarthost: 'localhost:25'
  smtp_from: 'youraddress@example.org'
route:
  receiver: example-email
receivers:
  - name: example-email
    email_configs:
      - to: 'youraddress@example.org'