Using Prometheus relabel in Ceph Monitoring

Author: blackpiglet | Published 2018-10-18 12:44

1. Problem Description

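In my working environment there are three independent Ceph clusters, serving object storage, block storage and file storage respectively. When I built them, I did not fully appreciate how hard it is to rename a Ceph cluster, so all three use the default cluster name, ceph. Unfortunately, Prometheus's ceph_exporter uses the cluster name to tell clusters apart. As a result, the data of the different clusters could not be distinguished in Grafana: everything was plotted in the same chart, which was not only messy but also left some of the data unable to display correctly.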

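You might say: then just change the Ceph cluster name. The problem is that renaming a Ceph cluster is not that simple. Ceph's storage directories are tied to the cluster name, so many configuration files and data directories would have to be moved before the new name takes effect, which is too risky for clusters that are already in production. Setting up a separate Prometheus and Grafana stack for each Ceph cluster would of course also solve the problem, but that approach is rather crude, and I did not want to use it unless I had no other choice.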

My first idea was to modify ceph_exporter: if the cluster name cannot tell the clusters apart, adding the Ceph fsid as a label surely would, like this:

[Screenshot: ceph_exporter metrics with the added fsid label]

However, an fsid value does not show at a glance which Ceph cluster it refers to, so this was not a good solution either.

In the end, thanks to neurodrone, I learned about Prometheus's relabel feature, which solves this problem neatly.

2. relabel Configuration

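Relabeling rewrites label sets, either those of scrape targets (relabel_configs, applied before the scrape) or those of scraped samples (metric_relabel_configs, applied after it). It can filter out and drop unneeded metrics, rename labels, and also modify label values.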
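To make the `replace` action more concrete, here is a minimal Python sketch of how it treats a single label set, following the semantics described in the Prometheus configuration documentation: the values of `source_labels` are joined with `separator` (default `;`), matched against a fully anchored `regex` (default `(.*)`), and on a match `target_label` is set to `replacement` (default `$1`) with group references expanded. The function name `relabel_replace` is hypothetical, and the sketch ignores the other actions (`keep`, `drop`, `labelmap`, ...) as well as `$` references inside `target_label`.

```python
import re
from typing import Dict, List

Labels = Dict[str, str]

# Numeric group references only: $1 or ${1} (Prometheus uses Go's $-syntax).
_GROUP_REF = re.compile(r"\$\{(\d+)\}|\$(\d+)")


def _expand(template: str, match: re.Match) -> str:
    """Replace $N / ${N} in template with the text of capture group N."""
    def repl(ref: re.Match) -> str:
        index = int(ref.group(1) or ref.group(2))
        return match.group(index) or ""
    return _GROUP_REF.sub(repl, template)


def relabel_replace(labels: Labels, source_labels: List[str], target_label: str,
                    replacement: str = "$1", regex: str = "(.*)",
                    separator: str = ";") -> Labels:
    """Sketch of the `replace` action applied to a single label set."""
    out = dict(labels)
    # Values of the source labels are joined; missing labels count as "".
    value = separator.join(labels.get(name, "") for name in source_labels)
    # The regex is anchored at both ends, hence fullmatch.
    match = re.fullmatch(regex, value)
    if match is None:
        return out  # no match: the label set is left unchanged
    result = _expand(replacement, match)
    if result:
        out[target_label] = result
    else:
        out.pop(target_label, None)  # an empty result removes target_label
    return out


series = {"cluster": "ceph", "pool": ".rgw.root"}
print(relabel_replace(series, ["cluster"], "clusters", replacement="ceph1"))
print(relabel_replace(series, ["cluster", "pool"], "name", regex="(.*);.*"))
```

The first call adds `clusters="ceph1"` next to the untouched `cluster="ceph"`; the second shows a capture group being copied into a new label.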

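For example, the cluster label of ceph_pool_write_total is ceph in all three clusters. In the Prometheus configuration, however, the three clusters belong to different jobs, so we can relabel per job and attach a label value that tells the clusters apart.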

# cluster1's metric
ceph_pool_write_total{cluster="ceph",pool=".rgw.root"} 4

# cluster2's metric
ceph_pool_write_total{cluster="ceph",pool=".rgw.root"} 10

# cluster3's metric
ceph_pool_write_total{cluster="ceph",pool=".rgw.root"} 7

The configuration is shown below. Each job writes its own name (ceph1, ceph2 or ceph3) into a new label, clusters; the original cluster label is left as it is.

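scrape_configs:
  - job_name: 'ceph1'
    relabel_configs:
    # "cluster" is not a target label before the scrape, so the default
    # regex (.*) matches "" and "ceph1" is written to "clusters".
    - source_labels: ["cluster"]
      replacement: "ceph1"
      action: replace
      target_label: "clusters"
    static_configs:
    - targets: ['ceph1:9128']
      labels:
        alias: ceph1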

  - job_name: 'ceph2'
    relabel_configs:
    - source_labels: ["cluster"]
      replacement: "ceph2"
      action: replace
      target_label: "clusters"
    static_configs:
    - targets: ['ceph2:9128']
      labels:
        alias: ceph2

  - job_name: 'ceph3'
    relabel_configs:
    - source_labels: ["cluster"]
      replacement: "ceph3"
      action: replace
      target_label: "clusters"
    static_configs:
    - targets: ['ceph3:9128']
      labels:
        alias: ceph3
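It helps to keep in mind when `relabel_configs` runs: it acts on the labels of the scrape target before the scrape, when the scraped `cluster` label does not exist yet. The joined source value is therefore the empty string, which the default regex `(.*)` still matches, so the literal replacement lands in `clusters` and is then attached to every series scraped from that target. (Rewriting the scraped `cluster` label itself would instead be a job for `metric_relabel_configs`.) The Python sketch below simulates this for the `ceph1` job. Its function names are hypothetical, and it leaves out internal labels such as `__scheme__` and `__metrics_path__` as well as the handling of label name clashes (`honor_labels`).

```python
import re
from typing import Dict, List

Labels = Dict[str, str]


def target_relabel(target: Labels, source_labels: List[str],
                   target_label: str, replacement: str,
                   regex: str = "(.*)") -> Labels:
    """`replace` action with a literal replacement (no $1 references)."""
    out = dict(target)
    # "cluster" is not a target label, so the joined value is "" here.
    value = ";".join(target.get(name, "") for name in source_labels)
    # The default regex (.*) also matches "", so the rule still fires.
    if re.fullmatch(regex, value) is not None:
        out[target_label] = replacement
    return out


def finalize_target(target: Labels) -> Labels:
    """instance defaults to __address__; "__"-prefixed labels are dropped."""
    out = {k: v for k, v in target.items() if not k.startswith("__")}
    out.setdefault("instance", target["__address__"])
    return out


def attach(target: Labels, scraped: Labels) -> Labels:
    """Add the target labels to one scraped series (assumes no name clashes)."""
    return {**scraped, **target}


# Simplified target for the 'ceph1' job (static_configs target + labels).
target = {"__address__": "ceph1:9128", "job": "ceph1", "alias": "ceph1"}
target = finalize_target(target_relabel(target, ["cluster"], "clusters", "ceph1"))

series = attach(target, {"__name__": "ceph_pool_write_total",
                         "cluster": "ceph", "pool": ".rgw.root"})
print(series)
```

The printed series keeps `cluster="ceph"` and gains `clusters="ceph1"`, together with `job`, `alias` and `instance`.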

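After this change the metrics look like this (only the clusters and pool labels are shown; the original cluster="ceph" and the job, instance and alias labels are still present), so the data of the different Ceph clusters can now be told apart.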

# cluster1's metric
ceph_pool_write_total{clusters="ceph1",pool=".rgw.root"} 4

# cluster2's metric
ceph_pool_write_total{clusters="ceph2",pool=".rgw.root"} 10

# cluster3's metric
ceph_pool_write_total{clusters="ceph3",pool=".rgw.root"} 7
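Why the new label matters for charting can be seen with a small Python sketch that mimics PromQL's `sum by (...)` over the three example series. Grouping by `cluster` folds all three clusters into a single value, similar to the mixed-up charts described in section 1, while grouping by `clusters` keeps them apart. `sum_by` is a hypothetical helper, not a Prometheus API, and the sample values are the ones from the example above.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# The three example series after relabeling (the original cluster label is kept).
SERIES: List[Tuple[Dict[str, str], float]] = [
    ({"cluster": "ceph", "clusters": "ceph1", "pool": ".rgw.root"}, 4),
    ({"cluster": "ceph", "clusters": "ceph2", "pool": ".rgw.root"}, 10),
    ({"cluster": "ceph", "clusters": "ceph3", "pool": ".rgw.root"}, 7),
]


def sum_by(series: List[Tuple[Dict[str, str], float]], label: str) -> Dict[str, float]:
    """Rough analogue of PromQL `sum by (<label>) (...)`."""
    totals: Dict[str, float] = defaultdict(float)
    for labels, value in series:
        totals[labels.get(label, "")] += value
    return dict(totals)


print(sum_by(SERIES, "cluster"))   # {'ceph': 21.0}
print(sum_by(SERIES, "clusters"))  # {'ceph1': 4.0, 'ceph2': 10.0, 'ceph3': 7.0}
```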

3. Grafana Dashboard Adjustments

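Changing the Prometheus configuration alone is not enough, because the distinction also has to show up in the UI, so the Grafana dashboard needs matching changes. The dashboard used in this article is Ceph - Cluster.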

First, add a clusters variable to the dashboard, which can be done in the UI.
Click the dashboard's "settings" button (the one with the gear icon):

[Screenshot: the dashboard "settings" (gear) button]

Add the clusters variable as shown below, then save.

[Screenshot: definition of the clusters variable]
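The screenshot of the variable settings did not survive, so the exact values used are unknown. As a hedged sketch, assuming a Prometheus data source, the variable can be a "Query" variable whose query is Grafana's `label_values(<metric>, clusters)` template function. The Python snippet below builds such an entry roughly as it would appear under `templating.list` in the dashboard JSON. The helper name, the choice of `ceph_pool_write_total` as the metric and the `refresh` value are assumptions, and the data source field is omitted; compare against the JSON your Grafana version exports.

```python
import json


def clusters_variable(metric: str = "ceph_pool_write_total") -> dict:
    """Hypothetical helper returning a Grafana query variable for `clusters`."""
    return {
        "name": "clusters",   # referenced in panel queries as $clusters
        "label": "clusters",  # text shown next to the drop-down
        "type": "query",      # values come from a data source query
        # Prometheus data source template query: all values of the
        # clusters label found on `metric`.
        "query": f"label_values({metric}, clusters)",
        "refresh": 1,         # re-run the query when the dashboard loads
    }


# Minimal stand-in for the dashboard JSON; real dashboards have many more fields.
dashboard = {"title": "Ceph - Cluster", "templating": {"list": []}}
dashboard["templating"]["list"].append(clusters_variable())
print(json.dumps(dashboard["templating"], indent=2))
```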

The new variable now appears on the dashboard:

[Screenshot: dashboard showing the clusters variable selector]

Next, the query of every panel has to be modified accordingly:

[Screenshot: panel query modified to use the clusters variable]
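The modified queries themselves are not preserved. A typical change is presumably to add a `clusters="$clusters"` matcher to each selector, turning `ceph_pool_write_total` into `ceph_pool_write_total{clusters="$clusters"}`. For a dashboard with many panels this could be scripted; below is a naive, regex-based Python sketch (the name `scope_to_cluster` is hypothetical) that appends the matcher to every `ceph_*` selector in a PromQL expression, assuming, as with the metric shown above, that ceph_exporter metric names start with `ceph_`. It does not parse PromQL, so unusual expressions (for example matcher values containing `}`) need a real parser or a manual check.

```python
import re

# Hypothetical pattern: a ceph_* metric name, optionally followed by {matchers}.
_CEPH_SELECTOR = re.compile(r"\b(ceph_[a-zA-Z0-9_:]*)(?:\{([^}]*)\})?")


def scope_to_cluster(expr: str, label: str = "clusters", var: str = "$clusters") -> str:
    """Add label="var" to every ceph_* selector in a PromQL expression (naive)."""
    matcher = f'{label}="{var}"'

    def repl(m: re.Match) -> str:
        name, existing = m.group(1), m.group(2)
        if existing and existing.strip():
            # Keep the existing matchers and append ours.
            return f"{name}{{{existing}, {matcher}}}"
        # No (or empty) braces: create a fresh matcher list.
        return f"{name}{{{matcher}}}"

    return _CEPH_SELECTOR.sub(repl, expr)


print(scope_to_cluster('sum(ceph_pool_write_total{pool=".rgw.root"})'))
print(scope_to_cluster("rate(ceph_pool_write_total[5m])"))
```

If the Grafana variable is made multi-value, the matcher should use `=~` instead of `=`, since Grafana then expands the variable into a regex alternation.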

The finished dashboard JSON file can be downloaded from the following link:
ceph-cluster.json

Article link: https://www.haomeiwen.com/subject/rdijzftx.html