There are two ways to view the logs; both first require the cluster ID (fsid).
# Locate the node where the service is running
ceph orch ps
# On every node, the logs are viewed as follows
# ceph -s
cluster:
id: 8c2b898a-0324-11ed-8b84-089204a58dfa
health: HEALTH_OK
ls -l /var/log/ceph/<cluster-fsid>
# or
# journalctl -u ceph-5c5a50ae-272a-455d-99e9-32c6a013e694@mon.foo
journalctl -u ceph-8c2b898a-0324-11ed-8b84-089204a58dfa@mon.ceph-rbd-1
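Both approaches above depend on the cluster fsid, so it can help to read it once and build the paths from it. The sketch below assumes a cephadm-managed cluster whose systemd units follow the `ceph-<fsid>@<daemon>.service` pattern seen in the commands above; `ceph_unit_name` is a hypothetical helper, not part of Ceph.

```shell
# ceph_unit_name (hypothetical helper): print the systemd unit name for a
# daemon, following the ceph-<fsid>@<daemon>.service pattern used above.
#   $1 = cluster fsid, $2 = daemon name such as mon.ceph-rbd-1 or osd.22
ceph_unit_name() {
    printf 'ceph-%s@%s.service\n' "$1" "$2"
}

# Usage on a cluster node (needs the ceph CLI and an admin keyring):
#   fsid="$(ceph fsid)"                      # prints only the cluster fsid
#   ls -l "/var/log/ceph/${fsid}"            # file-based logs
#   journalctl -u "$(ceph_unit_name "$fsid" mon.ceph-rbd-1)" --since "1 hour ago"
```

Using `ceph fsid` avoids copying the id by hand out of the `ceph -s` output.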
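- Rule out network instability inside the 3-node Ceph cluster
# 1. First rule out network problems inside the 3-node Ceph cluster. If there were network problems, the logs should contain other errors.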
[root@ceph-rbd-1 ~]# grep -i ERROR /var/log/messages | grep -v "RBD image has snapshots" | grep -v "has overlapping roots" | grep -v mgr-ceph | grep -v ceph-mgr | grep -v ".log"
Nov 22 15:55:54 ceph-rbd-1 ceph-8c2b898a-0324-11ed-8b84-089204a58dfa-grafana-ceph-rbd-1[747899]: server.go:3160: http: TLS handshake error from 10.60.21.48:4660: remote error: tls: unknown certificate
Nov 22 15:56:57 ceph-rbd-1 ceph-8c2b898a-0324-11ed-8b84-089204a58dfa-grafana-ceph-rbd-1[747899]: server.go:3160: http: TLS handshake error from 10.60.23.23:50341: remote error: tls: unknown certificate
Nov 22 15:58:20 ceph-rbd-1 ceph-8c2b898a-0324-11ed-8b84-089204a58dfa-grafana-ceph-rbd-1[747899]: server.go:3160: http: TLS handshake error from 10.60.23.23:50473: remote error: tls: unknown certificate
Nov 23 14:53:22 ceph-rbd-1 ceph-8c2b898a-0324-11ed-8b84-089204a58dfa-grafana-ceph-rbd-1[747899]: server.go:3160: http: TLS handshake error from 10.60.23.23:61933: remote error: tls: unknown certificate
Nov 23 14:57:06 ceph-rbd-1 ceph-8c2b898a-0324-11ed-8b84-089204a58dfa-grafana-ceph-rbd-1[747899]: server.go:3160: http: TLS handshake error from 10.60.23.23:62164: remote error: tls: unknown certificate
Nov 23 15:03:03 ceph-rbd-1 ceph-8c2b898a-0324-11ed-8b84-089204a58dfa-grafana-ceph-rbd-1[747899]: server.go:3160: http: TLS handshake error from 10.60.23.23:62535: remote error: tls: unknown certificate
[root@ceph-rbd-1 ~]#
[root@ceph-rbd-1 ~]#
[root@ceph-rbd-1 ~]# ssh ceph-rbd-2
root@ceph-rbd-2's password:
Last login: Fri Nov 25 11:36:30 2022 from 10.122.16.11
[root@ceph-rbd-2 ~]#
[root@ceph-rbd-2 ~]# grep -i ERROR /var/log/messages | grep -v "RBD image has snapshots" | grep -v "has overlapping roots" | grep -v mgr-ceph | grep -v ceph-mgr | grep -v ".log"
[root@ceph-rbd-2 ~]# logout
Connection to ceph-rbd-2 closed.
[root@ceph-rbd-1 ~]# ssh ceph-rbd-3
root@ceph-rbd-3's password:
Last login: Fri Nov 25 11:36:56 2022 from 10.122.16.11
[root@ceph-rbd-3 ~]# grep -i ERROR /var/log/messages | grep -v "RBD image has snapshots" | grep -v "has overlapping roots" | grep -v mgr-ceph | grep -v ceph-mgr | grep -v ".log"
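The same filter was typed by hand on each of the three nodes above. A minimal sketch of wrapping it in a function follows; `filter_errors` is a hypothetical name, and unlike the original command it matches `.log` literally (`grep -F`) rather than as a regular expression where `.` matches any character.

```shell
# filter_errors (hypothetical helper): read syslog lines on stdin and keep
# only error lines that are not already-known noise.
filter_errors() {
    grep -i 'error' \
        | grep -v -e 'RBD image has snapshots' \
                  -e 'has overlapping roots' \
                  -e 'mgr-ceph' \
                  -e 'ceph-mgr' \
        | grep -vF '.log'
}

# Usage, assuming root SSH access to the three hosts named in the transcript:
#   for host in ceph-rbd-1 ceph-rbd-2 ceph-rbd-3; do
#       echo "== ${host} =="
#       ssh "root@${host}" cat /var/log/messages | filter_errors
#   done
```

An empty result for a host corresponds to the empty output seen on ceph-rbd-2 and ceph-rbd-3.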
# 2. pg_autoscaler may have a problem; it seems to keep flapping between scaling up and scaling down
# grep -i ERROR /var/log/messages | grep -v "RBD image has snapshots"
Nov 20 03:51:58 ceph-rbd-1 ceph-mgr[729584]: [pg_autoscaler ERROR root] pool 6 has overlapping roots: {-12, -1}
Nov 20 03:51:58 ceph-rbd-1 ceph-mgr[729584]: [pg_autoscaler ERROR root] pool 7 has overlapping roots: {-12, -1, -2}
Nov 20 03:52:58 ceph-rbd-1 ceph-mgr[729584]: [pg_autoscaler ERROR root] pool 6 has overlapping roots: {-12, -1}
Nov 20 03:52:58 ceph-rbd-1 ceph-mgr[729584]: [pg_autoscaler ERROR root] pool 7 has overlapping roots: {-12, -1, -2}
Nov 20 03:53:58 ceph-rbd-1 ceph-mgr[729584]: [pg_autoscaler ERROR root] pool 6 has overlapping roots: {-12, -1}
Nov 20 03:53:58 ceph-rbd-1 ceph-mgr[729584]: [pg_autoscaler ERROR root] pool 7 has overlapping roots: {-12, -1, -2}
Nov 20 03:58:58 ceph-rbd-1 ceph-mgr[729584]: [pg_autoscaler ERROR root] pool 6 has overlapping roots: {-12, -1}
Nov 20 03:58:58 ceph-rbd-1 ceph-mgr[729584]: [pg_autoscaler ERROR root] pool 7 has overlapping roots: {-12, -1, -2}
Nov 20 03:59:58 ceph-rbd-1 ceph-mgr[729584]: [pg_autoscaler ERROR root] pool 6 has overlapping roots: {-12, -1}
Nov 20 03:59:58 ceph-rbd-1 ceph-mgr[729584]: [pg_autoscaler ERROR root] pool 7 has overlapping roots: {-12, -1, -2}
Nov 20 04:00:58 ceph-rbd-1 ceph-mgr[729584]: [pg_autoscaler ERROR root] pool 6 has overlapping roots: {-12, -1}
Nov 20 04:00:58 ceph-rbd-1 ceph-mgr[729584]: [pg_autoscaler ERROR root] pool 7 has overlapping roots: {-12, -1, -2}
Nov 20 04:01:58 ceph-rbd-1 ceph-mgr[729584]: [pg_autoscaler ERROR root] pool 6 has overlapping roots: {-12, -1}
Nov 20 04:01:58 ceph-rbd-1 ceph-mgr[729584]: [pg_autoscaler ERROR root] pool 7 has overlapping roots: {-12, -1, -2}
Nov 20 04:02:58 ceph-rbd-1 ceph-mgr[729584]: [pg_autoscaler ERROR root] pool 6 has overlapping roots: {-12, -1}
Nov 20 04:02:58 ceph-rbd-1 ceph-mgr[729584]: [pg_autoscaler ERROR root] pool 7 has overlapping roots: {-12, -1, -2}
Nov 20 04:03:58 ceph-rbd-1 ceph-mgr[729584]: [pg_autoscaler ERROR root] pool 6 has overlapping roots: {-12, -1}
Nov 20 04:03:58 ceph-rbd-1 ceph-mgr[729584]: [pg_autoscaler ERROR root] pool 7 has overlapping roots: {-12, -1, -2}
Nov 20 04:04:58 ceph-rbd-1 ceph-mgr[729584]: [pg_autoscaler ERROR root] pool 6 has overlapping roots: {-12, -1}
Nov 20 04:04:58 ceph-rbd-1 ceph-mgr[729584]: [pg_autoscaler ERROR root] pool 7 has overlapping roots: {-12, -1, -2}
journalctl -u ceph-8c2b898a-0324-11ed-8b84-089204a58dfa@osd.22.service
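The pg_autoscaler errors above repeat every minute for the same two pools, so a per-pool summary is easier to read than the raw log. The sketch below is one way to produce it; `count_overlapping_roots` is a hypothetical helper, and it assumes the message format shown in the log lines above.

```shell
# count_overlapping_roots (hypothetical helper): read syslog lines on stdin
# and print "<count> pool <id> roots <set>" for each distinct pool/root set.
count_overlapping_roots() {
    sed -n 's/.*pg_autoscaler ERROR root] pool \([0-9][0-9]*\) has overlapping roots: \({[^}]*}\).*/pool \1 roots \2/p' \
        | sort | uniq -c | sed 's/^ *//'
}

# Usage:
#   count_overlapping_roots < /var/log/messages
#
# Possible follow-up commands for inspecting the pools and CRUSH roots named
# in the errors (suggestions, not steps from the original notes):
#   ceph osd pool autoscale-status
#   ceph osd crush rule dump
#   ceph osd crush tree --show-shadow
```

The negative numbers in the root sets are CRUSH bucket ids, which `ceph osd crush tree --show-shadow` lists alongside their names.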