美文网首页
一次etcd踩坑记

一次etcd踩坑记

作者: 叛逆的机器人 | 来源:发表于2020-04-18 14:50 被阅读0次

最近项目中部分服务迁移到go,为了方便监控服务,采用了etcd。启动go服务的时候,去etcd获取一个租约,然后每隔一段时间,就去续约。go的etcd包是通过返回一个chan,通过chan可以获取到续约的情况。在项目中是通过select监听chan,如果失败则打印一条日志。因为需要监听的不只续约,还有键的变化,所以失败的时候并不会退出。

for {
    select {
    case aliveResp := <-aliveChan:
        if aliveResp == nil {
            log.Println("keep alive end")
        }
    case watchResp := <-watchChan:
        do somthing()
    }

在某个时刻,突然这个方法在逛打日志,由于go的chan关闭了,会一直返回nil,也就是这里陷入了死循环。而chan关闭的原因无非就是etcd服务挂了,或者处于那个时刻刚好处于选举,没有leader,响应不了。

查看了etcd服务,发现服务是正常运行的,那么原因就是第二个了。这个时候需要去查看etcd的日志确认一下。在线上etcd服务总共有三个节点,分别是102,105,104。下面我描述一下当时的场景。在时刻0的时候,102是leader,其他两个节点是follower,处于任期1。在时刻1的时候,105节点突然断开了网络连接,由于收不到leader的心跳,节点开始新的选举。节点总共选举了三次,因为连接断开了,所以收不到其他节点的投票。在时刻4的时候,102节点收到了三个投票请求,由于节点还处于任期内,所以拒绝了三个投票。在时刻5,104节点也拒绝了三个投票,此时,102节点因为收到了更高的任期4,所以变为follower。所以当前时刻,整个etcd服务并没有leader,也就是会拒绝所有请求,等到新的leader选举完成。而这个时刻,刚好go的服务去续约,因为etcd无法服务导致chan关闭,所以出现了开头的问题。通过go服务日志的时间与etcd服务日志的时间,可以确认这个猜想。

知道了问题,解决方案也很简单,在chan关闭后,通过定时去重新租约,获取新的chan。代码如下。

for {
    select {
    case aliveResp := <-aliveChan:
        if aliveResp == nil {
            log.Println("keep alive end")
            var err error
            _playRegister.leaseResp, _playRegister.aliveChan, err = serviceRegister(_playRegister.client, 5, _playRegister.statusKey)
            if err != nil {
                time.Sleep(1 * time.Second)
            } else {
                aliveChan = _playRegister.aliveChan
            }
        }
    case watchResp := <-watchChan:
        do somthing()
    }

任期内无法投票的原因,可以参见下面源码。


case m.Term > r.Term:
    // 消息的Term大于节点当前的Term
    lead := m.From
    if m.Type == pb.MsgVote || m.Type == pb.MsgPreVote {
        // 如果收到的是投票类消息
        // 当context为campaignTransfer时表示强制要求进行竞选
        force := bytes.Equal(m.Context, []byte(campaignTransfer))
        // 是否在租约期以内
        inLease := r.checkQuorum && r.lead != None && r.electionElapsed < r.electionTimeout
        if !force && inLease {
            // 如果非强制,而且又在租约期以内,就不做任何处理
            // 非强制又在租约期内可以忽略选举消息,见论文的4.2.3,这是为了阻止已经离开集群的节点再次发起投请求
            // If a server receives a RequestVote request within the minimum election timeout
            // of hearing from a current leader, it does not update its term or grant its vote
            r.logger.Infof("%x [logterm: %d, index: %d, vote: %x] ignored %s from %x [logterm: %d, index: %d] at term %d: lease is not expired (remaining ticks: %d)",
            r.id, r.raftLog.lastTerm(), r.raftLog.lastIndex(), r.Vote, m.Type, m.From, m.LogTerm, m.Index, r.Term, r.electionTimeout-r.electionElapsed)
            return nil
        }
        // 否则将lead置为空
        lead = None
    }
————————————————
版权声明:本文为CSDN博主「跨链技术践行者」的原创文章,遵循 CC 4.0 BY-SA 版权协议,转载请附上原文出处链接及本声明。
原文链接:https://blog.csdn.net/shangsongwww/java/article/details/90287901

详细的日志在这里

[105]3d7c0fd6445ef729
raft2020/03/31 18:40:16 INFO: 3d7c0fd6445ef729 is starting a new election at term 8
raft2020/03/31 18:40:16 INFO: 3d7c0fd6445ef729 became candidate at term 9
raft2020/03/31 18:40:16 INFO: 3d7c0fd6445ef729 received MsgVoteResp from 3d7c0fd6445ef729 at term 9
raft2020/03/31 18:40:16 INFO: 3d7c0fd6445ef729 [logterm: 8, index: 977] sent MsgVote request to 6404974f67850bd9 at term 9
raft2020/03/31 18:40:16 INFO: 3d7c0fd6445ef729 [logterm: 8, index: 977] sent MsgVote request to de352282a8b96da5 at term 9
raft2020/03/31 18:40:16 INFO: raft.node: 3d7c0fd6445ef729 lost leader 6404974f67850bd9 at term 9
raft2020/03/31 18:40:18 INFO: 3d7c0fd6445ef729 is starting a new election at term 9
raft2020/03/31 18:40:18 INFO: 3d7c0fd6445ef729 became candidate at term 10
raft2020/03/31 18:40:18 INFO: 3d7c0fd6445ef729 received MsgVoteResp from 3d7c0fd6445ef729 at term 10
raft2020/03/31 18:40:18 INFO: 3d7c0fd6445ef729 [logterm: 8, index: 977] sent MsgVote request to 6404974f67850bd9 at term 10
raft2020/03/31 18:40:18 INFO: 3d7c0fd6445ef729 [logterm: 8, index: 977] sent MsgVote request to de352282a8b96da5 at term 10
2020-03-31 18:40:19.499809 W | rafthttp: lost the TCP streaming connection with peer 6404974f67850bd9 (stream MsgApp v2 reader)
2020-03-31 18:40:19.503224 E | rafthttp: failed to read 6404974f67850bd9 on stream MsgApp v2 (read tcp 192.168.10.105:51776->192.168.10.102:2380: i/o timeout)
2020-03-31 18:40:19.503242 I | rafthttp: peer 6404974f67850bd9 became inactive (message send to peer failed)
raft2020/03/31 18:40:19 INFO: 3d7c0fd6445ef729 is starting a new election at term 10
raft2020/03/31 18:40:19 INFO: 3d7c0fd6445ef729 became candidate at term 11
raft2020/03/31 18:40:19 INFO: 3d7c0fd6445ef729 received MsgVoteResp from 3d7c0fd6445ef729 at term 11
raft2020/03/31 18:40:19 INFO: 3d7c0fd6445ef729 [logterm: 8, index: 977] sent MsgVote request to 6404974f67850bd9 at term 11
raft2020/03/31 18:40:19 INFO: 3d7c0fd6445ef729 [logterm: 8, index: 977] sent MsgVote request to de352282a8b96da5 at term 11
2020-03-31 18:40:19.658495 W | rafthttp: lost the TCP streaming connection with peer de352282a8b96da5 (stream MsgApp v2 reader)
2020-03-31 18:40:19.658557 E | rafthttp: failed to read de352282a8b96da5 on stream MsgApp v2 (read tcp 192.168.10.105:19098->192.168.10.104:2380: i/o timeout)
2020-03-31 18:40:19.658567 I | rafthttp: peer de352282a8b96da5 became inactive (message send to peer failed)
2020-03-31 18:40:19.658656 W | rafthttp: lost the TCP streaming connection with peer de352282a8b96da5 (stream Message reader)
2020-03-31 18:40:19.909333 I | rafthttp: peer 6404974f67850bd9 became active
2020-03-31 18:40:19.909379 I | rafthttp: established a TCP streaming connection with peer 6404974f67850bd9 (stream MsgApp v2 reader)
2020-03-31 18:40:19.969068 I | rafthttp: peer de352282a8b96da5 became active
2020-03-31 18:40:19.969132 I | rafthttp: established a TCP streaming connection with peer de352282a8b96da5 (stream Message reader)
2020-03-31 18:40:19.969233 I | rafthttp: established a TCP streaming connection with peer de352282a8b96da5 (stream MsgApp v2 reader)
2020-03-31 18:40:20.509461 W | rafthttp: lost the TCP streaming connection with peer 6404974f67850bd9 (stream Message reader)
2020-03-31 18:40:20.509505 E | rafthttp: failed to read 6404974f67850bd9 on stream Message (read tcp 192.168.10.105:51774->192.168.10.102:2380: i/o timeout)
2020-03-31 18:40:20.509514 I | rafthttp: peer 6404974f67850bd9 became inactive (message send to peer failed)
2020-03-31 18:40:20.510346 I | rafthttp: peer 6404974f67850bd9 became active
2020-03-31 18:40:20.510361 I | rafthttp: established a TCP streaming connection with peer 6404974f67850bd9 (stream Message reader)
raft2020/03/31 18:40:21 INFO: 3d7c0fd6445ef729 [term: 11] received a MsgVote message with higher term from 6404974f67850bd9 [term: 12]
raft2020/03/31 18:40:21 INFO: 3d7c0fd6445ef729 became follower at term 12
raft2020/03/31 18:40:21 INFO: 3d7c0fd6445ef729 [logterm: 8, index: 977, vote: 0] cast MsgVote for 6404974f67850bd9 [logterm: 8, index: 978] at term 12
raft2020/03/31 18:40:21 INFO: raft.node: 3d7c0fd6445ef729 elected leader 6404974f67850bd9 at term 12
2020-03-31 18:40:21.583620 W | rafthttp: health check for peer de352282a8b96da5 could not connect: dial tcp 192.168.10.104:2380: connect: network is unreachable

[102]6404974f67850bd9
2020-03-31 18:40:19.910296 W | rafthttp: closed an existing TCP streaming connection with peer 3d7c0fd6445ef729 (stream MsgApp v2 writer)
2020-03-31 18:40:19.910324 I | rafthttp: established a TCP streaming connection with peer 3d7c0fd6445ef729 (stream MsgApp v2 writer)
raft2020/03/31 18:40:19 INFO: 6404974f67850bd9 [logterm: 8, index: 977, vote: 6404974f67850bd9] ignored MsgVote from 3d7c0fd6445ef729 [logterm: 8, index: 977] at term 8: lease is not expired (remaining ticks: 4)
raft2020/03/31 18:40:19 INFO: 6404974f67850bd9 [logterm: 8, index: 977, vote: 6404974f67850bd9] ignored MsgVote from 3d7c0fd6445ef729 [logterm: 8, index: 977] at term 8: lease is not expired (remaining ticks: 4)
raft2020/03/31 18:40:19 INFO: 6404974f67850bd9 [logterm: 8, index: 977, vote: 6404974f67850bd9] ignored MsgVote from 3d7c0fd6445ef729 [logterm: 8, index: 977] at term 8: lease is not expired (remaining ticks: 4)
raft2020/03/31 18:40:20 INFO: 6404974f67850bd9 [term: 8] received a MsgAppResp message with higher term from 3d7c0fd6445ef729 [term: 11]
raft2020/03/31 18:40:20 INFO: 6404974f67850bd9 became follower at term 11
raft2020/03/31 18:40:20 INFO: raft.node: 6404974f67850bd9 lost leader 6404974f67850bd9 at term 11
raft2020/03/31 18:40:20 INFO: 6404974f67850bd9 [term: 11] ignored a MsgAppResp message with lower term from de352282a8b96da5 [term: 8]
2020-03-31 18:40:20.511031 W | rafthttp: closed an existing TCP streaming connection with peer 3d7c0fd6445ef729 (stream Message writer)
2020-03-31 18:40:20.511054 I | rafthttp: established a TCP streaming connection with peer 3d7c0fd6445ef729 (stream Message writer)
raft2020/03/31 18:40:21 INFO: 6404974f67850bd9 is starting a new election at term 11
raft2020/03/31 18:40:21 INFO: 6404974f67850bd9 became candidate at term 12
raft2020/03/31 18:40:21 INFO: 6404974f67850bd9 received MsgVoteResp from 6404974f67850bd9 at term 12
raft2020/03/31 18:40:21 INFO: 6404974f67850bd9 [logterm: 8, index: 978] sent MsgVote request to 3d7c0fd6445ef729 at term 12
raft2020/03/31 18:40:21 INFO: 6404974f67850bd9 [logterm: 8, index: 978] sent MsgVote request to de352282a8b96da5 at term 12
raft2020/03/31 18:40:21 INFO: 6404974f67850bd9 received MsgVoteResp from 3d7c0fd6445ef729 at term 12
raft2020/03/31 18:40:21 INFO: 6404974f67850bd9 has received 2 MsgVoteResp votes and 0 vote rejections
raft2020/03/31 18:40:21 INFO: 6404974f67850bd9 became leader at term 12
raft2020/03/31 18:40:21 INFO: raft.node: 6404974f67850bd9 elected leader 6404974f67850bd9 at term 12

[104]de352282a8b96da5
2020-03-31 18:40:19.978326 W | rafthttp: closed an existing TCP streaming connection with peer 3d7c0fd6445ef729 (stream Message writer)
2020-03-31 18:40:19.983180 I | rafthttp: established a TCP streaming connection with peer 3d7c0fd6445ef729 (stream Message writer)
2020-03-31 18:40:19.983202 W | rafthttp: closed an existing TCP streaming connection with peer 3d7c0fd6445ef729 (stream MsgApp v2 writer)
2020-03-31 18:40:19.983213 I | rafthttp: established a TCP streaming connection with peer 3d7c0fd6445ef729 (stream MsgApp v2 writer)
raft2020/03/31 18:40:20 INFO: de352282a8b96da5 [logterm: 8, index: 978, vote: 0] ignored MsgVote from 3d7c0fd6445ef729 [logterm: 8, index: 977] at term 8: lease is not expired (remaining ticks: 6)
raft2020/03/31 18:40:20 INFO: de352282a8b96da5 [logterm: 8, index: 978, vote: 0] ignored MsgVote from 3d7c0fd6445ef729 [logterm: 8, index: 977] at term 8: lease is not expired (remaining ticks: 6)
raft2020/03/31 18:40:20 INFO: de352282a8b96da5 [logterm: 8, index: 978, vote: 0] ignored MsgVote from 3d7c0fd6445ef729 [logterm: 8, index: 977] at term 8: lease is not expired (remaining ticks: 6)
raft2020/03/31 18:40:21 INFO: de352282a8b96da5 [term: 8] received a MsgVote message with higher term from 6404974f67850bd9 [term: 12]
raft2020/03/31 18:40:21 INFO: de352282a8b96da5 became follower at term 12
raft2020/03/31 18:40:21 INFO: de352282a8b96da5 [logterm: 8, index: 978, vote: 0] cast MsgVote for 6404974f67850bd9 [logterm: 8, index: 978] at term 12
raft2020/03/31 18:40:21 INFO: raft.node: de352282a8b96da5 lost leader 6404974f67850bd9 at term 12
raft2020/03/31 18:40:21 INFO: raft.node: de352282a8b96da5 elected leader 6404974f67850bd9 at term 12

相关文章

  • 一次etcd踩坑记

    最近项目中部分服务迁移到go,为了方便监控服务,采用了etcd。启动go服务的时候,去etcd获取一个租约,然后每...

  • [ANR Warning]onMeasure time too

    ConstraintLayout 踩坑记一次封装组合控件时的坑,我才用了集成 ConstraintLayout 来...

  • springboot集成swagger2深坑

    记录一次swagger2踩坑记,网上资料杂乱而不完整,自己踩的坑还算比较多,记录下自己的解决历程 一、首次来看看遇...

  • 移动端的头尾固定问题

    新起了个移动端项目,在头位固定问题上又踩了一次踩过的坑。爬起来之后弹弹土,乖乖的坐下来码字,把踩坑换来的经验教训记...

  • Android Material Design 踩坑记(2)

    Android Material Design 踩坑记(1) CoordinatorLayout Behav...

  • 记一次踩坑

    下面这张图有什么交互问题吗(右边的“人员分类”部分最多有20项)? 嗯,好像并没有太大问题。然而... 当开发完毕...

  • rancher kubernetes etcd crash

    记一次修复 Kubernetes etcd 挂掉的情况。 �由于etcd 中可能存在一些脏数据,导致 教训: et...

  • Deepin使用踩坑记

    1. 前言 很喜欢Deepin,奈何坑太多,不过不怕,踩过去~ 2. 踩坑记 2.1 Deepin重启后文件管理器...

  • SpringStreaming+Kafka

    摘自 :Spark踩坑记——Spark Streaming+Kafka [TOC] SpringStreaming...

  • 原生App植入React Native 踩坑记

    原生App植入React Native 踩坑记 为什么我踩坑了有一个需求要去可以在当前工程的feature/RN ...

网友评论

      本文标题:一次etcd踩坑记

      本文链接:https://www.haomeiwen.com/subject/pwovvhtx.html