The Reverse Priority Case

Author: affe | Published: 2020-05-18 12:00

I have been reading the Raft implementation recently and learned a lot from it. I ran into some problems, which most likely come from my misunderstanding something. I would appreciate it if someone could check my reasoning.

Suppose a normal 4-node cluster with priority node0 > node1 > node2 > node3.

  1. Initial state:
    | node  | currVotedFor | currTerm | role      | lastParseResult |
    |-------|--------------|----------|-----------|-----------------|
    | node0 | node0        | 1        | leader    | PASS            |
    | node1 | node0        | 1        | follower  | WAIT_TO_REVOTE  |
    | node2 | node0        | 1        | follower  | WAIT_TO_REVOTE  |
    | node3 | node0        | 1        | follower  | WAIT_TO_REVOTE  |
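  2. Then node0 goes down:
    | node  | currVotedFor | currTerm | role      | lastParseResult |
    |-------|--------------|----------|-----------|-----------------|
    | node1 | node0        | 1        | follower  | WAIT_TO_REVOTE  |
    | node2 | node0        | 1        | follower  | WAIT_TO_REVOTE  |
    | node3 | node0        | 1        | follower  | WAIT_TO_REVOTE  |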
  3. node1 times out first and becomes a candidate. It issues a vote request with term 1 (it has not increased its term yet), but the request is refused. node2 and node3 still believe there is a leader, so they return REJECT_ALREADY_HAS_LEADER and do nothing else. On receiving these responses, node1 resets its timer and stays in the WAIT_TO_REVOTE state. The same thing then happens to node2. After node2 has received its REJECT_ALREADY_HAS_LEADER responses, the state is:
    | node  | currVotedFor | currTerm | role      | lastParseResult |
    |-------|--------------|----------|-----------|-----------------|
    | node1 | node1        | 1        | candidate | WAIT_TO_REVOTE  |
    | node2 | node2        | 1        | candidate | WAIT_TO_REVOTE  |
    | node3 | node0        | 1        | follower  | WAIT_TO_REVOTE  |
  4. As a result, only node3, which times out last, requests votes without receiving any REJECT_ALREADY_HAS_LEADER response, and it moves on to the WAIT_TO_VOTE_NEXT state.
    Once it has received two REJECT_ALREADY_VOTED responses (because node1 and node2 have each voted for themselves), it starts a timer that expires at lastVotedTime plus a random value between 300 ms and 1000 ms (this range can be changed). Because the value is random, node3 may end up with the shortest interval. When its timer expires first, it increases its term and forces node1 and node2 to increase their terms too (by setting their needIncreaseTermImmediately). node3 thus becomes the final leader, which reverses the priority order.
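
This rough Python simulation shows why the reversal is not a rare corner case. It is a simplification, not DLedger's timer logic. It assumes that after node0 fails, node1, node2 and node3 each wait an independent interval drawn uniformly from the 300 to 1000 ms range before their next term-increasing attempt, and that whichever node fires first wins. Under that assumption each node is equally likely to fire first, so node3 wins roughly one time in three.

```python
import random

# Simplifying assumption (not DLedger's actual timers): after node0 fails,
# each surviving node waits an independent interval drawn uniformly from
# [300, 1000] ms, and the first node whose timer expires wins the election.
LOW_MS, HIGH_MS = 300, 1000
SURVIVORS = ["node1", "node2", "node3"]  # highest to lowest priority


def first_to_fire(rng: random.Random) -> str:
    """Return the node whose random revote timer expires first."""
    timers = {node: rng.uniform(LOW_MS, HIGH_MS) for node in SURVIVORS}
    return min(timers, key=timers.get)


def win_rates(trials: int = 10_000, seed: int = 42) -> dict:
    """Estimate how often each node starts the winning election."""
    rng = random.Random(seed)
    wins = {node: 0 for node in SURVIVORS}
    for _ in range(trials):
        wins[first_to_fire(rng)] += 1
    return {node: count / trials for node, count in wins.items()}


if __name__ == "__main__":
    for node, rate in win_rates().items():
        print(f"{node}: {rate:.1%}")
```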

This can be mitigated by giving node3 a much larger timeout interval, but we might still look for something better.
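
The mitigation can be sketched by shifting each node's random window by a per-rank offset, under the same simplified model as above. `PRIORITY_RANK` and `offset_ms` are hypothetical names chosen for illustration. They are not DLedger settings. The base range is the 300 to 1000 ms one from step 4.

```python
import random

# Hypothetical mitigation: lower-priority nodes get a later random window.
# PRIORITY_RANK and offset_ms are illustrative names, not DLedger config.
LOW_MS, HIGH_MS = 300, 1000
PRIORITY_RANK = {"node1": 0, "node2": 1, "node3": 2}  # 0 = highest priority


def revote_delay(node: str, offset_ms: float, rng: random.Random) -> float:
    """Random delay in [LOW_MS, HIGH_MS] shifted by rank * offset_ms."""
    shift = PRIORITY_RANK[node] * offset_ms
    return shift + rng.uniform(LOW_MS, HIGH_MS)


def lowest_priority_win_rate(offset_ms: float, trials: int = 10_000,
                             seed: int = 7) -> float:
    """Fraction of trials in which node3's timer expires first."""
    rng = random.Random(seed)
    node3_wins = 0
    for _ in range(trials):
        delays = {n: revote_delay(n, offset_ms, rng) for n in PRIORITY_RANK}
        if min(delays, key=delays.get) == "node3":
            node3_wins += 1
    return node3_wins / trials


if __name__ == "__main__":
    for offset in (0, 200, 700):
        rate = lowest_priority_win_rate(offset)
        print(f"offset={offset} ms -> node3 fires first in {rate:.1%} of trials")
```

With `offset_ms=700` the windows no longer overlap, so node3 never fires first while node1 is alive. The cost is slower failover whenever higher-priority nodes are down, because every lower-ranked node pays its offset before its first attempt.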

Some thoughts: in Raft, a candidate usually increases its term first and then requests votes. Here, node1 requests votes without increasing its term, and I suppose this implements the pre-vote algorithm mentioned in the paper.
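
This minimal Python sketch contrasts the two orderings. It is not DLedger code. `Node`, `classic_candidacy` and `prevote_candidacy` are hypothetical names, and each peer's answer is reduced to a boolean "would grant". The point is that a failed classic attempt has already bumped the candidate's term, while a failed pre-vote leaves it unchanged.

```python
from dataclasses import dataclass

# Hypothetical model: peer answers are booleans, and the cluster size
# includes unreachable nodes (e.g. a 4-node cluster with node0 down).


@dataclass
class Node:
    name: str
    term: int


def quorum(cluster_size: int) -> int:
    """Smallest majority of the configured cluster."""
    return cluster_size // 2 + 1


def classic_candidacy(node: Node, peer_grants: list[bool], cluster_size: int) -> bool:
    """Classic Raft: increase the term first, then ask for votes."""
    node.term += 1  # the term grows even if the election then fails
    votes = 1 + sum(peer_grants)  # the candidate votes for itself
    return votes >= quorum(cluster_size)


def prevote_candidacy(node: Node, peer_grants: list[bool], cluster_size: int) -> bool:
    """Pre-vote: ask first, and only increase the term if a majority agrees."""
    votes = 1 + sum(peer_grants)
    if votes >= quorum(cluster_size):
        node.term += 1  # now the real election can run at the new term
        return True
    return False  # term unchanged, so the failed attempt disturbs no one


if __name__ == "__main__":
    a, b = Node("node1", 1), Node("node1", 1)
    print(classic_candidacy(a, [False, False], 4), a.term)  # False 2
    print(prevote_candidacy(b, [False, False], 4), b.term)  # False 1
```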

IMHO, pre-vote lets a potential candidate (like node1) check whether it could win the election (that is, whether its log is at least as up-to-date as the logs of a majority of nodes) before increasing its term. In the current implementation, node1 first requests votes with term 1, and the other followers with the same term return REJECT_ALREADY_HAS_LEADER. In this case, the followers never check whether the requester's log is at least as up-to-date as their own. If we could subdivide the REJECT_ALREADY_HAS_LEADER response into something like

  • REJECT_ALREADY_HAS_LEADER_PREVOTE_ACCEPT
  • REJECT_ALREADY_HAS_LEADER_PREVOTE_REJECT

then, if node1 receives enough PREVOTE_ACCEPT responses, it can increase its term and start a revote immediately. Otherwise, it simply remains in the WAIT_TO_REVOTE state.
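
The proposal can be sketched in Python as a follower side and a candidate side. This is a hypothetical sketch: the two response codes are the ones proposed above and do not exist in DLedger, and `LogPosition`, `follower_reply`, `candidate_next_step` and the returned `"REVOTE_IMMEDIATELY"` string are illustrative names. The follower still rejects because it believes in a leader. It also applies the standard Raft rule (compare last-entry terms, then log length) and attaches its pre-vote verdict.

```python
from enum import Enum
from typing import NamedTuple

# Hypothetical sketch of the proposed responses; none of these names
# exist in DLedger.


class LogPosition(NamedTuple):
    last_term: int   # term of the last log entry
    last_index: int  # index of the last log entry


class VoteResult(Enum):
    REJECT_ALREADY_HAS_LEADER_PREVOTE_ACCEPT = "PREVOTE_ACCEPT"
    REJECT_ALREADY_HAS_LEADER_PREVOTE_REJECT = "PREVOTE_REJECT"


def is_at_least_as_up_to_date(candidate: LogPosition, follower: LogPosition) -> bool:
    """Raft rule: the higher last-entry term wins; on a tie, the longer log wins."""
    if candidate.last_term != follower.last_term:
        return candidate.last_term > follower.last_term
    return candidate.last_index >= follower.last_index


def follower_reply(candidate: LogPosition, follower: LogPosition) -> VoteResult:
    """Follower side: reject (a leader is still known) but attach a pre-vote verdict."""
    if is_at_least_as_up_to_date(candidate, follower):
        return VoteResult.REJECT_ALREADY_HAS_LEADER_PREVOTE_ACCEPT
    return VoteResult.REJECT_ALREADY_HAS_LEADER_PREVOTE_REJECT


def candidate_next_step(replies: list[VoteResult], cluster_size: int) -> str:
    """Candidate side: revote at once if its own vote plus PREVOTE_ACCEPTs reach a quorum."""
    quorum = cluster_size // 2 + 1
    accepts = 1 + sum(r is VoteResult.REJECT_ALREADY_HAS_LEADER_PREVOTE_ACCEPT for r in replies)
    return "REVOTE_IMMEDIATELY" if accepts >= quorum else "WAIT_TO_REVOTE"


if __name__ == "__main__":
    node1 = LogPosition(last_term=1, last_index=10)
    replies = [follower_reply(node1, LogPosition(1, 10)),  # node2
               follower_reply(node1, LogPosition(1, 9))]   # node3
    print(candidate_next_step(replies, cluster_size=4))     # REVOTE_IMMEDIATELY
```

With node0 down, node1 needs pre-vote accepts from both node2 and node3 (1 + 2 = 3, the quorum for four nodes). Once it has them, it could raise its term and win before node3's timer ever fires, which keeps the priority order.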

P.S. I could not actually find where pre-vote is implemented in DLedger. Is there any design documentation for it? Thanks in advance if anyone can give me some hints on how pre-vote is implemented in DLedger.

Article link: https://www.haomeiwen.com/subject/veryohtx.html