The Reverse Priority Case

Author: affe | Published: 2020-05-18 12:00


I have been reading the raft implementation recently and have learned a lot from it. I ran into a scenario that I have most likely misunderstood, and I would appreciate it if someone could check my reasoning.

Suppose an ordinary 4-node cluster with priorities node 0 > node 1 > node 2 > node 3.

Initial state:
| node | currVotedFor | currTerm | role | lastParseResult |
|-------|--------------|----------|-----------| -- |
| node0 | node0 | 1 | leader | PASS |
| node1 | node0 | 1 | follower | WAIT_TO_REVOTE |
| node2 | node0 | 1 | follower | WAIT_TO_REVOTE |
| node3 | node0 | 1 | follower | WAIT_TO_REVOTE |

Then node 0 goes down:

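| node | currVotedFor | currTerm | role | lastParseResult |
|-------|--------------|----------|-----------| -- |
| node1 | node0 | 1 | follower | WAIT_TO_REVOTE |
| node2 | node0 | 1 | follower | WAIT_TO_REVOTE |
| node3 | node0 | 1 | follower | WAIT_TO_REVOTE |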

Node 1 times out first and becomes a candidate. It issues a vote request with term 1 (it has not increased its term yet), but the request is refused: nodes 2 and 3 still believe there is a leader, so they return REJECT_ALREADY_HAS_LEADER and do nothing else. On receiving these responses, node 1 resets its timer and stays in the WAIT_TO_REVOTE state. The same thing happens to node 2. After node 2 has received its REJECT_ALREADY_HAS_LEADER responses, the state is:

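| node | currVotedFor | currTerm | role | lastParseResult |
|-------|--------------|----------|-----------| -- |
| node1 | node1 | 1 | candidate | WAIT_TO_REVOTE |
| node2 | node2 | 1 | candidate | WAIT_TO_REVOTE |
| node3 | node0 | 1 | follower | WAIT_TO_REVOTE |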
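
To make the two rejection paths concrete, here is a minimal Python sketch of same-term vote handling as I understand it from the behaviour described above. It is not the actual implementation: the `Node` class, the `leader_id` field, and the `ACCEPT` and `OTHER_TERM` labels are assumptions made for illustration. Only REJECT_ALREADY_HAS_LEADER and REJECT_ALREADY_VOTED come from the scenario itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    curr_term: int
    curr_voted_for: Optional[str]
    leader_id: Optional[str]  # assumed field: who this node believes is leader


def handle_vote(node: Node, candidate: str, term: int) -> str:
    """Same-term vote handling as the post describes it (illustration only)."""
    if term != node.curr_term:
        return "OTHER_TERM"  # hypothetical label; other terms are out of scope here
    if node.curr_voted_for is None or node.curr_voted_for == candidate:
        node.curr_voted_for = candidate
        return "ACCEPT"  # hypothetical label for a granted vote
    if node.leader_id is not None:
        # The node voted for someone else and still believes a leader exists.
        return "REJECT_ALREADY_HAS_LEADER"
    return "REJECT_ALREADY_VOTED"


# node 0 is down, but nodes 2 and 3 still believe it is the leader.
node2 = Node("node2", 1, "node0", "node0")
node3 = Node("node3", 1, "node0", "node0")
first_round = [handle_vote(n, "node1", 1) for n in (node2, node3)]

# Later, nodes 1 and 2 are candidates that have voted for themselves.
node1 = Node("node1", 1, "node1", None)
node2 = Node("node2", 1, "node2", None)
second_round = [handle_vote(n, "node3", 1) for n in (node1, node2)]
```

In this sketch `first_round` holds the two responses node 1 gets while the followers still believe in node 0, and `second_round` holds what node 3 gets once nodes 1 and 2 have voted for themselves.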

This way, only node 3, which times out last, requests votes without getting any REJECT_ALREADY_HAS_LEADER response, and so it moves on to the WAIT_TO_VOTE_NEXT state.
From the moment it receives two REJECT_ALREADY_VOTED responses (because nodes 1 and 2 have both voted for themselves), it starts a timer: lastVotedTime plus a random value between 300 ms and 1000 ms (this range can be changed). Because the value is random, node 3 may end up with the smallest timeout interval. If its timer expires first, node 3 increases its term and forces nodes 1 and 2 to increase their terms too (by setting their needIncreaseTermImmediately). Node 3 then becomes the final leader, which reverses the priority order.
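
A rough way to see how often this could happen is to simulate the random timer. The sketch below is a simplification, not the real timer code: it assumes all three surviving nodes start from the same lastVotedTime and draw independently and uniformly from the 300 ms to 1000 ms range. The function names are hypothetical.

```python
import random

MIN_VOTE_INTERVAL_MS = 300   # lower bound quoted in the post
MAX_VOTE_INTERVAL_MS = 1000  # upper bound quoted in the post


def next_vote_time(last_voted_time_ms: int, rng: random.Random) -> int:
    """lastVotedTime plus a random value in [300, 1000] ms."""
    return last_voted_time_ms + rng.randint(MIN_VOTE_INTERVAL_MS, MAX_VOTE_INTERVAL_MS)


def fraction_lowest_priority_first(trials: int, seed: int = 0) -> float:
    """Estimate how often node3's timer is the first to expire."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        deadlines = {
            name: next_vote_time(0, rng)
            for name in ("node1", "node2", "node3")
        }
        # On a tie, min() keeps the earlier key, so ties count against node3.
        if min(deadlines, key=deadlines.get) == "node3":
            wins += 1
    return wins / trials


estimate = fraction_lowest_priority_first(10_000)
```

Under these assumptions the three nodes are symmetric, so the estimate should land near one third, which suggests the reversal is not a rare corner case.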

This can be mitigated by giving node 3 a much larger timeout interval, but we might still look for something better.
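
One possible shape for that mitigation is to shift the random range by priority rank, so that lower-priority nodes always draw longer intervals. This is only a sketch of the idea: `vote_interval_range`, `draw_interval`, and the notion of a numeric rank are hypothetical and not taken from any existing configuration.

```python
import random
from typing import Tuple

BASE_MIN_MS = 300   # range quoted in the post
BASE_MAX_MS = 1000


def vote_interval_range(rank: int) -> Tuple[int, int]:
    """Return the interval range for a node; rank 0 is the highest priority.

    Each rank is shifted past the previous one so the ranges never overlap.
    """
    shift = rank * (BASE_MAX_MS - BASE_MIN_MS + 1)
    return BASE_MIN_MS + shift, BASE_MAX_MS + shift


def draw_interval(rank: int, rng: random.Random) -> int:
    """Draw a random vote interval (ms) from the range for this rank."""
    low, high = vote_interval_range(rank)
    return rng.randint(low, high)


rng = random.Random(0)
node1_interval = draw_interval(0, rng)  # highest surviving priority
node3_interval = draw_interval(2, rng)  # lowest priority
```

With disjoint ranges node 3 can never draw a shorter interval than node 1. That only preserves the priority order if the timers start at roughly the same moment, and it lengthens failover when only low-priority nodes remain, which is part of why something better would be welcome.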

Some thoughts: in raft a candidate usually increases its term first and then requests votes. Here node 1 requests votes without increasing its term, and I suppose this is an implementation of the pre-vote algorithm mentioned in the paper.

IMHO, pre-vote lets a potential candidate (like node 1) check whether it could win the election (that is, whether it is more up-to-date than a majority of nodes) before increasing its term. In our current implementation, node 1 first requests votes with term 1, and the other followers in the same term return REJECT_ALREADY_HAS_LEADER. In this case the followers do not check whether the incoming vote request is more up-to-date than they are. If we could somehow subdivide the REJECT_ALREADY_HAS_LEADER response into something like

REJECT_ALREADY_HAS_LEADER_PREVOTE_ACCEPT
REJECT_ALREADY_HAS_LEADER_PREVOTE_REJECT

then once node 1 receives enough prevote_accept responses, it can increase its term and start a revote immediately. Otherwise it just remains in the WAIT_TO_REVOTE state.
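
The proposal could be sketched roughly as follows. This is an illustration of the idea, not existing code: the `LogPosition` type and the function names are hypothetical, the up-to-date check is the usual raft comparison of last log term and then last log index, and I am assuming "enough" means a majority of the cluster with the candidate counting itself.

```python
from dataclasses import dataclass
from typing import List

PREVOTE_ACCEPT = "REJECT_ALREADY_HAS_LEADER_PREVOTE_ACCEPT"
PREVOTE_REJECT = "REJECT_ALREADY_HAS_LEADER_PREVOTE_REJECT"


@dataclass
class LogPosition:
    last_term: int   # term of the last log entry
    last_index: int  # index of the last log entry


def is_up_to_date(candidate: LogPosition, follower: LogPosition) -> bool:
    """Raft's rule: the later last term wins; on equal terms the longer log wins."""
    return (candidate.last_term, candidate.last_index) >= (
        follower.last_term,
        follower.last_index,
    )


def prevote_response(candidate: LogPosition, follower: LogPosition) -> str:
    """What a follower that still believes in a leader would answer."""
    if is_up_to_date(candidate, follower):
        return PREVOTE_ACCEPT
    return PREVOTE_REJECT


def should_start_revote(responses: List[str], cluster_size: int) -> bool:
    """Assumption: the candidate counts itself and needs a strict majority."""
    accepts = 1 + sum(1 for r in responses if r == PREVOTE_ACCEPT)
    return accepts > cluster_size // 2


# node 1 asks nodes 2 and 3 in the 4-node cluster; all logs are equal here.
node1_log = LogPosition(last_term=1, last_index=10)
responses = [
    prevote_response(node1_log, LogPosition(1, 10)),
    prevote_response(node1_log, LogPosition(1, 10)),
]
start_now = should_start_revote(responses, cluster_size=4)
```

In this example node 1 collects two prevote accepts plus its own vote, which is three of four, so it could increase its term and start the revote without waiting for node 3's timer.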

P.S. I could not actually find how pre-vote is implemented in DLedger. Is there any design documentation for it? I would be grateful for any hints on the pre-vote implementation in DLedger.