Comments (6)

freeekanayaka commented on July 18, 2024

Restarting a bootstrapping dqlite node fails if it was previously part of a cluster.
Tested on 684b536 with https://pastebin.ubuntu.com/p/N5trdB2jNQ/ applied, on both Ubuntu 18.04 and 20.04, with the go snap at 1.13.11 and 1.14.3.

Steps to reproduce
1/ start two nodes in a cluster
./dqlite-demo --api 127.0.0.1:5001 --db 127.0.0.1:4001
./dqlite-demo --api 127.0.0.1:5002 --db 127.0.0.1:4002 --join 127.0.0.1:4001

2/ stop both nodes
3/ start the bootstrap node again; it fails to connect and repeatedly prints the following error message:

Dqlite: WARN: no known leader address=127.0.0.1:4001 attempt=1

In general, this is expected. Since you have only 1 node up out of 2, you don't have a majority, so you need to start the other node too. But please read below as well.

Side note: after 30 seconds, the same log line starts appearing twice per second, with two different attempt counters, e.g.:

2020/05/25 22:14:16 Dqlite: WARN: no known leader address=127.0.0.1:4001 attempt=98
2020/05/25 22:14:16 Dqlite: WARN: no known leader address=127.0.0.1:4001 attempt=128
2020/05/25 22:14:17 Dqlite: WARN: no known leader address=127.0.0.1:4001 attempt=99
2020/05/25 22:14:17 Dqlite: WARN: no known leader address=127.0.0.1:4001 attempt=129

This is because every 30 seconds there is an attempt to refresh the local cluster.yaml cache of cluster members. To do that, a node first tries to find the leader (which causes the logs you see), then asks the leader for the current cluster members and updates its cluster.yaml (this last step doesn't happen here, since no leader is found).
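
To make this a bit more concrete, the refresh cycle roughly amounts to the following (just a sketch, assuming the client package's NewYamlNodeStore/FindLeader/Cluster helpers; the data path is a placeholder):

// Rough sketch of the cluster.yaml refresh cycle described above.
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/canonical/go-dqlite/client"
)

func main() {
	// The on-disk cache of known cluster members.
	store, err := client.NewYamlNodeStore("/path/to/data/cluster.yaml")
	if err != nil {
		log.Fatal(err)
	}

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// First find the leader: this is the step that produces the
	// "no known leader" warnings when there is no quorum.
	leader, err := client.FindLeader(ctx, store)
	if err != nil {
		log.Fatal(err)
	}
	defer leader.Close()

	// Then ask the leader for the current members and write them back
	// to cluster.yaml (this never happens while no leader is found).
	servers, err := leader.Cluster(ctx)
	if err != nil {
		log.Fatal(err)
	}
	if err := store.Set(ctx, servers); err != nil {
		log.Fatal(err)
	}
	for _, s := range servers {
		fmt.Println(s.ID, s.Address)
	}
}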

Starting the slave node produces the same log, as well as a failed to establish network connection: dial tcp 127.0.0.1:4001 error (even if we remove --join 127.0.0.1:4001 when starting it).

First note that the "slave node" is not intrinsically a slave; it can become leader too after a restart.

Secondly, do you mean that you see this when starting the second node alone, or when starting it together with the first node?

If you are starting the second node alone, the same as above applies: you don't have a quorum.

If you are starting the second node together with the first node, eventually an election should be run, both nodes should notice that there's a leader, possibly refresh their caches and finally get a working db connection.
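
In code, with both processes running again, something along these lines should eventually hand back a usable connection (just a sketch, assuming the app package's Ready/Open helpers; the data directory and address are placeholders):

// Rough sketch: restarting the first node and waiting for a usable connection.
package main

import (
	"context"
	"log"
	"time"

	"github.com/canonical/go-dqlite/app"
)

func main() {
	node, err := app.New("/path/to/data",
		app.WithAddress("127.0.0.1:4001"),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer node.Close()

	// Wait for the node to be ready to serve requests. The timeout guards
	// against the case where no quorum ever becomes available (e.g. the
	// other node is never started).
	ctx, cancel := context.WithTimeout(context.Background(), time.Minute)
	defer cancel()
	if err := node.Ready(ctx); err != nil {
		log.Fatal(err)
	}

	// Once a leader has been elected, the database can be opened normally.
	db, err := node.Open(context.Background(), "demo")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
}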

The only little annoyance that I see here is that if you stop the first node right after the initial joining of the second node, before 30 seconds have elapsed, then when the first node is restarted it will not yet know about the second node, and it can take about 30 seconds to get a working db connection (assuming that the second node is up). That's only for the very first restart; after that it will have learned about the new node, and subsequent restarts should be quick.


aguadoenzo commented on July 18, 2024

This is because every 30 seconds there is an attempt to refresh the local cluster.yaml cache of cluster members. To do that, a node first tries to find the leader (which causes the logs you see), then asks the leader for the current cluster members and updates its cluster.yaml (this last step doesn't happen here, since no leader is found).

Shouldn't the node assume the role of leader after 30 seconds then? If it keeps expecting information about the leader from other nodes without taking leadership at some point, does that mean it's not possible to downscale a cluster from N nodes to 1?

Secondly, do you mean that you are seeing that when starting the second node alone or starting it together with the first node?

Both scenarios yield the results you describe.


freeekanayaka commented on July 18, 2024

This is because every 30 seconds there is an attempt to refresh the local cluster.yaml cache of cluster members. To do that, a node first tries to find the leader (which causes the logs you see), then asks the leader for the current cluster members and updates its cluster.yaml (this last step doesn't happen here, since no leader is found).

Shouldn't the node assume the role of leader after 30 seconds then?

Not really, because the fact that the other node is not responding doesn't mean it's down. There could be a temporary network partition. If you have 2 nodes, both alive but unable to reach each other, and each node independently decides to become leader, you'll have split brain and your database will fork.

In order to have a leader, you need a majority of nodes to acknowledge the leader (with 2 nodes, that means both of them); see the Raft paper for details.

If it keeps expecting information about the leader from other nodes without taking leadership at some point, does that mean it's not possible to downscale a cluster from N nodes to 1?

If you want to downscale a cluster from N nodes to 1, you need to modify the cluster membership and remove nodes one at a time using the client.Remove() API. Each time you modify the membership you'll commit an entry in the Raft log, in other words you'll have consensus from a quorum that a node was removed and is no longer part of the cluster.
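
For example, going from 2 nodes to 1 could look roughly like this (just a sketch; besides Remove(), the FindLeader/Cluster/NewInmemNodeStore helpers of the client package are assumed, and the addresses are the ones from the reproduction steps):

// Rough sketch of shrinking the cluster by removing one node via the leader.
package main

import (
	"context"
	"log"
	"time"

	"github.com/canonical/go-dqlite/client"
)

func main() {
	// Seed a node store with the known addresses.
	store := client.NewInmemNodeStore()
	if err := store.Set(context.Background(), []client.NodeInfo{
		{Address: "127.0.0.1:4001"},
		{Address: "127.0.0.1:4002"},
	}); err != nil {
		log.Fatal(err)
	}

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	// Membership changes must go through the leader, so they get committed
	// to the Raft log like any other entry.
	leader, err := client.FindLeader(ctx, store)
	if err != nil {
		log.Fatal(err)
	}
	defer leader.Close()

	// Look up the ID of the node we want to drop...
	servers, err := leader.Cluster(ctx)
	if err != nil {
		log.Fatal(err)
	}
	for _, s := range servers {
		if s.Address == "127.0.0.1:4002" {
			// ...and remove it for good. After this, the remaining node
			// alone is a majority (1 out of 1).
			if err := leader.Remove(ctx, s.ID); err != nil {
				log.Fatal(err)
			}
		}
	}
}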

Simply shutting down N-1 nodes and leaving only 1 node running just means that a majority of your nodes is down and the cluster can't elect a leader or make progress.

So shutting down is not the same as removing.

Secondly, do you mean that you see this when starting the second node alone, or when starting it together with the first node?

Both scenarios yield the results you describe.

I'm not sure I understand. So do you think there's a bug or not? If so, which one?


freeekanayaka commented on July 18, 2024

Note that automatic membership management (such as transferring voting rights to other nodes when shutting down) is currently not implemented in the app package, but I plan to add it.

At that point you'll have options to get behavior similar to LXD's (https://lxd.readthedocs.io/en/latest/clustering/#voting-and-stand-by-members).


aguadoenzo commented on July 18, 2024

If you want to downscale a cluster from N nodes to 1, you need to modify the cluster membership and remove nodes one at a time using the client.Remove() API. Each time you modify the membership you'll commit an entry in the Raft log, in other words you'll have consensus from a quorum that a node was removed and is no longer part of the cluster.

Thanks, that's the bit I missed

I'm not sure I understand. So do you think there's a bug or not? If so, which one?

Considering all of the above, it's working as intended. What's misleading is the lack of a Remove() method in the app package, but automatic membership management would remove the need for it, if I understand correctly. Closing the issue, and thanks for the clarification.


freeekanayaka commented on July 18, 2024

If you want to downscale a cluster from N nodes to 1, you need to modify the cluster membership and remove nodes one at a time using the client.Remove() API. Each time you modify the membership you'll commit an entry in the Raft log, in other words you'll have consensus from a quorum that a node was removed and is no longer part of the cluster.

Thanks, that's the bit I missed

I'm not sure I understand. So do you think there's a bug or not? If so, which one?

Considering all of the above, it's working as intended. What's misleading is the lack of a Remove() method in the app package, but automatic membership management would remove the need for it, if I understand correctly. Closing the issue, and thanks for the clarification.

Yeah, the app package is pretty much a work in progress and documentation is lacking.

You'll probably still need to use Remove() if you want to permanently remove a node. What automatic membership management is going to provide is something slightly different: at the moment, if you add N nodes, all of them will be voters; with the new features you'll be able to say "I just want 3 voters and 2 stand-bys, and the rest of the nodes are spare ones". What that means is that you'll replicate data only to 5 nodes, 3 of which will be voters and 2 just hot stand-bys. If a voter is shut down gracefully, its role will be transferred to a stand-by. If a stand-by is shut down gracefully, its role will be transferred to a spare. This way you'll have less network traffic and disk writes while maintaining high availability.
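
To give an idea, the kind of configuration I have in mind would look something along these lines (a purely hypothetical sketch at this point; none of these options exist yet):

// Hypothetical sketch only: options like WithVoters/WithStandBys are not
// implemented yet, they just illustrate the planned roles described above.
package main

import (
	"log"

	"github.com/canonical/go-dqlite/app"
)

func main() {
	node, err := app.New("/path/to/data",
		app.WithAddress("10.0.0.1:9001"),
		app.WithCluster([]string{"10.0.0.2:9001", "10.0.0.3:9001"}),
		app.WithVoters(3),   // at most 3 nodes hold a vote at any time
		app.WithStandBys(2), // plus 2 hot stand-bys that also replicate data
		// every additional node would be a spare: no data until promoted
	)
	if err != nil {
		log.Fatal(err)
	}
	defer node.Close()
}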

