Comments (11)
I like this idea! I've seen issues in the past, especially with cluster mode, where a node ends up being marked as a primary with no data. It seems like it should likely be part of the handshake that the replica has data and expects this primary to still have data unless it sees the same replication ID (indicating the primary did an intentional flushall of its own data).
We will need to document the behavior in such a way that an admin knows that the cluster might not self-recover and may require operator intervention.
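The handshake idea above could be sketched as a replica-side decision, roughly like this (a minimal sketch with invented names; these are not Valkey's actual handshake fields):

```python
# Hypothetical sketch of the replica-side check suggested above: before
# accepting a full resync, the replica compares the primary's advertised
# replication ID with the one it last saw. All names are illustrative.

def should_accept_full_resync(last_known_replid: str,
                              primary_replid: str,
                              primary_dataset_empty: bool) -> bool:
    """Refuse a full resync from an empty primary unless the replication
    ID matches, which would indicate an intentional FLUSHALL rather than
    a primary that restarted without its data."""
    if not primary_dataset_empty:
        return True  # primary has data; accept the resync as usual
    # Empty primary: only trust it if it proves continuity of history.
    return primary_replid == last_known_replid
```

A replica that returns False here would refuse to flush itself and flag the situation for the operator instead of propagating the data loss.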
from valkey.
> I'd think when replication is reconfigured to change which nodes are primaries and which are replicas that would flush out the old primary state data that could wedge up the cluster, but perhaps there's detail and nuance here you know of that I don't.
It's really just that an operator could have caused the flush. I think we might be able to detect this case as well. In either case, I'm going to throw this into the roadmap.
> This is an unsolvable problem since we prioritize availability over consistency,
If we apply the argument of "prioritizing availability over consistency" to this situation, we shouldn't adopt this proposal but instead let the data flush happen. On the contrary, what is proposed here muddies that stance even further: users have some availability, but not at the level they would expect (HA is lost), and they have no easy way to recover the data in the replica, since it runs on a different replid. This proposal, if implemented, will always require admin intervention to fix the now-stranded replicas.
I understand that the project was built with a bias towards availability, but I think it has gone too far and is applied inconsistently. To generalize a bit (since we are discussing replication here), this practice is a big part of why the cluster is so hard to manage and why we are still fixing fundamental bugs in it.
> We will need to document the behavior in such a way that an admin knows that the cluster might not self-recover and may require operator intervention.
For the real-world cases I've been testing where the nodes come up empty I'd expect the new node to be promoted to the primary (either via Sentinel or via clustering), and the old node that came up empty should be reconfigured to be a replica and begin restoring the dataset from another node. In my testing this happens as long as I prevent the original replica(s) from clearing their data out via bad replication before the failover happens. I'd think when replication is reconfigured to change which nodes are primaries and which are replicas that would flush out the old primary state data that could wedge up the cluster, but perhaps there's detail and nuance here you know of that I don't.
This option will always break replication and require admin intervention, and until then the setup will be running in non-HA mode.
The core problem is that the primary didn't stay "down" long enough for the election to trigger reliably. I think a correct solution should explore the following instead:
1. A primary node needs to keep track of the number of replicas it has across restarts.
2. A primary node needs to be able to get back to business immediately after a (re)start if it has no replicas, but otherwise needs to wait long enough to ensure an election happens.
In the cluster case, 1) can be achieved via nodes.conf, and cluster_node_timeout can be used for 2). My knowledge about Sentinel is limited, but I guess Sentinel would need to be responsible for both 1) and 2).
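The two rules above could be sketched roughly as a startup gate (a minimal sketch; the function and parameter names are invented for illustration, and Valkey's nodes.conf does not currently persist a replica count for this purpose):

```python
# Illustrative sketch of the proposal: an empty primary that previously had
# replicas holds off on serving writes long enough for a failover to happen.
# cluster-node-timeout is used as a stand-in bound on failure detection time.

def startup_gate_seconds(replicas_before_restart: int,
                         dataset_is_empty: bool,
                         cluster_node_timeout_ms: int) -> float:
    """Return how long (in seconds) a restarted primary should refuse
    writes, giving its replicas a chance to win an election."""
    if replicas_before_restart == 0 or not dataset_is_empty:
        # Rule 2, first half: no replicas to protect, or data intact,
        # so get back to business immediately.
        return 0.0
    # Rule 2, second half: empty dataset plus known replicas means we
    # should wait at least one failure-detection window.
    return cluster_node_timeout_ms / 1000.0
```

A real implementation would also need to decide what happens after the wait expires without a failover; this sketch only captures the gating decision.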
> The core problem is that the primary didn't stay "down" long enough to trigger the election reliably.
This is an unsolvable problem, since we prioritize availability over consistency. Your proposal is only a partial solution to a subset of the problems, and it introduces new race conditions and more failure modes for operators to worry about. It's always possible that the node with data becomes apparently unavailable, in which case we prioritize electing a node without data. This proposal shifts somewhat away from availability towards consistency, in CAP terms.
> [Deleted]
I just realized the case I was mentioning only applies to how AWS does it, so that's just a problem that we have. I think Ping is right that you just need better protection against nodes coming back empty.
We prioritize availability over consistency to some extent, and I think that's fine as long as the impact is limited. It's fine that we use asynchronous replication and that we can lose a few writes when a primary crashes. WAIT-ing for all replicas can prevent the write from being lost in that case.
It's not OK though to lose the whole data set. I want the cluster to prevent this from happening without manual intervention. Ping's ideas make sense to me.
If I understand this correctly, the primary needs to lose its data while keeping its nodes.conf, so it still has its place in the cluster. If so, I can't see why we couldn't store some more info in nodes.conf to prevent this from happening.
- Using the existing information in nodes.conf, the primary can see how many replicas it had before restart, so it could wait for a failover (Ping's suggestion). Should it do this only if it starts with no keys?
- We could store the replication offset in nodes.conf to let the primary detect that it lost its data and trigger (or let it wait for) a failover.
Another possibility: If the replicas can detect that the primary came back empty-handed, they could escalate/gossip this to the cluster in some way to mark the node as FAIL and trigger a failover.
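The offset-in-nodes.conf idea could be sketched as a simple restart-time check (hypothetical names; nodes.conf does not store these fields today):

```python
# Sketch of the second bullet above: persist the replication offset in
# nodes.conf, and on restart compare it with the offset implied by the
# dataset that was actually loaded from disk.

def lost_data_after_restart(persisted_offset: int,
                            loaded_offset: int) -> bool:
    """An empty or stale dataset shows up as a loaded offset behind the
    offset that was recorded before the restart."""
    return loaded_offset < persisted_offset
```

A primary that detects this condition could refuse to resume as primary and instead wait for (or trigger) a failover to a replica that still has the data, as suggested above.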
I think in standalone mode we already solve this problem through the master-reboot-down-after-period mymaster parameter in sentinel.conf.
The steps:
- Node 1 is the primary
- Node 2 is replicating node 1
- Node 1 drops
- Node 1 comes back without its DB, in an empty state
- Node 2 reconnects and performs a full resync
- Node 2 is now empty, having propagated the data loss
In real production, the gap between steps 3 and 4 cannot be too long, otherwise users will not accept it.
In fact, the gap between steps 3 and 4 is the reboot time, so tuning master-reboot-down-after-period should solve it: the rebooted primary is treated as down long enough for a replica to be failed over first.
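For reference, the directive under discussion is set in sentinel.conf, roughly like this (the master name and timeout value here are examples):

```
# sentinel.conf (example values; "mymaster" is the monitored primary's name)
#
# Treat the primary as subjectively down for this many milliseconds after
# it reports a reboot, giving replicas a chance to be failed over before
# the possibly-empty primary rejoins. A value of 0 disables the feature.
sentinel master-reboot-down-after-period mymaster 60000
```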
@hwware I don't understand how master-reboot-down-after-period can solve this in a Sentinel setup. If the primary has lost its data, it comes up fast. If it still has its data, it takes more time to load its data and become available. It's the first case we want to prevent here.
> @hwware I don't understand how master-reboot-down-after-period can solve this in a Sentinel setup. If the primary has lost its data, it comes up fast. If it still has its data, it takes more time to load its data and become available. It's the first case we want to prevent here.
In step 4, when node 1 comes back without its DB, the primary carries a "reboot" signal. If the primary has no "reboot" signal, it means one of two cases:
- The primary just lost connectivity, or could not respond to other nodes because it was busy, but it should still hold its data.
- The primary was shut down for a long time, and I am not sure clients could tolerate the primary not working for that long.
Maybe the master-reboot-down-after-period parameter cannot solve all the problems in the top comment, but at least it could solve a subset of them.