Comments (8)
@zuiderkwast Correct, we use it as part of the health check via data_delay (https://github.com/valkey-io/valkey/blob/unstable/src/cluster_legacy.c#L4730). As long as we keep updating that field in the new flow, we shouldn't be introducing any new failure modes.
from valkey.
> The use of pubsub as a proxy for ping/pong is an anti-pattern IMO. I am in favor of removing this coupling.
Just to clarify: we aren't using pubsub in place of ping/pong messages; we use it as a proxy for a ping/pong message when we have it. If you have a high volume of pubsub traffic, it is what prevents unexpected failovers.
> Hi Harkrishn. For the "databus", are you thinking of a separate port where all cluster nodes form another full mesh of connections? Would this port then also need to be distributed over the cluster bus using a ping extension?
> Since we're talking about changing the cluster bus to "v2", a Raft cluster of some sort, I think we shouldn't make large changes to the existing cluster design.
@zuiderkwast Thanks for the feedback.
I was thinking of a separate connection over the same port with a limited buffer size, where the connection can be dropped if the buffer exceeds that limit. I agree it would be a much larger change. A smaller payload size should go a long way toward helping with the issue; however, we shouldn't treat Cluster V1 as being in maintenance mode. Cluster V2 will take at least 2-3 major versions to be widely adopted, and until then we should keep improving V1 for our users.
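A minimal sketch of the bounded-buffer idea, assuming a dedicated pubsub link whose send buffer has a hard cap and is never grown; instead, the link is flagged for closing so the ping/pong link stays unaffected. `pubsubLink` and its functions are hypothetical names, not Valkey code; the existing `cluster-link-sendbuf-limit` setting applies a similar "free the link past a limit" policy to regular cluster links.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical dedicated pubsub link with a hard cap on pending bytes. */
typedef struct {
    char *buf;         /* pending outbound bytes */
    size_t used;       /* bytes currently queued */
    size_t limit;      /* hard cap; the buffer is never grown */
    int close_pending; /* set once the cap would be exceeded */
} pubsubLink;

static pubsubLink *pubsubLinkCreate(size_t limit) {
    pubsubLink *l = calloc(1, sizeof(*l));
    if (l == NULL) return NULL;
    l->buf = malloc(limit > 0 ? limit : 1);
    if (l->buf == NULL) {
        free(l);
        return NULL;
    }
    l->limit = limit;
    return l;
}

/* Queue a message. On overflow, mark the link for closing instead of growing
 * the buffer; the separate ping/pong link is untouched.
 * Returns 0 if queued, -1 if the message was dropped. */
static int pubsubLinkQueue(pubsubLink *l, const void *msg, size_t len) {
    assert(l != NULL);
    if (l->close_pending) return -1;
    if (len > l->limit - l->used) { /* used <= limit, so no underflow */
        l->close_pending = 1;
        return -1;
    }
    memcpy(l->buf + l->used, msg, len);
    l->used += len;
    return 0;
}

/* Simulate the socket accepting n bytes from the front of the buffer. */
static void pubsubLinkConsume(pubsubLink *l, size_t n) {
    if (n > l->used) n = l->used;
    memmove(l->buf, l->buf + n, l->used - n);
    l->used -= n;
}

static void pubsubLinkFree(pubsubLink *l) {
    if (l == NULL) return;
    free(l->buf);
    free(l);
}
```

A caller would free a link once `close_pending` is set and let the peer reconnect. Messages published in the meantime are lost, which fits pubsub's fire-and-forget delivery.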
> To protect the cluster bus from starvation, it's not enough to handle pubsub in a separate connection. Even its own thread will not completely solve that problem. It still shares resources with other traffic, not least network bandwidth. (For this, we could consider https://en.wikipedia.org/wiki/Differentiated_services, or the ToS field in IPv4, to mark the IP packets as high prio.)
This sounds interesting; I'll read up on it.
@valkey-io/core-team Thoughts?
Hi Harkrishn. For the "databus", are you thinking of a separate port where all cluster nodes form another full mesh of connections? Would this port then also need to be distributed over the cluster bus using a ping extension?
Since we're talking about changing the cluster bus to "v2", a Raft cluster of some sort, I think we shouldn't make large changes to the existing cluster design.
I think a new lightweight cluster bus message is a small change that doesn't need any additional config. It needs just one bit in clusterMsg to indicate support for it.
Replication likewise: I think it's straightforward. We'll add a version field to the REPLCONF command, which can be used to detect whether pubsub can be sent this way. #414
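A rough sketch of how one advertised bit could let a sender choose between a compact and a full header per peer. The flag name, both header structs, and `peerInfo` are hypothetical simplifications; the only detail taken from the real header is that its slot bitmap is `CLUSTER_SLOTS/8` = 2048 bytes, which is what makes the full header heavy for a small pubsub payload.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CLUSTER_SLOTS 16384

/* Hypothetical capability bit a node would set in its PING/PONG flags. */
#define HYP_FLAG_LIGHT_HDR_SUPPORTED (1u << 0)

/* Heavily simplified stand-in for the full header. The real one has many
 * more fields, but the slot bitmap alone is CLUSTER_SLOTS/8 = 2048 bytes. */
typedef struct {
    char sig[4];
    uint32_t totlen;
    uint16_t type;
    uint16_t flags;
    unsigned char myslots[CLUSTER_SLOTS / 8];
} fullHeader;

/* Hypothetical light header: just enough to frame a pubsub payload. */
typedef struct {
    char sig[4];
    uint32_t totlen;
    uint16_t type;
} lightHeader;

/* What we remember about a peer from its last PING/PONG. */
typedef struct {
    uint16_t flags;
} peerInfo;

static int peerSupportsLight(const peerInfo *p) {
    return (p->flags & HYP_FLAG_LIGHT_HDR_SUPPORTED) != 0;
}

/* Header bytes a pubsub message to this peer would cost. Peers that never
 * advertised the bit (e.g. older versions) keep getting the full header. */
static size_t pubsubHeaderSize(const peerInfo *p) {
    assert(p != NULL);
    return peerSupportsLight(p) ? sizeof(lightHeader) : sizeof(fullHeader);
}
```

Because the choice is made per peer, a mixed-version cluster keeps working during a rolling upgrade.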
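A sketch of the replication-side check, assuming the replica sends a hypothetical `version` option among its REPLCONF option/value pairs (next to real ones such as `capa psync2`) and the primary compares it against an assumed minimum. The option name, the 0x00MMmmpp packing, and the 8.0.0 threshold are illustrative assumptions, not the design from #414.

```c
#include <assert.h>
#include <stdio.h>
#include <strings.h>

/* Hypothetical: oldest replica version assumed to accept the compact
 * pubsub encoding, packed as 0x00MMmmpp. */
#define HYP_MIN_VERSION_COMPACT_PUBSUB 0x00080000L /* 8.0.0 */

/* Parse plain "MAJOR.MINOR.PATCH" into 0x00MMmmpp; anything else gives -1. */
static long parseVersion(const char *s) {
    unsigned major, minor, patch;
    char trailing;
    if (sscanf(s, "%u.%u.%u%c", &major, &minor, &patch, &trailing) != 3) return -1;
    if (major > 255 || minor > 255 || patch > 255) return -1;
    return (long)((major << 16) | (minor << 8) | patch);
}

/* Scan REPLCONF option/value pairs (argv[0] is "REPLCONF") for the
 * hypothetical "version" option. 0 means the replica did not send one. */
static long replconfVersion(int argc, const char **argv) {
    assert(argc >= 1 && argv != NULL);
    for (int j = 1; j + 1 < argc; j += 2) {
        if (strcasecmp(argv[j], "version") == 0) return parseVersion(argv[j + 1]);
    }
    return 0;
}

/* Replicas with no version or an unparsable one keep the existing format. */
static int canSendCompactPubsub(int argc, const char **argv) {
    return replconfVersion(argc, argv) >= HYP_MIN_VERSION_COMPACT_PUBSUB;
}
```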
To protect the cluster bus from starvation, it's not enough to handle pubsub in a separate connection. Even its own thread will not completely solve that problem. It still shares resources with other traffic, not least network bandwidth. (For this, we could consider https://en.wikipedia.org/wiki/Differentiated_services, or the ToS field in IPv4, to mark the IP packets as high prio.)
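For the DSCP/ToS idea, marking is a per-socket option on POSIX systems. A minimal sketch, assuming CS6 (DSCP 48, "network control") purely as an example code point; whether routers and switches honor the mark depends entirely on the network, and it does nothing for CPU or kernel-queue contention on the node itself. `markSocketDscp`, `openMarkedTcpSocket`, and `HYP_BUS_DSCP` are hypothetical names.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/ip.h>
#include <sys/socket.h>
#include <unistd.h>

/* Example code point only: CS6 ("network control", DSCP 48). Which value a
 * deployment should use, if any, is a network policy decision. */
#define HYP_BUS_DSCP 48

/* Ask the kernel to stamp outgoing IPv4 packets on fd with the given DSCP.
 * DSCP is the upper 6 bits of the former ToS byte; the low 2 bits are ECN
 * and are left at 0. Returns 0 on success, -1 with errno set on failure. */
static int markSocketDscp(int fd, int dscp) {
    assert(dscp >= 0 && dscp < 64);
    int tos = dscp << 2;
    return setsockopt(fd, IPPROTO_IP, IP_TOS, &tos, sizeof(tos));
}

/* Create an IPv4 TCP socket already marked with dscp; -1 on failure. */
static int openMarkedTcpSocket(int dscp) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    if (markSocketDscp(fd, dscp) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

IPv6 sockets would need the analogous `IPV6_TCLASS` option instead.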
> I think a new lightweight cluster bus message is a small change that doesn't need any additional config. It needs just one bit in clusterMsg to indicate support for it.
+1, this seems like the way. It'll be a bit annoying because we'll now basically have two cluster bus messages (the compact and the full version). We'll also need to update a bunch of logic since we can use pubsub messages as proxies for ping/pong messages, which is no longer true since it's missing most of the payload. It seems like the right optimization to make, though.
> We'll also need to update a bunch of logic since we can use pubsub messages as proxies for ping/pong messages, which is no longer true since it's missing most of the payload.
@madolson Pubsub messages don't touch the ping_sent or pong_received timestamps. They do touch the data_received timestamp, though:
```c
/* Update the last time we saw any data from this node. We
 * use this in order to avoid detecting a timeout from a node that
 * is just sending a lot of data in the cluster bus, for instance
 * because of Pub/Sub. */
if (sender) sender->data_received = now;
```
+1 on the lighter version of clusterMsg.
+1 on not investing too much in a new "cluster infra", aka "data bus", just yet. We've had a few offline chats, but we haven't looked deeply into moving the existing cluster bus off the main thread, which I think would bring more bang for the buck.
The use of pubsub as a proxy for ping/pong is an anti-pattern IMO. I am in favor of removing this coupling.