
Comments (8)

madolson commented on September 24, 2024

@zuiderkwast Correct, we use it in the health check, as part of data_delay: https://github.com/valkey-io/valkey/blob/unstable/src/cluster_legacy.c#L4730. As long as the new flow keeps updating that field, we shouldn't introduce any new failure modes.

from valkey.

madolson commented on September 24, 2024

The use of pubsub as proxy for ping/pong is an anti-pattern IMO. I am for removing this coupling.

Just to clarify: we aren't using it for ping/pong messages. We were using it as a proxy for a ping/pong message when we had one. If you have a high volume of pubsub traffic, it is what prevents unexpected failovers.


hpatro commented on September 24, 2024

Hi Harkrishn. For the "databus", you're thinking of a separate port where all cluster nodes form another full mesh of connections? This port would then also need to be distributed in the cluster bus using a ping extension?

Since we're talking about changing the cluster bus to "v2", a Raft cluster of some sort, I think we shouldn't make too large changes to the existing cluster design.

@zuiderkwast Thanks for the feedback.

I was thinking of a separate connection over the same port with a limited buffer size, where the connection can be dropped if the buffer exceeds a threshold. I agree it will be a much larger change. A smaller payload size should go a long way toward helping with the issue; however, we shouldn't treat Cluster V1 as being in maintenance mode. Cluster V2 will take at least 2-3 major versions to be well adopted, and until then we should keep improving V1 for the benefit of the users.

To protect the cluster bus from starvation, it's not enough to handle pubsub in a separate connection. Even its own thread will not completely solve that problem. It still shares resources with other traffic, not least network bandwidth. (For this, we could consider https://en.wikipedia.org/wiki/Differentiated_services, or the ToS field in IPv4, to mark the IP packets as high prio.)

This sounds interesting; I will learn more about it.


hpatro commented on September 24, 2024

@valkey-io/core-team Thoughts?


zuiderkwast commented on September 24, 2024

Hi Harkrishn. For the "databus", you're thinking of a separate port where all cluster nodes form another full mesh of connections? This port would then also need to be distributed in the cluster bus using a ping extension?

Since we're talking about changing the cluster bus to "v2", a Raft cluster of some sort, I think we shouldn't make too large changes to the existing cluster design.

I think a new lightweight cluster bus message is a small change that doesn't need any additional config. Just one bit in the clusterMsg to indicate the support for this.
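A rough sketch of how a single capability bit could gate the compact message, assuming the sender has already learned the peer's flags from an earlier full message. Every name here (`EXAMPLE_FLAG_LIGHT_MSG_SUPPORTED`, `example_full_hdr`, `example_light_hdr`) and the field layout are hypothetical, chosen for illustration; they are not the actual clusterMsg definitions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability bit; not the flag name or value Valkey uses. */
#define EXAMPLE_FLAG_LIGHT_MSG_SUPPORTED (1u << 0)

/* Hypothetical full header: stands in for the large clusterMsg header. */
typedef struct {
    uint16_t type;
    uint16_t flags;
    uint8_t node_state[2048]; /* placeholder for slots, epochs, and so on */
} example_full_hdr;

/* Hypothetical compact header carrying only what pubsub delivery needs. */
typedef struct {
    uint16_t type;
    uint16_t flags;
} example_light_hdr;

/* Choose the header size for a pubsub message to a peer: use the compact
 * header only if the peer advertised support, otherwise fall back to the
 * full header so older nodes keep working. */
static size_t example_pubsub_hdr_size(uint16_t peer_flags) {
    bool peer_ok = (peer_flags & EXAMPLE_FLAG_LIGHT_MSG_SUPPORTED) != 0;
    return peer_ok ? sizeof(example_light_hdr) : sizeof(example_full_hdr);
}
```

The fallback branch is what would keep a mixed-version cluster working: a node that never set the bit keeps receiving the full message.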

Replication likewise: I think it's straight-forward. We'll add a version field to the REPLCONF command which can be used to detect if pubsub can be sent this way or not. #414

To protect the cluster bus from starvation, it's not enough to handle pubsub in a separate connection. Even its own thread will not completely solve that problem. It still shares resources with other traffic, not least network bandwidth. (For this, we could consider https://en.wikipedia.org/wiki/Differentiated_services, or the ToS field in IPv4, to mark the IP packets as high prio.)
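For reference, marking a socket's packets could look roughly like the sketch below, which uses the standard `IP_TOS` socket option for IPv4. The helper names and the choice of DSCP value are assumptions for illustration, not anything Valkey does today. Routers along the path are free to ignore or rewrite the marking, so this is a hint rather than a guarantee.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* DSCP occupies the upper six bits of the former IPv4 ToS byte, so the
 * value passed to IP_TOS is the DSCP shifted left by two. */
static int example_dscp_to_tos(int dscp) {
    return (dscp & 0x3f) << 2;
}

/* Ask the kernel to mark outgoing IPv4 packets on fd with the given DSCP.
 * Returns 0 on success and -1 on error, like setsockopt itself.
 * Hypothetical helper, not part of Valkey. */
static int example_mark_socket(int fd, int dscp) {
    int tos = example_dscp_to_tos(dscp);
    return setsockopt(fd, IPPROTO_IP, IP_TOS, &tos, sizeof(tos));
}
```

A cluster bus connection might call `example_mark_socket(fd, 48)` (DSCP CS6, commonly used for network control traffic) right after the socket is created. An IPv6 socket would need `IPV6_TCLASS` instead.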


madolson commented on September 24, 2024

I think a new lightweight cluster bus message is a small change that doesn't need any additional config. Just one bit in the clusterMsg to indicate the support for this.

+1, this seems like the way. It'll be a bit annoying because we'll basically have two cluster bus messages (the compact and the full version). We'll also need to update a bunch of logic, since we currently use pubsub messages as proxies for ping/pong messages, which will no longer hold because the compact message is missing most of the payload. It seems like the right optimization to make though.


zuiderkwast commented on September 24, 2024

We'll also need to update a bunch of logic since we can use pubsub messages as proxies for ping/pong messages, which is no longer true since it's missing most of the payload.

@madolson Pubsub messages don't touch ping_sent or pong_received timestamps. They do touch the data_received timestamp though:

    /* Update the last time we saw any data from this node. We
     * use this in order to avoid detecting a timeout from a node that
     * is just sending a lot of data in the cluster bus, for instance
     * because of Pub/Sub. */
    if (sender) sender->data_received = now;
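To illustrate why that timestamp matters for the health check, here is a simplified sketch of how a data_delay value could feed into the timeout decision. It assumes the check takes the smaller of the ping delay and the data delay. The struct and function names are hypothetical and the real logic in cluster_legacy.c has more conditions.

```c
#include <stdbool.h>
#include <stdint.h>

typedef int64_t example_ms;

/* Hypothetical, trimmed-down view of a cluster node's timestamps. */
typedef struct {
    example_ms ping_sent;     /* when the outstanding ping was sent; 0 if none */
    example_ms data_received; /* last time any bytes arrived from the node */
} example_node;

/* A node counts as timed out only if BOTH the outstanding ping and the
 * last received data are older than the timeout. Heavy pubsub traffic
 * keeps data_received fresh and so prevents a false timeout. */
static bool example_node_timed_out(const example_node *n, example_ms now,
                                   example_ms timeout) {
    if (n->ping_sent == 0) return false; /* no ping awaiting a reply */
    example_ms ping_delay = now - n->ping_sent;
    example_ms data_delay = now - n->data_received;
    example_ms delay = ping_delay < data_delay ? ping_delay : data_delay;
    return delay > timeout;
}
```

Under this reading, a compact pubsub message only has to keep updating data_received for the existing behavior to be preserved.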


PingXie commented on September 24, 2024

+1 on the lighter version of clusterMsg.

+1 on not investing too much in a new "cluster infra", aka "data bus", just yet. We had a few offline chats, but we haven't looked deeply into moving the existing cluster bus off the main thread, which I think would bring more bang for the buck.

The use of pubsub as proxy for ping/pong is an anti-pattern IMO. I am for removing this coupling.

