Comments (8)

hpatro commented on August 23, 2024

@roshkhatri / @zuiderkwast WDYT?

from valkey.

madolson commented on August 23, 2024

> CLUSTERMSG_TYPE_UPDATE

I kind of think this one should stay. We're indicating state changes, so it seems prudent to send our state of the world as well.

zuiderkwast commented on August 23, 2024

I believe the messages listed here are rare and don't make up a large percentage of the cluster bus traffic, so I don't think we need to prioritize these. The bulk of the cluster bus traffic is PING and PONG. If we can figure out a way to use light ping pong 🏓, then we're saving a lot of cluster bus overhead for large clusters.

A basic idea is that each node remembers what it sent to any other node (for example a hash of the content) and if it hasn't changed since the last ping/pong between the nodes, then the node can send a light PING or PONG, just as a keepalive message and to say that nothing has changed. Do you want to explore this idea?
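A minimal sketch of that bookkeeping, under stated assumptions: `peerState`, `fnv1a64` and `canSendLight` are hypothetical names, not existing Valkey code, and FNV-1a is only a stand-in for whatever hash would actually be chosen. Each node keeps one hash per peer and falls back to the full message whenever the content differs from what that peer last received.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-peer bookkeeping (not an actual Valkey structure):
 * the hash of the last full message content sent to this peer. */
typedef struct peerState {
    uint64_t last_sent_hash; /* hash of the last full message sent */
    int has_sent_full;       /* 0 until one full message has been sent */
} peerState;

/* 64-bit FNV-1a, used here only as a placeholder hash function. */
uint64_t fnv1a64(const unsigned char *buf, size_t len) {
    uint64_t h = 14695981039346656037ULL; /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= buf[i];
        h *= 1099511628211ULL; /* FNV prime */
    }
    return h;
}

/* Returns 1 if a light PING/PONG is enough for this peer, or 0 if the
 * full message must be sent. When 0 is returned, the stored hash is
 * updated on the assumption that the caller sends the full message. */
int canSendLight(peerState *peer, const unsigned char *content, size_t len) {
    assert(peer != NULL && content != NULL);
    uint64_t h = fnv1a64(content, len);
    if (peer->has_sent_full && peer->last_sent_hash == h) return 1;
    peer->last_sent_hash = h;
    peer->has_sent_full = 1;
    return 0;
}
```

The sketch ignores lost messages and reconnects; a real design would probably need to reset the stored hash whenever the link to the peer is re-established.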

madolson commented on August 23, 2024

> The bulk of the cluster bus traffic is PING and PONG.

The biggest gain might be to send a message that says, "I have no new slot information changes from last time", since I think the slot information is the biggest part of the message (that's about 2 KB, right?). One problem is that it'll still get rounded up to a full TCP packet.

roshkhatri commented on August 23, 2024

At a high level, it does seem like a nice idea!
However:

  1. CLUSTERMSG_TYPE_FAILOVER_AUTH_REQUEST requires currentEpoch, configEpoch and myslots.
  2. CLUSTERMSG_TYPE_FAILOVER_AUTH_REQUEST would also require mflags.

This means we would either add new fields to clusterMsgLight, which makes it a bit larger again, or carry them some other way in the data field.

@zuiderkwast's idea is what we discussed some time ago. I agree with it too and would like to explore it.

> The biggest gain might be to send a message that says, "I have no new slot information changes from last time", since I think the slot information is the biggest part of the message (that's about 2 KB, right?). One problem is that it'll still get rounded up to a full TCP packet.

Yeah, this would be the best gain: stating that I am alive and that nothing has changed since last time.

zuiderkwast commented on August 23, 2024

> > The bulk of the cluster bus traffic is PING and PONG.

> The biggest gain might be to send a message that says, "I have no new slot information changes from last time", since I think the slot information is the biggest part of the message (that's about 2 KB, right?). One problem is that it'll still get rounded up to a full TCP packet.

@madolson Yes, the slot bitmap is 2 KB (16K bits). A TCP header is at least 20 bytes and an IPv4 header is another 20 bytes. Then, Ethernet adds 18 bytes (source and destination MAC addresses, EtherType and a 4-byte CRC), or 38 bytes if the preamble and inter-frame gap are counted too. We're at somewhere around 60 to 80 bytes in total. On top of this, TLS has some minimum packet size, and VPNs and other stuff probably add more overhead. Did I miss anything?

Anyway, the light packet doesn't solve our scaling problem. The problem is still the all-to-all communication. In a Raft cluster, nodes only communicate with the leader, except when the leader isn't reachable anymore. Focusing on that might be a better idea.

roshkhatri commented on August 23, 2024

> A TCP header is at least 20 bytes and an IPv4 header is another 20 bytes. Then, Ethernet adds 18 bytes (source and destination MAC addresses, EtherType and a 4-byte CRC), or 38 bytes if the preamble and inter-frame gap are counted too. We're at somewhere around 60 to 80 bytes in total. On top of this, TLS has some minimum packet size, and VPNs and other stuff probably add more overhead. Did I miss anything?

This will be a constant for every msg, right? The thing we can focus on is how we can reduce our overhead for every msg.

zuiderkwast commented on August 23, 2024

> > A TCP header is at least 20 bytes and an IPv4 header is another 20 bytes. Then, Ethernet adds 18 bytes (source and destination MAC addresses, EtherType and a 4-byte CRC), or 38 bytes if the preamble and inter-frame gap are counted too. We're at somewhere around 60 to 80 bytes in total. On top of this, TLS has some minimum packet size, and VPNs and other stuff probably add more overhead. Did I miss anything?

> This will be a constant for every msg, right? The thing we can focus on is how we can reduce our overhead for every msg.

Yes, the overhead per message is constant, but it's important to discuss it. It makes little sense to shrink an already small message (say from 20 to 10 bytes), because we would save very little compared to all the overhead. If we can reduce a message from 2K to 400 bytes including all protocol overhead, we save 80%, so that is worth doing.
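To put numbers on this, here is a small sketch with a hypothetical helper, `percentSaved`, that computes the share of on-wire bytes saved when a payload shrinks while the per-message overhead stays fixed. The overhead is a parameter, so any estimate of the protocol headers can be plugged in.

```c
#include <assert.h>

/* Hypothetical helper: percentage of on-wire bytes saved when the payload
 * shrinks from old_payload to new_payload bytes and every message also
 * carries a fixed number of overhead bytes (protocol headers and so on). */
double percentSaved(double old_payload, double new_payload, double overhead) {
    double before = old_payload + overhead;
    double after = new_payload + overhead;
    assert(before > 0.0);
    return 100.0 * (before - after) / before;
}
```

For example, `percentSaved(2000, 400, 0)` gives 80, matching the figure above where the sizes already include overhead, while `percentSaved(20, 10, 60)` gives only 12.5 if 60 bytes of overhead are assumed.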

We still have a great many messages, though, and this will still be a bottleneck, because with every node talking to every other node the number of ping-pong messages grows quadratically with the size of the cluster (right?). If we save 80% of the size of each message, we can maybe achieve a 600-node cluster instead of 500 nodes, but if we can make the number of messages grow just linearly with the size of the cluster, then we can scale to huge clusters for real. We could introduce features like non-voting nodes, and less active nodes that don't ping anyone but just answer pings from other nodes. Or... we take the step to use a Raft protocol.
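A rough way to compare the two communication patterns is to count messages per round. This is a simplified model and not how the cluster bus actually schedules pings: it assumes that in the all-to-all case every node sends one PING to every other node and gets one PONG back, and that in the leader-based case each follower exchanges one heartbeat and one reply with the leader. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* All-to-all: each of the n nodes pings the other n - 1 nodes and each
 * PING is answered by a PONG, so 2 * n * (n - 1) messages per round. */
uint64_t fullMeshMessages(uint64_t nodes) {
    assert(nodes > 0);
    return 2 * nodes * (nodes - 1);
}

/* Leader-based: each of the n - 1 followers exchanges one heartbeat and
 * one reply with the leader, so 2 * (n - 1) messages per round. */
uint64_t leaderBasedMessages(uint64_t nodes) {
    assert(nodes > 0);
    return 2 * (nodes - 1);
}
```

Under this model, going from 500 to 600 nodes raises the all-to-all count by about 44% (from 499,000 to 718,800 messages per round), while the leader-based count rises by about 20% (from 998 to 1,198).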
