Git Product home page Git Product logo

Comments (24)

hpatro avatar hpatro commented on May 25, 2024 3

1.Should functions be migrated together with slots?

I don't think slots should be associated with functions. functions should be treated as a resource at node level. On each node addition, all functions existing in the cluster/standalone primary should be applied.

Overall, I think we should treat functions, ACLs and config(s) as resource(s) which should be applied to all the nodes and build it in the cluster revamp which we have been talking for a while.

from valkey.

madolson avatar madolson commented on May 25, 2024 2

IIRC it was decided that functions are similar to configuration and that it's the admin's responsibility to define functions on all nodes. I don't know if it's a good decision.

Zhao can keep me honest, but this is the only time there was a real 3-2 decision amongst the old maintainers. (It was a more broad discussion on functions, but basically they said this is what Redis wanted).

I generally want to implement a new system that is distributed configuration, that supports functions, ACLS, configs and allows modules to also store distributed configuration. This can be built ontop of the existing clusterbus extensions, but will be in its best form when built with the cluster v2. (I see Viktor is dubious, but Ping and I are highly motivated to make this happen now)

from valkey.

madolson avatar madolson commented on May 25, 2024 2

I'm motivated, but I also realize we have a lot of things going on, and I want us to make releases more often. Can we make it a focus area for 9.0? (That means start prototyping now.)

My vision, is that we finish the needed refactoring work to make cluster a module for valkey 8. Then it becomes easy to iterate and prototype in a separate repo the new clustering work. Then someone can "opt-in" to the beta of it while Valkey 8 is out and about, and we can finalize it for 9.

from valkey.

PingXie avatar PingXie commented on May 25, 2024 2

We haven't finalized almost anything about v2. :)

True but my statement is correct too :-)

I think we need many different threads to discuss all these aspects. Shall we use a wiki for that? Or just many issues and sub-issues to discuss?

Yeah, I will create a dedicated issue for cluster v2. Let's start from the pain points in today's cluster design/implementation. It is important to have a consensus on the problem definition first.

from valkey.

daniel-house avatar daniel-house commented on May 25, 2024 1

Can anyone suggest a use case where we would actually prefer to have different implementations of the same function on different nodes? If not, then we have the answer to the question raised by this issue.

from valkey.

zuiderkwast avatar zuiderkwast commented on May 25, 2024

IIRC it was decided that functions are similar to configuration and that it's the admin's responsibility to define functions on all nodes. I don't know if it's a good decision.

The official admin tool is valkey-cli, so when managing a cluster (creating a cluster, adding nodes, etc.) perhaps valkey-cli should try to synchronous functions across nodes? That would at least match the current design decisions.

It's the same problem with ACL and other config in a cluster: They are expected to be the same throughout the cluster but the cluster doesn't guarantee this.

I'm not sure I believe in a complete rewrite of the cluster bus. Usually such projects are never finished. Can we try gossipping this information instead?

from valkey.

zuiderkwast avatar zuiderkwast commented on May 25, 2024

I'm motivated, but I also realize we have a lot of things going on, and I want us to make releases more often. Can we make it a focus area for 9.0? (That means start prototyping now.)

from valkey.

zuiderkwast avatar zuiderkwast commented on May 25, 2024

I think we shall start using some of the reserved bits in the clusterMsg to indicate support for new cluster bus messages. From there, we can add new message types and turn the cluster bus more into a proper raft cluster, step by step.

Easier to test, easier to get things right, if we do things in smaller steps.

from valkey.

PingXie avatar PingXie commented on May 25, 2024

Timeline aside, this is actually a problem that can ONLY be properly solved by cluster V2 because it gives you a strongly consistent (meta)data service that manages cluster-wide metadata like cluster topology, node configs, ACLs, and Functions, etc

Cluster V1 being eventually consistent (which is not 100% true by the way) and the coupling of data serving logic and topology management logic on the same node are the source of these oddities, IMO. I also think implementing a true raft cluster inside of the Valkey server is no less work than cluster V2 itself. The coupling shouldn't be there in the first place.

I am in favor of moving the entire cluster out of the engine and into its own module and this needs to be the precursor to the real cluster V2 work. That said, I don't have a clear idea on the complexity involved so I don't know if it can happen in Valkey 8, yet. Also agreed that building cluster V2 into the engine is a non-starter.

from valkey.

zuiderkwast avatar zuiderkwast commented on May 25, 2024

It's very good that we're having this discussion. We need to have all the information and aspects on the table to be able to make informed decisions.

Timeline aside

Beautiful, but I don't think we shall ignore time.

Just adding module APIs normally involves a lot of work just in terms of API design, testing and documentation. This makes adding APIs much more work than adding internal functions.

Which module API functions do we need to add and what will these APIs look like? Will the cluster module keep all the cluster state in its own memory, intercept commands and sometimes return redirects for them?

cluster V2 because it gives you a strongly consistent (meta)data service that manages cluster-wide metadata like cluster topology, node configs, ACLs, and Functions, etc

So we need to have module APIs for all of those, including some event mechanism so if a user or any other module changes any of these, so the cluster module can get these changes. (Not a very small and easy job.)

The coupling shouldn't be there in the first place.
Also agreed that building cluster V2 into the engine is a non-starter.

Who agreed on this? If we're using a raft library, we avoid that coupling. A node can still be a cluster node without serving any data. Agreed that moving things to a module is a guarantee when it comes to separation of concerns, but it is also much more work.

from valkey.

zuiderkwast avatar zuiderkwast commented on May 25, 2024

@daniel-house You're right, it's better that they are synchronized. The open question is how.

Can you suggest how to implement it within the current cluster solution that makes sure they are synchronized on all nodes?

from valkey.

daniel-house avatar daniel-house commented on May 25, 2024

OK, we agree that they should be synchronized.

I can't say I understand the current cluster solution, but I can see that we are having problems synchronizing hostnames (#304), so I have serious doubts that we can synchronize functions at this time. It seems to me that a good solution would also solve the hostnames issue and the problems raised in #114 . If anyone can point me to useful reading about the current cluster solution, please do.

I would appreciate any pointers to more information about cluster V2. I get the general impression that it will use Raft leader election, and would be perfect for this, hostnames, and the ultimate goals of #114 . If so, I would suggest punting all of these issues until after cluster V2 is implemented and assigning a very high priority to cluster V2.

from valkey.

zuiderkwast avatar zuiderkwast commented on May 25, 2024

If anyone can point me to useful reading about the current cluster solution, please do.

from valkey.

zuiderkwast avatar zuiderkwast commented on May 25, 2024

If we want to do something like the Redis cluster v2, then I believe we can have cluster V2 in a few years, say Valkey 10 or 11, maybe not even that. Rewriting something from scratch that can't be used until everything is in place is very likely to never be launched IMO. That's why I think Redis will fail. I hope we will choose a different, incremental approach. It's not that easy but I believe it's necessary. (We haven't really discussed it enough in the Valkey core team.)

from valkey.

daniel-house avatar daniel-house commented on May 25, 2024

If anyone can point me to useful reading about the current cluster solution, please do.

Thanks. I've looked at those in the past, and again today. They are indeed useful. (1) Are the docs up to date? (2) The docs seem too superficial for this purpose (proposing an extension to synchronize functions). Searching those docs for host doesn't lead to information about a hostname extension as described in #304 . I was hoping for the design or at least the discussion/issue and PR that lead to #304 .

The information about a gossip protocol is helpful, but which one?

  • The source code

Ha, ha. Very funny. And, sadly, all I really expected to find.

According to the cluster-spec:

For every node added in the gossip section the following fields are reported:

Node ID.
IP and port of the node.
Node flags.

Why didn't we add the hostname here instead of ping/pong?

I'll read some more from those docs and the code.

from valkey.

daniel-house avatar daniel-house commented on May 25, 2024

If we want to do something like the Redis cluster v2, then I believe we can have cluster V2 in a few years, say Valkey 10 or 11, maybe not even that. Rewriting something from scratch that can't be used until everything is in place is very likely to never be launched IMO. That's why I think Redis will fail. I hope we will choose a different, incremental approach. It's not that easy but I believe it's necessary. (We haven't really discussed it enough in the Valkey core team.)

This link is something I really needed. Thankyou.

So, you think cluster V2 will never happen. Is that the consensus? I will read it in detail. Even if it is far in the future it will help me understand today's pain points. Are there any places I should subscribe to so that I can follow any discussions of cluster V2?

How about this incremental approach? Link in a high quality implementation of the Raft protocol. Define a state-machine to manage with the Raft protocol. Only put one thing in it, a list of node-IDs with hostnames. After that works, add function definitions to the state-machine. We should then be able to incrementally extend the state-machine to cover everything needed for #114

Alternatively, use CRDT registers (simple strings) to follow the same incremental path.

from valkey.

zuiderkwast avatar zuiderkwast commented on May 25, 2024

So, you think cluster V2 will never happen.

That's not what I said. (I said that if we want to do it in a certain non-incremental way, then it will be difficult.)

Is that the consensus?

No.

from valkey.

PingXie avatar PingXie commented on May 25, 2024

I don't think we should reference back to the old cluster V2 design here in the Valkey forum. That work was started by Redis. We should take a fresh look at the problems again and start from scratch. The only connection is just the name "cluster V2". Please disregard the "Redis Cluster V2" design completely.

from valkey.

daniel-house avatar daniel-house commented on May 25, 2024

@PingXie Works for me. Is there a list of problems/requirements?

from valkey.

hpatro avatar hpatro commented on May 25, 2024

@PingXie Works for me. Is there a list of problems/requirements?

Few of the problems/requirements:

  • Support higher Scaling (Larger cluster size) - Currently Valkey cluster can't scale beyond 500 nodes beyond which the gossip protocol doesn't scale well. The new architecture should strive to handle much larger cluster size.
  • Support smaller cluster size - With the current design, only primary nodes can vote hence the ideal cluster would require atleast 3 primaries to form a quorum. The new design should take into consideration where each node (primary/replica) should be part of the quorum.
  • Centralized metadata store - Currently config, functions, acl needs to be applied on each node. This is painful and cumbersome during cluster setup as well as during scale out. Building a mechanism to be able to send those information once and being applied to all of the nodes would be ideal.
  • Decouple control and data transfer via cluster bus - Currently, the clusterbus carries both control data (health info, gossip info, etc) as well as pubsub data. With high pubsub data, the clusterbus can get overwhelmed and cause issue with the cluster health status and can delay cluster topology information consistency in the system.
  • Lower failover time - Due to the primary node only consensus mechanism in the current approach, failover in a shard can be delayed if there is a network partition and other primaries aren't reachable.

from valkey.

PingXie avatar PingXie commented on May 25, 2024

Thanks @hpatro.

Here are the problems that I think we need to solve in the current cluster:

  1. strong consistency (for cluster topology)

cluster topology is concerned with which nodes own which slots and primaryship. The current cluster implementation is not even eventually consistent by design because there are places where node epochs are bumped without consensus (trade-offs). This leads to increased complexity on the client side.

  1. better manageability (of global config/data)

This particular issue provides the exact context on this pain point

  1. more resilience (to stressful client workload)

Today, both the cluster bus and the client workload run on the same main thread. So a demanding client workload has the potential to starve the cluster bus and leads to unnecessary failover.

  1. higher scale

The V1 cluster is a mesh so the cluster gossip traffic is proportional to N^2, where N is the (data) nodes in the cluster. The practical limit of a V1 cluster is ~500 nodes.

from valkey.

daniel-house avatar daniel-house commented on May 25, 2024

I don't see how you would perform a rolling upgrade to a cluster that meets these requirements. Is that acceptable?

from valkey.

PingXie avatar PingXie commented on May 25, 2024

I don't see how you would perform a rolling upgrade to a cluster that meets these requirements. Is that acceptable?

We haven't finalized the v1 to v2 upgrade migration path, including whether such a need exists or not. A seamless migration path is always great but I can also see the two cluster implementations continue to evolve on their own for a long time. So in this sense, v2 is not an accurate moniker, strictly speaking.

from valkey.

zuiderkwast avatar zuiderkwast commented on May 25, 2024

We haven't finalized almost anything about v2. :)

I think we need many different threads to discuss all these aspects. Shall we use a wiki for that? Or just many issues and sub-issues to discuss?

I think we should really explore the possibilities for a running cluster to auto-upgrade to a new consensus mode. For example, the node with the highest epoch can decide to switch to a raft algorithm, when all known nodes are known to support it, and start sending out new messages on a new format. When such message reach the other nodes, they switch too. We can use the existing TCP connections. (We could use UDP but it's harder to get DTLS working (OpenSSL doesn't even have DTLS 1.3 yet) than to just use TCP and TLS. But that's a different discussion.)

This sketch ↑ is just to reason that it's theoretically possible. Then we can weigh how important it is, compared to safety, simplicity, separation and other aspects.

from valkey.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.