Comments (5)

aembke commented on September 13, 2024

So currently Fred doesn't use replica nodes at all. I have a big TODO in the code to handle this in the future, but for now the library ignores them.

There are some big gotchas with Redis and replicas due to the fact that replication is asynchronous, and for my purposes at least at work this is a problem. Redis recently added the WAIT command to deal with some of this, but frankly I'm not a fan of that strategy. There are a number of distributed systems folks that have written better blog posts, etc, on this, so I won't repeat it here.

I do plan on adding replica node support for reads in the future, but it's not likely something I'm going to get to for a bit. As you correctly point out, this significantly complicates the failure mode scenarios. If you do need that today, you can always point a centralized client at a replica and make sure you only use read-only commands, though that's not a great solution.

If you use the sentinel interface you can automatically fail over all commands/connections to a replica, but that's mostly just dodging your question.

My initial thoughts on this from a priority standpoint are that pointing reads at replicas is only really useful when you use a cluster with cross-AZ replication, such that the cost of adding cluster nodes is quite high because every added primary multiplies your costs by the number of replicas in different AZs. In that case it makes sense to try to direct read load to replicas, assuming you're ok with occasional consistency issues. However, if you're using a centralized deployment and trying to use replication for load balancing purposes, then you're almost always going to be better off moving to a cluster. That's just my opinion based on my experience, so take it with a grain of salt. The use case I outlined is very real for me at work, but seems a bit uncommon for most people, so I pushed read replica support out a ways in my plans for this library.

From an implementation standpoint, here are the open questions I had noted for this, and why it's maybe more complicated than it looks.

  • Identifying commands as read vs write is pretty easy.
  • The connection management is complicated. You pointed out the failure mode scenario, but the happy path is also complicated.
  • There are potential consistency issues that come with this.
  • Cluster rebalancing becomes more complicated.
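
As a rough illustration of the first bullet, here is a minimal sketch of read/write classification and routing. Everything in it is hypothetical and is not part of fred: the `is_read_only` and `route` functions, the `Target` enum, and the deliberately incomplete allow-list are assumptions made for the example.

```rust
/// Hypothetical helper (not fred's API): true when the command name is in a
/// small, hand-picked allow-list of read-only Redis commands.
fn is_read_only(command: &str) -> bool {
    // Deliberately incomplete; a real client would need to cover every command.
    const READ_ONLY: &[&str] = &[
        "GET", "MGET", "EXISTS", "TTL", "STRLEN", "HGET", "HGETALL", "LRANGE",
        "SMEMBERS", "SCARD",
    ];
    READ_ONLY.iter().any(|c| c.eq_ignore_ascii_case(command))
}

/// Hypothetical routing decision for a single command.
#[derive(Debug, PartialEq)]
enum Target {
    Primary,
    Replica,
}

/// Sends a command to a replica only when the caller opted in and the command
/// is known to be read-only. Unknown commands fall back to the primary.
fn route(command: &str, replicas_enabled: bool) -> Target {
    if replicas_enabled && is_read_only(command) {
        Target::Replica
    } else {
        Target::Primary
    }
}
```

Defaulting unknown commands to the primary keeps the sketch safe when the allow-list is out of date. The remaining bullets (connection management, consistency, rebalancing) are the hard part and are not addressed here.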

Consider the following scenario:

  • You're using Elasticache or something equivalent in one of the big cloud providers.
  • You're using cross-AZ replication for redundancy purposes. Therefore your costs are non trivial per added replica. Your bandwidth costs are also not negligible.
  • You have >=2 replicas per primary node where the 2 replicas are in different AZs.
  • At least one of the replicas is in an AZ that is closer to your primary node on the network, so you have a preferred failover order, and a preference on which replica should receive commands.

To handle all these use cases the client would need new interfaces for callers to not only enable read-only commands to go to replicas, but also to specify the order in which replicas should receive commands. Said another way, you may be using replication for failover purposes or for load balancing. Which one it is makes a big difference, and it's difficult to declare that information, especially in the face of changing cluster topologies when nodes fail or slots are rebalanced. It can also massively inflate your network usage per application node if you have multiple replicas per primary and/or a lot of primary nodes.
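
To make the interface question a bit more concrete, here is one possible shape such a declaration could take. This is only a sketch under assumptions: `ReplicaUse`, `Replica`, and `pick_read_target` are hypothetical names invented for this example and do not exist in fred, and the AZ names are placeholders.

```rust
/// Hypothetical policy a caller might declare (not fred's API).
#[derive(Debug, Clone, PartialEq)]
enum ReplicaUse {
    /// Replicas exist only for failover; never send reads to them.
    FailoverOnly,
    /// Send reads to replicas, preferring AZs in the given order.
    ReadsInOrder(Vec<String>),
}

/// Hypothetical description of one replica of a primary node.
#[derive(Debug, Clone, PartialEq)]
struct Replica {
    addr: String,
    az: String,
}

/// Picks the replica that should receive a read, if any, by walking the
/// caller's AZ preference list and returning the first replica found there.
fn pick_read_target<'a>(policy: &ReplicaUse, replicas: &'a [Replica]) -> Option<&'a Replica> {
    match policy {
        ReplicaUse::FailoverOnly => None,
        ReplicaUse::ReadsInOrder(az_order) => az_order
            .iter()
            .find_map(|az| replicas.iter().find(|r| &r.az == az)),
    }
}
```

Returning `None` means "use the primary". The sketch ignores the harder problem described above: the replica list would have to be rebuilt whenever nodes fail or slots are rebalanced.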

I should be clear though - if you use replicas behind a sentinel layer, or behind a well-managed cluster deployment layer (such as Elasticache, Redis Labs, or k8s), then everything will work. If a primary node goes down, your infra should promote a replica to a primary node, this information will then appear in the CLUSTER NODES response, and fred will handle it properly. The complexity I'm speaking of comes from trying to use a replica for more than failover purposes.
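
For readers unfamiliar with the CLUSTER NODES response, the sketch below shows how a promoted replica becomes visible in it. It is an illustration only and says nothing about how fred parses the response internally; `Node`, `parse_cluster_nodes`, and `live_primaries` are hypothetical names. It relies on the documented line layout, where the first three space-separated fields are the node ID, the address (`ip:port@cport`), and a comma-separated flag list.

```rust
/// Hypothetical, simplified view of one line of CLUSTER NODES output.
#[derive(Debug, PartialEq)]
struct Node {
    id: String,
    addr: String,
    is_primary: bool,
    failed: bool,
}

/// Parses the first three fields of each line: id, address, and flags.
/// Lines with fewer than three fields are skipped.
fn parse_cluster_nodes(output: &str) -> Vec<Node> {
    output
        .lines()
        .filter_map(|line| {
            let mut fields = line.split_whitespace();
            let id = fields.next()?;
            let addr = fields.next()?;
            let flags: Vec<&str> = fields.next()?.split(',').collect();
            // Drop the cluster bus port suffix ("@31001") if present.
            let addr = addr.split('@').next().unwrap_or(addr);
            Some(Node {
                id: id.to_string(),
                addr: addr.to_string(),
                is_primary: flags.contains(&"master"),
                failed: flags.contains(&"fail"),
            })
        })
        .collect()
}

/// Addresses of primaries that are not flagged as failed.
fn live_primaries(nodes: &[Node]) -> Vec<&str> {
    nodes
        .iter()
        .filter(|n| n.is_primary && !n.failed)
        .map(|n| n.addr.as_str())
        .collect()
}
```

After a failover, the old primary typically carries the `fail` flag and the promoted replica is listed with the `master` flag, so re-reading the response is enough to find where writes should go.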

from fred.rs.

tsukit commented on September 13, 2024

Thanks so much @aembke for the in-depth info! This is pretty helpful. Our internal clusters are currently sentinel-managed (soon to be running in cluster mode) and we use replicas to distribute reads, as most of the stacks we have are read heavy. When do you think the ability to read from replicas (and the ability to handle their failure) will be available? Any ballpark estimate is highly appreciated.

aembke commented on September 13, 2024

So if you're using the sentinel interface, the failover scenarios are handled today. The same goes for clusters, assuming you're using some sort of management layer that can run the CLUSTER commands to promote a replica to a primary node such that it'll show up as a primary in subsequent CLUSTER NODES responses.

As far as sending reads to replicas without any failure taking place - that's probably a couple months out. My biggest focus after the upcoming 5.0.0 release will be on deep refactoring to make transactions easier to reason about, and then I'll likely start performance tuning work including this kind of automatically-send-reads-to-replicas work.

tsukit commented on September 13, 2024

Thanks @aembke. Look forward to having that support.

aembke commented on September 13, 2024

Sounds good. When that's ready or nearing completion I'll tag you so you have a heads up. In the meantime I'm going to close this out and track it in a set of wiki pages that I'm working on for this repo.
