Comments (12)

madolson commented on July 2, 2024

CLUSTER MIGRATE REPORT

This is what info is for; what exactly is the value of introducing a net new command for this? We might also take a hint and add the context to cluster-shards, since it's extensible to include ongoing slot migrations.

CLUSTER MIGRATE CANCEL

Do we need to have the optional slots argument? This requires an extra step of figuring out which slots are being migrated, and then forces the user to cancel those specific slots. I would prefer just CLUSTER MIGRATE CANCEL, which cancels all ongoing migrations on the node, so that the end user can retry them.

queued_time

What is the benefit of supporting queues? Since we have the concept of failed migrations (if the target is full), it seems to add complexity for the end user to queue up a bunch of slots. I guess my question about the queue is: can you do MIGRATE SLOTS 1 and MIGRATE SLOTS 2 as two separate commands, or can you only create one queue at a time? I guess maybe the intention is to queue up multiple different slots to different nodes?

PingXie commented on July 2, 2024

CLUSTER MIGRATE REPORT

This is what info is for; what exactly is the value of introducing a net new command for this? We might also take a hint and add the context to cluster-shards, since it's extensible to include ongoing slot migrations.

We need multiple properties for each slot being migrated. For instance, on the source side, I am thinking: slot_number, target_shard_id, state (started/pending/failed), num_retries, queued_time, start_time, update_time. If we go with INFO, we would need to use a comma-separated list format (like how we return per-command stats), which requires additional parsing work on the client side. I am not totally against that given these precedents, but I am hoping to use a more structured format like RESP arrays. I think I can go either way.
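
To make the comparison concrete, here is a rough sketch of the two shapes using the fields above; the field values and exact layout are purely illustrative, not a settled format. INFO-style, one comma-separated line per migrating slot:

    migrating_slot_609:target_shard_id=9ae8...,state=started,num_retries=0,queued_time=1719900000,start_time=1719900002,update_time=1719900031

versus a RESP array reply, as valkey-cli might render one entry:

    1) 1) "slot_number"
       2) (integer) 609
       3) "target_shard_id"
       4) "9ae8..."
       5) "state"
       6) "started"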

CLUSTER MIGRATE CANCEL

Do we need to have the optional slots argument? This requires an extra step of figuring out which slots are being migrated, and then forces the user to cancel those specific slots. I would prefer just CLUSTER MIGRATE CANCEL, which cancels all ongoing migrations on the node, so that the end user can retry them.

Makes sense.

queued_time

What is the benefit of supporting queues? Since we have the concept of failed migrations (if the target is full), it seems to add complexity for the end user to queue up a bunch of slots. I guess my question about the queue is: can you do MIGRATE SLOTS 1 and MIGRATE SLOTS 2 as two separate commands, or can you only create one queue at a time? I guess maybe the intention is to queue up multiple different slots to different nodes?

I think the queuing proposal comes from the slot migration operation being async and non-blocking. This automatically opens the door to the client submitting multiple requests and the server processing them however it sees fit, one at a time or in batches, which in turn means some slots might not get worked on immediately after the migration request is accepted. That is why differentiating queued and started (or in_progress) is useful.
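
So a report might, for example, show one slot in progress while the rest are still waiting, along the lines of (again, purely illustrative):

    migrating_slot_609:state=started,...
    migrating_slot_610:state=queued,...
    migrating_slot_611:state=queued,...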

madolson commented on July 2, 2024

I am not totally against that given these precedents, but I am hoping to use a more structured format like RESP arrays. I think I can go either way.

There has been discussion about introducing a way for info to be in RESP; I should go find it and open another issue. My understanding is clients already know how to parse info responses, so it should be okay to add the new field (sort of like how we show replicas as multiple items in the info). I suppose part of my question also stems from whether we need queues. If we just have one "ongoing" batch, it's much easier to show the info.
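
For reference, the replication section of INFO already follows a one-line-per-item pattern, roughly like the first line below, and a per-slot migration field could do the same (the second line is purely hypothetical):

    slave0:ip=127.0.0.1,port=6380,state=online,offset=211,lag=0
    migrating_slot_609:target_shard_id=9ae8...,state=started,num_retries=0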

I think the queuing proposal comes from the slot migration operation being async and non-blocking. This automatically opens the door to the client submitting multiple requests and the server processing them however it sees fit, one at a time or in batches, which in turn means some slots might not get worked on immediately after the migration request is accepted. That is why differentiating queued and started (or in_progress) is useful.

I suppose my view is that most applications would submit something more like one "batch" of slots to migrate from node A to node B, and they would poll for its completion. Once it's completed they would send the second "batch". If you submit a queue of batches, you need to reason on the server side about what to do if one of them fails. If we take as a tenet that we would like to keep Valkey simple, I would like to understand deeply why we think building the queues improves the end-user experience.
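
To make that flow concrete, the sequence I have in mind looks something like the following; the command shape is only illustrative, borrowing the syntax forms floated in this issue:

    CLUSTER MIGRATE SLOT 0-99 TO <target-shard-id>        (submit the first batch)
    CLUSTER MIGRATE REPORT                                 (poll until it completes or fails)
    CLUSTER MIGRATE SLOT 100-199 TO <target-shard-id>      (submit the second batch)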

PingXie commented on July 2, 2024

There has been discussion about introducing a way for info to be in RESP; I should go find it and open another issue.

That is all I need to go with INFO.

My understanding is clients already know how to parse info responses, so it should be okay to add the new field (sort of like how we show replicas as multiple items in the info).

There is a secondary parsing step involved here to extract these "sub-fields". Without a proper RESP format, this essentially degenerates into CLUSTER NODES, whose serialization and parsing are very error prone. Regardless, there will be changes on the client side to parse out the detailed information, and for that I would like to go with RESP. Whether through a new command or not is immaterial to me (and agreed, we should bias towards fewer commands).
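
As an illustration of why CLUSTER NODES is painful, a single line packs everything into positional, space-separated fields, with in-flight slots appended in bracket notation, roughly like this (node IDs truncated, values made up):

    e7d1eec... 127.0.0.1:30001@31001 myself,master - 0 0 1 connected 0-5460 [609->-9ae8a1c...]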

I suppose my view is that most applications would submit something more like one "batch" of slots to migrate from node A to node B, and they would poll for its completion.

Agreed.

Once it's completed they would send the second "batch".

This puts more state tracking burden on the client side for someone who submits multiple batches of requests without waiting, which is your question below.

If you submit a queue of batches, you need to reason on the server side about what to do if one of them fails.

I think that you have an (implicit) assumption here about the implementation, i.e., that the server will process ALL submitted slots at once. This might or might not be true; we discussed in the past that even with atomic slot migration, we might start migrating one slot at a time. Unless all the slots are processed as a single batch, you will have to reason about the same question. As part of this proposal, I would like to include bounded exponential retries to handle this failure and include num_retries, last_error, start_time, and update_time for client observability.
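
By bounded exponential retries I mean something like delay = min(base * 2^attempt, cap), up to a maximum retry count after which the slot is marked failed. For example, with a base of 1 second and a cap of 60 seconds, the delays would be 1s, 2s, 4s, 8s, 16s, 32s, 60s, 60s, and so on. The base, cap, and retry limit here are placeholders for illustration, not part of the proposal.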

If we take as a tenet that we would like to keep Valkey simple, I would like to understand deeply why we think building the queues improves the end-user experience.

I agree with the simplicity tenet, but I think we need to look at this from an end-to-end perspective of "combined complexity". For instance, the current migration scheme is simple for the server but puts too much burden on the client side to the point that the overall complexity is very high.

madolson commented on July 2, 2024

I think that you have an (implicit) assumption here about the implementation, i.e., that the server will process ALL submitted slots at once.

You could also be queueing up slots to multiple different nodes at the same time, which you also can't do atomically. On failure, do you drop the whole queue or do you move to the next node? For atomic slot migration we could do [...] the slots are processed as a single batch, which I think is a nice property, but can't be covered with this implementation.

As part of this proposal, I would like to include bounded exponential retries to handle this failure and include num_retries, last_error, start_time, and update_time for client observability.

I don't think the server should own this. It feels like we're pushing a lot more state than is really necessary. If there is a bug here, you have to patch the engine to fix it. Putting it in tooling seems a lot safer.

For instance, the current migration scheme is simple for the server but puts too much burden on the client side to the point that the overall complexity is very high.

I don't think this is true. I know a lot of developers that just use the built-in valkey-cli cluster migrate functionality, and it works very simply for them. They point it at the cluster, and it does the migration. If it fails, they check the result, update it, and try again. I think you're not coming at this from the perspective of most users, but from that of a managed cloud provider who wants to fully automate it and put retry logic around it.

madolson commented on July 2, 2024

Maybe the thing that would make it easier to decide is agreeing on what we want the user experience to look like in the ideal state.

PingXie commented on July 2, 2024

I think you're not coming at this from the perspective of most users, but from that of a managed cloud provider who wants to fully automate it and put retry logic around it.

No. Quite the opposite. I am thinking purely of the self-managed users here. The cli tool can hide some complexity, but I don't think it can achieve things like auto-resuming a failed slot migration (due to failovers). It is also a hassle to constantly rerun the tool vs. fire-and-forget, for which I imagine the following contract: "once slots are submitted for migration, I no longer need to monitor the system constantly to either re-submit the failed request or fix the broken link myself; the system should either retry for me or park itself in a perpetually stable state so that availability is upheld (at the cost of performance in the failure case)."

madolson commented on July 2, 2024

The cli tool can hide some complexity, but I don't think it can achieve things like auto-resuming a failed slot migration (due to failovers).

That is wrong, though; it could resume a failed slot migration. It knows the topology and the intention, so it should be able to resume the operation. I don't understand the bias towards pushing more orchestration complexity into the engine. I would like the engine to be able to make sure it is in a stable state, but during failure modes it's not clear why the engine should continue to take action on its own.

PingXie commented on July 2, 2024

That is wrong, though; it could resume a failed slot migration.

You will have to come back and restart the operation using the cli.

I would like the engine to be able to make sure it is in a stable state

Yes that is the first step.

but during failure modes it's not clear why the engine should continue to take action on its own.

It would require less human intervention, which is especially valuable in the self-managed case. That said, I think this is completely incremental work on top of the new command and atomic slot migration, so I am fine with leaving it out of the first version and continuing the brainstorming.

hwware commented on July 2, 2024

Sorry, I just found this issue. It looks like Ping wants to introduce some features that allow the cluster to do resharding automatically. I will look into this issue carefully.

But I have one idea for the resharding rebalance: in real production, some keys are sometimes very hot for reading and writing, and if these kinds of hot keys are located in the same shard or the same slot, the cluster should have the ability to migrate these hot keys to a cold slot or shard.

madolson commented on July 2, 2024

You will have to come back and restart the operation using the cli.

This assumes the CLI errors out instead of issuing the command again :) You are right that if the error happens where the CLI is running, you have to issue the command again. The same would also be true of whatever system is polling for the failure.

zuiderkwast commented on July 2, 2024

Current management with the cli is a hassle. valkey-cli --cluster fix is best-effort and it makes some wild guesses. We could improve the cli, but I want the command sequence to be simple, so it can be done from other tools without being error prone. Ideally just calling one command to migrate a slot.

I do want to have some self-healing built into the server/cluster itself. Leaving a migration halfway done is stable, but not good if it goes undetected. Resuming a slot migration becomes a non-issue, though, with atomic slot migration.

I'm skeptical of the queuing mechanism.

Why not make the command blocking? That seems to me like good user experience. Then migrating one by one is just a pipeline.
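
For example, assuming a blocking per-slot command (hypothetical syntax, in the spirit of what's proposed below), a client could pipeline:

    CLUSTER MIGRATE SLOT 500 TO shardid1
    CLUSTER MIGRATE SLOT 501 TO shardid1
    CLUSTER MIGRATE SLOT 502 TO shardid1

and each reply would arrive only once that slot's migration has finished.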

Or we can make the command async and push a (RESP3) message on error or completion.

To migrate slots to multiple destinations at the same time, we could allow syntax like

CLUSTER MIGRATE SLOT 500-599 TO shardid1 SLOT 600-699 TO shardid2
