Sure, understandable and understood.
I just wanted to give a sign of life. I will describe the "incomplete record migration" in detail and in a reproducible way. It is also totally fine if this edge case does not block the 3.13.0 release and is improved gradually in follow-up patch versions (either by me or by others).
from rabbitmq-server.
The more I think about the problem, the more I believe we shouldn't use the new ID format. It duplicates data for no benefit. And while looking into this, I also discovered that the new ID format is already out there for shovels in 3.12.x and 3.11.x as you said.
We have put so much effort over the past few years into making sure the upgrade to Khepri is smooth (and there are still rough edges, as we can see) that I refuse to add code to federation to "fix" something that shouldn't be there in the first place.
The only acceptable solution to me is to find something to convert new IDs back to their original format in the shovel plugin.
@gomoripeti can't we do exactly what we did to Shovels? It sounds like the same problem in a different place to me:
@michaelklishin indeed, this is exactly the same problem. But it is not as bad here, because the ID format change was not backported for federation, so all released versions (except release candidates) still use the old ID format. That allows a simpler solution (enabled by #10096): "Revert to the old ID format if Khepri is disabled." I prefer this one because it took about 3 patch releases to get shovels right.
@gomoripeti nonetheless, we did settle on a certain solution for Shovel. Mnesia ("Khepri is not enabled") will go extinct in the land of RabbitMQ in a single digit number of months.
The hard part is the upgrade path, and it cannot be avoided.
Sorry for the delayed response; I finally had time to spend on this topic.
I agree that the upgrade path is hard. I see it like this:
3.12 Mnesia -(step 1.)-> 3.13 Mnesia -(step 2.)-> 3.13 Khepri
- 3.12 Mnesia only supports the old child ID format
- 3.13 Mnesia only supported the new child ID format, and there is no conversion in step 1. (Edit: after a lot of testing and trial and error, I changed my mind about the fix and am submitting a PR that follows the solution implemented for shovels.)
- 3.13 Khepri only supports the new child ID format
In step 2, a record conversion from the old child ID format to the new one was implemented in #10096.
However, during testing I found that this conversion is not complete: the supervisor state (in the process) still holds the old child IDs, and this inconsistency leads to various failures after migrating to Khepri.
I will open a separate issue to report this. Unfortunately, based on previous weeks, I cannot commit to working on a fix for this.
Going straight from 3.12 to 3.13 with Khepri is possible in theory but very unlikely for production systems, and we can document this fact.
I'm not sure what you mean by "based on previous weeks".
Federation links are not that different from dynamic Shovels:
- They are started at node boot
- They have IDs and attach to a mirrored supervisor
- They store some of their state in the schema data store
Why specifically can't the solution we have for shovels work here?
Also, can you be more specific than "the conversion is not complete"? Claims like that do not help us ship 3.13 any sooner, so either we do it in a way that would be problematic for CloudAMQP, or CloudAMQP will miss out on this RabbitMQ version, then 4.0, and all the fundamental improvements that come with not using Mnesia at all in RabbitMQ.
@gomoripeti I will be blunt: we need more specifics here and either decide to address it, or 3.13.0 will ship as is, and it will be up to the companies that host RabbitMQ as a service to find a solution.
3.13 cannot be delayed for a few more months.
I opened #10440 to describe the "incomplete record migration" issue during the Khepri migration.
This issue could track only the 3.12 -> "3.13 Mnesia" upgrade part, which should be addressed by PR #10416.
The 3.12 -> 3.13 Mnesia migration should be handled by #10453. See #10440 for future work related to Khepri and Federation, Shovel supervisor child IDs.
Just to clarify a few points, which should make a solution easier to find:
> Mnesia ("Khepri is not enabled") will go extinct in the land of RabbitMQ in a single digit number of months.
Very unlikely, as it would be removed in RabbitMQ 4.1.0 at the earliest.
> Going straight from 3.12 to 3.13 with Khepri is possible in theory
No, it's not possible: you have to upgrade, then enable khepri_db. To enable khepri_db, all nodes in the cluster must be running and know about khepri_db, like other feature flags.
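The precondition above can be modeled as a simple check. This is a hedged Python sketch, not RabbitMQ's feature-flag code: the Node type and can_enable_feature_flag are hypothetical, and a 3.12 node is represented simply as a node that does not know khepri_db. That is why a cluster still containing 3.12 nodes cannot enable the flag, and why "3.12 straight to 3.13 with Khepri" is not a valid path.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """Simplified cluster member: is it running, and which flags does it know."""
    name: str
    running: bool
    known_flags: frozenset = field(default_factory=frozenset)


def can_enable_feature_flag(flag: str, cluster: list[Node]) -> bool:
    """Return True only if every node is running and knows about `flag`.

    Models the rule stated in the thread; an empty cluster is treated as
    unable to enable anything.
    """
    return bool(cluster) and all(n.running and flag in n.known_flags for n in cluster)
```

In a real cluster, once all nodes run 3.13, the flag is enabled with rabbitmqctl enable_feature_flag khepri_db.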
I'm going to look at other issues and pull requests linked in this one.
I have a local prototype that approaches the problem differently: instead of changing the ID format and breaking running supervision trees and causing issues during upgrade, the ID is left alone but is converted to a Khepri-compatible path only when we need it.
This relies on the fact that the Group argument is always a module AFAICT, even though mirrored_supervisor documents that it can be any term.
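The "leave the ID alone, derive the path only when needed" idea can be sketched as a pair of pure functions. This is a Python illustration only (the prototype itself is Erlang); the ROOT path component, the function names, and the use of repr() as the term encoding are assumptions, and "Group is a module" is approximated by requiring an identifier string.

```python
import ast

# Hypothetical root path component; not taken from RabbitMQ's source.
ROOT = "mirrored_supervisor_childspec"


def child_id_to_path(group, child_id):
    """Derive a store path from (Group, ChildId) only at the storage boundary.

    The running supervisor keeps child_id unchanged; only code that reads or
    writes the store calls this. Assumes Group is always a module name.
    """
    if not (isinstance(group, str) and group.isidentifier()):
        raise ValueError(f"group must be a module name, got {group!r}")
    # repr() stands in for a real, deterministic term encoding (assumption).
    return (ROOT, group, repr(child_id))


def path_to_child_id(path):
    """Recover the original child ID from a path built by child_id_to_path."""
    root, _group, encoded = path
    if root != ROOT:
        raise ValueError(f"unexpected root {root!r}")
    return ast.literal_eval(encoded)
```

Because the child ID itself never changes format, a live supervision tree is untouched by an upgrade; only the mapping to a store path is new, and it is reversible.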
Diana is currently on holiday, so I can't run the idea by her for now. I will prepare a more complete patch and publish it on GitHub so you can take a look.
Here is the branch:
https://github.com/rabbitmq/rabbitmq-server/tree/rework-mirrored_supervisor-child-id
I tested it lightly so far.
What do you think @gomoripeti?
Edit: I changed the branch above. It includes the commit from the previously mentioned branch, plus other commits that address the whole problem for both federation and shovels, and a test case that reproduces the scenario at the beginning of this issue.
> the ID is left alone but is converted to a Khepri-compatible path only when we need it
👍 👍 👍
I am re-opening based on @dumbbell's comment
Thanks all for looking at this.
- On the administrative side, this issue addressed only the 3.13-on-Mnesia case, and a PR from Michael addressing it was merged. That's why this issue was closed. #10440 tracks the Khepri path problem and related ones.
- Regarding "the ID is left alone": this is fine for federation, but the shovel plugin has the same issue, and there the new ID format was already released in 3.12 and even 3.11. So there might be deployments out there that already have running shovel workers with both the old and the new child ID format. We would then have to figure out how to convert the new ID format back to the old one.
- Given the current situation, where versions supporting both the old and the new child ID format on Mnesia are already merged for federation and already released for shovels, what do you all think about this solution:
  Restart all mirrored supervisor workers at the start of the Khepri migration (from the init_copy_to_khepri callback), before the copying of records starts. This could rely either on the assumption that group is a module or on a registration mechanism that Diana implemented.
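The proposed ordering can be illustrated with a toy supervisor and record store. Everything here is hypothetical: to_new_id, MirroredSupervisorModel, copy_records and migrate are illustrative names, the ID conversion is a placeholder rather than the real format change, and "restart" is reduced to re-deriving each worker's child ID. The model shows why converting only the stored records leaves the supervisor holding IDs with no matching record, and why restarting workers before the copy avoids that.

```python
def to_new_id(old_id):
    """Hypothetical old -> new child ID conversion: prefix the tuple with a tag."""
    return ("v2",) + tuple(old_id)


class MirroredSupervisorModel:
    """Toy stand-in for a mirrored supervisor's in-memory state."""

    def __init__(self, child_ids):
        # Child IDs of the currently running workers.
        self.children = set(child_ids)

    def restart_all(self, convert):
        """Stop every worker and start it again under a converted child ID."""
        self.children = {convert(cid) for cid in self.children}


def copy_records(records, convert):
    """Copy persisted child records to the new store, converting their keys."""
    return {convert(cid): spec for cid, spec in records.items()}


def migrate(sup, records, restart_first):
    """Run the record copy, optionally restarting all workers beforehand.

    Returns the new store and the child IDs the supervisor still holds that
    have no matching record in it (the inconsistency described in the thread).
    """
    if restart_first:
        sup.restart_all(to_new_id)
    new_store = copy_records(records, to_new_id)
    orphans = sup.children - set(new_store)
    return new_store, orphans
```

With restart_first=False every running child is left orphaned; with restart_first=True the supervisor state and the copied records agree.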
The work in progress to address that upgrade problem is tracked in pull request #10472.
Fixed by #10472.