Comments (5)
We saw a double-leader situation recently when a ZK server cycled, and we suspect it has something to do with https://issues.apache.org/jira/browse/CURATOR-696. That Curator Jira suggests a bug was introduced by https://issues.apache.org/jira/browse/CURATOR-644 (PR: apache/curator#430).
It seems possible that this did introduce a bug, since that changed the logic from doing `reset()` always on reconnection (which would recreate the ephemeral znode) to doing `getChildren()`, which would look for existing ones, and then only call `reset()` if they could not be found.
We updated to Curator 5.4 some time ago, in #13302. So if this is indeed what’s going on, it has potentially been an issue since Druid 25.
What we saw specifically was this scenario:
- OL 1 was leader prior to ZK connection loss.
- OL 1 reconnected to ZK and got a session id that we believe is a new session id (although we were not able to confirm that).
- OL 1's LeaderLatch recipe checked the latch path and saw an ephemeral znode there that it believed was its own, so it started leadership.
- OL 2, 30s later, checked the latch path and saw no children at all (not even the one for OL 1). It created an ephemeral znode for itself, and started leadership.
We think what happened is that both OLs established new sessions, even though the old sessions hadn’t expired yet. Because the old sessions hadn’t expired yet, the old ephemeral znodes were still there upon reconnection. The old leader, OL 1, saw both old znodes there and assumed it was still leader. But because those znodes were associated with different sessions, they went away in 30s. When OL 2 noticed that, it assumed there was no active leader, so it became one and then we had two leaders.
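To make the suspected sequence concrete, here is a minimal sketch in plain Java. It does not use Curator or ZooKeeper: `ToyZk`, `Participant`, and `simulate` are hypothetical names invented for illustration, and the model is a simplification (no watches on the leader side, session expiry triggered by hand). It only contrasts "check for our znode first" with "always `reset()`" on reconnection, under the assumption that both OLs really did get new sessions while the old ephemeral znodes still existed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model only. Nothing here is Curator's or ZooKeeper's real API.
class DoubleLeaderSketch {

    // Stand-in for the latch path: ephemeral znode name -> owning session id.
    static final class ToyZk {
        private final TreeMap<String, Long> ephemerals = new TreeMap<>();
        private int seq = 0;

        String createEphemeralSequential(long sessionId) {
            String name = String.format("latch-%010d", seq++);
            ephemerals.put(name, sessionId);
            return name;
        }

        void delete(String name) {
            ephemerals.remove(name);
        }

        List<String> getChildren() {
            return new ArrayList<>(ephemerals.keySet()); // sorted by sequence
        }

        // Expiring a session removes every ephemeral znode it owns.
        void expireSession(long sessionId) {
            ephemerals.values().removeIf(owner -> owner == sessionId);
        }
    }

    static final class Participant {
        long sessionId;
        String ourPath;
        boolean leader;

        Participant(long sessionId) {
            this.sessionId = sessionId;
        }

        // Drop our old znode (if any) and create a fresh one under the current session.
        void reset(ToyZk zk) {
            if (ourPath != null) {
                zk.delete(ourPath);
            }
            ourPath = zk.createEphemeralSequential(sessionId);
            checkLeadership(zk);
        }

        // Leader is whoever owns the lowest-sequence child.
        void checkLeadership(ToyZk zk) {
            List<String> children = zk.getChildren();
            leader = !children.isEmpty() && children.get(0).equals(ourPath);
        }

        // "Check first" behaviour: only reset() if our znode is missing.
        void recheck(ToyZk zk) {
            if (zk.getChildren().contains(ourPath)) {
                checkLeadership(zk);
            } else {
                reset(zk);
            }
        }

        void onReconnect(ToyZk zk, long newSessionId, boolean alwaysReset) {
            sessionId = newSessionId;
            if (alwaysReset) {
                reset(zk);
            } else {
                recheck(zk);
            }
        }
    }

    // Returns how many participants believe they are leader at the end.
    static int simulate(boolean alwaysReset) {
        ToyZk zk = new ToyZk();
        Participant ol1 = new Participant(1L);
        Participant ol2 = new Participant(2L);
        ol1.reset(zk); // OL 1 becomes leader
        ol2.reset(zk); // OL 2 queues behind it

        // Both reconnect with NEW sessions while the old znodes still exist.
        ol1.onReconnect(zk, 3L, alwaysReset);
        ol2.onReconnect(zk, 4L, alwaysReset);

        // Later, the old sessions expire and their znodes disappear.
        zk.expireSession(1L);
        zk.expireSession(2L);

        // Only non-leaders re-evaluate; the stale leader is not notified in this model.
        int leaders = 0;
        for (Participant p : List.of(ol1, ol2)) {
            if (!p.leader) {
                p.recheck(zk);
            }
            if (p.leader) {
                leaders++;
            }
        }
        return leaders;
    }

    public static void main(String[] args) {
        System.out.println("check-first leaders:  " + simulate(false));
        System.out.println("always-reset leaders: " + simulate(true));
    }
}
```

In this toy model the check-first path ends with two participants that both believe they are leader, while the always-reset path ends with one. That matches the hypothesis above, but it is not evidence about what Curator actually did in our cluster.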
from druid.
I commented on CURATOR-696 linking back here.
@cryptoe can we re-open this issue since #16425 was reverted in #16445?
Another observation is that this condition occurred during a ZK leader election change.
@gianm Curator 5.7.0 includes the fix for https://issues.apache.org/jira/browse/CURATOR-696. I'm unsure when this version will be made available, but have asked here.