Comments (7)
@applike-ss can you please check whether this package https://github.com/dragonflydb/dragonfly/pkgs/container/dragonfly-weekly/191456274?tag=e8650ed2b4ebd550c966751dd33ebb1ac4f82b1f-ubuntu solves the issue?
(it's built from #2731)
It looks to be running now with the redis-cluster implementation of the lettuce library.
@chakaz is right to say that the slave ids are still the same. It doesn't seem to be an issue currently for us though.
from dragonfly.
@applike-ss hey, thanks for the bug report!
(1) Can you confirm that you run Dragonfly in emulated cluster mode? (--cluster_mode=emulated), as opposed to a real cluster
(2) Can you maybe create minimal repro instructions? I don't know flink nor lettuce I'm afraid
from dragonfly.
Looking more closely, I see that somehow Dragonfly has 2 replicas with the same node ID (c23888188f5d350b552aa8d8aa7ad40a05765b26). That is impossibly unlikely to be actual 2 distinct replicas, so I wonder - did something bad happen during that time? Like a replica changed IP address for some reason?
Can you please try and see if this is the reason for the error (i.e. try running your app with a single replica in a good state)?
from dragonfly.
Ok, I think I got it. It looks like Redis replies with just 0A (\n) between lines, while Dragonfly replies with 0D0A (\r\n):
Dragonfly:
00000000: 3530 6134 3139 3933 3030 6564 3837 3263 50a4199300ed872c
00000010: 3066 3434 6563 3765 3639 3333 6661 3430 0f44ec7e6933fa40
00000020: 3961 3261 3831 6538 2031 3237 2e30 2e30 9a2a81e8 127.0.0
00000030: 2e31 3a36 3337 3940 3633 3739 206d 7973 .1:6379@6379 mys
00000040: 656c 662c 6d61 7374 6572 202d 2030 2030 elf,master - 0 0
00000050: 2030 2063 6f6e 6e65 6374 6564 2030 2d31 0 connected 0-1
00000060: 3633 3833 0d0a 6383..
Redis:
00000000: 3230 6337 6535 3939 6432 3665 6638 3133 20c7e599d26ef813
00000010: 3932 3861 3061 6138 6438 3865 3464 6262 928a0aa8d88e4dbb
00000020: 3838 3334 3730 3961 2031 3237 2e30 2e30 8834709a 127.0.0
00000030: 2e31 3a37 3030 3240 3137 3030 3220 6d61 .1:7002@17002 ma
00000040: 7374 6572 202d 2030 2031 3731 3034 3435 ster - 0 1710445
00000050: 3733 3237 3136 2033 2063 6f6e 6e65 6374 732716 3 connect
00000060: 6564 2031 3039 3233 2d31 3633 3833 0a ed 10923-16383.
[...]
We do that both in CLUSTER INFO and in CLUSTER NODES, but for some reason Redis replies with \r\n for INFO but with only \n for NODES 🤷
Redis CLUSTER INFO:
00000000: 636c 7573 7465 725f 7374 6174 653a 6f6b cluster_state:ok
00000010: 0d0a 636c 7573 7465 725f 736c 6f74 735f ..cluster_slots_
00000020: 6173 7369 676e 6564 3a31 3633 3834 0d0a assigned:16384..
00000030: 636c 7573 7465 725f 736c 6f74 735f 6f6b cluster_slots_ok
00000040: 3a31 3633 3834 0d0a 636c 7573 7465 725f :16384..cluster_
[...]
Anyway, for some reason flink/lettuce is probably sensitive to that, so we should be compatible.
I'm still curious about the 2 nodes with the same ID situation you got there though.
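To make the line-ending difference above easy to check, here is a minimal Python sketch that counts CRLF versus bare-LF terminators in a raw reply payload and splits it in a way that tolerates both. The two payloads are transcribed from the hex dumps above; the function names are made up for illustration and are not part of Dragonfly, Redis, or lettuce.

```python
def count_line_endings(payload: bytes) -> dict[str, int]:
    """Count CRLF and bare-LF terminators in a raw reply payload."""
    crlf = payload.count(b"\r\n")
    # Every CRLF also contains one LF, so subtract them to get bare LFs.
    bare_lf = payload.count(b"\n") - crlf
    return {"crlf": crlf, "lf": bare_lf}


def split_node_lines(payload: bytes) -> list[str]:
    """Split a CLUSTER NODES payload into lines, accepting either terminator."""
    return [line for line in payload.decode("ascii").splitlines() if line]


# Payloads transcribed from the hex dumps in this comment.
DRAGONFLY_REPLY = (
    b"50a4199300ed872c0f44ec7e6933fa409a2a81e8 127.0.0.1:6379@6379 "
    b"myself,master - 0 0 0 connected 0-16383\r\n"
)
REDIS_REPLY = (
    b"20c7e599d26ef813928a0aa8d88e4dbb8834709a 127.0.0.1:7002@17002 "
    b"master - 0 1710445732716 3 connected 10923-16383\n"
)

if __name__ == "__main__":
    print("dragonfly:", count_line_endings(DRAGONFLY_REPLY))
    print("redis:    ", count_line_endings(REDIS_REPLY))
```

A client that splits strictly on \n would be left with a trailing \r on the last field of each Dragonfly line, which may be what trips up a stricter parser; that is a guess, not something confirmed in this thread.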
from dragonfly.
@applike-ss hey, thanks for the bug report!
(1) Can you confirm that you run Dragonfly in emulated cluster mode? (--cluster_mode=emulated), as opposed to a real cluster
(2) Can you maybe create minimal repro instructions? I don't know flink nor lettuce I'm afraid
Regarding the emulated cluster mode: yes, we use that in our Dragonfly test setup. Previously we used --cluster_mode=yes, but I didn't see a way to let the operator configure the cluster automatically, so I reverted to emulated.
I also confirmed that it is really in use by:
- checking the args of the pod
- checking the error message from our redis cluster client changing from something like "cluster is not yet setup" to the given error above
The node IDs of the slaves are in fact still identical. I'm not sure how Dragonfly creates or handles these (e.g. maybe it is intentional, to signal some state such as both being exactly the same data-wise).
I have now also checked six other test clusters I set up yesterday, and every one of them has the same node ID for the two slaves (3 replica setup => 1 master, 2 slaves).
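For anyone who wants to repeat this check, a minimal Python sketch that flags repeated node IDs in CLUSTER NODES output could look like the following. The sample text is hypothetical: only the duplicated ID comes from this issue, while the master ID and the IP addresses are placeholders, and the function name is made up.

```python
from collections import Counter


def duplicate_node_ids(cluster_nodes: str) -> dict[str, int]:
    """Return node IDs that appear on more than one CLUSTER NODES line."""
    # The node ID is the first whitespace-separated field of each line.
    ids = [line.split()[0] for line in cluster_nodes.splitlines() if line.strip()]
    return {node_id: n for node_id, n in Counter(ids).items() if n > 1}


# Hypothetical output: placeholder master ID and IPs, duplicated replica ID
# taken from the issue.
MASTER_ID = "a" * 40
REPLICA_ID = "c23888188f5d350b552aa8d8aa7ad40a05765b26"
SAMPLE = (
    f"{MASTER_ID} 10.0.0.1:6379@6379 myself,master - 0 0 0 connected 0-16383\r\n"
    f"{REPLICA_ID} 10.0.0.2:6379@6379 slave {MASTER_ID} 0 0 0 connected\r\n"
    f"{REPLICA_ID} 10.0.0.3:6379@6379 slave {MASTER_ID} 0 0 0 connected\r\n"
)

if __name__ == "__main__":
    print(duplicate_node_ids(SAMPLE))
```

An empty result means every line carries a distinct node ID.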
I am using the dragonfly-operator (https://raw.githubusercontent.com/dragonflydb/dragonfly-operator/v1.1.1/manifests/dragonfly-operator.yaml), kustomized to use a different namespace and with image docker.dragonflydb.io/dragonflydb/operator:v1.1.1, to create the Dragonfly clusters.
This is an example resource I used to spawn one of the clusters:
apiVersion: dragonflydb.io/v1alpha1
kind: Dragonfly
metadata:
  name: dragonfly-app
spec:
  image: docker.dragonflydb.io/dragonflydb/dragonfly:v1.15.1
  args:
    - '--cache_mode'
    - '--primary_port_http_enabled=true'
    - '--cluster_mode=emulated'
  snapshot:
    cron: '*/5 * * * *'
    persistentVolumeClaimSpec:
      resources:
        requests:
          storage: 1Gi
      accessModes:
        - ReadWriteOnce
  resources:
    limits:
      cpu: 100m
      memory: 320Mi
    requests:
      cpu: 100m
      memory: 320Mi
  replicas: 3
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - topologyKey: kubernetes.io/hostname
          labelSelector:
            matchExpressions:
              - key: app
                operator: In
                values:
                  - dragonfly-app
Let me know if I can provide any additional information that helps track this down.
from dragonfly.
@applike-ss can you please check whether this package https://github.com/dragonflydb/dragonfly/pkgs/container/dragonfly-weekly/191456274?tag=e8650ed2b4ebd550c966751dd33ebb1ac4f82b1f-ubuntu solves the issue?
(it's built from #2731)
from dragonfly.
Looking more closely, I see that somehow Dragonfly has 2 replicas with the same node ID (c23888188f5d350b552aa8d8aa7ad40a05765b26). That is impossibly unlikely to be actual 2 distinct replicas, so I wonder - did something bad happen during that time? Like a replica changed IP address for some reason? Can you please try and see if this is the reason for the error (i.e. try running your app with a single replica in a good state)?
I filed #2734 for that ⬆️ issue.
from dragonfly.