Comments (7)

applike-ss commented on June 14, 2024

> @applike-ss can you please check whether this package https://github.com/dragonflydb/dragonfly/pkgs/container/dragonfly-weekly/191456274?tag=e8650ed2b4ebd550c966751dd33ebb1ac4f82b1f-ubuntu solves the issue?

> (it's built from #2731)

It looks to be running now with the Redis Cluster implementation of the Lettuce library.

@chakaz is right that the node IDs of the slaves are still the same. It doesn't seem to be an issue for us at the moment, though.

chakaz commented on June 14, 2024

@applike-ss hey, thanks for the bug report!

(1) Can you confirm that you run Dragonfly in emulated cluster mode (--cluster_mode=emulated), as opposed to a real cluster?
(2) Can you maybe create minimal repro instructions? I don't know Flink or Lettuce, I'm afraid.

chakaz commented on June 14, 2024

Looking more closely, I see that somehow Dragonfly has 2 replicas with the same node ID (c23888188f5d350b552aa8d8aa7ad40a05765b26). It is exceedingly unlikely that these are actually 2 distinct replicas, so I wonder: did something bad happen during that time, like a replica changing its IP address for some reason?
Can you please try and see if this is the reason for the error (i.e. try running your app with a single replica in a good state)?

chakaz commented on June 14, 2024

Ok, I think I got it. It looks like Redis replies with just 0A (\n) between lines, while Dragonfly replies with 0D0A (\r\n):

Dragonfly:

00000000: 3530 6134 3139 3933 3030 6564 3837 3263  50a4199300ed872c
00000010: 3066 3434 6563 3765 3639 3333 6661 3430  0f44ec7e6933fa40
00000020: 3961 3261 3831 6538 2031 3237 2e30 2e30  9a2a81e8 127.0.0
00000030: 2e31 3a36 3337 3940 3633 3739 206d 7973  .1:6379@6379 mys
00000040: 656c 662c 6d61 7374 6572 202d 2030 2030  elf,master - 0 0
00000050: 2030 2063 6f6e 6e65 6374 6564 2030 2d31   0 connected 0-1
00000060: 3633 3833 0d0a                           6383..

Redis:

00000000: 3230 6337 6535 3939 6432 3665 6638 3133  20c7e599d26ef813
00000010: 3932 3861 3061 6138 6438 3865 3464 6262  928a0aa8d88e4dbb
00000020: 3838 3334 3730 3961 2031 3237 2e30 2e30  8834709a 127.0.0
00000030: 2e31 3a37 3030 3240 3137 3030 3220 6d61  .1:7002@17002 ma
00000040: 7374 6572 202d 2030 2031 3731 3034 3435  ster - 0 1710445
00000050: 3733 3237 3136 2033 2063 6f6e 6e65 6374  732716 3 connect
00000060: 6564 2031 3039 3233 2d31 3633 3833 0a    ed 10923-16383.
[...]

We use \r\n in both CLUSTER INFO and CLUSTER NODES, but for some reason Redis replies with \r\n for INFO and with only \n for NODES 🤷

Redis CLUSTER INFO:

00000000: 636c 7573 7465 725f 7374 6174 653a 6f6b  cluster_state:ok
00000010: 0d0a 636c 7573 7465 725f 736c 6f74 735f  ..cluster_slots_
00000020: 6173 7369 676e 6564 3a31 3633 3834 0d0a  assigned:16384..
00000030: 636c 7573 7465 725f 736c 6f74 735f 6f6b  cluster_slots_ok
00000040: 3a31 3633 3834 0d0a 636c 7573 7465 725f  :16384..cluster_
[...]
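
For reference, dumps like these can be captured without any special tooling; the following is a minimal Python sketch (standard library only; host and port are placeholders) that sends an inline CLUSTER NODES command and hex-dumps the raw reply bytes:

import socket

# Send CLUSTER NODES as an inline command and dump the raw reply bytes.
# Note: unlike the dumps above, the reply starts with a RESP bulk-string
# header ($<len>\r\n) before the payload. A single recv() is assumed to
# be enough for a small cluster; a robust client would keep reading
# until the advertised bulk length is consumed.
with socket.create_connection(("127.0.0.1", 6379)) as s:
    s.sendall(b"CLUSTER NODES\r\n")
    reply = s.recv(65536)

print(reply.hex(" "))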

Anyway, Flink/Lettuce is apparently sensitive to that, so we should be compatible.
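
To illustrate the sensitivity (a hypothetical sketch, not Lettuce's actual parsing code): a client that splits the payload strictly on \n keeps a stray \r in the last field of every line:

# Hypothetical strict line parser; the payload is the Dragonfly reply
# from the hex dump above.
payload = (
    "50a4199300ed872c0f44ec7e6933fa409a2a81e8 127.0.0.1:6379@6379 "
    "myself,master - 0 0 0 connected 0-16383\r\n"
)

for line in payload.split("\n"):
    if not line:
        continue
    fields = line.split(" ")
    print(repr(fields[-1]))  # -> '0-16383\r': the slot range keeps a stray \r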

I'm still curious about the 2 nodes with the same ID situation you got there though.

applike-ss commented on June 14, 2024

> @applike-ss hey, thanks for the bug report!
>
> (1) Can you confirm that you run Dragonfly in emulated cluster mode (--cluster_mode=emulated), as opposed to a real cluster? (2) Can you maybe create minimal repro instructions? I don't know Flink or Lettuce, I'm afraid.

Regarding the emulated cluster mode: yes, we do use that in our Dragonfly test setup. Before that we used --cluster_mode=yes, but I didn't see a way to let the operator configure the cluster automatically, so I reverted to emulated.

I confirmed that it is really using it by:

  • checking the args of the pod
  • checking that the error message from our Redis cluster client changed from something like "cluster is not yet setup" to the error given above

The node IDs of the slaves are in fact still all the same. I'm not sure how Dragonfly handles/creates these (e.g. maybe it was intentional, to signal some state such as both replicas being data-wise exactly the same).

I have now also checked six other test clusters I set up yesterday, and every one of them has the same node ID for its two slaves (3-replica setup => 1 master, 2 slaves).
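
For completeness, a small sketch of how the duplicate IDs can be spotted (Python standard library only; host and port are placeholders for one of the test clusters):

import socket
from collections import Counter

# Fetch CLUSTER NODES over a raw connection and count how often each
# node ID appears. A single recv() is assumed to be enough here.
with socket.create_connection(("127.0.0.1", 6379)) as s:
    s.sendall(b"CLUSTER NODES\r\n")
    reply = s.recv(65536).decode()

# Skip the RESP bulk-string header ($<len>) and blank lines; the first
# field of every remaining line is the 40-character node ID.
ids = [line.split()[0]
       for line in reply.splitlines()
       if line.strip() and not line.startswith("$")]
dupes = [i for i, n in Counter(ids).items() if n > 1]
print("duplicate node ids:", dupes or "none")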

I am using the dragonfly-operator (https://raw.githubusercontent.com/dragonflydb/dragonfly-operator/v1.1.1/manifests/dragonfly-operator.yaml), kustomized to use a different namespace and the image docker.dragonflydb.io/dragonflydb/operator:v1.1.1, to create the Dragonfly clusters.

This is an example resource I used to spawn one of the clusters:

apiVersion: dragonflydb.io/v1alpha1
kind: Dragonfly
metadata:
  name: dragonfly-app
spec:
  image: docker.dragonflydb.io/dragonflydb/dragonfly:v1.15.1
  args:
    - '--cache_mode'
    - '--primary_port_http_enabled=true'
    - '--cluster_mode=emulated'
  snapshot:
    cron: '*/5 * * * *'
    persistentVolumeClaimSpec:
      resources:
        requests:
          storage: 1Gi
      accessModes:
        - ReadWriteOnce
  resources:
    limits:
      cpu: 100m
      memory: 320Mi
    requests:
      cpu: 100m
      memory: 320Mi
  replicas: 3
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - topologyKey: kubernetes.io/hostname
          labelSelector:
            matchExpressions:
              - key: app
                operator: In
                values:
                  - dragonfly-app

Let me know if I can provide any additional information that helps track this down.

romange commented on June 14, 2024

@applike-ss can you please check whether this package https://github.com/dragonflydb/dragonfly/pkgs/container/dragonfly-weekly/191456274?tag=e8650ed2b4ebd550c966751dd33ebb1ac4f82b1f-ubuntu solves the issue?

(it's built from #2731)

chakaz commented on June 14, 2024

> Looking more closely, I see that somehow Dragonfly has 2 replicas with the same node ID (c23888188f5d350b552aa8d8aa7ad40a05765b26). It is exceedingly unlikely that these are actually 2 distinct replicas, so I wonder: did something bad happen during that time, like a replica changing its IP address for some reason? Can you please try and see if this is the reason for the error (i.e. try running your app with a single replica in a good state)?

I filed #2734 for that ⬆️ issue.
