Comments (23)

rhishikeshj avatar rhishikeshj commented on July 24, 2024

@aphyr And what do you know, just as I went to run the tests again hoping to send you a stack trace, they worked! 🍻 🙂
I will try running them again to see if there is some instability. Other than that, if you see any obvious steps that I have missed, do let me know.
I'll paste the SSH-related exceptions here as soon as I encounter them :)

from mongodb.

rhishikeshj avatar rhishikeshj commented on July 24, 2024

Here are some issues I faced while getting this MongoDB Jepsen suite running locally with Docker. The code I am using:

Jepsen : commit a2bcad59f0df5bd39cea1e61d9b64376c479df9c (HEAD -> main)
MongoDB : commit 83548bb8e054170ecc4b8fda70390e40fcca5e30 (origin/master, origin/HEAD)

Initially I had an issue of not having enough nodes (by default Jepsen starts 5 nodes in Docker), as is evident from the function jepsen.mongodb.db/shard-node-plan. I fixed that by adding 2 more nodes.
Then I hit another roadblock: while installing MongoDB on each node, it errored out saying that a required dependency, specifically libcurl3, couldn't be found. Apparently libcurl4 and libcurl3 don't work well together, and in spite of my efforts I wasn't able to get libcurl3 and Mongo running. So I changed the way Jepsen was installing MongoDB and followed the official documentation, which installs Mongo 4.2. That worked.
But now I am still unable to run the tests, as every time there seems to be some SSH-related exception saying the control node can't reach the DB nodes.

I changed the installation instructions for MongoDB since the default instructions in setup! were erroring out due to a libcurl3 dependency. These are the instructions I have coded into setup! instead:

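(defn install!
  "Installs MongoDB on the current node."
  [test]
  (c/su
   (c/exec :mkdir :-p "/tmp/jepsen")
   (let [version (:version test)
         ; Release series ("4.2"), derived from a hard-coded 4.2.10;
         ; selects the signing key and the apt repo below
         m-version (str/join "." (butlast (str/split "4.2.10" #"\.")))
         ; Builds a pinned package spec, e.g. mongodb-org=<version>
         versioner #(keyword (str "mongodb-" %1 "=" version))]
     ; Recover from any half-configured dpkg state left on the node
     (c/exec :dpkg :--configure :-a)
     (c/exec :apt :-y :--fix-broken :install)
     ; gnupg is required by apt-key; -y avoids an interactive prompt
     (c/exec :apt-get :install :-y :gnupg)
     ; Import the MongoDB signing key for this release series
     (c/exec :wget :-qO :-
             (str "https://www.mongodb.org/static/pgp/server-" m-version ".asc")
             :| :apt-key :add :-)
     ; Register the MongoDB apt repository
     (c/exec :echo
             (str "deb [ arch=amd64,arm64 ] https://repo.mongodb.org/apt/ubuntu bionic/mongodb-org/"
                  m-version " multiverse")
             :| :tee (str "/etc/apt/sources.list.d/mongodb-org-" m-version ".list"))
     (c/exec :apt-get :update)
     ; Install the server, shell, and mongos packages at the pinned version
     (c/exec :apt-get :install :-y
             (versioner "org")
             (versioner "org-server")
             (versioner "org-shell")
             (versioner "org-mongos"))
     (c/exec :systemctl :daemon-reload))))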

Some of the code here, for example the --fix-broken stuff, is for fixing a weird state that my Debian nodes were getting into. Please ignore it.
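
For reference, the version handling in the let block above boils down to the following. This is a minimal standalone sketch: the names major-minor and package are made up for illustration and are not part of Jepsen or of the test suite.

```clojure
(require '[clojure.string :as str])

(defn major-minor
  "Returns the release series of a three-part version string,
  e.g. \"4.2.10\" -> \"4.2\". Assumes the input has three components."
  [version]
  (str/join "." (butlast (str/split version #"\."))))

(defn package
  "Builds a pinned apt package spec such as \"mongodb-org-server=4.2.10\"."
  [suffix version]
  (str "mongodb-" suffix "=" version))

(println (major-minor "4.2.10"))          ; prints 4.2
(println (package "org-server" "4.2.10")) ; prints mongodb-org-server=4.2.10
```

Note that major-minor simply drops the last component, so a two-part input like "4.2" would come out as "4".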

aphyr avatar aphyr commented on July 24, 2024

Huh, okay... I can say that the test is designed for a specific version of debian--it's been a while since I poked my head into the docker and mongo tests, but this miiiight be due to a mismatch between those versions? The libcurl transition has been a real bear: some systems need 3, some 4, etc. etc.

rhishikeshj avatar rhishikeshj commented on July 24, 2024

If this change (the change in setup! to install MongoDB) works well, can I open a PR to submit it? What other kinds of tests do you require before taking contributions? Are there any guides or other instructions for code contributions?

aphyr avatar aphyr commented on July 24, 2024

I think it'd be good to figure out what version of Debian worked before, and what version it works with now, and to document that in the README, for starters! I do apologize, this was a rush job in my free time, and I wasn't as diligent about future-proofing things as I should have been!

rhishikeshj avatar rhishikeshj commented on July 24, 2024

So running it 5 times caused 1 instance of the test suite crashing:

com.mongodb.MongoSocketOpenException: Exception opening socket
        at com.mongodb.internal.connection.SocketStream.open(SocketStream.java:70) ~[mongodb-driver-core-4.0.2.jar:na]
        at com.mongodb.internal.connection.InternalStreamConnection.open(InternalStreamConnection.java:127) ~[mongodb-driver-core-4.0.2.jar:na]
        at com.mongodb.internal.connection.DefaultServerMonitor$ServerMonitorRunnable.run(DefaultServerMonitor.java:131) ~[mongodb-driver-core-4.0.2.jar:na]
        at java.base/java.lang.Thread.run(Thread.java:834) ~[na:na]
Caused by: java.net.ConnectException: Connection refused (Connection refused)
        at java.base/java.net.PlainSocketImpl.socketConnect(Native Method) ~[na:na]
        at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399) ~[na:na]
        at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242) ~[na:na]
        at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224) ~[na:na]
        at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:403) ~[na:na]
        at java.base/java.net.Socket.connect(Socket.java:609) ~[na:na]
        at com.mongodb.internal.connection.SocketStreamHelper.initialize(SocketStreamHelper.java:63) ~[mongodb-driver-core-4.0.2.jar:na]
        at com.mongodb.internal.connection.SocketStream.initializeSocket(SocketStream.java:79) ~[mongodb-driver-core-4.0.2.jar:na]
        at com.mongodb.internal.connection.SocketStream.open(SocketStream.java:65) ~[mongodb-driver-core-4.0.2.jar:na]
        ... 3 common frames omitted
WARN [2020-12-03 16:40:00,246] main - jepsen.core Test crashed!
java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException: Cannot write jepsen.control$session$fn__3025@54bb1068 as tag null
        at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122) ~[na:na]
        at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191) ~[na:na]
        at clojure.core$deref_future.invokeStatic(core.clj:2300) ~[clojure-1.10.0.jar:na]
        at clojure.core$future_call$reify__8439.deref(core.clj:6974) ~[clojure-1.10.0.jar:na]
        at clojure.core$deref.invokeStatic(core.clj:2320) ~[clojure-1.10.0.jar:na]
        at clojure.core$deref.invoke(core.clj:2306) ~[clojure-1.10.0.jar:na]
        at clojure.core$map$fn__5851.invoke(core.clj:2753) ~[clojure-1.10.0.jar:na]
        at clojure.lang.LazySeq.sval(LazySeq.java:42) ~[clojure-1.10.0.jar:na]
        at clojure.lang.LazySeq.seq(LazySeq.java:51) ~[clojure-1.10.0.jar:na]
        at clojure.lang.RT.seq(RT.java:531) ~[clojure-1.10.0.jar:na]
        at clojure.core$seq__5387.invokeStatic(core.clj:137) ~[clojure-1.10.0.jar:na]
        at clojure.core$dorun.invokeStatic(core.clj:3133) ~[clojure-1.10.0.jar:na]
        at clojure.core$dorun.invoke(core.clj:3133) ~[clojure-1.10.0.jar:na]
        at jepsen.store$save_1_BANG_.invokeStatic(store.clj:376) ~[jepsen-0.1.19.jar:na]
        at jepsen.store$save_1_BANG_.invoke(store.clj:372) ~[jepsen-0.1.19.jar:na]
        at jepsen.core$run_BANG_$fn__10005$fn__10012.invoke(core.clj:633) ~[jepsen-0.1.19.jar:na]
        at jepsen.core$run_BANG_$fn__10005.invoke(core.clj:619) ~[jepsen-0.1.19.jar:na]
        at jepsen.core$run_BANG_.invokeStatic(core.clj:605) ~[jepsen-0.1.19.jar:na]
        at jepsen.core$run_BANG_.invoke(core.clj:531) ~[jepsen-0.1.19.jar:na]
        at jepsen.cli$test_all_run_tests_BANG_$fn__10790.invoke(cli.clj:422) ~[jepsen-0.1.19.jar:na]
        at clojure.core$map_indexed$mapi__8533$fn__8534.invoke(core.clj:7308) ~[clojure-1.10.0.jar:na]
        at clojure.lang.LazySeq.sval(LazySeq.java:42) ~[clojure-1.10.0.jar:na]

I think the exceptions I was seeing earlier were of a similar nature

aphyr avatar aphyr commented on July 24, 2024

Ah, well that looks like there's a problem in the MongoDB setup process--it's not accepting connections. Likely a race condition between the code and MongoDB itself, if it's sporadic. Maybe there needs to be some additional health checks during db/setup!...

rhishikeshj avatar rhishikeshj commented on July 24, 2024

From the Dockerfile, I can see that the Docker image is based on this Debian Docker image: https://github.com/jgoerzen/docker-debian-base-standard

I am not sure I understand what you mean by "what version of Debian worked before, and what version it works with now". Do you mean which MongoDB Debian distribution the code is pulling from, i.e. https://repo.mongodb.org/apt/debian/dists/stretch/mongodb-org/4.2/main/binary-amd64/ ?

If I can help with updating the README, do let me know; I can do that.

rhishikeshj avatar rhishikeshj commented on July 24, 2024

> Ah, well that looks like there's a problem in the MongoDB setup process--it's not accepting connections. Likely a race condition between the code and MongoDB itself, if it's sporadic. Maybe there needs to be some additional health checks during db/setup!...

Do you mean something like

echo 'db.runCommand("ping").ok' | mongo localhost:27017/test --quiet

to check whether the mongo service is up and running?

aphyr avatar aphyr commented on July 24, 2024

More that I'm not sure whether this ever worked with the Docker setup, and if you're having problems, it might be because this version of Jepsen and the version of Mongo it installs were intended to run on, say, Jessie, when the Docker env is giving you, say, Bullseye. I honestly forget, so much has happened this year. I'd love to go dig into this for you but I am scrambling to keep up with waaaay too much client stuff right now!

aphyr avatar aphyr commented on July 24, 2024

> echo 'db.runCommand("ping").ok' | mongo localhost:27017/test --quiet

Maybe. I think the current code probably does its own health checks already... lemme check. Ah, yes, here it is:

; Wait for all nodes to be reachable
(.close (client/await-open node port))
(jepsen/synchronize test)

We've got blocking on individual node startup, blocking on cluster join, blocking on elections, blocking on the cluster, blocking on the primary. That is, apparently, not enough blocking! This isn't just you: Mongo's... historically been difficult to set up reliably.
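
To make the idea of blocking on node startup concrete, here is a minimal sketch of a readiness poll that uses only JDK sockets. It is not the client/await-open shown above: port-open? and await-port are hypothetical helper names, and the deadline and interval values are arbitrary assumptions.

```clojure
(import '(java.net InetSocketAddress Socket))

(defn port-open?
  "True if a TCP connection to host:port succeeds within timeout-ms."
  [host port timeout-ms]
  (try
    (with-open [sock (Socket.)]
      (.connect sock (InetSocketAddress. ^String host (int port)) (int timeout-ms))
      true)
    ; Connection refused and connect timeouts are both IOExceptions
    (catch java.io.IOException _ false)))

(defn await-port
  "Polls host:port until it accepts a connection or deadline-ms has elapsed.
  Sleeps interval-ms between attempts. Returns true on success, false on timeout."
  [host port deadline-ms interval-ms]
  (let [give-up (+ (System/currentTimeMillis) deadline-ms)]
    (loop []
      (cond
        (port-open? host port interval-ms) true
        (< give-up (System/currentTimeMillis)) false
        :else (do (Thread/sleep (long interval-ms))
                  (recur))))))

; Hypothetical usage: wait up to 30 s for mongod on node n1
; (await-port "n1" 27017 30000 500)
```

An open TCP port only shows that the process is listening. It says nothing about whether a primary has been elected, so a command-level check like the ping quoted above would be a stronger signal.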

rhishikeshj avatar rhishikeshj commented on July 24, 2024

> More that I'm not sure whether this ever worked with the Docker setup, and if you're having problems, it might be because this version of Jepsen and the version of Mongo it installs were intended to run on, say, Jessie, when the Docker env is giving you, say, Bullseye. I honestly forget, so much has happened this year. I'd love to go dig into this for you but I am scrambling to keep up with waaaay too much client stuff right now!

Aah, that makes sense. FWIW, the Debian version that the current Jepsen main branch sets up is Buster.
If you do come across some pointers on what this was originally supposed to run on, let me know. I can look at the Jepsen code as of the time this MongoDB test suite was initially created. Maybe that can give some pointers on the Debian version?

aphyr avatar aphyr commented on July 24, 2024

Ooof, yeah. Again, I'm sorry. This is a holdover from an older time in Jepsen when Debian versions lasted (compared to the lifetime of a test) forever and were often cross-compatible: we never really established a convention around OS versioning. Now that people are trying to dredge up tests written n years ago (or even 7 months ago!), those assumptions don't always hold.

This is a good reminder to me to write more of that documentation, and start splitting out future jepsen.os.debian/os objects into specific versions.

It looks like this test uses jepsen 0.1.19, which... I think should be using Jessie. Jepsen 0.2.1 transitioned to Buster.

rhishikeshj avatar rhishikeshj commented on July 24, 2024

From this commit, it seems the control node used Ubuntu and the DB nodes used Stretch around the time these Mongo tests were written.
Am I looking at this correctly?

aphyr avatar aphyr commented on July 24, 2024

Oh, yeah, but that doesn't (and I am so sorry, I know this is confusing) mean this test was supposed to work with Docker. The docker directory was contributed by other people--I hadn't used it myself, and its maintainers drifted off to do other things, so it drifted behind. I test primarily using LXC and AWS, and was running Jessie at the time, I think. That's why this test was written for Jessie, and probably won't work with either the old or new docker setups, since they're for Stretch and Buster.

So, I think you've got two options here. One is if you get a Jessie environment going (are the mirrors still around?) you should be able to run the test as-is. The other is using Buster and figuring out how to port the test forward to Buster, which miiight be as simple as bumping the version of jepsen in project.clj to 0.2.1+.
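
For the second option, the dependency bump would look roughly like this in project.clj. This is only a sketch: the project name and version string are placeholders, and the real file has other dependencies and settings that are left out here.

```clojure
; project.clj (sketch; name and version are placeholders)
(defproject jepsen.mongodb "0.1.0-SNAPSHOT"
  ; Previously [jepsen "0.1.19"], which targets Jessie
  :dependencies [[jepsen "0.2.1"]])
```

API changes between 0.1.19 and 0.2.1 may mean the test code itself needs updating after the bump.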

rhishikeshj avatar rhishikeshj commented on July 24, 2024

Okay, I understand now. Thanks.

As regards the 2 options, I would say bringing the tests up to date is more fruitful in the long run. I can give that a crack to see what else needs changing. Right off the bat, I think some code changes might be needed.
Currently mongodb.clj seems to depend on [jepsen.generator.pure :as gen], which isn't there in jepsen/0.2.1.

Strangely, in the source code, I see this namespace mentioned in the docs but only see it used in the dgraph code.
Where does this namespace come from in the latest Jepsen code?

aphyr avatar aphyr commented on July 24, 2024

> Currently mongodb.clj seems to depend on [jepsen.generator.pure :as gen] which isn't there in jepsen/0.2.1

Ah, now THIS I actually have good docs for! https://github.com/jepsen-io/jepsen/releases/tag/0.2.0

aphyr avatar aphyr commented on July 24, 2024

(also be advised there's a bug in 0.2.0 that might affect generators--best to jump straight to 0.2.1 I think)

rhishikeshj avatar rhishikeshj commented on July 24, 2024

Okay, so this morning I seem to be getting the original SSH-related exceptions rather frequently:

WARN [2020-12-04 03:17:54,150] jepsen node n4 - jepsen.control Encountered error with conn [:control "n4"]; reopening
java.lang.InterruptedException: sleep interrupted
        at java.base/java.lang.Thread.sleep(Native Method)
        at clj_ssh.ssh$ssh_exec.invokeStatic(ssh.clj:690)
        at clj_ssh.ssh$ssh_exec.invoke(ssh.clj:670)
        at clj_ssh.ssh$ssh.invokeStatic(ssh.clj:723)
        at clj_ssh.ssh$ssh.invoke(ssh.clj:699)
        at jepsen.control.SSHRemote.execute_BANG_(control.clj:331)
        at jepsen.control$ssh_STAR_$fn__3063.invoke(control.clj:172)
        at jepsen.control$ssh_STAR_.invokeStatic(control.clj:172)
        at jepsen.control$ssh_STAR_.invoke(control.clj:168)
        at jepsen.control$exec_STAR_.invokeStatic(control.clj:194)
        at jepsen.control$exec_STAR_.doInvoke(control.clj:191)
        at clojure.lang.RestFn.applyTo(RestFn.java:137)
        at clojure.core$apply.invokeStatic(core.clj:665)
        at clojure.core$apply.invoke(core.clj:660)
        at jepsen.control$exec.invokeStatic(control.clj:210)
        at jepsen.control$exec.doInvoke(control.clj:204)
        at clojure.lang.RestFn.invoke(RestFn.java:436)
        at jepsen.db$tcpdump$reify__3446.teardown_BANG_(db.clj:112)
        at jepsen.mongodb.db.ShardedDB.teardown_BANG_(db.clj:406)
        at jepsen.db$fn__3273$G__3269__3277.invoke(db.clj:11)
        at jepsen.db$fn__3273$G__3268__3282.invoke(db.clj:11)
        at clojure.core$partial$fn__5824.invoke(core.clj:2625)
        at jepsen.control$on_nodes$fn__3161.invoke(control.clj:430)

This is for node n4, but similar exceptions happen for all nodes.
A simple ssh n4 from the control node seems to work, so there isn't an obvious problem with the Docker cluster.
Any pointers for me to explore here?

aphyr avatar aphyr commented on July 24, 2024

Tsunaou avatar Tsunaou commented on July 24, 2024

> @aphyr And what do you know, just as I went to run the tests again hoping to send you a stack trace, they worked ! 🍻 🙂
> I will try running them again to see if there is some instability. Other than that if you see any obvious steps that I have missed, do let me know.
> I ll paste the SSH related exceptions here as soon as I encounter them :)

Oh bro! It is exciting that you have dealt with the problem of running the MongoDB Jepsen test in docker-compose, even though the test may crash in some situations.
In my previous work, I rented some servers to run this test suite, which was expensive, so I didn't go on.
You have done two things, and only two things, to fix the bug, right?

  1. Adding 2 more nodes
  2. Changing the installation instructions in the setup! function

I am interested in your work, and it would be a help if you could share your config and fixes. Thanks.

aphyr avatar aphyr commented on July 24, 2024

I had a chance to go through the mongo code today and get everything fixed up for the latest Jepsen and Debian Buster.

rhishikeshj avatar rhishikeshj commented on July 24, 2024

@aphyr nice! 😊 Would love to see that happen. Also, I have opened a pull request making some of the changes for Jepsen 0.2.1.
Let me know if that's mergeable.
