
maelstrom's Introduction

Maelstrom

Two images generated by Maelstrom runs of a transactional workload: a Lamport diagram showing requests flowing to transaction coordinators and on to a KV store, and a serialization anomaly consisting of dependency edges forming a cycle between transactions.

Maelstrom is a workbench for learning distributed systems by writing your own. It uses the Jepsen testing library to test toy implementations of distributed systems. Maelstrom provides standardized tests for things like "a commutative set" or "a transactional key-value store", and lets you learn by writing implementations which those test suites can exercise. It's used as a part of a distributed systems workshop by Jepsen.

Maelstrom provides a range of tests for different kinds of distributed systems, built on top of a simple JSON protocol via STDIN and STDOUT. Users write servers in any language. Maelstrom runs those servers, sends them requests, routes messages via a simulated network, and checks that clients observe expected behavior. You want to write Plumtree in Bash? Byzantine Paxos in Intercal? Maelstrom is for you.

Maelstrom's tooling lets users experiment with simulated latency and message loss. Every test includes timeline visualizations of concurrency structure, statistics on messages exchanged, timeseries graphs to understand how latency, availability, and throughput respond to changing conditions, and Lamport diagrams so you can understand exactly how messages flow through your system. Maelstrom's checkers can verify sophisticated safety properties up to strict serializability, and generate intuitive, minimal examples of consistency anomalies.

Maelstrom can help you model how a system responds to different cluster sizes, network topologies, latency distributions, and faults like network partitions. Maelstrom also offers simulated services that you can use to build more complex systems.

It's built for testing toy systems, but don't let that fool you: Maelstrom is reasonably fast and can handle simulated clusters of 25+ nodes. On a 48-way Xeon, it can use 94% of all cores, pushing upwards of 60,000 network messages/sec.

Documentation

The Maelstrom Guide will take you through writing several different types of distributed algorithms using Maelstrom. We begin by setting up Maelstrom and its dependencies, then write our own tiny echo server, and move on to more sophisticated workloads.

There are also several reference documents that may be helpful:

  • Protocol defines Maelstrom's "network" protocol, message structure, RPC semantics, and error handling.
  • Workloads describes the various kinds of workloads that Maelstrom can test, and defines the messages involved in each workload.
  • Understanding Test Results explains how to interpret the various plots, data structures, and log files generated by a test.
  • Services discusses Maelstrom's built-in network services, which you can use as primitives in building more complex systems.

Design Overview

Maelstrom is a Clojure program which runs on the Java Virtual Machine. It uses Jepsen to generate operations, record a history of their results, and analyze what happens.

Writing "real" distributed systems involves a lot of busywork: process management, networking, and message serialization are complex, full of edge cases, and difficult to debug across languages. In addition, running a full cluster of virtual machines connected by a real IP network is tricky for many users. Maelstrom strips these problems away so you can focus on the algorithmic essentials: process state, transitions, and messages.

The "nodes" in a Maelstrom test are plain old binaries written in any language. Nodes read "network" messages as JSON from STDIN, write JSON "network" messages to STDOUT, and do their logging to STDERR. Maelstrom runs those nodes as processes on your local machine, and connects them via a simulated network. Maelstrom runs a collection of simulated network clients which make requests to those nodes, receive responses, and record a history of those operations. At the end of a test run, Maelstrom analyzes that history to identify safety violations.

This allows learners to write their nodes in whatever language they are most comfortable with, without having to worry about discovery, network communication, daemonization, writing their own distributed test harness, and so on. It also means that Maelstrom can perform sophisticated fault injection and trace analysis.

Maelstrom starts in maelstrom.core, which parses CLI options and constructs a test map. It hands that off to Jepsen, which sets up the servers and Maelstrom services via maelstrom.db. Spawning binaries and handling their IO is done in maelstrom.process, and Maelstrom's internal services (e.g. lin-kv) are defined in maelstrom.service. Jepsen then spawns clients depending on the workload (maelstrom.workload.*) and a nemesis (maelstrom.nemesis) to inject faults.

Messages between nodes are routed by maelstrom.net, and logged in maelstrom.net.journal. Clients send requests and parse responses via maelstrom.client, which also defines Maelstrom's RPC protocol.

At the end of the test, Jepsen checks the history using a checker built in maelstrom.core: workload-specific checkers are defined in their respective workload namespaces. Network statistics are computed in maelstrom.net.journal, and Lamport diagrams are generated by maelstrom.net.viz.

maelstrom.doc helps generate documentation based on maelstrom.client's registry of RPC types and workloads.

CLI Options

A full list of options is available by running java -jar maelstrom.jar test --help. The important ones are:

  • --workload NAME: What kind of workload should be run?
  • --bin SOME_BINARY: The program you'd like Maelstrom to spawn instances of
  • --node-count INT: How many instances of the binary should be spawned?

To get more information, use:

  • --log-stderr: Show STDERR output from each node in the Maelstrom log
  • --log-net-send: Log messages as they are sent into the network
  • --log-net-recv: Log messages as they are received by nodes

To make tests more or less aggressive, use:

  • --time-limit SECONDS: How long to run tests for
  • --rate FLOAT: Approximate number of requests per second
  • --concurrency INT: Number of clients to run concurrently. Use 4n for 4 times the number of nodes.
  • --latency MILLIS: Approximate simulated network latency, during normal operations.
  • --latency-dist DIST: What latency distribution should Maelstrom use?
  • --nemesis FAULT_TYPE: A comma-separated list of faults to inject
  • --nemesis-interval SECONDS: How long between nemesis operations, on average
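To make the 4n notation for --concurrency concrete: with --node-count 5, a value of 4n means 4 × 5 = 20 concurrent clients, while a plain integer is used as-is. The Go function below is a hypothetical illustration of that arithmetic only; it is not Maelstrom's own option parser.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// resolveConcurrency interprets a --concurrency value: a plain integer is
// used as-is, while a value ending in "n" is multiplied by the node count.
// Hypothetical helper for illustration only.
func resolveConcurrency(value string, nodeCount int) (int, error) {
	if strings.HasSuffix(value, "n") {
		factor, err := strconv.Atoi(strings.TrimSuffix(value, "n"))
		if err != nil {
			return 0, fmt.Errorf("bad concurrency %q: %w", value, err)
		}
		return factor * nodeCount, nil
	}
	return strconv.Atoi(value)
}

func main() {
	c, err := resolveConcurrency("4n", 5)
	if err != nil {
		panic(err)
	}
	fmt.Println(c) // 20 clients for a 5-node cluster
}
```

Scaling with node count keeps per-node load roughly constant as you vary --node-count.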

For broadcast tests, try

  • --topology TYPE: Controls the shape of the network topology Jepsen offers to nodes

For transactional tests, you can control transaction generation using

  • --max-txn-length INT: The maximum number of operations per transaction
  • --key-count INT: The number of concurrent keys to work with
  • --max-writes-per-key INT: How many unique write operations to generate per key.

SSH options are unused; Maelstrom runs entirely on the local node.

Troubleshooting

Running ./maelstrom complains it's missing maelstrom.jar

You probably cloned this repository or downloaded the source and didn't compile it. Download the compiled release tarball instead; you'll find it on the GitHub release page.

If you want to run directly from source, you'll need the Leiningen build system. Instead of ./maelstrom ..., run lein run ....

Raft node processes still alive after maelstrom run

You may find that node processes Maelstrom starts do not terminate at the end of a run as expected. To address this, make sure that if the program passed as --bin forks off a new process, it also handles that child process's termination, for example by replacing itself with the child via exec.

Example

Before, in bin/raft (leaves the Java process running):

#!/bin/bash

# Forks a new process.
java -jar target/raft.jar

After, in bin/raft (the Java process replaces the shell):

#!/bin/bash

# Replaces the shell without creating a new process.
exec java -jar target/raft.jar

License

Copyright © 2017, 2020–2022 Kyle Kingsbury, Kit Patella, & Jepsen, LLC

Distributed under the Eclipse Public License either version 1.0 or (at your option) any later version.

maelstrom's People

Contributors

aalekhpatel07, aphyr, aroussel-data, avinassh, benbjohnson, bgreenlee, bilalsaad, dependabot[bot], drpacman, dvguruprasad, ekzhang, igorperikov, jemc, llimllib, metanivek, mkcp, nezteb, nrkirby, philippgille, pjambet, rhishikesh-helpshift, rhishikeshj, simbo1905, sitano, stevexuereb, thelortex, thornycrackers, viktaur, vjuranek, ziyaddin

maelstrom's Issues

Injecting Faults - nemesis

I'm experimenting with partitions using the Raft protocol. Can anyone help me with how to induce partial/simplex faults, where only one-way communication is possible?

I see the option --nemesis FAULTS #{}, a comma-separated list of faults to inject.

How is this option used, and what should I put in the comma-separated list? Usually "--nemesis partition" is provided to induce faults.

"Map" with random key ordering would lead to excessive failure in Datomic chapter

Specifically for this https://github.com/jepsen-io/maelstrom/blob/main/doc/05-datomic/02-shared-state.md#database-as-value and I'm mostly just wondering if this should be mentioned in the tutorial. (you might also encounter this issue when doing training :D)

I'm using Rust, and since the built-in HashMap type can reorder keys as it grows, it can cause lots of failures when used blindly in CAS.

For example I'd have more than half failed transactions even for very few operations:

 :stats {:valid? true,
         :count 22,
         :ok-count 7,
         :fail-count 15,

For longer runs it's even worse, so I didn't bother to finish any. Looking at the message diagram leads me to believe this is due to the ordering of HashMap keys:


Luckily there's also a built-in type that is sorted by its keys, and switching to it solved the issue for me. I got results like this (the same with yet another type):

 :stats {:valid? true,
         :count 2045,
         :ok-count 1924,
         :fail-count 121,

This is more in line with the results in the tutorial.
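For comparison, Go's encoding/json sorts map keys when marshaling, so two maps with the same contents serialize to identical bytes regardless of insertion order. The sketch below demonstrates that behavior; it does not verify the reporter's diagnosis that key ordering is what made their CAS operations fail.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// encode marshals a map; encoding/json emits map keys in sorted order.
func encode(m map[string]int) string {
	b, err := json.Marshal(m)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	a := map[string]int{}
	a["x"] = 1
	a["y"] = 2
	a["z"] = 3

	b := map[string]int{}
	b["z"] = 3
	b["y"] = 2
	b["x"] = 1

	fmt.Println(encode(a))              // {"x":1,"y":2,"z":3}
	fmt.Println(encode(a) == encode(b)) // true
}
```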

Would it be possible to SIGINT/SIGTERM processes instead of SIGKILLing them?

It seems that nodes are sent SIGKILL immediately at termination (via Process.destroyForcefully). This makes it a little tricky to use things like in-process sampling profilers, because the process is killed before the profile can be written.

It would be nicer if the node's STDIN were closed (or it received SIGINT) and it were then given a few seconds to quit before being forcibly killed.

It is pretty easy for a language's Maelstrom library to watch for STDIN closure and break out of Run().

Error: Unable to access jarfile lib/maelstrom.jar

Hello, I hope you are doing well. I just started the Gossip Glomers Echo Challenge, but I am running into an error.
I have installed Maelstrom and its prerequisites, but when I run the command ./maelstrom test -w echo --bin ~/go/bin/maelstrom-echo --node-count 1 --time-limit 10, I get the following error: Error: Unable to access jarfile lib/maelstrom.jar

Screenshot:
Screenshot from 2023-02-28 13-35-01

Kafka workload never finishes analyzing

Hello.

I'm attempting to vet my solution for Fly's distributed system challenge #5a but maelstrom seems stuck analyzing: it's been doing so for 45 minutes now and has produced 1.3GB of log output.

It's dumping what looks to be internal state relevant to plotting; you can see it at the tail end of the gist at https://gist.github.com/chronoslynx/6e8ac74b11159ab07d6ad7ca1304e5ef. The log was full of output like

INFO [2023-03-05 11:26:42,178] clojure-agent-send-off-pool-80 - jepsen.tests.kafka :row [:g [:title ok send by process 2
[[:send "9" [2 2]]]] [:text {:x 48, :y 14, :style background: #f4ce90;} 2]]

I've replicated the issue with:

  • maelstrom's 0.2.2 release
  • maelstrom's 0.2.3 release
  • 0.2.3

I can reliably reproduce this issue with my code at https://github.com/chronoslynx/gossip-glomers/blob/main/kafka/main.go

Released JAR doesn't work with Java 11

I'm getting this error when trying to run JAR from releases:

Exception in thread "main" java.lang.ExceptionInInitializerError
[...] here long clj stacktrace [...]
Caused by: java.lang.ClassNotFoundException: javax.xml.bind.DatatypeConverter
        at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:583)
        at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
        at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
        at java.base/java.lang.Class.forName0(Native Method)
        at java.base/java.lang.Class.forName(Class.java:398)
        at clojure.lang.RT.classForName(RT.java:2168)
        at clojure.lang.RT.classForNameNonLoading(RT.java:2181)
        at org.httpkit.server$loading__5569__auto____6381.invoke(server.clj:1)
        at org.httpkit.server__init.load(Unknown Source)
        at org.httpkit.server__init.<clinit>(Unknown Source)
        ... 98 more

According to Stack Overflow, the problem is that this library (JAXB, the javax.xml.bind package) was removed in newer Java versions: in Java 11 it is completely removed from the JDK.

Assert on dest for init_ok message

Hi, I'm trying to use your workbench for a small project and started by implementing the echo service.
I got an assertion failure saying the destination is invalid:
java.lang.AssertionError: Assert failed: Invalid dest for message {:dest "c0", :id 0, :body {:type "init_ok", :msg_id 1, :in_reply_to 1}, :src "n1"}. After several attempts I'm still stuck. In my response I use the src from the init message as the dest.

Here is a log from the run.
Initialisation:

2021-03-10 22:12:50,547{GMT}	INFO	[jepsen test runner] jepsen.core: Test version 8c8d9286e436a5ae9c23e6e8e490520582418823 (plus uncommitted changes)
2021-03-10 22:12:50,549{GMT}	INFO	[jepsen test runner] jepsen.core: Command line:
lein run test -w echo --bin testnode --nodes n1 --time-limit 5 --log-net-send --log-net-recv --log-stderr
2021-03-10 22:12:50,761{GMT}	INFO	[jepsen test runner] jepsen.core: Running test:
{:remote #jepsen.control.SSHRemote{:session nil}
 :log-net-send true
 :node-count nil
 :max-txn-length 4
 :concurrency 1
 :db
 #object[maelstrom.db$db$reify__16747
         "0x4c683777"
         "maelstrom.db$db$reify__16747@4c683777"]
 :max-writes-per-key 16
 :leave-db-running? false
 :name "echo"
 :logging-json? false
 :net-journal #object[clojure.lang.Atom "0x28bf7bea" {:status :ready, :val []}]
 :start-time
 #object[org.joda.time.DateTime "0x44881fcc" "2021-03-10T22:12:50.000+01:00"]
 :nemesis-interval 10
 :net
 #object[maelstrom.net$jepsen_adapter$reify__15560
         "0xe7423bb"
         "maelstrom.net$jepsen_adapter$reify__15560@e7423bb"]
 :client
 #object[maelstrom.workload.echo$client$reify__17341
         "0x270663e2"
         "maelstrom.workload.echo$client$reify__17341@270663e2"]
 :barrier
 #object[java.util.concurrent.CyclicBarrier
         "0x3987bfc8"
         "java.util.concurrent.CyclicBarrier@3987bfc8"]
 :log-stderr true
 :pure-generators true
 :ssh {:dummy? true}
 :rate 5
 :checker
 #object[jepsen.checker$compose$reify__8612
         "0x31084f37"
         "jepsen.checker$compose$reify__8612@31084f37"]
 :argv
 ("test"
  "-w"
  "echo"
  "--bin"
  "testnode"
  "--nodes"
  "n1"
  "--time-limit"
  "5"
  "--log-net-send"
  "--log-net-recv"
  "--log-stderr")
 :nemesis
 (jepsen.nemesis.ReflCompose
  {:fm {:start-partition 0,
        :stop-partition 0,
        :kill 1,
        :start 1,
        :pause 1,
        :resume 1},
   :nemeses [#unprintable "jepsen.nemesis.combined$partition_nemesis$reify__17019@1ececb56"
             #unprintable "jepsen.nemesis.combined$db_nemesis$reify__17000@a43a21e"]})
 :active-histories
 #object[clojure.lang.Atom "0xebe3561" {:status :ready, :val #{}}]
 :nodes ["n1"]
 :test-count 1
 :latency {:mean 0, :dist :constant}
 :bin "testnode"
 :generator
 (jepsen.generator.TimeLimit
  {:limit 5000000000,
   :cutoff nil,
   :gen (jepsen.generator.Any
         {:gens [(jepsen.generator.OnThreads {:f #{:nemesis}, :gen nil})
                 (jepsen.generator.OnThreads
                  {:f #object[clojure.core$complement$fn__5654
                              "0x3343997b"
                              "clojure.core$complement$fn__5654@3343997b"],
                   :gen (jepsen.generator.Stagger
                         {:dt 400000000,
                          :next-time nil,
                          :gen (jepsen.generator.EachThread
                                {:fresh-gen #object[maelstrom.workload.echo$workload$fn__17360
                                                    "0x14b3976"
                                                    "maelstrom.workload.echo$workload$fn__17360@14b3976"],
                                 :gens {}})})})]})})
 :log-net-recv true
 :os
 #object[jepsen.os$reify__2490 "0x79de9f90" "jepsen.os$reify__2490@79de9f90"]
 :time-limit 5
 :workload :echo
 :consistency-models [:strict-serializable]
 :topology :grid}

Log messages:

2021-03-10 22:12:50,800{GMT}	INFO	[jepsen test runner] jepsen.db: Tearing down DB
2021-03-10 22:12:50,813{GMT}	INFO	[jepsen test runner] jepsen.db: Setting up DB
2021-03-10 22:12:50,830{GMT}	INFO	[jepsen node n1] maelstrom.service: Starting services: (lin-kv lin-tso lww-kv seq-kv)
2021-03-10 22:12:50,837{GMT}	INFO	[jepsen node n1] maelstrom.db: Setting up n1
2021-03-10 22:12:50,840{GMT}	INFO	[jepsen node n1] maelstrom.process: launching testnode nil
2021-03-10 22:12:50,877{GMT}	INFO	[jepsen node n1] maelstrom.net: :send {:dest "n1", :body {:type "init", :node_id "n1", :node_ids ["n1"], :msg_id 1}, :src "c0", :id 0}
2021-03-10 22:12:50,884{GMT}	INFO	[n1 stdin] maelstrom.net: :recv {:dest "n1", :body {:type "init", :node_id "n1", :node_ids ["n1"], :msg_id 1}, :src "c0", :id 0}
2021-03-10 22:12:50,918{GMT}	INFO	[n1 stderr] maelstrom.process: 2021-03-10T22:12:50+0100 debug error : Get JSON {"dest":"n1","body":{"type":"init","node_id":"n1","node_ids":["n1"],"msg_id":1},"src":"c0","id":0}
2021-03-10 22:12:50,918{GMT}	INFO	[n1 stderr] maelstrom.process: 2021-03-10T22:12:50+0100 debug error : Response JSON {"dest":"c0","id":0,"body":{"type":"init_ok","msg_id":1,"in_reply_to":1},"src":"n1"}
2021-03-10 22:13:00,943{GMT}	INFO	[jepsen node n1] maelstrom.db: Tearing down n1
2021-03-10 22:13:00,970{GMT}	WARN	[n1 stdout] maelstrom.process: Error!
java.lang.AssertionError: Assert failed: Invalid dest for message {:dest "c0", :id 0, :body {:type "init_ok", :msg_id 1, :in_reply_to 1}, :src "n1"}
(get queues (:dest m))

Suggestion: A better API for `RPC` method in Go

I used the async RPC method in the broadcast challenge. One of the issues I faced was keeping track of the status of calls.

I wanted to maintain a list of messages I had sent. If the node had sent the message successfully, I would delete it from the list. If not, I would retry again.

Once I sent a message, I wanted to know whether I had gotten a response or whether the call had failed. The RPC method assigns an auto-incremented message ID but does not return it to the caller; only the reply carries that ID (as in_reply_to). So it was painful to manage this.

Here is what I did:

  1. Generate a random ID and add this to HandlerFunc's state (the callback method)
  2. Send the message, save the ID in my local state
  3. When the callback is called, remove the ID from the state
  4. After X minutes, if my ID is still in the state, then resend the message

Sample code:

// this function gives a stateful HandlerFunc 
// which stores the msgId with it
//
// when this method is called, then it knows for which
// message it was called
rpcHandler := func(msgId string) maelstrom.HandlerFunc {
	return func(msg maelstrom.Message) error {
		// do something with the msgId
		// store.Remove(msgId)
		return nil
	}
}

// generate a msgId for this RPC call
msgId, _ := uuid.NewRandom()
// store this msgId in a map
// store.Save(msgId)
if err := n.RPC(nodId, msgBody, rpcHandler(msgId.String())); err != nil {
	return false
}

We can improve this a great deal by making the RPC method return the msgId of the message it sent. The change in the Go library is minimal, and for the user the code would look something like the following:

// the handler function does not require to be stateful anymore
someHandlerFunc := func(msg maelstrom.Message) error {
	var body map[string]any
	if err := json.Unmarshal(msg.Body, &body); err != nil {
		return err
	}
	// body: map[in_reply_to:1 type:broadcast_ok]
	if msgId, ok := body["in_reply_to"]; ok {
		// do something with the msgId
		// store.Remove(msgId)
	}
	return nil
}


msgId, err := n.RPC(nodId, msgBody, someHandlerFunc)
if err != nil {
	return false
}
// msgId is still in scope here, so it can be recorded in a map
// store.Save(msgId)

The change in the library:

- func (n *Node) RPC(dest string, body any, handler HandlerFunc) error { ... }
+ func (n *Node) RPC(dest string, body any, handler HandlerFunc) (int, error) { ... }

It is a breaking change, but the developer experience this change provides makes it worth it. Are you open to considering such a change?

seq-kv reads never return final state for most recent write/cas

I'm currently working on the Gossip Glomers g-counter challenge, in which you implement a grow-only counter using seq-kv. Below is a spoiler of my solution, so consider this a warning for anyone who doesn't want to be spoiled.

The solution I went for is that each node maintains its own sum under its own key in seq-kv, that means that if we have 3 nodes, then we will have 3 keys in seq-kv each storing the sum of its own node. On a background thread, each node repeatedly issues read requests for all 3 keys in seq-kv, and so my thinking was that after enough reads they should eventually get the latest value. This approach is also mentioned as one that should work in the other issue here.

What I am observing is that the most recent write or cas operation that occurs against any key in the seq-kv will never be read by other nodes in the system. I have attached at the bottom of the issue a log of the maelstrom output with --log-net-send turned on to show proof of this happening. If there is a better format to debug this data that you want me to upload, let me know. In this log I ran the g-counter workload against 2 nodes with a rate of 20 and a time limit of 1. The most recent cas/write message that occurs is: {:id 90, :src "n0", :dest "seq-kv", :body {:type "cas", :msg_id 28, :key "n0", :from 10, :to 14}}. After this, n1 sends a read for key "n0" 19 times, all of which return a value of 10 despite the fact that the cas sent from n0 changed it to 14.

I know that, according to the definition of sequential consistency, this is an allowed outcome, but I would have expected reads to eventually return the latest value. Sending no-op CAS operations (e.g. from the current value to the current value) from all the nodes doesn't seem to fix this either. The solution I eventually got working was to set up another background thread on each node that just writes a random number to an unrelated key in seq-kv, which seems to cause the reads to eventually return the latest values for each node.

maelstrom-seq-kv-no-progress.txt

`txn-list-append` workload fails to detect invalid return for read operations

I had a typo in my code which caused every txn_ok response to use "append" as the function for every operation, like this:

INFO [2021-04-13 23:32:29,761] jepsen worker 0 - jepsen.util 0	:invoke	:txn	[[:append 224 6] [:r 226 nil] [:r 227 nil] [:r 226 nil]]
INFO [2021-04-13 23:32:29,761] jepsen worker 0 - jepsen.util 0	:ok	:txn	[[:append 224 6] [:append 226 [1 2 3 4 5 6 7 8 9]] [:append 227 [1 2 3 4 5 6 8 9]] [:append 226 [1 2 3 4 5 6 7 8 9]]]

However in this case maelstrom considers the run valid:

INFO [2021-04-13 23:32:29,998] jepsen worker 1 - jepsen.util 1	:invoke	:txn	[[:r 229 nil] [:r 229 nil]]
INFO [2021-04-13 23:32:29,998] jepsen worker 1 - jepsen.util 1	:ok	:txn	[[:append 229 [1 3 5 6]] [:append 229 [1 3 5 6]]]
INFO [2021-04-13 23:32:30,005] jepsen node n1 - maelstrom.db Tearing down n1
INFO [2021-04-13 23:32:30,005] jepsen node n0 - maelstrom.db Tearing down n0
...
 :workload {:valid? true},
 :valid? true}


Everything looks good! ヽ(‘ー`)ノ

I noticed this when I ran a multi-node test after finishing only the single-node part, so I found it odd that the test passed. After I fixed my typo, Maelstrom correctly reported that my single-node code fails the multi-node test.

It seems this is a deeper issue in Jepsen/Elle? (or intended behaviour?)
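Whatever the checker accepts, a node can catch this class of bug itself by validating each txn_ok body against its request before replying. A minimal Go sketch, assuming micro-ops travel as JSON arrays of [f, key, value] with f being "r" or "append" (the shape behind the logged operations above); checkTxnReply is a hypothetical helper, not part of Maelstrom.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// checkTxnReply verifies that a txn_ok reply mirrors its request: the same
// number of micro-ops, each keeping the request op's function and key.
func checkTxnReply(reqJSON, respJSON string) error {
	var req, resp [][]any
	if err := json.Unmarshal([]byte(reqJSON), &req); err != nil {
		return err
	}
	if err := json.Unmarshal([]byte(respJSON), &resp); err != nil {
		return err
	}
	if len(req) != len(resp) {
		return fmt.Errorf("request has %d ops, reply has %d", len(req), len(resp))
	}
	for i := range req {
		if len(req[i]) != 3 || len(resp[i]) != 3 {
			return fmt.Errorf("op %d is not a [f, key, value] triple", i)
		}
		// DeepEqual compares f and key safely even if a value is malformed.
		if !reflect.DeepEqual(req[i][:2], resp[i][:2]) {
			return fmt.Errorf("op %d: request %v but reply %v", i, req[i], resp[i])
		}
	}
	return nil
}

func main() {
	req := `[["r", 229, null], ["r", 229, null]]`
	bad := `[["append", 229, [1, 3, 5, 6]], ["append", 229, [1, 3, 5, 6]]]`
	good := `[["r", 229, [1, 3, 5, 6]], ["r", 229, [1, 3, 5, 6]]]`
	fmt.Println(checkTxnReply(req, bad))  // reports a mismatch at op 0
	fmt.Println(checkTxnReply(req, good)) // <nil>
}
```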

Running the demo JS script does not work on macOS

Run Command: ./maelstrom test -w echo --bin demo/js/echo.js --time-limit 5

Error:

WARN [2023-12-05 15:35:38,456] jepsen test runner - jepsen.core Test crashed!
java.io.IOException: Cannot run program "/Users/user1/Workspace/maelstrom/demo/js/echo.js" (in directory "/var/folders/x5/jmppkfcx71zd921s1dtytdyw0000gn/T"): error=2, No such file or directory
	at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1143)
	at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1073)
	at maelstrom.process$start_node_BANG_.invokeStatic(process.clj:199)
	at maelstrom.process$start_node_BANG_.invoke(process.clj:168)
	at maelstrom.db$db$reify__16142.setup_BANG_(db.clj:34)
	at jepsen.db$fn__8729$G__8723__8733.invoke(db.clj:12)
	at jepsen.db$fn__8729$G__8722__8738.invoke(db.clj:12)
	at clojure.core$partial$fn__5908.invoke(core.clj:2642)
	at jepsen.control$on_nodes$fn__8599.invoke(control.clj:314)
	at clojure.lang.AFn.applyToHelper(AFn.java:154)
	at clojure.lang.AFn.applyTo(AFn.java:144)
	at clojure.core$apply.invokeStatic(core.clj:667)
	at clojure.core$with_bindings_STAR_.invokeStatic(core.clj:1990)
	at clojure.core$with_bindings_STAR_.doInvoke(core.clj:1990)
	at clojure.lang.RestFn.applyTo(RestFn.java:142)
	at clojure.core$apply.invokeStatic(core.clj:671)
	at clojure.core$bound_fn_STAR_$fn__5818.doInvoke(core.clj:2020)
	at clojure.lang.RestFn.invoke(RestFn.java:408)
	at dom_top.core$real_pmap_helper$build_thread__211$fn__212.invoke(core.clj:163)
	at clojure.lang.AFn.applyToHelper(AFn.java:152)
	at clojure.lang.AFn.applyTo(AFn.java:144)
	at clojure.core$apply.invokeStatic(core.clj:667)
	at clojure.core$with_bindings_STAR_.invokeStatic(core.clj:1990)
	at clojure.core$with_bindings_STAR_.doInvoke(core.clj:1990)
	at clojure.lang.RestFn.invoke(RestFn.java:425)
	at clojure.lang.AFn.applyToHelper(AFn.java:156)
	at clojure.lang.RestFn.applyTo(RestFn.java:132)
	at clojure.core$apply.invokeStatic(core.clj:671)
	at clojure.core$bound_fn_STAR_$fn__5818.doInvoke(core.clj:2020)
	at clojure.lang.RestFn.invoke(RestFn.java:397)
	at clojure.lang.AFn.run(AFn.java:22)
	at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: java.io.IOException: error=2, No such file or directory
	at java.base/java.lang.ProcessImpl.forkAndExec(Native Method)
	at java.base/java.lang.ProcessImpl.<init>(ProcessImpl.java:314)
	at java.base/java.lang.ProcessImpl.start(ProcessImpl.java:244)
	at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1110)
	... 31 common frames omitted
ERROR [2023-12-05 15:35:38,461] main - jepsen.cli Oh jeez, I'm sorry, Jepsen broke. Here's why:

Server Config Estimation for the Following Spec

Hi, I am trying to run 5K nodes:
./maelstrom test -w echo --bin maelstrom-echo --time-limit 300 --node-count 5000 --rate 5000 --concurrency 1n

I was running plain maelstrom-echo and hit resource limits. Is there an estimate of the resources needed to run this many node instances, and what configuration would be required?

Here are the logs and information about the system:

java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
        at java.base/java.lang.Thread.start0(Native Method)
        at java.base/java.lang.Thread.start(Thread.java:798)
        at dom_top.core$real_pmap_helper.invokeStatic(core.clj:186)
        at dom_top.core$real_pmap_helper.invoke(core.clj:150)
        at jepsen.util$real_pmap.invokeStatic(util.clj:76)
        at jepsen.util$real_pmap.invoke(util.clj:69)
        at jepsen.control$on_nodes.invokeStatic(control.clj:311)
        at jepsen.control$on_nodes.invoke(control.clj:299)
        at jepsen.control$on_nodes.invokeStatic(control.clj:304)
        at jepsen.control$on_nodes.invoke(control.clj:299)
        at jepsen.core$run_BANG_$fn__13071$fn__13074.invoke(core.clj:395)
        at jepsen.core$run_BANG_$fn__13071.invoke(core.clj:394)
        at jepsen.core$run_BANG_.invokeStatic(core.clj:392)
        at jepsen.core$run_BANG_.invoke(core.clj:318)
        at jepsen.cli$single_test_cmd$fn__13951.invoke(cli.clj:396)
        at jepsen.cli$run_BANG_.invokeStatic(cli.clj:329)
        at jepsen.cli$run_BANG_.invoke(cli.clj:258)
        at maelstrom.core$_main.invokeStatic(core.clj:269)
        at maelstrom.core$_main.doInvoke(core.clj:267)
        at clojure.lang.RestFn.applyTo(RestFn.java:137)
        at maelstrom.core.main(Unknown Source)

I have modified the maelstrom command as follows:
exec java -Xss256M -Xms54G -Xmx54G -XX:ThreadStackSize=136

System specs:
Ubuntu 20.04
RAM: 60 GB
Cores: 12
Disk space: 400 GB
/proc/sys/kernel# cat threads-max
470681

root@e2e-30-74:~# ulimit -aH
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 235340
max locked memory       (kbytes, -l) 65536
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1048576
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 235340
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Chapter 5 error reply showing up in history as ":unknown"

Is this because I'm populating msg_id in the reply body?

2023-01-10 10:06:45,020{GMT}    INFO    [n0 stderr] maelstrom.process: 2023/01/10 10:06:45 Failed to process txn request {c4 n0 {txn 35 [[append 6 5] [append 11 13] [append 5 2] [r 6 <nil>]]}}: [[append 6 5] [append 11 13] [append 5 2] [r 6 [1 2 3 4 5]]], RPC txn-conflict error (due to service response code 22)

2023-01-10 10:06:45,020{GMT}    INFO    [n0 stderr] maelstrom.process: 2023/01/10 10:06:45 Wrote error reply: {"src":"n0","dest":"c4","body":{"type":"error","msg_id":99,"in_reply_to":35,"code":30,"text":"CAS failed"}}

2023-01-10 10:06:45,022{GMT}    INFO    [jepsen worker 0] jepsen.util: 0        :info   :txn    [[:append 6 5] [:append 11 13] [:append 5 2] [:r 6 nil]]        [:unknown "CAS failed"]

Update Jepsen to 0.3.4 from 0.3.1

Currently Maelstrom uses Jepsen 0.3.1; since then there have been a number of performance and other improvements, up to version 0.3.4.

I tried this locally already, and it seems to work without any noticeable problems.

Python client library

Hi there! Thanks so much for making Maelstrom. I just ran a small workshop at a systems reading group that I organize at Harvard, and we all worked on the Gossip Glomers challenges together on a shared remote computer. It was a lot of fun :)


I wrote a Python client library for Maelstrom (using asyncio) for the workshop: a quick poll showed that most members of our reading group didn't know Go or JavaScript but were still interested in learning about distributed systems. I thought it might be helpful for others who are more familiar with Python as well. Would you be open to a pull request where I add this library to the demo/ folder for others to use?

https://gist.github.com/ekzhang/65310f0260c03d4ec340e0daf0ccb9d5#file-_maelstrom-py

Kafka workflow crashes with NullPointerException

If Maelstrom is given a binary that ignores all the kafka commands, for example this modified demo/clojure/echo.clj, it fails with a NullPointerException rather than a useful error.

For other workflows, e.g. broadcast/g-counter, Maelstrom fails with timeout exceptions that are more understandable.

Is this the correct way to fail for the kafka workflow?

$ lein run -- test -w kafka --bin demo/clojure/lenient_echo.clj # see gist
[...]
INFO [2023-09-13 10:24:40,229] jepsen node n1 - maelstrom.net Shutting down Maelstrom network
INFO [2023-09-13 10:24:40,233] jepsen test runner - jepsen.core Analyzing...
INFO [2023-09-13 10:24:40,294] clojure-agent-send-off-pool-5 - jepsen.tests.kafka Wrote /Users/ndr/coding/maelstrom-src/maelstrom/store/kafka/20230913T102325.079+0100/consume-counts.edn
WARN [2023-09-13 10:24:40,295] clojure-agent-send-off-pool-5 - jepsen.checker Error while checking history:
java.lang.NullPointerException: Cannot invoke "java.lang.Number.doubleValue()" because "x" is null
        at clojure.lang.Numbers.divide(Numbers.java:3899)
        at jepsen.util$nanos__GT_secs.invokeStatic(util.clj:386)
        at jepsen.util$nanos__GT_secs.invoke(util.clj:386)
        at clojure.core$update.invokeStatic(core.clj:6231)
        at clojure.core$update.invoke(core.clj:6223)
        at jepsen.tests.kafka$checker$reify__19272.check(kafka.clj:2080)
        at jepsen.checker$check_safe.invokeStatic(checker.clj:86)
        at jepsen.checker$check_safe.invoke(checker.clj:79)
        at jepsen.checker$compose$reify__10709$fn__10711.invoke(checker.clj:102)
        at clojure.core$pmap$fn__8552$fn__8553.invoke(core.clj:7089)
        at clojure.core$binding_conveyor_fn$fn__5823.invoke(core.clj:2047)
        at clojure.lang.AFn.call(AFn.java:18)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
        at java.base/java.lang.Thread.run(Thread.java:1623)
INFO [2023-09-13 10:24:40,428] jepsen test runner - jepsen.core Analysis complete
INFO [2023-09-13 10:24:40,434] jepsen results - jepsen.store Wrote /Users/ndr/coding/maelstrom-src/maelstrom/store/kafka/20230913T102325.079+0100/results.edn
INFO [2023-09-13 10:24:40,456] jepsen test runner - jepsen.core {:perf {:latency-graph {:valid? true},
        :rate-graph {:valid? true},
        :valid? true},
 :timeline {:valid? true},
 :exceptions {:valid? true},
 :stats {:valid? false,
         :count 67,
         :ok-count 0,
         :fail-count 11,
         :info-count 56,
         :by-f {:assign {:valid? false,
                         :count 11,
                         :ok-count 0,
                         :fail-count 11,
                         :info-count 0},
                :crash {:valid? false,
                        :count 7,
                        :ok-count 0,
                        :fail-count 0,
                        :info-count 7},
                :poll {:valid? false,
                       :count 20,
                       :ok-count 0,
                       :fail-count 0,
                       :info-count 20},
                :send {:valid? false,
                       :count 29,
                       :ok-count 0,
                       :fail-count 0,
                       :info-count 29}}},
 :availability {:valid? true, :ok-fraction 0.0},
 :net {:all {:send-count 70,
             :recv-count 70,
             :msg-count 70,
             :msgs-per-op 1.0447761},
       :clients {:send-count 70, :recv-count 70, :msg-count 70},
       :servers {:send-count 0,
                 :recv-count 0,
                 :msg-count 0,
                 :msgs-per-op 0.0},
       :valid? true},
 :workload {:valid? :unknown,
            :error "java.lang.NullPointerException: Cannot invoke \"java.lang.Number.doubleValue()\" because \"x\" is null\n at clojure.lang.Numbers.divide (Numbers.java:3899)\n    jepsen.util$nanos__GT_secs.invokeStatic (util.clj:386)\n    jepsen.util$nanos__GT_secs.invoke (util.clj:386)\n    clojure.core$update.invokeStatic (core.clj:6231)\n    clojure.core$update.invoke (core.clj:6223)\n    jepsen.tests.kafka$checker$reify__19272.check (kafka.clj:2080)\n    jepsen.checker$check_safe.invokeStatic (checker.clj:86)\n    jepsen.checker$check_safe.invoke (checker.clj:79)\n    jepsen.checker$compose$reify__10709$fn__10711.invoke (checker.clj:102)\n    clojure.core$pmap$fn__8552$fn__8553.invoke (core.clj:7089)\n    clojure.core$binding_conveyor_fn$fn__5823.invoke (core.clj:2047)\n    clojure.lang.AFn.call (AFn.java:18)\n    java.util.concurrent.FutureTask.run (FutureTask.java:317)\n    java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1144)\n    java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:642)\n    java.lang.Thread.run (Thread.java:1623)\n"},
 :valid? false}


Analysis invalid! (ﾉಥ益ಥ）ﾉ ┻━┻

Java 1.8 support

The docs seem to suggest Java 1.8 is supported, but I get errors when attempting to use the pre-built binary with openjdk 1.8.0_362. This stackoverflow post suggests that perhaps the true minimum Java version is Java 11?

$ java -version
openjdk version "1.8.0_362"
OpenJDK Runtime Environment (build 1.8.0_362-b08)
OpenJDK 64-Bit Server VM (build 25.362-b08, mixed mode)
$ ./maelstrom 
Exception in thread "main" java.lang.UnsupportedClassVersionError: ch/qos/logback/classic/spi/LogbackServiceProvider has been compiled by a more recent version of the Java Runtime (class file version 55.0), this version of the Java Runtime only recognizes class file versions up to 52.0
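The error message already encodes the mismatch: a JVM class file stores its major version in bytes 6 and 7 of its header (after the 0xCAFEBABE magic and the minor version), and from Java 5 onward the Java SE release is the major version minus 44, so class file version 55.0 means Java 11 and 52.0 means Java 8. A small Go sketch of that decoding; `classFileMajor` and `javaRelease` are illustrative helpers.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// classFileMajor reads the major version from a class file header, which
// starts with u4 magic (0xCAFEBABE), u2 minor_version, u2 major_version,
// all big-endian.
func classFileMajor(header []byte) (uint16, error) {
	if len(header) < 8 {
		return 0, errors.New("header shorter than 8 bytes")
	}
	if binary.BigEndian.Uint32(header[0:4]) != 0xCAFEBABE {
		return 0, errors.New("missing 0xCAFEBABE magic")
	}
	return binary.BigEndian.Uint16(header[6:8]), nil
}

// javaRelease maps a major version to its Java SE release (valid from Java 5, major 49).
func javaRelease(major uint16) int {
	return int(major) - 44
}

func main() {
	// Header of a class compiled to major version 55 (0x37), like the logback class above.
	header := []byte{0xCA, 0xFE, 0xBA, 0xBE, 0x00, 0x00, 0x00, 0x37}
	major, err := classFileMajor(header)
	if err != nil {
		panic(err)
	}
	fmt.Printf("major %d needs Java %d; Java 8 reads up to major %d\n", major, javaRelease(major), 8+44)
}
```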

Specialized nodes

I was interested in modelling Apache BookKeeper, to compare the experience with modelling it in TLA+. One slight difficulty with BK is that there are 3 node types: metadata_store (zk), bookie (storage node) and bk_client (serving layer where all the consensus logic is). The bk_clients are the only externally visible nodes of the protocol; all requests go to the bk_clients, which in turn interact with the metadata_store and bookies.

Currently Maelstrom treats all nodes the same, which means I will need to proxy messages and use the node ID to determine the node type. When a node initializes, I could ensure that the first node is a metadata_store, the next two are bk_clients, and the rest are bookies. Metadata_store and bookie nodes would then need to proxy Maelstrom messages to bk_client nodes. But this means adding communication between nodes that does not exist in the real protocol, just so that every node can forward messages to the right node.

An alternative is to allow node specialization, or at least to mark which nodes form the public API surface of the protocol, in order to avoid this proxying. For my purposes, simply restricting the public API calls to particular nodes would be enough. Using node ID to determine node type is not an issue for me.
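The node-ID convention proposed above can be sketched in Go as a pure function over the init message's `node_ids` list. This assumes every node receives `node_ids` in the same order; `Role`, `roleFor`, and `bkClients` are hypothetical names, and the layout (one metadata store, two bk_clients, the rest bookies) is the one described in this issue.

```go
package main

import "fmt"

// Role is a hypothetical node type for a BookKeeper model.
type Role string

const (
	MetadataStore Role = "metadata_store"
	BKClient      Role = "bk_client"
	Bookie        Role = "bookie"
)

// roleFor returns the role of nodeID by its index in nodeIDs:
// index 0 is the metadata store, 1 and 2 are bk_clients, the rest bookies.
func roleFor(nodeID string, nodeIDs []string) (Role, bool) {
	for i, id := range nodeIDs {
		if id != nodeID {
			continue
		}
		switch {
		case i == 0:
			return MetadataStore, true
		case i <= 2:
			return BKClient, true
		default:
			return Bookie, true
		}
	}
	return "", false
}

// bkClients lists the nodes that metadata_store and bookie nodes would
// proxy client requests to under this layout.
func bkClients(nodeIDs []string) []string {
	var out []string
	for _, id := range nodeIDs {
		if r, _ := roleFor(id, nodeIDs); r == BKClient {
			out = append(out, id)
		}
	}
	return out
}

func main() {
	ids := []string{"n0", "n1", "n2", "n3", "n4"}
	for _, id := range ids {
		r, _ := roleFor(id, ids)
		fmt.Println(id, r)
	}
	fmt.Println("proxy targets:", bkClients(ids))
}
```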

Fly.io gossip glomers challenge 1 doesn't work with a binary file generated from a js file.

I'm using the following command to attempt the challenge 1:
./maelstrom test -w echo --bin "../node_solutions/echo/maelstrom-echo" node-count 1 --time-limit 10

What I observe is the following error (I've attached the relevant logs, zipped, for your reference: 20240730T215850.374+0530.zip).

clojure.lang.ExceptionInfo: Node n1 crashed with exit status 1. Before crashing, it wrote to STDOUT:



And to STDERR:

node:events:515
      throw er; // Unhandled 'error' event
      ^

Error: listen EADDRINUSE: address already in use :::3001
    at Server.setupListenHandle [as _listen2] (node:net:1422:16)
    at listenInCluster (node:net:1470:12)
    at Server.listen (node:net:1558:7)
    at Function.listen (/snapshot/echo/node_modules/express/lib/application.js:635:24)
    at Object.<anonymous> (/snapshot/echo/out.js)
    at Module._compile (pkg/prelude/bootstrap.js:1926:22)
    at Module._extensions..js (node:internal/modules/cjs/loader:1166:10)
    at Module.load (node:internal/modules/cjs/loader:988:32)
    at Module._load (node:internal/modules/cjs/loader:834:12)
    at Function.runMain (pkg/prelude/bootstrap.js:1979:12)
Emitted 'error' event on Server instance at:
    at emitErrorNT (node:net:1449:8)
    at processTicksAndRejections (node:internal/process/task_queues:82:21)
    at process.runNextTicks [as _tickCallback] (node:internal/process/task_queues:64:3)
    at Function.runMain (pkg/prelude/bootstrap.js:1980:13)
    at node:internal/main/run_main_module:17:47 {
  code: 'EADDRINUSE',
  errno: -48,
  syscall: 'listen',
  address: '::',
  port: 3001
}

Node.js v18.5.0

Full STDERR logs are available in /Users/aca123321/Desktop/Personal/Fly io distriibuted systems challenges/maelstrom/store/echo/20240730T215850.374+0530/node-logs/n1.log
        at slingshot.support$stack_trace.invoke(support.clj:201)
        at maelstrom.process$stop_node_BANG_.invokeStatic(process.clj:239)
        at maelstrom.process$stop_node_BANG_.invoke(process.clj:217)
        at maelstrom.db$db$reify__16142.teardown_BANG_(db.clj:75)
        at jepsen.db$fn__8744$G__8725__8748.invoke(db.clj:12)
        at jepsen.db$fn__8744$G__8724__8753.invoke(db.clj:12)
        at clojure.core$partial$fn__5908.invoke(core.clj:2642)
        at jepsen.control$on_nodes$fn__8599.invoke(control.clj:314)
        at clojure.lang.AFn.applyToHelper(AFn.java:154)
        at clojure.lang.AFn.applyTo(AFn.java:144)
        at clojure.core$apply.invokeStatic(core.clj:667)
        at clojure.core$with_bindings_STAR_.invokeStatic(core.clj:1990)
        at clojure.core$with_bindings_STAR_.doInvoke(core.clj:1990)
        at clojure.lang.RestFn.applyTo(RestFn.java:142)
        at clojure.core$apply.invokeStatic(core.clj:671)
        at clojure.core$bound_fn_STAR_$fn__5818.doInvoke(core.clj:2020)
        at clojure.lang.RestFn.invoke(RestFn.java:408)
        at dom_top.core$real_pmap_helper$build_thread__211$fn__212.invoke(core.clj:163)
        at clojure.lang.AFn.applyToHelper(AFn.java:152)
        at clojure.lang.AFn.applyTo(AFn.java:144)
        at clojure.core$apply.invokeStatic(core.clj:667)
        at clojure.core$with_bindings_STAR_.invokeStatic(core.clj:1990)
        at clojure.core$with_bindings_STAR_.doInvoke(core.clj:1990)
        at clojure.lang.RestFn.invoke(RestFn.java:425)
        at clojure.lang.AFn.applyToHelper(AFn.java:156)
        at clojure.lang.RestFn.applyTo(RestFn.java:132)
        at clojure.core$apply.invokeStatic(core.clj:671)
        at clojure.core$bound_fn_STAR_$fn__5818.doInvoke(core.clj:2020)
        at clojure.lang.RestFn.invoke(RestFn.java:397)
        at clojure.lang.AFn.run(AFn.java:22)
        at java.base/java.lang.Thread.run(Thread.java:1583)

Running test yields `NoSuchFileException`

I'm trying to package maelstrom for myself using Nix but using the packaged version always crashes with the following backtrace:

WARN [2023-04-14 10:39:06,433] jepsen test runner - jepsen.core Test crashed!
java.nio.file.NoSuchFileException: store/echo/20230414T103906.375+0200/test.jepsen
        at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92)
        at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106)
        at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
        at java.base/sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:181)
        at java.base/java.nio.channels.FileChannel.open(FileChannel.java:298)
        at java.base/java.nio.channels.FileChannel.open(FileChannel.java:357)
        at jepsen.store.format$open.invokeStatic(format.clj:407)
        at jepsen.store.format$open.invoke(format.clj:402)
        at jepsen.core$run_BANG_.invokeStatic(core.clj:388)
        at jepsen.core$run_BANG_.invoke(core.clj:318)
        at jepsen.cli$single_test_cmd$fn__13951.invoke(cli.clj:396)
        at jepsen.cli$run_BANG_.invokeStatic(cli.clj:329)
        at jepsen.cli$run_BANG_.invoke(cli.clj:258)
        at maelstrom.core$_main.invokeStatic(core.clj:269)
        at maelstrom.core$_main.doInvoke(core.clj:267)
        at clojure.lang.RestFn.applyTo(RestFn.java:137)
        at maelstrom.core.main(Unknown Source)
ERROR [2023-04-14 10:39:06,434] main - jepsen.cli Oh jeez, I'm sorry, Jepsen broke. Here's why:
java.nio.file.NoSuchFileException: store/echo/20230414T103906.375+0200/test.jepsen
        at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92)
        at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:106)
        at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
        at java.base/sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:181)
        at java.base/java.nio.channels.FileChannel.open(FileChannel.java:298)
        at java.base/java.nio.channels.FileChannel.open(FileChannel.java:357)
        at jepsen.store.format$open.invokeStatic(format.clj:407)
        at jepsen.store.format$open.invoke(format.clj:402)
        at jepsen.core$run_BANG_.invokeStatic(core.clj:388)
        at jepsen.core$run_BANG_.invoke(core.clj:318)
        at jepsen.cli$single_test_cmd$fn__13951.invoke(cli.clj:396)
        at jepsen.cli$run_BANG_.invokeStatic(cli.clj:329)
        at jepsen.cli$run_BANG_.invoke(cli.clj:258)
        at maelstrom.core$_main.invokeStatic(core.clj:269)
        at maelstrom.core$_main.doInvoke(core.clj:267)
        at clojure.lang.RestFn.applyTo(RestFn.java:137)
        at maelstrom.core.main(Unknown Source)

I can't figure out what file is actually missing, nowhere can I find a reference to the given path and strace on a successful run doesn't mention it either.

Packaging maelstrom creates /bin/maelstrom with the following content (Nix store paths have been minimised to what matters):

#!/bin/bash

programDir="/lib"
cd "$programDir"
exec "/bin/java" -jar "$programDir/maelstrom.jar" "$@"

lib/maelstrom.jar is moved to /lib/maelstrom.jar.

Any ideas?

Wonky error when missing newlines between messages

I tried the "echo" challenge from the fly-io site. It's a good idea to have a simple echo thing to sort out the basic setup.

So one program I tried was the following Racket code:

#lang racket/base

(require json)
(require racket/match)
(require racket/port)

(define (main)
  (for ([h (in-producer read-json eof)])
    (match h
      [(hash-table ('dest dest)
                   ('src src)
                   ('body body)
                   (k v) ...)
       (match body
         [(hash-table ('type "echo")
                      ('msg_id msg_id)
                      (bk bv))
          (write-json
           (hash
            'src dest
            'dest src
            'body (hash 'type "echo_ok"
                        'msg_id msg_id
                        'in_reply_to msg_id
                        bk bv)))]
         [(hash-table ('type "init")
                      ('msg_id msg_id)
                      (bk bv) ...)
          (write-json
           (hash 'src dest
                 'dest src
                 'body
                 (hash 'type "init_ok"
                       'in_reply_to msg_id)))]
         [_
          #f])]
      [_
       #f])
    (flush-output)))

(module+ main
  (main))

The above code results in the following error:

java.lang.AssertionError: Assert failed: Invalid dest for message #maelstrom.net.message.Message{:id 1, :src "n0", :dest "c0", :body {:in_reply_to 1, :type "init_ok"}}

I sorta guessed by reading the spec that making my code emit a newline after each message would follow the stated spec better and might just work. And it did! Simply adding (displayln "") before (flush-output) in the above code makes it work and pass the tests.

The way to run the above code is, assuming Racket 8.8 is installed, to generate an executable using raco exe echo.rkt, then running the resulting echo binary with maelstrom.

PS I don't feel too strongly about this thing and wouldn't mind if the issue is closed as unimportant

Maybe it was intentionally left out in the docs but was curious

First of all, the docs and this project are a marvel of engineering. I finished reading the docs this weekend; I'll give it some time and read them again. It reads like a novel; I wish all full-blown books on distributed systems were written like this. Good job, sir!

Was wondering, what is the main usage of this? Do you expect it to be used by those who write and test replicated state machines with this text-based protocol? Or is there a way that I can wire up a db client and test its advertised isolation levels?

Appreciate your hard work,
Good luck!

'Invalid dest' error encountered when implementing echo server init message reply, when newline is not appended

I'm working through chapter 2 in the Guide, implementing the echo server in Go.

I'm getting an Invalid dest error when my echo server sends a reply to the init message.

I'm using the src value from the init message as the dest value in the reply, so this is an unexpected error.

INFO [2022-09-14 10:55:14,768] jepsen node n1 - maelstrom.net Starting Maelstrom network
INFO [2022-09-14 10:55:14,769] jepsen test runner - jepsen.db Tearing down DB
INFO [2022-09-14 10:55:14,770] jepsen test runner - jepsen.db Setting up DB
INFO [2022-09-14 10:55:14,772] jepsen node n1 - maelstrom.service Starting services: (lin-kv lin-tso lww-kv seq-kv)
INFO [2022-09-14 10:55:14,774] jepsen node n1 - maelstrom.db Setting up n1
INFO [2022-09-14 10:55:14,775] jepsen node n1 - maelstrom.process launching ../chapter2-echo/cmd/echo/echo []
INFO [2022-09-14 10:55:14,794] n1 stderr - maelstrom.process Received init message from Maelstrom: {c0 n1 {1 init n1 [n1]}}
INFO [2022-09-14 10:55:14,795] n1 stderr - maelstrom.process Sending init message reply to Maelstrom: {n1 c0 {1 1 init_ok}}
INFO [2022-09-14 10:55:24,794] jepsen node n1 - maelstrom.db Tearing down n1
INFO [2022-09-14 10:55:24,795] n1 stderr - maelstrom.process Starting echo server with node id: n1 and message counter: 1
WARN [2022-09-14 10:55:24,808] n1 stdout - maelstrom.process Error!
java.lang.AssertionError: Assert failed: Invalid dest for message #maelstrom.net.message.Message{:id 1, :src "n1", :dest "c0", :body {:msg_id 1, :in_reply_to 1, :type "init_ok"}}
(get queues (:dest m))

txn-rw-register: non complete response payload validation

Hi,

When working on the txn-rw-register workload I noticed that my responses to the txn operation weren't conformant to the protocol because I was leaving out some of the operations of the transaction. An example:

[2023-04-06T05:34:52.648Z]: "[recv][from:c6][type:txn][msg_id:1] {\"id\":5,\"src\":\"c6\",\"dest\":\"n0\",\"body\":{\"txn\":[[\"r\",8,null],[\"w\",8,1],[\"r\",9,null]],\"type\":\"txn\",\"msg_id\":1}}"
[2023-04-06T05:34:52.648Z]: "[send] {\"dest\":\"c6\",\"src\":\"n0\",\"body\":{\"msg_id\":1,\"type\":\"txn_ok\",\"txn\":[[\"r\",8,null]],\"in_reply_to\":1}}"

In the example above, I replied to message 1 with only the "read" operation. However, Maelstrom reported that all was good at the end of the execution. Is this a bug, or is the checker lenient?

Permission denied when running `--bin` binary

Hello 👋,

I'm checking out the Fly distributed systems challenges, and for the "Echo" demo, after compiling the Go code and trying to run the maelstrom tests, I get this error:

[...]
INFO [2023-02-25 14:58:30,921] jepsen node n0 - maelstrom.net Starting Maelstrom network
INFO [2023-02-25 14:58:30,922] jepsen test runner - jepsen.db Tearing down DB
INFO [2023-02-25 14:58:30,923] jepsen test runner - jepsen.db Setting up DB
INFO [2023-02-25 14:58:30,925] jepsen node n0 - maelstrom.service Starting services: (lin-kv lin-tso lww-kv seq-kv)
INFO [2023-02-25 14:58:30,925] jepsen node n0 - maelstrom.db Setting up n0
INFO [2023-02-25 14:58:30,926] jepsen node n0 - maelstrom.process launching /home/johndoe/path/to/fly-dist-sys/1_echo []
INFO [2023-02-25 14:58:31,930] jepsen node n0 - maelstrom.net Shutting down Maelstrom network
WARN [2023-02-25 14:58:31,933] jepsen test runner - jepsen.core Test crashed!
java.io.IOException: Cannot run program "/home/johndoe/path/to/fly-dist-sys/1_echo" (in directory "/tmp"): error=13, Permission denied
	at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1143)
	at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1073)
	at maelstrom.process$start_node_BANG_.invokeStatic(process.clj:199)
	at maelstrom.process$start_node_BANG_.invoke(process.clj:168)
	at maelstrom.db$db$reify__16142.setup_BANG_(db.clj:34)
	at jepsen.db$fn__8729$G__8723__8733.invoke(db.clj:12)
	at jepsen.db$fn__8729$G__8722__8738.invoke(db.clj:12)
	at clojure.core$partial$fn__5908.invoke(core.clj:2642)
	at jepsen.control$on_nodes$fn__8599.invoke(control.clj:314)
	at clojure.lang.AFn.applyToHelper(AFn.java:154)
	at clojure.lang.AFn.applyTo(AFn.java:144)
	at clojure.core$apply.invokeStatic(core.clj:667)
	at clojure.core$with_bindings_STAR_.invokeStatic(core.clj:1990)
	at clojure.core$with_bindings_STAR_.doInvoke(core.clj:1990)
	at clojure.lang.RestFn.applyTo(RestFn.java:142)
	at clojure.core$apply.invokeStatic(core.clj:671)
	at clojure.core$bound_fn_STAR_$fn__5818.doInvoke(core.clj:2020)
	at clojure.lang.RestFn.invoke(RestFn.java:408)
	at dom_top.core$real_pmap_helper$build_thread__211$fn__212.invoke(core.clj:163)
	at clojure.lang.AFn.applyToHelper(AFn.java:152)
	at clojure.lang.AFn.applyTo(AFn.java:144)
	at clojure.core$apply.invokeStatic(core.clj:667)
	at clojure.core$with_bindings_STAR_.invokeStatic(core.clj:1990)
	at clojure.core$with_bindings_STAR_.doInvoke(core.clj:1990)
	at clojure.lang.RestFn.invoke(RestFn.java:425)
	at clojure.lang.AFn.applyToHelper(AFn.java:156)
	at clojure.lang.RestFn.applyTo(RestFn.java:132)
	at clojure.core$apply.invokeStatic(core.clj:671)
	at clojure.core$bound_fn_STAR_$fn__5818.doInvoke(core.clj:2020)
	at clojure.lang.RestFn.invoke(RestFn.java:397)
	at clojure.lang.AFn.run(AFn.java:22)
	at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.io.IOException: error=13, Permission denied
	at java.base/java.lang.ProcessImpl.forkAndExec(Native Method)
	at java.base/java.lang.ProcessImpl.<init>(ProcessImpl.java:314)
	at java.base/java.lang.ProcessImpl.start(ProcessImpl.java:244)
	at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1110)
	... 31 common frames omitted
ERROR [2023-02-25 14:58:31,938] main - jepsen.cli Oh jeez, I'm sorry, Jepsen broke. Here's why:
java.io.IOException: Cannot run program "/home/johndoe/path/to/fly-dist-sys/1_echo" (in directory "/tmp"): error=13, Permission denied
	at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1143)
	at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1073)
	at maelstrom.process$start_node_BANG_.invokeStatic(process.clj:199)
	at maelstrom.process$start_node_BANG_.invoke(process.clj:168)
	at maelstrom.db$db$reify__16142.setup_BANG_(db.clj:34)
	at jepsen.db$fn__8729$G__8723__8733.invoke(db.clj:12)
	at jepsen.db$fn__8729$G__8722__8738.invoke(db.clj:12)
	at clojure.core$partial$fn__5908.invoke(core.clj:2642)
	at jepsen.control$on_nodes$fn__8599.invoke(control.clj:314)
	at clojure.lang.AFn.applyToHelper(AFn.java:154)
	at clojure.lang.AFn.applyTo(AFn.java:144)
	at clojure.core$apply.invokeStatic(core.clj:667)
	at clojure.core$with_bindings_STAR_.invokeStatic(core.clj:1990)
	at clojure.core$with_bindings_STAR_.doInvoke(core.clj:1990)
	at clojure.lang.RestFn.applyTo(RestFn.java:142)
	at clojure.core$apply.invokeStatic(core.clj:671)
	at clojure.core$bound_fn_STAR_$fn__5818.doInvoke(core.clj:2020)
	at clojure.lang.RestFn.invoke(RestFn.java:408)
	at dom_top.core$real_pmap_helper$build_thread__211$fn__212.invoke(core.clj:163)
	at clojure.lang.AFn.applyToHelper(AFn.java:152)
	at clojure.lang.AFn.applyTo(AFn.java:144)
	at clojure.core$apply.invokeStatic(core.clj:667)
	at clojure.core$with_bindings_STAR_.invokeStatic(core.clj:1990)
	at clojure.core$with_bindings_STAR_.doInvoke(core.clj:1990)
	at clojure.lang.RestFn.invoke(RestFn.java:425)
	at clojure.lang.AFn.applyToHelper(AFn.java:156)
	at clojure.lang.RestFn.applyTo(RestFn.java:132)
	at clojure.core$apply.invokeStatic(core.clj:671)
	at clojure.core$bound_fn_STAR_$fn__5818.doInvoke(core.clj:2020)
	at clojure.lang.RestFn.invoke(RestFn.java:397)
	at clojure.lang.AFn.run(AFn.java:22)
	at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.io.IOException: error=13, Permission denied
	at java.base/java.lang.ProcessImpl.forkAndExec(Native Method)
	at java.base/java.lang.ProcessImpl.<init>(ProcessImpl.java:314)
	at java.base/java.lang.ProcessImpl.start(ProcessImpl.java:244)
	at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1110)
	... 31 common frames omitted

The compiled Go binary, as well as maelstrom belong to johndoe user and group.

I'm running maelstrom like this: ./maelstrom test -w echo --bin ~/path/to/fly-dist-sys/1_echo --node-count 1 --time-limit 10

I'm on Linux (Fedora).

Am I doing anything wrong? Should permissions be set differently?

txn-rw-register workload does not detect g0 write cycles under read-uncommitted consistency

Maelstrom does not detect g0 (write-cycle) for me, like in the following history:

t3:           ["r",0,1],["w",0,10],                               ["r",0,3],["w",0,11]
t2: ["w",0,1],["r",0,1],           ["r",0,10],["w",0,2],["w",0,3],["r",0,3]

So t2 reads the value 3 at operation index 5 (0-indexed), right after writing 2 and 3 on top of t3's write of 10 (a ww-conflict). It's a G0, and Maelstrom ignored it.

$ go build && ~/.../maelstrom test -w txn-rw-register --bin ./maelstrom-txn --node-count 1 --time-limit 20 --rate 10 --concurrency 10n --consistency-models read-uncommitted --max-txn-length 20 --max-writes-per-key 10000000 --key-count 1

2023/03/29 12:11:11 Received {c3 n0 {"txn":[["w",0,1],["r",0,null],["r",0,null],["w",0,2],["w",0,3],["r",0,null],["w",0,4],["r",0,null],["r",0,null],["r",0,null],["w",0,5],["w",0,6],["r",0,null],["w",0,7],["w",0,8],["r",0,null],["w",0,9]],"type":"txn","msg_id":1}}
910670418 + 0 = 1 [#2]  - write key 0, value 1, tx 2
991888336 - 0 = 0 [#2]
2023/03/29 12:11:11 Received {c4 n0 {"txn":[["r",0,null],["w",0,10],["r",0,null],["w",0,11]],"type":"txn","msg_id":1}}
997560527 - 0 = 0 [#3]
8713188 + 0 = 10 [#3] - write key 0, value 10, tx 3 (ww-edge)
17116025 - 0 = 0 [#2]
60262526 + 0 = 2 [#2] - write key 0, value 2, tx 2 (ww-edge - cycle with 3)
77438615 + 0 = 3 [#2] - write key 0, value 3, tx 2 (ww-edge - cycle with 3)
90639108 - 0 = 0 [#2] - read key 0, value 3, tx 2 (wr-edge - cycle with 3)
...
2023/03/29 12:11:12 Sent {"src":"n0","dest":"c4","body":{"in_reply_to":1,"txn":[["r",0,1],["w",0,10],["r",0,3],["w",0,11]],"type":"txn_ok"}}
2023/03/29 12:11:12 Sent {"src":"n0","dest":"c3","body":{"in_reply_to":1,"txn":[["w",0,1],["r",0,1],["r",0,10],["w",0,2],["w",0,3],["r",0,3],["w",0,4],["r",0,11],["r",0,17],["r",0,19],["w",0,5],["w",0,6],["r",0,14],["w",0,7],["w",0,8],["r",0,34],["w",0,9]],"type":"txn_ok"}}

Here we have many cycles between tx 2 and tx 3: w2(1), w3(10), w2(2), w2(3). Maelstrom could detect the G0 from the received responses alone, given that (1) every written value is unique and (2) all writes in those two transactions are covered by reads.
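Once a per-key version order is known, G0 is a cycle made only of write-write edges between transactions. A minimal Go sketch of that final step, taking as given the version order this issue derives for key 0 (1 by t2, then 10 by t3, then 2 and 3 by t2). Inferring that order from reads is the hard part, which Maelstrom delegates to its checker (Elle, as far as we know); this sketch does not attempt it and is not Elle's algorithm. `Write`, `wwEdges`, and `hasCycle` are illustrative names.

```go
package main

import "fmt"

// Write is one version of a key: the value written and the writing txn.
type Write struct {
	Value int
	Txn   string
}

// wwEdges turns a per-key version order (oldest first) into write-write
// edges: the writer of each version precedes the writer of the next one.
// Consecutive versions from the same transaction add no edge.
func wwEdges(order []Write) map[string]map[string]bool {
	g := map[string]map[string]bool{}
	for i := 0; i+1 < len(order); i++ {
		from, to := order[i].Txn, order[i+1].Txn
		if from == to {
			continue
		}
		if g[from] == nil {
			g[from] = map[string]bool{}
		}
		g[from][to] = true
	}
	return g
}

// hasCycle reports whether the directed graph contains a cycle, using
// depth-first search with white/grey/black marking.
func hasCycle(g map[string]map[string]bool) bool {
	const white, grey, black = 0, 1, 2
	color := map[string]int{}
	var visit func(u string) bool
	visit = func(u string) bool {
		color[u] = grey
		for v := range g[u] {
			if color[v] == grey {
				return true // back edge: v is on the current DFS path
			}
			if color[v] == white && visit(v) {
				return true
			}
		}
		color[u] = black
		return false
	}
	for u := range g {
		if color[u] == white && visit(u) {
			return true
		}
	}
	return false
}

func main() {
	key0 := []Write{{1, "t2"}, {10, "t3"}, {2, "t2"}, {3, "t2"}}
	g := wwEdges(key0)
	fmt.Println("edges:", g)
	fmt.Println("G0 cycle:", hasCycle(g))
}
```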

`Invalid dest for message` on teardown

On teardown, Maelstrom raises an AssertionError when one of the nodes attempts to send to a node that has already been torn down.
It seems like this assertion error shouldn't happen, or at least shouldn't be printed.

INFO [2023-04-13 01:55:14,033] jepsen node n2 - maelstrom.db Tearing down n2
WARN [2023-04-13 01:55:14,033] n4 stdout - maelstrom.process Error!
java.lang.AssertionError: Assert failed: Invalid dest for message #maelstrom.net.message.Message{:id 518901, :src "n4", :dest "n1", :body {:msg_id 108978, :type "broadcast", :message 36}}

Trying to set this up, but getting error=13, Permission denied

I downloaded the echo.rb file and I am running:

maelstrom test -w echo --bin echo.rb --time-limit 5
INFO [2023-02-23 18:21:56,703] jepsen node n1 - maelstrom.db Setting up n1
INFO [2023-02-23 18:21:56,703] jepsen node n1 - maelstrom.process launching echo.rb []
INFO [2023-02-23 18:21:57,715] jepsen node n1 - maelstrom.net Shutting down Maelstrom network
WARN [2023-02-23 18:21:57,716] jepsen test runner - jepsen.core Test crashed!
java.io.IOException: Cannot run program "/home/barry/programming/haskell/maelstromHs/echo.rb" (in directory "/tmp"): error=13, Permission denied
	at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1143)
	at java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1073)

I also tried sudo ..., and I also tried executing it directly from /tmp (same error).

I don't get permission denied from, e.g.,

cd /tmp
touch example.file

Not really sure what to do here.

Not clear how to troubleshoot "No matching clause: [:ok :invoke]"

I'm working through the Maelstrom Guide. For better or worse, I decided to implement each exercise in Go rather than just use the Ruby code supplied in the docs. I'm currently implementing the logic from Chapter 5, 'Database As Value'.

I'm invoking maelstrom with these arguments:
./maelstrom test -w txn-list-append --bin "${GOPATH}/bin/datomic" --time-limit 10 --node-count 2 --rate 100 --log-stderr

I'm running into an error in results.edn that I'm not sure how to troubleshoot:
java.lang.IllegalArgumentException: No matching clause: [:ok :invoke]

My first thought was that this indicated an :invoke without a corresponding :ok. I found one missing :ok for each node, with a :net :timeout instead. That seems to match the behavior described in the Database As Value section.

Any suggestions for how to troubleshoot this?

{:perf {:latency-graph {:valid? true},
        :rate-graph {:valid? true},
        :valid? true},
 :timeline {:valid? true},
 :exceptions {:valid? true},
 :stats {:valid? true,
         :count 853,
         :ok-count 851,
         :fail-count 0,
         :info-count 2,
         :by-f {:txn {:valid? true,
                      :count 853,
                      :ok-count 851,
                      :fail-count 0,
                      :info-count 2}}},
 :net {:all {:send-count 5120,
             :recv-count 5120,
             :msg-count 5120,
             :msgs-per-op 6.0023446},
       :clients {:send-count 1708, :recv-count 1708, :msg-count 1708},
       :servers {:send-count 3412,
                 :recv-count 3412,
                 :msg-count 3412,
                 :msgs-per-op 4.0},
       :valid? true},
 :workload {:valid? :unknown,
            :error "java.lang.IllegalArgumentException: No matching clause: [:ok :invoke]\n at elle.list_append$dirty_update_cases$fn__18117$fn__18122.invoke (list_append.clj:404)\n    clojure.lang.PersistentVector.reduce (PersistentVector.java:343)\n    clojure.core$reduce.invokeStatic (core.clj:6829)\n    clojure.core$reduce.invoke (core.clj:6812)\n    elle.list_append$dirty_update_cases$fn__18117.invoke (list_append.clj:397)\n    clojure.core$map$fn__5884.invoke (core.clj:2759)\n    clojure.lang.LazySeq.sval (LazySeq.java:42)\n    clojure.lang.LazySeq.seq (LazySeq.java:51)\n    clojure.lang.RT.seq (RT.java:535)\n    clojure.core$seq__5419.invokeStatic (core.clj:139)\n    clojure.core$apply.invokeStatic (core.clj:662)\n    clojure.core$mapcat.invokeStatic (core.clj:2787)\n    clojure.core$mapcat.doInvoke (core.clj:2787)\n    clojure.lang.RestFn.invoke (RestFn.java:423)\n    elle.list_append$dirty_update_cases.invokeStatic (list_append.clj:395)\n    elle.list_append$dirty_update_cases.invoke (list_append.clj:386)\n    elle.list_append$check.invokeStatic (list_append.clj:883)\n    elle.list_append$check.invoke (list_append.clj:848)\n    jepsen.tests.cycle.append$checker$reify__18396.check (append.clj:19)\n    jepsen.checker$check_safe.invokeStatic (checker.clj:81)\n    jepsen.checker$check_safe.invoke (checker.clj:74)\n    jepsen.checker$compose$reify__9656$fn__9658.invoke (checker.clj:97)\n    clojure.core$pmap$fn__8485$fn__8486.invoke (core.clj:7024)\n    clojure.core$binding_conveyor_fn$fn__5772.invoke (core.clj:2034)\n    clojure.lang.AFn.call (AFn.java:18)\n    java.util.concurrent.FutureTask.run (FutureTask.java:264)\n    java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1136)\n    java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:635)\n    java.lang.Thread.run (Thread.java:833)\n"},
 :valid? :unknown}

Default workload name not being set when not explicitly passed

I found a minor issue: when running Maelstrom against a binary without explicitly setting the workload (-w) option, the test throws a NullPointerException.

It seems the default workload set here

(def test-opt-spec
  "Options for single tests."
  [[nil "--bin FILE"        "Path to binary which runs a node"
    :missing "Expected a --bin PATH_TO_BINARY to test"]

   ["-w" "--workload NAME" "What workload to run."
    :default "lin-kv"
    :parse-fn keyword
    :validate [workloads (cli/one-of workloads)]]
   ])

is not setting the workload-name here:

        workload-name (:workload opts)
        workload      ((workloads workload-name)
                       (assoc opts :nodes nodes, :net net))

Reproduction Steps:

Execute the command: ./maelstrom test --bin demobin

This produces the following error (main details only):

ERROR [2023-09-09 21:22:36,041] main - jepsen.cli Oh jeez, I'm sorry, Jepsen broke. Here's why:
java.lang.NullPointerException: Cannot invoke "clojure.lang.IFn.invoke(Object)" because the return va
        at maelstrom.core$maelstrom_test.invokeStatic(core.clj:62)
        at maelstrom.core$maelstrom_test.invoke(core.clj:53)
        at jepsen.cli$single_test_cmd$fn__13951.invoke(cli.clj:396)
        at jepsen.cli$run_BANG_.invokeStatic(cli.clj:329)
        at jepsen.cli$run_BANG_.invoke(cli.clj:258)
        at maelstrom.core$_main.invokeStatic(core.clj:269)
        at maelstrom.core$_main.doInvoke(core.clj:267)
        at clojure.lang.RestFn.applyTo(RestFn.java:137)
        at maelstrom.core.main(Unknown Source)

Expected Behavior:
The default workload, which is "lin-kv", should be used when the -w option is not explicitly provided. Furthermore, when I run the test with the explicit -w lin-kv argument, the test works. It appears there's an issue with either the default value behavior or the way it's being handled.

We could either fix the default or remove lin-kv as the default setting. I'm happy to contribute a fix through a pull request in whichever direction you'd prefer for this issue.

Compare-and-swap on seq-kv

Hi @aphyr. Someone recently posted a graph of messages against the seq-kv where two separate identity CAS operations succeed:

  • n0 for value 121
  • n1 for value 119

Is this legal under sequential consistency? My understanding from the Services section was that updates could only interact with past states if the total order is not violated.

tl;dr just wondering if this is a bug or if I'm misunderstanding something. Thanks!


Nemesis partition issue?

Hello,

First of all, thanks for all your work.

I'm working on Gossip Glomers, challenge 3c. The main change is the addition of --nemesis partition to test network partitions. When I run it against my solution for 3b, which doesn't handle network partitions, I get "Everything looks good!".

That led me to think that perhaps --nemesis partition isn't working correctly since, again, the test should have failed: my solution doesn't handle network partitions.
