Comments (7)
There's also the case when an action that returns a reference fails. In that case the subsequent actions that use the reference and have a precondition that the reference must exist in the model, will fail saying that the precondition was false.
from quickcheck-state-machine.
Crashes
If an operation does not complete for some reason (perhaps because it timed out or a critical component crashed) that operation has no completion time, and must, in general, be considered concurrent with every operation after its invocation. It may or may not execute.
A process with an operation is in this state is effectively stuck, and can never invoke another operation again. If it were to invoke another operation, it would violate our single-threaded constraint: processes only do one thing at a time.
Source: https://jepsen.io/consistency
from quickcheck-state-machine.
@stevana my understanding is that this #208 is a prerequisite for Jepsen-like tests right?
from quickcheck-state-machine.
@kderme: No, I don't think so. (It might make some tests easier to write, but shouldn't be necessary.)
from quickcheck-state-machine.
The above PR adds the ability to complete histories. To be able to complete a
history is needed when exceptions are thrown in the semantics,
because of e.g. request timeouts (as will happen with partitions). We
don't want to catch those exceptions because this would lead to a
non-deterministic model, because if a timeout happen for a write-like
command then we have no way to tell if it the request to the node
timed out (database didn't change) or if the response back from
the node timed out (database updated). And thus if we would catch this
exception we would need to update the model with both possibilities,
otherwise later read operations would fail.
What completing history does is that it simply appends a response to the
end of the history for all threads/pids/workers that crashed/threw an
exception. In the case of a write-like command the response is a simple
"Ack". Now because of the way of how linearisability works, it will try
all possible interleavings of this write-like command with all the later
read-like commands and check if there's a possible interleaving that is
consistent. So the linearisability checker handles/hides the
non-determinism of the model!
What's still not clear to me is: can we find bugs without completing, e.g.
maybe with partitions and some (simpler) version of accounting for this
non-determinism in the model? Or somehow controlling the fault injection in a more
precise way so that we know exactly if the timeout happened to or from
the node, and thus know if the database was updated or not?
from quickcheck-state-machine.
@kderme is currently working on adding an example which uses rqlite (a distributed version of sqlite that uses Raft for consensus), hopefully this can serve as a test bed for experimenting with distributed systems and fault injection. See the following work-in-progress branch.
As a first experiment, the idea is to try to trigger a stale read in the weak read consistency mode by either stopping and restarting nodes or by causing partitions (perhaps using blockade).
from quickcheck-state-machine.
What's still not clear to me is: can we find bugs without completing, e.g.
maybe with partitions and some (simpler) version of accounting for this
non-determinism in the model? [...]
I think the answer is yes and that there's a trade-off here:
- Simple model that doesn't account for faults, needs completion, will be slow
- Complex model that accounts for faults, doesn't need completion, will be fast(er)
I've also learned why Jepsen doesn't have a completion function, if an operation crashes it advances the model purely from the request. This isn't possible in our case as our transition function also involves the response.
Lets make things a bit more concrete with an example, consider a simple counter that starts at 0
and can be increment
ed:
thread 1, request: increment
thread 2, request: increment
thread 1, response: ok
thread 2, response: timeout
At this point the value of the counter could be 1
or 2
depending on if the request on the second thread timed out while going to the server (the counter didn't get updated), or if the response going back to the client timed out (the counter got updated). Let's continue execution:
thread 1, request: read
thread 3, request: read
thread 1, response: read -> 1
thread 3, response: read -> 2
This seems weird, how can read return two different values, without an increment
happening in between? Remember that crashing operations (e.g. the timeout on the second thread) are concurrent with all operations after it, so the effect of the increment
could happen between the responses of the read
s and thus making the history linearise.
Also note that the second thread cannot be used as that could break the "single-threaded constraint: processes only do one thing at a time", as per the comment above.
from quickcheck-state-machine.
Related Issues (20)
- `forAllCommands` generating fewer commands than expected HOT 3
- Exception while generating shrink-list
- simplify: impossible, because of the structure of linearise. HOT 7
- Use dot for visualization of counter examples HOT 13
- simplify: impossible, because of the structure of linearise HOT 2
- shrinking optimizations HOT 5
- Property Result in case of exceptions HOT 8
- Fix CI HOT 15
- Improve garbage-collection/cleaning-up HOT 20
- Add pretty print for CounterExample and improve test output HOT 1
- Use Dot drawing for sequential programs
- Convenient way for manually writing examples HOT 5
- Drop lts-11 and add lts-14 to Travis CI HOT 5
- Fix CI HOT 2
- New Release? HOT 3
- Use a generator to init models HOT 4
- Mock mismatch with semantics HOT 1
- Event as defined in Labelling is misleading HOT 1
- Read instance for Commands HOT 2
- Lower bound on text in latest revision HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from quickcheck-state-machine.