Test monadic programs using state machine based models

License: Other

Haskell 97.97% Makefile 0.09% TeX 1.94%

haskell-library quickcheck haskell testing property-based-testing

quickcheck-state-machine's Introduction

quickcheck-state-machine

quickcheck-state-machine is a Haskell library, based on QuickCheck, for testing stateful programs. The library is different from the Test.QuickCheck.Monadic approach in that it lets the user specify the correctness by means of a state machine based model using pre- and post-conditions. The advantage of the state machine approach is twofold: 1) specifying the correctness of your programs becomes less adhoc, and 2) you get testing for race conditions for free.

The combination of state machine based model specification and property based testing first appeard in Erlang's proprietary QuickCheck. The quickcheck-state-machine library can be seen as an attempt to provide similar functionality to Haskell's QuickCheck library.

Example

As a first example, let's implement and test programs using mutable references. Our implementation will be using IORefs, but let's start with a representation of what actions are possible with programs using mutable references. Our mutable references can be created, read from, written to and incremented:

data Command r
  = Create
  | Read      (Reference (Opaque (IORef Int)) r)
  | Write     (Reference (Opaque (IORef Int)) r) Int
  | Increment (Reference (Opaque (IORef Int)) r)

data Response r
  = Created (Reference (Opaque (IORef Int)) r)
  | ReadValue Int
  | Written
  | Incremented

When we generate actions we won't be able to create arbitrary IORefs, that's why all uses of IORefs are wrapped in Reference _ r, where the parameter r will let us use symbolic references while generating (and concrete ones when executing).

In order to be able to show counterexamples, we need a show instance for our actions. IORefs don't have a show instance, thats why we wrap them in Opaque; which gives a show instance to a type that doesn't have one.

Next, we give the actual implementation of our mutable references. To make things more interesting, we parametrise the semantics by a possible problem.

data Bug = None | Logic | Race
  deriving Eq

semantics :: Bug -> Command Concrete -> IO (Response Concrete)
semantics bug cmd = case cmd of
  Create        -> Created     <$> (reference . Opaque <$> newIORef 0)
  Read ref      -> ReadValue   <$> readIORef  (opaque ref)
  Write ref i   -> Written     <$  writeIORef (opaque ref) i'
    where
    -- One of the problems is a bug that writes a wrong value to the
    -- reference.
      i' | bug == Logic && i `elem` [5..10] = i + 1
         | otherwise                        = i
  Increment ref -> do
    -- The other problem is that we introduce a possible race condition
    -- when incrementing.
    if bug == Race
    then do
      i <- readIORef (opaque ref)
      threadDelay =<< randomRIO (0, 5000)
      writeIORef (opaque ref) (i + 1)
    else
      atomicModifyIORef' (opaque ref) (\i -> (i + 1, ()))
    return Incremented

Note that above r is instantiated to Concrete, which is essentially the identity type, so while writing the semantics we have access to real IORefs.

We now have an implementation, the next step is to define a model for the implementation to be tested against. We'll use a simple map between references and integers as a model.

newtype Model r = Model [(Reference (Opaque (IORef Int)) r, Int)]

initModel :: Model r
initModel = Model []

The pre-condition of an action specifies in what context the action is well-defined. For example, we can always create a new mutable reference, but we can only read from references that already have been created. The pre-conditions are used while generating programs (lists of actions).

precondition :: Model Symbolic -> Command Symbolic -> Logic
precondition (Model m) cmd = case cmd of
  Create        -> Top
  Read  ref     -> ref `member` map fst m
  Write ref _   -> ref `member` map fst m
  Increment ref -> ref `member` map fst m

The transition function explains how actions change the model. Note that the transition function is polymorphic in r. The reason for this is that we use the transition function both while generating and executing.

transition :: Eq1 r => Model r -> Command r -> Response r -> Model r
transition m@(Model model) cmd resp = case (cmd, resp) of
  (Create, Created ref)        -> Model ((ref, 0) : model)
  (Read _, ReadValue _)        -> m
  (Write ref x, Written)       -> Model (update ref x model)
  (Increment ref, Incremented) -> case lookup ref model of
    Just i  -> Model (update ref (succ i) model)

update :: Eq a => a -> b -> [(a, b)] -> [(a, b)]
update ref i m = (ref, i) : filter ((/= ref) . fst) m

Post-conditions are checked after we executed an action and got access to the result.

postcondition :: Model Concrete -> Command Concrete -> Response Concrete -> Logic
postcondition (Model m) cmd resp = case (cmd, resp) of
  (Create,        Created ref) -> m' ! ref .== 0 .// "Create"
    where
      Model m' = transition (Model m) cmd resp
  (Read ref,      ReadValue v)  -> v .== m ! ref .// "Read"
  (Write _ref _x, Written)      -> Top
  (Increment _ref, Incremented) -> Top

Next we have to explain how to generate and shrink actions.

generator :: Model Symbolic -> Maybe (Gen (Command Symbolic))
generator (Model []) = Just (pure Create)
generator model      = Just $ frequency
  [ (1, pure Create)
  , (4, Read  <$> elements (map fst model))
  , (4, Write <$> elements (map fst model) <*> arbitrary)
  , (4, Increment <$> elements (domain model))
  ]

shrinker :: Model Symbolic -> Command Symbolic -> [Command Symbolic]
shrinker _ (Write ref i) = [ Write ref i' | i' <- shrink i ]
shrinker _ _             = []

To stop the generation of new commands, e.g., when the model has reached a terminal or error state, let generator return Nothing.

Finally, we show how to mock responses given a model.

mock :: Model Symbolic -> Command Symbolic -> GenSym (Response Symbolic)
mock (Model m) cmd = case cmd of
  Create      -> Created   <$> genSym
  Read ref    -> ReadValue <$> pure (m ! ref)
  Write _ _   -> pure Written
  Increment _ -> pure Incremented

(mock is a hack to make it possible for responses to have multiple reference, and an experiment which maybe one day will let us create mocked APIs. See issue #236 for further details.)

To be able to fit the code on a line we pack up all of them above into a record.

sm :: Bug -> StateMachine Model Command IO Response
sm bug = StateMachine initModel transition precondition postcondition
           Nothing generator shrinker (semantics bug) mock

We can now define a sequential property as follows.

prop_sequential :: Bug -> Property
prop_sequential bug = forAllCommands sm' Nothing $ \cmds -> monadicIO $ do
  (hist, _model, res) <- runCommands sm' cmds
  prettyCommands sm' hist (checkCommandNames cmds (res === Ok))
    where
      sm' = sm bug

If we run the sequential property without introducing any problems to the semantics function, i.e. quickCheck (prop_sequential None), then the property passes. If we however introduce the logic bug problem, then it will fail with the minimal counterexample:

> quickCheck (prop_sequential Logic)
*** Failed! Falsifiable (after 12 tests and 2 shrinks):
Commands
  { unCommands =
      [ Command Create [ Var 0 ]
      , Command (Write (Reference (Symbolic (Var 0))) 5) []
      , Command (Read (Reference (Symbolic (Var 0)))) []
      ]
  }

Model []

   == Create ==> Created (Reference (Concrete Opaque)) [ 0 ]

Model [+_×_ (Reference Opaque)
          0]

   == Write (Reference (Concrete Opaque)) 5 ==> Written [ 0 ]

Model [_×_ (Reference Opaque)
         -0
         +5]

   == Read (Reference (Concrete Opaque)) ==> ReadValue 6 [ 0 ]

Model [_×_ (Reference Opaque) 5]

PostconditionFailed "AnnotateC \"Read\" (PredicateC (6 :/= 5))" /= Ok

Recall that the bug problem causes the write of values i `elem` [5..10] to actually write i + 1. Also notice how the diff of the model is displayed between each action.

Running the sequential property with the race condition problem will not uncover the race condition.

If we however define a parallel property as follows.

prop_parallel :: Bug -> Property
prop_parallel bug = forAllParallelCommands sm' $ \cmds -> monadicIO $ do
  prettyParallelCommands cmds =<< runParallelCommands sm' cmds
    where
      sm' = sm bug

And run it using the race condition problem, then we'll find the race condition:

> quickCheck (prop_parallel Race)
*** Failed! Falsifiable (after 26 tests and 6 shrinks):
ParallelCommands
  { prefix =
      Commands { unCommands = [ Command Create [ Var 0 ] ] }
  , suffixes =
      [ Pair
          { proj1 =
              Commands
                { unCommands =
                    [ Command (Increment (Reference (Symbolic (Var 0)))) []
                    , Command (Read (Reference (Symbolic (Var 0)))) []
                    ]
                }
          , proj2 =
              Commands
                { unCommands =
                    [ Command (Increment (Reference (Symbolic (Var 0)))) []
                    ]
                }
          }
      ]
  }
┌─────────────────────────────────────────────────────────────────────────────────────────────────┐
│ [Var 0] ← Create                                                                                │
│                                                         → Created (Reference (Concrete Opaque)) │
└─────────────────────────────────────────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────┐ │
│ [] ← Increment (Reference (Concrete Opaque)) │ │
│                                              │ │ ┌──────────────────────────────────────────────┐
│                                              │ │ │ [] ← Increment (Reference (Concrete Opaque)) │
│                                              │ │ │                                → Incremented │
│                                              │ │ └──────────────────────────────────────────────┘
│                                → Incremented │ │
└──────────────────────────────────────────────┘ │
┌──────────────────────────────────────────────┐ │
│ [] ← Read (Reference (Concrete Opaque))      │ │
│                                → ReadValue 1 │ │
└──────────────────────────────────────────────┘ │

AnnotateC "Read" (PredicateC (1 :/= 2))

As we can see above, a mutable reference is first created, and then in parallel (concurrently) we do two increments of said reference, and finally we read the value 1 while the model expects 2.

Recall that incrementing is implemented by first reading the reference and then writing it, if two such actions are interleaved then one of the writes might end up overwriting the other one -- creating the race condition.

We shall come back to this example below, but if your are impatient you can find the full source code here.

How it works

The rough idea is that the user of the library is asked to provide:

a datatype of actions;
a datatype model;
pre- and post-conditions of the actions on the model;
a state transition function that given a model and an action advances the model to its next state;
a way to generate and shrink actions;
semantics for executing the actions.

The library then gives back a bunch of combinators that let you define a sequential and a parallel property.

Sequential property

The sequential property checks if the model is consistent with respect to the semantics. The way this is done is:

generate a list of actions;
starting from the initial model, for each action do the the following:
1. check that the pre-condition holds;
2. if so, execute the action using the semantics;
3. check if the the post-condition holds;
4. advance the model using the transition function.
If something goes wrong, shrink the initial list of actions and present a minimal counterexample.

Parallel property

The parallel property checks if parallel execution of the semantics can be explained in terms of the sequential model. This is useful for trying to find race conditions -- which normally can be tricky to test for. It works as follows:

generate a list of actions that will act as a sequential prefix for the parallel program (think of this as an initialisation bit that setups up some state);
generate two lists of actions that will act as parallel suffixes;
execute the prefix sequentially;
execute the suffixes in parallel and gather the a trace (or history) of invocations and responses of each action;
try to find a possible sequential interleaving of action invocations and responses that respects the post-conditions.

The last step basically tries to find a linearisation of calls that could have happend on a single thread.

More examples

Here are some more examples to get you started:

The water jug problem from Die Hard 3 -- this is a simple example of a specification where we use the sequential property to find a solution (counterexample) to a puzzle from an action movie. Note that this example has no meaningful semantics, we merely model-check. It might be helpful to compare the solution to the Hedgehog solution and the TLA+ solution;
The Tower of Hanoi puzzle -- this example uses property based testing in a very similar manner to the Die Hard example to find a solution to the classic Tower of Hanoi puzzle;
Mutable reference example -- this is a bigger example that shows both how the sequential property can find normal bugs, and how the parallel property can find race conditions;
Circular buffer example -- another example that shows how the sequential property can find help find different kind of bugs. This example is borrowed from the paper Testing the Hard Stuff and Staying Sane [PDF, video]. For a more direct translation from the paper, see the following variant which uses the C FFI;
The union-find example -- an imperative implementation of the union-find algorithm. It could be useful to compare the solution to the one that appears in the paper Testing Monadic Code with QuickCheck [PS], which the Test.QuickCheck.Monadic module is based on;
Ticket dispenser example -- a simple example where the parallel property is used once again to find a race condition. The semantics in this example uses a simple database file that needs to be setup and cleaned up. This example also appears in the Testing a Database for Race Conditions with QuickCheck and Testing the Hard Stuff and Staying Sane [PDF, video] papers;
CRUD webserver where create returns unique ids example -- create, read, update and delete users in a postgres database on a webserver using an API written using Servant. Creating a user will return a unique id, which subsequent reads, updates, and deletes need to use. In this example, unlike in the last one, the server is setup and torn down once per property rather than generate program;
Bookstore example -- another database application, that uses simple SQL queries to manage a bookstore. It is based on a case study in Erlang from online version of Fred Hebert's PropEr Testing book;
Process registry example -- an example often featured in the Erlang QuickCheck papers. This example shows how one can tag the specification with which requirements are covered and then generate (minimal) examples of test cases that cover each requirement, as shown in the How well are your requirements tested? [PDF] and Understanding Formal Specifications through Good Examples [PDF, video] papers.

All properties from the examples can be found in the Spec module located in the test directory. The properties from the examples get tested as part of Travis CI.

To get a better feel for the examples it might be helpful to git clone this repo, cd into it, fire up stack ghci --test, load the different examples, e.g. :l test/CrudWebserverDb.hs, and run the different properties interactively.

Real world examples

More examples from the "real world":

IOHK are using a state machine models in several places. For example here is a test of a mock file system that they in turn use to simulate file system errors when testing a blockchain database. The following blog post describes their tests in more detail;
Wire are using a state machine model to test the lower bound of running threads in their push notification system;
Adjoint's (now abandoned?) implementation of the Raft consensus algorithm, contains state machine tests combined with fault injection (node and network failures).

How to contribute

The quickcheck-state-machine library is still very experimental.

We would like to encourage users to try it out, and join the discussion of how we can improve it on the issue tracker!

License

BSD-style (see the file LICENSE).

quickcheck-state-machine's People

Contributors

Stargazers

Watchers

Forkers

neo4reo lysxia abailly rjmh adetokunbo kderme mrbliss rdanitz abonie adinapoli isaacr24 duog stevana

quickcheck-state-machine's Issues

Fix `TicketDispenser.prop_parallelBad` using machinery from `MutualReference.Prop.prop_shrinkForkMinimal`

See PR #124.

Testing asynchronous APIs

Work in progress.

Consider the following example, a simple server that when sent a number
replies by sending the sum of all numbers it has been sent so far (asynchronously,
say on a message bus).

data Action v :: * -> * where
  Add :: Int -> Action v ()
  Sum :: Action v Int

Model = Int

next m (Add i) = m + i
next m Sum     = m

post m (Add i) _ = True
post m Sum     i = i == m

So far so good, but what is the semantics for Sum?

sem (Add i) = apiCallToEndpoint i server
sem Sum     = ?

It doesn't really make sense, the Sum action is created by the
environment (message bus).

Here's an idea, don't generate Sum actions. Instead let the user
provide a function:

environmentEvents :: TChan History -> IO ()

That has access to the history channel, and can listen to the bus and
add Sum events. It's important that the user adds a InvocationEvent
for Sum immediately after it sees a ResponseEvent for Add on the
channel -- otherwise linearisation might fail, e.g.:

|--- Add 1 ---| |--- Add 2 ---|
                                 |-- Sum 1 --|

The ResponseEvent for Sum should be added when the actual message
appears on the message bus.

One clear downside of this approach is that it exposes an implementation
detail, the history channel, to the user. I'm also not sure what to do
about pids (we can't/shouldn't have multiple pending invocations on the
same pid -- that would break sequentiality of the thread).

Towards liveness and fairness guarantees

From POPL 2018: Reducing Liveness to Safety in First-Order Logic.

Parallel shrinking breaks pre-conditions

See prop_parallelBad in ticket dispenser example.

Possible solution: bring back some kind of static analysis, possible [Untyped cmd ConstIntRef] -> Bool as a pre-condition in the parallel run property. This exposes ConstIntRef to the user though...

Clean up tests

The test target the the main cabal file isn't used, remove?
Run the tests in parallel?

quickcheck-state-machine-0.4.0 test suite failure

The output is very verbose, so I won't include it here, but the end result, as seen on the Stackage server is:

4 out of 17 tests failed (13.74s)

Towards refinement

Refinement is used by the Z/Event-B/TLA+ people as a way to not have to model the whole system in full detail at once. The father of Z/Event-B, Jean-Raymond Abrial, makes the analogy that if you think of a model as the blueprint (in the architecture/engineering sense), then refinement is a way to zoom in and get a more detailed view of the system.

A nice demo of how this plays out in Event-B is given by Abrial in the following video. I've started to port the example from the demo in the example/bank branch in order to better understand it.

I suspect that the necessary bits of the puzzle can be read off from the notion of morphism/(linear) simulation between indexed containers/interaction structures, but I don't see exactly how it all fits together yet.

I'll use this issue for notes. Any thoughts or comments are of course welcome!

Composition of specifications

How can we combine specifications?

See chapter 10 in Lamport's Specifying systems, and Conjoining Specifications for examples, and PR #132 for a start.

Make new type for programs

Instead of having a program be of type [Untyped cmd ref] we could have a new type

data Program cmd ref where
  Done :: Program cmd ref
  Let :: IntRef -> cmd ('Reference ix) ref -> Program cmd ref -> Program cmd ref
  Action :: cmd ('Response t) ref -> Program cmd ref -> Program cmd ref

I'm not yet sure if having two different constructors Let and Action is worth it yet, in case one more often makes a case distinction on the result of returns then it wouldn't be that bad.

Either way the important thing is the IntRef in the Let constructor, so instead of:

Generating the new variables in liftSem
Guess (correctly) the names that will be generated in liftGen
Remap all variable names in the liftShrink

We make liftGen add the names (and they are now really names and not just be a de Bruijn level), and then have:

liftSem just uses that name when it adds things to the environment
liftShrink only cares about removing commands that uses the reference (no need to re-index names)

Remove `Write` and `Copy` from mutable reference example?

As they are not really essential...

Introduce a newtype for `[IntRefed cmd]`

..., and replace showFork :: (a -> String) -> Fork a -> String and showIntRefedList :: (ShowCmd cmd, HasResponse cmd) => [IntRefed cmd] -> String by proper show instances.

Then we could probably get rid forAllShrinkShow also.

Can shrink be typed?

Question raised by @Danten.

Rewrite the union/find example using STRefs rather than IORefs

This doesn't work under the current API.

Make the notion of valid parallel programs less strict

@Danten pointed out that the current way validParallelProgram is implemented doesn't accept programs such as:

x <- new
y <- new
---------------------
write x 0 | read y 
          | delete y

Improve the possibilities of collecting stats

Ideas on how we can improve stats collecting:

Upload a package to Hackage

A couple of things to do before:

Show instances produce invalid expressions.

The Show instance for Program drops Internal and Symbolic constructors from the output, which produces invalid expressions, e.g.

[FillSmall (Var 8), ...]

Now that we pretty print the history of a run, instead of the program, we should fix the above and change the Read instances accordingly.

Add a support for global invariants

It's quite common that we want some invariant to hold through out the execution of a program. To achieve this we currently we need to do something like:

postcondition model act resp =  invariant model && case act of ...

It would be nice if the state machine record had a maybe invariant field which would do this check inside the library, rather than by the user.

Actions that return two references can't be expressed

If we have an action like:

Apa :: Action (Ref1, Ref2)

then the transition function will have access to v (Ref1, Ref2) rather than the much more useful (v Ref1, v Ref2).

Not sure how to solve this. It would be nice if failing semantics, i.e. when transition has access to Either err (v Ref) could also be made a special case of the solution.

Factor out hfunctor and friends into its own package

As per discussion in #161.

Make equality of `Reference` independent of the contents

We have an instance (Eq1 v, Eq a) => Eq (Reference v a) , which requires that the underlying reference type be comparable (Eq a). That happens to be fine with IORef but in general reference types such as mutable vectors do not have such an instance. A workaround is a wrapper that carries a unique identifier for each reference (see here for example), but could qc-state-machine integrate that somehow?

Can we avoid `fixPid` in `liftGenFork`?

Do some profiling and add benchmarking

Avoid having two different state machines (in the generator and in the model)

We probably want to merge StateMachineModel and Generator somehow.This might involve dropping having access to the response in the transition function.

Linearising histories which create references in parallel is broken

As witnessed by running quickCheck $ prop_parallel None:

Couldn't linearise:

Prefix:
     
Parallel:
  1. New --> ()0
     Write (Ref 0) (-1) --> ()
  2. New --> ()0
     Read (Ref 0) --> 0

See #8 for an attempt to solve this problem; the solution isn't satisfactory though, as it requires the model to be a monoid.

Another possible way to solve this might be to use gensymed strings, rather than integers, as references. This will make comparing programs more difficult though (alpha-equivalence).

It might also be possible to be abstract in what references are, and merely provide the user with an interface to write the pre-/post-conditions with. This interface could then be swapped out by the library while linearising to account for which pid is doing the lookup.

Introduce a newtype for pids

Less boilerplate when defining generators

In Erlang QuickCheck generation is specified in two parts, first a weight function that assigns a frequency to a constructor given a model:

weight :: model Symbolic -> String -> Int

Erlang uses an atom for the constructor, above we simply use a string (perhaps this can be improved).

The second part is that we specify how to generate all arguments to an action. Let's introduce a concrete example:

data Action v :: * -> * where
  Create :: Int -> Action v Ref
  Use     :: Reference v Ref -> String -> Action v ()

type Model v = [(Reference v Ref, String)]

Create has an integer as argument and Use has a Reference v Ref and a string. So the specification would be something like:

genCreateArgs :: Model Symbolic -> Gen Int
genCreateArgs _ = arbitrary

genUseArgs :: Model Symbolic -> (Gen (Referenve v Ref), Gen String)
genUseArgs m = (elements (map fst m), arbtrary)

The two are then put together to create:

gen :: Model Symbolic -> Gen (Untyped Action)
gen m = frequency
  [ (weight m "Create", Untyped <$> (Create <$> genCreateArgs m))
  , (weight m "Use"   , Untyped <$> (uncurry Use <$> genUseArgs m))
  ]

This last part can be template Haskelled away using the first two parts.

Thoughts?

Remove excessive newlines from output

Only rerun when shrinking in the parallel property helper

In the parallel property helper we currently run the semantics 10 times.

In Finding race conditions in Erlang with QuickCheck and PULSE its mentioned that they only rerun when shrinking:

Our solution to this is almost embarrassing in its simplicity:
instead of running each test only once, we run it many times, and
consider a test case to have passed only if it passes every time we
run it. We express this concisely using a new form of QuickCheck
property, ?ALWAYS(N,Prop), which passes if Prop passes N times
in a row [4] . Now, provided the race condition we are looking for is
reasonably likely to be provoked by test cases in which it is present,
then ?ALWAYS(10,...) is very likely to provoke it—and so tests
are unlikely to succeed “by chance” during the shrinking process.
This dramatically improves the effectiveness of shrinking, even for
quite small values of N. While we do not always obtain minimal
failing tests with this approach, we find we can usually obtain a
minimal example by running QuickCheck a few times.
[4]: In fact we need only repeat tests during shrinking.

How can we define this always combinator?

Support commands with more than one variable/reference

The API currently assumes that there is only one type of reference. We should generalise this to a family of references.

Towards a EDSL for defining concise models

PR #171 introduces a small language, inspired by Z/Event-B and TLA+, for specifying concise and showable models.

The, perhaps naive, idea that I have is that it will allows us to more easily port examples from Event-B and TLA+ to our setting. I say naive because the settings are obviously very different. They use classical meta-theory where as we have to be more constructive (unless we want to implement a theorem prover), and it shows in that many operations in the PR are partial.

I'd be curious to hear what other people think about this topic!

Towards the possibility of mocking parts of the system under test

In the paper Modelling of Autosar Libraries for Large Scale Testing a mocking framework is used to simplify the testing process. The framework is described in the paper An expressive semantics of mocking [PDF].

It seems the mocking part also uses a model, a pre-condition and a transition function, but instead of semantics and post-conditions it has a so called callout function. This callout function is what constructs the mocked return value from the model and the arguments to the mocked action.

How exactly this works and how it can be fit into our library is still not clear to me...

bracketP doesn't work

Current source of bracketP

return . prop never fails, so the finalizer is never run. Using bracketOnError seems also strange. Most of the time we want to clean up resources regardless of whether the bracketed operation fails (use bracket instead). Ultimately I don't think bracketP on properties is actually needed, it operates at the wrong level of abstraction.

A consequence of its current behavior is that the CrudWebserver examples spawn servers without killing them. When the setup action is run (once per test case), I believe the following happens:

the first call binds a constant port (8080, 8081 or 8082 depending on the test case);
in all the others calls the new server dies because the first one is still living and using that port, but setup still returns happily.
- CrudWebServerDb doesn't check that failure.
- The check in CrudWebServerFile has a race condition: the server thread may be polled just before it fails, so we see Nothing and assume the server is fine.
Thus tests continue talking with the first server, which carries data from previous runs. Tests still succeed because the preconditions in the model prevent reading data that the current test case has not (over)written.

I think there are two options here, and in both cases a simple bracket would be sufficient:

Start and stop a new server for each run. A better place for this would be inside the StateMachine runner, rather than a bracket around the toplevel prop_* properties.
Spawn a global server and reuse it. One must export an action to initialize it before properties are run.

Any preference for fixing the CrudWebserver examples?

Can we show results when printing sequential counterexamples?

I.e. print Operations rather than just commands?

Trying to understand the TicketDispenser example

Is GH the appropriate forum to ask newbie questions?

Looking at the ticket dispenser example I have the following questions:

Which part of this is the QC test and which part is the core implementation? I'm guessing these lines are the core implementation of the ticket dispenser, right?
Looking at the way the core implementation has been written, and what's written in the README as well, i.e. a datatype of commands;, does it basically mean that the app needs to be structured like an Elm/React app?
I'm assuming this is the simplified model against which the real model is being tested, right? What is the phantom type refs for?
Why is a complicated GADT required for the app-wide Action data model? Is it because an Action is conceptually a representation of a function call which can accept any data-type and return any data-type? Which also brings me to the next question, what if TakeTicket could accept the number of tickets the user wants to take? What would the Action type look like, in that case?
Any further explanation of what the type signature of semantics means and why is it so complicated?

Question: echo model behaving in an unexpected way

I made a model from this system under test (SUT). The basic idea is that whatever is input to the system is returned afterwards.

The sequential property passes, as I expected, however the parallel property fails. For instance I see the following failure:

Echo
 │ ┌────────────┐
 │ │ [] ← In "" │
 │ │  → ErrFull │
 │ └────────────┘
  implements echo (parallel) FAILED [1]

Failures:

  test/EchoSpec.hs:42:5:
  1) Echo implements echo (parallel)
       Falsifiable (after 3 tests):
         ParallelCommands
           { prefix = Commands { unCommands = [] }
           , suffixes =
               [ Pair
                   { proj1 = Commands { unCommands = [] }
                   , proj2 =
                       Commands { unCommands = [ Command (In "") (fromList []) ] }
                   }
               ]
           }

  To rerun use: --match "/Echo/implements echo (parallel)/"

it seems that the SUT reports that the buffer of size one is full even though no input action is observed in the counter example.

I spent several hours trying to pinpoint the source of the problem without any luck. I read the examples in this repository without finding hints either.

Any help is greatly appreciated.

What should the API look like?

Currently the user needs to provide:

a GADT of well-typed commands
a model
pre-/post-conditions and a model transition function
a generator of untyped commands without references
a shrinker for an untyped command
semantics of well-typed commands
a bunch of common instances for the commands; show, read, eq, ord, functor, etc
a function which identifies which commands return a reference/variable

Given the above, the library gives a sequential and a parallel property back to the user.

While this seems to be quite a minimal set of things we can expect from the user, there are still a couple of problems:

the Functor and Read instances for well-typed commands can't be derived, and Functor requires an unsafeCoerce, see #16. These problems didn't exist before, when commands were untyped, see #9, is it worth the trouble?
the types of the property helper functions (sequentialProperty and parallelProperty) could be simpler, perhaps using classes or records to pack up functionality?

Ideas on how we can fix these problems? Are there other problems? Are there better design choices?

Generalise to N threads in the parallel property

Currently only two parallel workers are used in the parallel property, some bugs might require more things to happen in parallel...

Shrink list of commands more efficiently

See how shrinkList does it.

Show instance for Internal has "Untyped" as part of the output

E.g.:

ParallelProgram (Program {unProgram = [Internal (Untyped New) [...]

It seems like we have Show (Untyped act) => Show (Internal act) somewhere that causes this problem.

Can we keep `Pid` internal to the library?

The user shouldn't have to know about them...

Yet another idea how commands could look like

While looking at the die hard example I noticed that a lot of the functions had to do a lot of unnecessary pattern matching just to refine the type. So I have an idea.

What if we change the interface (again) to be as follows:

ix :: *
cmd :: *
uses :: cmd ~> [ix] 
returns :: cmd ~> Response ix

And then we have Untyped :: Sing cmd - > Env (uses @@ cmd) - > Untyped where Env does whatever we would normally go with refs.

I hope this would achieve a couple of things

Allow the constraints to be constant, which would allow definitions to not pattern match unnecessary on the command.
Not force the user of the library to define IxFunctor and the rest of the type classes we invented.
The generator and shrink function we need from the user only operates on cmd which allows us to completely hide the Untyped type from the user.

I'm sure I'm missing something we're this new scheme is more complicated but since I'm on a mobile I don't know where that would be.

Extra note, we could split cmd further and have a type function for the arguments that will be used for the command (and the only thing that will shrink). The only reason I have so far for this is that we could write a "generic cheat" function for the property that tests that shrink produces sub sequences. Not a biggie so could still be in the cmd type for now.

Towards Jepsen-like tests

Jepsen is a framework for testing distributed systems. It has been used to find
bugs in many systems, e.g. Kafka, Cassandra, RabbitMQ.

It is also based on performing randomly generate actions from multiple threads
and then using linearisation to ensure correctness. However it is different in
that it has a thread, called the Nemesis, which is solely dedicated to fault
injection. Fault injections include skewed clocks, gc/io pauses (killall -s
STOP/CONT), lost packets, timeouts, and network partitions between the
distributed servers.

In order to account for the faults, Jepsen has a richer notion of history than
ours which includes the possibily of operations failing or timing out. When an
action failes, we know for sure that it did not change the state of the system,
where as if an operation timed out we don't know for sure (the state could have
changed, but the server ack that didn't reach us).

It would be neat if we could perform similar tests. Exactly how this is supposed
to work is still not clear. Some answers can perhaps be found in the original
linearizabiliy paper or in the many blog posts and talks by "Aphyr", the author
of Jepsen, some of which are linked to in the README.

Perhaps a good start would be to only focus on failed actions to begin with.
This issue seems to pop up in some of our examples already, see discussion in
#159, and does not require a nemesis thread.

A first step might be to change the type of Semantics to account for possible
failures:

data Result resp err = Ok resp | Fail err

type Semantics act m err = forall resp. act Concrete resp -> m (Result resp err)

(This can later be extended to deal with timeouts.)

The ResponseEvent constructor will need to be changed accordingly, and
linearise as well. I guess the right thing to do in linearise is to not update
the model and not check the post-condition. We could also change the
post-condition make assertions about err in addition to resp, but maybe this
is a refinement we can make when we see the need for it.

Thoughts?

The threadDelay in manyStep
replicateM 10 in the parallel property

Template Haskell away the H{Functor, Foldable, Traversable} instances

Maybe break out the H* stuff into its own package also?

The following seems useful:

advancedtelematic / quickcheck-state-machine Goto Github PK