Git Product home page Git Product logo

optimistic_mutex's Introduction

Mutexlib

A menagerie of locks for all occassions.

  • Optimistic_mutex. A wrapper for Stdlib.Mutex, which skips the call into pthreads, when the mutex is available. Tends to be 60-65% faster than Stdlib.Mutex on a single-core benchmark. If there's contention, it falls back onto standard library's conditional variable to avoid spinning (and that's more costly).
  • Ticket_spinlock. Simple ticket spinlock with two atomic variables (hence less contention but no try_lock function). Performs surprisingly well.
  • Mcs_spinlock. Classic lock based on a linked list, that lets all waiters spin on disjoint memory regions. Generally disappointing. Tends to be the worst of all locks on longer benchmark (I suspect GC: it needs node rotation). It dominates other only very specific workloads (e.g. ultra-overcommited ones or very bursty).
  • Array_spinlock. Similar story to the Mcs, although tends to be a bit better in the average case.
  • Naive_spinlock. Unfair and contentious spinlock.

Benchmarks

Some benchmarks highlighting strengths and weaknesses of the structures in this repo. You should probably read these carefully if you plan to replace Stdlib.Mutex with something else.

All benchmarks have been done on x86.

Basic lock&unlock

Single domain.

./_build/default/bench/bench.exe -d 1
[Stdlib.Mutex           ] time median: 16.53 ns/op
[Mcs_spinlock           ] time median: 64.94 ns/op
[Ticket_spinlock        ] time median: 6.27 ns/op
[Array_spinlock         ] time median: 6.86 ns/op
[Optimistic_mutex       ] time median: 5.77 ns/op
[Naive_spinlock         ] time median: 7.85 ns/op

Optimistic case. Most implementations are significantly faster than the standard library. Exception being Mcs_spinlocks. Seems that the extra allocations tip the scale. I have not investigated much as with the 256-domain lock Array_spinlock provides pretty much the same benefits.

Lock&unlock under contention

Two domains.

./_build/default/bench/bench.exe -d 2
[Stdlib.Mutex           ] time median: 95.14 ns/op
[Mcs_spinlock           ] time median: 165.07 ns/op
[Ticket_spinlock        ] time median: 64.45 ns/op
[Array_spinlock         ] time median: 146.82 ns/op
[Optimistic_mutex       ] time median: 167.49 ns/op
[Naive_spinlock         ] time median: 62.55 ns/op

Optimistic_mutex drops off as the chances of hitting fast path are small, and there's the extra cost of trying it and two mutex cycles due to conditional variable.

It's a bit surprising that Array_spinlock loses this much in comparison with Ticket_spinlock.

Lock&unlock under heavy contention

Three domains.

./_build/default/bench/bench.exe -d 3
[Stdlib.Mutex           ] time median: 146.76 ns/op
[Mcs_spinlock           ] time median: 459.69 ns/op
[Ticket_spinlock        ] time median: 231.27 ns/op
[Array_spinlock         ] time median: 265.67 ns/op
[Optimistic_mutex       ] time median: 315.90 ns/op
[Naive_spinlock         ] time median: 174.12 ns/op

I suspect that we start seeing that spinlocks cannot wake each other up. All of the non-naive spinlocks are fair, and if a domain in line is suspended, everyone waits while scheduler rolls the dice. Thus, despite massive contention, Naive_spinlock is the one that fares to Stdlib.Mutex.

Suspend/resume mechanism would help both approaches: the fair implementation may force wake the domain in line, while naive may put some to sleep to manage contention better.

Heavily overcommited workload

32 domains (on 16-core processor), lock&unlock x100.

./_build/default/bench/bench.exe -d 32 -c 100
[Stdlib.Mutex           ] time median: 166165.72 ns/op
[Optimistic_mutex       ] time median: 462070.82 ns/op
[Mcs_spinlock           ] time median: 5280.67 ns/op
[Ticket_spinlock        ] time median: 6665.81 ns/op
[Array_spinlock         ] time median: 4301.18 ns/op

Spinlocks perform well in such a scenario. Presumably because critical section is nonexistent and the cost of suspending and resuming threads outweighs the benefits.

Longer critical section

Why not spinlock all the things then?

4 domains, each cycle does 5000 units of work (a single unit on a single core takes around 20ns).

./_build/default/bench/bench.exe -d 4 -c 1000 -w 5000
[Stdlib.Mutex           ] time median: 251305.88 ns/op
[Optimistic_mutex       ] time median: 250947.59 ns/op
[Mcs_spinlock           ] time median: 412627.22 ns/op
[Ticket_spinlock        ] time median: 414631.16 ns/op
[Array_spinlock         ] time median: 418065.67 ns/op

As soon as we add some work in the critical section and there's contention, spinlocks' performance tanks. That's because OS may is more likely to suspend domain inside critical section, and does not know that others cannot make progress. It's much less of a problem if other waiters are not competing for CPU.

Note, the above benchmark is still somewhat well-behaved because there's more free CPU cores than domains. For some strongly overcommited workloads spinlocks degrade far further (e.g. 10-20x slower).

Contributions

Contributions are welcome. For some ideas, it'd be nice to have:

  • RWLock (e.g. just using the pthreads one). They are super useful in a lot of backend work.
  • Lower-level thread suspend/resume primitives, like here. These would let us build locks that are both faster than stdlib's mutex and reliable.

Install

opam pin add optimistic_mutex [email protected]:bartoszmodelski/optimistic_mutex.git

optimistic_mutex's People

Contributors

bartoszmodelski avatar

Stargazers

Vesa Karvonen avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.