tokio-rs / loom
Concurrency permutation testing tool for Rust.
License: MIT License
I'm getting a hanging test that uses `loom::fuzz` (testing code that uses channels / `AtomicBool`), and I'm wondering if it's a problem in my code or just a current limitation of the library. I've replaced all my uses of `Arc`, `Mutex`, and `spawn` with loom's versions.
Reproducer:
```rust
#[cfg(test)]
extern crate loom;

use std::sync::atomic::Ordering;
#[cfg(test)]
use loom::sync::atomic::{AtomicBool, AtomicU64, AtomicUsize};

#[cfg(test)]
mod tests {
    use super::*;
    use loom::thread;

    #[test]
    fn basic_usage_multi_threaded() {
        let mut b = loom::model::Builder::new();
        b.log = true;
        b.preemption_bound = Some(4);
        b.check(|| {
            println!("---------------------------------> NEW ITERATION");
            let a = Box::leak(Box::new(AtomicU64::new(0)));
            let a1 = &*a;
            let a2 = &*a;
            let a3 = &*a;
            let send = thread::spawn(move || {
                a1.fetch_add(1, Ordering::SeqCst);
                a1.fetch_add(1, Ordering::SeqCst);
            });
            let recv1 = thread::spawn(move || loop {
                if a2.load(Ordering::SeqCst) < 2 {
                    thread::yield_now();
                } else {
                    break;
                }
            });
            let recv2 = thread::spawn(move || loop {
                if a3.load(Ordering::SeqCst) < 2 {
                    thread::yield_now();
                } else {
                    break;
                }
            });
            send.join().unwrap();
            recv1.join().unwrap();
            recv2.join().unwrap();
        });
    }
}
```
I am not sure whether the panic-in-panic comes from loom or from our code running an atomic operation in a `Drop` implementation (which triggers the loom panic), but here is the information gathered with gdb:
(gdb) bt
#0 0x000055555567fe5a in std::panicking::rust_panic_with_hook () at /rustc/311376d30dc1cfa622142a9f50317b1e0cb4608a/src/libcore/fmt/mod.rs:316
#1 0x000055555567fad2 in std::panicking::continue_panic_fmt () at src/libstd/panicking.rs:384
#2 0x000055555567fa1f in std::panicking::begin_panic_fmt () at src/libstd/panicking.rs:339
#3 0x0000555555630452 in loom::rt::path::Path::branch_thread (self=0x7ffff6ee5008, execution_id=..., seed=...) at /home/ekleog/.cargo/git/checkouts/loom-e4d92781492531cc/5cf2626/src/rt/path.rs:129
#4 0x000055555563cfc8 in loom::rt::execution::Execution::schedule (self=0x7ffff6ee5000) at /home/ekleog/.cargo/git/checkouts/loom-e4d92781492531cc/5cf2626/src/rt/execution.rs:184
#5 0x00005555556509d6 in loom::rt::branch::{{closure}} (execution=0x7ffff6ee5000) at /home/ekleog/.cargo/git/checkouts/loom-e4d92781492531cc/5cf2626/src/rt/mod.rs:53
#6 0x000055555562d2af in loom::rt::scheduler::Scheduler::with_execution::{{closure}} (state=0x7ffff6ee4dc8) at /home/ekleog/.cargo/git/checkouts/loom-e4d92781492531cc/5cf2626/src/rt/scheduler.rs:45
#7 0x0000555555666d05 in scoped_tls::ScopedKey<T>::with (self=0x5555558ee0a8 <loom::rt::scheduler::STATE>, f=...)
at /home/ekleog/.cargo/registry/src/github.com-1ecc6299db9ec823/scoped-tls-0.1.2/src/lib.rs:189
#8 0x000055555562d1a8 in loom::rt::scheduler::Scheduler::with_execution (f=...) at /home/ekleog/.cargo/git/checkouts/loom-e4d92781492531cc/5cf2626/src/rt/scheduler.rs:45
#9 0x00005555556511c4 in loom::rt::execution (f=...) at /home/ekleog/.cargo/git/checkouts/loom-e4d92781492531cc/5cf2626/src/rt/mod.rs:118
#10 0x000055555565028e in loom::rt::branch (f=...) at /home/ekleog/.cargo/git/checkouts/loom-e4d92781492531cc/5cf2626/src/rt/mod.rs:51
#11 0x000055555565b54e in loom::rt::object::Id::atomic_rmw (self=..., f=..., success=core::sync::atomic::Ordering::SeqCst, failure=core::sync::atomic::Ordering::SeqCst)
at /home/ekleog/.cargo/git/checkouts/loom-e4d92781492531cc/5cf2626/src/rt/object.rs:219
#12 0x0000555555642f99 in loom::sync::atomic::atomic::Atomic<T>::try_rmw (self=0x7ffff7fc6058, f=..., success=core::sync::atomic::Ordering::SeqCst, failure=core::sync::atomic::Ordering::SeqCst)
at /home/ekleog/.cargo/git/checkouts/loom-e4d92781492531cc/5cf2626/src/sync/atomic/atomic.rs:68
#13 0x0000555555642b16 in loom::sync::atomic::atomic::Atomic<T>::rmw (self=0x7ffff7fc6058, f=..., order=core::sync::atomic::Ordering::SeqCst)
at /home/ekleog/.cargo/git/checkouts/loom-e4d92781492531cc/5cf2626/src/sync/atomic/atomic.rs:60
#14 0x0000555555642086 in loom::sync::atomic::int::AtomicU64::fetch_sub (self=0x7ffff7fc6058, val=1, order=core::sync::atomic::Ordering::SeqCst)
at /home/ekleog/.cargo/git/checkouts/loom-e4d92781492531cc/5cf2626/src/sync/atomic/int.rs:75
#15 [non-loom code]
Assertion that triggers:
```rust
assert!(
    self.branches.len() < self.max_branches,
    "actual = {}",
    self.branches.len()
);
```
Value of variables:
(gdb) p self.branches
$1 = alloc::vec::Vec<loom::rt::path::Branch> {buf: alloc::raw_vec::RawVec<loom::rt::path::Branch, alloc::alloc::Global> {ptr: core::ptr::unique::Unique<loom::rt::path::Branch> {pointer: 0x7ffff0009e80,
_marker: core::marker::PhantomData<loom::rt::path::Branch>}, cap: 1024, a: alloc::alloc::Global}, len: 1000}
(gdb) p self.max_branches
$2 = 1000
It looks like reducing the number of atomic operations makes the issue disappear.
If this assertion is indeed supposed to fire (e.g. if it means that we are calling loom incorrectly, or that we simply have too many branches), maybe it would be possible to improve the error message, or even to add a `println!` before it so that the message does not get eaten by the panic-in-panic?
Would it be feasible to raise the MAX_THREADS limit? Perhaps it could be a typenum provided as a type parameter of the runtime, or a crate-level config variable? I need more threads so that I can implement the "choice" operator as described in #84 (comment)
Thanks!
Even with state-space reduction, exhaustive execution can take far too long. It should be possible to bound the search space.
Doing so correctly requires implementing an algorithm similar to the one described in Bounded partial-order reduction.
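Until such an algorithm lands, the existing `Builder` knobs can bound the search in cruder ways. A configuration sketch, assuming loom 0.3's builder fields (most of which can also be set through `LOOM_*` environment variables):

```rust
use std::time::Duration;

fn main() {
    let mut builder = loom::model::Builder::new();
    // Bound preemptive context switches per execution: this is the knob
    // closest in spirit to bounded partial-order reduction.
    builder.preemption_bound = Some(3);
    // Hard caps: stop after exploring this many executions, or after a
    // wall-clock budget, whichever comes first.
    builder.max_permutations = Some(100_000);
    builder.max_duration = Some(Duration::from_secs(60));
    builder.check(|| {
        // model body under test
    });
}
```

Note that these caps simply truncate exploration; unlike a proper bounded search, they give no guarantee about which part of the state space was covered.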
minimal repro:

```rust
use loom::thread;

fn main() {
    loom::model(|| {
        thread::spawn(|| ()).join().unwrap();
    });
}
```
$ RUSTFLAGS='-Zsanitizer=address' cargo build
Updating crates.io index
Compiling semver-parser v0.7.0
Compiling cc v1.0.60
Compiling libc v0.2.77
Compiling log v0.4.11
Compiling cfg-if v0.1.10
Compiling scoped-tls v1.0.0
Compiling semver v0.9.0
Compiling rustc_version v0.2.3
Compiling generator v0.6.22
Compiling loom v0.3.5
Compiling loom-asan-test v0.1.0 (/Users/parasyte/other-projects/loom-asan-test)
Finished dev [unoptimized + debuginfo] target(s) in 8.19s
$ lldb target/debug/loom-asan-test
(lldb) target create "target/debug/loom-asan-test"
Current executable set to '/Users/parasyte/other-projects/loom-asan-test/target/debug/loom-asan-test' (x86_64).
(lldb) r
Process 40279 launched: '/Users/parasyte/other-projects/loom-asan-test/target/debug/loom-asan-test' (x86_64)
Completed in 1 iterations
==40279==WARNING: ASan is ignoring requested __asan_handle_no_return: stack type: default top: 0x7ffeefc00000; bottom 0x000104106000; size: 0x7ffdebafa000 (140728557608960)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
=================================================================
==40279==ERROR: AddressSanitizer: stack-use-after-scope on address 0x000104107b28 at pc 0x0001003ccd4d bp 0x000104107600 sp 0x000104106dc0
READ of size 168 at 0x000104107b28 thread T0
#0 0x1003ccd4c in wrap_memmove+0x16c (librustc-nightly_rt.asan.dylib:x86_64+0x17d4c)
Address 0x000104107b28 is a wild pointer.
SUMMARY: AddressSanitizer: stack-use-after-scope (librustc-nightly_rt.asan.dylib:x86_64+0x17d4c) in wrap_memmove+0x16c
Shadow bytes around the buggy address:
0x100020820f10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x100020820f20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x100020820f30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x100020820f40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x100020820f50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x100020820f60: f1 f1 f1 f1 f8[f8]f2 f2 00 00 f3 f3 00 00 00 00
0x100020820f70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x100020820f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x100020820f90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x100020820fa0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x100020820fb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
Shadow gap: cc
==40279==ABORTING
Process 40279 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
frame #0: 0x00007fff6b0ad33a libsystem_kernel.dylib`__pthread_kill + 10
libsystem_kernel.dylib`__pthread_kill:
-> 0x7fff6b0ad33a <+10>: jae 0x7fff6b0ad344 ; <+20>
0x7fff6b0ad33c <+12>: movq %rax, %rdi
0x7fff6b0ad33f <+15>: jmp 0x7fff6b0a7629 ; cerror_nocancel
0x7fff6b0ad344 <+20>: retq
Target 0: (loom-asan-test) stopped.
(lldb) bt
error: need to add support for DW_TAG_base_type '()' encoded with DW_ATE = 0x7, bit_size = 0
error: need to add support for DW_TAG_base_type '()' encoded with DW_ATE = 0x7, bit_size = 0
error: need to add support for DW_TAG_base_type '()' encoded with DW_ATE = 0x7, bit_size = 0
error: need to add support for DW_TAG_base_type '()' encoded with DW_ATE = 0x7, bit_size = 0
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
* frame #0: 0x00007fff6b0ad33a libsystem_kernel.dylib`__pthread_kill + 10
frame #1: 0x00007fff6b169e60 libsystem_pthread.dylib`pthread_kill + 430
frame #2: 0x00007fff6b034808 libsystem_c.dylib`abort + 120
frame #3: 0x0000000100417176 librustc-nightly_rt.asan.dylib`__sanitizer::Abort() + 70
frame #4: 0x0000000100416ab4 librustc-nightly_rt.asan.dylib`__sanitizer::Die() + 196
frame #5: 0x00000001003fe954 librustc-nightly_rt.asan.dylib`__asan::ScopedInErrorReport::~ScopedInErrorReport() + 420
frame #6: 0x00000001003fe1ee librustc-nightly_rt.asan.dylib`__asan::ReportGenericError(unsigned long, unsigned long, unsigned long, unsigned long, bool, unsigned long, unsigned int, bool) + 1198
frame #7: 0x00000001003ccd6c librustc-nightly_rt.asan.dylib`wrap_memmove + 396
frame #8: 0x00007fff6b197b84 libunwind.dylib`unw_init_local + 33
frame #9: 0x00007fff6b198563 libunwind.dylib`unwind_phase2 + 41
frame #10: 0x00007fff6b19be79 libunwind.dylib`_Unwind_Resume + 51
frame #11: 0x0000000100067fed loom-asan-test`generator::yield_::yield_::h44b204ac82d88f36(v=<unavailable>) at yield_.rs:0:1
frame #12: 0x000000010002955a loom-asan-test`loom::rt::scheduler::spawn_threads::_$u7b$$u7b$closure$u7d$$u7d$::_$u7b$$u7b$closure$u7d$$u7d$::h78704182e038d361((null)=closure-0 @ 0x0000000104108220) at scheduler.rs:138:56
frame #13: 0x00000001001193b6 loom-asan-test`generator::gen_impl::GeneratorImpl$LT$A$C$T$GT$::init_code::_$u7b$$u7b$closure$u7d$$u7d$::h9b912bd353b5bf3b at gen_impl.rs:308:21
frame #14: 0x00000001000e54fc loom-asan-test`generator::stack::StackBox$LT$F$GT$::call_once::h30684f411b2ff6b2(data=0x0000000104108e80) at mod.rs:135:13
frame #15: 0x0000000100121407 loom-asan-test`generator::stack::Func::call_once::h4daec637ff184991(self=Func @ 0x0000000104108620) at mod.rs:117:9
frame #16: 0x0000000100138f4d loom-asan-test`generator::gen_impl::gen_init::_$u7b$$u7b$closure$u7d$$u7d$::h83bb20a1b69e36a1 at gen_impl.rs:513:9
frame #17: 0x000000010012b57c loom-asan-test`core::ops::function::FnOnce::call_once::h1e673fc2980aed3f((null)=closure-0 @ 0x0000000104108820, (null)=<unavailable>) at function.rs:227:5
frame #18: 0x0000000100126e1a loom-asan-test`std::panicking::try::do_call::hff3b691df2061b12(data="๏ฟฝ\x8f\x10\x04\x01") at panicking.rs:381:40
frame #19: 0x0000000100127b9d loom-asan-test`__rust_try + 29
frame #20: 0x0000000100126b18 loom-asan-test`std::panicking::try::hf89589518d2a7178(f=closure-0 @ 0x0000000104108b20) at panicking.rs:345:19
frame #21: 0x000000010012b2a1 loom-asan-test`std::panic::catch_unwind::hd2813c5b6faf7255(f=closure-0 @ 0x0000000104108ba8) at panic.rs:382:14
frame #22: 0x00000001001387f8 loom-asan-test`generator::gen_impl::gen_init::hc5eedddfe516064c((null)=0, f=0x0000000104108fa0) at gen_impl.rs:527:25
AFAICT, this is caused by the way the `generator` crate is used. The error can be worked around by enabling the `detect_stack_use_after_return` option:
$ RUSTFLAGS='-Zsanitizer=address' ASAN_OPTIONS=detect_stack_use_after_return=1 cargo run
Compiling semver-parser v0.7.0
Compiling cc v1.0.60
Compiling libc v0.2.77
Compiling log v0.4.11
Compiling semver v0.9.0
Compiling rustc_version v0.2.3
Compiling generator v0.6.22
Compiling loom v0.3.5
Compiling loom-asan-test v0.1.0 (/Users/parasyte/other-projects/loom-asan-test)
Finished dev [unoptimized + debuginfo] target(s) in 6.77s
Running `target/debug/loom-asan-test`
Completed in 1 iterations
==42679==WARNING: ASan is ignoring requested __asan_handle_no_return: stack type: default top: 0x7ffee4596000; bottom 0x00010fdb5000; size: 0x7ffdd47e1000 (140728168484864)
False positive error reports may follow
For details see https://github.com/google/sanitizers/issues/189
The above warning is also printed on Linux, even without the macOS workaround.
Since I don't think it is reasonably possible to make it a `repr(transparent)` wrapper around `u64` without weird behavior arising, maybe just document this fact somewhere prominent, e.g. in the README's Limitations section?
Add support for `atomic::fence`.
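For reference, this is the std pattern a loom `fence` shim would need to model; a minimal std-only sketch of release/acquire fence pairing:

```rust
use std::sync::atomic::{fence, AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

fn main() {
    let data = Arc::new(AtomicUsize::new(0));
    let ready = Arc::new(AtomicBool::new(false));

    let (d, r) = (data.clone(), ready.clone());
    let producer = thread::spawn(move || {
        d.store(42, Ordering::Relaxed);
        // Release fence: orders the relaxed data store before the flag store.
        fence(Ordering::Release);
        r.store(true, Ordering::Relaxed);
    });

    while !ready.load(Ordering::Relaxed) {
        thread::yield_now();
    }
    // Acquire fence: pairs with the release fence above, so the data
    // store is guaranteed visible once the flag has been observed.
    fence(Ordering::Acquire);
    assert_eq!(data.load(Ordering::Relaxed), 42);
    producer.join().unwrap();
}
```

Under loom, each `fence` call would additionally have to participate in the modeled happens-before graph, which is presumably the non-trivial part of the feature.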
```rust
#[test]
fn send_on_closed_channel() {
    loom::model(|| {
        let (sender, receiver) = loom::sync::mpsc::channel();
        std::mem::drop(receiver);
        sender.send(()).unwrap_err();
    })
}
```
The simple test above fails with:
thread 'send_on_closed_channel' panicked at 'assertion failed: `(left == right)`
left: `0`,
right: `1`: Messages leaked', /.../loom/src/rt/mpsc.rs:142:9
Is this correct/intentional? There is no memory leak going on here, the value that fails to be sent is correctly dropped.
EDIT: This happens on 0.3.3 and the latest version in git. I have not tried older versions.
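For comparison, this mirrors std's channel semantics, where a failed send hands the message back inside the error so it is dropped normally rather than leaked (std-only sketch):

```rust
use std::sync::mpsc;

fn main() {
    let (sender, receiver) = mpsc::channel();
    drop(receiver);
    // The send fails because the channel is closed; the rejected value
    // comes back inside SendError and is dropped here like any other value.
    let err = sender.send(String::from("hello")).unwrap_err();
    assert_eq!(err.0, "hello");
}
```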
Loom must be able to run a very large number of permutations. Currently, there is significant overhead in managing the metadata required to check each execution.
There are two vectors by which performance can be improved:
- Most of the overhead is related to allocating and deallocating memory. Switching to an arena should significantly improve performance.
- Each execution of the program is independent. It should be possible to use multiple threads to run the tests.
@josevalim kindly pointed me to another paper on stateless model checking: Optimal dynamic partial order reduction. We probably should investigate to see how it fits.
I have a library that uses `std::thread::park()` and `Thread::unpark()` for parking one thread when it must wait for another thread to reach a certain point. These tests deadlock under loom, I assume simply because the execution order currently being evaluated demands that the thread waiting in `park()` make progress, and not the other ones, which will obviously never happen.
I have currently gotten the tests to pass by conditionally replacing `std::thread::park()` with `loom::thread::yield_now()`. However, I'm not fully sure this tests what I want it to test.
Since I don't know anything about loom internals, I will put it like this: would it make sense to expose `loom::thread::park` and potentially a corresponding `loom::thread::Thread`?
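For context, the std pattern being modeled; today this has to be rewritten for loom because `loom::thread` exposes neither `park` nor a `Thread` handle (std-only sketch):

```rust
use std::thread;

fn main() {
    let waiter = thread::current();
    let t = thread::spawn(move || {
        // Hand the parked thread its wake-up token.
        waiter.unpark();
    });
    // Blocks until a token is available; if unpark() already ran,
    // the token is consumed and park() returns immediately.
    thread::park();
    t.join().unwrap();
    println!("done waiting");
}
```

A loom version would need to treat `park()` as a blocking point the scheduler understands, so the model explores orderings where the parked thread is only resumed after some `unpark()` (or a spurious wakeup).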
Add an option for the scheduler to randomly pick a branch instead of doing an exhaustive exploration.
In this mode, the fuzzer would iterate for a max duration, taking a random path each time.
```rust
loom::lazy_static! {
    static ref ID: usize = 0x42;
}

loom::thread_local! {
    static BAR: Bar = Bar;
}

struct Bar;

impl Drop for Bar {
    fn drop(&mut self) {
        let _ = &*ID;
    }
}

fn main() {
    loom::model(|| {
        BAR.with(|_| ());
    });
}
```
output:
thread 'main' panicked at 'attempted to access lazy_static during shutdown', /loom-0.3.4/src/rt/lazy_static.rs:43:9
Maybe we should ensure that the drop of `lazy_static!` values occurs after that of `thread_local!` values.
Everything is in the title :) As far as I can see on docs.rs, the `Atomic<>` type is not public, meaning that these cannot be implemented outside of loom.
loom panics on the following test case:
```rust
lazy_static::lazy_static! {
    static ref A: AtomicUsize = AtomicUsize::new(0);
}

loom::thread_local! {
    static B: usize = A.load(Relaxed);
}

#[test]
fn load_after_lazy_static() {
    loom::model(|| {
        let t1 = thread::spawn(|| {
            B.try_with(|h| *h).unwrap_or_else(|_| A.load(Relaxed));
        });
        let t2 = thread::spawn(|| {
            B.try_with(|h| *h).unwrap_or_else(|_| A.load(Relaxed));
        });
        t1.join().unwrap();
        t2.join().unwrap();
    });
}
```
Reporting:
~~~~~~~~ THREAD 1 ~~~~~~~~
~~~~~~~~ THREAD 0 ~~~~~~~~
~~~~~~~~ THREAD 2 ~~~~~~~~
thread 'main' panicked at '
Causality violation: Concurrent load and mut accesses.
created: tests/atomic.rs:10:33
with_mut: thread #1 @ tests/atomic.rs:10:33
load: thread #2 @ tests/atomic.rs:14:23
`atomic.rs:10` is the call to `AtomicUsize::new` for the `static ref A` line, and `atomic.rs:14` is the `load` in `static B`.
I think this code should be fine?
While trying to use loom on a project of mine, I got a message like the following:
running 1 test
thread panicked while panicking. aborting.
error: test failed, to rerun pass '--lib'
Caused by:
process didn't exit successfully: `/home/restioson/IdeaProjects/loom_reproduce_sigill/target/release/deps/loom_reproduce_sigill-e63fb4bdfa431379` (signal: 4, SIGILL: illegal instruction)
Note on the SIGILL: rust-lang/rust#52633
This behaviour is exhibited across release, debug, nightly, and stable. I am running Linux, specifically Pop!_OS 20.04 LTS x86_64, with a kernel version of 5.4.0-7634-generic.
I attempted to debug it using cargo-with and gdb (`cargo with "rust-gdb" -- test`). I found that the most useful breakpoint to set was `std::panicking::panic_count::increment`, as the backtraces from here were often helpful.
Attempted reproduction:
This revealed that the first panic came from line 14 of the example, in `loom_reproduce_sigill::DoSomething::do_something`, when a `try_lock` result is unwrapped. The second panic was in `loom::sync::RwLock::write`, at line 83 of rwlock.rs, which comes from the following `expect`: `self.data.try_write().expect("loom::RwLock state corrupt")`. This is called from the drop impl of Droppable. I found a note in #115 that panics in drops can sometimes cause this to occur. I think this may be the same issue, but here the drop-panic comes from loom itself. The backtraces of the panics are in a gist here, and an MVP reproduction is available here: https://github.com/Restioson/loom_reproduce_sigill/blob/master/src/lib.rs.
In the project where this was discovered, the first panic was the assertion that fails when a deadlock is found, and the second was where loom called `set_active` with a `None` value from the schedule. This is obviously very different, so I'm probably barking up the wrong tree with the reproduction, honestly. The original panics are here, which may be of more use; additionally, the code for it is here. TL;DR: I think the reproduction effort failed since the panic is different :(. I can try later to minimise the original while keeping the same panics, if that would be helpful.
It seems like the second panic in the original project also comes from a `Drop` impl (loom's `Arc`), although this time it occurs quite far away from it, inside `loom::rt::thread::Set::active_mut`, which is passed `None` and then unwraps it. This might or might not be fixable; I have no idea. What's interesting, though, is that if you uncomment the commented lines in `tests/loom.rs`, it no longer SIGILLs and panics only once. I thought this might be due to the early drop of rx2, but if you remove rx2 it still SIGILLs. You can also remove the drop of tx, and as far as I can tell the second panic remains the same.
Instead of logging a large header, what do you think about creating a separate `tracing` span representing each iteration, with the iteration number as a field? That way, everything recorded during that iteration will be annotated with the iteration number. This isn't necessarily a blocker for merging this PR, so it might be worth doing in a follow-up branch.
Follow-up from #91.
I think ideally, it would be fantastic if we had a span structure like this for all events loom records:
test{name=my_loom_test}:execution{iter=15}:thread{id=1}: something happened...
(note that recording the test name may be difficult without asking the user to provide it, or providing a test macro...)
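A sketch of the proposed span structure using the `tracing` crate's macros; the span and field names here are illustrative, not loom's actual output:

```rust
use tracing::{info, info_span};

fn run_iteration(iter: usize) {
    // Everything recorded while the guard is alive is annotated
    // with `execution{iter=N}`.
    let span = info_span!("execution", iter);
    let _guard = span.enter();
    info!("something happened...");
}

fn main() {
    // Plain stdout subscriber, just to see the span prefixes.
    tracing_subscriber::fmt::init();
    for iter in 0..3 {
        run_iteration(iter);
    }
}
```

A per-thread `thread{id=...}` span could be entered the same way inside each modeled thread, nesting under the execution span.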
A data race goes undetected in the code below (curiously, with the additional `Release` operations uncommented, the issue is reported as expected):
```rust
use loom::sync::Arc;
use loom::sync::atomic::AtomicUsize;
use loom::sync::atomic::Ordering::*;
use loom::thread;

fn main() {
    loom::model(|| {
        let a = Arc::new(AtomicUsize::new(0));
        let b = a.clone();
        let thread = thread::spawn(move || {
            unsafe { a.unsync_load() };
        });
        // b.store(1, Release);
        b.store(1, Relaxed);
        // b.store(1, Release);
        thread.join().unwrap();
    });
}
```
I get
---- test::list stdout ----
thread 'test::list' panicked at 'index out of bounds: the len is 7 but the index is 17293822569102704640', /Users/jrmuizel/.cargo/registry/src/github.com-1ecc6299db9ec823/loom-0.3.4/src/rt/object.rs:361:25
when running
```rust
use std::ptr;

use loom::sync::atomic::AtomicPtr;
use loom::sync::atomic::Ordering::{Acquire, Relaxed, Release};
use loom::thread;

struct List {
    next: AtomicPtr<List>,
    prev: AtomicPtr<List>,
}

impl List {
    fn new() -> Box<Self> {
        Box::new(List {
            next: AtomicPtr::new(ptr::null_mut()),
            prev: AtomicPtr::new(ptr::null_mut()),
        })
    }
}

fn len(mut next: *mut List) -> i32 {
    let mut count = 0;
    while !next.is_null() {
        count += 1;
        next = unsafe { (*next).next.load(Acquire) };
    }
    count
}

fn del(node: *const List) {
    unsafe {
        let next = (*node).next.load(Acquire);
        let prev = (*node).prev.load(Acquire);
        (*prev).next.store(next, Release);
        if !next.is_null() {
            (*next).prev.store(prev, Release);
        }
    }
}

#[test]
fn list() {
    loom::model(|| {
        let mut first = List::new();
        let mut second = List::new();
        let mut third = List::new();
        first.next.store(second.as_mut(), Release);
        second.prev.store(first.as_mut(), Release);
        second.next.store(third.as_mut(), Release);
        third.prev.store(second.as_mut(), Release);
        let jh = thread::spawn(move || {
            del(third.as_ref());
        });
        del(second.as_ref());
        jh.join().unwrap();
        let count = len(first.as_mut());
        assert_eq!(1, count);
    });
}
```
With a crate whose `src/lib.rs` contains

```rust
#[cfg(test)]
extern crate loom;

mod tests {
    #[test]
    fn basic_usage() {
        loom::model(|| {});
    }
}
```

and whose `Cargo.toml` contains

```toml
[package]
name = "mypackage"
version = "0.1.0"
authors = []
edition = "2018"

[dev-dependencies]
loom = "0.2.4"

[profile.dev]
opt-level = 2
```

compiled with rustc nightly 2019-09-03 (2019-07-18 has the same behavior), trying to run `cargo test` leads to SIGILL.
Some more information I gathered:
Thread 2 "tests::basic_us" received signal SIGILL, Illegal instruction.
[Switching to Thread 0x7ffff6e4d700 (LWP 26606)]
0x00005555555b4f0c in generator::rt::Context::new (size=<optimized out>) at /home/ekleog/.cargo/registry/src/github.com-1ecc6299db9ec823/generator-0.6.17/src/rt.rs:87
87 stack: Stack::new(size),
(gdb) bt
#0 0x00005555555b4f0c in generator::rt::Context::new (size=<optimized out>) at /home/ekleog/.cargo/registry/src/github.com-1ecc6299db9ec823/generator-0.6.17/src/rt.rs:87
#1 0x00005555555b3b63 in generator::gen_impl::GeneratorImpl<A,T>::new (size=4096) at /home/ekleog/.cargo/registry/src/github.com-1ecc6299db9ec823/generator-0.6.17/src/gen_impl.rs:103
#2 generator::gen_impl::Gn<A>::new_opt (size=4096, f=...) at /home/ekleog/.cargo/registry/src/github.com-1ecc6299db9ec823/generator-0.6.17/src/gen_impl.rs:69
#3 generator::gen_impl::Gn<A>::new (f=...) at /home/ekleog/.cargo/registry/src/github.com-1ecc6299db9ec823/generator-0.6.17/src/gen_impl.rs:61
#4 0x00005555555af6c1 in loom::rt::scheduler::spawn_threads::{{closure}} () at /home/ekleog/.cargo/registry/src/github.com-1ecc6299db9ec823/loom-0.2.5/src/rt/scheduler.rs:111
#5 core::iter::adapters::map_fold::{{closure}} (elt=<optimized out>, acc=<optimized out>) at /rustc/b9de4ef89e0e53099a084001b26ec3207c5f8391/src/libcore/iter/adapters/mod.rs:659
#6 core::iter::traits::iterator::Iterator::fold::ok::{{closure}} (x=<optimized out>, acc=<optimized out>) at /rustc/b9de4ef89e0e53099a084001b26ec3207c5f8391/src/libcore/iter/traits/iterator.rs:1813
#7 core::iter::traits::iterator::Iterator::try_fold (self=<optimized out>, f=..., init=<optimized out>) at /rustc/b9de4ef89e0e53099a084001b26ec3207c5f8391/src/libcore/iter/traits/iterator.rs:1694
#8 core::iter::traits::iterator::Iterator::fold (self=..., init=<optimized out>, f=...) at /rustc/b9de4ef89e0e53099a084001b26ec3207c5f8391/src/libcore/iter/traits/iterator.rs:1816
#9 <core::iter::adapters::Map<I,F> as core::iter::traits::iterator::Iterator>::fold (self=..., init=<optimized out>, g=<error reading variable: Cannot access memory at address 0x40>)
at /rustc/b9de4ef89e0e53099a084001b26ec3207c5f8391/src/libcore/iter/adapters/mod.rs:692
#10 0x00005555555ad8c2 in core::iter::traits::iterator::Iterator::for_each (self=..., f=...) at /rustc/b9de4ef89e0e53099a084001b26ec3207c5f8391/src/libcore/iter/traits/iterator.rs:616
#11 <alloc::vec::Vec<T> as alloc::vec::SpecExtend<T,I>>::spec_extend (self=0x7ffff6e4be98, iterator=...) at /rustc/b9de4ef89e0e53099a084001b26ec3207c5f8391/src/liballoc/vec.rs:1962
#12 <alloc::vec::Vec<T> as alloc::vec::SpecExtend<T,I>>::from_iter (iterator=...) at /rustc/b9de4ef89e0e53099a084001b26ec3207c5f8391/src/liballoc/vec.rs:1945
#13 <alloc::vec::Vec<T> as core::iter::traits::collect::FromIterator<T>>::from_iter (iter=...) at /rustc/b9de4ef89e0e53099a084001b26ec3207c5f8391/src/liballoc/vec.rs:1832
#14 core::iter::traits::iterator::Iterator::collect (self=...) at /rustc/b9de4ef89e0e53099a084001b26ec3207c5f8391/src/libcore/iter/traits/iterator.rs:1478
#15 loom::rt::scheduler::spawn_threads (n=4) at /home/ekleog/.cargo/registry/src/github.com-1ecc6299db9ec823/loom-0.2.5/src/rt/scheduler.rs:109
#16 loom::rt::scheduler::Scheduler::new (capacity=4) at /home/ekleog/.cargo/registry/src/github.com-1ecc6299db9ec823/loom-0.2.5/src/rt/scheduler.rs:31
#17 0x000055555556f388 in loom::model::Builder::check (self=0x7ffff6e4c688, f=...) at /home/ekleog/.cargo/registry/src/github.com-1ecc6299db9ec823/loom-0.2.5/src/model.rs:135
#18 0x000055555556f278 in loom::model::model (f=...) at /home/ekleog/.cargo/registry/src/github.com-1ecc6299db9ec823/loom-0.2.5/src/model.rs:198
#19 0x000055555556d446 in neobuffer::tests::basic_usage () at neobuffer/src/lib.rs:7
#20 0x00005555555680ea in neobuffer::tests::basic_usage::{{closure}} () at neobuffer/src/lib.rs:6
#21 0x0000555555568bde in core::ops::function::FnOnce::call_once () at /rustc/b9de4ef89e0e53099a084001b26ec3207c5f8391/src/libcore/ops/function.rs:227
#22 0x000055555557c06f in <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once () at /rustc/b9de4ef89e0e53099a084001b26ec3207c5f8391/src/liballoc/boxed.rs:922
#23 0x00005555555c45ea in __rust_maybe_catch_panic () at src/libpanic_unwind/lib.rs:80
#24 0x000055555559686e in std::panicking::try () at /rustc/b9de4ef89e0e53099a084001b26ec3207c5f8391/src/libstd/panicking.rs:275
#25 std::panic::catch_unwind () at /rustc/b9de4ef89e0e53099a084001b26ec3207c5f8391/src/libstd/panic.rs:394
#26 test::run_test::run_test_inner::{{closure}} () at src/libtest/lib.rs:1413
#27 0x00005555555718e5 in std::sys_common::backtrace::__rust_begin_short_backtrace () at /rustc/b9de4ef89e0e53099a084001b26ec3207c5f8391/src/libstd/sys_common/backtrace.rs:77
#28 0x0000555555575ae5 in std::thread::Builder::spawn_unchecked::{{closure}}::{{closure}} () at /rustc/b9de4ef89e0e53099a084001b26ec3207c5f8391/src/libstd/thread/mod.rs:470
#29 <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once () at /rustc/b9de4ef89e0e53099a084001b26ec3207c5f8391/src/libstd/panic.rs:315
#30 std::panicking::try::do_call () at /rustc/b9de4ef89e0e53099a084001b26ec3207c5f8391/src/libstd/panicking.rs:296
#31 0x00005555555c45ea in __rust_maybe_catch_panic () at src/libpanic_unwind/lib.rs:80
#32 0x0000555555576112 in std::panicking::try () at /rustc/b9de4ef89e0e53099a084001b26ec3207c5f8391/src/libstd/panicking.rs:275
#33 std::panic::catch_unwind () at /rustc/b9de4ef89e0e53099a084001b26ec3207c5f8391/src/libstd/panic.rs:394
#34 std::thread::Builder::spawn_unchecked::{{closure}} () at /rustc/b9de4ef89e0e53099a084001b26ec3207c5f8391/src/libstd/thread/mod.rs:469
#35 core::ops::function::FnOnce::call_once{{vtable-shim}} () at /rustc/b9de4ef89e0e53099a084001b26ec3207c5f8391/src/libcore/ops/function.rs:227
#36 0x00005555555b6f5f in <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once () at /rustc/b9de4ef89e0e53099a084001b26ec3207c5f8391/src/liballoc/boxed.rs:922
#37 0x00005555555c3d40 in <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once () at /rustc/b9de4ef89e0e53099a084001b26ec3207c5f8391/src/liballoc/boxed.rs:922
#38 std::sys_common::thread::start_thread () at src/libstd/sys_common/thread.rs:13
#39 std::sys::unix::thread::Thread::new::thread_start () at src/libstd/sys/unix/thread.rs:79
#40 0x00007ffff77b45a7 in start_thread () from /nix/store/pr73kx0cdszbv9iw49g8dzi0nqxfjbx2-glibc-2.27/lib/libpthread.so.0
#41 0x00007ffff72d522f in clone () from /nix/store/pr73kx0cdszbv9iw49g8dzi0nqxfjbx2-glibc-2.27/lib/libc.so.6
(gdb) disas
Dump of assembler code for function generator::rt::Context::new:
0x00005555555b4f00 <+0>: sub $0x18,%rsp
0x00005555555b4f04 <+4>: mov %rsp,%rdi
0x00005555555b4f07 <+7>: callq 0x5555555b4320 <generator::stack::Stack::new>
=> 0x00005555555b4f0c <+12>: ud2
End of assembler dump.
Am I the only one seeing this issue? It seems quite strange that this was not caught by any kind of CI, given that it prevents me from using any feature of loom.
I'm trying to use loom to check my project. When I run cargo test --features=concurrent-test --test test
, it reports
   Compiling atomic_value v0.1.0 (/home/sherlock/git/atomic_value)
    Finished test [unoptimized + debuginfo] target(s) in 0.57s
     Running target/debug/deps/test-6a9029d909e56a54
running 1 test
thread panicked while panicking. aborting.
error: test failed, to rerun pass '--test test'
Caused by:
  process didn't exit successfully: `/home/sherlock/git/atomic_value/target/debug/deps/test-6a9029d909e56a54` (signal: 4, SIGILL: illegal instruction)
Is there anything wrong with what I wrote? If so, could anyone help me fix it?
Condvar waits and thread parks should check for spurious wake-ups.
There is no equivalent in loom for std::sync::Weak<T> and std::sync::Arc::<T>::downgrade(this: &Arc<T>) -> Weak<T>. It would be needed for tokio-rs/tokio#2649.
Loom's Mutex lacks a try_lock method like std::sync::Mutex has. In order to simulate all code that uses std's mutex, we should add this.
- std::sync::Barrier
- std::sync::atomic::AtomicI* integer types (the signed ones); done in #189
- Atomic*::into_inner; done in #327
- Atomic*::get_mut (see also #154)
- std::sync::atomic::compiler_fence
- std::sync::atomic::fence with SeqCst; done in #220
- UnsafeCell::raw_get (see also #276)
- std::sync::Weak and std::sync::Arc::downgrade (see also #156)
- Atomic*::as_ptr (see also #298)
- thread_local (see also #346)
- LocalKey<Cell<T>>::* and LocalKey<RefCell<T>>::* (see also #347)

Currently unstable:

- std::sync::atomic::Atomic{I,U}128
- SyncUnsafeCell (see also #333)

Based on my admittedly skin-deep understanding of how Loom works, I think Loom would benefit greatly from a more methodical approach to finding interesting traces to test. Specifically, I'm thinking of lineage-driven fault injection techniques (for a deeper dive into what that is and how it works, see this paper, it is very accessible!).
The gist is that rather than randomly injecting faults, or naively exploring the state space, you instead store the execution trace of a successful run (i.e. an execution that produces the correct outcome); this trace is then examined for all of the supports for that successful outcome, which is then used to select a fault to inject which is most likely to produce an error. A new execution is traced, but this time the selected fault is injected; if the run fails, then you know the precise lineage of the error and can print useful information for the developer; if the run was still successful, then the process is repeated until all possible error-producing paths have been explored. Generally there is a finite bound placed on the amount of time allowed for the system to successfully recover. It is also common to place a bound on the number of faults that can occur during execution, and to provide some amount of time for the system to recover without new faults being injected.
A quick example: Given a distributed key/value store, a single test might be that a write to a key on one node is successfully replicated to all nodes in the cluster. The initial execution trace shows that supports for this outcome include messages being sent over the network successfully, and the existence of a quorum in the cluster. LDFI determines that the first possible error could occur if messages are dropped due to a network error when replicating. The next execution injects such an error, but because the database has code to detect this case and handle it by retrying the operation, the successful outcome is still produced. LDFI performs another analysis on this new trace, and determines that causing a network partition during replication can cause the execution to fail. So again, another execution is performed and that fault is injected, and this time, the outcome is unsuccessful. As a result, we know exactly what order of events was necessary in order to produce the failure, and can produce a lineage graph and other useful information for the developer.
The reason this is so much more effective than fuzzing and other random fault injection approaches, is that the LDFI algorithm is essentially constructing a model of the system, and then surgically injecting faults in places where the model shows a weakness. This means overall execution is much faster, and the result is essentially a proof that the system under test is resilient against the set of injectable faults, within the bounds defined by the parameters I mentioned earlier.
I've only used this in distributed applications, but the Molly paper demonstrates that it is by no means restricted to that domain, and I don't see any particular reason why applying LDFI to the problems Loom is aiming to solve is any different.
The key things that LDFI depends on though (in my opinion) are these:
I hope this gives you some ideas to work with, Loom looks great, and I'm excited to be able to use it in my own projects! If LDFI ends up being a good fit, I think it could make it a must-use kind of tool :)
If I create a new empty crate, add loom as a dependency, and add these two tests in src/lib.rs:
#[test]
fn test1() {
loom::model(|| ());
}
#[test]
fn test2() {
loom::model(|| ());
}
Then I run the tests in a loop with:
while RUST_BACKTRACE=full cargo test -- --nocapture; do true; done
This sometimes results in the following panic:
running 2 tests
Completed in 1 iterations
Completed in 1 iterations
thread 'test2' panicked at 'Box<Any>', /Users/faern/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/src/libstd/macros.rs:13:23
stack backtrace:
0: 0x10f2b0c1f - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h9b566daa2a455421
at /Users/faern/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/src/libcore/cell.rs:1643
1: 0x10f2d649e - core::fmt::write::ha1b9f49163694e4d
at /Users/faern/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/src/libcore/cell.rs:1643
2: 0x10f2ad647 - std::io::Write::write_fmt::h5bf7ecb3d7765923
at /Users/faern/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/src/libcore/cell.rs:1643
3: 0x10f2b2f8a - std::panicking::default_hook::{{closure}}::h694d41cd08e58408
at /Users/faern/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/src/libcore/cell.rs:1643
4: 0x10f2b2ccc - std::panicking::default_hook::ha6f28bd33ba2e689
at /Users/faern/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/src/libcore/cell.rs:1643
5: 0x10f2b3558 - std::panicking::rust_panic_with_hook::hb00d3c1dea212869
at /Users/faern/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/src/libcore/cell.rs:1643
6: 0x10f2db907 - std::panicking::begin_panic::hb693b834b4644a40
at /Users/faern/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/src/libcore/cell.rs:1643
7: 0x10f2525a0 - generator::yield_::raw_yield::h30b3e9da3f8fe51b
at /Users/faern/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/src/libstd/macros.rs:13
8: 0x10f252490 - generator::yield_::yield_::h11333a583174f4a8
at /Users/faern/.cargo/registry/src/github.com-1ecc6299db9ec823/generator-0.6.20/src/yield_.rs:106
9: 0x10f245ae5 - loom::rt::scheduler::spawn_threads::{{closure}}::{{closure}}::hf77784205a5f75de
at /Users/faern/.cargo/git/checkouts/loom-1f4be9fd68e6b59b/0b092a2/src/rt/scheduler.rs:138
10: 0x10f25858d - generator::gen_impl::GeneratorImpl<A,T>::init::{{closure}}::h925cbbe0b99895c0
at /Users/faern/.cargo/registry/src/github.com-1ecc6299db9ec823/generator-0.6.20/src/gen_impl.rs:154
11: 0x10f253edb - core::ops::function::FnOnce::call_once{{vtable.shim}}::h43a9b1ff9e3ae9de
at /Users/faern/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/src/libcore/ops/function.rs:232
12: 0x10f262596 - <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once::h9921eb92259c04a7
at /Users/faern/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/src/libstd/panicking.rs:0
13: 0x10f2651b7 - generator::gen_impl::gen_init::{{closure}}::hbaf1b6aa22ef8d8a
at /Users/faern/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/src/libcore/cell.rs:1643
14: 0x10f261e95 - core::ops::function::FnOnce::call_once::h0862c3dbef3b976c
at /Users/faern/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/src/libstd/panicking.rs:0
15: 0x10f2618fa - std::panicking::try::do_call::h567763505b5fcdd6
at /Users/faern/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/src/libstd/panicking.rs:331
16: 0x10f261b0d - __rust_try
at /Users/faern/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/src/libstd/panicking.rs:0
17: 0x10f261875 - std::panicking::try::h60348e4f56614890
at /Users/faern/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/src/libstd/panicking.rs:274
18: 0x10f261dc1 - std::panic::catch_unwind::hec7f19f75e187d04
at /Users/faern/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/src/libstd/panicking.rs:0
19: 0x10f26500e - generator::gen_impl::gen_init::h5497ff06b8dbc23d
at /Users/faern/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/src/rust/src/libcore/cell.rs:1643
test test2 ... ok
test test1 ... ok
This panic does not happen if I add --test-threads=1 to the test run, so it seems that multiple loom tests running in parallel race with each other and cause this.
Tested and reproduced on macOS and Linux with loom 0.3.2, 0.3.4 and latest master branch from git. Both on latest stable Rust (1.43) and a recent nightly.
As seen from the output, both tests succeed and the exit code of cargo test is still zero, so maybe this is not critical. But it is also not ideal, as it confuses the reader of the test output.
Loom is based on a number of papers that are useful to read, but it would be nice to give an informal description of these as well. I wrote a brain dump of my understanding but it probably should be verified / edited:
Papers:
The idea of permutation testing concurrent code is to run every possible interleaving that multiple threads can have. However, doing so naively results in combinatorial explosion: the number of possible permutations grows factorially. The idea of partial order reduction is to only run "unique" permutations. Two permutations are identical if they result in the same behavior. As an example, assuming sequential consistency, if two threads concurrently read from the same atomic value, it doesn't matter which thread reads first; the end result is the same. In this case, the two permutations are identical, and the goal of partial order reduction is to only run one of them.
DPOR does this reduction dynamically, i.e. at runtime. The idea is that you only need to permute concurrent accesses to a cell that are dependent, i.e. where the permutation results in different behavior. The way this is done is by running the execution "naively": it runs through the execution without any thread preemptions. Whenever an atomic object is accessed, it continues the execution with the same thread. Whenever a mutex is accessed (as long as it is unlocked), it continues with the same thread. The only time the scheduler switches threads is when the current thread is blocked on a wait call or when the thread terminates. When it switches threads, it picks another thread randomly and eagerly schedules that thread.
The key is that whenever a thread (let's call it thread B) accesses a shared object (atomic, mutex, condvar, ...), loom tracks this access on the object and checks when the last dependent access was performed. A dependent access is an access by another thread (thread A) at a point that a) is not prior in the causality of the current thread (happens-before) and b) could result in different behavior if the order is permuted. When this happens, the exploration (the sequence of execution permutations) has to add a backtrack point in order to check the alternate permutation. To do this, loom will perform the exact same execution up until thread A accessed that shared object. Right before thread A accesses the shared object, loom then permutes: the goal of the permutation is to get thread B to access the object before thread A. To do this, loom performs a preemption and schedules thread B to run so that it can access the shared object.
However, thread B may currently be blocked (waiting for a mutex to unlock or for a condvar to be signaled). In this case, some other thread X needs to be scheduled in order to unblock B, but we don't know which thread X is, so loom ends up permuting through activating all threads at this point. By doing so, either thread B eventually accesses the shared object before thread A, or it is impossible for thread B to access the shared object before thread A.
DPOR is a depth-first exploration of permutations.
That is the overview of DPOR. DPOR performs an exhaustive exploration of the execution, meaning that every possible concurrent behavior is explored. DPOR works great for small test cases, but for more complex test cases it still cannot reduce enough to complete the test in a reasonable time. Enter bounded DPOR. The idea of bounded DPOR is that catching 99.9% of all concurrency bugs does not require an exhaustive exploration of all possible concurrent behaviors: most bugs can be caught by limiting the number of thread preemptions. By running all possible permutations reachable with a low number (2-3) of thread preemptions, almost all bugs can be caught. The problem is that naively limiting thread preemptions would break DPOR, because DPOR adds permutations detected at run time, and by artificially limiting which permutations are explored, DPOR would be unable to discover critical permutations to run. The bounded DPOR algorithm counters this problem by adding potentially unnecessary backtrack points when hitting a bound. This guarantees that all possible permutations within the bounds are executed, though some of them may be unnecessary (identical to previously-run permutations).
I notice that loom::sync has a mock implementation of std::sync::Mutex but lacks a mock std::sync::RwLock. It would be nice to be able to test code using read-write locks with loom.
I am trying to model a spinlock with Loom and am running into an issue similar to #52 and #115. In particular, if I run my loom test
with
RUSTFLAGS="--cfg loom" RUST_BACKTRACE=1 cargo test --test loom -- --test-threads=1 --nocapture
I get thread panicked while panicking. aborting. because of Model exceeded maximum number of branches. This is often caused by an algorithm requiring the processor to make progress, e.g. spin locks.
#52 says:
the value of max_branches is to avoid an infinite loop when trying to model concurrent algorithms that are unbounded. For example, if you have a spin lock, if the thread that is spinning never yields, the model can go forever. If you have a legit case, you can increase the value of max_branches.
Does that mean that I can't test my spinlock with Loom, or do I need to add something to my spinlock implementation or my test to make it work with Loom? My spinlock implementation and Loom code are in this branch.
Thanks!
Running the following code:
use loom::sync::atomic::AtomicUsize;
use std::sync::atomic::Ordering;
use std::thread;
#[test]
fn main() {
loom::model(|| {
let busy = AtomicUsize::new(0);
thread::spawn(move || busy.swap(1, Ordering::SeqCst));
});
}
results in
running 1 test
thread '<unnamed>' panicked at 'cannot access a scoped thread local variable without calling `set` first', <::std::macros::panic macros>:2:4
This is not deterministic either. I see this error message only once in about 5 runs.
I couldn't understand how this error message corresponds to the source code. Is there anything wrong with the code? I couldn't figure out if this error message is an issue with Loom or with the Rust compiler.
Corresponding section from Cargo.toml
:
[dependencies]
loom = "0.3.4"
Result of uname -a
:
Linux anduin 4.15.0-101-generic #102-Ubuntu SMP Mon May 11 10:07:26 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Cargo version: cargo 1.42.0 (86334295e 2020-01-31)
Rustc version: rustc 1.42.0 (b8cedc004 2020-03-09)
I'm interested in providing "choice" to the model-checker. I'm doing this so I can simulate a lossy channel: each message can either be delivered, or lost. From what I understand of this library, it may be possible to do this using the existing primitives. However, I don't know what the idiomatic approach would be. Does anyone have guidance on how to do this?
Thanks!
After running loom for a very long time on the crossbeam Treiber stack example, I got this:
================== Iteration 100980000 ==================
thread 'main' panicked at 'already borrowed: BorrowMutError', /home/jon/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/cell.rs:878:9
stack backtrace:
0: backtrace::backtrace::libunwind::trace
at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.46/src/backtrace/libunwind.rs:86
1: backtrace::backtrace::trace_unsynchronized
at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.46/src/backtrace/mod.rs:66
2: std::sys_common::backtrace::_print_fmt
at src/libstd/sys_common/backtrace.rs:78
3: <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt
at src/libstd/sys_common/backtrace.rs:59
4: core::fmt::write
at src/libcore/fmt/mod.rs:1069
5: std::io::Write::write_fmt
at src/libstd/io/mod.rs:1504
6: std::sys_common::backtrace::_print
at src/libstd/sys_common/backtrace.rs:62
7: std::sys_common::backtrace::print
at src/libstd/sys_common/backtrace.rs:49
8: std::panicking::default_hook::{{closure}}
at src/libstd/panicking.rs:198
9: std::panicking::default_hook
at src/libstd/panicking.rs:218
10: std::panicking::rust_panic_with_hook
at src/libstd/panicking.rs:511
11: rust_begin_unwind
at src/libstd/panicking.rs:419
12: core::panicking::panic_fmt
at src/libcore/panicking.rs:111
13: core::option::expect_none_failed
at src/libcore/option.rs:1268
14: core::result::Result<T,E>::expect
at /home/jon/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/result.rs:963
15: core::cell::RefCell<T>::borrow_mut
at /home/jon/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/cell.rs:878
16: loom::rt::scheduler::Scheduler::with_execution::{{closure}}
at /home/jon/dev/others/loom/src/rt/scheduler.rs:48
17: scoped_tls::ScopedKey<T>::with
at /home/jon/.cargo/registry/src/github.com-1ecc6299db9ec823/scoped-tls-0.1.2/src/lib.rs:189
18: loom::rt::scheduler::Scheduler::with_execution
at /home/jon/dev/others/loom/src/rt/scheduler.rs:48
19: loom::rt::execution
at /home/jon/dev/others/loom/src/rt/mod.rs:135
20: loom::rt::branch
at /home/jon/dev/others/loom/src/rt/mod.rs:92
21: loom::rt::object::Ref<T>::branch_action
at /home/jon/dev/others/loom/src/rt/object.rs:325
22: loom::rt::arc::Arc::branch
at /home/jon/dev/others/loom/src/rt/arc.rs:123
23: loom::rt::arc::Arc::ref_dec
at /home/jon/dev/others/loom/src/rt/arc.rs:93
24: <loom::sync::arc::Arc<T> as core::ops::drop::Drop>::drop
at /home/jon/dev/others/loom/src/sync/arc.rs:90
25: core::ptr::drop_in_place
at /home/jon/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/ptr/mod.rs:177
26: core::ptr::drop_in_place
at /home/jon/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/ptr/mod.rs:177
27: core::ptr::drop_in_place
at /home/jon/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/ptr/mod.rs:177
28: core::ptr::drop_in_place
at /home/jon/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/ptr/mod.rs:177
29: core::ptr::drop_in_place
at /home/jon/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/ptr/mod.rs:177
30: loom::rt::lazy_static::Set::init_static
at /home/jon/dev/others/loom/src/rt/lazy_static.rs:59
31: loom::lazy_static::Lazy<T>::get::{{closure}}
at /home/jon/dev/others/loom/src/lazy_static.rs:52
32: loom::rt::scheduler::Scheduler::with_execution::{{closure}}
at /home/jon/dev/others/loom/src/rt/scheduler.rs:48
33: scoped_tls::ScopedKey<T>::with
at /home/jon/.cargo/registry/src/github.com-1ecc6299db9ec823/scoped-tls-0.1.2/src/lib.rs:189
34: loom::rt::scheduler::Scheduler::with_execution
at /home/jon/dev/others/loom/src/rt/scheduler.rs:48
35: loom::rt::execution
at /home/jon/dev/others/loom/src/rt/mod.rs:135
36: loom::lazy_static::Lazy<T>::get
at /home/jon/dev/others/loom/src/lazy_static.rs:48
37: <crossbeam_epoch::default::COLLECTOR as core::ops::deref::Deref>::deref::__stability
at /home/jon/dev/others/loom/src/lib.rs:248
38: <crossbeam_epoch::default::COLLECTOR as core::ops::deref::Deref>::deref
at /home/jon/dev/others/loom/src/lib.rs:250
39: crossbeam_epoch::default::HANDLE::{{closure}}
at crossbeam-epoch/src/default.rs:17
40: core::ops::function::FnOnce::call_once
at /home/jon/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/ops/function.rs:232
41: loom::thread::LocalKey<T>::try_with
at /home/jon/dev/others/loom/src/thread.rs:122
42: crossbeam_epoch::default::with_handle
at ./src/default.rs:42
43: crossbeam_epoch::default::pin
at ./src/default.rs:23
44: loom::treiber_stack::TreiberStack<T>::push
at crossbeam-epoch/tests/loom.rs:76
45: loom::treiber_stack::{{closure}}
at crossbeam-epoch/tests/loom.rs:139
46: loom::model::Builder::check::{{closure}}
at /home/jon/dev/others/loom/src/model.rs:198
47: core::ops::function::FnOnce::call_once{{vtable.shim}}
at /home/jon/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/ops/function.rs:232
48: <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once
at /home/jon/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/liballoc/boxed.rs:1008
49: loom::rt::scheduler::spawn_threads::{{closure}}::{{closure}}
at /home/jon/dev/others/loom/src/rt/scheduler.rs:140
50: generator::gen_impl::GeneratorImpl<A,T>::init::{{closure}}
at /home/jon/.cargo/registry/src/github.com-1ecc6299db9ec823/generator-0.6.20/src/gen_impl.rs:154
51: core::ops::function::FnOnce::call_once{{vtable.shim}}
at /home/jon/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/ops/function.rs:232
52: <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once
at /home/jon/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/liballoc/boxed.rs:1008
53: generator::gen_impl::gen_init::{{closure}}
at /home/jon/.cargo/registry/src/github.com-1ecc6299db9ec823/generator-0.6.20/src/gen_impl.rs:387
54: core::ops::function::FnOnce::call_once
at /home/jon/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/ops/function.rs:232
55: std::panicking::try::do_call
at /home/jon/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/panicking.rs:331
56: std::panicking::try
at /home/jon/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/panicking.rs:274
57: std::panic::catch_unwind
at /home/jon/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/panic.rs:394
58: generator::gen_impl::gen_init
at /home/jon/.cargo/registry/src/github.com-1ecc6299db9ec823/generator-0.6.20/src/gen_impl.rs:401
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
thread 'main' panicked at 'already borrowed: BorrowMutError', /home/jon/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/cell.rs:878:9
stack backtrace:
0: 0x5568a4a1bb54 - backtrace::backtrace::libunwind::trace::h59a3549909dc85d8
at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.46/src/backtrace/libunwind.rs:86
1: 0x5568a4a1bb54 - backtrace::backtrace::trace_unsynchronized::h4914389b3e3bde2a
at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.46/src/backtrace/mod.rs:66
2: 0x5568a4a1bb54 - std::sys_common::backtrace::_print_fmt::hb5d6f7947976752c
at src/libstd/sys_common/backtrace.rs:78
3: 0x5568a4a1bb54 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h7374a32156a91956
at src/libstd/sys_common/backtrace.rs:59
4: 0x5568a4a44a6c - core::fmt::write::hadc5762fc721fd48
at src/libcore/fmt/mod.rs:1069
5: 0x5568a4a18673 - std::io::Write::write_fmt::h04f87a0d1d606933
at src/libstd/io/mod.rs:1504
6: 0x5568a4a1e605 - std::sys_common::backtrace::_print::h378a405d8cc63cba
at src/libstd/sys_common/backtrace.rs:62
7: 0x5568a4a1e605 - std::sys_common::backtrace::print::h41467859da656030
at src/libstd/sys_common/backtrace.rs:49
8: 0x5568a4a1e605 - std::panicking::default_hook::{{closure}}::h99cf0826a76d7d87
at src/libstd/panicking.rs:198
9: 0x5568a4a1e342 - std::panicking::default_hook::h48aeb629a84fa376
at src/libstd/panicking.rs:218
10: 0x5568a4a1ec62 - std::panicking::rust_panic_with_hook::h83a53b468de03d15
at src/libstd/panicking.rs:511
11: 0x5568a4a1e84b - rust_begin_unwind
at src/libstd/panicking.rs:419
12: 0x5568a4a43461 - core::panicking::panic_fmt::hc5ff7eb515c111c7
at src/libcore/panicking.rs:111
13: 0x5568a4a43283 - core::option::expect_none_failed::h43abe93f50cffa2b
at src/libcore/option.rs:1268
14: 0x5568a4a02c69 - core::result::Result<T,E>::expect::hfa48030a2708ecd6
at /home/jon/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/result.rs:963
15: 0x5568a4a02c69 - core::cell::RefCell<T>::borrow_mut::h2d113da5caaad761
at /home/jon/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/cell.rs:878
16: 0x5568a4a02c69 - loom::rt::scheduler::Scheduler::with_execution::{{closure}}::h3509fb2e84a1ffec
at /home/jon/dev/others/loom/src/rt/scheduler.rs:48
17: 0x5568a4a02c69 - scoped_tls::ScopedKey<T>::with::h0107b6daedaea52a
at /home/jon/.cargo/registry/src/github.com-1ecc6299db9ec823/scoped-tls-0.1.2/src/lib.rs:189
18: 0x5568a49ff8ed - loom::rt::scheduler::Scheduler::with_execution::h54a277ea94c579c0
at /home/jon/dev/others/loom/src/rt/scheduler.rs:48
19: 0x5568a49ff8ed - loom::rt::execution::h662ade3801d3f000
at /home/jon/dev/others/loom/src/rt/mod.rs:135
20: 0x5568a49ff8ed - loom::sync::atomic::atomic::Atomic<T>::load::h729d471f17abbb15
at /home/jon/dev/others/loom/src/sync/atomic/atomic.rs:28
21: 0x5568a49ff8ed - loom::sync::atomic::int::AtomicUsize::load::h3534981295176dc6
at /home/jon/dev/others/loom/src/sync/atomic/int.rs:33
22: 0x5568a49e7143 - crossbeam_epoch::atomic::Atomic<T>::load::h553aca8697d3dc5f
at crossbeam-epoch/src/atomic.rs:209
23: 0x5568a49e7143 - <crossbeam_epoch::sync::list::List<T,C> as core::ops::drop::Drop>::drop::h327c4fa948bc6f03
at crossbeam-epoch/src/sync/list.rs:222
24: 0x5568a49e7143 - core::ptr::drop_in_place::hf6e1a21efb828f17
at /home/jon/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/ptr/mod.rs:177
25: 0x5568a49e7143 - core::ptr::drop_in_place::hbd1348898b0a63b6
at /home/jon/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/ptr/mod.rs:177
26: 0x5568a49e7143 - core::ptr::drop_in_place::hb99a6cbe449c11da
at /home/jon/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/ptr/mod.rs:177
27: 0x5568a49e7ad0 - alloc::sync::Arc<T>::drop_slow::hf3bbc42f6c58aa37
at /home/jon/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/liballoc/sync.rs:768
28: 0x5568a49e70a4 - core::ptr::drop_in_place::h64e50f617e2f7011
at /home/jon/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/ptr/mod.rs:177
29: 0x5568a49e70a4 - core::ptr::drop_in_place::h7cca2e333485cc69
at /home/jon/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/ptr/mod.rs:177
30: 0x5568a49e9c73 - core::ptr::drop_in_place::h1f9e9a03c054741b
at /home/jon/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/ptr/mod.rs:177
31: 0x5568a49e9c73 - core::ptr::drop_in_place::h5868c99decd2b0c3
at /home/jon/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/ptr/mod.rs:177
32: 0x5568a49e9c73 - core::ptr::drop_in_place::hc9dca4e76bd3ce6b
at /home/jon/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/ptr/mod.rs:177
33: 0x5568a49e9c73 - loom::rt::lazy_static::Set::init_static::h31229a10e164c0f4
at /home/jon/dev/others/loom/src/rt/lazy_static.rs:59
34: 0x5568a49e9c73 - loom::lazy_static::Lazy<T>::get::{{closure}}::h1fed97cf8ac479fd
at /home/jon/dev/others/loom/src/lazy_static.rs:52
35: 0x5568a49e9c73 - loom::rt::scheduler::Scheduler::with_execution::{{closure}}::hc58ec40ade8187ed
at /home/jon/dev/others/loom/src/rt/scheduler.rs:48
36: 0x5568a49e9c73 - scoped_tls::ScopedKey<T>::with::h96db6b97b9583615
at /home/jon/.cargo/registry/src/github.com-1ecc6299db9ec823/scoped-tls-0.1.2/src/lib.rs:189
37: 0x5568a49ea2cc - loom::rt::scheduler::Scheduler::with_execution::hc1c9d5dd51d39a35
at /home/jon/dev/others/loom/src/rt/scheduler.rs:48
38: 0x5568a49ea2cc - loom::rt::execution::h1a20acd413d3aef3
at /home/jon/dev/others/loom/src/rt/mod.rs:135
39: 0x5568a49ea2cc - loom::lazy_static::Lazy<T>::get::h9529d211e7fe36f9
at /home/jon/dev/others/loom/src/lazy_static.rs:48
40: 0x5568a49e85ed - <crossbeam_epoch::default::COLLECTOR as core::ops::deref::Deref>::deref::__stability::hda60d5fb7e6b9ac8
at /home/jon/dev/others/loom/src/lib.rs:248
41: 0x5568a49e85ed - <crossbeam_epoch::default::COLLECTOR as core::ops::deref::Deref>::deref::h58b29347df7b4334
at /home/jon/dev/others/loom/src/lib.rs:250
42: 0x5568a49e85ed - crossbeam_epoch::default::HANDLE::{{closure}}::h812e5c27674f3edc
at crossbeam-epoch/src/default.rs:17
43: 0x5568a49e85ed - core::ops::function::FnOnce::call_once::hc4818b5110b57210
at /home/jon/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/ops/function.rs:232
44: 0x5568a49a2fa9 - loom::thread::LocalKey<T>::try_with::h0a01685127dddbf9
at /home/jon/dev/others/loom/src/thread.rs:122
45: 0x5568a49a51b8 - crossbeam_epoch::default::with_handle::h9c8dcab9f23ad1ca
at /home/jon/dev/others/crossbeam/crossbeam-epoch/src/default.rs:42
46: 0x5568a49a51b8 - crossbeam_epoch::default::pin::hd61ad685e130ca12
at /home/jon/dev/others/crossbeam/crossbeam-epoch/src/default.rs:23
47: 0x5568a49a54a5 - loom::treiber_stack::TreiberStack<T>::push::h9885ac72b6062e01
at crossbeam-epoch/tests/loom.rs:76
48: 0x5568a49a5ea4 - loom::treiber_stack::{{closure}}::h34062294dfd9f36b
at crossbeam-epoch/tests/loom.rs:139
49: 0x5568a49a5ea4 - loom::model::Builder::check::{{closure}}::h635e763cf3dabc67
at /home/jon/dev/others/loom/src/model.rs:198
50: 0x5568a49a5ea4 - core::ops::function::FnOnce::call_once{{vtable.shim}}::h83166ca326c76fb8
at /home/jon/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/ops/function.rs:232
51: 0x5568a4a07acf - <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once::he1636322512be8af
at /home/jon/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/liballoc/boxed.rs:1008
52: 0x5568a4a07231 - loom::rt::scheduler::spawn_threads::{{closure}}::{{closure}}::h6709f32b98e6591b
at /home/jon/dev/others/loom/src/rt/scheduler.rs:140
53: 0x5568a4a06ac6 - generator::gen_impl::GeneratorImpl<A,T>::init::{{closure}}::hb182f3ba7238636d
at /home/jon/.cargo/registry/src/github.com-1ecc6299db9ec823/generator-0.6.20/src/gen_impl.rs:154
54: 0x5568a4a06ac6 - core::ops::function::FnOnce::call_once{{vtable.shim}}::h19cb81283cb5653b
at /home/jon/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/ops/function.rs:232
55: 0x5568a4a0c5bf - <alloc::boxed::Box<F> as core::ops::function::FnOnce<A>>::call_once::hd0feb95953821b78
at /home/jon/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/liballoc/boxed.rs:1008
56: 0x5568a4a0c78b - generator::gen_impl::gen_init::{{closure}}::h3286d450a96d6ade
at /home/jon/.cargo/registry/src/github.com-1ecc6299db9ec823/generator-0.6.20/src/gen_impl.rs:387
57: 0x5568a4a0c78b - core::ops::function::FnOnce::call_once::h2e07fd4f3dfbcc23
at /home/jon/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/ops/function.rs:232
58: 0x5568a4a0c78b - std::panicking::try::do_call::hcf4dd71c5303e067
at /home/jon/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/panicking.rs:331
59: 0x5568a4a0c78b - std::panicking::try::ha9c067d0babd3b88
at /home/jon/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/panicking.rs:274
60: 0x5568a4a0c78b - std::panic::catch_unwind::h8b6e0eb8c176ac00
at /home/jon/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/panic.rs:394
61: 0x5568a4a0c78b - generator::gen_impl::gen_init::h1686779882458f9d
at /home/jon/.cargo/registry/src/github.com-1ecc6299db9ec823/generator-0.6.20/src/gen_impl.rs:401
62: 0x0 - <unknown>
thread panicked while panicking. aborting.
Will try to debug tomorrow, just wanted to leave a record of it.
As I was figuring out how to get started with loom, I wanted a way to ensure that a particular access does return multiple values. This made me feel more secure that I was using loom correctly. The approach below differs from making assertions within the loom::model closure, because it lets me make claims about aggregates across all loom iterations, as opposed to claims about a single iteration.
The solution below (adapted from the README example) is what I ended up with. Is there a better pattern for accomplishing this?
use loom::sync::Arc;
use loom::sync::atomic::AtomicUsize;
use loom::sync::atomic::Ordering::{Acquire, Release, Relaxed};
use loom::thread;
use std::collections::HashSet;

#[test]
fn seen_values() {
    // Using std::sync instead of loom::sync because I don't care about permuting these... I think?
    let seen_values = std::sync::Arc::new(std::sync::Mutex::new(HashSet::new()));
    let seen_values_cloned = seen_values.clone();
    loom::model(move || {
        let num = Arc::new(AtomicUsize::new(0));
        let ths: Vec<_> = (0..2)
            .map(|_| {
                let num = num.clone();
                thread::spawn(move || {
                    let curr = num.load(Acquire);
                    num.store(curr + 1, Release);
                })
            })
            .collect();
        for th in ths {
            th.join().unwrap();
        }
        seen_values_cloned.lock().unwrap().insert(num.load(Relaxed));
    });
    let expected: HashSet<usize> = [1, 2].iter().cloned().collect();
    assert_eq!(*seen_values.lock().unwrap(), expected);
}
I don't know the architecture or future plans for loom, so this suggestion might not fit, but one idea I had: what if loom::model returned an iterator over the return value of each closure call? Then I could write something like this:
use loom::sync::Arc;
use loom::sync::atomic::AtomicUsize;
use loom::sync::atomic::Ordering::{Acquire, Release, Relaxed};
use loom::thread;
use std::collections::HashSet;

#[test]
fn seen_values() {
    let seen_values = loom::model(move || {
        let num = Arc::new(AtomicUsize::new(0));
        let ths: Vec<_> = (0..2)
            .map(|_| {
                let num = num.clone();
                thread::spawn(move || {
                    let curr = num.load(Acquire);
                    num.store(curr + 1, Release);
                })
            })
            .collect();
        for th in ths {
            th.join().unwrap();
        }
        // The closure would need to be parameterized to return a user-chosen type
        num.load(Relaxed)
    }).collect::<HashSet<usize>>();
    let expected: HashSet<usize> = [1, 2].iter().cloned().collect();
    assert_eq!(seen_values, expected);
}
Currently, atomics establish a modification order based on operation order. The C11 memory model allows for more relaxed orderings.
The implementation strategy is described in CDSChecker: Checking Concurrent Data Structures Written with C/C++ Atomics.
The following two test cases on loom 0.3.5 iterate 4 (test_load) and 3 (test_load_with_initial_store) times, even though I expected both to have the same behavior.
#![cfg(test)]
use loom::sync::atomic::AtomicUsize;
use std::sync::atomic::Ordering::SeqCst;
use std::sync::Arc;

#[test]
fn test_load() {
    loom::model(move || {
        let v1 = Arc::new(AtomicUsize::new(0));
        let v2 = v1.clone();
        loom::thread::spawn(move || {
            v1.store(1, SeqCst);
        });
        v2.load(SeqCst);
    });
}

#[test]
fn test_load_with_initial_store() {
    loom::model(move || {
        let v1 = Arc::new(AtomicUsize::new(0));
        v1.store(0, SeqCst);
        let v2 = v1.clone();
        loom::thread::spawn(move || {
            v1.store(1, SeqCst);
        });
        v2.load(SeqCst);
    });
}
In 0.3, the Atomic* types lost the get_mut() methods and replaced them with with_mut(). A problem now is that the atomics from std don't have with_mut(), and the types from loom don't have get_mut().
What's the recommended way to use them?
I've been playing around with loom a bit and wrote this small sanity check, which seems to pass when it should not:
#[test]
#[should_panic]
fn causal_cell_race() {
    loom::model(|| {
        let x = Arc::new(CausalCell::new(1_u32));
        let y = Arc::clone(&x);
        let th1 = thread::spawn(move || {
            x.with_mut(|v| unsafe { *v += 1 });
        });
        y.with_mut(|v| unsafe { *v += 10 });
        th1.join().unwrap();
        let v = y.with_mut(|v| unsafe { *v });
        assert_eq!(12, v);
    });
}
My intention is for loom's CausalCell to catch the two threads entering the same "critical section" by simultaneously acquiring a *mut u32; however, the test case seems to pass.
Interestingly, with a small change loom is able to catch the issue:
#[test]
#[should_panic]
fn causal_cell_race_fixed() {
    loom::model(|| {
        let x = Arc::new(CausalCell::new(1_u32));
        let y = Arc::clone(&x);
        let z = Arc::clone(&x);
        let th1 = thread::spawn(move || {
            x.with_mut(|v| unsafe { *v += 1 });
        });
        let th2 = thread::spawn(move || {
            y.with_mut(|v| unsafe { *v += 10 });
        });
        th1.join().unwrap();
        th2.join().unwrap();
        let v = z.with_mut(|v| unsafe { *v });
        assert_eq!(12, v);
    });
}
The output shows an error as expected:
thread 'causal_cell_race_fixed' panicked at 'cell=VersionVec { versions: [0, 1, 0, 0] }; thread=VersionVec { versions: [0, 0, 1, 0] }', src/sync/causal/cell.rs:74:13
I could also be misunderstanding the loom API (very likely). Anyway, great work with loom; I think this type of tooling is much needed.
Hello! I believe I found a case where Loom skips some possible executions of a program. Here's a test case I prepared:
use loom::sync::atomic::{AtomicUsize, Ordering::*};
use std::hash::Hash;
use std::sync::Arc;

pub struct Entry {
    key: AtomicUsize,
    value: AtomicUsize,
}

#[test]
fn test_outcomes_should_be_the_same() {
    assert_eq!(run_example(false), run_example(true));
}

fn run_example(with_extra_load: bool) -> Vec<(usize, usize)> {
    println!("run_example(with_extra_load={})", with_extra_load);
    collect_all_outcomes(move || {
        let entry = Arc::new(Entry {
            key: AtomicUsize::new(0),
            value: AtomicUsize::new(0),
        });
        let entry1 = entry.clone();
        let entry2 = entry.clone();
        let t1 = loom::thread::spawn(move || {
            entry1.key.store(1, SeqCst);
            entry1.value.store(1, SeqCst);
            entry1.value.store(0, SeqCst);
            if with_extra_load {
                entry1.key.load(SeqCst);
            }
            entry1.key.store(0, SeqCst);
        });
        let t2 = loom::thread::spawn(move || loop {
            let value = entry2.value.load(SeqCst);
            let key = entry2.key.load(SeqCst);
            return (value, key);
        });
        t1.join().unwrap();
        t2.join().unwrap()
    })
}

/// Run all interleavings of the given function using Loom and return the sorted list of all
/// observed outcomes.
fn collect_all_outcomes<A: Hash + Ord + Send + 'static>(
    f: impl Fn() -> A + Sync + Send + 'static,
) -> Vec<A> {
    use std::collections::HashSet;
    use std::sync::Mutex;

    let result_set: Arc<Mutex<HashSet<A>>> = Arc::new(Mutex::new(HashSet::new()));
    let result_set_2 = result_set.clone();
    loom::model(move || {
        let result = f();
        result_set.lock().unwrap().insert(result);
    });
    let mut results = result_set_2.lock().unwrap().drain().collect::<Vec<_>>();
    results.sort();
    results
}
The test runs the example in two versions: one with an extra entry1.key.load(SeqCst); call, and one without. The set of possible outcomes should be the same, since the load shouldn't affect the semantics of the program (we don't use its value, and all the other accesses are also SeqCst, so it doesn't constrain ordering further).
However, when I run it, the test fails:
---- test_outcomes_should_be_the_same stdout ----
run_example(with_extra_load=false)
Completed in 240 iterations
run_example(with_extra_load=true)
Completed in 162 iterations
thread 'test_outcomes_should_be_the_same' panicked at 'assertion failed: `(left == right)`
left: `[(0, 0), (0, 1), (1, 0), (1, 1)]`,
right: `[(0, 0), (0, 1), (1, 1)]`', src/lib.rs:12:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Note that the test with an additional operation has fewer iterations. It misses the outcome (value=1, key=1), which could be generated for example by the following interleaving of the two threads:
t1: entry1.key.store(1, SeqCst);
t1: entry1.value.store(1, SeqCst);
t2: let value = entry2.value.load(SeqCst); // 1
t2: let key = entry2.key.load(SeqCst); // 1
t1: entry1.value.store(0, SeqCst);
t1: entry1.key.load(SeqCst); // 1
t1: entry1.key.store(0, SeqCst);
t2: return (value, key); // (1, 1)
This is the smallest example I could come up with; minimal tweaks (for example, changing the order of reading key and value) seem to make the problem go away.
Tested on Loom 0.3.5.
The main readme recommends adding loom as a dev-dependency. Then it goes on to mention having loom tests as integration tests:
For example, if the library includes a test file: tests/loom_my_struct.rs ...
This does not work, because when building integration tests, the main library is not built in #[cfg(test)] mode but rather in the same way cargo build builds it. So the dev-dependencies are not available, and the crate can't use loom::thread::spawn etc. internally.
Would you recommend only doing loom tests in unit tests? This is what I have seen some of the libraries using loom do. Or would you recommend having loom as a normal, but optional, dependency under [dependencies]? Then anyone can activate it when they use the crate, which may be good or bad. The bad part is that the crate exposes features normal users are not meant to use. The good part is that if the crate itself implements synchronization primitives, then users of that crate can activate the loom feature when they want to loom-test their own code built on those primitives.
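For what it's worth, one more pattern some loom users reach for keeps loom out of normal builds entirely by making it a cfg-gated target dependency (the version number below is illustrative):

```toml
# Only pulled in when building with RUSTFLAGS="--cfg loom"; normal
# `cargo build` / `cargo test` never compile loom.
[target.'cfg(loom)'.dependencies]
loom = "0.3"
```

This avoids exposing a user-visible feature flag, at the cost of requiring the --cfg to be set both when building the crate and when running its loom tests.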
I'm trying to test a spinlock that uses parking_lot's lock_api with Loom.
To implement lock_api's RawRwLock I need to supply a const initial value, as specified on this line. This works with the standard library's std::sync::atomic::AtomicUsize, whose new is a const fn, but not with Loom's atomic_int! macro, which generates a non-const fn new.
I get the following error when running RUSTFLAGS="--cfg loom" cargo test, where rin is one of the fields of the struct I initialize. The error does not occur with regular cargo test (using std::sync::atomic).
error[E0015]: calls in constants are limited to constant functions, tuple structs and tuple variants
--> src/lib.rs:50:14
|
50 | rin: AtomicUsize::new(0),
| ^^^^^^^^^^^^^^^^^^^
Is it possible to make loom::sync::atomic::AtomicUsize::new const, or can you recommend another workaround?
For some reason, Arc::strong_count is unimplemented, with the following panic message given if it is called:
no tests checking this? DELETED!
It appears that this was remarked on in #73, but not addressed. Is there a reason for this function being missing?
The generator crate has been broken on Windows Nightly for some time already, and it doesn't appear to be actively maintained. See Xudong-Huang/generator-rs#24
When a causality violation occurs during a CausalCell access, the cell panics, failing the test. If backtraces are enabled (i.e. RUST_BACKTRACE=1), the panic message will print the stack trace of the access during which the causality violation was detected. However, it is harder to determine where in the program the other concurrent accesses that participated in the violation occurred. The panic message includes the current state of the causal cell in that execution, like this:
"Causality violation: Concurrent mutable access and immutable access(es): cell.with_mut: v=VersionVec { versions: [6, 4, 4, 0] }; mut v=VersionVec { versions: [4, 0, 0, 0] }; thread=VersionVec { versions: [5, 12, 0, 0] }"
but this debug output can be difficult for users to interpret.
I'd like to propose a feature to allow loom to capture the backtraces of all accesses to a CausalCell. If this feature is enabled (e.g. by the RUST_BACKTRACE env var, or a separate LOOM_BACKTRACE env var?), every time a causal cell is accessed, the stack backtrace in which that access occurred would be recorded. If a causality violation is detected, loom could print a message that includes all the stack frames responsible for the violation. For example:
Causality violation: concurrent mutable and immutable access(es): cell.with_mut:
Accessed mutably from:
1: loom::sync::causal::cell::CausalCell<T>::with_mut
at /Users/eliza/.cargo/registry/src/github.com-1ecc6299db9ec823/loom-0.2.8/src/sync/causal/cell.rs:136
2: my_crate::MyType::do_causality_violation
at src/lib.rs:178
3: my_crate::MyType::maybe_do_a_bad_thing
at src/lib.rs:422
4: my_crate::MyType::some_function
at src/lib.rs:339
...
Concurrently accessed immutably from:
1: loom::sync::causal::cell::CausalCell<T>::with
at /Users/eliza/.cargo/registry/src/github.com-1ecc6299db9ec823/loom-0.2.8/src/sync/causal/cell.rs:81
2: my_crate::MyType::some_other_function
at src/lib.rs:320
3: my_crate::my_mod::my_function
at src/my_mod.rs:112
3: my_crate::do_something
at src/lib.rs:412
...
Concurrently accessed immutably from:
1: loom::sync::causal::cell::CausalCell<T>::with
at /Users/eliza/.cargo/registry/src/github.com-1ecc6299db9ec823/loom-0.2.8/src/sync/causal/cell.rs:81
2: my_crate::MyType::some_other_function
at src/lib.rs:320
3: my_crate::my_mod::my_great_function
at src/my_mod.rs:261
...
Additionally, it could be helpful to process the captured backtraces to remove stack frames inside of loom, since they're not relevant to the user code being simulated and they make the backtrace much longer.
The AtomicPtr implementation seems to be missing an implementation of get_mut.
Hi, loving the tool, and I found it really easy to get set up.
I have a passing test in my code that I think might be supposed to fail. Most likely, I just don't understand the memory model correctly. Is there a way that I can see the different execution orders that loom is using in order to better understand the memory model?
My test is failing with the following error message:
thread '<unnamed>' panicked at 'cannot access a scoped thread local variable without calling `set` first', /home/midas/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libstd/macros.rs:13:23
My code has no unsafe anywhere:
#[cfg(loom)]
use loom::sync::atomic::AtomicU16;
#[cfg(loom)]
use loom::sync::atomic::Ordering;
#[cfg(not(loom))]
use std::sync::atomic::AtomicU16;
#[cfg(not(loom))]
use std::sync::atomic::Ordering;

use arr_macro::arr;
use parking_lot::Mutex;

pub struct Mailbox<T> {
    queue: [Mutex<Option<T>>; 256],
    /// The first byte (when reading as big endian) is the head, the second byte is the tail.
    pointers: AtomicU16,
}

impl<T> Mailbox<T> {
    pub fn new() -> Mailbox<T> {
        Mailbox {
            queue: arr![Mutex::new(None); 256],
            pointers: AtomicU16::new(0),
        }
    }

    pub fn insert(&self, value: T) -> Result<(), T> {
        loop {
            let v = self.pointers.load(Ordering::SeqCst);
            let pointers = v.to_be_bytes();
            if pointers[0].wrapping_add(1) == pointers[1] {
                // We're full!
                return Err(value);
            } else {
                // Try to claim the next spot!
                let new_u16 = u16::from_be_bytes([pointers[0].wrapping_add(1), pointers[1]]);
                let swapped: bool =
                    self.pointers.compare_and_swap(v, new_u16, Ordering::SeqCst) == v;
                if swapped {
                    let index = pointers[0].wrapping_add(1) as usize;
                    let mut l = self.queue[index].lock();
                    *l = Some(value);
                    return Ok(());
                } else {
                    // Someone else claimed it under our nose, try again!
                    continue;
                }
            }
        }
    }

    pub fn retrieve(&self) -> Option<T> {
        loop {
            let v = self.pointers.load(Ordering::SeqCst);
            let pointers = v.to_be_bytes();
            if pointers[0] == pointers[1] {
                // We're empty!
                return None;
            } else {
                // Try to claim the next spot!
                let new_u16 = u16::from_be_bytes([pointers[0], pointers[1].wrapping_add(1)]);
                let swapped: bool =
                    self.pointers.compare_and_swap(v, new_u16, Ordering::SeqCst) == v;
                if swapped {
                    let index = pointers[1].wrapping_add(1) as usize;
                    let mut l = self.queue[index].lock();
                    return l.take();
                } else {
                    // Someone else claimed it under our nose, try again!
                    continue;
                }
            }
        }
    }

    pub fn is_mbx_empty(&self) -> bool {
        let v = self.pointers.load(Ordering::Release);
        let a = v.to_be_bytes();
        a[0] == a[1]
    }
}
And the test is pretty simple:
#![cfg(loom)]
use lfmbx::Mailbox;
use loom::sync::Arc;
use std::thread;

#[test]
fn multi_insert() {
    loom::model(|| {
        let mbx: Arc<Mailbox<u8>> = Arc::new(Mailbox::new());
        let ths: Vec<_> = (0..2)
            .map(|_| {
                let mbx = mbx.clone();
                thread::spawn(move || mbx.insert(2))
            })
            .collect();
        for th in ths {
            th.join().unwrap();
        }
        assert_eq!(mbx.retrieve(), Some(2));
        assert_eq!(mbx.retrieve(), Some(2));
        assert_eq!(mbx.retrieve(), None);
    });
}
I'm running rustc v1.45.1 and loom v0.3.5
std::sync::Arc<T> does not require T: Sized, whereas loom's version does. This means that code that uses, say, Arc<dyn Fn> will not compile with loom, which is unfortunate. I believe we would need to move the T to the end of loom::Arc for this to work, though that also breaks the layout needed according to loom::Arc's comment:
https://github.com/carllerche/loom/blob/339dd48f57d3b9b786e44db4914d08882d7659bf/src/sync/arc.rs#L11-L17
The loom::sync::Arc type needs to add T: ?Sized to allow Arc<dyn Any>. It's mostly just adding bounds, but it does mean into_raw and from_raw need to be carefully adjusted, since ?Sized types need to be the last field on Inner.