utsaslab / crashmonkey
CrashMonkey: tools for testing file-system reliability (OSDI 18)
License: Apache License 2.0
Part of the revised version of #12.
Checkpoints require support across many parts of CrashMonkey. This part is meant to give user workloads the ability to tell the CrashMonkey test harness that they want to create a checkpoint. CrashMonkey should provide both a stub binary to accomplish this task (similar to the current stubs in the user_tools directory) as well as a small API for tests subclassed from BaseTestCase.h. This utility can make use of the sockets class available in the utils/communication directory.
For checkpoints, we can assume 2 things:
The stub program or API for this part should do 2 things:
After the stub has received confirmation that the checkpoint operation has completed, it should exit with no error (for the binary) or return to the caller.
The test harness portion of CrashMonkey crashes with an index out of bounds exception when no bios are logged/transferred to user space.
Running sudo ./c_harness -f /dev/vda -d /dev/cow_ram0 -t btrfs -e 102400 tests/rename_root_to_sub.so on a build from master should trigger this bug (potentially substituting /dev/vda with an existing disk in the VM).
I'm trying to test CrashMonkey, but I get an error when trying to register a kernel module. I also changed the kernel version on some other PCs, but it shows the same results.
Please check this problem.
root@junghan-nuc:~/crashmonkey/build# ./c_harness -f /dev/vda1 -d /dev/cow_ram0 -t ext2 tests/rename_root_to_sub.so -v
running 0x7ffc10bb66f8
========== PHASE 0: Setting up CrashMonkey basics ==========
Inserting RAM disk module
Loading test case
Loading permuter
Updating dirty_expire_time_centisecs to 3000
========== PHASE 1: Creating base disk image ==========
Formatting test drive
mke2fs 1.42.13 (17-May-2015)
Discarding device blocks: done
Creating filesystem with 10240 1k blocks and 2560 inodes
Filesystem UUID: 0637732c-1f79-4ed0-84ca-125bca2fb70a
Superblock backups stored on blocks:
8193
Allocating group tables: done
Writing inode tables: done
Writing superblocks and filesystem accounting information: done
Mounting test file system for pre-test setup
Running pre-test setup
Unmounting test file system after pre-test setup
Making new snapshot
cloning device /dev/cow_ram0
========== PHASE 2: Recording user workload ==========
Clearing caches
Inserting wrapper module into kernel
insmod: ERROR: could not insert module ../build/disk_wrapper.ko: Cannot allocate memory
Error inserting kernel wrapper module
rmmod: ERROR: Module cow_brd is in use
Unable to remove cow_brd device
root@junghan-nuc:~/crashmonkey/build#
Many times, the CrashMonkey test fails with an assertion error in the random permuter, which leaves it unable to decide the result of the test. The error looks like:
c_harness: permuter/RandomPermuter.cpp:307: void fs_testing::permuter::RandomPermuter::AddEpochs(const iterator&, const iterator&, const iterator&, const iterator&): Assertion 'current_res != res_end' failed.
We noticed this happens more frequently while testing for ext4 than the other file systems.
Recently, a couple of gentlemen have been using dm-log-writes and xfstests (available here) to find crash-consistency bugs. We should reproduce these bugs in CrashMonkey as well.
Currently, for each VM, only one CrashMonkey instance is running. This wastes a lot of computational power. It would be much more efficient to run X instances of CrashMonkey if there are X cores on the machine.
This would require running X wrapper devices per virtual machine. I'm not sure what kernel problems we will run into when doing this.
Since CrashMonkey has terminology that is either new or used in a new context, we should create a wiki page that defines common terms in CrashMonkey that users may not be familiar with. This will help people discuss concepts of CrashMonkey in a coherent manner.
Something appears to be off with the little RAM block device (cow_brd) that I created earlier in the project. The device mapper target has a two-part system that is already in place through the snapshot-origin and snapshot targets. There is also a library for device mapper that can be used to programmatically control device mapper targets.
The dm target has the advantage of upstream support as well as C library support in the form of libdevmapper.
It also has a bash interface, and a small script like the following can be used to rig up a simple snapshot device:
#! /bin/bash
SNAP_BASE=/dev/ram0
SNAP_BASE_NAME=snap_base
SNAP_DEV=/dev/ram1
SNAP_NAME=snap_snap
set -x
DEV_SIZE=$(blockdev --getsz "$SNAP_BASE")
echo "0 $DEV_SIZE snapshot-origin $SNAP_BASE" | dmsetup create "$SNAP_BASE_NAME"
ORIG_BASE=/dev/mapper/$SNAP_BASE_NAME
ORIG_SIZE=$(blockdev --getsz "$ORIG_BASE")
dmsetup create "$SNAP_NAME" --notable
echo "0 $ORIG_SIZE snapshot $ORIG_BASE $SNAP_DEV n 8" \
    | dmsetup load "$SNAP_NAME"
# A loaded table is inactive until resumed.
dmsetup resume "$SNAP_NAME"
dmsetup mknodes
Despite the fact that a bash interface exists to communicate with device mapper targets, I feel an implementation using the libdevmapper library would be preferable.
I made the mistake of exposing kernel bio flags to user space in earlier versions of CrashMonkey. This is not a portable design choice and needs to be fixed. Instead of directly using kernel bio flags in user space, the kernel code in CrashMonkey should translate from kernel flags to CrashMonkey specific flags. This enables portability of user space code in CrashMonkey. Another project similar to CrashMonkey already does this and has been added to the kernel. The code that accomplishes this can be found here.
At least the following flags/concepts should have their own defines in CrashMonkey:
Part of the revised version of #12.
Checkpoints require support across many parts of CrashMonkey. This part is meant to provide the ability for the disk_wrapper to actually create a checkpoint. This should be implemented as part of the ioctl created in #40.
For checkpoints, we can assume 2 things:
When a checkpoint request is received as an ioctl, the disk_wrapper should insert a new disk_write_op into the sequence to signify that a checkpoint was made. This operation should have no data, but should have flags to denote that it is a checkpoint. New flags will need to be created to signify checkpoint operations, as the current flags don't reflect that.
Insertion into the list of disk operations should be done in a thread-safe manner. It could be the case that another process is attempting to insert a write into the list, so be sure to use proper locking to ensure nothing is lost.
The checkpoint operation should appear like all the other operations in the list so that it can be transferred to user space like all the others.
So it seems there is no support for C++. I was trying to port the code to kernel 3.16, and a header included in the compilation of RandomPermuter.so (linux/stddef.h) tries to define true and false, which are keywords in C++.
A workaround is to remove the link to the kernel headers in the compilation of RandomPermuter.so and create a local file with a copy of the needed values from linux/blk_types.h. But I don't like it.
xfstests in kdave's repo has a source file for a program called fsx that is meant to perform random write/truncate/allocate operations on a single file in the file system. This would be a useful tool for the CrashMonkey team because it would allow us to quickly bootstrap random tests.
The fsx program has a few extra things the CrashMonkey team may not need. These should be removed from fsx if they are not needed, or should be created somewhere other than on the file system under test if they are useful. fsx has an algorithm to generate data, but I do not know if it generates data the CrashMonkey team can easily check for if given an offset in a file. Part of CrashMonkey's tests include checks for proper data in files, so we would like to make sure that we know what data is being written to a file where. If needed, the fsx algorithm to create data to write to a file should be modified so that it writes data CrashMonkey can easily verify.
Now that the project is getting bigger, the Makefile should be modified so that builds are cleaner.
Included code should be compiled into libraries where possible and linked where needed instead of provided directly as it is now.
Compiled code should also be placed in its own build directory instead of alongside the source files which generated it. This will make make clean much easier to define, as well as making it easier to avoid checking binaries into git.
File watches are a feature that would make checking for data consistency a lot easier on users. Therefore, CrashMonkey should support a system where a user can tell CrashMonkey what files should no longer change. These watches may be tied to certain checkpoints in the workload, or they may be something that holds through the entire workload.
For watches, we can assume a few things:
This part of the watch infrastructure gives user workloads the ability to tell CrashMonkey to watch a file. Since workloads can be run either by CrashMonkey (by subclassing BaseTest.h) or with CrashMonkey in the background, we need to provide both a stub binary and a simple API to set up watches. Watch setup should use sockets to communicate with the CrashMonkey test harness (see utils/communication/).
When the user requests a watch on a file, the stub should do the following:
The code to manage block devices is currently a part of the main harness code. Since the functionality of this code is very narrowly scoped and it is not directly related to how the test harness should be run, it should likely be moved into a utility class or module.
Part of the revised version of #12.
Checkpoints require support across many parts of CrashMonkey. This part slightly modifies how crash states are generated so that we can give user consistency tests more information about the crash state they are working with.
For checkpoints, we can assume 2 things:
When a new crash state is generated, the Permuter (or subclass) that generated the crash state should inform the CrashMonkey test harness of the most recent checkpoint passed in the bio sequence. An example of a workload, generated crash state, and checkpoint number are shown below.
workload:
+-------------+-------------+-------------+-------------+-------------+
| epoch 1 | epoch 2 | checkpoint | epoch 3 | epoch 4 |
+-------------+-------------+-------------+-------------+-------------+
generated crash state:
+-------------+-------------+-------------+-----------------+
| epoch 1 | epoch 2 | checkpoint | partial epoch 3 |
+-------------+-------------+-------------+-----------------+
returned checkpoint value: 1
Another example could be:
workload:
+-------------+-------------+-------------+-------------+-------------+
| epoch 1 | epoch 2 | checkpoint | epoch 3 | epoch 4 |
+-------------+-------------+-------------+-------------+-------------+
generated crash state:
+-------------+-----------------+
| epoch 1 | partial epoch 2 |
+-------------+-----------------+
returned checkpoint value: 0
Hi,
If I include fsync in the OperationSet list (ace/ace.py line 59) and run python ace.py -l 1 -n False -d False, ACE fails with the following error message:
Traceback (most recent call last):
File "ace.py", line 1463, in <module>
main()
File "ace.py", line 1422, in main
doPermutation(i)
File "ace.py", line 1242, in doPermutation
cur_line = buildJlang(modified_sequence[insert], length_map)
File "ace.py", line 1017, in buildJlang
ret = flat_list[2]
IndexError: list index out of range
Thanks.
Comments in the kernel indicate that the bi_sector our current logging records may not be relative to the partition we're monitoring, but relative to the entire block device (offending comment). We need to determine whether the sector being logged is relative to the partition of the device we are monitoring or to the disk itself (ex. relative to /dev/sda1 or /dev/sda).
Another system, called log-writes, performs logging similar to CrashMonkey, but uses device mapper targets instead. Eventually, we would like to move CrashMonkey over to a more standard device mapper target. However, before we do that, we would like to know what the pain points will be. We should use the dm-log-writes target to determine if the sectors logged are relative to the start of the partition being monitored or relative to the start of the block device.
An easy way to check this is to log some operations with the log-writes system and then try to replay those operations onto a device with a different number of partitions and/or with partitions of different sizes than the device logging was originally done on.
Part of the revised version of #12.
Checkpoints require support across many parts of CrashMonkey. This part is meant to provide the ability for the CrashMonkey test harness to tell the disk_wrapper to create a checkpoint. This can be implemented as an ioctl call to the disk_wrapper.
For checkpoints, we can assume 2 things:
The ioctl should be a synchronous call. The work done in the ioctl is in #39.
Monitoring the program running the workload with ptrace inside CrashMonkey would make things a little easier for users in a few ways, including:
Based on the above list, the main goals of adding ptrace to CrashMonkey should be:
Intercept the data passed to write so that it can be correlated with logged bios.
https://github.com/kdave/xfstests
Infrastructure to do the following:
We should be able to support a workflow like this:
The user's workload shouldn't have to be written in C++ inside CrashMonkey
I have a long email thread in my inbox with @ashmrtn about this. Will add the summary from that thread here later.
In short, we want to have some mechanism to know what data/metadata to expect in each crash state. The idea is to allow users to call Checkpoint, which captures the user-visible state (directory tree + data) of the file system somewhere. On a crash, we go back to the latest Checkpoint and see if we have all the data in there.
File watches are a feature that would make checking for data consistency a lot easier on users. Therefore, CrashMonkey should support a system where a user can tell CrashMonkey what files should no longer change. These watches may be tied to certain checkpoints in the workload, or they may be something that holds through the entire workload.
For watches, we can assume a few things:
This part of the watch infrastructure implements the logic for file watches. For each file watch made, the CrashMonkey test harness should checksum the data and selected metadata for the specified file. Metadata that does not change on every access (ex. file size, file type, and permissions, but not things like accessed time) should be included in the checksum. Checksums should be stored in a hashmap (or hashmap-like structure) that maps the file name to the checksum.
There is no limit on the number of times a file can be added to watches. Therefore, each filepath<->checksum hashmap should be stored according to the checkpoint that it corresponds to (ex. if a file is watched referencing checkpoint 1 and checkpoint 2 -- as two separate calls to watch -- then there should be a hashmap corresponding to checkpoint 1 watches and a hashmap corresponding to checkpoint 2 watches).
The code to insert and remove kernel modules is currently a part of the main harness code. Since the functionality of this code is very narrowly scoped and it is not directly related to how the test harness should be run, it should likely be moved into a utility class or module.
Comments in the kernel indicate that the bi_sector our current logging records may not be relative to the partition we're monitoring, but relative to the entire block device (offending comment). We need to determine whether the sector being logged is relative to the partition of the device we are monitoring or to the disk itself (ex. relative to /dev/sda1 or /dev/sda).
An easy way to check this is to log some operations with CrashMonkey and then try to replay those operations onto a device with a different number of partitions and/or with partitions of different sizes than the device logging was originally done on.
We currently have the information from a run of CrashMonkey spread across too many files and logs, which makes interpreting tests hard. Let's consolidate this into one file.
A number of bugs only occur when the file system is close to full (storage space almost fully utilized). Add this as part of the testing.
Some bugs only appear when the kernel is low on memory. Need to figure out how to add those.
Part of the revised version of #12.
Checkpoints require support across many parts of CrashMonkey. This part slightly modifies the way user tests are called so that checkpoint information is passed to user tests.
For checkpoints, we can assume 2 things:
The part of CrashMonkey that calls user tests should be modified to pass along the checkpoint number (generated by #41). BaseTestCase.h should be modified such that this is allowed.
Set it up so that we crash during xfstests and check if the file system is consistent.
This "macro-test" will yield a lot of interesting crash states.
CrashMonkey currently only works with 3.x versions of the Linux kernel. It should be updated to work with 4.x versions of the kernel as well.
User provided tests should have a set of utilities/methods that they can call into which provide them with things like the directory the test file system is mounted at and the file system size.
File watches are a feature that would make checking for data consistency a lot easier on users. Therefore, CrashMonkey should support a system where a user can tell CrashMonkey what files should no longer change. These watches may be tied to certain checkpoints in the workload, or they may be something that holds through the entire workload.
For watches, we can assume a few things:
This part of the watch infrastructure allows CrashMonkey to report errors when the files being watched change in generated crash states.
Each time a crash state is generated, CrashMonkey should examine the checkpoint for the crash state (see #41/#42) and then check all file watches referencing that checkpoint and earlier.
When "checking" a watched file, CrashMonkey should checksum the file data and selected metadata (see #44) at that path in the generated crash state. If the crash state's checksum does not match the checksum calculated when the file watch was set up, then CrashMonkey should note the error in detail (ex. "checksum for file is incorrect" -- see DataTestResult.h for an example of error strings) in a results/xResult struct (you will likely have to modify or make a new struct for this). Recording specific errors in an xResult struct will allow these errors to be printed to the log later in test harness execution.
Currently, the user-space component of CrashMonkey generates tests that run on CrashMonkey's custom kernel module. We would like to generate tests that use dm-flakey (https://www.kernel.org/doc/Documentation/device-mapper/dm-flakey.txt) since dm-flakey is already in the Linux kernel. The advantage of doing so is that the tests CrashMonkey produces can directly be added to xfstests and run by Linux kernel developers.
For example, Jayashree is now porting CrashMonkey tests to dm-flakey tests manually and adding them to xfstests: https://www.spinics.net/lists/fstests/msg10767.html. An adaptor for dm-flakey would make this automatic.
#17 and #48 brought in a log file that the failing tests are printed to. This log output contains the indices of the bios in the crash state, and the order they were written out to disk to form the crash state. This list of indices should be augmented to show whether each bio was a metadata bio or a data bio.
Metadata bios will be denoted by the META flag in the bio itself. Data bios will not have that flag.
Since the amount of data stored in the test result object is minimal at this time, we will likely have to expand the member variable that contains the crash state index information. Expanding that to have <index, data> pairs should suffice.
Sample output for this change could look like/similar to the following:
Test #26 FAILED: file missing: test file has completely disappeared
last checkpoint: 2
crash state: 0 (M), 1 (D), 2 (D), 4 (D), 3 (D), 5 (M)
Have CrashMonkey run in a "fuzzer" mode where:
Test N crash states at once, where N is the number of cores on the test machine.
The permuters should have unit tests associated with them to make sure they are working properly. These tests should include things like checking that the permuter works properly when no bios or only 1 bio is logged, etc.
Tests should be placed in the test directory in the repo.
Look at Linux kernel mailing lists and file-system specific lists such as linux-ext4 and linux-btrfs to collect bugs we could attempt to reproduce.
Users may want to exit CrashMonkey before the test harness has finished a complete run. They should be able to hit ctrl-c on the shell to kill it and expect CrashMonkey to clean up resources properly.
Most of this should just be catching the proper signal and then calling cleanup_harness() in the instance of the Tester object that harness/c_harness.cpp has.
Right now, I know that background communication sockets aren't cleaned up, kernel module(s) aren't removed, and file systems aren't unmounted (depending on when ctrl-c is hit).
To aid correlation between writes to disk and recorded bios, all logs generated by the -l flag (that's a lowercase 'L', not an uppercase 'i') should be in hex.
The goal of this is to allow users to run CrashMonkey with strace -x logging the write system calls. Then, users can directly correlate the hex strings strace logs for system calls with the data in bios recorded by CrashMonkey.
Based on experiences from some of the newer people on the CrashMonkey team, it seems that it is hard to determine why CrashMonkey failed to run properly if an error occurs. Therefore, the error messages in CrashMonkey should be updated to make it easier to understand what went wrong.
Flush operations are defined oddly. They make sure that the data in the device cache is persisted, but will not make sure the data in the request itself is persisted (link). In light of this, CrashMonkey should split flush operations that have data. The flush operation itself should end the current epoch in CrashMonkey, but the data should be placed in the next epoch, as it is not guaranteed to be persisted in the epoch the flush just ended. This should probably be done in the Permuter class when it initializes internal data structures so that this behavior is transparent to user-implemented permuters.
There's a bug in the initialization code of disk_wrapper that causes it to be unremovable (and thus forces a system restart) if the device it's supposed to pull IO scheduler flags from does not exist.
The device is stable otherwise, but this should be fixed to make the system more resilient.
This is a list of file system bugs that the current implementation of CrashMonkey cannot reproduce because we don't have the infrastructure for it. If we add the infrastructure in the future, these might be interesting to try to recreate.
The current system is a simple pass/fail return from user data consistency tests. We need to update this to allow user data consistency tests to output meaningful errors that help with finding file system bugs. Without this, it is not easy to determine why a test failed since you can only see summary information about the tests.
Currently, the user specifies how many crash states to test, and CrashMonkey keeps running tests until it reaches that number. For large tests (>= ~10 bios in a single epoch) this is fine. For smaller tests, however, unless the user manually counts how many possible crash states there are and sets the options accordingly, it will cause CrashMonkey to loop infinitely trying to find enough unique crash states to satisfy the command line argument.
CrashMonkey should be updated to reduce the number of tests it will run on small workloads so that it does not spin forever trying to generate unique crash states. It should print a message when it does this so that users are aware of this behavior.
As development goes along, I find that more and more of the flags are letters that don't really relate to what the flag actually does. It would be nice to clean these up and make them sane values that relate to what the flag actually does.