
famfs's Issues

Code Coverage: Umbrella issue

We use gcov to measure code coverage. We currently use (and will probably continue to use) a combination of smoke and unit tests to measure "official" coverage (see https://github.com/cxl-micron-reskit/famfs/blob/master/markdown/getting-started.md).

But I would like to see our unit-test-only coverage increase, for a few reasons. One is that the smoke tests may not be runnable on Github, since they need an actual dax device. Moreover, there are some inherently hard-to-cover branches - mmap failures and file open failures are at the top of the list. We probably need to enable mocking for open and mmap in order to properly test failures in those functions.
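One way to get there (just a sketch, not something that exists in the tree today) is link-time wrapping with GNU ld's --wrap flag, which lets a unit test force mmap() to fail on demand. The fail_next_mmap flag and wrapper below are illustrative names; the same pattern works for open():

```c
/*
 * Sketch: build the test binary with -Wl,--wrap=mmap so that calls to mmap()
 * resolve to __wrap_mmap(), which can inject failures. fail_next_mmap and the
 * wrapper are illustrative, not part of the famfs code base.
 */
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/mman.h>

void *__real_mmap(void *addr, size_t length, int prot, int flags,
                  int fd, off_t offset);

/* Test code sets this to force the next mmap() call to fail */
bool fail_next_mmap;

void *__wrap_mmap(void *addr, size_t length, int prot, int flags,
                  int fd, off_t offset)
{
        if (fail_next_mmap) {
                fail_next_mmap = false;
                errno = ENOMEM;
                return MAP_FAILED;      /* exercise the error branch under test */
        }
        return __real_mmap(addr, length, prot, flags, fd, offset);
}
```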

So please work on unit (or smoke) test code coverage improvements - especially unit test coverage improvements - and send PRs.

Better unit test coverage will be good because Github coverage reporting probably can't include coverage from the smoke tests (but please correct me if that seems wrong).

mkfs.famfs: support configurable metadata log size?

Currently the log is always 8MiB, which holds something over 25000 log entries. Reasons we might support a configurable log size:

  1. 25K entries is not enough
  2. 8MiB is massive overkill (e.g. the device is going to be used for a single file, or a small set). In this case, the minimum log size will be 2MiB - one huge page.

Under no circumstances should we support a log size that is not a 2MiB multiple. Honestly I'm not sure this needs to be addressed, but it's a question that has come up several times.
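For illustration, the option check would be small; the names below (famfs_validate_log_size, FAMFS_LOG_ALIGN) are hypothetical, but the rule is the 2MiB-multiple / 2MiB-minimum constraint stated above:

```c
/* Hypothetical mkfs.famfs option check for a configurable log size */
#include <stdio.h>

#define FAMFS_LOG_ALIGN   (2UL * 1024 * 1024)   /* one huge page */
#define FAMFS_LOG_DEFAULT (8UL * 1024 * 1024)   /* current fixed size */

static int famfs_validate_log_size(unsigned long log_size)
{
        if (log_size < FAMFS_LOG_ALIGN || (log_size & (FAMFS_LOG_ALIGN - 1))) {
                fprintf(stderr, "log size must be a 2MiB multiple (>= 2MiB)\n");
                return -1;
        }
        return 0;
}
```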

The trickiest part would be test coverage.

famfs_map_superblock_and_log_raw(): size of log is assumed

This function needs to either:

  • Check the log size after the mmap, and remap if necessary to get the right size
  • Or mmap the superblock first, get the log size, then mmap the log

The first option leaves the cleanup almost exactly the same, whereas the second leaves two things to munmap.

This is a bug, but it's only a latent bug until we support more than one log size.
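A minimal sketch of the second option, assuming the superblock records the log's offset and length (the struct and field names below are placeholders, not the actual famfs on-media format):

```c
/* Sketch: map the superblock, learn the log size, then map the log */
#include <stdint.h>
#include <sys/types.h>
#include <sys/mman.h>

struct famfs_superblock {               /* hypothetical layout */
        uint64_t sb_magic;
        uint64_t sb_log_offset;         /* byte offset of the log on the device */
        uint64_t sb_log_len;            /* byte length of the log */
};

static int map_sb_and_log(int fd, struct famfs_superblock **sb_out, void **log_out)
{
        struct famfs_superblock *sb;
        void *log;

        sb = mmap(NULL, sizeof(*sb), PROT_READ, MAP_SHARED, fd, 0);
        if (sb == MAP_FAILED)
                return -1;

        log = mmap(NULL, sb->sb_log_len, PROT_READ | PROT_WRITE, MAP_SHARED,
                   fd, (off_t)sb->sb_log_offset);
        if (log == MAP_FAILED) {
                munmap(sb, sizeof(*sb));        /* two things to clean up now */
                return -1;
        }
        *sb_out = sb;
        *log_out = log;
        return 0;
}
```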

log play should rigorously avoid following symlinks

I added a test that manually puts a symlink where a directory should be, and then runs logplay. The link pointed to a directory (/tmp), and logplay just thinks the directory already exists (i.e. stat() followed the link to its destination and reported on that).

We may need to use fstatat() in logplay (rather than stat()) to avoid this, or always do an lstat() before the stat(). Preventing symlinks entirely might also be a good idea, if there's a way.
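A sketch of the kind of guard logplay could run before trusting a path that should be a directory (the helper name is hypothetical):

```c
/* Sketch: refuse to treat a symlink as an already-existing directory */
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

static int famfs_dir_is_safe(const char *path)
{
        struct stat st;

        if (lstat(path, &st) != 0)              /* lstat() does not follow links */
                return (errno == ENOENT) ? 0 : -1;      /* missing: ok to create */

        if (S_ISLNK(st.st_mode)) {
                fprintf(stderr, "%s: refusing to follow symlink\n", path);
                return -1;
        }
        if (!S_ISDIR(st.st_mode)) {
                fprintf(stderr, "%s: exists but is not a directory\n", path);
                return -1;
        }
        return 0;                               /* a real directory already exists */
}
```

fstatat(dirfd, name, &st, AT_SYMLINK_NOFOLLOW) gives the same behavior relative to an already-opened parent directory, which narrows the race between the check and the subsequent use.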

Integrate fio tests into famfs test suite (probably in smoke...)

Something along the lines of what we did in this sub-thread: https://lore.kernel.org/linux-fsdevel/w5cqtmdgqtjvbnrg5okdgmxe45vjg5evaxh6gg3gs6kwfqmn5p@wgakpqcumrbt/.

Test artifacts should be captured in log files. Extra points for scriptage that sanity-checks results (though this could get complicated, since not all memory has the same performance...). I already have some scripts set up, but they will need to check core count and memdev size in order to set up properly on any system/memdev.

We need a unit test that overflows the famfs metadata log

Log overflow has not been tested yet...

Mind you the default log size has space for >25000 entries, and the minimum allocation unit is 2MiB (but if you're creating files <=2MiB, you're probably missing the point of famfs).

Still: we should add such a test.

Need a test that verifies that mapping faults are 2MiB or 1GiB, and not 4KiB

The famfs kernel RFC v1 has fault counters that can be enabled via /sys/fs/famfs/... These were implemented because we had a bug in the kernel module at one point that resulted in 4K mapping faults rather than 2M, and it killed performance for many concurrent computational processes hammering on the same data frame(s) in memory.

Famfs files are constrained to multiples of 2MiB, and mmap addresses are constrained to 2MiB alignment - both for this reason. But we need a good way to catch a regression.

However, the counters proved controversial (see thread [1]), so they will probably be dropped from the next version of the patch set. Dan Williams pointed out a user space test that ndctl uses to verify something similar [2], but I'm hoping we can do better. The first place to look is the rest of the thread at [1] to see if a prescription becomes clear.

[1] https://lore.kernel.org/linux-fsdevel/3jwluwrqj6rwsxdsksfvdeo5uccgmnkh7rgefaeyxf2gu75344@ybhwncywkftx/T/#m69d2b66e54f9657c38e6e0a0da94ab4b3eca7586
[2] https://github.com/pmem/ndctl/blob/main/test/dax.sh#L31

Does famfs support write now?

While trying some simple commands to observe the behavior, I noticed that if I use a command like:
echo 1234 >> /mnt/famfs/test
then famfs_dax_write_iter is used to change the contents.
However, no log entry is added, so famfs verify/fsck fails afterward.
Is that an issue? Or is there another way to write something to the file?

Add clflush as needed to support concurrent cache-incoherent mounts

clflush needed:

  • after mkfs, on superblock and log
  • after log append (on log)
  • before logplay (on log)
  • after 'famfs cp'
  • after 'famfs creat' if the file was initialized.

User or app will be responsible for flushing the cache on any data written by apps other than the famfs cli.

Still thinking about how the generalized cases should work. Probably a "famfs flush" command and api call that should flush data as appropriate...
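For reference, a minimal x86-only sketch of the range-flush helper those cases would share (the function name is hypothetical; a real version would prefer clflushopt/clwb where the CPU supports them):

```c
/* Sketch: flush a byte range from the CPU caches, one 64-byte line at a time */
#include <stddef.h>
#include <stdint.h>
#include <immintrin.h>

#define CL_SIZE 64UL

static void famfs_flush_range(const void *addr, size_t len)
{
        uintptr_t p   = (uintptr_t)addr & ~(CL_SIZE - 1);
        uintptr_t end = (uintptr_t)addr + len;

        for (; p < end; p += CL_SIZE)
                _mm_clflush((const void *)p);
        _mm_sfence();                   /* order the flushes before later stores */
}
```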

famfs mount: works even if there is no superblock & log

The kernel mount is performed, and then famfs_mkmeta() discovers that there is no valid superblock and bails out on creating the meta files.

Need to think through the right way to handle this gracefully. Probably issue a umount on the way out.
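A sketch of what the graceful path might look like, with famfs_validate_superblock() standing in for the famfs_mkmeta() check and the mount arguments simplified:

```c
/* Sketch: mount, validate the superblock, and unmount on the way out if invalid */
#include <stdio.h>
#include <sys/mount.h>

int famfs_validate_superblock(const char *mpt);  /* hypothetical stand-in */

static int famfs_mount_checked(const char *daxdev, const char *mpt)
{
        if (mount(daxdev, mpt, "famfs", 0, NULL) != 0) {
                perror("mount");
                return -1;
        }
        if (famfs_validate_superblock(mpt) != 0) {
                fprintf(stderr, "%s: no valid superblock/log; unmounting\n", mpt);
                umount(mpt);            /* don't leave a broken famfs mounted */
                return -1;
        }
        return 0;
}
```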

Compile fails for famfs because the extent_type enum has been renamed to famfs_extent_type

The compile fails with the following error:
#make all
make[3]: Entering directory '/root/famfs/debug'
[ 3%] Building C object CMakeFiles/libfamfs.dir/src/famfs_lib.c.o
/root/famfs/src/famfs_lib.c:423:28: warning: ‘enum extent_type’ declared inside parameter list will not be visible outside of this definition or declaration
423 | enum extent_type *type)
| ^~~~~~~~~~~
/root/famfs/src/famfs_lib.c:421:1: error: conflicting types for ‘famfs_get_device_size’; have ‘int(const char *, size_t *, enum extent_type *)’ {aka ‘int(const char *, long unsigned int *, enum extent_type *)’}
421 | famfs_get_device_size(const char *fname,
| ^~~~~~~~~~~~~~~~~~~~~
In file included from /root/famfs/src/famfs_lib.c:35:
/root/famfs/src/famfs_lib.h:21:12: note: previous declaration of ‘famfs_get_device_size’ with type ‘int(const char *, size_t *, enum famfs_extent_type *)’ {aka ‘int(const char *, long unsigned int *, enum famfs_extent_type *)’}
21 | extern int famfs_get_device_size(const char *fname, size_t *size, enum famfs_extent_type *type);
| ^~~~~~~~~~~~~~~~~~~~~
/root/famfs/src/famfs_lib.c: In function ‘famfs_mkfs’:
/root/famfs/src/famfs_lib.c:3812:14: error: variable ‘type’ has initializer but incomplete type
3812 | enum extent_type type = SIMPLE_DAX_EXTENT;
| ^~~~~~~~~~~
/root/famfs/src/famfs_lib.c:3812:26: error: storage size of ‘type’ isn’t known
3812 | enum extent_type type = SIMPLE_DAX_EXTENT;
| ^~~~
/root/famfs/src/famfs_lib.c:3812:26: warning: unused variable ‘type’ [-Wunused-variable]
make[3]: *** [CMakeFiles/libfamfs.dir/build.make:76: CMakeFiles/libfamfs.dir/src/famfs_lib.c.o] Error 1
make[3]: Leaving directory '/root/famfs/debug'
make[2]: *** [CMakeFiles/Makefile2:184: CMakeFiles/libfamfs.dir/all] Error 2
make[2]: Leaving directory '/root/famfs/debug'
make[1]: *** [Makefile:146: all] Error 2
make[1]: Leaving directory '/root/famfs/debug'
make: *** [Makefile:13: debug] Error 2
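The error output points at the fix: famfs_lib.c still says 'enum extent_type' (lines 421-423 and 3812 above), while famfs_lib.h declares 'enum famfs_extent_type'. Renaming the uses in famfs_lib.c to match the header (or updating to a tree where both files agree) resolves the build. Fragment only, keyed to the line numbers in the log above:

```c
/* famfs_lib.c:421-423 - parameter type must match the declaration in famfs_lib.h */
int famfs_get_device_size(const char *fname, size_t *size,
                          enum famfs_extent_type *type);

/* famfs_lib.c:3812, inside famfs_mkfs() */
enum famfs_extent_type type = SIMPLE_DAX_EXTENT;
```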

File creation will hork if a duplicate relative path is created after a rogue delete of that relative path

When famfs creates a file (famfs_mkfile() in the api, normally from 'famfs cp' or 'famfs creat'), it will fail if the file already exists. But if a rogue delete had taken place, and then a cp or creat tried to create the same relative path, it would not see the file in the mounted filesystem - and would proceed to create and log the file instance.

A rogue delete is any 'rm' that did not occur through the famfs api/cli - and since the api/cli currently does not support delete, it's any delete.

This would effectively make a mess of things.

  • The log would contain two file creates of the same relative path, with different allocation
  • Logplay would instantiate the first, and ignore the second
  • The Master node would have the second file (hey, only the master can create files), but any client that has not experienced a rogue delete would have the first file (which would not map to the same memory as the "rogue create" on the master).

This could be solved by building a hash table of relative paths during logplay (master only), and 1) detecting relative path collisions in the log, and 2) detecting famfs_mkfile() or famfs_mkdir() calls that generate relative path collisions in the log (which are detectable via the mounted namespace if there have been no rogue namespace operations). A sketch of the collision check appears at the end of this issue.

One downside to this is that it will make the O() order of file creation worse; it's already kinda expensive because space allocation plays the log to get the free/available bitmap - which is not persisted. There is not an "easy" way to persist the hash table either, so that might need to be re-generated on each [batch of] file create. (batches because 'cp *' and 'cp -r' and 'mkdir -p' lock the log and build the bitmap once for a batch of creates).

Hmm, we could persist the bitmap in a new meta file, and only expose it on the master. The hash table could be handled that way too. Extend the flock(log) to cover those files, and it may be a fully workable approach worth considering... eventually.

This has not been observed in the wild; will put this "on ice" initially, but may need to revisit it.
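A minimal sketch of the relative-path collision check described above, using the POSIX <search.h> hash table; the helper name is hypothetical, and a real version would want a reentrant table (hsearch_r) or something better than the process-global one:

```c
/* Sketch: master-side logplay detects duplicate relative paths in the log */
#include <search.h>
#include <stdio.h>
#include <string.h>

/* Caller does hcreate(expected_entries) once before scanning the log */
static int famfs_logplay_check_path(const char *relpath)
{
        ENTRY e = { .key = (char *)relpath, .data = NULL };

        if (hsearch(e, FIND)) {
                fprintf(stderr, "logplay: duplicate create of %s in log\n", relpath);
                return -1;                      /* relative path collision */
        }
        e.key = strdup(relpath);                /* the table keeps the key */
        hsearch(e, ENTER);
        return 0;
}
```

The same check could be applied in famfs_mkfile()/famfs_mkdir() before appending to the log, since the allocator already plays the log to build the free-space bitmap.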

pcq enhancements - enabling useful automated testing

  • When consuming an entry from a queue, if the crc is bad, the consumer should invalidate the cache (for that entry) and retry the consume (see the sketch after this list)
  • Add a -s|--status option to print status during the run; needs a status_worker() thread
  • Keep stats on the number of times the producer sees a full queue
  • Keep stats on the number of times a consumer sees an empty queue
  • Keep stats on the number of times a consumer needs to retry reading a valid entry due to a crc mismatch (which is either a bad entry or a cache incoherency occurrence)
  • Add a pcq --setperm option to set permissions to p|c|b|n (producer, consumer, both, neither) to help with tests
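A sketch of the first bullet (consume, check crc, invalidate, retry); the pcq_entry layout, zlib crc32(), and retry limit are illustrative rather than the actual pcq format:

```c
/* Sketch: consumer-side crc check with cache invalidation and retry */
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <immintrin.h>
#include <zlib.h>                       /* crc32(); link with -lz */

#define CL_SIZE         64UL
#define PCQ_RETRIES     8

struct pcq_entry {                      /* hypothetical entry format */
        uint64_t seq;
        uint8_t  payload[4096 - 16];
        uint32_t crc;                   /* crc of everything before this field */
};

static void flush_range(const void *addr, size_t len)
{
        uintptr_t p = (uintptr_t)addr & ~(CL_SIZE - 1);

        for (; p < (uintptr_t)addr + len; p += CL_SIZE)
                _mm_clflush((const void *)p);
        _mm_sfence();
}

/* Copy out one entry; on crc mismatch, invalidate and retry (and count it) */
static int pcq_consume_entry(const struct pcq_entry *slot,
                             struct pcq_entry *out, uint64_t *crc_retries)
{
        for (int i = 0; i < PCQ_RETRIES; i++) {
                memcpy(out, slot, sizeof(*out));
                uint32_t crc = (uint32_t)crc32(0L, (const Bytef *)out,
                                               offsetof(struct pcq_entry, crc));
                if (crc == out->crc)
                        return 0;
                (*crc_retries)++;                       /* stat: crc mismatch seen */
                flush_range(slot, sizeof(*slot));       /* drop stale cache lines */
        }
        return -1;                      /* persistently bad entry */
}
```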
