cxl-micron-reskit / famfs Goto Github PK

This is the user space repo for famfs, the fabric-attached memory file system

License: Apache License 2.0

Makefile 0.30% C 35.59% CMake 0.79% Shell 15.04% C++ 2.76% Roff 45.52%

famfs's Issues

Code Coverage: Umbrella issue

We use gcov to test code coverage. We are currently (and will probably continue to) use a combination of smoke and unit tests to measure "official" coverage (see https://github.com/cxl-micron-reskit/famfs/blob/master/markdown/getting-started.md).

But I would like to see our unit-test-only increase for various reasons. One is that smoke tests may not be runnable on Github, since they need an actual dax device. Moreover, there are some inherently hard-to-cover branches - mmap failures and file open failures are top of the list here. We probably need to enable mocking for open and mmap here in order to properly test failures in those functions.

So please work on unit (or smoke) test code coverage improvements, but especially unit test coverage improvements and send PRs.

Better unit test coverage will be good since enabling Github coverage reporting probably can't include coverage from smoke tests (but please correct me if this seems wrong).

mkfs.famfs: support configurable metadata log size?

Currently the log is always 8MiB, which currently holds something over 25000 log entries. Reasons we might support a configurable log size:

25K entries is not enough
8MiB is massive overkill (e.g. the device is going to be used for a single file, or a small set). In this case, the minimum log size will be 2MiB - one huge page.

Under no circumstances should we support a log size that is not a 2MiB multiple. Honestly I'm not sure this needs to be addressed, but it's a question that has come up several times.

The trickiest part would be test coverage.

Figure out how to get the GitHub language scanner to avoid the kconfig/famfs-config-6.8 file

It currently thinks the project is mostly ROFF files :D

famfs_map_superblock_and_log_raw(): size of log is assumed

This function needs to either:

Check the log size after the mmap, and remap if necessary to get the right size
Or mmap the superblock first, get the log size, then mmap the log

The first option leaves the cleanup almost exactly the same, whereas the second leaves two things to munmap.

This is a bug, but it's only a latent bug until we support more than one log size.

Test: need a test for the flock(logfile) approach to serializing allocation and log append

I'm thinking the test should prove that while the lock is held, other processes can't get the lock. What else? We currently rely on this being bulletproof...

Building a kernel documentation references "famfs.config" file that doesn't exist

The reference is here:

https://github.com/cxl-micron-reskit/famfs/blob/master/markdown/building-a-kernel.md

Planning to checkin a fames-config file to this repo, and specify a 'wget' command to retrieve it for the kernel build

Need unit test: overflowing relative path length in log entry

Log entries currently have space for 80 characters of relative path. Need to fail gracefully if we overflow.

log play should rigorously avoid following symlinks

I added a test that manually puts a symlink where a directory should be, and then runs logplay. The link was to a directory (/tmp) and logplay just thinks the directory already existed (i.e. stat followed the link to its destination and reported on that).

We may need to use fstatat() in logplay (rather than stat()) to avoid this. Or always do an lstat() before the stat(). Also preventing symlinks might be a good idea, if there's a way.

move get_multiplier() to a shared location

This function is duplicated in famfs_cli.c and pcq.c. Maybe famfs_cli_lib.c, or something else...

Does famfs provide POSIX-like interfaces for operating files under the famfs file system, such as open, close, write, read, lseek, and other functions?

Integrate fio tests into famfs test suite (probably in smoke...)

Something along the lines of what we did in this sub-thread: https://lore.kernel.org/linux-fsdevel/w5cqtmdgqtjvbnrg5okdgmxe45vjg5evaxh6gg3gs6kwfqmn5p@wgakpqcumrbt/.

Test artifacts should be captured in log files. Extra points for scriptage that sanity checks results (though this could get complicated since not all memory has the same performance...). I already have some script setup, but it will need to check core count and memdev size in order to properly setup on any system/memdev.

famfs fsck: should print the system role

Role is MASTER if the system created the file system, and CLIENT if a different system created it

Transition cache flushing to libpmem2, from pmem.io, because it is a multi-arch implementation

https://github.com/pmem/pmdk/tree/bbf9c4fa2a8ad338052a4b4aed26112e809bd5e4/src/libpmem2

No rush on this, but this looks like the right approach.

We need a unit test that overflows the famfs metadata log

Log overflow has not been tested yet...

Mind you the default log size has space for >25000 entries, and the minimum allocation unit is 2MiB (but if you're creating files <=2MiB, you're probably missing the point of famfs).

Still: will add such a test

Need a test that verifies that mapping faults are 2MiB or 1GiB, and not 4KiB

The famfs kernel RFC v1 has fault counters that can be enabled via /sys/fs/famfs/... These were implemented because we had a bug in the kernel module at one point that resulted in 4K mapping faults rather than 2M, and it killed performance for many concurrent computational processes hammering on the same data frame(s) in memory.

Famfs files are constrained to multiples of 2MiB, and mmap addresses are constrained to 2MiB alignment - both for this reason. But we need a good way to catch a regression.

However, the counters proved controversial (see thread [1]) so they will probably be dropped from the next version of the patch set. Dan Williams pointed out a user space test that ndctl uses to verify something similar [2], but I'm hoping we can do better. The first place to look is the rest of the thread at [1] to see of a prescription becomes clear.

[1] https://lore.kernel.org/linux-fsdevel/3jwluwrqj6rwsxdsksfvdeo5uccgmnkh7rgefaeyxf2gu75344@ybhwncywkftx/T/#m69d2b66e54f9657c38e6e0a0da94ab4b3eca7586
[2] https://github.com/pmem/ndctl/blob/main/test/dax.sh#L31

mkfs.famfs: segfault in get_role if dax device is in system-ram mode

This should be fixed, but the workaround is put your dax device in a valid state and try again

/opt/famfs directory permissions don't allow non-root access to the system uuid file

It needs to be 0755 rather than 0644

Default directory permissions should be 0755

The execute bits control ability to list the directory for non-owners. Hot fix on the way...

Does FAMFS supports write now?

When I was trying to do some simple cmds to see the behavior, I notice that if I use cmds like:
echo 1234 >> /mnt/famfs/test
then I am using famfs_dax_write_iter to change the contents.
However the log is not added. Therefore, famfs verify/fsck cannot be passed.
Is that a issue? Or there is another way to write something to the file?

mkfs.famfs fails silently if the Dax device has a superblock from another host on it

Exit code is 1, but no message. There should be a message.

Killing the superblock also only works on the host that created it. I can see going either way on this, but it at least seems reasonable. Should there be a double-secret force?

Add clflush as needed to support concurrent cache-incoherent mounts

clflush needed:
after mkfs, on superblock and log
after log append (on log)
before logplay (on log)
after 'famfs cp'
after 'famfs creat' if the file was initialized.

User or app will be responsible for flushing the cash on any data written by apps other than the famfs cli

Still thinking about how the generalized cases should work. Probably a "famfs flush" command and api call that should flush data as appropriate...

Refactor smoke tests so they can run concurrently in loops on multiple memory devices

The current run_smoke.sh script requires famfs.ko to NOT be loaded at start. This refactor should:

Require famfs.ko to be loaded at start
Not unload famfs.ko (that would hose other tests, If any - and the smoke tests currently do this)
Need a separate test that loads and unloads famfs.ko...

famfs mount: works even if there is no superblock & log

The kernel mount is performed, and then the famfs_mkmeta() discovers that there is no valid superblock and bails out on creating the meta files.

Need to think through the right way to handle this gracefully. Probably issue a umount on the way out.

cp wildcard to mount point directory fails

"cp /path/to/* /mnt/famfs/" fails (when /mnt/famfs is the mount point), but succeeds to a subdirectory of the mount point

Thanks to Jacob Jacob for reporting

compile failed for the famfs as the extent_type name has been changed to famfs_extent_type.

compile failed as the error:
#make all
make[3]: Entering directory '/root/famfs/debug'
[ 3%] Building C object CMakeFiles/libfamfs.dir/src/famfs_lib.c.o
/root/famfs/src/famfs_lib.c:423:28: warning: ‘enum extent_type’ declared inside parameter list will not be visible outside of this definition or declaration
423 | enum extent_type *type)
| ^~~~~~~~~~~
/root/famfs/src/famfs_lib.c:421:1: error: conflicting types for ‘famfs_get_device_size’; have ‘int(const char *, size_t *, enum extent_type *)’ {aka ‘int(const char *, long unsigned int *, enum extent_type *)’}
421 | famfs_get_device_size(const char *fname,
| ^~~~~~~~~~~~~~~~~~~~~
In file included from /root/famfs/src/famfs_lib.c:35:
/root/famfs/src/famfs_lib.h:21:12: note: previous declaration of ‘famfs_get_device_size’ with type ‘int(const char *, size_t *, enum famfs_extent_type *)’ {aka ‘int(const char *, long unsigned int *, enum famfs_extent_type *)’}
21 | extern int famfs_get_device_size(const char *fname, size_t *size, enum famfs_extent_type *type);
| ^~~~~~~~~~~~~~~~~~~~~
/root/famfs/src/famfs_lib.c: In function ‘famfs_mkfs’:
/root/famfs/src/famfs_lib.c:3812:14: error: variable ‘type’ has initializer but incomplete type
3812 | enum extent_type type = SIMPLE_DAX_EXTENT;
| ^~~~~~~~~~~
/root/famfs/src/famfs_lib.c:3812:26: error: storage size of ‘type’ isn’t known
3812 | enum extent_type type = SIMPLE_DAX_EXTENT;
| ^~~~
/root/famfs/src/famfs_lib.c:3812:26: warning: unused variable ‘type’ [-Wunused-variable]
make[3]: *** [CMakeFiles/libfamfs.dir/build.make:76: CMakeFiles/libfamfs.dir/src/famfs_lib.c.o] Error 1
make[3]: Leaving directory '/root/famfs/debug'
make[2]: *** [CMakeFiles/Makefile2:184: CMakeFiles/libfamfs.dir/all] Error 2
make[2]: Leaving directory '/root/famfs/debug'
make[1]: *** [Makefile:146: all] Error 2
make[1]: Leaving directory '/root/famfs/debug'
make: *** [Makefile:13: debug] Error 2

File creation will hork if a duplicate relative path is created after a rogue delete of that relative path

When famfs creates a file (famfs_mkfile() in the api, normally from 'famfs cp' or 'famfs creat'), it will fail if the file already exists. But if a rogue delete had taken place, and then a cp or creat tried to create the same relative path, it would not see the file in the mounted filesystem - and would proceed to create and log the file instance.

A rogue delete is any 'rm' that did not occur through the famfs api/cli - and since the api/cli currently does not support delete, it's any delete.

This would effectively make a mess of things.

The log would contain two file creates of the same relative path, with different allocation
Logplay would instantiate the first, and ignore the second
The Master node would have the second file (hey, only the master can create files), but any client that has not experienced a rogue delete would have the first file (which would not map to the same memory as the "rogue create" on the master).

This could be solved by building a hash table of relative paths during logplay (master only), and 1) detecting relative path collisions in the log, and 2) detecting famfs_mkfile() or famfs_mkdir() calls that generate relative path collisions in the log (which are detectable via the mounted namespace if there have been no rogue namespace operations)

One downside to this is that it will make the O() order of file creation worse; it's already kinda expensive because space allocation plays the log to get the free/available bitmap - which is not persisted. There is not an "easy" way to persist the hash table either, so that might need to be re-generated on each [batch of] file create. (batches because 'cp *' and 'cp -r' and 'mkdir -p' lock the log and build the bitmap once for a batch of creates).

Hmm. we could persist the bitmap in a new meta file, and only expose that on the master. Even the hash table could be handled that way too. Extend the flock(log) to cover those files, and it may be a fully working approach worth considering...eventually.

This has not been observed in the wild; Will put this "on ice" initially, but may need to

pcq enhancements - enabling useful automated testing

When consuming an entry from a queue, if the crc is bad, should invalidate the cache (for the entry) and retry the consume
Add a -s|--status to print status during the run; need status_worker() thread
Keep stats on the number of times the producer sees a full queue
Keep stats on the number of times a consumer sees an empty queue
Keep stats on the number of times a consumer needs to retry reading a valid entry due to crc mismatch (which is either a bad entry or a cache incoherency occurrence)
pcq --setperm option to set permission for p|c|b|n (producer, consumer, both, neither) to help with tests

cxl-micron-reskit / famfs Goto Github PK

famfs's Issues

Recommend Projects

Recommend Topics

Recommend Org