
utsaslab / splitfs


SplitFS: persistent-memory file system that reduces software overhead (SOSP 2019)

Home Page: https://www.cs.utexas.edu/~vijay/papers/sosp19-splitfs.pdf

License: Other

Makefile 0.45% C 96.65% C++ 0.19% Shell 0.41% Perl 0.13% Python 0.15% Roff 0.07% M4 0.28% CMake 0.01% HTML 0.01% TeX 0.05% Batchfile 0.01% Java 0.14% Lua 0.01% Assembly 1.40% Awk 0.01% GDB 0.01% sed 0.01% Yacc 0.04% Lex 0.01%
ext4-dax file-system non-volatile-memory persistent-memory persistent-storage posix

splitfs's People

Contributors

dependabot[bot], krishnangosakan, om-saran, omsaran, rohankadekodi, saurabhkadekodi, vijay03, wraymo


splitfs's Issues

Append/write recovery failure due to inconsistent inode numbers

The append recovery logic currently depends on the inode numbers of the file and the staging file stored in the append log. However, the inode number of the file may change in some scenarios upon recovery.

Consider the following sequence of events (SplitFS strict mode):

  1. file1 is created with size 0. Let its inode number be 1. A LOG_FILE_CREATE operation is recorded in the op log.
  2. An append is performed on file1. The contents are written to a staging file, say with inode number 2. An append log entry is also created, storing the source (1) and destination (2) inode numbers.
  3. There is a crash (power failure) before any fsync. Let's assume the crash happened after the append/write call returned to the application.

During recovery, the following happens:

  1. Op log recovery, using the LOG_FILE_CREATE entry from step 1, re-creates the file via ext4-DAX (file1 was lost due to the missing fsync, so it relies on log recovery). The new inode number is not guaranteed to be 1; let's say it is now 3.
  2. Append log recovery then attempts to relink the file using the stale inode number (1), which no longer matches the re-created file (now inode 3), and thus the append is lost.

To fix this, one solution I can think of is to build a mapping from old to new inode numbers during op log recovery. During append log recovery, the new inode number is then used in place of the old one by consulting this mapping.
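
A minimal sketch of such a mapping, using hypothetical helper names rather than the actual SplitFS recovery code:

/* Sketch only: hypothetical helper names, not the actual SplitFS recovery code. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* One entry of the old-inode -> new-inode mapping built during op log recovery. */
struct ino_remap {
    ino_t old_ino;   /* inode number recorded in the logs before the crash */
    ino_t new_ino;   /* inode number assigned by ext4-DAX when the file is re-created */
};

static struct ino_remap remap_table[1024];
static size_t remap_count;

/* Called from op log recovery whenever a file is re-created. */
static void remap_record(ino_t old_ino, ino_t new_ino)
{
    remap_table[remap_count].old_ino = old_ino;
    remap_table[remap_count].new_ino = new_ino;
    remap_count++;
}

/* Called from append log recovery: translate a logged inode number to the
 * inode number of the re-created file, if the file was re-created. */
static ino_t remap_lookup(ino_t logged_ino)
{
    for (size_t i = 0; i < remap_count; i++)
        if (remap_table[i].old_ino == logged_ino)
            return remap_table[i].new_ino;
    return logged_ino;   /* not re-created; the logged number is still valid */
}

int main(void)
{
    remap_record(1, 3);   /* op log recovery re-created old inode 1 as inode 3 */
    printf("%lu\n", (unsigned long)remap_lookup(1));   /* append log recovery now uses 3 */
    return 0;
}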

NVP_MSG (7564): mmap failed

I ran SplitFS with the varmail workload of filebench on CentOS 7 (kernel 4.13) and got the error below:

NVP_MSG (7564): mmap failed for Cannot allocate memory, mmap count 65367, addr -1, errno is 12
NVP_MSG (7564): Open count 1, close count 0
filebench: fileops_nvp.c:2917: nvp_get_dr_mmap_address: Assertion `0' failed.

I use Intel Optane DC PM (256 GB) and do not use PM emulation, as described in https://github.com/utsaslab/SplitFS/blob/master/experiments.md/#kernel-setup

How to fix this problem?
Thanks.

Port fsstress to SplitFS

This requires ensuring that SplitFS correctly handles all the system calls made by the application.

It would be useful to run fsstress on SplitFS.

Compiling SplitFS with other kernel versions

Is there any possibility of making SplitFS run with Linux kernel version 5.7 (for example)?
I guess I'll have to compile the right kernel along with the external modules needed for NOVA and PMFS.
Do you think it's possible? How difficult might it be to adjust the existing files to work with another kernel?
Any recommendations?

Thank you, Yehonatan.

tpcc of sqlite does not work in multithreaded mode

Hi, I ported your implementation of the TPC-C benchmark for sqlite.
First, I loaded the database with warehouse count = 4 as below:
./build-release/tpcc_load -w 4 -d tpcc.db
The benchmark compiles and runs with no problems when I set the number of database connections to 1:
./tpcc_start -w 4 -c 1 -t 2000 -d tpcc.db

However, when I set the number of database connections to 2:
./tpcc_start -w 4 -c 2 -t 2000 -d tpcc.db
a constraint error occurs:

CHECKING IF SQLITE IS THREADSAFE: RETURN VALUE = 1
***************************************
*** ###easy### TPC-C Load Generator ***
***************************************
option w with value '4'
option c with value '2'
option t (number of transactions) with value '2000'
option d with value 'tpcc.db'
<Parameters>
  [warehouse]: 4
 [connection]: 2
     [rampup]: 10 (sec.)
    [measure]: 20 (sec.)
RAMP-UP TIME.(10 sec.)
thread_main: opening db, thread id = 140629658339072
thread_main: opening db, thread id = 140629649946368
thread_main: opened db, thread id = 140629658339072
thread_main: opened db, thread id = 140629649946368
neword 0:4
neword: error: UNIQUE constraint failed: orders.o_id, orders.o_d_id, orders.o_w_id
neword 0:4
neword: error: UNIQUE constraint failed: orders.o_id, orders.o_d_id, orders.o_w_id
neword 0:4
neword: error: UNIQUE constraint failed: orders.o_id, orders.o_d_id, orders.o_w_id
....
error at thread_main
thread_main: error: cannot commit - no transaction is active

Thank you so much; I would really appreciate any ideas about possible reasons for what is happening.

ext4/kernel changes commits

Hello SplitFS Team,

Are there specific commits/patches we could look at to understand the ext4/kernel-side changes required for SplitFS? Could you point us to those commits/patches rather than the whole kernel, please?

About supporting fio

When I run fio on SplitFS, it always fails in _hub_FOPEN("/proc/self/status") with:
NVP_ASSERT(_hub_managed_fileops != NULL) failed!

It seems that libnvp.so is not loaded, yet filebench and shell commands such as ls and df are intercepted successfully by SplitFS.

I also noticed that the filebench code under SplitFS is almost unmodified.
How should I start modifying fio to run on SplitFS?

Support different number of cores

The current locking mechanism assumes the machine has 16 CPU cores (hard-coded).
When SplitFS runs on machines with more cores, it crashes whenever the application runs on a core with cpuid >= 16.
The attached archive contains a suggested correction.
SplitFS-cores-issue.zip
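
A minimal sketch of one way to remove the hard-coded assumption, assuming a hypothetical per-core spinlock array (the names are illustrative, not the actual SplitFS locking code):

/* Sketch only: illustrative names, not the actual SplitFS locking code. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdlib.h>
#include <unistd.h>

static pthread_spinlock_t *per_core_locks;
static long num_locks;

/* Size the lock array from the number of online CPUs instead of a
 * hard-coded 16, so cpuid >= 16 never indexes past the array. */
static void locks_init(void)
{
    num_locks = sysconf(_SC_NPROCESSORS_ONLN);
    if (num_locks < 1)
        num_locks = 1;
    per_core_locks = calloc((size_t)num_locks, sizeof(*per_core_locks));
    for (long i = 0; i < num_locks; i++)
        pthread_spin_init(&per_core_locks[i], PTHREAD_PROCESS_PRIVATE);
}

/* Reduce the CPU id modulo the array size as a safety net, in case
 * CPUs are brought online after initialization. */
static pthread_spinlock_t *lock_for_this_cpu(void)
{
    int cpu = sched_getcpu();
    if (cpu < 0)
        cpu = 0;
    return &per_core_locks[cpu % num_locks];
}

int main(void)
{
    locks_init();
    pthread_spinlock_t *l = lock_for_this_cpu();
    pthread_spin_lock(l);
    pthread_spin_unlock(l);
    return 0;
}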

Missing fsync after creation of operation log file

After creating the operation log file, we do not seem to be doing an fsync.

This means the creation of operation.log is not guaranteed to be persistent: the metadata recording the presence of the op log file could be lost, and with it any op log entries.

This was observed when a crash was simulated after creation of a file /mnt/pmem_emul/foo: upon restart, the operation.log file was missing. Adding an fsync resolved the issue.

PS: The same is true for append.log
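
A minimal sketch of persisting a newly created log file, assuming the usual ext4 discipline of fsyncing both the file and its parent directory (the path and file name follow the examples above):

/* Sketch only: illustrative persistence of a newly created log file. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define LOG_DIR  "/mnt/pmem_emul"
#define LOG_PATH LOG_DIR "/operation.log"

int main(void)
{
    /* Create (or open) the op log file and persist its contents. */
    int fd = open(LOG_PATH, O_CREAT | O_RDWR, 0644);
    if (fd < 0) { perror("open log"); exit(1); }
    if (fsync(fd) < 0) { perror("fsync log"); exit(1); }

    /* Persist the directory entry so the file itself survives a crash. */
    int dirfd = open(LOG_DIR, O_RDONLY | O_DIRECTORY);
    if (dirfd < 0) { perror("open dir"); exit(1); }
    if (fsync(dirfd) < 0) { perror("fsync dir"); exit(1); }

    close(dirfd);
    close(fd);
    return 0;
}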

Segmentation fault under filebench workloads

Environment: SplitFS, Optane DC PM, Ubuntu 18.04 LTS, glibc 2.27, gcc 7.5.0

When I run the filebench workloads (varmail, fileserver, webserver, webproxy) using scripts/filebench/run_fs.sh, they always end with "Segmentation fault (core dumped)".
Although varmail, fileserver and webproxy still complete the tests and show results, the results are doubtful because the performance is significantly lower than that of ext4-DAX.
Webserver generally crashes immediately.
It does not seem to be related to the workload data size.

I set up SplitFS exactly following the documented steps; the only change is NVP_NUM_LOCKS from 32 to 144, because my machine has 72 logical CPUs.

Details on git and tar microbenchmark

Hello,
Would it be possible to release the details of the git and tar measurements (Fig. 6 c/f/i in the SOSP '19 paper)?
What input file(s) and commands are executed for those workloads?
Thanks!

Checksum calculation

Some issues with checksum calculation:

  1. It looks like we are calculating the checksum based on the in-memory contents of the op log struct here. When entry_size extends beyond the memory boundary of the structure, I think this is an error (the checksum should ideally cover the data that follows the struct, i.e., fname1).
  2. Null termination: the string read back needs to be null-terminated, since the buffer is not guaranteed to be zeroed.

While this does not cause problems if there is no crash, it does cause problems when trying to recover from a crashed state.
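
A minimal sketch of the intended behavior, using a hypothetical op log entry layout (the field names are illustrative, not the actual SplitFS structs):

/* Sketch only: hypothetical op log entry layout, not the actual SplitFS structs. */
#include <stdint.h>
#include <stddef.h>
#include <string.h>

struct op_log_entry {
    uint32_t entry_size;   /* size of the header plus the variable-length fname1 */
    uint32_t checksum;
    /* ... further fixed header fields; fname1 bytes follow the struct in the log */
};

/* Very simple additive checksum, for illustration only. */
uint32_t checksum_bytes(const void *buf, size_t len)
{
    const uint8_t *p = buf;
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += p[i];
    return sum;
}

/* Cover the whole logged entry (header plus fname1, i.e. entry_size bytes),
 * not just sizeof(struct op_log_entry), skipping the checksum field itself.
 * entry_size must be validated against the log bounds before this is called. */
uint32_t entry_checksum(const struct op_log_entry *e)
{
    const uint8_t *base = (const uint8_t *)e;
    uint32_t sum = checksum_bytes(base, offsetof(struct op_log_entry, checksum));
    sum += checksum_bytes(base + sizeof(*e), e->entry_size - sizeof(*e));
    return sum;
}

/* When reading fname1 back during recovery, null-terminate it explicitly,
 * since the surrounding buffer is not guaranteed to be zeroed. */
void copy_fname(char *dst, size_t dst_size, const struct op_log_entry *e)
{
    size_t len = e->entry_size - sizeof(*e);
    if (len >= dst_size)
        len = dst_size - 1;
    memcpy(dst, (const char *)e + sizeof(*e), len);
    dst[len] = '\0';
}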

Rename crash consistency

There appears to be a bug in rename atomicity in SplitFS.

It occurs in the following sequence of operations:

  1. Create file1
  2. Create file2
  3. rename file1 -> file2

SplitFS does the following during recovery:

  1. Re-creates file1 (since file1 is lost after the rename)
  2. Does nothing, since file2 exists
  3. Skips the rename (step 3), since file2 exists

The end state of the filesystem is inconsistent/non-atomic: both file1 and file2 exist after recovery, whereas only file2 existed before the crash.
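
A minimal reproduction sketch of this sequence (the mount point follows the other examples in this repository; the crash is simulated by aborting before any fsync):

/* Sketch only: minimal reproduction of the rename sequence described above. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define F1 "/mnt/pmem_emul/file1"
#define F2 "/mnt/pmem_emul/file2"

int main(void)
{
    int fd1 = open(F1, O_CREAT | O_RDWR, 0644);   /* step 1: create file1 */
    int fd2 = open(F2, O_CREAT | O_RDWR, 0644);   /* step 2: create file2 */
    if (fd1 < 0 || fd2 < 0) { perror("open"); exit(1); }
    close(fd1);
    close(fd2);

    if (rename(F1, F2) < 0) { perror("rename"); exit(1); }   /* step 3 */

    /* Crash before fsync: after recovery, both file1 and file2 exist,
     * although only file2 existed at the time of the crash. */
    abort();
}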

Support for using a closed file descriptor as a new file descriptor in dup2

When a closed file descriptor is passed as the new file descriptor in a dup2 call, hub throws an error.

Here's a sample snippet to reproduce the issue.

#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define FILE_PATH  "/mnt/pmem_emul/test.txt"
#define FILE_PATH2 "/mnt/pmem_emul/test2.txt"

int main() {

  // Open the files
  int fd = open(FILE_PATH, O_CREAT | O_RDWR, 0644);
  int fd2 = open(FILE_PATH2, O_CREAT | O_RDWR, 0644);
  assert(fd >= 0);
  assert(fd2 >= 0);
  printf("fd = %d\n", fd);
  printf("fd2 = %d\n", fd2);

  // Write something to the first file.
  char buf[] = "Writing to test.txt!\n";
  write(fd, buf, strlen(buf));

  // Close the first file
  close(fd);

  // Try to dup2 using the closed fd as the new descriptor
  int ret = dup2(fd2, fd);
  if (ret < 0) {
    perror("Failed to dup2!");
    exit(2);
  } else {
    // Write something to the test2 file using the duped fd
    char buf2[] = "Writing something to test2.txt!\n";
    write(fd, buf2, strlen(buf2));
  }

  close(fd);
  close(fd2);
  return 0;
}

When a file is closed, the corresponding fileops lookup table entry in hub (_hub_fd_lookup[new_fd]) is set to NULL.
In dup2, there is a check for whether the entry for the new file descriptor is NULL, and the call fails if it is.

I am not sure why the fileops entry is set to NULL for the file descriptor being closed.
Shouldn't it be reset to the original state, i.e. _hub_fileops (as done in the hub_check_resolve_fileops function)?

Am I missing something here?
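
A minimal sketch of the suggested change, using names modeled on the identifiers mentioned above (this is an illustration, not the actual hub code):

/* Sketch only: illustrative hub-like lookup table, not the actual SplitFS code. */
#include <stddef.h>

struct fileops { int unused; };   /* stand-in for a fileops dispatch table */

static struct fileops default_fileops;
static struct fileops *_hub_fileops = &default_fileops;   /* default (pass-through) fileops */
static struct fileops *_hub_fd_lookup[1024];              /* per-fd fileops table */

/* On close, fall back to the default fileops instead of NULL, so a later
 * dup2() that reuses this fd as the new descriptor does not fail. */
void hub_on_close(int fd)
{
    _hub_fd_lookup[fd] = _hub_fileops;
}

/* Alternatively, in the dup2 path, resolve a NULL entry to the default
 * fileops (mirroring what hub_check_resolve_fileops reportedly does). */
struct fileops *hub_resolve_new_fd(int new_fd)
{
    if (_hub_fd_lookup[new_fd] == NULL)
        _hub_fd_lookup[new_fd] = _hub_fileops;
    return _hub_fd_lookup[new_fd];
}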

Failure in multi-threaded fio write workloads

I am trying to test SplitFS in strict mode.

export LEDGER_DATAJ=0
export LEDGER_POSIX=1

However, in this mode, SplitFS always encounters

fio: tbl_mmaps.c:424: clear_overlapping_entry: Assertion `0' failed.

while running multi-threaded fio write workloads. In these workloads, different threads share a single file. We also see the same problem in the single-threaded randwrite workload and in a mixed read/write workload. The basic jobfile is as follows:

[global]
directory=/mnt/pmem_emul
filename=test
ioengine=sync
rw=randwrite
filesize=128m
bs=4k
thread=1
runtime=15
time_based=1

[job_0]
[job_1]
[job_2]
[job_3]

NVP_MSG (6575): Can't add fileop hub: one with the same name already exists at index 0

Hello!
When I run the script run_fio.sh, the following error occurs:

NVP_MSG (6575): Initializing hub_init
NVP_MSG (6575): Can't add fileop hub: one with the same name already exists at index 0
NVP_MSG (6575): Tried to use _hub_ fileops, but they weren't initialized!  BLARG

The corresponding error lines are NVP_ERROR (pid 6575): NVP_ASSERT(0) failed! and NVP_ERROR (pid 6575): NVP_ASSERT(_hub_fileops != NULL) failed!
Moreover, I found that /lib64/libc.so.6 is missing (the glibc version is 2.23), though libc.so.6 does exist in /lib/x86_64-linux-gnu. Does the missing file cause the problem above? If not, how can I solve it?
Looking forward to your reply!
Thanks!

Clean up and organize code better

The code right now has excessive nesting (at least for scripts) and carries over the old naming scheme.

For example, nvp should be usplit.

Installing dependencies right now requires

cd dependencies; ./kernel_deps.sh; cd ..

I would like it to be something like
./kernel_deps.sh
