nvsl / pmfs-new



Porting PMFS to the latest Linux kernel

Languages: Makefile 0.14%, C 99.45%, Shell 0.41%


pmfs-new's Issues

Bug in file system initialization

Hi,

I'm not sure if this repo is still maintained, but I have a bug to report in the pmfs_init() function in super.c. When the directory entries in the root directory are set up around line 452, the "." entry is never explicitly flushed to PM. If the system loses power before this entry becomes durable, the root directory is inaccessible on the next mount, and running ls on it causes the file system to hang. Adding a single flush after this entry's fields are set fixes the issue; I can submit a PR with a patch shortly.
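A minimal userspace model of the fix described above, assuming that a flush followed by a store fence is what makes an entry durable. struct dirent_model, struct pm_model, persist(), init_dot_entry() and crash() are hypothetical names, not PMFS code, and the entry layout is simplified.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified directory entry; not the real PMFS layout. */
struct dirent_model {
	unsigned long ino;
	unsigned short de_len;
	unsigned char name_len;
	char name[8];
};

/* "cached" holds stores that may still sit in the CPU cache;
 * "durable" holds what would survive a power failure. */
struct pm_model {
	struct dirent_model cached;
	struct dirent_model durable;
};

/* Stands in for flushing the entry's cachelines and issuing a fence. */
static void persist(struct pm_model *m)
{
	m->durable = m->cached;
}

/* Fill in the root directory's "." entry; optionally make it durable. */
static void init_dot_entry(struct pm_model *m, unsigned long root_ino,
			   int flush)
{
	memset(&m->cached, 0, sizeof(m->cached));
	m->cached.ino = root_ino;
	m->cached.name_len = 1;
	m->cached.de_len = sizeof(struct dirent_model);
	strcpy(m->cached.name, ".");
	if (flush)
		persist(m); /* the flush the report says is missing */
}

/* Simulate a power failure: only durable contents survive. */
static void crash(struct pm_model *m)
{
	m->cached = m->durable;
}
```

Calling init_dot_entry() with flush set to 0 and then crash() leaves an empty entry, which mirrors the inaccessible root directory in the report; with the flush, the "." entry survives.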

I cannot format the device /dev/pmem0 as PMFS

Sorry to bother you, but I have run into a problem with pmfs-new.
According to README.md, after rebooting and inserting the module, a PMFS instance can be initialized with the following command:

#mount -t pmfs -o init /dev/pmem0 /mnt/ramdisk

But the terminal reports: mount: wrong fs type, bad option, bad superblock on /dev/pmem0, missing codepage or helper program, or other error
When I tried to format /dev/pmem0 as PMFS, the terminal reported: mkfs: failed to execute mkfs.pmfs: No such file or directory.
How can I solve this problem?

Crash consistency bug with truncate

Hi,

I believe I've found a crash consistency bug that can be triggered by a crash during a truncate() that reduces a file's size. I found it by starting with a new, empty file system, creating a file under the root directory, writing 4KB to the file, and then truncating the file to size 0. The bug prevents the file system from being mounted.

Specifically, the bug occurs if the system crashes after the inode's number is added to the truncate list in pmfs_truncate_add() in inode.c but before the inode's root pointer, block count, etc. are updated. When the file system is mounted again, it attempts to finish the truncate operation on the inode. I believe the issue arises because this operation relies on a list of in-use blocks (stored in the pmfs_sb_info struct via the `block_inuse_head` pointer), which lives in DRAM. This list is not rebuilt until AFTER recovery attempts to finish the truncation, so the truncate functions cannot use it. The failure itself is an assertion on line 70 of balloc.c that checks whether the in-use list is empty.

I think this bug can be fixed by swapping the order of the calls in pmfs_fill_super() that recover the truncate list and rebuild the DRAM structures, although I am not yet sure that this fix wouldn't introduce other bugs.
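The proposed reordering can be sketched as a userspace model. All names here (struct sb_model, rebuild_inuse_list(), recover_truncate_list(), mount_recover()) are hypothetical stand-ins for the PMFS routines, and the assert() models the balloc.c check that fires when the in-use list is empty.

```c
#include <assert.h>

/* Hypothetical model of the superblock state relevant to recovery. */
struct sb_model {
	int inuse_blocks;      /* entries in the DRAM in-use list */
	int persisted_blocks;  /* blocks recorded as allocated on PM */
	int pending_truncates; /* inodes left on the PM truncate list */
};

/* Rebuild the DRAM in-use list from persistent state. */
static void rebuild_inuse_list(struct sb_model *sb)
{
	sb->inuse_blocks = sb->persisted_blocks;
}

/* Finish pending truncates, freeing one block per inode (simplified). */
static void recover_truncate_list(struct sb_model *sb)
{
	while (sb->pending_truncates > 0) {
		/* Models the balloc.c check: freeing needs a non-empty list. */
		assert(sb->inuse_blocks > 0);
		sb->inuse_blocks--;
		sb->persisted_blocks--;
		sb->pending_truncates--;
	}
}

/* Proposed order: rebuild DRAM structures first, then recover. */
static void mount_recover(struct sb_model *sb)
{
	rebuild_inuse_list(sb);
	recover_truncate_list(sb);
}
```

With the original order (recovery before the rebuild), inuse_blocks is still 0 when the loop runs and the assertion fires, which mirrors the reported mount failure. Whether the swap is safe in the real code is still unverified, as noted above.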

passing argument 1 of ‘bdev_dax_supported’ from incompatible pointer type

Hi!
I am trying to use PMFS on kernel version 4.15, but make fails with the following error:

/home/parallels/PMFS-new/super.c: In function ‘pmfs_get_block_info’:
/home/parallels/PMFS-new/super.c:110:27: error: passing argument 1 of ‘bdev_dax_supported’ from incompatible pointer type [-Werror=incompatible-pointer-types]
ret = bdev_dax_supported(sb, PAGE_SIZE);
^~
In file included from /home/parallels/PMFS-new/super.c:35:0:
./include/linux/dax.h:44:20: note: expected ‘struct block_device *’ but argument is of type ‘struct super_block *’
static inline bool bdev_dax_supported(struct block_device *bdev, int blocksize)
^~~~~~~~~~~~~~~~~~
cc1: some warnings being treated as errors

So I tried casting the argument to the expected type:

ret = bdev_dax_supported((struct block_device *)sb, PAGE_SIZE);

But this leads to a crash when mounting PMFS. dmesg shows:

[ 119.700620] RIP: __bdev_dax_supported+0x8b/0x237 RSP: ffffa5dcc3527cb0

Could you help me with this?
Thanks
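The compiler note shows that these kernel headers declare bdev_dax_supported() with a struct block_device * parameter. Casting sb only makes the function read the superblock's memory as if it were a block device, which is consistent with the crash in __bdev_dax_supported. A more likely fix is to pass the superblock's block device, sb->s_bdev. This variant also returns bool (true when DAX is supported), so a check written for an older int-returning version, where 0 meant success, may need to be inverted. The sketch below is a userspace model with mock types: the struct fields and mock_bdev_dax_supported() are hypothetical, and in the kernel the call would be bdev_dax_supported(sb->s_bdev, PAGE_SIZE).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock types with only the fields this sketch needs (hypothetical). */
struct block_device { bool dax_capable; };
struct super_block { struct block_device *s_bdev; };

/* Hypothetical stand-in with the same shape as the kernel prototype;
 * 4096 models PAGE_SIZE. */
static bool mock_bdev_dax_supported(struct block_device *bdev, int blocksize)
{
	return bdev != NULL && bdev->dax_capable && blocksize == 4096;
}

/* Returns 0 if the device can back a DAX mount, -1 otherwise. */
static int check_dax(struct super_block *sb)
{
	/* Pass the superblock's block device, not the superblock itself. */
	if (!mock_bdev_dax_supported(sb->s_bdev, 4096))
		return -1;
	return 0;
}
```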

Does not support O_APPEND in write

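This version of PMFS does not support O_APPEND: its write path never checks file->f_flags & O_APPEND, so writes to a file opened in append mode are not redirected to the end of the file.

For comparison, NOVA performs this check here: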

https://github.com/NVSL/linux-nova/blob/976a4d1f3d5282863b23aa834e02012167be6ee2/fs/nova/file.c#L666-L667

https://github.com/NVSL/linux-nova/blob/976a4d1f3d5282863b23aa834e02012167be6ee2/fs/nova/dax.c#L607-L608
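A check of this kind can be sketched as a small userspace helper. effective_write_pos() is a hypothetical name; in the kernel the flags would come from file->f_flags and the size from i_size_read(inode), and the check would need to run with the inode lock held so the size cannot change before the write lands.

```c
#include <assert.h>
#include <fcntl.h>

/* Returns the offset at which a write should start. With O_APPEND,
 * every write goes to the current end of file, regardless of pos. */
static long long effective_write_pos(int f_flags, long long pos,
				     long long i_size)
{
	if (f_flags & O_APPEND)
		return i_size;
	return pos;
}
```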

KASAN bug in recovery

Hi,

I believe PMFS has a memory-safety bug that is triggered during recovery. It can be reproduced by building PMFS with KASAN enabled (or running it in a KASAN-enabled kernel) and doing the following:

1. Create and mount a fresh instance of PMFS.
2. Use dd to copy the contents of the PM device to an image file.
3. Unmount PMFS.
4. Copy the contents of the image file back onto the PM device.
5. Mount PMFS.

This results in the following KASAN report:

[  115.019103] ==================================================================
[  115.020393] BUG: KASAN: slab-out-of-bounds in find_next_bit+0x67/0xb0
[  115.021204] Read of size 8 at addr ffff8881e2bdb100 by task mount/343
[  115.021937] 
[  115.022096] CPU: 0 PID: 343 Comm: mount Tainted: G            E     5.1.0+ #437
[  115.022808] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
[  115.023696] Call Trace:
[  115.023989]  dump_stack+0x94/0xd8
[  115.024329]  ? find_next_bit+0x67/0xb0
[  115.024854]  print_address_description+0x78/0x290
[  115.025424]  ? find_next_bit+0x67/0xb0
[  115.025871]  ? find_next_bit+0x67/0xb0
[  115.026316]  kasan_report+0x149/0x18c
[  115.026750]  ? find_next_bit+0x67/0xb0
[  115.027196]  __asan_load8+0x54/0x90
[  115.027615]  find_next_bit+0x67/0xb0
[  115.028049]  __pmfs_build_blocknode_map+0x54/0x479 [pmfs]
[  115.028703]  pmfs_setup_blocknode_map+0x524/0xcb9 [pmfs]
[  115.029330]  ? _raw_write_trylock+0xe0/0xe0
[  115.029809]  ? pmfs_save_blocknode_mappings+0x90e/0x90e [pmfs]
[  115.030541]  ? kasan_check_write+0x14/0x20
[  115.031096]  ? d_flags_for_inode+0x8f/0x140
[  115.031649]  ? kasan_check_read+0x11/0x20
[  115.032170]  ? d_instantiate+0x73/0x90
[  115.032658]  pmfs_fill_super+0x15b4/0x1af0 [pmfs]
[  115.033247]  ? pmfs_check_integrity+0x3d7/0x3d7 [pmfs]
[  115.033943]  ? snprintf+0xbd/0xf0
[  115.034395]  ? vsprintf+0x40/0x40
[  115.034822]  ? sget_userns+0x16a/0x330
[  115.035328]  ? test_single_super+0x20/0x20
[  115.035882]  ? set_blocksize+0x125/0x160
[  115.036396]  mount_bdev+0x210/0x270
[  115.036785]  ? pmfs_check_integrity+0x3d7/0x3d7 [pmfs]
[  115.037430]  pmfs_mount+0x39/0x42 [pmfs]
[  115.037896]  ? pmfs_alloc_inode+0x43/0x43 [pmfs]
[  115.038435]  legacy_get_tree+0x76/0xd0
[  115.038882]  vfs_get_tree+0x56/0x1e0
[  115.039306]  do_mount+0xf0e/0x1bb0
[  115.039703]  ? copy_mount_string+0x40/0x40
[  115.040191]  ? check_stack_object+0x56/0x100
[  115.040682]  ? __virt_addr_valid+0xeb/0x130
[  115.041196]  ? kasan_check_write+0x14/0x20
[  115.041668]  ? _copy_from_user+0x5b/0xb0
[  115.042169]  ? memdup_user+0x63/0x90
[  115.042588]  ? copy_mount_options+0x186/0x200
[  115.043101]  ksys_mount+0xb0/0x120
[  115.043505]  __x64_sys_mount+0x6c/0x80
[  115.043948]  do_syscall_64+0x7d/0x1b0
[  115.044385]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  115.044989] RIP: 0033:0x7fc54b66c48a
[  115.045423] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 018
[  115.047300] RSP: 002b:00007ffdab72c078 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
[  115.048040] RAX: ffffffffffffffda RBX: 000055f70eb22060 RCX: 00007fc54b66c48a
[  115.048758] RDX: 000055f70eb22240 RSI: 000055f70eb22280 RDI: 000055f70eb22260
[  115.049457] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000020
[  115.050156] R10: 00000000c0ed0000 R11: 0000000000000206 R12: 000055f70eb22260
[  115.050849] R13: 000055f70eb22240 R14: 0000000000000000 R15: 00000000ffffffff
[  115.051577] 
[  115.051757] Allocated by task 343:
[  115.052164]  save_stack+0x43/0xd0
[  115.052558]  __kasan_kmalloc.constprop.8+0xa7/0xd0
[  115.053089]  kasan_kmalloc+0x9/0x10
[  115.053459]  __kmalloc+0x10f/0x230
[  115.053858]  kzalloc+0x1e/0x23 [pmfs]
[  115.054224]  pmfs_setup_blocknode_map+0x3ad/0xcb9 [pmfs]
[  115.054748]  pmfs_fill_super+0x15b4/0x1af0 [pmfs]
[  115.055215]  mount_bdev+0x210/0x270
[  115.055565]  pmfs_mount+0x39/0x42 [pmfs]
[  115.055983]  legacy_get_tree+0x76/0xd0
[  115.056368]  vfs_get_tree+0x56/0x1e0
[  115.056728]  do_mount+0xf0e/0x1bb0
[  115.057071]  ksys_mount+0xb0/0x120
[  115.057414]  __x64_sys_mount+0x6c/0x80
[  115.057791]  do_syscall_64+0x7d/0x1b0
[  115.058160]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  115.058657] 
[  115.058815] Freed by task 0:
[  115.059105] (stack is not available)
[  115.059462] 
[  115.059620] The buggy address belongs to the object at ffff8881e2bda100
[  115.059620]  which belongs to the cache kmalloc-8k of size 8192
[  115.060846] The buggy address is located 4096 bytes inside of
[  115.060846]  8192-byte region [ffff8881e2bda100, ffff8881e2bdc100)
[  115.061999] The buggy address belongs to the page:
[  115.062473] page:ffffea00078af600 count:1 mapcount:0 mapping:ffff8881ed4028c0 index:0x0 compound_mapcount: 0
[  115.063482] flags: 0x17ffffc0010200(slab|head)
[  115.064005] raw: 0017ffffc0010200 dead000000000100 dead000000000200 ffff8881ed4028c0
[  115.064913] raw: 0000000000000000 0000000080030003 00000001ffffffff 0000000000000000
[  115.065801] page dumped because: kasan: bad access detected
[  115.066457] 
[  115.066633] Memory state around the buggy address:
[  115.067183]  ffff8881e2bdb000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  115.068006]  ffff8881e2bdb080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[  115.068849] >ffff8881e2bdb100: 01 fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  115.069672]                    ^
[  115.070056]  ffff8881e2bdb180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  115.070916]  ffff8881e2bdb200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  115.071744] ==================================================================

I believe this code also runs when remounting a cleanly unmounted image, but that does not trigger the KASAN report. I will post an update if I get a chance to look into the root cause in more detail.
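The shadow bytes narrow this down. The faulting read is 4096 bytes into the object, and the shadow byte for that granule is 01, which in KASAN's encoding means only the first byte of the 8-byte granule is addressable, so the allocation appears to have been 4097 bytes. find_next_bit() reads whole unsigned long words, so an 8-byte read at offset 4096 overruns it. One hypothesis consistent with this, not confirmed against the PMFS source, is a bitmap sized in bytes rather than in whole longs. The sketch below contrasts the two sizing rules; all function and macro names are illustrative.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)

/* Byte-granular sizing: enough bits, but the last word may be partial. */
static size_t bitmap_bytes_bytewise(size_t nbits)
{
	return (nbits + CHAR_BIT - 1) / CHAR_BIT;
}

/* Word-granular sizing, like the kernel's BITS_TO_LONGS(): word-at-a-time
 * readers such as find_next_bit() never run past the allocation. */
static size_t bitmap_bytes_wordwise(size_t nbits)
{
	return ((nbits + BITS_PER_ULONG - 1) / BITS_PER_ULONG) *
	       sizeof(unsigned long);
}

/* Bytes a word-at-a-time scan touches to read the word holding bit
 * nbits - 1 (requires nbits > 0). */
static size_t bytes_read_by_word_scan(size_t nbits)
{
	return ((nbits - 1) / BITS_PER_ULONG + 1) * sizeof(unsigned long);
}
```

For example, a bitmap of 4096 * 8 + 1 bits is 4097 bytes when sized bytewise, but a word scan that reaches its last bit reads past that; sizing it wordwise covers the full final word.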

Possible crash consistency bug with write()

Hi,

I'm reporting what I believe to be a crash consistency bug in PMFS. Suppose we run the following workload on an empty PMFS file system mounted at /mnt/pmem:

creat /mnt/pmem/foo
write 1 byte to foo
write 8 bytes to foo

When performing the second write with pmfs_xip_file_write(), the if statement at line 380 of xip.c is entered because the second write modifies the same block as the first, so PMFS uses pmfs_file_write_fast() to avoid a transaction. After writing the data, this function atomically updates foo's size and time fields, and finally flushes the first cacheline of foo's inode using pmfs_flush_buffer(). However, this flush is not followed by a fence, so these updates to the inode may be reordered with writes at the beginning of the next system call. This can result in losing the data from the second write, even though write() has already returned.

I'm not 100% sure that this would be considered a bug in PMFS, but it can result in data loss and only requires the addition of a store fence to fix.
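To see why the missing fence matters, consider a deliberately simplified persistence model (not the x86 memory model): a flush only queues a cacheline for writeback, and nothing queued is guaranteed durable until a fence drains the queue. struct pm_model, store(), flush_line(), fence() and crash() are hypothetical names; in PMFS the corresponding change would be a store fence after the pmfs_flush_buffer() call in pmfs_file_write_fast().

```c
#include <assert.h>
#include <stdbool.h>

#define NLINES 4

/* Simplified model of one inode's cachelines. */
struct pm_model {
	long cache[NLINES];   /* latest stored values (volatile) */
	long durable[NLINES]; /* contents that survive a crash */
	bool queued[NLINES];  /* flushed but not yet fenced */
};

static void store(struct pm_model *m, int line, long v)
{
	m->cache[line] = v;
}

/* In this model a flush only queues the line for writeback. */
static void flush_line(struct pm_model *m, int line)
{
	m->queued[line] = true;
}

/* The fence drains every queued writeback to the durable image. */
static void fence(struct pm_model *m)
{
	for (int i = 0; i < NLINES; i++) {
		if (m->queued[i]) {
			m->durable[i] = m->cache[i];
			m->queued[i] = false;
		}
	}
}

/* Worst case for the model: a crash drops everything not yet fenced. */
static void crash(struct pm_model *m)
{
	for (int i = 0; i < NLINES; i++) {
		m->cache[i] = m->durable[i];
		m->queued[i] = false;
	}
}
```

Here line 0 stands for the inode's first cacheline holding the new size (9 bytes after the second write). Without the fence, the model allows the old size to be what survives a crash.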

Possible data loss if a crash occurs during or after write()

Hi,

I believe PMFS has a bug that can leave some file data unpersisted, potentially resulting in data loss if the system crashes during or after a write(). The conditions that trigger this bug are demonstrated by the attached program, test3.zip, which assumes a fresh PMFS instance is mounted at /mnt/pmem; it creates a file, writes 1 byte to it, and then appends 1024 bytes.

memcpy_to_nvmm() uses __copy_from_user_inatomic_nocache() to write file data to PM. This function ultimately calls __copy_user_nocache(), defined at https://elixir.bootlin.com/linux/v5.1/source/arch/x86/lib/copy_user_64.S#L205, to perform the non-temporal copy. The documentation for that function notes that __copy_user_nocache() may fall back to temporal (cached) moves if the destination address or the size of the write is not 4- or 8-byte aligned (depending on the size of the entire write). PMFS uses pmfs_flush_edge_cachelines() to flush the beginning and end of the write in case they were handled with cached moves.

In the provided test3.cpp program, the first call to write() requires a cacheline flush by pmfs_flush_edge_cachelines(), since its size is less than 4 bytes. The second write() (the append) requires both the first and last cachelines of the copied region to be flushed: although its length is a multiple of 8, its destination is not 8-byte aligned. After __copy_user_nocache() handles the first few unaligned bytes, the number of bytes remaining is no longer a multiple of 8, so the final byte is written with a cached move.

If you add print statements to pmfs_flush_edge_cachelines() showing which of its if statements (and therefore which cacheline flushes) run, and then run test3.cpp, you should see that the second write flushes only its first cacheline, not its last. This leaves the very last byte of the write unflushed, so it could be lost in a crash.
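Based on the behavior described above (cached moves for an unaligned head and for a tail that does not end on an 8-byte boundary), a conservative edge check can base both decisions on addresses rather than on the length. edge_lines() and struct edge_flush are hypothetical and are not PMFS's pmfs_flush_edge_cachelines(); the sketch treats any edge that is not 8-byte aligned as cached, so it may flush a line that __copy_user_nocache() actually wrote with non-temporal stores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CACHELINE 64UL

struct edge_flush {
	bool head;               /* flush the line holding the first byte */
	bool tail;               /* flush the line holding the last byte */
	unsigned long head_line;
	unsigned long tail_line;
};

/* Decide which edge cachelines of the copy [dst, dst + len) to flush.
 * Conservative rule: an edge that is not 8-byte aligned may have been
 * written with cached moves, so its cacheline is flushed. */
static struct edge_flush edge_lines(unsigned long dst, size_t len)
{
	struct edge_flush f = {0};
	unsigned long end = dst + len; /* one past the last byte */

	if (len == 0)
		return f;
	f.head_line = dst / CACHELINE;
	f.tail_line = (end - 1) / CACHELINE;
	f.head = (dst % 8) != 0;
	/* Check the end address, not len: an unaligned dst with a length
	 * that is a multiple of 8 still leaves an unaligned tail. */
	f.tail = (end % 8) != 0;
	return f;
}
```

For the test3 workload, edge_lines(0, 1) flags only the tail (line 0), and edge_lines(1, 1024) flags both the head (line 0) and the tail (line 16), covering the final byte that the report says goes unflushed.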
