Comments (18)

mpartel commented on July 16, 2024

FUSE devs don't want to treat this non-atomicity as a bug and told me to implement access instead. I'll try to find time to do that soon.

from bindfs.

mpartel commented on July 16, 2024

That's weird.

Some things to check and try:

  • bindfs never returns EACCES for open by itself, but it can forward any error from the underlying FS. See man 2 open for possible reasons. Is it e.g. possible that your Python program changes its working directory at some point? (Does strace show a chdir?)
  • Run bindfs with -d to get debug output. If you can't see open or create calls after the failures start, then the calls are rejected somewhere at the FUSE / kernel level.
  • If you're on FUSE 2, try compiling bindfs with FUSE 3 (if available for your distro).
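For the first check, here is a rough sketch of how the failing program could record its own errno and working directory when an open fails, as a complement to strace. `try_open` is a hypothetical helper invented for illustration; it is not part of bindfs or of the reporter's program.

```python
import errno
import os


def try_open(path):
    """Open `path` for reading and report what happened.

    Returns None on success. On failure returns a dict with the symbolic
    errno, the current working directory, and the absolute path that the
    (possibly relative) `path` resolved to at the time of the call.
    """
    try:
        with open(path, "rb"):
            return None
    except OSError as exc:
        return {
            "errno": errno.errorcode.get(exc.errno, str(exc.errno)),
            "cwd": os.getcwd(),
            "resolved": os.path.abspath(path),
        }
```

If the reported `cwd` differs between a successful run and a failing one, the relative path is resolving somewhere unexpected; if it stays the same, a chdir can probably be ruled out.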

G3zz commented on July 16, 2024

Thanks for your response, I'll try some of these out and respond back in due course.

G3zz commented on July 16, 2024

Just to say I have been looking into this. The strace and bindfs -d logs don't scream anything unusual to me - there is no chdir. The error causes the Python program to shut down, so it's not possible to get much information from bindfs -d - although right up until the EACCES error it looks to be working fine.

The program (which I can't share unfortunately) includes logic to monitor the disk space of the directory (using os.listdir("relative/path")), which results in quite frequent calls to stat64. I can see in the strace logs that the file that eventually causes the EACCES error was returned by stat64 successfully many times beforehand, leading me to wonder if there is some kind of rate limiting or something going on?

I realise this isn't much to go on, but I copied the python code into the mirror directory and ran it hundreds of times without issue. I still have to try with FUSE 3, but otherwise I don't think I will be able to continue using bindfs unfortunately.
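For illustration only, a monitor of that kind might look roughly like the sketch below. The real program isn't shown in this thread, so `directory_size` and its return shape are assumptions. The point is just that every poll issues one stat per directory entry, and that recording the errno per entry shows which file started failing and how.

```python
import errno
import os
import stat


def directory_size(path):
    """Sum the sizes of the regular files directly inside `path`.

    Returns (total_bytes, errors), where errors is a list of
    (name, errno_name) pairs for entries that could not be stat'ed.
    """
    total = 0
    errors = []
    for name in os.listdir(path):
        try:
            st = os.stat(os.path.join(path, name))
        except OSError as exc:
            # Keep going so that one failing entry doesn't hide the others.
            errors.append((name, errno.errorcode.get(exc.errno, str(exc.errno))))
            continue
        if stat.S_ISREG(st.st_mode):
            total += st.st_size
    return total, errors
```

Logging `errors` instead of letting the exception end the program would keep the process alive long enough to compare against the bindfs -d output.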

mpartel commented on July 16, 2024

Thanks for reporting back. It's a weird one. No new ideas unfortunately, besides checking dmesg if you haven't yet.

wentam commented on July 16, 2024

Dealing with what I believe to be the same issue when syncing large directories with unison on top of bindfs.

Came up with a simple bash reproduction.

Be advised this is very hard to reproduce on my fast NVMe drive. Much easier on a slower SATA SSD.

It's also much easier to reproduce with large real-world directories, though I'm using lots of empty files here to keep the reproduction simple and contained.

Setup:

mkdir source target
bindfs --mirror=user1,user2 source target
mkdir target/sub
cd target/sub && for f in {1..150000}; do touch $f; done

In one terminal in top-level 'target', change to a mirrored user who is not the bindfs process owner and stat really hard:

while true; do find . -print0 | xargs -P 10 -0 stat; done > /dev/null

With the mad-statter™ running: in another terminal in 'target', run the following repeatedly until you see a permission error (I did this as the bindfs process owner):

strace -o /tmp/strace1 rm -f foo; strace -o /tmp/strace2 touch foo

I've seen EACCES produced from openat, unlink, and stat thus far. If it gets bad enough, simply running 'ls' in the target directory will produce errors.
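The second terminal's rm -f foo; touch foo step can also be approximated in Python, which makes it easy to collect the failing operation and errno without strace. This is a sketch, not the commands used above: `churn` is a hypothetical name, and it creates the file with O_CREAT rather than running the real touch binary.

```python
import errno
import os


def churn(path, attempts=1000):
    """Repeatedly remove and recreate `path`, roughly like `rm -f` + `touch`.

    Returns a list of (attempt, operation, errno_name) tuples, one for every
    failure other than "file did not exist" on removal.
    """
    failures = []
    for i in range(attempts):
        try:
            os.unlink(path)
        except FileNotFoundError:
            pass  # same as rm -f: a missing file is not an error
        except OSError as exc:
            failures.append((i, "unlink", errno.errorcode.get(exc.errno, str(exc.errno))))
        try:
            fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o666)
            os.close(fd)
        except OSError as exc:
            failures.append((i, "open", errno.errorcode.get(exc.errno, str(exc.errno))))
    return failures
```

Running `churn("foo")` inside target while the stat loop is active in the other terminal should return an empty list on a healthy mount and entries such as `(n, "open", "EACCES")` when the problem occurs.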

mpartel commented on July 16, 2024

Thanks for the repro! Hasn't worked for me yet, but I'll try it on a HDD later.
Which distro and FUSE version is this?

wentam commented on July 16, 2024

> Thanks for the repro! Hasn't worked for me yet, but I'll try it on a HDD later. Which distro and FUSE version is this?

NixOS
fuse 2.9.9
bindfs 1.17.1

Be sure that you've created the files in a subdirectory of target and you're running while true; do find . -print0 | xargs -P 10 -0 stat; done > /dev/null in the top level of the target (not in the subdirectory), and as a different user than the bindfs process owner (one who is in the mirror list).

No idea why, but those conditions seem to need to be met for the error to be produced.

mpartel commented on July 16, 2024

Thanks!
If possible, please try compiling bindfs against FUSE 3.x.

Actually I suspect the disk's speed doesn't matter for the repro, since it should all be cached by the kernel. Likely I'll need to try this on a slower computer or VM 🤔

wentam commented on July 16, 2024

> Thanks! If possible, please try compiling bindfs against FUSE 3.x.
>
> Actually I suspect the disk's speed doesn't matter for the repro, since it should all be cached by the kernel. Likely I'll need to try this on a slower computer or VM 🤔

Tried with fuse 3.11 (This is as simple as bindfs.overrideAttrs (old: { buildInputs = [ fuse3 ]; }) with NixOS). Identical behavior, still producing errors.

The device I've primarily been testing this on is indeed on the slower end, but nothing crazy slow (Intel i5-7300U, DDR4).

mpartel commented on July 16, 2024

Doesn't repro on a very slow AMD GX-412TC / Debian 11 either.

> I've seen EACCES produced from openat, unlink, and stat thus far.

This still feels like something outside bindfs is doing some questionable caching, but I don't know how to get at it. Try -o entry_timeout=0 maybe? (man mount.fuse3 has options that ~every FUSE FS supports.)

wentam commented on July 16, 2024

> This still feels like something outside bindfs is doing some questionable caching, but I don't know how to get at it. Try -o entry_timeout=0 maybe? (man mount.fuse3 has options that ~every FUSE FS supports.)

No change with -o entry_timeout=0.

With a bit of tweaking, I've managed to create a script that quickly and reliably produces this problem on every machine I've tried, including fast and slow ones: https://gist.github.com/wentam/3a175c71a2f535e1606bb40e0f1aef58

The script will handle everything, including starting bindfs. Using a script like this also helps eliminate any slight differences in setup that could matter.

After the setup stage completes, it takes less than 1s to hit the error on every machine I've tried. Also tried on an Arch Linux machine, still reproduced the error.

$ cd /tmp/work/
/tmp/work/ $ sh bindfs-120-repro.sh testdir [some_other_user]

mpartel commented on July 16, 2024

Thank you for the very helpful script! Not sure what I did wrong previously, but the error now happens on my slow machine (Debian 11, kernel 5.10.0-21, fuse 2.9.9). But it does not happen on my fast machine (Pop!_OS 22.04, kernel 6.0.12, fuse 3.10.5). Both have the latest bindfs from git.

The kernel still seems to get a steady stream of FUSE-related changes, so I'm tempted to suspect a bug in older kernels. What kernel version are you running?

wentam commented on July 16, 2024

> What kernel version are you running?

Kernel is 6.1.9 (and have also repro'd on devices with many different kernel versions)

mpartel commented on July 16, 2024

Ok, I've now seen it not repro in a Debian 11 VM (or Ubuntu 22.04 or 22.10) on the fast machine where it doesn't repro natively either. So the repro seems to be machine-dependent rather than software version dependent.

I'll see if I can find a more minimal test case...

mpartel commented on July 16, 2024

I think I now have pretty compelling evidence that this is a FUSE bug.

I modified FUSE's passthrough example to do user mirroring by adding the line

stbuf->st_uid = fuse_get_context()->uid;

to xmp_getattr, and I modified your script to run passthrough instead of bindfs.

If I run the modified passthrough with -odefault_permissions (which bindfs must add automatically for correctness), it fails your test. Without that flag, or without that modification, it doesn't fail.

I'll package this up as a bug report to FUSE, but not sure yet if I can get that done today.
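To make the suspected interaction concrete, here is a toy model in plain Python. It is not kernel or libfuse code, and the mechanism is an assumption rather than something verified here: it supposes that the kernel keeps one cached owner per file, that each getattr reply overwrites it with the calling user's uid (the st_uid line above), and that the default_permissions check reads that cache. All names are hypothetical, and group bits are ignored.

```python
class ToyInode:
    """Stand-in for a single cached attribute set shared by all callers."""

    def __init__(self, mode):
        self.mode = mode  # permission bits, e.g. 0o600
        self.uid = None   # cached owner, filled in by getattr_mirrored


def getattr_mirrored(inode, caller_uid):
    """Model of the modified getattr: report the caller as the owner."""
    inode.uid = caller_uid


def permission_ok(inode, caller_uid, want):
    """Model of a permission check against the cached attributes.

    `want` is a 3-bit rwx mask (0o2 means write). Owner bits apply when the
    cached uid matches the caller, otherwise the "other" bits apply.
    """
    bits = (inode.mode >> 6) if inode.uid == caller_uid else inode.mode
    return (bits & 0o7 & want) == want


def open_for_write(inode, caller_uid, interleaved_uid=None):
    """getattr followed by a permission check, optionally with another
    user's getattr landing in between the two steps."""
    getattr_mirrored(inode, caller_uid)
    if interleaved_uid is not None:
        getattr_mirrored(inode, interleaved_uid)
    return permission_ok(inode, caller_uid, 0o2)
```

With `ToyInode(0o600)`, `open_for_write(inode, 1000)` succeeds, while `open_for_write(inode, 1000, interleaved_uid=1001)` is denied, which is the shape of failure the stat loop from a second mirrored user would produce if the assumption holds.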

G3zz commented on July 16, 2024

I've been following along with interest. I tried running with FUSE 3 on the same machine (a Raspberry Pi 4) and continued to replicate the issue. For my use case I have resorted to other techniques for sharing files between users, but I'm very happy to test this issue when a solution is found.

Thanks,
Geraint

mpartel commented on July 16, 2024

Thanks, I'm hopeful about finding the time in a few weeks. This probably just needs a few more hours of good concentration, but those have been in short supply :/
