dannycjones commented on June 25, 2024

@SamStudio8 It may be worth subscribing to #749 since it looks like the same issue.

from mountpoint-s3.

monthonk commented on June 25, 2024

Hi @SamStudio8, thanks for reporting the issue. From your backtrace, mountpoint seems to be stuck in a mknod operation. We looked into the code and found that a deadlock is possible if mknod and forget are called on files under the same directory. This is something we should fix, but we're still not sure whether it is the root cause of your problem.

The backtrace you provided is very helpful, but it only covers one worker thread. Would it be possible for you to get backtraces from all worker threads in mountpoint, so we can confirm the root cause?
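In general terms, the mknod/forget deadlock mentioned above is a lock-ordering problem: if one operation takes lock A and then lock B while another takes B and then A, two threads can each end up holding one lock and waiting forever for the other. The Rust sketch below is hypothetical and does not reflect mountpoint's actual data structures; `Dir`, `InodeTable`, `mknod`, and `forget` are illustrative stand-ins. It shows the standard remedy: both operations acquire the directory lock before the inode-table lock, so no wait cycle can form.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical directory state: child name -> inode number.
type Dir = Mutex<HashMap<String, u64>>;
/// Hypothetical global inode table: inode number -> kernel lookup count.
type InodeTable = Mutex<HashMap<u64, u64>>;

/// mknod-like operation: add `name` to `dir` and register `ino` in the table.
/// Lock order: directory first, then inode table.
fn mknod(dir: &Dir, table: &InodeTable, name: &str, ino: u64) {
    let mut d = dir.lock().unwrap();
    let mut t = table.lock().unwrap();
    d.insert(name.to_string(), ino);
    *t.entry(ino).or_insert(0) += 1; // the new entry holds one kernel reference
}

/// forget-like operation: drop `n` references to `ino`; when none remain,
/// remove it from both the table and its parent directory.
/// Locking the table first and the directory second would invert mknod's
/// order and allow a deadlock, so the same directory-then-table order is used.
fn forget(dir: &Dir, table: &InodeTable, name: &str, ino: u64, n: u64) {
    let mut d = dir.lock().unwrap();
    let mut t = table.lock().unwrap();
    let remaining = match t.get_mut(&ino) {
        Some(count) => {
            *count = count.saturating_sub(n);
            *count
        }
        None => return, // unknown inode: nothing to forget
    };
    if remaining == 0 {
        t.remove(&ino);
        d.remove(name);
    }
}

fn main() {
    let dir = Arc::new(Dir::default());
    let table = Arc::new(InodeTable::default());

    // One thread creates files while another forgets them, in the same directory.
    let (d1, t1) = (Arc::clone(&dir), Arc::clone(&table));
    let creator = thread::spawn(move || {
        for i in 0..1000u64 {
            mknod(&d1, &t1, &format!("file-{i}"), i);
        }
    });
    let (d2, t2) = (Arc::clone(&dir), Arc::clone(&table));
    let forgetter = thread::spawn(move || {
        for i in 0..1000u64 {
            forget(&d2, &t2, &format!("file-{i}"), i, 1);
        }
    });
    creator.join().unwrap();
    forgetter.join().unwrap();

    // Each file is either still registered in both places or removed from both.
    let d = dir.lock().unwrap();
    let t = table.lock().unwrap();
    assert_eq!(d.len(), t.len());
    println!("{} entries left", d.len());
}
```

If `forget` instead locked `table` first and `dir` second, the two threads in `main` could each grab one lock and block on the other indefinitely; with the consistent order, the program cannot deadlock.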

monthonk commented on June 25, 2024

Another thing we're interested in is a way to reproduce the issue. Could you share more about the access pattern that causes the hang, so we have a better idea of how to reproduce it?

SamStudio8 commented on June 25, 2024

@monthonk Thanks for taking a look! I am glad the backtraces were helpful. The machine that I produced these backtraces from has been terminated, but I will capture a backtrace of all workers the next time I see this manifest and update here. With regard to the access pattern, it is hard to characterise specifics that would be helpful for a repro, but there are several things going on:

  • consistent access to ranges of a very large file (which should be cached)
  • consistent access to several small files
  • heavy write load copying from EBS volumes to S3 mount to create new files (and directories)

Sorry, that isn't much to go on!

SamStudio8 commented on June 25, 2024

@monthonk Backtrace of all threads from a new hang attached.
backtraces.txt

monthonk commented on June 25, 2024

@SamStudio8 thanks again for providing more details on your access pattern! I will take a look at the full backtrace and let you know once we have any updates.

monthonk commented on June 25, 2024

I have looked at the full backtraces you sent, but didn't see any forget calls among the other file operations. So there might be another locking problem somewhere, which still needs further investigation.

Anyway, we have just released v1.4.0, which includes a fix for the mknod/forget problem. It might reduce the likelihood of a deadlock for you as well, since mknod now handles the lock properly. I think it's worth trying the new version!

SamStudio8 commented on June 25, 2024

Thanks @monthonk, way ahead of you on that front! Took 1.4.0 for a spin this morning, but we're getting some intermittent "bad file descriptor" errors (we haven't changed anything workload-related between 1.3.2 and 1.4.0). I am trying to isolate the cause to see what we're doing wrong.

SamStudio8 commented on June 25, 2024

Just FYI, I think I have linked the "bad file descriptor" error to an entry in the mount-s3 log indicating that we're reading from a closed file handle. I am not sure how this is happening, but I will investigate further, and then hopefully we'll be able to reap the benefits of 1.4.0!
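In general, "bad file descriptor" (`EBADF`) is the error reported when an operation targets a handle that is no longer open. The Rust sketch below is a hypothetical model of that situation, not mountpoint's implementation: `HandleTable`, `open`, `release`, `read`, and `ReadError` are invented names, and file contents are kept in memory purely for illustration. A read that arrives after its handle has been released finds nothing to read from and fails.

```rust
use std::collections::HashMap;

/// Hypothetical error type; a FUSE filesystem would surface this to the caller as EBADF.
#[derive(Debug, PartialEq)]
enum ReadError {
    BadFileDescriptor,
}

/// Hypothetical table of open file handles: handle id -> file contents.
#[derive(Default)]
struct HandleTable {
    next_fh: u64,
    open_files: HashMap<u64, Vec<u8>>,
}

impl HandleTable {
    /// Opens a new handle and returns its id.
    fn open(&mut self, contents: Vec<u8>) -> u64 {
        self.next_fh += 1;
        self.open_files.insert(self.next_fh, contents);
        self.next_fh
    }

    /// Closes a handle; later reads through it will fail.
    fn release(&mut self, fh: u64) {
        self.open_files.remove(&fh);
    }

    /// Reads up to `size` bytes at `offset`; fails if `fh` was already released.
    fn read(&self, fh: u64, offset: usize, size: usize) -> Result<&[u8], ReadError> {
        let data = self.open_files.get(&fh).ok_or(ReadError::BadFileDescriptor)?;
        let start = offset.min(data.len());
        let end = offset.saturating_add(size).min(data.len());
        Ok(&data[start..end])
    }
}

fn main() {
    let mut handles = HandleTable::default();
    let fh = handles.open(b"hello".to_vec());
    assert_eq!(handles.read(fh, 0, 3), Ok(&b"hel"[..]));
    handles.release(fh);
    // A read that arrives after release has nothing to read from.
    assert_eq!(handles.read(fh, 0, 3), Err(ReadError::BadFileDescriptor));
}
```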

ahmarsuhail commented on June 25, 2024

Hey @SamStudio8, wondering if you were able to try again with v1.4.0 and are still facing this issue?

SamStudio8 commented on June 25, 2024

Hi @ahmarsuhail, thanks for checking in. I was able to circumvent the "bad file descriptor" described above by just copying the file in question to change its access pattern to just read from EBS. I haven't seen this hanging issue in 1.4.0 so I am happy if you want to close this issue for now. We can reopen it if I am able to reproduce it again.

ahmarsuhail commented on June 25, 2024

Sounds good, thank you! Closing for now.