Comments (9)

lczech commented on July 30, 2024

Hey Kaiku,

woooow you really dug deep there!

Your first question: The PID is the internal number that the operating system (at least Unix/Linux/Mac - bit different on Windows) uses to identify a process or program that you are running. It's just the way that programs are internally organized and addressed. If you want to end a process externally for example (because it might be stuck, or something), you can usually call kill 1234 with the PID there.
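
To make the PID idea a bit more concrete, here is a minimal Python sketch (Unix-like systems only) that prints the PID of the current process and checks whether a given PID still exists. The helper name `pid_exists` is made up for this illustration; sending signal 0 is a common way to probe for a process without actually terminating it.

```python
import os


def pid_exists(pid: int) -> bool:
    """Return True if a process with this PID currently exists (Unix-like only)."""
    if pid <= 0:
        return False
    try:
        # Signal 0 only performs the existence/permission check; nothing is sent.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True  # process exists, but belongs to another user
    return True


if __name__ == "__main__":
    me = os.getpid()
    print(f"This script runs as PID {me}; exists: {pid_exists(me)}")
```

A "PID not found" style message would correspond to the `ProcessLookupError` branch here: the process that was expected to be running is already gone.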

Your second question: From what you wrote, I really have no idea where the problem is coming from. I don't think that filtering for existing PIDs in the slurm script is the way to go though. This seems to be an issue within the trimmomatic script itself. The cluster and slurm are not concerned with the internals of a rule once it is running. Or why did you come to the conclusion that slurm is involved at all?

Based on the error that you reported, I'd instead look into the trimmomatic rule, and see where and why it is breaking. The exact code for the rule execution is here, which you can copy into a script, and instead of having the rule execute the wrapper script, have it execute your copy of the script. You can then add debugging log messages to it to see how far it gets and where it breaks.
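
As a rough sketch of what such debugging messages could look like in a copied Python script: the helper `debug` and the step descriptions below are hypothetical and not part of grenepipe or the wrapper. Messages go to stderr by default, so they should end up wherever the job's error output is collected.

```python
import sys
import time


def debug(message: str, stream=sys.stderr) -> None:
    """Write a timestamped debug line and flush immediately.

    Flushing matters here: if the process dies right after this call,
    the message should already have been written to the log.
    """
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    print(f"[debug {stamp}] {message}", file=stream, flush=True)


# Hypothetical usage inside the copied script:
debug("script started")
# ... build the command line for the tool here ...
debug("about to call the trimming tool")
# ... run the tool here ...
debug("tool finished")
```

The last message that shows up in the log then tells you roughly which step the script reached before it stopped.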

As a pre-test though to see if this is actually being caused by that particular rule, you could try to use another trimming tool, and see if that works. That would then hint at the trimmomatic script being the issue. If that however also fails with a similar error, then I'd say you are right in the assumption that the error is coming more from a snakemake/slurm thing.

Hope that helps, cheers
Lucas

from grenepipe.

lczech commented on July 30, 2024

Oh wait, there is more:

  • If you have access to another cluster (Carnegie?), you could try that as well, and see if you get similar issues there.
  • Also, if you think slurm is the issue, you can instead get an interactive slurm session on the cluster with some resources, and run the pipeline within there - executing it with --cores instead of --profile so that it runs "locally" within that interactive session, instead of submitting the rules as jobs. Failure or success of that would also give strong hints into where this error is coming from.
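
The difference between the two modes comes down to which option is passed to Snakemake. As a rough sketch, the following Python helper builds both variants of the command line. The function name, the directory, and the profile path are placeholders for illustration only, and any further options you normally use would need to be added.

```python
from typing import List, Optional


def build_snakemake_command(directory: str, cores: Optional[int] = None,
                            profile: Optional[str] = None) -> List[str]:
    """Build a snakemake call that either runs locally or uses a profile."""
    if (cores is None) == (profile is None):
        raise ValueError("specify exactly one of 'cores' or 'profile'")
    cmd = ["snakemake", "--directory", directory]
    if cores is not None:
        # Local execution: all rules run inside the current (interactive) session.
        cmd += ["--cores", str(cores)]
    else:
        # Profile execution: e.g. a slurm profile that submits each rule as a job.
        cmd += ["--profile", profile]
    return cmd


# Placeholder paths, not taken from this thread:
print(" ".join(build_snakemake_command("/path/to/analysis", cores=8)))
print(" ".join(build_snakemake_command("/path/to/analysis", profile="/path/to/profile")))
```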

Lucas

Kkaholoaa commented on July 30, 2024

Hi Lucas,

Sorry for the late reply, I really wanted to try to troubleshoot this on my own but it seems I hit a wall - probably because of my lack of computational knowledge. The good news is I was able to identify that the issue seems to be with my wrappers, and I would really appreciate any advice you have for me to troubleshoot this...

To begin, I switched trimming tools from trimmomatic to adapterremoval, and it worked! At the same time, I also started receiving errors for fastqc:

Screenshot 2023-05-19 at 1 10 24 AM

By looking into the FastQC rule, I found yet another PID error:

Screenshot 2023-05-19 at 12 37 13 AM

Per your suggestion, I also ran grenepipe locally using --cores instead of --profile, and still came across the PID issues, meaning that it is probably not slurm that is causing this issue.

As it turns out, both trimmomatic and fastqc use snakemake wrappers, while adapter removal does not (it uses a shell script instead). This tells me that the error might have something to do with snakemake wrappers, or the way that slurm is interacting with them.

Here, the PID errors point to the snakemake wrapper in both Trimmomatic and Fastqc, but with adapter removal there is no snakemake wrapper and the trim was successful:

FastQC:
Screenshot 2023-05-19 at 1 11 49 AM

Trimmomatic:
Screenshot 2023-05-19 at 1 12 05 AM

Adapter Removal:
Screenshot 2023-05-19 at 1 12 17 AM

To explore this issue further, I tried to debug by adding print statements (I simply added echo statements to get more info on why the wrappers don't work... but I don't really understand debugging just yet), hoping to identify where in both wrappers (trimmomatic and fastqc) the errors occurred. Although my debugging didn't work at all, I did find that grenepipe also began to map my trimmed reads from adapter removal onto my reference genome. This then led to the discovery that BWA also does not work, and, with no surprise, yet another PID error that points to a snakemake wrapper:

Screenshot 2023-05-19 at 12 48 16 AM

As a result, I am convinced the error is with snakemake wrappers, but Google does not do a very good job of explaining the issue or any potential ways to resolve it, and this is where I need your help.

Here is some of my environment information. I noticed that the version of my snakemake wrapper utils is not the version recommended on the snakemake website, and I also noticed that everything in the grenepipe environment comes from conda-forge except for snakemake, which comes from bioconda.
Screenshot 2023-05-19 at 1 02 58 AM

I also found some links that may be helpful, where some people are saying that their cluster does not support internet access (which snakemake requires), and another post that describes that subworkflows have been deprecated in favor of snakemake modules (unsure if this is relevant here).

https://stackoverflow.com/questions/76267968/snakemake-wrappers-not-working-on-slurm-cluster-compute-nodes-without-internet

https://snakemake.readthedocs.io/en/stable/snakefiles/modularization.html

Finally, I just thought I'd send this over just in case, but here is the snakemake command that I am using to run grenepipe... maybe there is an error in the way I am running grenepipe??? For context, I use tmux to create a new session, then I activate the grenepipe environment, and finally I use the following command to run grenepipe from within the grenepipe directory, but pointing to my analysis directory where both my config.yaml and my sample_table.tsv files are located.

Screenshot 2023-05-19 at 1 06 22 AM

Overall, do you have any advice for me at this time? Any pointers on how to resolve this snakemake wrapper issue would be appreciated...

Thank you!
Kaiku

lczech commented on July 30, 2024

Hey Kaiku,

interesting, there is some progress there! Your thorough analysis helps, but it has one flaw, as seen in one of your screenshots:

image

The call to the Snakemake wrapper in the FastQC rule is actually commented out, and we are using a local script instead there.

Hence, your conclusion

"As a result, I am convinced the error is with snakemake wrappers"

might not be the case ;-)
I think that the error is some part of the script crashing due to some other error, so that something stops running, and the PID error is just the visible effect of that, but not the cause.

What you could try: Following the Cluster Troubleshooting steps, locate the .err and .out slurm log files of the rule that crashed. In your case, for example from your screenshot

image

you'd want to look for the slurm log files for job IDs 19203847 and 19203864, and see if there are any hints in there. Maybe out of memory, or some other trouble with the program. It is admittedly weird that two different tools kind of crash with the same consequences (PID not found), but it might be that they both crash for unrelated reasons, and the PID thing is just the effect that this has on Snakemake. Not sure at this point. If you could post those err and out files, we might find more hints.
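
If there are many log files, a small script can help with the search. The following is only a sketch under assumptions: it assumes that the slurm log file names contain the job ID, and the keyword list is just a guess at messages that commonly appear when a job gets killed. The function name `scan_slurm_logs` is made up for this example.

```python
from pathlib import Path
from typing import Dict, Iterable, List

# Assumed keywords that often show up when a job is killed; adjust as needed.
KEYWORDS = ("out of memory", "oom-kill", "killed", "cancelled", "time limit")


def scan_slurm_logs(log_dir: str, job_ids: Iterable[str]) -> Dict[str, List[str]]:
    """Return suspicious lines per log file whose name contains one of the job IDs."""
    hits: Dict[str, List[str]] = {}
    root = Path(log_dir)
    if not root.is_dir():
        return hits
    ids = [str(j) for j in job_ids]
    for path in sorted(root.rglob("*")):
        # Only look at regular files that belong to one of the given jobs.
        if not path.is_file() or not any(j in path.name for j in ids):
            continue
        lines = path.read_text(errors="replace").splitlines()
        # Case-insensitive keyword match on each line.
        hits[str(path)] = [ln for ln in lines if any(k in ln.lower() for k in KEYWORDS)]
    return hits
```

Calling `scan_slurm_logs("path/to/slurm-logs", ["19203847", "19203864"])` (with the path replaced by wherever your logs are written) returns one entry per matching file. An empty list for a file means that none of the assumed keywords were found, not that the job was fine.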

Cheers
Lucas

Kkaholoaa commented on July 30, 2024

Hi Lucas, apologies for not making it clear, but the PID errors are from the slurm logs (see below), and the normal logs are coming up as empty.

Slurm log 19203864:
Screenshot 2023-05-19 at 1 08 47 PM

Slurm log 19203847:
Screenshot 2023-05-19 at 1 10 09 PM

Logs > FastQC is filled with a bunch of empty logs:
Screenshot 2023-05-19 at 1 11 04 PM

I agree that it seems like something weird is happening on the back end, and hopefully there is somewhere else that I could look for hints?

Please let me know what you think,
Thank you!
Kaiku

lczech commented on July 30, 2024

Hm, so, this is not an issue with the wrappers (as per my answer above), and not caused by a tool in the scripts or wrappers failing directly with any usable error... It could be an issue with the cluster itself - some misconfiguration or limitation that we are not aware of. Could you try running it on the Carnegie cluster instead?

Other than that, I have no good idea as of now, unfortunately. We can debug together once we are both back in the same room ;-)

Kkaholoaa commented on July 30, 2024

Hello, after running this on the Carnegie cluster instead of Stanford's Sherlock Cluster, the pipeline now works!!

Screenshot 2023-05-23 at 1 49 17 PM

Although it works on the Carnegie Cluster, I will attempt to work with the Stanford IT staff to get this issue resolved on the Sherlock cluster, and will post our findings here just in case any other clusters out there may have the same issue...

Thank you so much!
Best,
Kaiku

lczech commented on July 30, 2024

Awesome, so then this does not seem to be a grenepipe issue per se, but more related to some cluster infrastructure or configuration. I'll close the issue here then - but feel free to add further findings. As you said, it might be relevant for others as well. And we can re-open should it turn out to be related to grenepipe after all ;-)

Thanks again for your persistence, and curious to hear more on this story, if you ever find out what's going on there!
