Comments (11)
Hey Yannick,
so, you installed the grenepipe env via micromamba, then activated it, and then ran snakemake from within that environment? What are the exact commands that you are running there?
Lucas
from grenepipe.
Hei Lucas,
I have to admit to my shame that in this instance, I forgot to activate the grenepipe environment after starting up the new tmux session.
I'll see if I run into an issue this time, but I think it was just me being negligent.
Sorry for any wasted time!
Yannick
Haha no worries, it happens :-)
So far so good, this time the analysis seems to be working fine!
However, it now crashed with another permission error, which I'm unsure how to interpret.
Here are the snakemake log and the bwa_mem log, since an error seems to appear there as well (line 2227).
2024-07-17T151022.207727.snakemake.log
D-1.log
Up to here, everything seems to go well.
Cheerio
Yannick
Okay, interesting, a lot is going on here...
First, the error was in sample B-1, not D-1. Could you maybe also send me the file logs/mapping/bwa-mem/B-1.log?
Furthermore, the error message
PermissionError: [Errno 13] Permission denied: '/scratch/fr_yf1009_o05i14'
somehow indicates some issue with that directory. I assume that is some user scratch space or temporary directory? It could be that Snakemake automatically sets this, following some configuration of your cluster - not sure where that exact directory is coming from. But it could for instance be that there is no more space there, or some similar issue. Could you check that?
Furthermore, I am noticing the following warning of grenepipe in the beginning of your log file:
In the reference genome, there are chromosome/contig names that contain problematic characters. As we use these names to create file names, this can lead to crashes later in the pipeline. We generally advise to only use alpha-numeric characters, dots, dashes, and underscores for the reference sequence names for this reason.
Problematic reference genome names:
- lcl|ptg000001l
- lcl|ptg000002l
- lcl|ptg000003l
- ...
The issue here is that your reference genome contains pipe (or bar) characters |. This will very likely lead to issues down the line, as that is a special character in unix systems, and using it in file names can lead to all kinds of trouble. Depending on your settings, grenepipe will however create files for the contigs, named after the contigs, so that might fail.
Hence, I highly recommend renaming the reference genome sequences, for instance by replacing the | with a simple underscore _. I have thought about doing that automatically in grenepipe, but concluded that it would probably be more trouble for the user to find all their contigs renamed after the fact. But let me know what you think about that.
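For reference, the renaming suggested above could be done with a few lines of Python. This is only a minimal sketch: the function names sanitize_header and sanitize_fasta, and the file names in the usage note, are made up for illustration and are not part of grenepipe. It replaces every character outside the set named in the warning (alpha-numeric, dot, dash, underscore) with an underscore, and only touches the sequence name, i.e. the first whitespace-delimited word of each header line.

```python
import re

# Anything that is NOT alpha-numeric, a dot, a dash, or an underscore.
UNSAFE = re.compile(r"[^A-Za-z0-9._-]")


def sanitize_header(line: str) -> str:
    """Return a FASTA line with unsafe characters in the sequence name replaced.

    Sequence lines (not starting with '>') are returned unchanged.
    """
    if not line.startswith(">"):
        return line
    # Split the header into the name (first word) and the optional description.
    name, rest = re.match(r">(\S*)(.*)", line.rstrip("\n")).groups()
    return f">{UNSAFE.sub('_', name)}{rest}\n"


def sanitize_fasta(src: str, dst: str) -> None:
    """Copy the FASTA file `src` to `dst`, cleaning every header line."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            fout.write(sanitize_header(line))
```

For example, sanitize_fasta("reference.fasta", "reference_clean.fasta") would write a cleaned copy and leave the original untouched. Any other files that refer to the old contig names would need the same renaming.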
Lastly, the log file you sent also made me realize that Snakemake changed their logging behaviour, so that the log file that you sent does not contain all information of grenepipe any more... I've told them about it. In the meantime, for future debugging, when you need to send more log files, could you maybe use the latest master branch of the grenepipe repository, e.g., via the green "Code" button here? That will produce an additional log file almost identical to the one you sent there, but with all the information. It will be located along the other logs of the pipeline in logs/snakemake.
Cheers and so long
Lucas
Hei Lucas,
unfortunately no log seems to have been created for B-1, I added the only one in the bwa-mem log folder.
Also, I'm not sure how to check this scratch directory. As far as I understand it's stored on our cluster's temporary file directory $TMPDIR, which appears to be emptied as soon as the corresponding jobs are either done or terminated.
edit: I managed to access it, and du -sh for the directory returns a size of 8.0K
I've changed the contig names in the reference genome and replaced the grenepipe folder with your master version and am about to start a new run now.
I'll let you know what happens!
All the best
Yannick
Hej Yannick,
unfortunately no log seems to have been created for B-1, I added the only one in the bwa-mem log folder.
Ah okay, that likely means the job failed for some snakemake reason, so that the program being run there did not even get to produce any output.
Also, I'm not sure how to check this scratch directory. As far as I understand it's stored on our cluster's temporary file directory $TMPDIR, which appears to be emptied as soon as the corresponding jobs are either done or terminated.
edit: I managed to access it, and du -sh for the directory returns a size of 8.0K
Haha yes, 8K is not nearly enough space... not sure why your cluster would give you such limited temp space... I'd get in touch with your admins to see what is happening there.
In the meantime, you can set the temp dir explicitly to something that has more space available, see here. This should work for example:
--default-resources tmpdir="/path/to/more/space"
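Before pointing Snakemake at a new directory, it may be worth checking that it is actually writable and has room. One caveat: du -sh reports how much data a directory currently holds, not how much space is free, so a separate check of the free space is needed. A minimal sketch in Python, where check_tmpdir is a hypothetical helper name and not something provided by Snakemake or grenepipe:

```python
import os
import shutil
import tempfile


def check_tmpdir(path: str) -> dict:
    """Report whether `path` exists, is writable, and how much space is free."""
    report = {"exists": os.path.isdir(path), "writable": False, "free_gib": None}
    if not report["exists"]:
        return report
    try:
        # Creating (and automatically deleting) a file is the most direct
        # test; a PermissionError here is an OSError and lands in the except.
        with tempfile.NamedTemporaryFile(dir=path):
            report["writable"] = True
    except OSError:
        pass
    # Free space on the file system that holds `path`, in GiB.
    report["free_gib"] = shutil.disk_usage(path).free / 2**30
    return report
```

Since the scratch directory in this thread seems to exist only while a job is running, such a check would presumably have to be run from within a cluster job to be meaningful.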
I've changed the contig names in the reference genome and replaced the grenepipe folder with your master version and am about to start a new run now.
I'll let you know what happens!
Great!
So long
Lucas
Hei Lucas,
Haha yes, 8K is not nearly enough space... not sure why your cluster would give you such limited temp space... I'd get in touch with your admins to see what is happening there.
I double-checked and as far as I can tell, there should be no quota and therefore unlimited space in the temp directories, so it is a bit of a mystery. I'll try your approach if/when the current analysis fails.
Cheers
Yannick
UNLIMITED SPACE? You could get rich! :-D
Yes, see what happens. If things get deleted from that temp dir once the job finishes, it could be that it is too eagerly deleting files before Snakemake is done moving them to the actual results dir, or something like that.
Cheers
Lucas
Hi Lucas,
the last attempt unfortunately also finished with the same "permission denied" notification in regards to that scratch directory.
Here is the snakemake log, I hope this is the one you need:
2024-07-18T111225.342551.snakemake.log
Interestingly, the analysis got much further than the last time.
Is there a way to have a new run start where this one left off?
Cheers
Yannick
Hi Yannick,
the last attempt unfortunately also finished with the same "permission denied" notification in regards to that scratch directory.
That does not look like an issue with Snakemake or grenepipe, but rather with your cluster configuration, so there is not much I can do to fix that particular error. If you want to keep using that scratch space, I'd get in touch with the cluster admin, to see what is causing this.
In the meantime, you can instead instruct Snakemake to use a different directory for temp files, as explained above. Have you tried that?
Interestingly, the analysis got much further than the last time.
Hm not sure that I understand. Your previous run went from 15:10 to 18:07, and your new one from 11:12 to 17:16, which seems longer to me :-)
Is there a way to have a new run start where this one left off?
Yes, Snakemake usually does not run things that are already there and up to date. However, if an input file changes, any output files that depend on it will be re-created. That is determined via time stamps, so if the input is newer than the output, the job creating the output is executed again.
In your case it seems that the reference file was updated:
[Thu Jul 18 11:12:28 2024]
localcheckpoint samtools_faidx:
input: /gpfs/bwfor/home/fr/fr_fr/fr_yf1009/GenoDat/Mbel_Q1_clean.fasta
output: /gpfs/bwfor/home/fr/fr_fr/fr_yf1009/GenoDat/Mbel_Q1_clean.fasta.fai
log: /gpfs/bwfor/home/fr/fr_fr/fr_yf1009/GenoDat/logs/Mbel_Q1_clean.fasta.samtools_faidx.log
jobid: 7
reason: Updated input files: /gpfs/bwfor/home/fr/fr_fr/fr_yf1009/GenoDat/Mbel_Q1_clean.fasta
resources: tmpdir=/scratch/fr_yf1009_o05i14
DAG of jobs will be updated after completion.
Did you somehow change or move that file in the meantime? If so, that would explain the re-running.
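The time stamp rule described above can be sketched roughly as follows. This is a simplified illustration, not Snakemake's actual implementation (which, as far as I know, can also take other triggers such as changed parameters or code into account); needs_rerun is a made-up name.

```python
import os


def needs_rerun(inputs: list[str], outputs: list[str]) -> bool:
    """Return True if a job should run again under a pure time stamp rule.

    A job is re-run if any output is missing, or if the newest input
    was modified after the oldest output was written.
    """
    if not all(os.path.exists(p) for p in outputs):
        return True
    newest_input = max(os.path.getmtime(p) for p in inputs)
    oldest_output = min(os.path.getmtime(p) for p in outputs)
    return newest_input > oldest_output
```

Under this rule, rewriting the reference FASTA (for instance to rename its contigs) gives it a newer time stamp than the existing .fai index, so the indexing job and everything that depends on the reference would be scheduled again.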
So long
Lucas