Comments (20)
Hi!
Maybe the cdna file that you used contains transcript ids that are not in the GTF? Which cdna and gtf files are you using?
Cheers.
from irap.
Hi,
thanks for the hint!
Homo_sapiens.GRCh38.cdna.all.fa is the cdna file and Homo_sapiens.GRCh38.90.gtf is the gtf file, but I don't remember whether I got them from the same source. I will download fresh copies and retry.
Or can you suggest which files are best to use?
It seems that Ensembl recently decided that the transcript ids in the cdna file do not need to be exactly the same as the transcript ids in the gtf file (e.g., ENST00000434970.2 in one file and ENST00000434970 in the other). This is basically a data issue; in any case, I pushed a change to libTSVAggrTransByGene to detect the mismatch and try to fix it. Please let me know if it resolves the problem.
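To illustrate the kind of mismatch described here, a minimal Python sketch of matching versioned against unversioned ids follows. It is not the code that was pushed to iRAP; the helper name `strip_version` is hypothetical, and it assumes the version is always a trailing `.<digits>` suffix.

```python
def strip_version(transcript_id: str) -> str:
    """Drop a trailing ".<digits>" version suffix, if there is one."""
    base, sep, version = transcript_id.rpartition(".")
    if sep and version.isdigit():
        return base
    return transcript_id


# Ids taken from the example above: versioned in one file, bare in the other.
cdna_ids = ["ENST00000434970.2"]
gtf_ids = {"ENST00000434970"}

# Compare on the unversioned form so both spellings refer to the same transcript.
matched = [tid for tid in cdna_ids if strip_version(tid) in gtf_ids]
print(matched)  # ['ENST00000434970.2']
```

Normalising only for the comparison keeps the original ids intact in any downstream output.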
Cheers.
Yes, it is related.
As I wrote above, I added a workaround for that in the script, but I just realized that the problem pops up at another point in the pipeline - thank you Ensembl for making my day!
A quick update on this issue. There is another problem with the latest versions of Ensembl: the cdna file contains transcripts that are not in the GTF file or, from another perspective, the GTF file has missing transcripts. For instance, in Ensembl v90 there are 16k transcripts found in the human cdna file and not found in the matching GTF (e.g., ENST00000631435). Currently iRAP fails because it checks for consistency between the two files (which no longer seems to exist). To work around this data issue, I'll change the code today/tomorrow so that it warns about these inconsistencies and carries on instead of exiting.
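The "warn and carry on" behaviour described here could look roughly like the Python sketch below. This is an assumption about the general shape of such a check, not iRAP's actual implementation; the function names are hypothetical and the id lists are reduced to the two transcripts mentioned in this thread.

```python
import warnings


def missing_from_gtf(cdna_ids, gtf_ids):
    """Return the cdna transcript ids that are absent from the GTF, sorted."""
    return sorted(set(cdna_ids) - set(gtf_ids))


def check_consistency(cdna_ids, gtf_ids):
    """Warn about inconsistencies instead of aborting, then return the missing ids."""
    missing = missing_from_gtf(cdna_ids, gtf_ids)
    if missing:
        warnings.warn(
            f"{len(missing)} cdna transcript(s) not found in the GTF, e.g. {missing[0]}"
        )
    return missing


cdna = ["ENST00000434970", "ENST00000631435"]
gtf = ["ENST00000434970"]
dropped = check_consistency(cdna, gtf)  # emits a warning, does not exit
```

Returning the missing ids lets the caller decide whether to drop them from later steps or just log them.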
Cheers.
Hey there,
Thank you very much for your efforts! I can try your patch tomorrow and send you the results next week.
Cheers.
Hi,
The code should now be able to cope with the inconsistencies. Please let me know if it works or not for you.
Cheers.
Hey, I tried it today and got the following error:
[DONE] Assembly and quantification
make: *** No rule to make target 'Master2/none/kallisto/transcripts.raw.kallisto.irap.tsv', needed by 'stage3'. Stop.
Hi, it should already be fixed in the latest version.
Cheers
Oh, thanks.
It works so far now.
Is there a way to filter out lowly expressed genes? I have a lot of genes with zero expression, and if I include them in DE, edgeR fails. I removed the genes whose expression summed over all samples was below 10 in transcripts.raw.kallisto.tsv, and now the DE analysis continues.
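For reference, the manual filtering described here amounts to something like the following Python sketch. The gene names and counts are made up for illustration, `filter_low_counts` is a hypothetical helper, and keeping rows whose total is exactly 10 is a choice made in this sketch rather than something stated in the thread.

```python
def filter_low_counts(counts, min_total=10):
    """Keep features whose count summed across all samples is at least min_total."""
    return {gene: row for gene, row in counts.items() if sum(row) >= min_total}


# Toy count table: one row per gene, one value per sample (illustrative numbers).
counts = {
    "geneA": [0, 0, 0],   # total 0, removed
    "geneB": [4, 3, 2],   # total 9, removed
    "geneC": [12, 0, 5],  # total 17, kept
}

kept = filter_low_counts(counts)
print(sorted(kept))  # ['geneC']
```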
Report generation still fails with the following error:
make: *** No rule to make target 'Master2/report/riq//raw_data/C1_0s1.f.fastqc.tsv', needed by 'Master2/report/fastq_qc_report.tsv'. Stop.
You may use the parameter
de_min_count=10
in the configuration file to remove genes with total expression below 10.
Hey,
If I use this parameter, I get an error message. Is this a bug?
[INFO] Filtering out genes with low counts (<=5)...
[INFO] Filtering out genes with low counts (<=5)...done.
Error in conds[i] <- label2group[[conds[i]]] : replacement has length zero
Calls: map.conds2cols
Execution halted
/home/niklas/irap_install/scripts/../aux/mk/irap_de.mk:161: recipe for target 'Master/tophat2/htseq2/edger/G0SvsG20S.genes_de.tsv' failed
Hi,
Could you rerun the ./scripts/irap_DE_edger script with --debug option and share with me the irap_DE_edgeR.Rdata file that is generated? This will allow me to reproduce the error.
Thanks
PS: you should get the irap_DE_edger command by rerunning irap as follows
irap conf=path2your_conf_file [any options that you are passing in the command line] Master/tophat2/htseq2/edger/G0SvsG20S.genes_de.tsv -n
Hi,
here you go. Thank you very much!
It seems like not all values appear in the opt$label2group object.
Hey there,
any progress here?
I searched a bit in the R script. The problem is that the opt$label2group variable contains only the values for the selected groups:
"C1_0s1" "C1_0s2" "C2_0s1" "C2_0s2" "C1_90s1" "C1_90s2" "C2_90s1"
"G0S" "G0S" "G0S" "G0S" "G90S" "G90S" "G90S"
But colnames(data.f) contains
"C1_0s1" "C1_0s2" "C1_H2O22" "C1_20s1" "C1_90s1" "C1_90s2" "C2_0s1" "C2_0s2" "C2_H2O21" "C2_H2O22" "C2_20s1" "C2_20s2" "C2_90s1"
So either remove the data.f columns that are not needed (which I would not do), or add the missing items to opt$label2group, use them in the design matrix, and take care to set the correct contrast. I think that removing the unneeded columns discards information that is helpful for testing. Maybe change this so that all of the available information is used?
Edit: further investigation led me to the error in irap_de.R. Line 195 says
data.f <- data[rows.sel,]
but it should be:
data.f <- data.f[rows.sel,]
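Why that one-word change matters can be sketched in Python as below. This is a simplified stand-in for the R code, assuming data.f had already been restricted to the labelled columns before line 195 (the thread does not show that step); the sample names come from this thread, the counts are invented.

```python
# Sample-to-group labels exist only for the groups in the contrast.
label2group = {"C1_0s1": "G0S", "C1_0s2": "G0S", "C1_90s1": "G90S"}
columns = ["C1_0s1", "C1_0s2", "C1_20s1", "C1_90s1"]
table = {
    "geneA": [5, 7, 9, 11],  # illustrative counts
    "geneB": [0, 0, 0, 0],
}

# Step 1 (assumed): keep only the columns that have a group label.
keep = [i for i, name in enumerate(columns) if name in label2group]
cols_f = [columns[i] for i in keep]
table_f = {gene: [row[i] for i in keep] for gene, row in table.items()}

# Step 2: select rows with non-zero totals.
rows_sel = [gene for gene, row in table_f.items() if sum(row) > 0]

# Bug: subsetting the ORIGINAL table brings the unlabelled column back.
buggy = {gene: table[gene] for gene in rows_sel}
# Fix: subset the already column-filtered table.
fixed = {gene: table_f[gene] for gene in rows_sel}

# With the fix, every remaining column has a group, so this lookup cannot fail.
groups = [label2group[name] for name in cols_f]
```

In the buggy variant the column C1_20s1 is still present, and looking up its group fails, which matches the "replacement has length zero" error reported above.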
Hi, thank you for the Rdata file. I'll look into this issue in the coming days - last week I did not have the time.
Cheers.
Hi, indeed the problem was on the line that you mentioned. It should now be fixed in the latest release (0.8.5p5). Many thanks again for the report and the fix ;-)