Hey there, I'm trying to use kallisto to quantify the reads, but it fails d…

Merging Matrices in libTSVAggrTransByGene fails? (irap issue, 20 comments, closed)

nunofonseca commented on June 24, 2024

from irap.

Comments (20)

nunofonseca commented on June 24, 2024

Hi!
Maybe the cDNA file that you used contains transcript IDs that are not in the GTF? Which cDNA and GTF files are you using?
Cheers.

Niklas123Niklas commented on June 24, 2024

Hi,
thanks for the hint!
Homo_sapiens.GRCh38.cdna.all.fa is the cDNA file and Homo_sapiens.GRCh38.90.gtf is the GTF file, but I don't remember whether I got them from the same source. I'll try to get new ones and retry.

Or can you suggest which files are best to use?

nunofonseca commented on June 24, 2024

It seems that Ensembl recently decided that the transcript IDs in the cDNA file do not need to be exactly the same as the transcript IDs in the GTF file (e.g., ENST00000434970.2 in one file and ENST00000434970 in the other). This is basically a data issue, but I pushed a change to libTSVAggrTransByGene that detects it and tries to fix it. Please let me know if it fixes the issue.
Cheers.
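A minimal sketch of the version-stripping idea in Python, assuming the mismatch is only the trailing `.N` version suffix described above. This is not the actual change made to libTSVAggrTransByGene; the helper names are hypothetical, and it assumes the transcript ID is the first token of each FASTA header and appears as a `transcript_id "..."` attribute in the GTF.

```python
import re

# Ensembl stable IDs can carry a trailing ".<version>", e.g. ENST00000434970.2.
VERSION_SUFFIX = re.compile(r"\.\d+$")
GTF_TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def strip_version(transcript_id: str) -> str:
    """Remove a trailing '.<digits>' version suffix, if there is one."""
    return VERSION_SUFFIX.sub("", transcript_id)


def fasta_transcript_ids(lines):
    """Yield the first token of every FASTA header line ('>ID ...')."""
    for line in lines:
        if line.startswith(">"):
            yield line[1:].split()[0]


def gtf_transcript_ids(lines):
    """Yield transcript_id attribute values from GTF lines, skipping comments."""
    for line in lines:
        if line.startswith("#"):
            continue
        match = GTF_TRANSCRIPT_ID.search(line)
        if match:
            yield match.group(1)


# Toy input (hypothetical lines, not copied from the real Ensembl files).
fasta = [">ENST00000434970.2 cdna", "ACGT"]
gtf = ['1\ttoy\ttranscript\t1\t4\t.\t+\t.\ttranscript_id "ENST00000434970";']

cdna_ids = {strip_version(t) for t in fasta_transcript_ids(fasta)}
gtf_ids = {strip_version(t) for t in gtf_transcript_ids(gtf)}
print(cdna_ids == gtf_ids)  # True: the IDs agree once the version is dropped
```

Comparing on version-less IDs matches the "real" ID noted later in the thread, at the cost of treating different versions of a transcript as the same one.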

maximilianh commented on June 24, 2024

This somewhat relates to the discussion we had in Palo Alto: the "real" ID is ENST00000434970. If you ignore the version number, then Gencode and Ensembl are exactly the same transcript set, as far as I know. On differences between Ensembl and Gencode see their FAQ: https://www.gencodegenes.org/faq.html

nunofonseca commented on June 24, 2024

Yes, it is related.
As I wrote above, I added a workaround for that in the script, but I just realized that the problem pops up at another point in the pipeline. Thank you, Ensembl, for making my day!

nunofonseca commented on June 24, 2024

A quick update on this issue. There is another problem with the latest versions of Ensembl: the cDNA file contains transcripts that are not in the GTF file or, seen from the other side, the GTF file is missing transcripts. For instance, in Ensembl v90 there are 16k transcripts in the human cDNA file that are not in the matching GTF (e.g., ENST00000631435). Currently iRAP fails because it checks that the two files are consistent (which they no longer seem to be). To work around this data issue, I'll change the code today or tomorrow so that it warns about these inconsistencies and carries on instead of exiting.

Cheers.
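The "warn and carry on" behaviour described above could look roughly like the following Python sketch. It is not iRAP's code: `check_consistency` is a hypothetical helper, and it assumes IDs are compared after dropping any `.N` version suffix.

```python
import re
import sys

VERSION_SUFFIX = re.compile(r"\.\d+$")


def strip_version(transcript_id: str) -> str:
    """Remove a trailing '.<digits>' version suffix, if there is one."""
    return VERSION_SUFFIX.sub("", transcript_id)


def check_consistency(cdna_ids, gtf_ids, max_examples=5):
    """Warn (instead of exiting) about cDNA transcripts absent from the GTF.

    Returns the version-stripped IDs present in both files, so later
    steps can carry on with the transcripts that can be mapped to genes.
    """
    cdna = {strip_version(t) for t in cdna_ids}
    gtf = {strip_version(t) for t in gtf_ids}
    missing = sorted(cdna - gtf)
    if missing:
        examples = ", ".join(missing[:max_examples])
        print(
            f"WARNING: {len(missing)} transcript(s) in the cDNA file are not "
            f"in the GTF file (e.g. {examples}); continuing anyway.",
            file=sys.stderr,
        )
    return cdna & gtf


# Toy example: ENST00000631435 is in the cDNA set but not in the GTF set.
shared = check_consistency(
    ["ENST00000434970.2", "ENST00000631435"],
    ["ENST00000434970"],
)
print(sorted(shared))  # ['ENST00000434970']
```

The warning goes to stderr so that any tabular output on stdout stays clean, and only a few example IDs are printed because the mismatch can involve thousands of transcripts.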

Niklas123Niklas commented on June 24, 2024

Hey there,

Thank you very much for your efforts! I can try your patch tomorrow and send you the results next week.

Cheers.

nunofonseca commented on June 24, 2024

Hi,
The code should now be able to cope with the inconsistencies. Please let me know if it works or not for you.
Cheers.

Niklas123Niklas commented on June 24, 2024

Hey, I tried it today and got the following error:

[DONE] Assembly and quantification
make: *** No rule to make target 'Master2/none/kallisto/transcripts.raw.kallisto.irap.tsv', needed by 'stage3'. Stop.

nunofonseca commented on June 24, 2024

Hi, this should already be fixed in the latest version.
Cheers

Niklas123Niklas commented on June 24, 2024

Oh, thanks.
So far, it works now.

Niklas123Niklas commented on June 24, 2024

Is there a way to filter out lowly expressed genes? I have a lot of genes with zero expression, and if I include them in the DE analysis, edgeR fails. I removed the genes in transcripts.raw.kallisto.tsv whose expression summed over all samples is below 10, and now the DEA continues.
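The manual workaround described here (dropping rows whose counts summed over all samples are below 10) can be sketched with Python's standard `csv` module. The table layout is an assumption: a tab-separated file with a header row of sample names and the gene or transcript ID in the first column. `filter_low_counts` is a hypothetical helper, not part of iRAP.

```python
import csv
import io


def filter_low_counts(tsv_text: str, min_total: float = 10.0) -> str:
    """Keep rows whose counts, summed over all samples, are >= min_total.

    Assumes a header row and the feature ID in the first column.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    for row in reader:
        if not row:  # skip blank lines
            continue
        total = sum(float(x) for x in row[1:])
        if total >= min_total:
            writer.writerow(row)
    return out.getvalue()


# Toy table: A sums to 0, B to 11, C to 7, so only B survives.
table = "id\tS1\tS2\nA\t0\t0\nB\t6\t5\nC\t3\t4\n"
filtered = filter_low_counts(table)
print(filtered, end="")
```

The rows are written back unchanged (no float conversion), so the surviving counts keep their original formatting.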

Niklas123Niklas commented on June 24, 2024

Report generation still fails with the following error:
make: *** No rule to make target 'Master2/report/riq//raw_data/C1_0s1.f.fastqc.tsv', needed by 'Master2/report/fastq_qc_report.tsv'. Stop.

nunofonseca commented on June 24, 2024

You may use the parameter
de_min_count=10
in the configuration file to remove genes with total expression below 10.

Niklas123Niklas commented on June 24, 2024

Hey,

If I use this parameter, I get an error message. Is this a bug?

[INFO] Filtering out genes with low counts (<=5)...
[INFO] Filtering out genes with low counts (<=5)...done.
Error in conds[i] <- label2group[[conds[i]]] : replacement has length zero
Calls: map.conds2cols
Execution halted
/home/niklas/irap_install/scripts/../aux/mk/irap_de.mk:161: recipe for target 'Master/tophat2/htseq2/edger/G0SvsG20S.genes_de.tsv' failed

nunofonseca commented on June 24, 2024

Hi,
Could you rerun the ./scripts/irap_DE_edger script with the --debug option and share the irap_DE_edgeR.Rdata file it generates? This will allow me to reproduce the error.
Thanks
PS: you can get the irap_DE_edger command by rerunning irap as follows:
irap conf=path2your_conf_file [any options that you are passing in the command line] Master/tophat2/htseq2/edger/G0SvsG20S.genes_de.tsv -n

from irap.

Niklas123Niklas commented on June 24, 2024

Hi,
here you go. Thank you very much!

Rdata.zip

It seems like not all values appear in the opt$label2group object.

Niklas123Niklas commented on June 24, 2024

Hey there,
any progress here?
I searched a bit in the R script. The problem is that the opt$label2group variable only contains the values for the selected groups.

"C1_0s1"  "C1_0s2"  "C2_0s1"  "C2_0s2"  "C1_90s1" "C1_90s2" "C2_90s1"
"G0S"     "G0S"     "G0S"     "G0S"     "G90S"    "G90S"    "G90S"

But colnames(data.f) contains:
"C1_0s1" "C1_0s2" "C1_H2O22" "C1_20s1" "C1_90s1" "C1_90s2" "C2_0s1" "C2_0s2" "C2_H2O21" "C2_H2O22" "C2_20s1" "C2_20s2" "C2_90s1"

So either remove the unneeded columns from data.f (which I would not do), or add the missing items to opt$label2group, use them in the design matrix, and take care to set the correct contrast. I think removing the unneeded columns discards information that is helpful for testing. Maybe change this so that all the available information is used?

edit: Further investigation led me to the error: line 195 of irap_de.R reads
data.f <- data[rows.sel,]
but it should be:
data.f <- data.f[rows.sel,]
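Judging from this fix, data.f presumably already held only the columns of the compared groups, and rebuilding it from the full `data` brought back unlabelled samples such as C1_20s1, for which the group lookup then found nothing. The Python sketch below reproduces that pattern with toy dictionaries. It is an analogy to the R code in irap_de.R, not a copy of it: the helper names are hypothetical, the counts are invented, and a Python `KeyError` stands in for R's "replacement has length zero" error.

```python
def subset_columns(table, columns, keep):
    """Restrict every row of `table` to the sample columns listed in `keep`."""
    idx = [columns.index(c) for c in keep]
    return {gene: [vals[i] for i in idx] for gene, vals in table.items()}, list(keep)


def rows_above(table, min_count):
    """Genes whose total count exceeds min_count (the log filters out <= 5)."""
    return [gene for gene, vals in table.items() if sum(vals) > min_count]


def map_conditions(columns, label2group):
    """Map sample names to groups; an unlabelled sample raises KeyError."""
    return [label2group[c] for c in columns]


# Toy stand-ins for the R objects; sample names follow the thread, counts are made up.
label2group = {"C1_0s1": "G0S", "C1_0s2": "G0S", "C1_90s1": "G90S", "C1_90s2": "G90S"}
columns = ["C1_0s1", "C1_0s2", "C1_20s1", "C1_90s1", "C1_90s2"]
data = {"geneA": [10, 12, 8, 3, 4], "geneB": [0, 1, 2, 0, 0]}

# Keep only the samples that belong to the compared groups.
data_f, cols_f = subset_columns(data, columns, [c for c in columns if c in label2group])
rows_sel = rows_above(data_f, 5)

# Buggy version (data.f <- data[rows.sel,]): the rows come back with ALL columns.
buggy = {gene: data[gene] for gene in rows_sel}
# Fixed version (data.f <- data.f[rows.sel,]): the column subset is preserved.
fixed = {gene: data_f[gene] for gene in rows_sel}

print(map_conditions(cols_f, label2group))  # ['G0S', 'G0S', 'G90S', 'G90S']
try:
    map_conditions(columns, label2group)  # the buggy table still carries every column
except KeyError as err:
    print("unlabelled sample:", err)  # unlabelled sample: 'C1_20s1'
```

The point of the fix is that the row filter must be applied to the already column-subset table, so both subsetting steps survive.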

nunofonseca commented on June 24, 2024

Hi, thank you for the Rdata file. I'll look into this issue in the coming days - last week I did not have the time.
Cheers.

nunofonseca commented on June 24, 2024

Hi, indeed the problem was on the line that you mentioned. It should now be fixed in the latest release (0.8.5p5). Many thanks again for the report and the fix ;-)
