nunofonseca commented on June 24, 2024

Hi!
Maybe the cDNA file that you used contains transcript IDs that are not in the GTF? Which cDNA and GTF files are you using?
Cheers.

Niklas123Niklas commented on June 24, 2024

Hi,
thanks for the hint!
Homo_sapiens.GRCh38.cdna.all.fa is the cDNA file and Homo_sapiens.GRCh38.90.gtf is the GTF file, but I don't remember whether I got them from the same source. I'll try to get new ones and retry.

Or can you suggest which files are best to use?

nunofonseca commented on June 24, 2024

It seems that Ensembl recently decided that the transcript IDs in the cDNA file need not exactly match those in the GTF file (e.g., ENST00000434970.2 in one file and ENST00000434970 in the other). This is basically a data issue; anyway, I pushed a change to libTSVAggrTransByGene to detect this issue and try to fix it. Please let me know if it fixes the problem.
Cheers.
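The change to libTSVAggrTransByGene itself is not shown in this thread. As a rough illustration of the idea only, here is a minimal Python sketch that tries the exact ID first and falls back to the unversioned ID when a versioned cDNA ID is not found in the GTF. It assumes Ensembl-style IDs with a `.N` version suffix; `strip_version` and `match_to_gtf` are hypothetical names, not iRAP functions.

```python
import re

# Assumption: Ensembl-style transcript IDs, optionally followed by ".<version>",
# e.g. "ENST00000434970.2" (cDNA FASTA) vs "ENST00000434970" (GTF).
_VERSIONED_ID = re.compile(r"^(ENS[A-Z]*T\d+)\.\d+$")


def strip_version(transcript_id: str) -> str:
    """Drop a trailing ".<digits>" version from an Ensembl transcript ID, if present."""
    match = _VERSIONED_ID.match(transcript_id)
    return match.group(1) if match else transcript_id


def match_to_gtf(cdna_ids, gtf_ids):
    """Map each cDNA transcript ID to its GTF counterpart, or None if there is none.

    The exact ID is tried first; otherwise the unversioned ID is tried.
    """
    gtf_set = set(gtf_ids)
    mapping = {}
    for tid in cdna_ids:
        if tid in gtf_set:
            mapping[tid] = tid
        else:
            base = strip_version(tid)
            mapping[tid] = base if base in gtf_set else None
    return mapping


if __name__ == "__main__":
    print(match_to_gtf(["ENST00000434970.2", "ENST00000631435.1"],
                       ["ENST00000434970"]))
```

Trying the exact ID first keeps the lookup unchanged for annotation pairs whose IDs already agree.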

maximilianh commented on June 24, 2024

nunofonseca commented on June 24, 2024

Yes, it is related.
As I wrote above, I added a workaround for that in the script, but I just realized that the problem pops up at another point in the pipeline. Thank you, Ensembl, for making my day!

nunofonseca commented on June 24, 2024

A quick update on this issue. There is another problem with the latest versions of Ensembl: the cDNA file contains transcripts that are not in the GTF file or, from another perspective, the GTF file is missing transcripts. For instance, in Ensembl v90 there are 16k transcripts in the human cDNA file that are not in the matching GTF (e.g., ENST00000631435). Currently iRAP fails because it checks for consistency between the two files (which no longer seems to exist). To work around this data issue, I'll change the code today or tomorrow to warn about these inconsistencies and carry on instead of exiting.

Cheers.
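iRAP's actual consistency check is not shown in this thread. A minimal Python sketch of the "warn and carry on" behaviour described above might look like this, assuming Ensembl-style FASTA headers (transcript ID as the first token) and GTF lines carrying a `transcript_id "..."` attribute. The function names are hypothetical.

```python
import re
import sys

# Assumption: GTF attributes contain transcript_id "<ID>"; FASTA headers start with the ID.
_TRANSCRIPT_ATTR = re.compile(r'transcript_id "([^"]+)"')


def fasta_ids(lines):
    """Yield the first whitespace-delimited token of every FASTA header line."""
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                yield tokens[0]


def gtf_transcript_ids(lines):
    """Collect the transcript_id attribute values from GTF lines (comments skipped)."""
    ids = set()
    for line in lines:
        if line.startswith("#"):
            continue
        match = _TRANSCRIPT_ATTR.search(line)
        if match:
            ids.add(match.group(1))
    return ids


def report_missing(cdna_lines, gtf_lines):
    """Warn about cDNA transcripts absent from the GTF and return them, without exiting."""
    gtf_ids = gtf_transcript_ids(gtf_lines)
    missing = sorted(
        tid for tid in fasta_ids(cdna_lines)
        # Compare both the full ID and the ID with any ".N" version removed.
        if tid not in gtf_ids and tid.split(".")[0] not in gtf_ids
    )
    if missing:
        print(f"WARNING: {len(missing)} cDNA transcript(s) not found in the GTF "
              f"(e.g. {missing[0]}); continuing", file=sys.stderr)
    return missing
```

Both arguments can be open file handles, so the cDNA FASTA and the GTF are streamed line by line rather than loaded whole.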

Niklas123Niklas commented on June 24, 2024

Hey there,

Thank you very much for your efforts! I can try your patch tomorrow and send you the results next week.

Cheers.

nunofonseca commented on June 24, 2024

Hi,
The code should now be able to cope with the inconsistencies. Please let me know if it works or not for you.
Cheers.

Niklas123Niklas commented on June 24, 2024

Hey, I tried it today and got the following error:

[DONE] Assembly and quantification
make: *** No rule to make target 'Master2/none/kallisto/transcripts.raw.kallisto.irap.tsv', needed by 'stage3'. Stop.

nunofonseca commented on June 24, 2024

Hi, shouldn't this already be fixed in the latest version?
Cheers

Niklas123Niklas commented on June 24, 2024

Oh, thanks.
It works now, at least so far.

Niklas123Niklas commented on June 24, 2024

Is there a way to filter out lowly expressed genes? I have a lot of genes with zero expression, and if I include them in the DE analysis, edgeR fails. I removed the genes whose summed expression across all samples was below 10 from transcripts.raw.kallisto.tsv, and now the DEA continues.

Niklas123Niklas commented on June 24, 2024

Report generation still fails with the following error:
make: *** No rule to make target 'Master2/report/riq//raw_data/C1_0s1.f.fastqc.tsv', needed by 'Master2/report/fastq_qc_report.tsv'. Stop.

nunofonseca commented on June 24, 2024

You may use the parameter
de_min_count=10
in the configuration file to remove genes with total expression below 10.
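How iRAP applies de_min_count internally is not shown here (the log in the next comment prints `<=`, so the exact boundary may differ). The filter described above amounts to something like this minimal Python sketch, assuming a gene-by-sample TSV with a header row of sample names; `read_counts_tsv` and `filter_low_counts` are hypothetical helpers, not part of iRAP.

```python
import csv
import io


def read_counts_tsv(handle):
    """Parse a TSV with a header of sample names and one row per gene (ID first)."""
    reader = csv.reader(handle, delimiter="\t")
    samples = next(reader)[1:]
    counts = {row[0]: [float(x) for x in row[1:]] for row in reader if row}
    return samples, counts


def filter_low_counts(counts, min_count=10):
    """Keep only genes whose total count over all samples is at least min_count."""
    return {gene: row for gene, row in counts.items() if sum(row) >= min_count}


if __name__ == "__main__":
    table = io.StringIO("gene\tS1\tS2\nG1\t0\t0\nG2\t4\t5\nG3\t7\t8\n")
    samples, counts = read_counts_tsv(table)
    # G1 (total 0) and G2 (total 9) fall below 10 and are dropped.
    print(sorted(filter_low_counts(counts, min_count=10)))  # -> ['G3']
```

Filtering on the summed count across samples is the same criterion Niklas applied by hand to transcripts.raw.kallisto.tsv.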

Niklas123Niklas commented on June 24, 2024

Hey,

If I use this parameter, I get an error message. Is this a bug?

[INFO] Filtering out genes with low counts (<=5)...
[INFO] Filtering out genes with low counts (<=5)...done.
Error in conds[i] <- label2group[[conds[i]]] : replacement has length zero
Calls: map.conds2cols
Execution halted
/home/niklas/irap_install/scripts/../aux/mk/irap_de.mk:161: recipe for target 'Master/tophat2/htseq2/edger/G0SvsG20S.genes_de.tsv' failed

nunofonseca commented on June 24, 2024

Hi,
Could you rerun the ./scripts/irap_DE_edger script with the --debug option and share the generated irap_DE_edgeR.Rdata file with me? This will allow me to reproduce the error.
Thanks
PS: you can get the irap_DE_edger command by rerunning irap as follows:
irap conf=path2your_conf_file [any options that you are passing in the command line] Master/tophat2/htseq2/edger/G0SvsG20S.genes_de.tsv -n

Niklas123Niklas commented on June 24, 2024

Hi,
here you go. Thank you very much!

Rdata.zip

It seems like not all values appear in the opt$label2group object.

Niklas123Niklas commented on June 24, 2024

Hey there,
any progress here?
I dug a bit into the R script. The problem is that the opt$label2group variable only contains the values for the selected groups:

"C1_0s1"  "C1_0s2"  "C2_0s1"  "C2_0s2"  "C1_90s1" "C1_90s2" "C2_90s1"
"G0S"     "G0S"     "G0S"     "G0S"     "G90S"    "G90S"    "G90S"

But colnames(data.f) contains:
"C1_0s1" "C1_0s2" "C1_H2O22" "C1_20s1" "C1_90s1" "C1_90s2" "C2_0s1" "C2_0s2" "C2_H2O21" "C2_H2O22" "C2_20s1" "C2_20s2" "C2_90s1"

So either remove the unneeded data.f columns (which I would not do), or add the missing items to opt$label2group, use them in the design matrix, and make sure the contrast is correct. If you remove the unneeded columns, you discard information that is helpful for testing. Maybe change this so that all the available information is used?

Edit: further investigation led me to the error. Line 195 of irap_de.R says
data.f <- data[rows.sel,]
but it should be:
data.f <- data.f[rows.sel,]
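Why that line breaks the group lookup: data.f was presumably already restricted to the samples in opt$label2group, so taking rows from data instead of data.f brings the other samples back, and the lookup in map.conds2cols then finds nothing for them (consistent with the "replacement has length zero" error). The small Python model below replays that sequence with plain dicts standing in for R data frames; it is not the iRAP code, and the helper names are hypothetical.

```python
# Python stand-in for the R code: `data` is a gene -> {sample: count} table.
def select_columns(data, samples):
    """Keep only the given sample columns (like data[, samples] in R)."""
    return {gene: {s: row[s] for s in samples} for gene, row in data.items()}


def select_rows(data, genes):
    """Keep only the given gene rows (like data[genes, ] in R)."""
    return {gene: data[gene] for gene in genes}


def conds_to_groups(columns, label2group):
    """Look up each column's group, failing with the list of unmapped columns."""
    missing = [c for c in columns if c not in label2group]
    if missing:
        raise KeyError(f"no group for: {', '.join(missing)}")
    return [label2group[c] for c in columns]


data = {
    "g1": {"C1_0s1": 5, "C1_20s1": 3, "C1_90s1": 7},
    "g2": {"C1_0s1": 0, "C1_20s1": 1, "C1_90s1": 0},
}
label2group = {"C1_0s1": "G0S", "C1_90s1": "G90S"}  # only the compared groups

data_f = select_columns(data, list(label2group))
rows_sel = ["g1"]

buggy = select_rows(data, rows_sel)    # mirrors data.f <- data[rows.sel,]
fixed = select_rows(data_f, rows_sel)  # mirrors data.f <- data.f[rows.sel,]

print(conds_to_groups(list(fixed["g1"]), label2group))  # ['G0S', 'G90S']
# conds_to_groups(list(buggy["g1"]), label2group) raises KeyError for C1_20s1
```

Raising with the list of unmapped columns, rather than failing on the first empty lookup, makes this kind of mismatch easier to spot.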

nunofonseca commented on June 24, 2024

Hi, thank you for the Rdata file. I'll look into this issue in the coming days; last week I did not have the time.
Cheers.

nunofonseca commented on June 24, 2024

Hi, indeed the problem was in the line you mentioned. It should now be fixed in the latest release (0.8.5p5). Many thanks again for the report and the fix ;-)
