
Comments (6)

lwratten commented on August 21, 2024

After some brainstorming, I have come up with a few possible ways to do this:

  1. Have an --align_transcriptome flag or similar; when it is activated, assume all reference fasta files are transcriptomes

  2. Allow a transcriptome to be specified in the samplesheet by entering `transcriptome path/to/ref.fa` in the genome column, i.e.

sample,fastq,barcode,genome
K562_RUN1_REP3,,1,transcriptome path/to/ref.fa
HEPG2_RUN3_REP5,,2,path/genome.fa
CALIBRATION_RUN,,3,

In this case we could parse the genome column to work out which references are transcriptomes and which are genomes.
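A minimal sketch of how option 2 could be parsed, assuming the keyword and the path are separated by a single space as in the example above. The names `Reference` and `parse_genome_field` are hypothetical and not part of the pipeline.

```python
from typing import NamedTuple, Optional


class Reference(NamedTuple):
    """Hypothetical container for one parsed genome-column entry."""
    path: Optional[str]
    is_transcriptome: bool


def parse_genome_field(field: str) -> Reference:
    """Split an optional 'transcriptome ' keyword from the reference path."""
    value = field.strip()
    if not value:
        # Empty entry, e.g. the CALIBRATION_RUN row: no reference given.
        return Reference(None, False)
    prefix = "transcriptome "
    if value.startswith(prefix):
        # Everything after the keyword is treated as the path.
        return Reference(value[len(prefix):].strip(), True)
    # No keyword: assume the entry is a genome.
    return Reference(value, False)
```

For the rows above, `parse_genome_field("transcriptome path/to/ref.fa")` would mark the reference as a transcriptome, while `parse_genome_field("path/genome.fa")` would treat it as a genome.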

  3. Have a 5th column in the samplesheet for gtf/transcriptome(fa)

sample,fastq,barcode,genome,transcript
K562_RUN1_REP3,,1,,path/to/ref.fa
HEPG2_RUN3_REP5,,2,path/genome.fa,path/to/annot.gtf
CALIBRATION_RUN,,3,,

In this case the 5th column would be optional. The logic would be as follows:

if 5th col exists:
     if gtf:
          check the genome file exists
          if minimap2:
               convert to bed12
               perform transcript aware genome alignment using `--junc-bed` flag
          if graphmap2:
               perform transcript aware genome alignment using `--gtf` flag
     else if fa:
          perform transcriptome alignment
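The logic above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function name `plan_alignment` is hypothetical, file types are guessed from the extension, "check the genome file exists" is reduced to checking that a genome entry was given, and falling back to plain genome alignment when the 5th column is empty is my assumption rather than something stated above.

```python
from typing import List


def plan_alignment(genome: str, transcript: str, aligner: str) -> List[str]:
    """Return the planned steps for one samplesheet row (hypothetical helper)."""
    genome, transcript = genome.strip(), transcript.strip()
    if not transcript:
        # Assumption: with no 5th-column entry, align to the genome if one is given.
        return ["genome alignment"] if genome else []
    lower = transcript.lower()
    if lower.endswith(".gtf"):
        if not genome:
            raise ValueError("a gtf annotation requires a genome fasta")
        if aligner == "minimap2":
            return ["convert gtf to bed12", "genome alignment with --junc-bed"]
        if aligner == "graphmap2":
            return ["genome alignment with --gtf"]
        raise ValueError(f"unsupported aligner: {aligner}")
    if lower.endswith((".fa", ".fasta")):
        return ["transcriptome alignment"]
    raise ValueError(f"unrecognised transcript file type: {transcript}")
```

For the HEPG2_RUN3_REP5 row with minimap2 this would plan a bed12 conversion followed by a junction-aware genome alignment, and for the K562_RUN1_REP3 row it would plan a transcriptome alignment.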

Keen to get an agreement on how we should implement this as well as #31 so we can start developing.

from nanoseq.

drpatelh commented on August 21, 2024

This is a tricky one, especially since the samples in the samplesheet could each have a different genome/transcriptome.

I think we should just have an additional transcriptome entry in the samplesheet. This could either be a fasta transcriptome or a gtf file which we can use to extract the transcripts from the genome fasta file. That's the most flexible option.

  • If genome is present and transcriptome is not, map to the genome. If it's an iGenomes reference then we get the gtf automatically and generate the transcriptome on the fly.
  • If genome and transcriptome are both present then use transcriptome, but we will have to make sure transcriptome is a fasta and not a gtf.
  • If genome isn't present and transcriptome is, then use transcriptome.
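One possible reading of these three rules, sketched in Python. The function name `resolve_reference` is hypothetical, the fasta check relies on the file extension only, and the iGenomes lookup and the extraction of transcripts from a genome with a gtf are deliberately left out, so a gtf in the transcriptome column is simply rejected here.

```python
from typing import Optional, Tuple


def resolve_reference(genome: str, transcriptome: str) -> Optional[Tuple[str, str]]:
    """Pick the mapping target for one sample as (kind, path), or None."""
    genome = genome.strip()
    transcriptome = transcriptome.strip()
    if transcriptome:
        # Rules 2 and 3: transcriptome wins whenever it is present,
        # but in this sketch it must be a fasta rather than a gtf.
        if not transcriptome.lower().endswith((".fa", ".fasta")):
            raise ValueError(f"transcriptome must be a fasta file: {transcriptome}")
        return ("transcriptome", transcriptome)
    if genome:
        # Rule 1: only the genome is present, so map to it.
        return ("genome", genome)
    # Neither entry is present (assumption: no alignment for this sample).
    return None
```

Validating by extension keeps the sketch short; a real check would probably need to inspect the file contents or handle compressed files.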

Working out and validating whether we need to use gtf or fasta for transcriptome will involve quite a bit of refactoring, I suspect.

How does that sound?


lwratten commented on August 21, 2024

I think that sounds good!
I also feel like it's gonna require a lot of refactoring but it will give a lot of flexibility and functionality to our pipeline that will be worth it.
Especially for downstream steps like nanopolish etc. where transcriptome alignment is required.


drpatelh commented on August 21, 2024

Fixed in #46


drpatelh commented on August 21, 2024

Some things left to do:

  • There may still be some bugs in the logic, so it will need extensive testing with different entries for genome and transcriptome, and with the different --skip flags, to see if the channels are all defined properly.
  • Add detailed documentation.


drpatelh commented on August 21, 2024

Additional tests have been added to GitHub Actions to cater for the testing. Extensive documentation was also added in #57.

