nf-resequencing-mem's People

Contributors

bunop, lazzarib

nf-resequencing-mem's Issues

:bug: check_samplesheet can't detect header in certain conditions

check_samplesheet.py fails to detect the header in certain cases, even when it is provided correctly. This could be due to the csv.Sniffer.has_header method, which is known to produce both false positives and false negatives. For example:

sample,fastq_1,fastq_2
200-1-5,/home/ngs/freeclimb_resequencing/IGA_2022-01/delivery_20220110/raw_sequences/1_ID2101_200-1-5-H9H05KWZ-H1_S1_L001_R1_001.fastq.gz,/home/ngs/freeclimb_resequencing/IGA_2022-01/delivery_20220110/raw_sequences/1_ID2101_200-1-5-H9H05KWZ-H1_S1_L001_R2_001.fastq.gz
201-1-9,/home/ngs/freeclimb_resequencing/IGA_2022-01/delivery_20220110/raw_sequences/1_ID2101_201-1-9-H9H05KWZ-A2_S1_L001_R1_001.fastq.gz,/home/ngs/freeclimb_resequencing/IGA_2022-01/delivery_20220110/raw_sequences/1_ID2101_201-1-9-H9H05KWZ-A2_S1_L001_R2_001.fastq.gz
202-1-10,/home/ngs/freeclimb_resequencing/IGA_2022-01/delivery_20220110/raw_sequences/1_ID2101_202-1-10-H9H05KWZ-B2_S1_L001_R1_001.fastq.gz,/home/ngs/freeclimb_resequencing/IGA_2022-01/delivery_20220110/raw_sequences/1_ID2101_202-1-10-H9H05KWZ-B2_S1_L001_R2_001.fastq.gz

The header of this samplesheet is not detected.
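
One way to avoid the heuristic is to compare the first row against the expected column names directly. The following is a minimal sketch, assuming the three-column layout shown above (sample, fastq_1, fastq_2). The helper name has_expected_header is hypothetical and is not part of check_samplesheet.py.

```python
import csv
import io

# Column names taken from the samplesheet example above.
EXPECTED_HEADER = ["sample", "fastq_1", "fastq_2"]


def has_expected_header(handle, expected=EXPECTED_HEADER):
    """Return True if the first CSV row matches the expected column names.

    Hypothetical helper: it checks the header explicitly instead of
    relying on the csv.Sniffer.has_header heuristic.
    """
    reader = csv.reader(handle)
    first_row = next(reader, None)
    handle.seek(0)  # rewind so the caller can parse the file from the start
    if first_row is None:
        return False
    return [field.strip() for field in first_row] == list(expected)


if __name__ == "__main__":
    sheet = io.StringIO(
        "sample,fastq_1,fastq_2\n"
        "200-1-5,/path/to/R1.fastq.gz,/path/to/R2.fastq.gz\n"
    )
    print(has_expected_header(sheet))
```

An explicit comparison trades flexibility for predictability: a samplesheet with different column names is rejected instead of being guessed at.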

:zap: replace `picard/markduplicates` with `samtools/markdup`

According to the samtools/markdup documentation, additional steps are required before calling markdup. These steps are performed by the bam_markduplicates_samtools nf-core subworkflow; however, they must run before the sort step (which is done during alignment).

  • install samtools/markdup module
  • patch samtools/markdup module to produce reports
  • fix and rename the cram_markduplicates_picard local subworkflow
  • install bam_markduplicates_samtools nf-core workflow
  • remove picard/markduplicates modules and configuration
  • check samtools/markdup with MultiQC config
  • test --save_cram option
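
For reference, the order of operations described in the samtools markdup documentation (collate, fixmate -m, sort, markdup) can be sketched as below. This only illustrates why fixmate has to run before the coordinate sort; the pipeline itself would use nf-core modules rather than raw commands. The function name markdup_commands and the intermediate file names are hypothetical.

```python
import shlex


def markdup_commands(input_bam, prefix):
    """Return the samtools commands for duplicate marking, in order.

    The intermediate file names derived from `prefix` are illustrative.
    """
    collated = f"{prefix}.collate.bam"
    fixmate = f"{prefix}.fixmate.bam"
    sorted_bam = f"{prefix}.sort.bam"
    marked = f"{prefix}.markdup.bam"
    return [
        # group reads by name so that mates are adjacent
        ["samtools", "collate", "-o", collated, input_bam],
        # add the mate score tags (-m) that markdup relies on
        ["samtools", "fixmate", "-m", collated, fixmate],
        # markdup expects coordinate-sorted input
        ["samtools", "sort", "-o", sorted_bam, fixmate],
        ["samtools", "markdup", sorted_bam, marked],
    ]


if __name__ == "__main__":
    # Print the commands instead of running them.
    for command in markdup_commands("sample.bam", "sample"):
        print(shlex.join(command))
```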

:sparkles: deal with CRAM alignments

Use samtools view to convert CRAM into BAM; convert BAM -> CRAM with the SAMTOOLS/CRAM module:

  • call freebayes using CRAM files
  • check alignments with samtools view
  • extract info from CRAM
  • replace BAM with CRAM
  • deal with region overlap

Test in PBS environment

Test this pipeline in a PBS environment

  • install singularity
  • test with conda using PBS
  • test with singularity using PBS

:boom: launch freebayes on all samples

  • move freebayes/single to freebayes/multi
  • try to split by chromosomes and calculate freebayes on each chromosome (with all samples)
  • chunk chromosomes as described in freebayes parallelization
  • collect all data in a unique VCF file
  • accept bwa indexes and genome indexes as parameters
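
The chunking step in the list above can be sketched as follows. This is a minimal illustration, not the freebayes helper script: the function names are hypothetical, chromosome lengths are assumed to come from a samtools faidx index (.fai), and regions are returned as 0-based, half-open tuples so that the caller can format them for whichever tool consumes them.

```python
def chunk_chromosomes(lengths, chunk_size):
    """Split each chromosome into consecutive regions of at most chunk_size bases.

    Returns (chrom, start, end) tuples using 0-based, half-open coordinates.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    regions = []
    for chrom, length in lengths.items():
        for start in range(0, length, chunk_size):
            regions.append((chrom, start, min(start + chunk_size, length)))
    return regions


def read_fai(path):
    """Read chromosome lengths from a samtools faidx index (.fai) file."""
    lengths = {}
    with open(path) as handle:
        for line in handle:
            # The first two tab-separated columns are name and length.
            name, length = line.split("\t")[:2]
            lengths[name] = int(length)
    return lengths


if __name__ == "__main__":
    # Toy lengths, for illustration only.
    for region in chunk_chromosomes({"Chr01": 250, "Chr02": 90}, 100):
        print(region)
```

Each region can then be called independently on all samples, and the per-region results concatenated in order into a single VCF file.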

:bug: update pipeline to support latest nf-core/tools

nf-core/tools v2.7.1 can no longer lint the pipeline: some changes could affect the CI tests. The pipeline needs to be updated, and the lint test should pass without warnings.

Additional information

The pipeline name causes an issue: .nf-core.yml should be changed like this:

lint:
  actions_awsfulltest: False
  pipeline_todos: False
  files_exist: False
  nextflow_config:
    - manifest.name
  files_unchanged:
    - .gitignore
    - .github/workflows/linting.yml
  actions_ci: False
  merge_markers: False
  multiqc_config: False
repository_type: pipeline

v2.7.1 has issues parsing this file; the dev branch of nf-core/tools parses this config file correctly.

:sparkles: accept input data as samples data file

This pipeline currently cannot deal with single-end libraries. Moreover, it is difficult to manage single-end and paired-end samples in the same run. Nextflow pipelines now commonly accept input data from a samplesheet file, in which single-end and paired-end files are clearly stated and managed properly. Adapt this pipeline to accept data in this form.
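
As an illustration of how a samplesheet can state the library layout, the sketch below reads a CSV with sample, fastq_1 and fastq_2 columns and flags each row as single-end or paired-end. It assumes that an empty fastq_2 value marks a single-end sample; that convention and the function name read_samples are assumptions, not something this pipeline defines.

```python
import csv
import io


def read_samples(handle):
    """Yield one dict per samplesheet row, flagging single-end libraries.

    Assumption: an empty or missing fastq_2 value marks a single-end sample.
    """
    for row in csv.DictReader(handle):
        fastq_2 = (row.get("fastq_2") or "").strip()
        yield {
            "sample": row["sample"].strip(),
            "fastq_1": row["fastq_1"].strip(),
            "fastq_2": fastq_2 or None,
            "single_end": not fastq_2,
        }


if __name__ == "__main__":
    # Hypothetical sample names and file names.
    sheet = io.StringIO(
        "sample,fastq_1,fastq_2\n"
        "paired,reads_R1.fastq.gz,reads_R2.fastq.gz\n"
        "single,reads_SE.fastq.gz,\n"
    )
    for sample in read_samples(sheet):
        print(sample["sample"], sample["single_end"])
```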

:bug: markduplicates has issues with cram files

By visually inspecting the alignments (using samtools tview), some regions appear to have bad alignments: the reason seems to be that MarkDuplicates changes the sequence in the aligned file. For example, before MarkDuplicates we have:

samtools view WT.cram Chr01:60000-60001 --reference Pvulgaris_442_v2.0.fa | head -n1
A01083:294:H3LHWDSXC:2:2553:27751:35556 147     Chr01   59859   60      150M    =       59547   -462    CGCCGCGTCTTTTAAGAAAATAGCGGGAGAAGAAACTTCGATTTTCAATAACAATGAAGGTAAATTAAATTGATAAATTTTATATTCAATTGATAGCAATAAATCACGCAAATATGTAAATTGAAATATTTATTTTAAAGTTTCGATAAC FFF:,FFF:F:FFF,FFFFFF:,F,,,:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFF:FFFFFFFFFFFFFFFFFFF:FFF:FFFFFFFFFFFFFFFFFFFF:FF:FFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFMC:Z:150M       AS:i:145        XS:i:20 MD:Z:24T125     NM:i:1  RG:Z:WT

And after MarkDuplicates we have:

samtools view WT.md.cram Chr01:60000-60001 --reference Pvulgaris_442_v2.0.fa | head -n1
A01083:294:H3LHWDSXC:2:2553:27751:35556 147     Chr01   59859   60      150M    =       59547   -462    NCNNCNCGNGGGGCCCCCCCGCCNCCCCCCCCCCCNGGNCCGGGGNCCGCCNCCGCCCCCGCCCGGCCCGGCCGCCCGGGGCGCGGNCCGGCCGCCNCCGCCCGNCNCNCCCGCGCGCCCGGCCCCGCGGGCGGGGCCCCGGGNCCGCCN FFF:,FFF:F:FFF,FFFFFF:,F,,,:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFF:FFFFFFFFFFFFFFFFFFF:FFF:FFFFFFFFFFFFFFFFFFFF:FF:FFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFAS:i:145        MC:Z:150M       PG:Z:MarkDuplicates     XS:i:20 MD:Z:0C0G0C0C0G0C0G0T0C0T0T0T0T0A0A0G0A0A0A0A0T0A0G0C0T0G0G0A0G0A0A0G0A0A0A0C0T0T0C0G0A0T0T0T0T0C0A0A0T0A0A0C0A0A0T0G0A0A0G0G0T0A0A0A0T0T0A0A0A0T0T0G0A0T0A0A0A0T0T0T0T0A0T0A0T0T0C0A0A0T0T0G0A0T0A0G0C0A0A0T0A0A0A0T0C0A0C0G0C0A0A0A0T0A0T0G0T0A0A0A0T0T0G0A0A0A0T0A0T0T0T0A0T0T0T0T0A0A0A0G0T0T0T0C0G0A0T0A0A0C0    NM:i:150        RG:Z:WT

The number of matches reported in the CIGAR string (150M) is identical; however, MarkDuplicates adds 150 mismatches (NM:i:150), and the changed sequence in column 10 is the one displayed by samtools tview. This behaviour does not affect all genome regions. It is not clear how this affects the calling process. MarkDuplicates should be removed as described in #71.
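
The mismatch counts can be checked directly from the MD tags of the two records. The sketch below counts the mismatched bases encoded in an MD value; it assumes the standard MD layout (match counts alternating with mismatched reference bases, deletions prefixed with "^"). The function name count_md_mismatches is hypothetical.

```python
import re

# An MD value alternates match counts with mismatched reference bases;
# deleted reference bases are prefixed with "^".
MD_TOKEN = re.compile(r"(\d+)|(\^[A-Za-z]+)|([A-Za-z])")


def count_md_mismatches(md):
    """Count mismatched bases in an MD tag value (without the 'MD:Z:' prefix)."""
    mismatches = 0
    for _number, _deletion, base in MD_TOKEN.findall(md):
        if base:  # a single reference base outside a deletion is a mismatch
            mismatches += 1
    return mismatches


if __name__ == "__main__":
    # MD value of the read before MarkDuplicates, copied from the record above.
    print(count_md_mismatches("24T125"))
```

Applied to the MD value of the record after MarkDuplicates, where every match count is 0, the same function counts one mismatch per base of the read.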

:boom: add vcfallelicprimitives step

Add a vcfallelicprimitives step before the normalization with bcftools. Work on each chromosome independently to optimize resources.

:arrow_up: upgrade pipeline modules

Try to update the modules to their latest versions:

$ nf-core modules lint 

Should return 0 warnings

Upgrade modules:

  • picard/markduplicates
  • bwa/mem
  • samtools/flagstat
  • samtools/sort
  • bwa/index
  • bwa/mem
  • fastqc
  • multiqc
  • picard/markduplicates
  • samtools/flagstat
  • trimgalore
