Git Product home page Git Product logo

Comments (5)

goldenflaw avatar goldenflaw commented on August 17, 2024

I guess sam2bed.pl doesn't support hard clipping, what do you think is the reasonable behavior? Throw away the reads or simply remove the hard clipped part?

from leafcutter.

davidaknowles avatar davidaknowles commented on August 17, 2024

So I'm used to seeing soft clipping: what does hard clipping represent here?

from leafcutter.

vasiliosz avatar vasiliosz commented on August 17, 2024

I'm not sure actually, but I think it is only done for some (all?) chimeric alignments - and that this is according to SAM/BAM specification. Though I can't find the description in the specs.

In STAR, it's handled like this:
https://github.com/alexdobin/STAR/blob/41b5d74bcaed5e04fdf3c8c07a81311feb8cb837/source/ReadAlign_alignBAM.cpp#L129-L132

    //alignType>=0: unmapped reads
    //          -1: normal mapped reads
    //          -11: chimeric alignment, chimeric junction on the left
    //          -12: chimeric alignment, chimeric junction on the right

And then clipping here:
https://github.com/alexdobin/STAR/blob/41b5d74bcaed5e04fdf3c8c07a81311feb8cb837/source/ReadAlign_alignBAM.cpp#L388-L396

        uint seqMateLength=readLengthOriginal[Mate];
        if (alignType==-11) {//hard-clip on the left
            seqMateLength-=trimL1;
            seqOut+=trimL1;
            qualOut+=trimL1;
        } else if (alignType==-12) {
            seqMateLength-=trimR1;
        } else {//no-chimeric alignment
        };

Normally, this is not output with STAR - but can be turned on like so (which I do for other purposes):

--chimSegmentMin 15 \
--chimJunctionOverhangMin 15 \
--chimOutType WithinBAM \

Then it can look like this:

HISEQ:56:D1W4DACXX:2:2301:9101:92922    2193    chr1    629704  255     25H25M  =       629794  0       TAAGCTCGCACTGATTTTTTACCTG       IJJIIGJJIHJIHHHHHFFDDFC@B       NH:i:1  HI:i:2  AS:i:24 NM:i:0  MD:Z:25   SA:Z:chr1,629871,-,25M25S,255,0;
HISEQ:56:D1W4DACXX:2:2301:9101:92922    99      chr1    629794  255     50M     =       629871  102     CCACAGAAGCTGCCATCAAGTATTTCCTCACGCAAGCAACCGCATCCATA      C@CFFFFFHHHHHJJJJJIJFHIJJJJJJJIJJJIIJJJJJJJJJJJJII      NH:i:1  HI:i:2  AS:i:7NM:i:0  MD:Z:50
HISEQ:56:D1W4DACXX:2:2301:9101:92922    147     chr1    629871  255     25M25S  =       629794  -102    AATATACTCTCCGGACAATGAACCATAAGCTCGCACTGATTTTTTACCTG      JIIHC=IHGDJIIIGIJJJIHDJIJIJJIIGJJIHJIHHHHHFFDDFC@B      NH:i:1  HI:i:2  AS:i:7NM:i:0  MD:Z:25    SA:Z:chr1,629704,-,25H25M,255,0;

These reads should all be flagged with 0x800. I guess they can be skipped for the purpose of leafcutter? Otherwise I need to regenerate all the BAM-files without the flag. Or be able to stream the bam file through samtools first like:

samtools view -b -F 0x800 my.bam | bam2junc.sh ... 

from leafcutter.

goldenflaw avatar goldenflaw commented on August 17, 2024

Thanks Vasilios, I think the best option for now is to use samtools view -b
-F 0x800 my.bam.

On Wed, May 25, 2016 at 11:38 PM, Vasilios Zachariadis <
[email protected]> wrote:

I'm not sure actually, but I think it is only done for some (all?)
chimeric alignments - and that this is according to SAM/BAM specification.
Though I can't find the description in the specs
https://github.com/samtools/hts-specs.

https://github.com/alexdobin/STAR/blob/41b5d74bcaed5e04fdf3c8c07a81311feb8cb837/source/ReadAlign_alignBAM.cpp#L129-L132

//alignType>=0: unmapped reads
//          -1: normal mapped reads
//          -11: chimeric alignment, chimeric junction on the left
//          -12: chimeric alignment, chimeric junction on the right

And then clipping here:

https://github.com/alexdobin/STAR/blob/41b5d74bcaed5e04fdf3c8c07a81311feb8cb837/source/ReadAlign_alignBAM.cpp#L388-L396

    uint seqMateLength=readLengthOriginal[Mate];
    if (alignType==-11) {//hard-clip on the left
        seqMateLength-=trimL1;
        seqOut+=trimL1;
        qualOut+=trimL1;
    } else if (alignType==-12) {
        seqMateLength-=trimR1;
    } else {//no-chimeric alignment
    };

Normally, this is not output with STAR - but can be turned on like so
(which I do for other purposes):

--chimSegmentMin 15
--chimJunctionOverhangMin 15
--chimOutType WithinBAM \

Then it can look like this:

HISEQ:56:D1W4DACXX:2:2301:9101:92922 2193 chr1 629704 255 25H25M = 629794 0 TAAGCTCGCACTGATTTTTTACCTG IJJIIGJJIHJIHHHHHFFDDFC@B NH:i:1 HI:i:2 AS:i:24 NM:i:0 MD:Z:25 SA:Z:chr1,629871,-,25M25S,255,0;
HISEQ:56:D1W4DACXX:2:2301:9101:92922 99 chr1 629794 255 50M = 629871 102 CCACAGAAGCTGCCATCAAGTATTTCCTCACGCAAGCAACCGCATCCATA C@CFFFFFHHHHHJJJJJIJFHIJJJJJJJIJJJIIJJJJJJJJJJJJII NH:i:1 HI:i:2 AS:i:7NM:i:0 MD:Z:50
HISEQ:56:D1W4DACXX:2:2301:9101:92922 147 chr1 629871 255 25M25S = 629794 -102 AATATACTCTCCGGACAATGAACCATAAGCTCGCACTGATTTTTTACCTG JIIHC=IHGDJIIIGIJJJIHDJIJIJJIIGJJIHJIHHHHHFFDDFC@B NH:i:1 HI:i:2 AS:i:7NM:i:0 MD:Z:25 SA:Z:chr1,629704,-,25H25M,255,0;

These reads should all be flagged with 0x800. I guess they can be skipped
for the purpose of leafcutter? Otherwise I need to regenerate all the
BAM-files without the flag. Or stream the bam file through samtools first
like:

samtools view -b -F 0x800 my.bam | bam2junc.sh ...


You are receiving this because you commented.
Reply to this email directly or view it on GitHub
#7 (comment)

from leafcutter.

vasiliosz avatar vasiliosz commented on August 17, 2024

Thanks.

from leafcutter.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.