Git Product home page Git Product logo

Comments (8)

nh13 avatar nh13 commented on June 17, 2024

This is a tricky one indeed!

Using Explain Flags we see:

  1. the first read is paired, the first in the pair, and has its mate unmapped.
  2. the second read is paired, the second in the pair, and is unmapped.

The confusion lies in "what is the alignment start and end of each read", in particular the second (unmapped) read. Since the second read is unmapped, its end is defined as its alignment start. See BAMFileReader.java:

        private IntervalComparison compareIntervalToRecord(final QueryInterval interval, final SAMRecord record) {
            // interval.end <= 0 implies the end of the reference sequence.
            final int intervalEnd = (interval.end <= 0? Integer.MAX_VALUE: interval.end);
            final int alignmentEnd;
            if (record.getReadUnmappedFlag() && record.getAlignmentStart() != SAMRecord.NO_ALIGNMENT_START) {
                // Unmapped read with coordinate of mate.
                alignmentEnd = record.getAlignmentStart();
            } else {
                alignmentEnd = record.getAlignmentEnd();
            }

Thus, the alignment start AND end are 11966, which lies out of the interval 1: 11967-11967. Therefore you get only one read!

from htsjdk.

chilinglam avatar chilinglam commented on June 17, 2024

Hi @nh13, Thank you for the insight! Also thank you for showing me the tool that explains the flag! It is very helpful.

(I updated this because I made a mistake in the flag!)

You are right from the logic of the code, the end will be equal to the start for the second read. However,
I'm little concerned because samtools returns both reads in both queries. It means that the results of the two tools are not compatible. Just wonder if samtools could be wrong in this case?

from htsjdk.

nh13 avatar nh13 commented on June 17, 2024

See the SAM spec, in particular the unmapped flag bit:

Bit 0x4 is the only reliable place to tell whether the read is unmapped. If 0x4 is set, no
assumptions can be made about RNAME, POS, CIGAR, MAPQ, bits 0x2, 0x10, 0x100 and
0x800, and the bit 0x20 of the previous read in the template.

It is only by convention that we can use RNAM and POS to position the unmapped mate next to its mapped mate.

Nonetheless, there is very little coordination between samtools and htsjdk to support common APIs, so perhaps I don't share as much of your concern. Thank-you though for the bug reports and questions.

from htsjdk.

chilinglam avatar chilinglam commented on June 17, 2024

Thank you for sharing your knowledge. My main concern is about which result (from samtools or htsjdk) is correct, not related to the APIs.

From what you described, apparently the interpretation of the spec about is different between samtools and htsjdk regarding the unmapped read.

Thank you, I learned something interesting :)

from htsjdk.

chilinglam avatar chilinglam commented on June 17, 2024

Hi @nh13, I thought about it again. Why would the query,

QueryInterval query=new QueryInterval(reader.getFileHeader().getSequenceIndex(1), 11966,11966));

, returns the both the first and second reads if the second read is unmapped?

I think users want a consistent result from the query. That is:

QueryInterval query=new QueryInterval(reader.getFileHeader().getSequenceIndex(1), 11966,11967));

includes all reads from:

QueryInterval query=new QueryInterval(reader.getFileHeader().getSequenceIndex(1), 11966,11966));

Can you give me a reason why it is fine that the first query doesn't include all reads from the second query in this case?

Thank you!

from htsjdk.

yfarjoun avatar yfarjoun commented on June 17, 2024

That wasn't the original query you posted.

you had originally posted

QueryInterval query=new
QueryInterval(reader.getFileHeader().getSequenceIndex(1),
11967,11967));

as the second query.

Yossi.

On Wed, Jul 30, 2014 at 12:54 PM, chilinglam [email protected]
wrote:

Hi @nh13 https://github.com/nh13, I thought about it again. Why would
the query,

QueryInterval query=new QueryInterval(reader.getFileHeader().getSequenceIndex(1), 11966,11966));

, returns the both the first and second reads if the second read is
unmapped?

I think users want a consistent result from the query. That is:

QueryInterval query=new QueryInterval(reader.getFileHeader().getSequenceIndex(1), 11966,11967));

includes all reads from:

QueryInterval query=new QueryInterval(reader.getFileHeader().getSequenceIndex(1), 11966,11966));

Can you give me a reason why it is fine that the first query doesn't
include all reads from the second query in this case?

Thank you!


Reply to this email directly or view it on GitHub
#76 (comment).

from htsjdk.

chilinglam avatar chilinglam commented on June 17, 2024

Hi @yfarjoun, you are right. It is not the same query I posted earlier. I'm asking a question related to this issue. From a user perspective, reads spanned in 1:11966-11967 should include all reads in 1:11966-11966. Don't you agree? If not, I need a reason because my users will also ask me the same question. Thank you!

from htsjdk.

nh13 avatar nh13 commented on June 17, 2024

I pushed a branch for us to work on unit tests for this: https://github.com/samtools/htsjdk/tree/nh_issue_76

I believe the behavior is as you described and want:
1:11966-11966 -> 2 records
1:11966-11967 -> 2 records
1:11967-11967 -> 1 record**

** due to the unmapped read having start and end 11966.

from htsjdk.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.