Hi developers, I need your help again! I got an issue when I try to

This is a tricky one indeed! Using <a href="http://picard.sourceforg

Hi <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="

I pushed a branch for us to work on unit tests for this: <a href="https://github.com/s

Potential Bug: Missing reads when querying using Reader API about htsjdk HOT 8 CLOSED

samtools commented on June 17, 2024

Potential Bug: Missing reads when querying using Reader API

from htsjdk.

Comments (8)

nh13 commented on June 17, 2024

This is a tricky one indeed!

Using Explain Flags we see:

the first read is paired, the first in the pair, and has its mate unmapped.
the second read is paired, the second in the pair, and is unmapped.

The confusion lies in "what is the alignment start and end of each read", in particular the second (unmapped) read. Since the second read is unmapped, its end is defined as its alignment start. See BAMFileReader.java:

        private IntervalComparison compareIntervalToRecord(final QueryInterval interval, final SAMRecord record) {
            // interval.end <= 0 implies the end of the reference sequence.
            final int intervalEnd = (interval.end <= 0? Integer.MAX_VALUE: interval.end);
            final int alignmentEnd;
            if (record.getReadUnmappedFlag() && record.getAlignmentStart() != SAMRecord.NO_ALIGNMENT_START) {
                // Unmapped read with coordinate of mate.
                alignmentEnd = record.getAlignmentStart();
            } else {
                alignmentEnd = record.getAlignmentEnd();
            }

Thus, the alignment start AND end are 11966, which lies out of the interval 1: 11967-11967. Therefore you get only one read!

from htsjdk.

chilinglam commented on June 17, 2024

Hi @nh13, Thank you for the insight! Also thank you for showing me the tool that explains the flag! It is very helpful.

(I updated this because I made a mistake in the flag!)

You are right from the logic of the code, the end will be equal to the start for the second read. However,
I'm little concerned because samtools returns both reads in both queries. It means that the results of the two tools are not compatible. Just wonder if samtools could be wrong in this case?

from htsjdk.

nh13 commented on June 17, 2024

See the SAM spec, in particular the unmapped flag bit:

Bit 0x4 is the only reliable place to tell whether the read is unmapped. If 0x4 is set, no
assumptions can be made about RNAME, POS, CIGAR, MAPQ, bits 0x2, 0x10, 0x100 and
0x800, and the bit 0x20 of the previous read in the template.

It is only by convention that we can use RNAM and POS to position the unmapped mate next to its mapped mate.

Nonetheless, there is very little coordination between samtools and htsjdk to support common APIs, so perhaps I don't share as much of your concern. Thank-you though for the bug reports and questions.

from htsjdk.

chilinglam commented on June 17, 2024

Thank you for sharing your knowledge. My main concern is about which result (from samtools or htsjdk) is correct, not related to the APIs.

From what you described, apparently the interpretation of the spec about is different between samtools and htsjdk regarding the unmapped read.

Thank you, I learned something interesting :)

from htsjdk.

chilinglam commented on June 17, 2024

Hi @nh13, I thought about it again. Why would the query,

QueryInterval query=new QueryInterval(reader.getFileHeader().getSequenceIndex(1), 11966,11966));

, returns the both the first and second reads if the second read is unmapped?

I think users want a consistent result from the query. That is:

QueryInterval query=new QueryInterval(reader.getFileHeader().getSequenceIndex(1), 11966,11967));

includes all reads from:

QueryInterval query=new QueryInterval(reader.getFileHeader().getSequenceIndex(1), 11966,11966));

Can you give me a reason why it is fine that the first query doesn't include all reads from the second query in this case?

Thank you!

from htsjdk.

yfarjoun commented on June 17, 2024

That wasn't the original query you posted.

you had originally posted

QueryInterval query=new
QueryInterval(reader.getFileHeader().getSequenceIndex(1),
11967,11967));

as the second query.

Yossi.

On Wed, Jul 30, 2014 at 12:54 PM, chilinglam [email protected]
wrote:

Hi @nh13 https://github.com/nh13, I thought about it again. Why would
the query,

QueryInterval query=new QueryInterval(reader.getFileHeader().getSequenceIndex(1), 11966,11966));

, returns the both the first and second reads if the second read is
unmapped?

I think users want a consistent result from the query. That is:

QueryInterval query=new QueryInterval(reader.getFileHeader().getSequenceIndex(1), 11966,11967));

includes all reads from:

QueryInterval query=new QueryInterval(reader.getFileHeader().getSequenceIndex(1), 11966,11966));

Can you give me a reason why it is fine that the first query doesn't
include all reads from the second query in this case?

Thank you!

—
Reply to this email directly or view it on GitHub
#76 (comment).

from htsjdk.

chilinglam commented on June 17, 2024

Hi @yfarjoun, you are right. It is not the same query I posted earlier. I'm asking a question related to this issue. From a user perspective, reads spanned in 1:11966-11967 should include all reads in 1:11966-11966. Don't you agree? If not, I need a reason because my users will also ask me the same question. Thank you!

from htsjdk.

nh13 commented on June 17, 2024

I pushed a branch for us to work on unit tests for this: https://github.com/samtools/htsjdk/tree/nh_issue_76

I believe the behavior is as you described and want:
1:11966-11966 -> 2 records
1:11966-11967 -> 2 records
1:11967-11967 -> 1 record**

** due to the unmapped read having start and end 11966.

from htsjdk.

Potential Bug: Missing reads when querying using Reader API about htsjdk HOT 8 CLOSED

Comments (8)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent