Comments (8)
This is a tricky one indeed!
Using Explain Flags we see:
- the first read is paired, the first in the pair, and has its mate unmapped.
- the second read is paired, the second in the pair, and is unmapped.
The confusion lies in "what is the alignment start and end of each read", in particular the second (unmapped) read. Since the second read is unmapped, its end is defined as its alignment start. See BAMFileReader.java
:
private IntervalComparison compareIntervalToRecord(final QueryInterval interval, final SAMRecord record) {
// interval.end <= 0 implies the end of the reference sequence.
final int intervalEnd = (interval.end <= 0? Integer.MAX_VALUE: interval.end);
final int alignmentEnd;
if (record.getReadUnmappedFlag() && record.getAlignmentStart() != SAMRecord.NO_ALIGNMENT_START) {
// Unmapped read with coordinate of mate.
alignmentEnd = record.getAlignmentStart();
} else {
alignmentEnd = record.getAlignmentEnd();
}
Thus, the alignment start AND end are 11966, which lies out of the interval 1: 11967-11967
. Therefore you get only one read!
from htsjdk.
Hi @nh13, Thank you for the insight! Also thank you for showing me the tool that explains the flag! It is very helpful.
(I updated this because I made a mistake in the flag!)
You are right from the logic of the code, the end will be equal to the start for the second read. However,
I'm little concerned because samtools returns both reads in both queries. It means that the results of the two tools are not compatible. Just wonder if samtools could be wrong in this case?
from htsjdk.
See the SAM spec, in particular the unmapped flag bit:
Bit 0x4 is the only reliable place to tell whether the read is unmapped. If 0x4 is set, no
assumptions can be made about RNAME, POS, CIGAR, MAPQ, bits 0x2, 0x10, 0x100 and
0x800, and the bit 0x20 of the previous read in the template.
It is only by convention that we can use RNAM and POS to position the unmapped mate next to its mapped mate.
Nonetheless, there is very little coordination between samtools and htsjdk to support common APIs, so perhaps I don't share as much of your concern. Thank-you though for the bug reports and questions.
from htsjdk.
Thank you for sharing your knowledge. My main concern is about which result (from samtools or htsjdk) is correct, not related to the APIs.
From what you described, apparently the interpretation of the spec about is different between samtools and htsjdk regarding the unmapped read.
Thank you, I learned something interesting :)
from htsjdk.
Hi @nh13, I thought about it again. Why would the query,
QueryInterval query=new QueryInterval(reader.getFileHeader().getSequenceIndex(1), 11966,11966));
, returns the both the first and second reads if the second read is unmapped?
I think users want a consistent result from the query. That is:
QueryInterval query=new QueryInterval(reader.getFileHeader().getSequenceIndex(1), 11966,11967));
includes all reads from:
QueryInterval query=new QueryInterval(reader.getFileHeader().getSequenceIndex(1), 11966,11966));
Can you give me a reason why it is fine that the first query doesn't include all reads from the second query in this case?
Thank you!
from htsjdk.
That wasn't the original query you posted.
you had originally posted
QueryInterval query=new
QueryInterval(reader.getFileHeader().getSequenceIndex(1),
11967,11967));
as the second query.
Yossi.
On Wed, Jul 30, 2014 at 12:54 PM, chilinglam [email protected]
wrote:
Hi @nh13 https://github.com/nh13, I thought about it again. Why would
the query,QueryInterval query=new QueryInterval(reader.getFileHeader().getSequenceIndex(1), 11966,11966));
, returns the both the first and second reads if the second read is
unmapped?I think users want a consistent result from the query. That is:
QueryInterval query=new QueryInterval(reader.getFileHeader().getSequenceIndex(1), 11966,11967));
includes all reads from:
QueryInterval query=new QueryInterval(reader.getFileHeader().getSequenceIndex(1), 11966,11966));
Can you give me a reason why it is fine that the first query doesn't
include all reads from the second query in this case?Thank you!
—
Reply to this email directly or view it on GitHub
#76 (comment).
from htsjdk.
Hi @yfarjoun, you are right. It is not the same query I posted earlier. I'm asking a question related to this issue. From a user perspective, reads spanned in 1:11966-11967 should include all reads in 1:11966-11966. Don't you agree? If not, I need a reason because my users will also ask me the same question. Thank you!
from htsjdk.
I pushed a branch for us to work on unit tests for this: https://github.com/samtools/htsjdk/tree/nh_issue_76
I believe the behavior is as you described and want:
1:11966-11966 -> 2 records
1:11966-11967 -> 2 records
1:11967-11967 -> 1 record**
** due to the unmapped read having start and end 11966.
from htsjdk.
Related Issues (20)
- CRAM files have zero-length blocks with a compression method associated with them (so cannot be decompressed). HOT 1
- VCF sorting change breaks existing VCFs HOT 1
- Race conditions accessing CRAM's reference data through ReferenceSource
- Add a better error message when the reference is too long for indexing HOT 5
- Question: Tabix index to find min(POS) and max(POS) in each contig
- Histogram class uses unsafe methods for returning min/max
- CRAM: 8-Bin Base Quality Compression HOT 1
- SamReader slower than expected on network drive HOT 1
- Escaped doublequotes in INFO descriptions result in invalid VCF file HOT 2
- Question: How to generate FAI files and TBI files HOT 5
- SamReaderFactory should use current defaults when makeDefault() is called
- Error when reading a VCF 4.3 file with spaces in INFO field values HOT 1
- htsjdk.samtools.util.IOUtil.assertFileIsWritable on certain NFS environments fails to correctly assess permissions HOT 1
- SAMFileWriterFactory creates .bai file when writing .cram file HOT 1
- Deleted Java developers team
- IOUtil.openFileForMd5CalculatingWriting() puts the md5 file in the wrong place.
- support for Azure blob storage? (specifically access to BAM/CRAM files in Azure blob storage by IGV)
- VariantContext has inconsistent behavior on getting allele type with spanning deletions
- Resolve issues around integrating HTTP NIO provider
- Truncation issue with GZIPInputStream when reading bgzip-compressed streams HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from htsjdk.