Comments (4)
We already solved this elsewhere, though, with additional notation such as `{chra:b}:10-20`, or even just by parsing from the other end so that the last colon is the one used. This doesn't work if you rather foolishly created a contig named "chr10:100-200", as that's ambiguous, but that's why we added the curly-brace notation.
The htslib APIs correctly support this, so it's possibly just a case of the synced reader doing its own parsing rather than using the official region-parsing API. (I haven't looked.)
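As a rough illustration of the two workarounds mentioned above, here is a minimal C sketch that splits a region string at its last colon and also accepts the `{name}` and `{name}:beg-end` brace forms. It does not use htslib: `region_t`, `parse_region()`, `parse_interval()` and `set_name()` are hypothetical names, and treating a bare `chr:pos` as the single base `pos` is an assumption (tools differ on this).

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical types and functions for illustration; not part of htslib. */
typedef struct {
    char name[256];
    long beg, end;      /* 1-based, inclusive; end == -1 means "to the end" */
} region_t;

/* Parse "pos", "beg-" or "beg-end". Returns 0 on success, -1 otherwise.
 * Assumption: a bare "pos" denotes the single base pos. */
static int parse_interval(const char *s, long *beg, long *end)
{
    char *p;
    if (!isdigit((unsigned char)*s)) return -1;
    *beg = strtol(s, &p, 10);
    if (*p == '\0') { *end = *beg; return 0; }      /* pos     */
    if (*p++ != '-') return -1;
    if (*p == '\0') { *end = -1; return 0; }        /* beg-    */
    if (!isdigit((unsigned char)*p)) return -1;
    *end = strtol(p, &p, 10);
    return (*p == '\0' && *end >= *beg) ? 0 : -1;   /* beg-end */
}

/* Copy len bytes of src into r->name; fails on empty or oversized names. */
static int set_name(region_t *r, const char *src, size_t len)
{
    if (len == 0 || len >= sizeof r->name) return -1;
    memcpy(r->name, src, len);
    r->name[len] = '\0';
    return 0;
}

/* Split a region string into a contig name and an interval.
 *   "{name}" or "{name}:beg-end"  the braces delimit the name explicitly
 *   anything else                  split at the LAST colon, but only if the
 *                                  text after it is a well-formed interval */
static int parse_region(const char *s, region_t *r)
{
    r->beg = 1;
    r->end = -1;
    if (*s == '{') {
        const char *close = strchr(s, '}');
        if (!close || set_name(r, s + 1, (size_t)(close - s - 1)) != 0) return -1;
        if (close[1] == '\0') return 0;
        if (close[1] != ':') return -1;
        return parse_interval(close + 2, &r->beg, &r->end);
    }
    const char *colon = strrchr(s, ':');
    if (colon && parse_interval(colon + 1, &r->beg, &r->end) == 0)
        return set_name(r, s, (size_t)(colon - s));
    r->beg = 1;                 /* no usable suffix: the whole string is the name */
    r->end = -1;
    return set_name(r, s, strlen(s));
}
```

Splitting at the last colon only moves the ambiguity: with this sketch, `HLA-A*01:01:01:01` is read as contig `HLA-A*01:01:01` at position 1, so a contig whose name itself ends in `:NUM` can only be selected reliably with the braces, e.g. `{HLA-A*01:01:01:01}`.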
This is not easy to support. The function accepts regions in multiple formats (`chr`, `chr:pos`, `chr:beg-end`, `chr:beg-`), and with colons in contig names it is impossible to tell whether `x:1234` refers to the contig `x` at position 1234 or to a contig named `x:1234`.
The only solution I can think of is to move the responsibility for resolving ambiguities like this to the user and require full intervals (`chr:beg-end`) in cases where `chr` is of the form `STR:NUM`, `STR:NUM-`, or `STR:NUM-NUM`.
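The proposed rule could look roughly like this minimal C sketch. `is_interval()`, `contig_is_ambiguous()` and `region_allowed_for()` are hypothetical helpers, not htslib functions, and the sketch assumes the caller already knows which contig the region is meant to select; it only decides whether the shorter region forms are still permitted for that contig.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helpers for illustration; not htslib functions. */

/* Does s match NUM, NUM- or NUM-NUM? With full_only set, only NUM-NUM. */
static int is_interval(const char *s, int full_only)
{
    const char *p = s;
    if (!isdigit((unsigned char)*p)) return 0;
    while (isdigit((unsigned char)*p)) p++;
    if (*p == '\0') return !full_only;            /* NUM     */
    if (*p++ != '-') return 0;
    if (*p == '\0') return !full_only;            /* NUM-    */
    if (!isdigit((unsigned char)*p)) return 0;
    while (isdigit((unsigned char)*p)) p++;
    return *p == '\0';                            /* NUM-NUM */
}

/* A contig name is ambiguous if it has the form STR:NUM, STR:NUM- or
 * STR:NUM-NUM, i.e. it could itself be read as a region on contig STR. */
static int contig_is_ambiguous(const char *contig)
{
    const char *colon = strrchr(contig, ':');
    return colon != NULL && colon != contig && is_interval(colon + 1, 0);
}

/* Under the proposed rule, may `region` be used to select `contig`?
 * Unambiguous contigs accept chr, chr:pos, chr:beg- and chr:beg-end;
 * ambiguous ones accept only the full chr:beg-end form. */
static int region_allowed_for(const char *region, const char *contig)
{
    size_t n = strlen(contig);
    int ambiguous = contig_is_ambiguous(contig);
    if (strncmp(region, contig, n) != 0) return 0;  /* different name     */
    if (region[n] == '\0') return !ambiguous;       /* bare chr           */
    if (region[n] != ':') return 0;                 /* e.g. chr1 vs chr10 */
    return is_interval(region + n + 1, ambiguous);
}
```

For a contig named `x:1234`, the bare `x:1234` and `x:1234:5` are rejected, while `x:1234:1-100` is accepted, so the user states the interval explicitly instead of the parser guessing.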
Guess they should have thought better before allowing colons in contig names... Looks kinda necessary after all! 😅
`bcf_sr_regions_init` is an unfortunate API that does not have the set of valid contig names available, so the algorithm in Appendix A of the SAM spec is not directly applicable.
However, a simplified version of it could be used, and as @jkbonfield noted, the explicit delimiter notation suggested in the appendix could also be supported.
Or perhaps it should be superseded by an API function that is provided with the set of contigs in play in the file(s) to be read.
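A minimal sketch of such a resolver, assuming the caller can supply the list of contig names (which `bcf_sr_regions_init` currently cannot). It is a simplified take in the spirit of Appendix A, not a transcription of it: if both the whole string and the part before the last colon name known contigs, it reports the region as ambiguous and leaves the user to disambiguate with braces. The `REGION_*` codes, `find_contig()`, `parse_interval()` and `resolve_region()` are hypothetical; htslib's own `hts_parse_region()` takes a contig-lookup callback rather than a plain array.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical return codes and functions for illustration; not htslib API. */
enum { REGION_NOT_FOUND = -1, REGION_AMBIGUOUS = -2, REGION_BAD_SYNTAX = -3 };

/* Index of the len-byte name in contigs[], or -1 if it is not a known contig. */
static int find_contig(const char *name, size_t len,
                       const char *const *contigs, int ncontigs)
{
    for (int i = 0; i < ncontigs; i++)
        if (strlen(contigs[i]) == len && memcmp(contigs[i], name, len) == 0)
            return i;
    return -1;
}

/* Parse "pos", "beg-" or "beg-end"; end == -1 means "to the end". */
static int parse_interval(const char *s, long *beg, long *end)
{
    char *p;
    if (!isdigit((unsigned char)*s)) return -1;
    *beg = strtol(s, &p, 10);
    if (*p == '\0') { *end = *beg; return 0; }
    if (*p++ != '-') return -1;
    if (*p == '\0') { *end = -1; return 0; }
    if (!isdigit((unsigned char)*p)) return -1;
    *end = strtol(p, &p, 10);
    return (*p == '\0' && *end >= *beg) ? 0 : -1;
}

/* Returns the contig index and sets *beg / *end, or a negative REGION_* code. */
static int resolve_region(const char *s, const char *const *contigs,
                          int ncontigs, long *beg, long *end)
{
    *beg = 1;
    *end = -1;
    if (*s == '{') {                                  /* explicit delimiters */
        const char *close = strchr(s, '}');
        if (!close) return REGION_BAD_SYNTAX;
        int id = find_contig(s + 1, (size_t)(close - s - 1), contigs, ncontigs);
        if (id < 0) return REGION_NOT_FOUND;
        if (close[1] == '\0') return id;
        if (close[1] != ':' || parse_interval(close + 2, beg, end) != 0)
            return REGION_BAD_SYNTAX;
        return id;
    }
    /* Interpretation 1: the whole string is a contig name. */
    int whole = find_contig(s, strlen(s), contigs, ncontigs);
    /* Interpretation 2: contig name, last colon, then a valid interval. */
    int prefix = -1;
    long b = 0, e = 0;
    const char *colon = strrchr(s, ':');
    if (colon && parse_interval(colon + 1, &b, &e) == 0)
        prefix = find_contig(s, (size_t)(colon - s), contigs, ncontigs);
    if (whole >= 0 && prefix >= 0) return REGION_AMBIGUOUS;   /* ask for {} */
    if (whole >= 0) return whole;
    if (prefix < 0) return REGION_NOT_FOUND;
    *beg = b;
    *end = e;
    return prefix;
}
```

With contigs `chr10` and `chr10:100-200` both present, `chr10:100-200` is reported as ambiguous rather than guessed, while `{chr10:100-200}` and `{chr10}:100-200` each resolve to a single contig.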