
Comments (2)

pd3 commented on June 2, 2024

I traced the problem down to

htslib/tbx.c

Lines 211 to 217 in a59bc52

if ((ret = bgzf_getline(fp, '\n', s)) >= 0) {
    tbx_intv_t intv;
    if (get_intv(tbx, s, &intv, 0) < 0)
        return -2;
    *tid = intv.tid; *beg = intv.beg; *end = intv.end;
}
return ret;

On success, tbx_readrec() returns the length of the line it read, whereas the corresponding bcf_readrec() returns 0. The iterator then passes this value through unchanged as the return value of hts_itr_next():

htslib/hts.c

Lines 3909 to 3917 in a59bc52

if ((ret = iter->readrec(fp, data, r, &tid, &beg, &end)) >= 0) {
    iter->curr_off = bgzf_tell(fp);
    if (tid != iter->tid || beg >= iter->end) { // no need to proceed
        ret = -1; break;
    } else if (end > iter->beg && iter->end > beg) {
        iter->curr_tid = tid;
        iter->curr_beg = beg;
        iter->curr_end = end;
        return ret;

Arguably the iterator should behave consistently across all data types.
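
To make the inconsistency concrete, here is a minimal, self-contained sketch. It does not use htslib: every name in it (mock_readrec_f, mock_text_readrec, mock_binary_readrec, mock_itr_next) is hypothetical, and the two callbacks only mimic the return conventions described above. Normalising the success value to 0 is just one possible way to make the iterator consistent, not necessarily the change htslib would adopt, since existing callers may rely on the current values.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: >= 0 on success, negative on error or EOF. */
typedef int (*mock_readrec_f)(const char *line);

/* Mimics the convention described for tbx_readrec():
 * returns the length of the line on success. */
static int mock_text_readrec(const char *line)
{
    if (line == NULL) return -1;
    return (int)strlen(line);
}

/* Mimics the convention described for bcf_readrec():
 * returns 0 on success. */
static int mock_binary_readrec(const char *line)
{
    if (line == NULL) return -1;
    return 0;
}

/* Hypothetical iterator step: any non-negative value from the callback
 * is reported as 0, negative values are passed through unchanged. */
static int mock_itr_next(mock_readrec_f readrec, const char *line)
{
    int ret = readrec(line);
    return ret >= 0 ? 0 : ret;
}
```

Called directly, the two mock callbacks disagree on what success looks like (a line length versus 0); called through mock_itr_next() they both report 0, so a caller can test for success the same way regardless of the underlying data type.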


pd3 commented on June 2, 2024

Continuing with this, it seems that bcf_sr_seek is the wrong solution for the task: bcftools consensus wants to query a region similarly to bcf_sr_set_regions, but on the fly. Instead, the function attempts to seek to a contig and, if the contig is not present, enters a streaming-like mode based on the order of contigs in the header (BCF) or tabix index. In this mode, repeated seeks to a non-existent location result in different outcomes from subsequent bcf_sr_next_line calls.

A desired fix should (1) make repeated seek+next_line calls consistent and (2) provide an on-the-fly alternative to bcf_sr_set_regions.
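
As a rough illustration of the behaviour asked for in (1) and (2), the sketch below models a reader over an in-memory array of records. It is not htslib code: mock_rec_t, mock_reader_t, mock_seek and mock_next_line are hypothetical names, coordinates are assumed to be 0-based half-open, and a linear scan stands in for a real index lookup. The point is only that each seek fully resets the query state, so a seek to a missing contig always yields no records, however often it is repeated.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record: contig name plus a 0-based half-open interval. */
typedef struct {
    const char *contig;
    int beg, end;
} mock_rec_t;

/* Hypothetical reader: the records plus the currently active region. */
typedef struct {
    const mock_rec_t *recs;
    size_t n;
    size_t pos;          /* index of the next record to examine */
    const char *contig;  /* active region, NULL before the first seek */
    int beg, end;
} mock_reader_t;

/* Same overlap test as the iterator: end > region.beg && region.end > beg. */
static int mock_overlaps(const mock_reader_t *r, const mock_rec_t *rec)
{
    return strcmp(rec->contig, r->contig) == 0
        && rec->end > r->beg && r->end > rec->beg;
}

/* Set the active region and position at the first overlapping record.
 * Returns 0 if one exists, -1 otherwise. All state is reset on every
 * call, so the outcome does not depend on earlier seeks. */
static int mock_seek(mock_reader_t *r, const char *contig, int beg, int end)
{
    r->contig = contig; r->beg = beg; r->end = end;
    for (r->pos = 0; r->pos < r->n; r->pos++)
        if (mock_overlaps(r, &r->recs[r->pos])) return 0;
    return -1;  /* pos == n, so mock_next_line() returns NULL */
}

/* Return the next record overlapping the active region, or NULL. */
static const mock_rec_t *mock_next_line(mock_reader_t *r)
{
    if (r->contig == NULL) return NULL;  /* no region set yet */
    while (r->pos < r->n) {
        const mock_rec_t *rec = &r->recs[r->pos++];
        if (mock_overlaps(r, rec)) return rec;
    }
    return NULL;
}
```

With this shape, seeking twice to a contig that is absent gives the same result both times (-1, followed by NULL from mock_next_line), and a later seek to an existing region is unaffected by the earlier misses.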

