Git Product home page Git Product logo

rsamtools's Introduction

Rsamtools is an R/Bioconductor package that provides an interface to the samtools, bcftools, and tabix utilities for manipulating SAM (Sequence Alignment / Map), FASTA, binary variant call (BCF) and compressed indexed tab-delimited (tabix) files.

See https://bioconductor.org/packages/Rsamtools for more information including how to install the release version of the package (please refrain from installing directly from GitHub).

rsamtools's People

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

rsamtools's Issues

CVE_2012_1461-1 FOUND

I just dug these out of my ClamAV log files:

/home/mehrad/R/x86_64-pc-linux-gnu-library/3.4/Rsamtools/extdata/ex1.bam: PUA.Win.Exploit.CVE_2012_1461-1 FOUND
/home/mehrad/R/x86_64-pc-linux-gnu-library/3.4/Rsamtools/extdata/tagfilter.bam: PUA.Win.Exploit.CVE_2012_1461-1 FOUND
/home/mehrad/R/x86_64-pc-linux-gnu-library/3.4/Rsamtools/extdata/no_which_buffered_pileup.bam: PUA.Win.Exploit.CVE_2012_1461-1 FOUND
/home/mehrad/R/x86_64-pc-linux-gnu-library/3.4/Rsamtools/extdata/example.gtf.gz.tbi: PUA.Win.Exploit.CVE_2012_1461-1 FOUND
/home/mehrad/R/x86_64-pc-linux-gnu-library/3.4/Rsamtools/extdata/example.gtf.gz: PUA.Win.Exploit.CVE_2012_1461-1 FOUND
/home/mehrad/R/x86_64-pc-linux-gnu-library/3.4/Rsamtools/extdata/querybins.bam: PUA.Win.Exploit.CVE_2012_1461-1 FOUND
/home/mehrad/R/x86_64-pc-linux-gnu-library/3.4/Rsamtools/extdata/example_from_SAM_Spec.bam: PUA.Win.Exploit.CVE_2012_1461-1 FOUND
/home/mehrad/R/x86_64-pc-linux-gnu-library/3.4/Rsamtools/extdata/revbins.bam: PUA.Win.Exploit.CVE_2012_1461-1 FOUND
/home/mehrad/R/x86_64-pc-linux-gnu-library/3.4/Rsamtools/extdata/ex1.bcf.bci: PUA.Win.Exploit.CVE_2012_1461-1 FOUND
/home/mehrad/R/x86_64-pc-linux-gnu-library/3.4/Rsamtools/extdata/tiny.bam: PUA.Win.Exploit.CVE_2012_1461-1 FOUND
/home/mehrad/R/x86_64-pc-linux-gnu-library/3.4/Rsamtools/extdata/ex1.bcf: PUA.Win.Exploit.CVE_2012_1461-1 FOUND
/home/mehrad/R/x86_64-pc-linux-gnu-library/3.4/Rsamtools/extdata/no_which_whole_file.bam: PUA.Win.Exploit.CVE_2012_1461-1 FOUND
/home/mehrad/R/x86_64-pc-linux-gnu-library/3.4/Rsamtools/unitTests/cases/ex1_unsort.bam: PUA.Win.Exploit.CVE_2012_1461-1 FOUND
/home/mehrad/R/x86_64-pc-linux-gnu-library/3.4/Rsamtools/unitTests/cases/plp_refskip.bam: PUA.Win.Exploit.CVE_2012_1461-1 FOUND
/home/mehrad/R/x86_64-pc-linux-gnu-library/3.4/Rsamtools/unitTests/cases/ex1_noindex.bam: PUA.Win.Exploit.CVE_2012_1461-1 FOUND
/home/mehrad/R/x86_64-pc-linux-gnu-library/3.4/Rsamtools/unitTests/cases/ex1_shuf1000.bam: PUA.Win.Exploit.CVE_2012_1461-1 FOUND

To save you some time, this is the CVE's page:
https://nvd.nist.gov/vuln/detail/CVE-2012-1461
https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2012-1461

Considering that almost all of them are data files, they should be audited before inclusion to the package.

Issue with CRAM support install

Hello! I am trying to install Rsamtools with cram support using BiocManager::install("Bioconductor/Rsamtools@cram"). This gives the error "hts_utilities.c:2:10: fatal error: cram/cram.h: No such file or directory". I have tried on both Mac and Linux, with Bioconductor version 1.30.10 and Rhtslib version 1.22.0. Can you help me figure out how to use the CRAM support for Rsamtools?

Error installing

I have an error trying to install Rsamtools using BiocManager::install('Rsamtools'). This disrupts the installation of dada2 for me unfortunately.

/bin/sh: 1: Syntax error: "(" unexpected
make: *** [/home2/ISAD/dp415/miniconda3/envs/dada2_pipeline/lib/R/etc/Makeconf:174: Biostrings_stubs.o] Error 2
ERROR: compilation failed for package ‘Rsamtools’

Session Info here:

`sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS

Matrix products: default
BLAS/LAPACK: /home2/ISAD/dp415/miniconda3/envs/dada2_pipeline/lib/libopenblasp-r0.3.10.so

locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
[5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] BiocManager_1.30.10 compiler_4.0.3 tools_4.0.3
[4] remotes_2.2.0 `

Overhaul with new Samtools source code

In my lab we are trying to replace what was once a Python script with an R package, and where we were using Samtools we are now using Rsamtools. We are realizing our analysis doesn't work well using Rsamtools because it's missing some of the relatively new functionality of Samtools, namely incorporating the BAQ computation in mpileup. This has been our experience, but I'm sure Rsamtools is missing features important to others as well which have been added to Samtools in recent years, and thus are not found in the old Samtools C code Rsamtools is built on.

Issue with install on using conda

Everytime I try to install using conda, I get the following error:

x86_64-conda-linux-gnu-cc: error: .Rprofile:: No such file or directory
x86_64-conda-linux-gnu-cc: error: Setting: No such file or directory
x86_64-conda-linux-gnu-cc: error: R: No such file or directory
x86_64-conda-linux-gnu-cc: error: repository:-D_FILE_OFFSET_BITS=64: No such file or directory

Any ideas how to fix this?

Bug in scanTabix

Hi,

I came across a bug in scanTabix where no data is returned when requesting regions on double-digit chromosomes (ie >chr9). This only appears to be an issue on Windows and when the tabix file is above a certain size.

Here is a tabix file and index that will reproduce the issue. Apologies for the huge file, I tried a downsampling but the bug only seems to occur with the larger file.

Reproducible example:

library(Rsamtools)
library(GenomicRanges)
library(IRanges)

tbx.file <- "fragments.tsv.gz"
range.chr14 <- GRanges(seqnames = 'chr14', ranges = IRanges(start = 99635624, end = 99737861))
tbx <- TabixFile(file = tbx.file)
scanTabix(file = tbx, param = range.chr14)

This code will return data on macOS or linux but an empty vector on windows (I tested on Windows 7 with R 3.6.1 and the current version of Rsamtools).

countBam with empty GRanges object should return zero counts

Hi,

I have a scenario where I'm loading bed files and counting the alignments within the ranges provided by the bed files. These bed files may sometimes contain no ranges. However, when I pass the empty GRanges object to countBam() via scanBamParam() the entire bam file is counted. To my understanding, no explicit range should return zero. Passing a NULL object would justify counting the entire file, but I'm not convinced passing a GRanges object with no ranges should behave this way.

An example is below:

fl <- system.file("extdata", "ex1.bam", package="Rsamtools", mustWork=TRUE)                                           
gr <- GRanges()                                                            
countBam(fl, param = ScanBamParam(which = gr))

This returns the following which appears incorrect to me.

space start end width    file records nucleotides
1    NA    NA  NA    NA ex1.bam    3307      116551

I can work around this easily enough using an if statement, but it just appeared to be a logical inconsistency. Could the test from line 5 in https://github.com/Bioconductor/Rsamtools/blob/master/R/countBam.R instead test for a NULL instead of a zero length?

Thanks in advance,

Steve

scanBam 'which' returns the same record multiple times

Regarding this documented behavior: "When one record overlaps two ranges in which, the record is returned twice."

Is there any way to disable this? This doesn't match what 'samtools view' does, and makes things difficult downstream. I don't see the benefit of this, if one want to simply extract reads that overlap specific regions.

Thanks.

Error with getSeq from opened fasta file (Rsamtools 2.0.2)

Hi,
I am using Rsamtools 2.0.2, and I have faced an issue on reading sequences from fasta files.

My issue is similar to #5 , but not exactly. When I try to "getSeq" from fasta file I get an error message. This is my script:

GTFfile = "Homo_sapiens.GRCh37.87.gtf"
FASTAfile = "Homo_sapiens.GRCh37.dna.toplevel.fa"

FASTA <- FaFile(FASTAfile)
open(FASTA)

getSeq(FASTA, GRanges("chr1", IRanges(1,5)))

I get this:
Error in value[3L] : record 1 (chr1:1-5) failed
file: Homo_sapiens.GRCh37.dna.toplevel.fa

If I run it in Windows, I get this:
[E::faidx_fetch_seq2] Failed to retrieve block. (Seeking in a compressed, .gzi unindexed, file?)
Error in value[3L] : record 1 (chr1:1-5) failed
file: Homo_sapiens.GRCh37.dna.toplevel.fa

The reference files are standard ensembl reference files and are downloaded from ftp://ftp.ensembl.org/pub/grch37/release-87/fasta/homo_sapiens/dna .

Thank you for your help.
Farshad

Is there a way to extract part of the sequence that maps to a specific area in the reference!

Dear all,

I have a bam file from which I want to extract the exact part of DNA sequences that map within the following region 54-78 to my reference .

I have tried so far unsuccessfully bedtools, samtools, custom scripts and I was wondering if there is a quick way to do it with RSamtools.

Here is a link to my data which is tiny, only 4 sequences, https://1drv.ms/u/s!ArOzUuixE-mggfskZPX_Y0cpyQH4RA?e=hAa0W3 and this is the desired result that I want to take out.

I just want to extract the sequences that map in the positions 54-78 in the genome and my data looks like this
TACGGTTATATTGACAGACCGAGGG
TAATTCGAAACCATAAGGCTCGCAA
TACGGTTATATTGACAGACCGAGGG
TAATTCGAAACCATAAGGCTCGCAA
TACGGTTATATTGACAGACCGAGGG

I have managed to parse the file with Rsamtools and make it to a data.frame

bam <- scanBam("test_4_seqs_sorted.bam")

#names of the BAM fields
names(bam[[1]])

#distribution of BAM flags
table(bam[[1]]$flag)

.unlist <- function (x){
  ## do.call(c, ...) coerces factor to integer, which is undesired
  x1 <- x[[1L]]
  if (is.factor(x1)){
    structure(unlist(x), class = "factor", levels = levels(x1))
  } else {
    do.call(c, x)
  }
}

#store names of BAM fields
bam_field <- names(bam[[1]])

#go through each BAM field and unlist
list <- lapply(bam_field, function(y) .unlist(lapply(bam, "[[", y)))

#store as data frame
bam_df <- do.call("DataFrame", list)
names(bam_df) <- bam_field

bam_df 

cram support

Hi,

is the cram format officially supported by rsamtools? The only reference I can find to cram is in issue #21, so I'm not sure if this is production ready.

Thanks
Matthias

Installation issue related to linker files

I'm having issues trying to install Rsamtools via BiocManager::install("Rsamtools").

Error has something to do with linker files, see below:

Bioconductor version 3.16 (BiocManager 1.30.19), R 4.2.2 (2022-10-31)
Installing package(s) 'Rsamtools'
trying URL 'https://bioconductor.org/packages/3.16/bioc/src/contrib/Rsamtools_2.14.0.tar.gz'
Content type 'application/x-gzip' length 2873806 bytes (2.7 MB)
==================================================
downloaded 2.7 MB

.Rprofile: Setting R repository:* installing *source* package ‘Rsamtools’ ...
** using staged installation
** libs
x86_64-conda-linux-gnu-cc -I"/compbio/home/shan.sabri/miniconda3/lib/R/include" -DNDEBUG .Rprofile: Setting R repository:-D_FILE_OFFSET_BITS=64 -I'/compbio/home/shan.sabri/miniconda3/lib/R/library/Rhtslib/include' -I'/compbio/home/shan.sabri/miniconda3/lib/R/library/S4Vectors/include' -I'/compbio/home/shan.sabri/miniconda3/lib/R/library/IRanges/include' -I'/compbio/home/shan.sabri/miniconda3/lib/R/library/XVector/include' -I'/compbio/home/shan.sabri/miniconda3/lib/R/library/Biostrings/include' -DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem /compbio/home/shan.sabri/miniconda3/include -I/compbio/home/shan.sabri/miniconda3/include -Wl,-rpath-link,/compbio/home/shan.sabri/miniconda3/lib   -fpic  -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /compbio/home/shan.sabri/miniconda3/include -fdebug-prefix-map=/home/conda/feedstock_root/build_artifacts/r-base-split_1671440533574/work=/usr/local/src/conda/r-base-4.2.2 -fdebug-prefix-map=/compbio/home/shan.sabri/miniconda3=/usr/local/src/conda-prefix  -c Biostrings_stubs.c -o Biostrings_stubs.o
x86_64-conda-linux-gnu-cc: warning: .Rprofile:: linker input file unused because linking not done
x86_64-conda-linux-gnu-cc: error: .Rprofile:: linker input file not found: No such file or directory
x86_64-conda-linux-gnu-cc: warning: Setting: linker input file unused because linking not done
x86_64-conda-linux-gnu-cc: error: Setting: linker input file not found: No such file or directory
x86_64-conda-linux-gnu-cc: warning: R: linker input file unused because linking not done
x86_64-conda-linux-gnu-cc: error: R: linker input file not found: No such file or directory
x86_64-conda-linux-gnu-cc: warning: repository:-D_FILE_OFFSET_BITS=64: linker input file unused because linking not done
x86_64-conda-linux-gnu-cc: error: repository:-D_FILE_OFFSET_BITS=64: linker input file not found: No such file or directory
make: *** [Biostrings_stubs.o] Error 1
ERROR: compilation failed for package ‘Rsamtools’
* removing ‘/compbio/home/shan.sabri/miniconda3/lib/R/library/Rsamtools’

sessionInfo is posted below:

R version 4.2.2 (2022-10-31)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /compbio/home/shan.sabri/miniconda3/lib/libopenblasp-r0.3.21.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base

loaded via a namespace (and not attached):
[1] BiocManager_1.30.19 compiler_4.2.2      tools_4.2.2
[4] renv_0.16.0

Any help would be greatly appreciated!

Help with pileup

Dear RSamtools,

I'm hoping to use the pileup function to output a table like the below. My goal is to be strand sensitive, distiguish +/- (for paired-end), include indels, as well count of 5'starts and 3'stops. I have so far been unsuccessful coming up with an implementation of PileupParam to retrieve this information. Are you able to help? Thanks!

chr loc ref strand  A   T   C   G   a   t   c   g   Insertion   Deletion    Stops   Starts
1   888659  T   +   0   9   0   2   0   0   0   0   2:AG|1:AGAG 1:ATT   3   2

Indexing a gzipped fasta file

Hi,

I am trying to index a gzipped fasta file but I get the following error:

> indexFa("./FASTA_files/A6-26.fa.gz")
Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'object' in selecting a method for function 'index': error in evaluating the argument 'file' in selecting a method for function 'indexFa': 'open' index failed
  file: ./FASTA_files/A6-26.fa.gz

I can do it with the uncompressed fasta file with no problems:

> indexFa("./FASTA_files/A6-26.fa")
[1] "./FASTA_files/A6-26.fa.fai"

I know you can index compressed fasta files with command line Samtools.

Am I doing something wrong?

ScanBamParam: load all tags

Hi, Is there a way to specify with ScanBamParam to load all tags, without having to specify them one by one?

Rsamtools pileup and Pysam pileup output discrepancy

Describe the bug
I have mRNA sequencing data that I've aligned to a genome. Specifically, I am interested in determining the total number of reads at each base and the percentage of occurrences of A, T, G, C, deletions, and insertions at these bases. I have utilized both Rsamtools pileup and Pysam pileup to achieve this. However, I noticed that their outputs are inconsistent, even though I used the same BAM file as input for both tools.
To double check which one is correct. I use samtools on linux to slice out specific one base and found out the totaly number of read align to that base is the same as Pysam output. Not the Rsamtools output. Below is the code I implemented:

To Reproduce
Rsamtools:
param <- ScanBamParam(which=GRanges(strand = "-",
seqnames = "chr1",
ranges = IRanges(start=944203,
end=959309)))
pilup_params = Rsamtools::PileupParam(max_depth = 102500,
min_mapq = 20,
distinguish_nucleotides = T,
ignore_query_Ns = F,
include_insertions = T,
distinguish_strands = F)
bam.pileup = pileup(bamfile,
scanBamParam=param,
pileupParam = pilup_params)
bam.pileup = bam.pileup[order(bam.pileup$pos), ]

Pysam:
bamfile = AlignmentFile(bam_dir, "rb")
pileups = bamfile.pileup('chr1', 944202, 959308, truncate=True, min_mapping_quality=20)
columns=['chr', 'position', "coverage", 'A', 'T', 'G', 'C', '-', '+']
df = pd.DataFrame(columns=columns)
for pileupcolumn in pileups:
data = []
for pileupread in pileupcolumn.pileups:
if pileupread.is_del:
data.append('-')
elif pileupread.indel:
data.append('+')
else:
data.append(pileupread.alignment.query_sequence[pileupread.query_position])
base_counts = Counter(data)
row = {
'chr': 'chr1',
'position': pileupcolumn.pos+1,
'coverage': pileupcolumn.n,
'A': base_counts['A'],
'T': base_counts['T'],
'G': base_counts['G'],
'C': base_counts['C'],
'-': base_counts['-'],
'+': base_counts['+']
}
df = df.append(row, ignore_index=True)

Here is the samtools command I used on linux
samtools view -h -F 256 -F 4 -F 2048 -bq 20 bamfile.bam chr:944202-959303 >NOC2L.bam
samtools view -c NOC2L.bam

The samtools linux command output 146 total at chr:944202-959303 which is the same as Pysam. But Rsamtools output 82 reads.

And here is the screen shot of Pysam and Rsamtools output

pysamoutput
Rsamtools

`Error: scanTabix: '4' not present in tabix index`

Hello,

So I seem to be having some issues with querying remote tabix files (e.g from ENCODE). Though I'm not sure if this is strictly related to the file being remote, or some other difference in how the file is formatted.

Reprex

Main example

 #### Setup ####
    
    #### Get example data (Parkinson's Disease GWAS) ####
    if(!require("echodata")) remotes::install_github("RajLabMSSM/echodata") 
    #### Construct query as granges ####
    dat <- echodata::BST1
    gr <- GenomicRanges::GRanges(
        seqnames = as.integer(dat$CHR),
        ranges = IRanges::IRanges(
            start = as.integer(dat$POS),
            end = as.integer(dat$POS)
        )
    )
    #### For reconstructing the data as a data.table with colnames ####
    scanTabix_to_dt <- function(bgz, query){
        ### Add missing header back in 
        header <- Rsamtools::headerTabix(bgz) 
        query_dt <- data.table::fread(paste(c(header$header, query),
                                            collapse = "\n"), fill=TRUE)
        return(query_dt)
    } 
    
    
    
    #### ------- Example 1: Remote file ------- ####
    ### Currently produces error
    
    ## This is an tabix-indexed and bgzip-compressed ENCODE file.
    bgz1 <- file.path(
        "https://egg2.wustl.edu/roadmap/data/byFileType",
        "chromhmmSegmentations/ChmmModels/coreMarks/jointModel/final",
        "E099_15_coreMarks_dense.bed.bgz"
    ) 
    #### Query ####
    query1 <- Rsamtools::scanTabix(file = bgz1, param = gr) # <-- error occurs here
    #  "Error: scanTabix: '4' not present in tabix index"
    
    #### Running without the gr query returns the while file ####
    ## However, because it's the whole file, this takes a while.
    query1 <- Rsamtools::scanTabix(file = bgz1)
    ## This part tends to crash R (memory overload?)
    # query_dt1 <- scanTabix_to_dt(bgz1, query1) 

Extended examples

    #### ------- Example 2: Local file ------- #### 
   
 ### Currently working
    fullSS_path <- echodata::example_fullSS()
    bgz2 <- Rsamtools::bgzip(file = fullSS_path, 
                            dest = paste0(fullSS_path,".bgz"), 
                            overwrite = TRUE)
    tbi2 <- Rsamtools::indexTabix(file = bgz2, 
                                 seq = 2, 
                                 start = 3,
                                 end = 3, 
                                 comment = "SNP")  
    #### Query ####
    query2 <- Rsamtools::scanTabix(file = bgz2, param = gr) 
    query_dt2 <- scanTabix_to_dt(bgz2, query2)
     
    
    
    
    #### ------- Example 3: Local file prepared without Rsamtools ------- #### 
    ### Currently working 
    ## Tho not when bgzipping with CLI. may be a version conflict issue
    ## (unrelated to Rsamtools per se).
    
    fullSS_path <- echodata::example_fullSS()
    bgz3 <- paste0(fullSS_path,".bgz") 
    #### Compress the file via CLI ####
    bgzip_method <- "rsamtools"
    if(bgzip_method == "cli"){
        system(paste("bgzip -f", fullSS_path))
        ## However, this causes an error with tabix:
        ## "[tabix] the compression of '..._subset.tsv.bgz' is not BGZF". weird!
        #### Check version of bgzip ####
        help <- system("bgzip -h", intern = TRUE)
        cat(help, sep = "\n")
    } else {
        ## Compressing with Rsamtools seems to work fine with tabix CLI 
        bgz3 <- Rsamtools::bgzip(file = fullSS_path,
                                 dest = bgz3, 
                                 overwrite = TRUE)
    }
    #### Tabix-index ####
    system(paste("tabix",
                 "-f",
                 "-h",
                 "-s",2,
                 "-b",3,
                 "-e",3,
                 "-c","SNP",
                 bgz3
    )) 
    query3 <- Rsamtools::scanTabix(file = bgz3, param = gr)
    query_dt3 <- scanTabix_to_dt(bgz3, query3)

Session info

R version 4.1.0 (2021-05-18)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 11.4

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
  [1] bitops_1.0-7                matrixStats_0.61.0          fs_1.5.2                   
  [4] lubridate_1.8.0             bit64_4.0.5                 filelock_1.0.2             
  [7] progress_1.2.2              httr_1.4.2                  GenomeInfoDb_1.30.1        
 [10] gh_1.3.0                    tools_4.1.0                 utf8_1.2.2                 
 [13] R6_2.5.1                    DT_0.21                     DBI_1.1.2                  
 [16] BiocGenerics_0.40.0         tidyselect_1.1.2            prettyunits_1.1.1          
 [19] bit_4.0.4                   curl_4.3.2                  compiler_4.1.0             
 [22] cli_3.2.0                   Biobase_2.54.0              xml2_1.3.3                 
 [25] DelayedArray_0.20.0         rtracklayer_1.54.0          rappdirs_0.3.3             
 [28] stringr_1.4.0               digest_0.6.29               Rsamtools_2.10.0           
 [31] piggyback_0.1.1             R.utils_2.11.0              XVector_0.34.0             
 [34] pkgconfig_2.0.3             htmltools_0.5.2             echodata_0.99.7            
 [37] MatrixGenerics_1.6.0        BSgenome_1.62.0             dbplyr_2.1.1               
 [40] fastmap_1.1.0               htmlwidgets_1.5.4           rlang_1.0.2                
 [43] rstudioapi_0.13             RSQLite_2.2.10              BiocIO_1.4.0               
 [46] generics_0.1.2              jsonlite_1.8.0              echoconda_0.99.5           
 [49] BiocParallel_1.28.3         dplyr_1.0.8                 zip_2.2.0                  
 [52] R.oo_1.24.0                 VariantAnnotation_1.40.0    RCurl_1.98-1.6             
 [55] magrittr_2.0.2              GenomeInfoDbData_1.2.7      Matrix_1.4-0               
 [58] Rcpp_1.0.8.3                S4Vectors_0.32.3            fansi_1.0.2                
 [61] reticulate_1.24-9000        lifecycle_1.0.1             R.methodsS3_1.8.1          
 [64] yaml_2.3.5                  stringi_1.7.6               brio_1.1.3                 
 [67] SummarizedExperiment_1.24.0 zlibbioc_1.40.0             BiocFileCache_2.2.1        
 [70] grid_4.1.0                  blob_1.2.2                  parallel_4.1.0             
 [73] crayon_1.5.0                lattice_0.20-45             Biostrings_2.62.0          
 [76] GenomicFeatures_1.46.5      hms_1.1.1                   KEGGREST_1.34.0            
 [79] pillar_1.7.0                GenomicRanges_1.46.1        rjson_0.2.21               
 [82] biomaRt_2.50.3              clisymbols_1.2.0            stats4_4.1.0               
 [85] XML_3.99-0.9                glue_1.6.2                  data.table_1.14.2          
 [88] png_0.1-7                   vctrs_0.3.8                 testthat_3.1.2             
 [91] purrr_0.3.4                 tidyr_1.2.0                 assertthat_0.2.1           
 [94] cachem_1.0.6                openxlsx_4.2.5              echotabix_0.99.3           
 [97] restfulr_0.0.13             gitcreds_0.1.1              tibble_3.1.6               
[100] GenomicAlignments_1.30.0    AnnotationDbi_1.56.2        memoise_2.0.1              
[103] IRanges_2.28.0              ellipsis_0.3.2             

Rsamtools cannot handle csi index?

Hi,

I'm working with the barley genome which has chromosomes exceeding the 2^29 length limit for creating a bai index.
I therefore need to make a csi index.

But Rsamtools seems not to be able to handle bam files that exceed this limit and it seems it cannot handle csi index.

ScanBamParams documentation

I've found the documentation on the usage of ScanBamParams to be opaque. In particular, it wasn't immediately apparent to me that these arguments:


isPaired | A logical(1) indicating whether unpaired (FALSE), paired (TRUE), or any (NA) read should be returned.
-- | --
isProperPair | A logical(1) indicating whether improperly paired (FALSE), properly paired (TRUE), or any (NA) read should be returned. A properly paired read is defined by the alignment algorithm and might, e.g., represent reads aligning to identical reference sequences and with a specified distance.

isUnmappedQuery | A logical(1) indicating whether unmapped (TRUE), mapped (FALSE), or any (NA) read should be returned.

hasUnmappedMate | A logical(1) indicating whether reads with mapped (FALSE), unmapped (TRUE), or any (NA) mate should be returned.

...

In the docs needed to be wrapped in a flag argument like so:

p2 <- ScanBamParam(what=scanBamWhat(),
                   flag=scanBamFlag(isMinusStrand=FALSE))

Clearly, I figured it out, mostly thanks to the online documentation here: https://www.rdocumentation.org/packages/Rsamtools/versions/1.24.0/topics/ScanBamParam

So it wasn't too hard to find good examples. But, I think the documentation could be improved in this section.

Support for delimited tags

Dear Rsamtools team,

I am trying to use the below to retrieve reads that are tagged as duplicates with picard, and distinguished by type with either 'DT:Z:SQ' or 'DT:Z:LB' (default of picard) as optional tags. If I am doing something wrong, can you please advise? If this is not currently supported, can you add a feature for this? If I enter the full tag, I get an errorError in validObject(.Object) : invalid class “ScanBamParam” object: 'tag' must be two letters, e.g,, 'MD'

> DT <- ScanBamParam(tag=c("DT"), what="flag")
> bam.DT <- scanBam(bamFile, param=DT)
> str(bam.DT[[1]][["tag"]])
List of 1
 $ DT: NULL

Support for more recent HTSlib

Hi,

Are there plans to drop the "included" RHTSlib and depend on the bioconductor package itself? This would enable us the use more modern interfaces (csi, cram, ...).

If not, could you give some pointers where best to start to implement this? Our ultimate goals would be to enable cram compatibility for downstream tools.

Thanks.

unable to load shared object '/build/r-rsamtools/src/00LOCK-Rsamtools/00new/Rsamtools/libs/Rsamtools.so'

gcc -I"/usr/include/R/" -DNDEBUG -D_FILE_OFFSET_BITS=64 -I'/usr/lib/R/library/Rhtslib/include' -I'/usr/lib/R/library/S4Vectors/include' -I'/usr/lib/R/library/IRanges/include' -I'/usr/lib/R/library/XVector/include' -I'/usr/lib/R/library/Biostrings/include' -I/usr/local/include   -fpic  -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection -flto=auto  -c bcffile.c -o bcffile.o
gcc -I"/usr/include/R/" -DNDEBUG -D_FILE_OFFSET_BITS=64 -I'/usr/lib/R/library/Rhtslib/include' -I'/usr/lib/R/library/S4Vectors/include' -I'/usr/lib/R/library/IRanges/include' -I'/usr/lib/R/library/XVector/include' -I'/usr/lib/R/library/Biostrings/include' -I/usr/local/include   -fpic  -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection -flto=auto  -c encode.c -o encode.o
gcc -I"/usr/include/R/" -DNDEBUG -D_FILE_OFFSET_BITS=64 -I'/usr/lib/R/library/Rhtslib/include' -I'/usr/lib/R/library/S4Vectors/include' -I'/usr/lib/R/library/IRanges/include' -I'/usr/lib/R/library/XVector/include' -I'/usr/lib/R/library/Biostrings/include' -I/usr/local/include   -fpic  -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection -flto=auto  -c fafile.c -o fafile.o
gcc -I"/usr/include/R/" -DNDEBUG -D_FILE_OFFSET_BITS=64 -I'/usr/lib/R/library/Rhtslib/include' -I'/usr/lib/R/library/S4Vectors/include' -I'/usr/lib/R/library/IRanges/include' -I'/usr/lib/R/library/XVector/include' -I'/usr/lib/R/library/Biostrings/include' -I/usr/local/include   -fpic  -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection -flto=auto  -c idxstats.c -o idxstats.o
In file included from /usr/lib/R/library/Rhtslib/include/sam.h:29,
                 from bamfile.h:5,
                 from idxstats.c:1:
idxstats.c: In function ‘idxstats_bamfile’:
/usr/lib/R/library/Rhtslib/include/bam.h:57:32: warning: ignoring return value of ‘bgzf_seek’ declared with attribute ‘warn_unused_result’ [-Wunused-result]
   57 | #define bam_seek(fp, pos, dir) bgzf_seek(fp, pos, dir)
      |                                ^~~~~~~~~~~~~~~~~~~~~~~
idxstats.c:20:12: note: in expansion of macro ‘bam_seek’
   20 |     (void) bam_seek(fp, 0, 0);
      |            ^~~~~~~~
gcc -I"/usr/include/R/" -DNDEBUG -D_FILE_OFFSET_BITS=64 -I'/usr/lib/R/library/Rhtslib/include' -I'/usr/lib/R/library/S4Vectors/include' -I'/usr/lib/R/library/IRanges/include' -I'/usr/lib/R/library/XVector/include' -I'/usr/lib/R/library/Biostrings/include' -I/usr/local/include   -fpic  -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection -flto=auto  -c io_sam.c -o io_sam.o
In file included from /usr/lib/R/library/Rhtslib/include/sam.h:29,
                 from io_sam.c:2:
io_sam.c: In function ‘_scan_bam_all’:
/usr/lib/R/library/Rhtslib/include/bam.h:57:32: warning: ignoring return value of ‘bgzf_seek’ declared with attribute ‘warn_unused_result’ [-Wunused-result]
   57 | #define bam_seek(fp, pos, dir) bgzf_seek(fp, pos, dir)
      |                                ^~~~~~~~~~~~~~~~~~~~~~~
io_sam.c:304:12: note: in expansion of macro ‘bam_seek’
  304 |     (void) bam_seek(bfile->file->x.bam, bfile->pos0, SEEK_SET);
      |            ^~~~~~~~
g++ -std=gnu++14 -I"/usr/include/R/" -DNDEBUG -D_FILE_OFFSET_BITS=64 -I'/usr/lib/R/library/Rhtslib/include' -I'/usr/lib/R/library/S4Vectors/include' -I'/usr/lib/R/library/IRanges/include' -I'/usr/lib/R/library/XVector/include' -I'/usr/lib/R/library/Biostrings/include' -I/usr/local/include   -fpic  -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection -Wp,-D_GLIBCXX_ASSERTIONS -flto=auto  -c pbuffer_wrapper.cpp -o pbuffer_wrapper.o
g++ -std=gnu++14 -I"/usr/include/R/" -DNDEBUG -D_FILE_OFFSET_BITS=64 -I'/usr/lib/R/library/Rhtslib/include' -I'/usr/lib/R/library/S4Vectors/include' -I'/usr/lib/R/library/IRanges/include' -I'/usr/lib/R/library/XVector/include' -I'/usr/lib/R/library/Biostrings/include' -I/usr/local/include   -fpic  -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection -Wp,-D_GLIBCXX_ASSERTIONS -flto=auto  -c pileup.cpp -o pileup.o
gcc -I"/usr/include/R/" -DNDEBUG -D_FILE_OFFSET_BITS=64 -I'/usr/lib/R/library/Rhtslib/include' -I'/usr/lib/R/library/S4Vectors/include' -I'/usr/lib/R/library/IRanges/include' -I'/usr/lib/R/library/XVector/include' -I'/usr/lib/R/library/Biostrings/include' -I/usr/local/include   -fpic  -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection -flto=auto  -c pileupbam.c -o pileupbam.o
gcc -I"/usr/include/R/" -DNDEBUG -D_FILE_OFFSET_BITS=64 -I'/usr/lib/R/library/Rhtslib/include' -I'/usr/lib/R/library/S4Vectors/include' -I'/usr/lib/R/library/IRanges/include' -I'/usr/lib/R/library/XVector/include' -I'/usr/lib/R/library/Biostrings/include' -I/usr/local/include   -fpic  -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection -flto=auto  -c sam.c -o sam.o
gcc -I"/usr/include/R/" -DNDEBUG -D_FILE_OFFSET_BITS=64 -I'/usr/lib/R/library/Rhtslib/include' -I'/usr/lib/R/library/S4Vectors/include' -I'/usr/lib/R/library/IRanges/include' -I'/usr/lib/R/library/XVector/include' -I'/usr/lib/R/library/Biostrings/include' -I/usr/local/include   -fpic  -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection -flto=auto  -c sam_opts.c -o sam_opts.o
gcc -I"/usr/include/R/" -DNDEBUG -D_FILE_OFFSET_BITS=64 -I'/usr/lib/R/library/Rhtslib/include' -I'/usr/lib/R/library/S4Vectors/include' -I'/usr/lib/R/library/IRanges/include' -I'/usr/lib/R/library/XVector/include' -I'/usr/lib/R/library/Biostrings/include' -I/usr/local/include   -fpic  -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection -flto=auto  -c sam_utils.c -o sam_utils.o
gcc -I"/usr/include/R/" -DNDEBUG -D_FILE_OFFSET_BITS=64 -I'/usr/lib/R/library/Rhtslib/include' -I'/usr/lib/R/library/S4Vectors/include' -I'/usr/lib/R/library/IRanges/include' -I'/usr/lib/R/library/XVector/include' -I'/usr/lib/R/library/Biostrings/include' -I/usr/local/include   -fpic  -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection -flto=auto  -c samtools_patch.c -o samtools_patch.o
gcc -I"/usr/include/R/" -DNDEBUG -D_FILE_OFFSET_BITS=64 -I'/usr/lib/R/library/Rhtslib/include' -I'/usr/lib/R/library/S4Vectors/include' -I'/usr/lib/R/library/IRanges/include' -I'/usr/lib/R/library/XVector/include' -I'/usr/lib/R/library/Biostrings/include' -I/usr/local/include   -fpic  -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection -flto=auto  -c scan_bam_data.c -o scan_bam_data.o
gcc -I"/usr/include/R/" -DNDEBUG -D_FILE_OFFSET_BITS=64 -I'/usr/lib/R/library/Rhtslib/include' -I'/usr/lib/R/library/S4Vectors/include' -I'/usr/lib/R/library/IRanges/include' -I'/usr/lib/R/library/XVector/include' -I'/usr/lib/R/library/Biostrings/include' -I/usr/local/include   -fpic  -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection -flto=auto  -c tabixfile.c -o tabixfile.o
tabixfile.c: In function ‘index_tabix’:
tabixfile.c:190:5: warning: ‘bgzf_is_bgzf’ is deprecated: Use bgzf_compression() or hts_detect_format() instead [-Wdeprecated-declarations]
  190 |     if (bgzf_is_bgzf(fn) != 1)
      |     ^~
In file included from tabixfile.c:3:
/usr/lib/R/library/Rhtslib/include/htslib/bgzf.h:243:9: note: declared here
  243 |     int bgzf_is_bgzf(const char *fn) HTS_DEPRECATED("Use bgzf_compression() or hts_detect_format() instead");
      |         ^~~~~~~~~~~~
gcc -I"/usr/include/R/" -DNDEBUG -D_FILE_OFFSET_BITS=64 -I'/usr/lib/R/library/Rhtslib/include' -I'/usr/lib/R/library/S4Vectors/include' -I'/usr/lib/R/library/IRanges/include' -I'/usr/lib/R/library/XVector/include' -I'/usr/lib/R/library/Biostrings/include' -I/usr/local/include   -fpic  -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection -flto=auto  -c tagfilter.c -o tagfilter.o
gcc -I"/usr/include/R/" -DNDEBUG -D_FILE_OFFSET_BITS=64 -I'/usr/lib/R/library/Rhtslib/include' -I'/usr/lib/R/library/S4Vectors/include' -I'/usr/lib/R/library/IRanges/include' -I'/usr/lib/R/library/XVector/include' -I'/usr/lib/R/library/Biostrings/include' -I/usr/local/include   -fpic  -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection -flto=auto  -c utilities.c -o utilities.o
gcc -I"/usr/include/R/" -DNDEBUG -D_FILE_OFFSET_BITS=64 -I'/usr/lib/R/library/Rhtslib/include' -I'/usr/lib/R/library/S4Vectors/include' -I'/usr/lib/R/library/IRanges/include' -I'/usr/lib/R/library/XVector/include' -I'/usr/lib/R/library/Biostrings/include' -I/usr/local/include   -fpic  -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection -flto=auto  -c zip_compression.c -o zip_compression.o
g++ -std=gnu++14 -shared -L/usr/lib64/R/lib -Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now -flto=auto -o Rsamtools.so Biostrings_stubs.o COMPAT_bcf_hdr_read.o IRanges_stubs.o PileupBuffer.o PosCacheColl.o R_init_Rsamtools.o ResultManager.o S4Vectors_stubs.o XVector_stubs.o as_bam.o bam.o bam_data.o bam_mate_iter.o bam_plbuf.o bam_sort.o bambuffer.o bamfile.o bcffile.o encode.o fafile.o idxstats.o io_sam.o pbuffer_wrapper.o pileup.o pileupbam.o sam.o sam_opts.o sam_utils.o samtools_patch.o scan_bam_data.o tabixfile.o tagfilter.o utilities.o zip_compression.o /usr/lib/R/library/Rhtslib/usrlib/libhts.a -lcurl -L/usr/lib64/R/lib -lR
installing to /build/r-rsamtools/src/00LOCK-Rsamtools/00new/Rsamtools/libs
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded from temporary location
Error: package or namespace load failed for ‘Rsamtools’ in dyn.load(file, DLLpath = DLLpath, ...):
 unable to load shared object '/build/r-rsamtools/src/00LOCK-Rsamtools/00new/Rsamtools/libs/Rsamtools.so':
  /build/r-rsamtools/src/00LOCK-Rsamtools/00new/Rsamtools/libs/Rsamtools.so: undefined symbol: hts_open_format

scanTabix should return a stream

I am having a problem with scanTabix.

I am reading tabix-indexed data that I would like to parse into a data.frame, quickly.

All I expect scanTabix to do is to give me a stream that I can feed into a table parsing function, together with defined columns.

scanTabix instead provides a list of strings. So I have to paste those together to get a parseable format.

That errors out because paste0 implementation fails to produce a string > 2GB in size.

Can you please offer a stream option to get data out of scanTabix? You must have a stream under the covers, you just choose to parse it by newlines, which is something that not only does not help me, it actually harms performance of my code...

indexFa breaks FaFile object when using getSeq

Hi,

I have noticed some odd behavior using the getSeq function with a FaFile object as input.

fa <- FaFile("test.fasta")
fa
> class: FaFile 
> path: test.fasta
> index: test.fasta.fai
> isOpen: FALSE 
> yieldSize: NA 
getSeq(FaFile(fasta),rtracklayer::import.gff3(gff))
> A DNAStringSet instance of length 1
>      width seq                          names               
> [1]     5 AGCTA                         chr1

The index file is automatically created, when I run getSeq.

However, If I now run the indexFa function, the FaFile object becomes unusable

indexFa(fa)
> class: FaFile 
> path: test.fasta
> index: test.fasta
> isOpen: FALSE 
> yieldSize: NA 
getSeq(fa,GRanges("chr1",IRanges(1,5)))
> Error in value[[3L]](cond) :  record 1 (chr1:1-5) failed
>   file: test.fasta

I noticed that the pointer for the index does revert to the fasta file itself, which might be the underlying problem.
test.zip

I probably don't need to run the indexFa function to retrieve sequences. However, I noticed this behavior and just wanted to let someone know.

sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252    LC_MONETARY=German_Germany.1252 LC_NUMERIC=C                   
[5] LC_TIME=German_Germany.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] RNAmodR_0.0.2        Rsamtools_1.34.0     Modstrings_0.99.12   Biostrings_2.50.1    XVector_0.22.0       GenomicRanges_1.34.0
 [7] GenomeInfoDb_1.18.1  IRanges_2.16.0       S4Vectors_0.20.1     BiocGenerics_0.28.0 

loaded via a namespace (and not attached):
  [1] assertive.base_0.0-7        colorspace_1.3-2            colorRamps_2.3              biovizBase_1.30.1          
  [5] htmlTable_1.13.1            base64enc_0.1-3             dichromat_2.0-0             rstudioapi_0.8             
  [9] assertive.sets_0.0-3        bit64_0.9-7                 AnnotationDbi_1.44.0        assertive.data.uk_0.0-2    
 [13] codetools_0.2-15            splines_3.5.2               knitr_1.21                  Formula_1.2-3              
 [17] assertive_0.3-5             assertive.data.us_0.0-2     cluster_2.0.7-1             compiler_3.5.2             
 [21] httr_1.4.0                  backports_1.1.3             assertthat_0.2.0            Matrix_1.2-15              
 [25] lazyeval_0.2.1              acepack_1.4.1               htmltools_0.3.6             prettyunits_1.0.2          
 [29] tools_3.5.2                 bindrcpp_0.2.2              gtable_0.2.0                glue_1.3.0                 
 [33] GenomeInfoDbData_1.2.0      reshape2_1.4.3              dplyr_0.7.8                 Rcpp_1.0.0                 
 [37] Biobase_2.42.0              gdata_2.18.0                rtracklayer_1.42.1          assertive.files_0.0-2      
 [41] assertive.datetimes_0.0-2   assertive.models_0.0-2      xfun_0.4                    stringr_1.3.1              
 [45] ensembldb_2.6.3             gtools_3.8.1                XML_3.98-1.16               zlibbioc_1.28.0            
 [49] scales_1.0.0                BSgenome_1.50.0             VariantAnnotation_1.28.7    hms_0.4.2                  
 [53] ProtGenerics_1.14.0         SummarizedExperiment_1.12.0 AnnotationFilter_1.6.0      RColorBrewer_1.1-2         
 [57] assertive.matrices_0.0-2    assertive.strings_0.0-3     yaml_2.2.0                  curl_3.2                   
 [61] memoise_1.1.0               gridExtra_2.3               ggplot2_3.1.0               biomaRt_2.38.0             
 [65] rpart_4.1-13                latticeExtra_0.6-28         stringi_1.2.4               RSQLite_2.1.1              
 [69] checkmate_1.8.5             GenomicFeatures_1.34.1      caTools_1.17.1.1            BiocParallel_1.16.5        
 [73] rlang_0.3.0.1               pkgconfig_2.0.2             matrixStats_0.54.0          bitops_1.0-6               
 [77] lattice_0.20-38             assertive.data_0.0-3        ROCR_1.0-7                  purrr_0.2.5                
 [81] bindr_0.1.1                 GenomicAlignments_1.18.1    htmlwidgets_1.3             assertive.properties_0.0-4 
 [85] bit_1.1-14                  tidyselect_0.2.5            assertive.code_0.0-3        plyr_1.8.4                 
 [89] magrittr_1.5                R6_2.3.0                    gplots_3.0.1                Hmisc_4.1-1                
 [93] DelayedArray_0.8.0          DBI_1.0.0                   pillar_1.3.1                foreign_0.8-71             
 [97] assertive.numbers_0.0-2     survival_2.43-3             RCurl_1.95-4.11             nnet_7.3-12                
[101] tibble_1.4.2                crayon_1.3.4                assertive.types_0.0-3       KernSmooth_2.23-15         
[105] progress_1.2.0              grid_3.5.2                  data.table_1.11.8           blob_1.1.1                 
[109] digest_0.6.18               munsell_0.5.0               assertive.reflection_0.0-4  Gviz_1.26.4  

`Rsamtools::indexTabix`: Capture console output

I recently posted an error I've been encountering with seqminer (here) but realized it seems to trace all the way back to tabix itself.

Regardless, in cases where I get the following kind of error...

[E::hts_idx_push] Unsorted positions on sequence #2: 71999802 followed by 7

...I'd like to wrap the function in atryCatch and capture the messages so that I rerun it with the offending lines in the skip argument. Something like:

out <- tryCatch({  
               Rsamtools::indexTabix(file = bgz_file,
                                     seq = 2,
                                     start = 3,
                                     end = 3, 
                                     comment ="SNP"
               ) 
           })

However, I can't seem to get the output into R. I've tried:

  • tryCatch (above)
  • capture.output
  • sink()

Do you know a way to extract this info?

Thanks,
Brian

Session info

R version 4.1.0 (2021-05-18)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 11.4

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] arrow_7.0.0      ggimage_0.3.0    ggplot2_3.3.5    dplyr_1.0.8      hexSticker_0.4.9
[6] echotabix_0.99.3

loaded via a namespace (and not attached):
  [1] AnnotationHub_3.2.2           BiocFileCache_2.2.1           systemfonts_1.0.4            
  [4] igraph_1.2.11                 BiocParallel_1.28.3           GenomeInfoDb_1.30.1          
  [7] digest_0.6.29                 yulab.utils_0.0.4             htmltools_0.5.2              
 [10] magick_2.7.3                  fansi_1.0.2                   magrittr_2.0.2               
 [13] memoise_2.0.1                 BSgenome_1.62.0               echoverseTemplate_0.99.0     
 [16] ontologyPlot_1.6              openxlsx_4.2.5                Biostrings_2.62.0            
 [19] matrixStats_0.61.0            R.utils_2.11.0                sysfonts_0.8.5               
 [22] prettyunits_1.1.1             colorspace_2.0-3              blob_1.2.2                   
 [25] rappdirs_0.3.3                textshaping_0.3.6             xfun_0.30                    
 [28] crayon_1.5.0                  RCurl_1.98-1.6                echodata_0.99.6              
 [31] jsonlite_1.8.0                hexbin_1.28.2                 graph_1.72.0                 
 [34] VariantAnnotation_1.40.0      glue_1.6.2                    gtable_0.3.0                 
 [37] zlibbioc_1.40.0               XVector_0.34.0                DelayedArray_0.20.0          
 [40] Rgraphviz_2.38.0              BiocGenerics_0.40.0           scales_1.1.1                 
 [43] DBI_1.1.2                     Rcpp_1.0.8.2                  showtextdb_3.0               
 [46] xtable_1.8-4                  progress_1.2.2                gridGraphics_0.5-1           
 [49] bit_4.0.4                     clisymbols_1.2.0              stats4_4.1.0                 
 [52] DT_0.21                       htmlwidgets_1.5.4             httr_1.4.2                   
 [55] ontologyIndex_2.7             ellipsis_0.3.2                pkgconfig_2.0.3              
 [58] XML_3.99-0.9                  R.methodsS3_1.8.1             farver_2.1.0                 
 [61] seqminer_8.4                  dbplyr_2.1.1                  utf8_1.2.2                   
 [64] here_1.0.1                    ggplotify_0.1.0               tidyselect_1.1.2             
 [67] labeling_0.4.2                rlang_1.0.2                   later_1.3.0                  
 [70] AnnotationDbi_1.56.2          BiocVersion_3.14.0            munsell_0.5.0                
 [73] tools_4.1.0                   cachem_1.0.6                  cli_3.2.0                    
 [76] generics_0.1.2                RSQLite_2.2.10                evaluate_0.15                
 [79] stringr_1.4.0                 fastmap_1.1.0                 yaml_2.3.5                   
 [82] ragg_1.2.2                    knitr_1.37                    bit64_4.0.5                  
 [85] fs_1.5.2                      zip_2.2.0                     purrr_0.3.4                  
 [88] KEGGREST_1.34.0               gh_1.3.0                      showtext_0.9-5               
 [91] mime_0.12                     R.oo_1.24.0                   xml2_1.3.3                   
 [94] biomaRt_2.50.3                brio_1.1.3                    compiler_4.1.0               
 [97] rstudioapi_0.13               interactiveDisplayBase_1.32.0 filelock_1.0.2               
[100] curl_4.3.2                    png_0.1-7                     testthat_3.1.2               
[103] paintmap_1.0                  tibble_3.1.6                  stringi_1.7.6                
[106] GenomicFeatures_1.46.5        desc_1.4.1                    lattice_0.20-45              
[109] Matrix_1.4-0                  vctrs_0.3.8                   pillar_1.7.0                 
[112] lifecycle_1.0.1               BiocManager_1.30.16           data.table_1.14.2            
[115] bitops_1.0-7                  httpuv_1.6.5                  rtracklayer_1.54.0           
[118] GenomicRanges_1.46.1          R6_2.5.1                      BiocIO_1.4.0                 
[121] promises_1.2.0.1              IRanges_2.28.0                ontoProc_1.16.0              
[124] assertthat_0.2.1              pkgload_1.2.4                 SummarizedExperiment_1.24.0  
[127] rprojroot_2.0.2               rjson_0.2.21                  withr_2.5.0                  
[130] GenomicAlignments_1.30.0      Rsamtools_2.10.0              S4Vectors_0.32.3             
[133] GenomeInfoDbData_1.2.7        parallel_4.1.0                hms_1.1.1                    
[136] grid_4.1.0                    ggfun_0.0.5                   tidyr_1.2.0                  
[139] rmarkdown_2.13                MatrixGenerics_1.6.0          piggyback_0.1.1              
[142] Biobase_2.54.0                shiny_1.7.1                   lubridate_1.8.0              
[145] restfulr_0.0.13          

installation error in Microsoft R Open (R4.0.2)

Here are the error message:

* installing *source* package ‘Rsamtools’ ...
** using staged installation
** libs
gcc -std=gnu99 -I/opt/microsoft/ropen/4.0.2/lib64/R/include -DNDEBUG Microsoft R Open 4.0.2 The enhanced R distribution from Microsoft Microsoft packages Copyright (C) 2020 Microsoft Corporation  Using the Intel MKL for parallel mathematical computing (using 48 cores).  Default CRAN mirror snapshot taken on 2020-07-16. See: https://mran.microsoft.com/.  -D_FILE_OFFSET_BITS=64 -I'/opt/microsoft/ropen/4.0.2/lib64/R/library/Rhtslib/include' -I'/opt/microsoft/ropen/4.0.2/lib64/R/library/S4Vectors/include' -I'/opt/microsoft/ropen/4.0.2/lib64/R/library/IRanges/include' -I'/opt/microsoft/ropen/4.0.2/lib64/R/library/XVector/include' -I'/opt/microsoft/ropen/4.0.2/lib64/R/library/Biostrings/include' -DU_STATIC_IMPLEMENTATION   -fpic  -DU_STATIC_IMPLEMENTATION -O2 -g  -c Biostrings_stubs.c -o Biostrings_stubs.o
/bin/bash: -c:line 0: syntax error near unexpected token `('
/bin/bash: -c:line 0: `gcc -std=gnu99 -I/opt/microsoft/ropen/4.0.2/lib64/R/include -DNDEBUG Microsoft R Open 4.0.2 The enhanced R distribution from Microsoft Microsoft packages Copyright (C) 2020 Microsoft Corporation  Using the Intel MKL for parallel mathematical computing (using 48 cores).  Default CRAN mirror snapshot taken on 2020-07-16. See: https://mran.microsoft.com/.  -D_FILE_OFFSET_BITS=64 -I'/opt/microsoft/ropen/4.0.2/lib64/R/library/Rhtslib/include' -I'/opt/microsoft/ropen/4.0.2/lib64/R/library/S4Vectors/include' -I'/opt/microsoft/ropen/4.0.2/lib64/R/library/IRanges/include' -I'/opt/microsoft/ropen/4.0.2/lib64/R/library/XVector/include' -I'/opt/microsoft/ropen/4.0.2/lib64/R/library/Biostrings/include' -DU_STATIC_IMPLEMENTATION   -fpic  -DU_STATIC_IMPLEMENTATION -O2 -g  -c Biostrings_stubs.c -o Biostrings_stubs.o'
make: *** [Biostrings_stubs.o] error 1
ERROR: compilation failed for package ‘Rsamtools’
* removing ‘/opt/microsoft/ropen/4.0.2/lib64/R/library/Rsamtools’

The downloaded source packages are in
	‘/tmp/RtmpS8bLL3/downloaded_packages’
Updating HTML index of packages in '.Library'
Making 'packages.html' ... done
Warning message:
In .inet_warning(msg) :
  installation of package ‘Rsamtools’ had non-zero exit status

It seems that the error came from /bin/bash: -c:line 0: syntax error near unexpected token `(' and [Biostrings_stubs.o] error 1

> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS:   /opt/microsoft/ropen/4.0.2/lib64/R/lib/libRblas.so
LAPACK: /opt/microsoft/ropen/4.0.2/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US             LC_NUMERIC=C              
 [3] LC_TIME=en_US              LC_COLLATE=en_US          
 [5] LC_MONETARY=en_US          LC_MESSAGES=en_US         
 [7] LC_PAPER=zh_CN.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=zh_CN.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] RevoUtils_11.0.2     RevoUtilsMath_11.0.0

loaded via a namespace (and not attached):
[1] BiocManager_1.30.16 compiler_4.0.2      tools_4.0.2

request to add support for pacbio unaligned bam files?

hi there,

I'm beginning to look at some PacBio CCS data. It looks like PacBio read data now come as bam files, even for unmapped reads:
https://pacbiofileformats.readthedocs.io/en/5.1/index.html
https://pacbiofileformats.readthedocs.io/en/5.1/BAM.html

I'd love to be able to read in this bam file to R directly (and include some of those tags, much like scanBam does for bam files of mapped reads).

scanBam seems like an obvious choice, but it won't read the PacBio bam files because the headers don't contain @sq lines (because the reads haven't yet been mapped to a genome). Would it be possible to remove this restriction on scanBam, that reads should have been mapped to a genome and therefore have @sq lines in the header? The asSam function seems to have the same restriction (but command-line 'samtools view' works fine on these files).

I realize that most of the fields usually returned by scanBam are not relevant for unmapped reads, but the infrastructure it provides for reading in the bam file and parsing extra tags seems really useful here. I know as a workaround I can use samtools on the command line and then 'scan', or convert the bam to fastq format, but using the bam file directly would be great in future.

An example of what I'd like to do is below.

thanks for considering!

Janet

## get an example file
wget https://downloads.pacbcloud.com/public/dataset/HG002_SV_and_SNV_CCS/consensusreads/m64011_181218_235052.consensusreads.bam
wget https://downloads.pacbcloud.com/public/dataset/HG002_SV_and_SNV_CCS/consensusreads/m64011_181218_235052.consensusreads.bam.bai    

## start R
R
library(Rsamtools)

bam <- scanBam("m64011_181218_235052.consensusreads.bam")
[samopen] no @SQ lines in the header.
Error in value[[3L]](cond) : 
  failed to open BamFile: SAM/BAM header missing or empty
  file: 'm64011_181218_235052.consensusreads.bam'


sessionInfo()

R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.5 LTS

Matrix products: default
BLAS/LAPACK: /app/easybuild/software/OpenBLAS/0.2.18-GCC-5.4.0-2.26-LAPACK-3.6.1/lib/libopenblas_prescottp-r0.2.18.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
[1] Rsamtools_2.2.0      Biostrings_2.54.0    XVector_0.24.0      
[4] GenomicRanges_1.36.0 GenomeInfoDb_1.20.0  IRanges_2.18.1      
[7] S4Vectors_0.22.0     BiocGenerics_0.32.0 

loaded via a namespace (and not attached):
[1] zlibbioc_1.30.0        compiler_3.6.1         tools_3.6.1           
[4] GenomeInfoDbData_1.2.1 RCurl_1.95-4.12        BiocParallel_1.18.0   
[7] bitops_1.0-6          

Using both "yileldSize" and "which" parameters

Dear developers, Hello.

I was wondering if the "yileldSize" in BamFile() is ignored when importing reads using readGAlignmentsList(), if also "which" parameter is defined in scanBamFlag(). In other words if we define "which" in scanBamFlag() function, readGAlignmentsList() will just import all the reads that have been mapped to the Genomic coordinates defined in which. Is there anyway to run readGAlignmentsList() so that it would import reads mapping to the coordinates defined using "which" parameter of scanBamFlag(), and load at maximum only 100,000 of these reads (i.e. considering the yieldSize=100000 param setting in BamFile() ). So would this sequence of codes take the yieldSize=100000 into account and load only 100000 paired reads ?

bf<- Rsamtools::BamFile(bamFile, yieldSize=100000, 
  asMates=TRUE)

scParam=Rsamtools::ScanBamParam(
  what=Rsamtools::scanBamWhat()[c(1,
    3,5,8,13,9, 10, 6, 4, 14, 15)], 
  flag=Rsamtools::scanBamFlag(isPaired=TRUE,
    isDuplicate=NA))

yld<- GenomicAlignments::readGAlignmentPairs(bf, 
  param=scParam)

Cheers,

Ali

Symbol not found: _bcf_float_missing

Hi,

Installation of Rsamtools from source (2.6.0) failed with a following error;

Error: package or namespace load failed for ‘Rsamtools’ in dyn.load(file, DLLpath = DLLpath, ...):
 unable to load shared object '/Users/ahoji/Documents/R_lib_4/00LOCK-Rsamtools/00new/Rsamtools/libs/Rsamtools.so':
  dlopen(/Users/ahoji/Documents/R_lib_4/00LOCK-Rsamtools/00new/Rsamtools/libs/Rsamtools.so, 6): Symbol not found: _bcf_float_missing
  Referenced from: /Users/ahoji/Documents/R_lib_4/00LOCK-Rsamtools/00new/Rsamtools/libs/Rsamtools.so
  Expected in: flat namespace
 in /Users/ahoji/Documents/R_lib_4/00LOCK-Rsamtools/00new/Rsamtools/libs/Rsamtools.so
Error: loading failed
Execution halted

I don't how to get a symbol, _bcf_float_missing found.

Also, I tried a binary version (2.6.0) and it did not load with a following error;

Error: package or namespace load failed for ‘Rsamtools’ in dyn.load(file, DLLpath = DLLpath, ...):
 unable to load shared object '/Users/ahoji/Documents/R_lib_4/Rsamtools/libs/Rsamtools.so':
  dlopen(/Users/ahoji/Documents/R_lib_4/Rsamtools/libs/Rsamtools.so, 6): Symbol not found: _lzma_code
  Referenced from: /Users/ahoji/Documents/R_lib_4/Rsamtools/libs/Rsamtools.so
  Expected in: /usr/local/Cellar/r/4.0.5/lib/R/lib/libR.dylib
 in /Users/Documents/R_lib_4/Rsamtools/libs/Rsamtools.so

This time it is a _lzma_code.

I'd appreciate any pointers to address this issue.

~ sw_vers
ProductName:	Mac OS X
ProductVersion:	10.14.6
BuildVersion:	18G8022
clang -v
clang version 12.0.0
Target: x86_64-apple-darwin18.7.0
Thread model: posix
InstalledDir: /usr/local/opt/llvm/bin
> sessionInfo()
R version 4.0.5 (2021-03-31)
Platform: x86_64-apple-darwin18.7.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /usr/local/Cellar/r/4.0.5/lib/R/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] Biostrings_2.58.0           XVector_0.30.0              GenomicRanges_1.42.0       
 [4] GenomeInfoDb_1.26.7         IRanges_2.24.1              S4Vectors_0.28.1           
 [7] BiocGenerics_0.36.1         BiocManager_1.30.12         remotes_2.3.0              
[10] microbiomeMarker_0.0.1.9000 phyloseq_1.34.0             edgeR_3.32.1               
[13] limma_3.46.0               

loaded via a namespace (and not attached):
  [1] circlize_0.4.13.1001        plyr_1.8.6                  igraph_1.2.6               
  [4] lazyeval_0.2.2              splines_4.0.5               BiocParallel_1.24.1        
  [7] ggplot2_3.3.3.9000          digest_0.6.27               foreach_1.5.1              
 [10] fansi_0.4.2                 magrittr_2.0.1              memoise_2.0.0              
 [13] cluster_2.1.2               doParallel_1.0.16           ComplexHeatmap_2.7.10.9002 
 [16] annotate_1.68.0             matrixStats_0.58.0          metagenomeSeq_1.32.0       
 [19] prettyunits_1.1.1           colorspace_2.0-0            blob_1.2.1                 
 [22] rbibutils_2.1               xfun_0.22                   dplyr_1.0.5                
 [25] callr_3.7.0                 crayon_1.4.1                RCurl_1.98-1.3             
 [28] jsonlite_1.7.2              genefilter_1.72.1           survival_3.2-11            
 [31] iterators_1.0.13            ape_5.5                     glue_1.4.2                 
 [34] gtable_0.3.0                zlibbioc_1.36.0             GetoptLong_1.0.5           
 [37] DelayedArray_0.16.3         pkgbuild_1.2.0              Rhdf5lib_1.12.1            
 [40] shape_1.4.5                 scales_1.1.1                DBI_1.1.1                  
 [43] Rcpp_1.0.6                  xtable_1.8-4                progress_1.2.2             
 [46] clue_0.3-59                 tidytree_0.3.3              bit_4.0.4                  
 [49] glmnet_4.1-1                httr_1.4.2                  gplots_3.1.1               
 [52] RColorBrewer_1.1-2          ellipsis_0.3.1              pkgconfig_2.0.3            
 [55] XML_3.99-0.6                locfit_1.5-9.4              utf8_1.2.1                 
 [58] tidyselect_1.1.0            rlang_0.4.10                reshape2_1.4.4             
 [61] AnnotationDbi_1.52.0        munsell_0.5.0               tools_4.0.5                
 [64] cachem_1.0.4                cli_2.5.0                   generics_0.1.0             
 [67] RSQLite_2.2.7               ade4_1.7-16                 biomformat_1.18.0          
 [70] stringr_1.4.0               fastmap_1.1.0               yaml_2.2.1                 
 [73] ggtree_2.4.1                processx_3.5.1              bit64_4.0.5                
 [76] caTools_1.18.2              purrr_0.3.4                 ANCOMBC_1.0.5              
 [79] nlme_3.1-152                aplot_0.0.6                 compiler_4.0.5             
 [82] rstudioapi_0.13             curl_4.3                    png_0.1-7                  
 [85] treeio_1.14.3               tibble_3.1.1                geneplotter_1.68.0         
 [88] stringi_1.5.3               ps_1.6.0                    lattice_0.20-41            
 [91] Matrix_1.3-2                nloptr_1.2.2.2              vegan_2.5-7                
 [94] microbiome_1.13.9           permute_0.9-5               multtest_2.46.0            
 [97] vctrs_0.3.7                 pillar_1.6.0                lifecycle_1.0.0            
[100] rhdf5filters_1.2.0          Rdpack_2.1.1                GlobalOptions_0.1.2        
[103] data.table_1.14.1           bitops_1.0-7                patchwork_1.1.1            
[106] R6_2.5.0                    KernSmooth_2.23-18          codetools_0.2-18           
[109] MASS_7.3-53.1               gtools_3.8.2                assertthat_0.2.1           
[112] Wrench_1.8.0                rhdf5_2.34.0                SummarizedExperiment_1.20.0
[115] rprojroot_2.0.2             DESeq2_1.30.1               rjson_0.2.20               
[118] withr_2.4.2                 GenomeInfoDbData_1.2.4      mgcv_1.8-35                
[121] hms_1.0.0                   grid_4.0.5                  tidyr_1.1.3                
[124] rvcheck_0.1.8               MatrixGenerics_1.2.1        Cairo_1.5-12.2             
[127] Rtsne_0.15                  Biobase_2.50.0              tinytex_0.31    



Rsamtools::seqinfo() with corrupt BamFile messes up R session

I'm developing a package that uses seqinfo() on a BamFileList (but I've narrowed the problem down to BamFile), and apparently it messes up the R session so that it cannot make any more system calls.

To reproduce:

bamInfo <- Rsamtools::seqinfo(Rsamtools::BamFile('corrupt.bam'))
Sys.which('hello-there')  # breaks here

Output:

error: cannot popen '/usr/bin/which 'hello-there' 2>/dev/null', probable reason 'Cannot allocate memory'

I cannot even show you sessionInfo after this has happened:

> sessionInfo()
Error in system(paste(which, shQuote(names[i])), intern = TRUE, ignore.stderr = TRUE) : 
  cannot popen '/usr/bin/which 'uname' 2>/dev/null', probable reason 'Cannot allocate memory'

This is definitely not a RAM problem--the system has got plenty. Strangely enough, the function seems to work still: I still get the seqinfo I need. I'm not sure exactly what's wrong with the one BAM file I've been using, but this hasn't happened with any others. I can get you the one that's been causing this problem if you'd like.

Rsamtools.so: undefined symbol: hts_open_format

I’m having the very same problem as the now closed issue #35, also when installing using BiocManager:

> library(BiocManager); BiocManager::install("Rsamtools")
Bioconductor version 3.15 (BiocManager 1.30.17), R 4.2.0 (2022-04-22)
Installing package(s) 'Rsamtools'
trying URL 'https://bioconductor.org/packages/3.15/bioc/src/contrib/Rsamtools_2.12.0.tar.gz'
Content type 'application/x-gzip' length 2870358 bytes (2.7 MB)
==================================================
downloaded 2.7 MB

* installing *source* package ‘Rsamtools’ ...
** using staged installation
** libs
gcc -I"/usr/include/R/" -DNDEBUG -D_FILE_OFFSET_BITS=64 -I'/usr/lib/R/library/Rhtslib/include' -I'/usr/lib/R/library/S4Vectors/include' -I'/usr/lib/R/library/IRanges/include' -I'/usr/lib/R/library/XVector/include' -I'/usr/lib/R/library/Biostrings/include' -I/usr/local/include   -fpic  -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection -flto=auto  -c Biostrings_stubs.c -o Biostrings_stubs.o
gcc -I"/usr/include/R/" -DNDEBUG -D_FILE_OFFSET_BITS=64 -I'/usr/lib/R/library/Rhtslib/include' -I'/usr/lib/R/library/S4Vectors/include' -I'/usr/lib/R/library/IRanges/include' -I'/usr/lib/R/library/XVector/include' -I'/usr/lib/R/library/Biostrings/include' -I/usr/local/include   -fpic  -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection -flto=auto  -c COMPAT_bcf_hdr_read.c -o COMPAT_bcf_hdr_read.o
gcc -I"/usr/include/R/" -DNDEBUG -D_FILE_OFFSET_BITS=64 -I'/usr/lib/R/library/Rhtslib/include' -I'/usr/lib/R/library/S4Vectors/include' -I'/usr/lib/R/library/IRanges/include' -I'/usr/lib/R/library/XVector/include' -I'/usr/lib/R/library/Biostrings/include' -I/usr/local/include   -fpic  -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection -flto=auto  -c IRanges_stubs.c -o IRanges_stubs.o
g++ -std=gnu++14 -I"/usr/include/R/" -DNDEBUG -D_FILE_OFFSET_BITS=64 -I'/usr/lib/R/library/Rhtslib/include' -I'/usr/lib/R/library/S4Vectors/include' -I'/usr/lib/R/library/IRanges/include' -I'/usr/lib/R/library/XVector/include' -I'/usr/lib/R/library/Biostrings/include' -I/usr/local/include   -fpic  -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection -Wp,-D_GLIBCXX_ASSERTIONS -flto=auto  -c PileupBuffer.cpp -o PileupBuffer.o
g++ -std=gnu++14 -I"/usr/include/R/" -DNDEBUG -D_FILE_OFFSET_BITS=64 -I'/usr/lib/R/library/Rhtslib/include' -I'/usr/lib/R/library/S4Vectors/include' -I'/usr/lib/R/library/IRanges/include' -I'/usr/lib/R/library/XVector/include' -I'/usr/lib/R/library/Biostrings/include' -I/usr/local/include   -fpic  -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection -Wp,-D_GLIBCXX_ASSERTIONS -flto=auto  -c PosCacheColl.cpp -o PosCacheColl.o
gcc -I"/usr/include/R/" -DNDEBUG -D_FILE_OFFSET_BITS=64 -I'/usr/lib/R/library/Rhtslib/include' -I'/usr/lib/R/library/S4Vectors/include' -I'/usr/lib/R/library/IRanges/include' -I'/usr/lib/R/library/XVector/include' -I'/usr/lib/R/library/Biostrings/include' -I/usr/local/include   -fpic  -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection -flto=auto  -c R_init_Rsamtools.c -o R_init_Rsamtools.o
g++ -std=gnu++14 -I"/usr/include/R/" -DNDEBUG -D_FILE_OFFSET_BITS=64 -I'/usr/lib/R/library/Rhtslib/include' -I'/usr/lib/R/library/S4Vectors/include' -I'/usr/lib/R/library/IRanges/include' -I'/usr/lib/R/library/XVector/include' -I'/usr/lib/R/library/Biostrings/include' -I/usr/local/include   -fpic  -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection -Wp,-D_GLIBCXX_ASSERTIONS -flto=auto  -c ResultManager.cpp -o ResultManager.o
gcc -I"/usr/include/R/" -DNDEBUG -D_FILE_OFFSET_BITS=64 -I'/usr/lib/R/library/Rhtslib/include' -I'/usr/lib/R/library/S4Vectors/include' -I'/usr/lib/R/library/IRanges/include' -I'/usr/lib/R/library/XVector/include' -I'/usr/lib/R/library/Biostrings/include' -I/usr/local/include   -fpic  -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection -flto=auto  -c S4Vectors_stubs.c -o S4Vectors_stubs.o
gcc -I"/usr/include/R/" -DNDEBUG -D_FILE_OFFSET_BITS=64 -I'/usr/lib/R/library/Rhtslib/include' -I'/usr/lib/R/library/S4Vectors/include' -I'/usr/lib/R/library/IRanges/include' -I'/usr/lib/R/library/XVector/include' -I'/usr/lib/R/library/Biostrings/include' -I/usr/local/include   -fpic  -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection -flto=auto  -c XVector_stubs.c -o XVector_stubs.o
gcc -I"/usr/include/R/" -DNDEBUG -D_FILE_OFFSET_BITS=64 -I'/usr/lib/R/library/Rhtslib/include' -I'/usr/lib/R/library/S4Vectors/include' -I'/usr/lib/R/library/IRanges/include' -I'/usr/lib/R/library/XVector/include' -I'/usr/lib/R/library/Biostrings/include' -I/usr/local/include   -fpic  -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection -flto=auto  -c as_bam.c -o as_bam.o
gcc -I"/usr/include/R/" -DNDEBUG -D_FILE_OFFSET_BITS=64 -I'/usr/lib/R/library/Rhtslib/include' -I'/usr/lib/R/library/S4Vectors/include' -I'/usr/lib/R/library/IRanges/include' -I'/usr/lib/R/library/XVector/include' -I'/usr/lib/R/library/Biostrings/include' -I/usr/local/include   -fpic  -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection -flto=auto  -c bam.c -o bam.o
gcc -I"/usr/include/R/" -DNDEBUG -D_FILE_OFFSET_BITS=64 -I'/usr/lib/R/library/Rhtslib/include' -I'/usr/lib/R/library/S4Vectors/include' -I'/usr/lib/R/library/IRanges/include' -I'/usr/lib/R/library/XVector/include' -I'/usr/lib/R/library/Biostrings/include' -I/usr/local/include   -fpic  -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection -flto=auto  -c bam_data.c -o bam_data.o
g++ -std=gnu++14 -I"/usr/include/R/" -DNDEBUG -D_FILE_OFFSET_BITS=64 -I'/usr/lib/R/library/Rhtslib/include' -I'/usr/lib/R/library/S4Vectors/include' -I'/usr/lib/R/library/IRanges/include' -I'/usr/lib/R/library/XVector/include' -I'/usr/lib/R/library/Biostrings/include' -I/usr/local/include   -fpic  -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection -Wp,-D_GLIBCXX_ASSERTIONS -flto=auto  -c bam_mate_iter.cpp -o bam_mate_iter.o
In file included from /usr/lib/R/library/Rhtslib/include/sam.h:29,
                 from Template.h:9,
                 from BamIterator.h:10,
                 from BamRangeIterator.h:7,
                 from bam_mate_iter.cpp:2:
BamRangeIterator.h: In member function ‘virtual void BamRangeIterator::finalize_inprogress(bamFile)’:
/usr/lib/R/library/Rhtslib/include/bam.h:57:41: warning: ignoring return value of ‘int64_t bgzf_seek(BGZF*, int64_t, int)’ declared with attribute ‘warn_unused_result’ [-Wunused-result]
   57 | #define bam_seek(fp, pos, dir) bgzf_seek(fp, pos, dir)
      |                                ~~~~~~~~~^~~~~~~~~~~~~~
BamRangeIterator.h:138:16: note: in expansion of macro ‘bam_seek’
  138 |         (void) bam_seek(bfile, pos, SEEK_SET);
      |                ^~~~~~~~
BamIterator.h: In constructor ‘BamIterator::BamIterator(bamFile, const bam_index_t*)’:
/usr/lib/R/library/Rhtslib/include/bam.h:57:41: warning: ignoring return value of ‘int64_t bgzf_seek(BGZF*, int64_t, int)’ declared with attribute ‘warn_unused_result’ [-Wunused-result]
   57 | #define bam_seek(fp, pos, dir) bgzf_seek(fp, pos, dir)
      |                                ~~~~~~~~~^~~~~~~~~~~~~~
BamIterator.h:87:16: note: in expansion of macro ‘bam_seek’
   87 |         (void) bam_seek(bfile, 0, 0);
      |                ^~~~~~~~
gcc -I"/usr/include/R/" -DNDEBUG -D_FILE_OFFSET_BITS=64 -I'/usr/lib/R/library/Rhtslib/include' -I'/usr/lib/R/library/S4Vectors/include' -I'/usr/lib/R/library/IRanges/include' -I'/usr/lib/R/library/XVector/include' -I'/usr/lib/R/library/Biostrings/include' -I/usr/local/include   -fpic  -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection -flto=auto  -c bam_plbuf.c -o bam_plbuf.o
gcc -I"/usr/include/R/" -DNDEBUG -D_FILE_OFFSET_BITS=64 -I'/usr/lib/R/library/Rhtslib/include' -I'/usr/lib/R/library/S4Vectors/include' -I'/usr/lib/R/library/IRanges/include' -I'/usr/lib/R/library/XVector/include' -I'/usr/lib/R/library/Biostrings/include' -I/usr/local/include   -fpic  -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection -flto=auto  -c bam_sort.c -o bam_sort.o
gcc -I"/usr/include/R/" -DNDEBUG -D_FILE_OFFSET_BITS=64 -I'/usr/lib/R/library/Rhtslib/include' -I'/usr/lib/R/library/S4Vectors/include' -I'/usr/lib/R/library/IRanges/include' -I'/usr/lib/R/library/XVector/include' -I'/usr/lib/R/library/Biostrings/include' -I/usr/local/include   -fpic  -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection -flto=auto  -c bambuffer.c -o bambuffer.o
gcc -I"/usr/include/R/" -DNDEBUG -D_FILE_OFFSET_BITS=64 -I'/usr/lib/R/library/Rhtslib/include' -I'/usr/lib/R/library/S4Vectors/include' -I'/usr/lib/R/library/IRanges/include' -I'/usr/lib/R/library/XVector/include' -I'/usr/lib/R/library/Biostrings/include' -I/usr/local/include   -fpic  -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection -flto=auto  -c bamfile.c -o bamfile.o
bamfile.c: In function ‘bamfile_isincomplete’:
bamfile.c:168:20: warning: ignoring return value of ‘bgzf_seek’ declared with attribute ‘warn_unused_result’ [-Wunused-result]
  168 |             (void) bgzf_seek(bfile->file->x.bam, offset, SEEK_SET);
      |                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
gcc -I"/usr/include/R/" -DNDEBUG -D_FILE_OFFSET_BITS=64 -I'/usr/lib/R/library/Rhtslib/include' -I'/usr/lib/R/library/S4Vectors/include' -I'/usr/lib/R/library/IRanges/include' -I'/usr/lib/R/library/XVector/include' -I'/usr/lib/R/library/Biostrings/include' -I/usr/local/include   -fpic  -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection -flto=auto  -c bcffile.c -o bcffile.o
gcc -I"/usr/include/R/" -DNDEBUG -D_FILE_OFFSET_BITS=64 -I'/usr/lib/R/library/Rhtslib/include' -I'/usr/lib/R/library/S4Vectors/include' -I'/usr/lib/R/library/IRanges/include' -I'/usr/lib/R/library/XVector/include' -I'/usr/lib/R/library/Biostrings/include' -I/usr/local/include   -fpic  -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection -flto=auto  -c encode.c -o encode.o
gcc -I"/usr/include/R/" -DNDEBUG -D_FILE_OFFSET_BITS=64 -I'/usr/lib/R/library/Rhtslib/include' -I'/usr/lib/R/library/S4Vectors/include' -I'/usr/lib/R/library/IRanges/include' -I'/usr/lib/R/library/XVector/include' -I'/usr/lib/R/library/Biostrings/include' -I/usr/local/include   -fpic  -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection -flto=auto  -c fafile.c -o fafile.o
gcc -I"/usr/include/R/" -DNDEBUG -D_FILE_OFFSET_BITS=64 -I'/usr/lib/R/library/Rhtslib/include' -I'/usr/lib/R/library/S4Vectors/include' -I'/usr/lib/R/library/IRanges/include' -I'/usr/lib/R/library/XVector/include' -I'/usr/lib/R/library/Biostrings/include' -I/usr/local/include   -fpic  -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection -flto=auto  -c idxstats.c -o idxstats.o
In file included from /usr/lib/R/library/Rhtslib/include/sam.h:29,
                 from bamfile.h:5,
                 from idxstats.c:1:
idxstats.c: In function ‘idxstats_bamfile’:
/usr/lib/R/library/Rhtslib/include/bam.h:57:32: warning: ignoring return value of ‘bgzf_seek’ declared with attribute ‘warn_unused_result’ [-Wunused-result]
   57 | #define bam_seek(fp, pos, dir) bgzf_seek(fp, pos, dir)
      |                                ^~~~~~~~~~~~~~~~~~~~~~~
idxstats.c:20:12: note: in expansion of macro ‘bam_seek’
   20 |     (void) bam_seek(fp, 0, 0);
      |            ^~~~~~~~
gcc -I"/usr/include/R/" -DNDEBUG -D_FILE_OFFSET_BITS=64 -I'/usr/lib/R/library/Rhtslib/include' -I'/usr/lib/R/library/S4Vectors/include' -I'/usr/lib/R/library/IRanges/include' -I'/usr/lib/R/library/XVector/include' -I'/usr/lib/R/library/Biostrings/include' -I/usr/local/include   -fpic  -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection -flto=auto  -c io_sam.c -o io_sam.o
In file included from /usr/lib/R/library/Rhtslib/include/sam.h:29,
                 from io_sam.c:2:
io_sam.c: In function ‘_scan_bam_all’:
/usr/lib/R/library/Rhtslib/include/bam.h:57:32: warning: ignoring return value of ‘bgzf_seek’ declared with attribute ‘warn_unused_result’ [-Wunused-result]
   57 | #define bam_seek(fp, pos, dir) bgzf_seek(fp, pos, dir)
      |                                ^~~~~~~~~~~~~~~~~~~~~~~
io_sam.c:304:12: note: in expansion of macro ‘bam_seek’
  304 |     (void) bam_seek(bfile->file->x.bam, bfile->pos0, SEEK_SET);
      |            ^~~~~~~~
g++ -std=gnu++14 -I"/usr/include/R/" -DNDEBUG -D_FILE_OFFSET_BITS=64 -I'/usr/lib/R/library/Rhtslib/include' -I'/usr/lib/R/library/S4Vectors/include' -I'/usr/lib/R/library/IRanges/include' -I'/usr/lib/R/library/XVector/include' -I'/usr/lib/R/library/Biostrings/include' -I/usr/local/include   -fpic  -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection -Wp,-D_GLIBCXX_ASSERTIONS -flto=auto  -c pbuffer_wrapper.cpp -o pbuffer_wrapper.o
g++ -std=gnu++14 -I"/usr/include/R/" -DNDEBUG -D_FILE_OFFSET_BITS=64 -I'/usr/lib/R/library/Rhtslib/include' -I'/usr/lib/R/library/S4Vectors/include' -I'/usr/lib/R/library/IRanges/include' -I'/usr/lib/R/library/XVector/include' -I'/usr/lib/R/library/Biostrings/include' -I/usr/local/include   -fpic  -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection -Wp,-D_GLIBCXX_ASSERTIONS -flto=auto  -c pileup.cpp -o pileup.o
gcc -I"/usr/include/R/" -DNDEBUG -D_FILE_OFFSET_BITS=64 -I'/usr/lib/R/library/Rhtslib/include' -I'/usr/lib/R/library/S4Vectors/include' -I'/usr/lib/R/library/IRanges/include' -I'/usr/lib/R/library/XVector/include' -I'/usr/lib/R/library/Biostrings/include' -I/usr/local/include   -fpic  -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection -flto=auto  -c pileupbam.c -o pileupbam.o
gcc -I"/usr/include/R/" -DNDEBUG -D_FILE_OFFSET_BITS=64 -I'/usr/lib/R/library/Rhtslib/include' -I'/usr/lib/R/library/S4Vectors/include' -I'/usr/lib/R/library/IRanges/include' -I'/usr/lib/R/library/XVector/include' -I'/usr/lib/R/library/Biostrings/include' -I/usr/local/include   -fpic  -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection -flto=auto  -c sam.c -o sam.o
gcc -I"/usr/include/R/" -DNDEBUG -D_FILE_OFFSET_BITS=64 -I'/usr/lib/R/library/Rhtslib/include' -I'/usr/lib/R/library/S4Vectors/include' -I'/usr/lib/R/library/IRanges/include' -I'/usr/lib/R/library/XVector/include' -I'/usr/lib/R/library/Biostrings/include' -I/usr/local/include   -fpic  -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection -flto=auto  -c sam_opts.c -o sam_opts.o
gcc -I"/usr/include/R/" -DNDEBUG -D_FILE_OFFSET_BITS=64 -I'/usr/lib/R/library/Rhtslib/include' -I'/usr/lib/R/library/S4Vectors/include' -I'/usr/lib/R/library/IRanges/include' -I'/usr/lib/R/library/XVector/include' -I'/usr/lib/R/library/Biostrings/include' -I/usr/local/include   -fpic  -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection -flto=auto  -c sam_utils.c -o sam_utils.o
gcc -I"/usr/include/R/" -DNDEBUG -D_FILE_OFFSET_BITS=64 -I'/usr/lib/R/library/Rhtslib/include' -I'/usr/lib/R/library/S4Vectors/include' -I'/usr/lib/R/library/IRanges/include' -I'/usr/lib/R/library/XVector/include' -I'/usr/lib/R/library/Biostrings/include' -I/usr/local/include   -fpic  -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection -flto=auto  -c samtools_patch.c -o samtools_patch.o
gcc -I"/usr/include/R/" -DNDEBUG -D_FILE_OFFSET_BITS=64 -I'/usr/lib/R/library/Rhtslib/include' -I'/usr/lib/R/library/S4Vectors/include' -I'/usr/lib/R/library/IRanges/include' -I'/usr/lib/R/library/XVector/include' -I'/usr/lib/R/library/Biostrings/include' -I/usr/local/include   -fpic  -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection -flto=auto  -c scan_bam_data.c -o scan_bam_data.o
gcc -I"/usr/include/R/" -DNDEBUG -D_FILE_OFFSET_BITS=64 -I'/usr/lib/R/library/Rhtslib/include' -I'/usr/lib/R/library/S4Vectors/include' -I'/usr/lib/R/library/IRanges/include' -I'/usr/lib/R/library/XVector/include' -I'/usr/lib/R/library/Biostrings/include' -I/usr/local/include   -fpic  -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection -flto=auto  -c tabixfile.c -o tabixfile.o
tabixfile.c: In function ‘index_tabix’:
tabixfile.c:190:5: warning: ‘bgzf_is_bgzf’ is deprecated: Use bgzf_compression() or hts_detect_format() instead [-Wdeprecated-declarations]
  190 |     if (bgzf_is_bgzf(fn) != 1)
      |     ^~
In file included from tabixfile.c:3:
/usr/lib/R/library/Rhtslib/include/htslib/bgzf.h:243:9: note: declared here
  243 |     int bgzf_is_bgzf(const char *fn) HTS_DEPRECATED("Use bgzf_compression() or hts_detect_format() instead");
      |         ^~~~~~~~~~~~
gcc -I"/usr/include/R/" -DNDEBUG -D_FILE_OFFSET_BITS=64 -I'/usr/lib/R/library/Rhtslib/include' -I'/usr/lib/R/library/S4Vectors/include' -I'/usr/lib/R/library/IRanges/include' -I'/usr/lib/R/library/XVector/include' -I'/usr/lib/R/library/Biostrings/include' -I/usr/local/include   -fpic  -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection -flto=auto  -c tagfilter.c -o tagfilter.o
gcc -I"/usr/include/R/" -DNDEBUG -D_FILE_OFFSET_BITS=64 -I'/usr/lib/R/library/Rhtslib/include' -I'/usr/lib/R/library/S4Vectors/include' -I'/usr/lib/R/library/IRanges/include' -I'/usr/lib/R/library/XVector/include' -I'/usr/lib/R/library/Biostrings/include' -I/usr/local/include   -fpic  -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection -flto=auto  -c utilities.c -o utilities.o
gcc -I"/usr/include/R/" -DNDEBUG -D_FILE_OFFSET_BITS=64 -I'/usr/lib/R/library/Rhtslib/include' -I'/usr/lib/R/library/S4Vectors/include' -I'/usr/lib/R/library/IRanges/include' -I'/usr/lib/R/library/XVector/include' -I'/usr/lib/R/library/Biostrings/include' -I/usr/local/include   -fpic  -march=x86-64 -mtune=generic -O2 -pipe -fno-plt -fexceptions         -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security         -fstack-clash-protection -fcf-protection -flto=auto  -c zip_compression.c -o zip_compression.o
g++ -std=gnu++14 -shared -L/usr/lib64/R/lib -Wl,-O1,--sort-common,--as-needed,-z,relro,-z,now -flto=auto -o Rsamtools.so Biostrings_stubs.o COMPAT_bcf_hdr_read.o IRanges_stubs.o PileupBuffer.o PosCacheColl.o R_init_Rsamtools.o ResultManager.o S4Vectors_stubs.o XVector_stubs.o as_bam.o bam.o bam_data.o bam_mate_iter.o bam_plbuf.o bam_sort.o bambuffer.o bamfile.o bcffile.o encode.o fafile.o idxstats.o io_sam.o pbuffer_wrapper.o pileup.o pileupbam.o sam.o sam_opts.o sam_utils.o samtools_patch.o scan_bam_data.o tabixfile.o tagfilter.o utilities.o zip_compression.o /usr/lib/R/library/Rhtslib/usrlib/libhts.a -lcurl -L/usr/lib64/R/lib -lR
installing to /home/christoph/R/x86_64-pc-linux-gnu-library/4.2/00LOCK-Rsamtools/00new/Rsamtools/libs
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded from temporary location
Error: package or namespace load failed for ‘Rsamtools’ in dyn.load(file, DLLpath = DLLpath, ...):
 unable to load shared object '/home/christoph/R/x86_64-pc-linux-gnu-library/4.2/00LOCK-Rsamtools/00new/Rsamtools/libs/Rsamtools.so':
  /home/christoph/R/x86_64-pc-linux-gnu-library/4.2/00LOCK-Rsamtools/00new/Rsamtools/libs/Rsamtools.so: undefined symbol: hts_open_format
Error: loading failed
Execution halted
ERROR: loading failed
* removing ‘/home/christoph/R/x86_64-pc-linux-gnu-library/4.2/Rsamtools’

The downloaded source packages are in
	‘/tmp/RtmpmeoPUu/downloaded_packages’
Installation paths not writeable, unable to update packages
  path: /usr/lib/R/library
  packages:
    checkmate, GenomeInfoDbData, hwriter, IRanges, MASS
Warning message:
In install.packages(...) :
  installation of package ‘Rsamtools’ had non-zero exit status
> 

It seems that the linker ‘forgets’ about htslib somehow:

> ldd Rsamtools.so
	linux-vdso.so.1 (0x00007ffdd95a2000)
	libR.so => /usr/lib/R/lib/libR.so (0x00007f29e5abf000)
	libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x00007f29e5899000)
	libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x00007f29e587e000)
	libc.so.6 => /usr/lib/libc.so.6 (0x00007f29e5674000)
	libblas.so.3 => /usr/lib/libblas.so.3 (0x00007f29e561d000)
	libm.so.6 => /usr/lib/libm.so.6 (0x00007f29e5535000)
	libreadline.so.8 => /usr/lib/libreadline.so.8 (0x00007f29e54dd000)
	libpcre2-8.so.0 => /usr/lib/libpcre2-8.so.0 (0x00007f29e5444000)
	liblzma.so.5 => /usr/lib/liblzma.so.5 (0x00007f29e541a000)
	libbz2.so.1.0 => /usr/lib/libbz2.so.1.0 (0x00007f29e5407000)
	libz.so.1 => /usr/lib/libz.so.1 (0x00007f29e53ed000)
	libtirpc.so.3 => /usr/lib/libtirpc.so.3 (0x00007f29e53be000)
	libicuuc.so.71 => /usr/lib/libicuuc.so.71 (0x00007f29e51bd000)
	libicui18n.so.71 => /usr/lib/libicui18n.so.71 (0x00007f29e4e8b000)
	libgomp.so.1 => /usr/lib/libgomp.so.1 (0x00007f29e4e46000)
	/usr/lib64/ld-linux-x86-64.so.2 (0x00007f29e6007000)
	libgfortran.so.5 => /usr/lib/libgfortran.so.5 (0x00007f29e4b8b000)
	libncursesw.so.6 => /usr/lib/libncursesw.so.6 (0x00007f29e4b17000)
	libpthread.so.0 => /usr/lib/libpthread.so.0 (0x00007f29e4b10000)
	libgssapi_krb5.so.2 => /usr/lib/libgssapi_krb5.so.2 (0x00007f29e4abb000)
	libicudata.so.71 => /usr/lib/libicudata.so.71 (0x00007f29e2db6000)
	libquadmath.so.0 => /usr/lib/../lib/libquadmath.so.0 (0x00007f29e2d6b000)
	libkrb5.so.3 => /usr/lib/libkrb5.so.3 (0x00007f29e2c92000)
	libk5crypto.so.3 => /usr/lib/libk5crypto.so.3 (0x00007f29e2c60000)
	libcom_err.so.2 => /usr/lib/libcom_err.so.2 (0x00007f29e2c5a000)
	libkrb5support.so.0 => /usr/lib/libkrb5support.so.0 (0x00007f29e2c4b000)
	libkeyutils.so.1 => /usr/lib/libkeyutils.so.1 (0x00007f29e2c44000)
	libresolv.so.2 => /usr/lib/libresolv.so.2 (0x00007f29e2c30000)

I’m also on Arch Linux, with R 4.2.0, Rhtslib 1.28.0, and htslib 1.15.1.

Originally posted by @christophfink in #35 (comment)

samtools markdup in Rsamtools

I'd like to use Rsamtools to remove duplicate reads from a bam file. I'm looking for a solution similar to samtools markdup -r -s in.bam out.bam. Can anyone tell me how to do this in R, preferably with Rsamtools.

thanks

using multiple threads for bam sorting, merging, and indexing

Is your feature request related to a problem? Please describe.
No, not a problem. However, samtools supports using multiple threads for some operations and as far as i can tell Rsamtools doesn't provide this as an option.

Describe the solution you'd like
Updating the bam sorting, indexing, and merging Rsamtools code to call newer samtools/htslib functions that provide a nthreads option. Exposing other new options available (such as sorting a BAM by tag value) could also be useful.

e.g.
for sorting use bam_sort_core_ext rather than bam_sort_core
for merging use bam_merge_core2 rather than bam_merge_core
for indexing call sam_index_build3 rather than bam_index_build

The sorting, merging, and indexing operations are mostly external calls to htslib/samtools code so they seem to be the most straightforward places to start adding updated options.

Describe alternatives you've considered
Using samtools at the command-line instead works fine if these options are needed.

Additional context
I could work on this and submit a PR if there is interest in these features.

What pileup options to use for nanopore sequencing data?

Questions:
Is there any way to index which reads contribute to pileup count data?
What options should we use to run pileup on Nanopore sequencing data (particularly for quality score option)?
Are there any other filters that pileup implements by default (other than minimum base quality [13], minimum mapping quality [0], and max depth [8000])?

Description:
We have previously published data using Rsamtools pileup to generate read counts overlapping mutation sites.
We are now using a tool called sam2tsv to index which read IDs belong to which counts. However when I count up the number of reads to the mutation in the sam2tsv data, the counts are not matching up with pileup counts despite filtering sam2tsv data based on pileup default filters and additional filters I applied in the command.

The quality scores in Nanopore do not fully match the ASCII_BASE 33 and 64 scores.
Quality score table: https://www.drive5.com/usearch/manual/quality_score.html
Our data quality scores partially follow ASCII 33 but you can see in the table that the score stops at "K = 42" being the highest score. However, our data continues all the way down to Z, with 57 being the highest score.

What options should we use in pileup to take into account Nanopore data?

Commands:

pileup( bf, a2g.snp,
                   scanBamParam=ScanBamParam( flag = scanBamFlag( isDuplicate=F ), 
                                              which = a2g.snp ),
                   pileupParam=PileupParam( distinguish_strands=F) )
pileup( bf, a2g.snp,
                   scanBamParam=ScanBamParam( flag = scanBamFlag( isDuplicate=F ),
                                              which = a2g.snp ),
                   pileupParam=PileupParam( distinguish_strands=F),
                   phred2ASCIIOffset(scheme= c("Sanger")))

The second command was based on reading in a forum that the default is ASCII 64, so I tried to change it to 33 by writing phred2ASCIIOffset(scheme= c("Sanger")) even though Nanopore only partially matches ASCII 33. Both commands seem to give the same output.

My expertise is not in bioinformatics, so your feedback for how to use pileup on Nanopore data would be greatly appreciated.

Bug in windows version of Rsamtools Fasta sequence retrieval

I discovered a bug in the windows version of Rsamtools, which does not occur in the same version on Ubuntu 18.04 or CentOS 7. It may be related to #5 and #12, but the examples I have might help with solving the issue.

I'm trying to extract sequences from a large reference database of virus references found here:
ftp://ftp.ncbi.nlm.nih.gov/genbank/gbvrl*.seq.gz (about 4.2GB in total) (my current database may differ a bit from the most recent database though, let me know if you would like to have the exact same database).

I'm using these same commands on the 3 systems, not all cases go wrong, but I have isolated a case in idx nr 2621935 which gives different results on windows as compared to Ubuntu and CentOS.

> gbvrl_fa <- open(FaFile("gbvrl.nt.fasta"))
> idx <- scanFaIndex(gbvrl_fa)
> sequence <- scanFa(gbvrl_fa, idx[2621935])

On windows I get this result:

  A DNAStringSet instance of length 1
    width seq                                                                                   names               
[1]  4979 CATGGGTATGGCATCAGTCCTAGATAAAGGGACTGGCAAGT...TTTAACTGTCGTATCTCTTGTCTGTGGTATACTTAGTCTAG EU851411.1

On the other two platforms I get this result (which is the correct sequence, the sequence on windows is incorrect):

A DNAStringSet instance of length 1
    width seq                                                                                   names
[1]  4979 TTGAACAAGTAACCAGTCGTAAG...CTTACGACTGGTTACTTGTTCAA EU851411.1

The accession numbers match, but the sequences don't...

A second hint is the last sequence in the index, which the 2 non-windows platforms retrieve just fine, but the windows version gives an error:

Windows:

> scanFa(gbvrl_fa, idx[length(idx)])
Error in value[[3L]](cond) : 
   record 1 (FN398484.1:1-903) contains invalid DNA letters
  file: gbvrl.nt.fasta

Other two:
image

The last hint is an error claiming that the reference is not present in the current index, which happens with some sequences:

Windows:

> scanFa(gbvrl_fa, idx[1400001])
Error in value[[3L]](cond) :  record 1 (KY058615.1:1-1166) failed
  file: gbvrl.nt.fasta

Other two:
image

Sequences in the index before around 1,300,000 seem to succeed at retrieving the correct sequence and sequences after around 1,400,000 seem to fail, I don't know if that makes sense.

I've checked the Rsamtools and dependencies's versions and they match. Also the md5sum of all the fasta and fasta.fai files match.

I hope that these examples shine some light on the issues.

> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17763)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] shinydashboard_0.7.1  rbokeh_0.5.0          shinyWidgets_0.4.8    shinycssloaders_0.2.0 stringr_1.4.0        
 [6] ggplot2_3.2.1         data.table_1.12.2     shiny_1.3.2           Rsamtools_2.0.3       Biostrings_2.52.0    
[11] XVector_0.24.0        GenomicRanges_1.36.1  GenomeInfoDb_1.20.0   IRanges_2.18.3        S4Vectors_0.22.0     
[16] BiocGenerics_0.30.0  

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.2             lattice_0.20-38        assertthat_0.2.1       digest_0.6.20          mime_0.7              
 [6] R6_2.4.0               evaluate_0.14          httr_1.4.1             pillar_1.4.2           zlibbioc_1.30.0       
[11] rlang_0.4.0            lazyeval_0.2.2         rstudioapi_0.10        hexbin_1.27.3          DT_0.9                
[16] rmarkdown_1.16         BiocParallel_1.18.1    htmlwidgets_1.5.1      RCurl_1.95-4.12        munsell_0.5.0         
[21] compiler_3.6.1         httpuv_1.5.1           xfun_0.10              gistr_0.4.2            pkgconfig_2.0.3       
[26] htmltools_0.3.6        tidyselect_0.2.5       tibble_2.1.3           GenomeInfoDbData_1.2.1 codetools_0.2-16      
[31] crayon_1.3.4           dplyr_0.8.3            withr_2.1.2            later_0.8.0            bitops_1.0-6          
[36] grid_3.6.1             jsonlite_1.6           xtable_1.8-4           gtable_0.3.0           magrittr_1.5          
[41] scales_1.0.0           stringi_1.4.3          pryr_0.1.4             promises_1.0.1         tools_3.6.1           
[46] glue_1.3.1             purrr_0.3.2            crosstalk_1.0.0        maps_3.3.0             yaml_2.2.0            
[51] colorspace_1.4-1       BiocManager_1.30.8     knitr_1.25

CentOS 7
image

Ubuntu 18.04
image

Installation Error

Describe the bug
When the lazy loading stage is reached it crashes:

sh: line 1: 76963 Illegal instruction     (core dumped) R_TESTS= '/nfs/nas22/fs2201/biol_micro_unix_modules/modules/software/R/4.2.0-foss-2020b/lib64/R/bin/R' --no-save --no-restore --no-echo 2>&1 < '/tmp/RtmphinURs/file12a391a5471f1'

 *** caught illegal operation ***
address 0x7f4211cafcd0, cause 'illegal operand'

To Reproduce
BiocManager::install("Rsamtools")

sessionInfo

Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux

Matrix products: default
BLAS/LAPACK: /nfs/nas22/fs2201/biol_micro_unix_modules/modules/software/OpenBLAS/0.3.12-GCC-10.2.0/lib/libopenblas_haswellp-r0.3.12.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] BiocManager_1.30.19 compiler_4.2.0      tools_4.2.0

CRAM support - why not?

It appears CRAM is not supported by Rsamtools. I wonder why is that? Samtools support CRAM natively, so I would naively expect this should be a trivial task, no?

This issue came up from other programs that rely on Rsamtools, please see here vplagnol/ExomeDepth#17

installing error

here is the error message:

installing to /home/yu_liu/R/x86_64-pc-linux-gnu-library/4.0/00LOCK-Rsamtools/00new/Rsamtools/libs
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded from temporary location
Error: package or namespace load failed for ‘Rsamtools’ in dyn.load(file, DLLpath = DLLpath, ...):
unable to load shared object '/home/yu_liu/R/x86_64-pc-linux-gnu-library/4.0/00LOCK-Rsamtools/00new/Rsamtools/libs/Rsamtools.so':
/home/yu_liu/R/x86_64-pc-linux-gnu-library/4.0/00LOCK-Rsamtools/00new/Rsamtools/libs/Rsamtools.so: undefined symbol: tbx_conf_gff
Error: loading failed
Execution halted
ERROR: loading failed

  • removing ‘/home/yu_liu/R/x86_64-pc-linux-gnu-library/4.0/Rsamtools’
    The downloaded source packages are in
    ‘/scratch/tmp/RtmpJyaMTr/downloaded_packages’
    Installation paths not writeable, unable to update packages
    path: /usr/lib/R/library
    packages:
    boot, class, cluster, codetools, foreign, KernSmooth, lattice, MASS,
    Matrix, mgcv, nlme, nnet, spatial, survival
    path: /usr/lib/R/site-library

sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.5 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=C.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] Rhtslib_1.22.0 devtools_2.3.1 usethis_1.6.1

loaded via a namespace (and not attached):
[1] magrittr_2.0.1 zlibbioc_1.36.0 pkgload_1.1.0
[4] R6_2.5.1 rlang_0.4.12 tools_4.0.2
[7] pkgbuild_1.0.8 sessioninfo_1.1.1 cli_3.1.0
[10] withr_2.4.2 ellipsis_0.3.2 remotes_2.4.1
[13] digest_0.6.28 assertthat_0.2.1 rprojroot_2.0.2
[16] crayon_1.4.2 processx_3.4.3 BiocManager_1.30.16
[19] callr_3.4.3 fs_1.5.0 ps_1.3.3
[22] testthat_2.3.2 curl_4.3.2 memoise_1.1.0
[25] glue_1.4.2 compiler_4.0.2 desc_1.2.0
[28] prettyunits_1.1.1

Here is output from ldd:
ldd Rsamtools.so
linux-vdso.so.1 (0x00007fffe8fc0000)
libR.so => /usr/lib/libR.so (0x00007f162daba000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f162d8cc000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f162d8b1000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f162d6c0000)
libblas.so.3 => /usr/lib/x86_64-linux-gnu/libblas.so.3 (0x00007f162d465000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f162d316000)
libreadline.so.8 => /lib/x86_64-linux-gnu/libreadline.so.8 (0x00007f162d2c4000)
libpcre2-8.so.0 => /usr/lib/x86_64-linux-gnu/libpcre2-8.so.0 (0x00007f162d23f000)
liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007f162d019000)
libbz2.so.1.0 => /lib/x86_64-linux-gnu/libbz2.so.1.0 (0x00007f162ce09000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f162cbec000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f162cbe6000)
libicuuc.so.67 => /usr/lib/x86_64-linux-gnu/libicuuc.so.67 (0x00007f162c9f8000)
libicui18n.so.67 => /usr/lib/x86_64-linux-gnu/libicui18n.so.67 (0x00007f162c6e6000)
libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007f162c6ac000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f162c689000)
/lib64/ld-linux-x86-64.so.2 (0x00007f162dfcd000)
libopenblas.so.0 => /usr/lib/x86_64-linux-gnu/libopenblas.so.0 (0x00007f162a3e3000)
libtinfo.so.6 => /lib/x86_64-linux-gnu/libtinfo.so.6 (0x00007f162a3b1000)
libicudata.so.67 => /usr/lib/x86_64-linux-gnu/libicudata.so.67 (0x00007f1628898000)
libgfortran.so.4 => /usr/lib/x86_64-linux-gnu/libgfortran.so.4 (0x00007f16284b9000)
libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007f162846f000)

ERROR in scanBcfHeader [Reported to [email protected]]

Initially reported to [email protected]:

Dear maintener,

I am sorry to bother you, but I have some difficulty with Variantannotation package.
With the new version (corresponding to R6.0 and later) I have this message
Error in scanBcfHeader(bf) : no 'header' line "#CHROM POS ID..."?
with the function  readVcf(), but the same VCF did work with R5.1

I don't know if you can help me, but I will be very gratefull if you can.
Thanks a lot.

Christelle

See attached file. (note I couldn't directly attach a .vcf for I added .txt extension to I could post the file included in the email)
inputVCF.vcf.txt

Loading bam file in chunks

Is there a method for loading a specific reproducible chunk of a BAM/CRAM file?
This would be useful for very large BAM/CRAM files to avoid loading it all into memory, and in separate processes to load specific chunks, from 1 ... X, as the user defines.

scanbam, with many reginos in which, is very slow

Hello,
I have a target of 10,000 regions (10,000 different contigs, each with a different range), and scanBam with which is very slow. It basically never finishes in a reasonable time.

Is there inefficiency in scanBam that could be improved? Or is it at the limit of efficiency? Is there another way to speed it up?

Thanks.

Support for CSI indexes

Any chance Rsamtools will support CSI indexes anytime soon?

We have a genome with multigigabase size chromosomes and many R packages cant support it ultimately because they rely on Rsamtools for index handling.

min_nucleotide_depth parameter of PileupParam doesn't take 0

Hello

If a user wants all positions output even if a reference position has zero coverage, the min_nucleotide_depth parameter=0 does not work to output the position with zero coverage.

p_param <- PileupParam(min_nucleotide_depth=0L,min_minor_allele_depth=0L,distinguish_strands=TRUE,min_mapq=10L,min_base_quality =10L,max_depth=200000L,
distinguish_nucleotides=TRUE, ignore_query_Ns=FALSE,include_deletions=TRUE, include_insertions=TRUE)

.bai files are not found when "bam" is part of the file path

Hello,

the bam_index_load_local function from bam_index.c fails to find index files named my_file.bai if the substring "bam" is present anywhere in the path. For example, the following file pair fails to be recognized:

"/home/bioinfo5/Desktop/UPM-CBGP/analysis/201810_Brapa_ChIP_FL/bams/3_filtdupes_FL_C1.bai"
"/home/bioinfo5/Desktop/UPM-CBGP/analysis/201810_Brapa_ChIP_FL/bams/3_filtdupes_FL_C1.bam"

This is because when looking for the "bam" suffix, bam_index_load_local uses strstr, and fails if the returned index does not correspond to the last three characters. This could be fixed by looping while s is different than 0 when looking for the suffix (something like this, untested):

char *s = strstr(fn, "bam");
while(s) {
    if (s == fn + strlen(fn) - 3) {
        strcpy(fnidx, fn);
	fnidx[strlen(fn)-1] = 'i';
        fp = fopen(fnidx, "rb");
    } else {
        s = strstr(s, "bam");
    }
}

Cheers,
-Eric

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.