zhengxwen / gds2bgen

R package for the format conversion from bgen to gds

R 45.65% C 0.47% C++ 53.89%

gds bgen

gds2bgen's Introduction

gds2bgen: Format Conversion from BGEN to GDS

GNU General Public License, GPLv3

Description

This package provides functions for format conversion from bgen files to SeqArray GDS files.

Version

v0.9.3

Package Maintainer

Dr. Xiuwen Zheng ([email protected])

Installation

Requires R (≥ v3.5.0), gdsfmt (≥ v1.20.0), SeqArray (≥ v1.24.0)

Installation from Github:

library("devtools")
install_github("zhengxwen/gds2bgen")

The install_github() approach requires that you build from source, i.e. make and compilers must be installed on your system -- see the R FAQ for your operating system; you may also need to install dependencies manually.
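Before building from source, it can help to confirm that the stated minimum versions are met. The following is a minimal sketch using base R only; the helper name check_deps is hypothetical, and the Bioconductor install hint in the trailing comment is an assumption about where the dependencies are obtained rather than something stated in this README.

```r
# Minimum versions taken from the "Requires" line above.
required <- c(gdsfmt = "1.20.0", SeqArray = "1.24.0")

# Hypothetical helper: TRUE for each package that is installed and new enough.
check_deps <- function(required) {
  vapply(names(required), function(pkg) {
    # requireNamespace() returns FALSE (no error) when the package is absent
    requireNamespace(pkg, quietly = TRUE) &&
      utils::packageVersion(pkg) >= required[[pkg]]
  }, logical(1))
}

r_ok    <- getRversion() >= "3.5.0"
deps_ok <- check_deps(required)
print(r_ok)
print(deps_ok)

# Assumption: missing dependencies are installed from Bioconductor, e.g.
#   install.packages("BiocManager")
#   BiocManager::install(names(deps_ok)[!deps_ok])
```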

Or manually install the package:

git clone https://github.com/zhengxwen/gds2bgen
cd gds2bgen/src
unzip bgen_v1.1.8.zip
cd bgen_v1.1.8
python2 ./waf configure
python2 ./waf
cp build/libbgen.a ..
cp build/3rd_party/zstd-1.1.0/libzstd.a ..
rm -rf build
sleep 1; touch ../libbgen.a
cd ../../..
R CMD INSTALL gds2bgen

Copyright Notice

This package includes the sources of the bgen library (https://enkre.net/cgi-bin/code/bgen/dir?ci=trunk), Boost (the C++ libraries, https://www.boost.org) and Zstandard (https://zstd.net).

Citations for GDS

Zheng X, Levine D, Shen J, Gogarten SM, Laurie C, Weir BS (2012). A High-performance Computing Toolset for Relatedness and Principal Component Analysis of SNP Data. Bioinformatics. DOI: 10.1093/bioinformatics/bts606.

Zheng X, Gogarten S, Lawrence M, Stilp A, Conomos M, Weir BS, Laurie C, Levine D (2017). SeqArray -- A storage-efficient high-performance data format for WGS variant calls. Bioinformatics. DOI: 10.1093/bioinformatics/btx145.

Examples

library(gds2bgen)

seqBGEN_Info()  # bgen library version
## "bgen_lib_v1.1.8"

bgen_fn <- system.file("extdata", "example.8bits.bgen", package="gds2bgen")
# or bgen_fn <- "your_bgen_file.bgen"
seqBGEN_Info(bgen_fn)

## File: gds2bgen/extdata/example.8bits.bgen
## # of samples: 500
## # of variants: 199
## Compression method: zlib
## Layout version: v1.2
## Unphased: TRUE
## # of bits: 8
## Ploidy: 2
## sample id: sample_001, sample_002, sample_003, sample_004, ...


# example.8bits.bgen ==> example.gds, using 4 cores
seqBGEN2GDS(bgen_fn, "example.gds",
    storage.option="LZMA_RA",  # compression option, e.g., ZIP_RA for zlib or LZ4_RA for LZ4
    float.type="packed8",      # 8-bit packed real numbers
    geno=FALSE,     # 2-bit integer genotypes, stored in 'genotype/data'
    dosage=TRUE,    # numeric alternative allele dosages, stored in 'annotation/format/DS'
    prob=FALSE,     # numeric genotype probabilities, stored in 'annotation/format/GP'
    parallel=4      # the number of cores
)


# show file structure
library(SeqArray)
(f <- seqOpen("example.gds"))
seqClose(f)

## File: example.gds (137.7K)
## +    [  ] *
## |--+ description   [  ] *
## |--+ sample.id   { Str8 500 LZMA_ra(7.02%), 393B } *
## |--+ variant.id   { Int32 199 LZMA_ra(33.9%), 277B } *
## |--+ position   { Int32 199 LZMA_ra(60.6%), 489B } *
## |--+ chromosome   { Str8 199 LZMA_ra(15.7%), 101B } *
## |--+ allele   { Str8 199 LZMA_ra(11.8%), 101B } *
## |--+ genotype   [  ] *
## |--+ phase   [  ]
## |--+ annotation   [  ]
## |  |--+ id   { Str8 199 LZMA_ra(18.6%), 321B } *
## |  |--+ qual   { Float32 199 LZMA_ra(11.8%), 101B } *
## |  |--+ filter   { Int32 199 LZMA_ra(11.3%), 97B } *
## |  |--+ info   [  ]
## |  \--+ format   [  ]
## |     |--+ DS   [  ] *
## |     |  \--+ data   { PackedReal8U 500x199 LZMA_ra(55.6%), 54.0K } *
## \--+ sample.annotation   [  ]
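Once the conversion has run, the dosages stored in 'annotation/format/DS' can be read back with seqGetData() from SeqArray. The sketch below wraps the steps shown above in a hypothetical helper, read_dosage, and uses only the seqBGEN2GDS() arguments that appear in this README. The shape of the returned object is not shown here because it has not been verified against a real run; str() is used to inspect it instead.

```r
# Hypothetical helper: convert a BGEN file to a temporary GDS file and
# return whatever seqGetData() yields for the dosage node.
read_dosage <- function(bgen_fn) {
  gds_fn <- tempfile(fileext = ".gds")
  on.exit(unlink(gds_fn), add = TRUE)

  gds2bgen::seqBGEN2GDS(bgen_fn, gds_fn,
                        geno = FALSE, dosage = TRUE, prob = FALSE)

  f <- SeqArray::seqOpen(gds_fn)
  # after = FALSE: close the file before the temporary file is deleted
  on.exit(SeqArray::seqClose(f), add = TRUE, after = FALSE)

  SeqArray::seqGetData(f, "annotation/format/DS")
}

# Run only when both packages are installed.
if (requireNamespace("gds2bgen", quietly = TRUE) &&
    requireNamespace("SeqArray", quietly = TRUE)) {
  bgen_fn <- system.file("extdata", "example.8bits.bgen", package = "gds2bgen")
  str(read_dosage(bgen_fn))
}
```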

Also See

seqVCF2GDS() in the SeqArray package, conversion from VCF files to GDS files.

seqBED2GDS() in the SeqArray package, conversion from PLINK BED files to GDS files.

gds2bgen's Issues

rename "variant.id" header

Hello,

Many thanks for the tool! I have managed to convert my bgen files to gds using the pipeline suggested here. The downstream analysis in my project, however, requires reading in the gds data and creating a GenotypeData class object (I'm using GWASTools for that). I'm getting an error that it can't find a "snp.id" column, so I was wondering whether I can manually change the "variant.id" name. Many apologies if this is not exactly the right place for my question.

Many thanks,
Olga
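One possible route, not confirmed by the maintainer in this thread: rather than renaming "variant.id" by hand, convert the SeqArray GDS file to the SNP GDS layout (which has a "snp.id" node) with seqGDS2SNP() from SeqArray. The sketch below is a thin wrapper under that assumption. The helper name to_snp_gds and the file names are hypothetical, and whether GWASTools then reads dosage data from the result as needed has not been verified here.

```r
# Hypothetical helper: convert a SeqArray GDS file to the SNP GDS layout.
# dosage = FALSE exports genotypes; dosage = TRUE asks seqGDS2SNP() to
# export dosages instead (relevant when the file was created with geno=FALSE).
to_snp_gds <- function(seq_gds, snp_gds, dosage = FALSE) {
  stopifnot(file.exists(seq_gds))
  SeqArray::seqGDS2SNP(seq_gds, snp_gds, dosage = dosage)
  invisible(snp_gds)
}

# Usage (hypothetical file names; GWASTools calls untested in this sketch):
#   to_snp_gds("example.gds", "example_snp.gds", dosage = TRUE)
#   reader   <- GWASTools::GdsGenotypeReader("example_snp.gds")
#   genoData <- GWASTools::GenotypeData(reader)
```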

installation issues

Besides the default installation method, is there any other way to work around this?

library("devtools")
install_github("zhengxwen/gds2bgen")

does not work on my cluster.

I have consistently run into problems during installation, as shown below:

[50/53] Linking build/3rd_party/zstd-1.1.0/libzstd.a
../src/View.cpp: In member function ‘void genfile::bgen::View::setup(const string&)’:
../src/View.cpp:212:53: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
if( m_stream->gcount() != m_postheader_data.size() ) {
^
At global scope:
cc1plus: warning: unrecognized command line option "-Wno-c++11-long-long" [enabled by default]

[51/53] Linking build/3rd_party/sqlite3/libsqlite3.a
[52/53] Linking build/db/libdb.a
[53/53] Linking build/libbgen.a
Waf: Leaving directory `/tmp/RtmpEbogUY/R.INSTALL2b5b6584099d8/gds2bgen/src/bgen_v1.1.8/build'
'build' finished successfully (35.483s)
cp -f bgen_v1.1.8/build/libbgen.a .
cp -f bgen_v1.1.8/build/3rd_party/zstd-1.1.0/libzstd.a .
rm -rf bgen_v1.1.8/build
g++ -std=gnu++11 -shared -L/n/helmod/apps/centos7/Core/R_core/4.0.2-fasrc01/lib64/R/lib -L/usr/local/lib64 -o gds2bgen.so R_gds2bgen.o gds2bgen.o libbgen.a libzstd.a -L/n/helmod/apps/centos7/Core/R_core/4.0.2-fasrc01/lib64/R/lib -lR
installing to /n/user/R/x86_64-pc-linux-gnu-library/4.0/00LOCK-gds2bgen/00new/gds2bgen/libs
** R
** inst
** byte-compile and prepare package for lazy loading
Error: (converted from warning) package ‘gdsfmt’ was built under R version 4.0.5
Execution halted
ERROR: lazy loading failed for package ‘gds2bgen’

removing ‘/n/user/R/x86_64-pc-linux-gnu-library/4.0/gds2bgen’
Error: Failed to install 'gds2bgen' from GitHub:
(converted from warning) installation of package ‘/tmp/RtmpH5dlPs/file17dc1414aa117/gds2bgen_0.9.2.tar.gz’ had non-zero exit status
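The log above shows the build itself succeeding; the failure comes from a warning ("package 'gdsfmt' was built under R version 4.0.5") being converted into an error. A commonly used workaround, offered here as a sketch rather than as the maintainer's answer, is to tell the remotes package (used by devtools) not to promote warnings to errors via the R_REMOTES_NO_ERRORS_FROM_WARNINGS environment variable. The helper name install_gds2bgen_lenient is hypothetical. Updating R so that it matches the version gdsfmt was built under is the other obvious option.

```r
# Ask remotes/devtools not to turn installation warnings into errors.
Sys.setenv(R_REMOTES_NO_ERRORS_FROM_WARNINGS = "true")

# Hypothetical helper: run the same install command as in the README.
# Defined but not called here, so sourcing this block installs nothing.
install_gds2bgen_lenient <- function() {
  if (!requireNamespace("devtools", quietly = TRUE)) {
    stop("the 'devtools' package is required")
  }
  devtools::install_github("zhengxwen/gds2bgen")
}
```

Call install_gds2bgen_lenient() in the same R session, so that the environment variable is still set when the install runs.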

GDS to BGEN

Hi,

I would like to know whether there is a way to use your tool to go from GDS to bgen, because it seems that your tool does the opposite, bgen to GDS.

Thanks

help with gds2bgen

Hello,

Many thanks for creating this tool. I have installed it in R - version 3.6 but I have not managed to make it work.

I was wondering what "extdata" is referring to in your example, as I am running it in the following way:

bgen_fn <- system.file("extdata","myworking.bgen", package="gds2bgen")

but then I'm getting an error:

seqBGEN_Info(bgen_fn)
Error in seqBGEN_Info(bgen_fn) : Can't open the file ''.

Many thanks again
Olga
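For context on the error above: system.file() only looks for files shipped inside an installed package (here, the package's own "extdata" folder) and returns an empty string when the file is not there, which is why seqBGEN_Info() reports that it cannot open ''. For your own data, pass the file path directly, as in the commented line of the README example. A minimal sketch, in which the path "path/to/myworking.bgen" is a placeholder:

```r
# system.file() searches inside the installed package only; a file that is
# not shipped with the package yields the empty string.
missing_path <- system.file("extdata", "myworking.bgen", package = "gds2bgen")
print(nzchar(missing_path))
## [1] FALSE

# Placeholder path to the user's own BGEN file.
bgen_fn <- file.path("path", "to", "myworking.bgen")

if (file.exists(bgen_fn) && requireNamespace("gds2bgen", quietly = TRUE)) {
  gds2bgen::seqBGEN_Info(bgen_fn)
} else {
  message("File not found (or gds2bgen not installed): ", bgen_fn)
}
```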

genotype conversion doesn't work

It appears that while it is possible to obtain dosage information from bgen files using dosage=TRUE, using geno=TRUE doesn't work:

> bgen_fn <- system.file("extdata", "example.8bits.bgen", package="gds2bgen")
> seqBGEN2GDS(bgen_fn,"example.gds",geno=TRUE,dosage=FALSE,prob=FALSE,parallel=4)
...
> si <- seqOpen("example.gds")
> si
Object of class "SeqVarGDSClass"
File: /scratch/t.cri.nknoblauch/intersect_snplist/example.gds (6.8K)
+    [  ] *
|--+ description   [  ] *
|--+ sample.id   { Str8 500 LZMA_ra(7.02%), 393B } *
|--+ variant.id   { Int32 199 LZMA_ra(33.9%), 277B } *
|--+ position   { Int32 199 LZMA_ra(60.6%), 489B } *
|--+ chromosome   { Str8 199 LZMA_ra(15.7%), 101B } *
|--+ allele   { Str8 199 LZMA_ra(11.8%), 101B } *
|--+ genotype   [  ] *
|  |--+ data   { Bit2 2x500x0 LZMA_ra, 18B } *
|  |--+ extra.index   { Int32 3x0 LZMA_ra, 18B } *
|  \--+ extra   { Int16 0 LZMA_ra, 18B }
|--+ phase   [  ]
|  |--+ data   { Bit1 500x0 LZMA_ra, 18B } *
|  |--+ extra.index   { Int32 3x0 LZMA_ra, 18B } *
|  \--+ extra   { Bit1 0 LZMA_ra, 18B }
|--+ annotation   [  ]

Any thoughts as to what's happening here? Am I correct that without genotype information, I will be unable to export to PLINK BED format?