gds2bgen's Introduction

gds2bgen: Format Conversion from BGEN to GDS

License: GNU General Public License, GPLv3

Description

This package provides functions for format conversion from bgen files to SeqArray GDS files.

Version

v0.9.3

Package Maintainer

Dr. Xiuwen Zheng ([email protected])

Installation

Requires R (≥ v3.5.0), gdsfmt (≥ v1.20.0), SeqArray (≥ v1.24.0)

  • Installation from GitHub:
library("devtools")
install_github("zhengxwen/gds2bgen")

The install_github() approach requires that you build from source, i.e. make and compilers must be installed on your system -- see the R FAQ for your operating system; you may also need to install dependencies manually.

Or manually install the package:

git clone https://github.com/zhengxwen/gds2bgen
cd gds2bgen/src
unzip bgen_v1.1.8.zip
cd bgen_v1.1.8
python2 ./waf configure
python2 ./waf
cp build/libbgen.a ..
cp build/3rd_party/zstd-1.1.0/libzstd.a ..
rm -rf build
sleep 1; touch ../libbgen.a
cd ../../..
R CMD INSTALL gds2bgen
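
Both installation routes above build the bundled bgen library from source, so it can help to confirm that the build tools are on the PATH before starting. The following is a minimal sketch using only base R; the tool names checked (`make`, `g++`, `python2`) are assumptions based on the commands shown above, and your system may use a different compiler.

```r
# Look up each build tool on the PATH; Sys.which() returns "" when a tool is not found.
# The tool names are assumptions taken from the manual build steps (make, a C++ compiler, python2).
tools <- Sys.which(c("make", "g++", "python2"))

# Collect the names of the tools that could not be located.
missing <- names(tools)[!nzchar(tools)]

if (length(missing) > 0) {
  message("Not found on PATH: ", paste(missing, collapse = ", "))
} else {
  message("All build tools found.")
}
```

If a tool is reported as missing, install it (or adjust the name to match your compiler) before running either installation route.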

Copyright Notice

This package includes the sources of the bgen library (https://enkre.net/cgi-bin/code/bgen/dir?ci=trunk), Boost (the C++ libraries, https://www.boost.org) and Zstandard (https://zstd.net).

Citations for GDS

Zheng X, Levine D, Shen J, Gogarten SM, Laurie C, Weir BS (2012). A High-performance Computing Toolset for Relatedness and Principal Component Analysis of SNP Data. Bioinformatics. DOI: 10.1093/bioinformatics/bts606.

Zheng X, Gogarten S, Lawrence M, Stilp A, Conomos M, Weir BS, Laurie C, Levine D (2017). SeqArray -- A storage-efficient high-performance data format for WGS variant calls. Bioinformatics. DOI: 10.1093/bioinformatics/btx145.

Examples

library(gds2bgen)

seqBGEN_Info()  # bgen library version
## "bgen_lib_v1.1.8"

bgen_fn <- system.file("extdata", "example.8bits.bgen", package="gds2bgen")
# or bgen_fn <- "your_bgen_file.bgen"
seqBGEN_Info(bgen_fn)

## File: gds2bgen/extdata/example.8bits.bgen
## # of samples: 500
## # of variants: 199
## Compression method: zlib
## Layout version: v1.2
## Unphased: TRUE
## # of bits: 8
## Ploidy: 2
## sample id: sample_001, sample_002, sample_003, sample_004, ...


# example.8bits.bgen ==> example.gds, using 4 cores
seqBGEN2GDS(bgen_fn, "example.gds",
    storage.option="LZMA_RA",  # compression option, e.g., ZIP_RA for zlib or LZ4_RA for LZ4
    float.type="packed8",      # 8-bit packed real numbers
    geno=FALSE,     # 2-bit integer genotypes, stored in 'genotype/data'
    dosage=TRUE,    # numeric alternative allele dosages, stored in 'annotation/format/DS'
    prob=FALSE,     # numeric genotype probabilities, stored in 'annotation/format/GP'
    parallel=4      # the number of cores
)


# show file structure
library(SeqArray)
(f <- seqOpen("example.gds"))
seqClose(f)

## File: example.gds (137.7K)
## +    [  ] *
## |--+ description   [  ] *
## |--+ sample.id   { Str8 500 LZMA_ra(7.02%), 393B } *
## |--+ variant.id   { Int32 199 LZMA_ra(33.9%), 277B } *
## |--+ position   { Int32 199 LZMA_ra(60.6%), 489B } *
## |--+ chromosome   { Str8 199 LZMA_ra(15.7%), 101B } *
## |--+ allele   { Str8 199 LZMA_ra(11.8%), 101B } *
## |--+ genotype   [  ] *
## |--+ phase   [  ]
## |--+ annotation   [  ]
## |  |--+ id   { Str8 199 LZMA_ra(18.6%), 321B } *
## |  |--+ qual   { Float32 199 LZMA_ra(11.8%), 101B } *
## |  |--+ filter   { Int32 199 LZMA_ra(11.3%), 97B } *
## |  |--+ info   [  ]
## |  \--+ format   [  ]
## |     |--+ DS   [  ] *
## |     |  \--+ data   { PackedReal8U 500x199 LZMA_ra(55.6%), 54.0K } *
## \--+ sample.annotation   [  ]
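
Once the file is converted, the dosages can be read back with SeqArray. The following is a sketch, not part of the package documentation: `read_dosage()` is a hypothetical helper, and it assumes the dosages were stored in 'annotation/format/DS' as in the conversion call above.

```r
# Hypothetical helper: read the alternative allele dosages from a SeqArray GDS file.
read_dosage <- function(gds_fn) {
  f <- SeqArray::seqOpen(gds_fn)
  on.exit(SeqArray::seqClose(f))   # close the file even if reading fails
  ds <- SeqArray::seqGetData(f, "annotation/format/DS")
  # Depending on the SeqArray version, FORMAT fields may come back as
  # list(length=, data=); keep only the numeric matrix in that case.
  if (is.list(ds)) ds <- ds$data
  ds
}

# Run only when SeqArray is installed and the converted file exists.
if (requireNamespace("SeqArray", quietly = TRUE) && file.exists("example.gds")) {
  ds <- read_dosage("example.gds")
  print(dim(ds))
}
```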

Also See

seqVCF2GDS() in the SeqArray package, conversion from VCF files to GDS files.

seqBED2GDS() in the SeqArray package, conversion from PLINK BED files to GDS files.

gds2bgen's People

Contributors

crerecombinase, zhengxwen

Forkers

crerecombinase

gds2bgen's Issues

rename "variant.id" header

Hello,

Many thanks for the tool! I have managed to convert my bgen files to GDS using the pipeline suggested here. However, the downstream analysis in my project requires reading the GDS data into a GenotypeData class object (I'm using GWASTools for that), and I'm getting an error that it can't find a "snp.id" column. Is it possible to manually rename the "variant.id" variable? Apologies if this is not the right place for my question.

Many thanks,
Olga

installation issues

Besides the default installation method, is there any other way to work around this?

library("devtools")
install_github("zhengxwen/gds2bgen")

does not work on my cluster.

I have encountered consistent issues with the installation. The log is as follows:

[50/53] Linking build/3rd_party/zstd-1.1.0/libzstd.a
../src/View.cpp: In member function ‘void genfile::bgen::View::setup(const string&)’:
../src/View.cpp:212:53: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
if( m_stream->gcount() != m_postheader_data.size() ) {
^
At global scope:
cc1plus: warning: unrecognized command line option "-Wno-c++11-long-long" [enabled by default]

[51/53] Linking build/3rd_party/sqlite3/libsqlite3.a
[52/53] Linking build/db/libdb.a
[53/53] Linking build/libbgen.a
Waf: Leaving directory `/tmp/RtmpEbogUY/R.INSTALL2b5b6584099d8/gds2bgen/src/bgen_v1.1.8/build'
'build' finished successfully (35.483s)
cp -f bgen_v1.1.8/build/libbgen.a .
cp -f bgen_v1.1.8/build/3rd_party/zstd-1.1.0/libzstd.a .
rm -rf bgen_v1.1.8/build
g++ -std=gnu++11 -shared -L/n/helmod/apps/centos7/Core/R_core/4.0.2-fasrc01/lib64/R/lib -L/usr/local/lib64 -o gds2bgen.so R_gds2bgen.o gds2bgen.o libbgen.a libzstd.a -L/n/helmod/apps/centos7/Core/R_core/4.0.2-fasrc01/lib64/R/lib -lR
installing to /n/user/R/x86_64-pc-linux-gnu-library/4.0/00LOCK-gds2bgen/00new/gds2bgen/libs
** R
** inst
** byte-compile and prepare package for lazy loading
Error: (converted from warning) package ‘gdsfmt’ was built under R version 4.0.5
Execution halted
ERROR: lazy loading failed for package ‘gds2bgen’

* removing ‘/n/user/R/x86_64-pc-linux-gnu-library/4.0/gds2bgen’
Error: Failed to install 'gds2bgen' from GitHub:
  (converted from warning) installation of package ‘/tmp/RtmpH5dlPs/file17dc1414aa117/gds2bgen_0.9.2.tar.gz’ had non-zero exit status

GDS to BGEN

Hi,

I want to know if there is a way to use your tool to go from GDS to bgen, because it seems that your tool does the opposite, bgen to GDS.

Thanks

help with gds2bgen

Hello,

Many thanks for creating this tool. I have installed it in R - version 3.6 but I have not managed to make it work.

I was wondering what "extdata" is referring to in your example, as I am running it in the following way:

bgen_fn <- system.file("extdata","myworking.bgen", package="gds2bgen")

but then I'm getting an error:

seqBGEN_Info(bgen_fn)
Error in seqBGEN_Info(bgen_fn) : Can't open the file ''.

Many thanks again
Olga
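
A note on the likely cause, offered as a sketch rather than a confirmed diagnosis: `system.file()` only looks inside an installed package's directory, so it returns an empty string for a file the package does not ship, which matches the `''` in the error above. "extdata" is the folder inside the package that holds its bundled example file. For your own data, the README example suggests passing the file path directly; the path below is hypothetical.

```r
# system.file() searches inside the installed package only; a file that is not
# shipped with the package (or a package that is not installed) yields "".
missing_fn <- system.file("extdata", "myworking.bgen", package = "gds2bgen")
print(missing_fn)

# For your own file, use its path directly (hypothetical path shown here)
# and check that it exists before passing it to seqBGEN_Info().
bgen_fn <- "myworking.bgen"
print(file.exists(bgen_fn))
```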

genotype conversion doesn't work

It appears that while it is possible to obtain dosage information from bgen files using dosage=TRUE, using geno=TRUE doesn't work:

> bgen_fn <- system.file("extdata", "example.8bits.bgen", package="gds2bgen")
> seqBGEN2GDS(bgen_fn,"example.gds",geno=TRUE,dosage=FALSE,prob=FALSE,parallel=4)
...
> si <- seqOpen("example.gds")
> si
Object of class "SeqVarGDSClass"
File: /scratch/t.cri.nknoblauch/intersect_snplist/example.gds (6.8K)
+    [  ] *
|--+ description   [  ] *
|--+ sample.id   { Str8 500 LZMA_ra(7.02%), 393B } *
|--+ variant.id   { Int32 199 LZMA_ra(33.9%), 277B } *
|--+ position   { Int32 199 LZMA_ra(60.6%), 489B } *
|--+ chromosome   { Str8 199 LZMA_ra(15.7%), 101B } *
|--+ allele   { Str8 199 LZMA_ra(11.8%), 101B } *
|--+ genotype   [  ] *
|  |--+ data   { Bit2 2x500x0 LZMA_ra, 18B } *
|  |--+ extra.index   { Int32 3x0 LZMA_ra, 18B } *
|  \--+ extra   { Int16 0 LZMA_ra, 18B }
|--+ phase   [  ]
|  |--+ data   { Bit1 500x0 LZMA_ra, 18B } *
|  |--+ extra.index   { Int32 3x0 LZMA_ra, 18B } *
|  \--+ extra   { Bit1 0 LZMA_ra, 18B }
|--+ annotation   [  ]

Any thoughts as to what's happening here? Am I correct that without genotype information, I will be unable to export to PLINK/BED format?
