
bionode-vcf's Introduction

bionode.io

bionode-vcf

A VCF parser in JavaScript


Install

You need to install the latest Node.js first. Please check nodejs.org or do the following:

# Ubuntu
sudo apt-get install npm
# Mac
brew install node
# Both
npm install -g n
n stable

To use bionode-vcf as a command line tool, you can install it globally with -g.

npm install bionode-vcf -g

Or, if you want to use it as a JavaScript library, install it into the node_modules directory of your local project folder by running the same command without -g.

npm i bionode-vcf # 'i' can be used as shortcut to 'install'

Usage

vcf.read

  • vcf.read takes params: path
  • The supported filetypes are vcf, zip and gz.
var vcf = require('bionode-vcf');
vcf.read("/path/sample.vcf");
vcf.on('data', function(feature){
    console.log(feature);
})

vcf.on('end', function(){
    console.log('end of file')
})

vcf.on('error', function(err){
    console.error('it\'s not a vcf', err)
})

vcf.readStream

  • vcf.readStream takes params: stream and extension
  • The supported extensions are vcf, zip and gz.
var vcf = require('bionode-vcf');
// `s3` is assumed to be an already configured AWS SDK S3 client
var fileStream = s3.getObject({
    Bucket: [BUCKETNAME],
    Key: [FILENAME]
}).createReadStream();  // or stream data from any other source

vcf.readStream(fileStream, 'zip'); // default value is `vcf`
vcf.on('data', function(feature){
    console.log(feature);
})

vcf.on('end', function(){
    console.log('end of file')
})

vcf.on('error', function(err){
    console.error('it\'s not a vcf', err)
})
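
Because vcf.readStream cannot see a file name, the extension has to be passed explicitly. The following is a minimal sketch of deriving it from a file name or object key; `extensionFor` and `SUPPORTED` are hypothetical names introduced here for illustration and are not part of bionode-vcf.

```javascript
const path = require('path');

// Extensions the README lists as supported by vcf.readStream.
const SUPPORTED = ['vcf', 'zip', 'gz'];

// Hypothetical helper (not part of bionode-vcf): return the last extension
// of a file name or key, lowercased, or throw if it is not a supported one.
function extensionFor(name) {
  const ext = path.extname(name).slice(1).toLowerCase();
  if (!SUPPORTED.includes(ext)) {
    throw new Error('Unsupported extension: ' + (ext || '(none)'));
  }
  return ext;
}
```

With such a helper, the call above could be written as vcf.readStream(fileStream, extensionFor(key)), where key is the name of the object being streamed.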

Documentation

VCF format specifications and more information about the fields can be found on the 1000 Genomes webpage and the samtools GitHub page.

Contributing

We welcome all kinds of contributions at all levels of experience. Please read CONTRIBUTING.md to get started!

bionode-vcf's People

Contributors

amblina, ayangromano, bmpvieira, doomedramen, rallapag, shyamrallapalli, stuntspt


bionode-vcf's Issues

is chai an unused dependency?

chai is listed as a devDependency in the package.json, but the only references to it in the repo are commented out. Can this dependency be removed?

Error: number of columns in the file are less than expected in vcf - there are valid vcf files with just 8 parameters

Please see the attached file format of the Build 37 human genome reference file:

image link

It consists of the following parameters:

#CHROM
POS
ID
REF
ALT
QUAL
FILTER
INFO

This is a valid VCF file, downloaded from Download link

On line 76 of /lib/index.js, the number 9 should change to 8:

if (info.length < 9) {

On line 82 of /lib/index.js, the number 8 should also change to 7:

var formatIds = info[7].split(':')

to avoid this error being thrown. Thank you!
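
To illustrate the point of this report: the VCF format has eight mandatory columns, while FORMAT and the per-sample columns appear only when genotype data is present. A rough sketch of a parser that accepts both layouts might look like this. `parseDataLine` and its field names are hypothetical and are not the code in lib/index.js.

```javascript
// Names for the eight mandatory VCF columns (field names are assumptions).
const FIXED_COLUMNS = ['chr', 'pos', 'id', 'ref', 'alt', 'qual', 'filter', 'info'];

// Hypothetical helper: parse one tab-separated VCF data line.
function parseDataLine(line) {
  const cols = line.split('\t');
  // Only the eight fixed columns are required.
  if (cols.length < FIXED_COLUMNS.length) {
    throw new Error('expected at least 8 columns, found ' + cols.length);
  }
  const record = {};
  FIXED_COLUMNS.forEach((name, i) => {
    record[name] = cols[i];
  });
  // FORMAT (the ninth column) is optional, so only split it when present.
  record.format = cols.length > 8 ? cols[8].split(':') : [];
  return record;
}
```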

A vcf file with multiple individuals gives duplicate row data

For a vcf file containing data for multiple individuals per line, the vcf 'data' event returns the correct individual names, but the data for each individual is a duplicate of the first individual's data.
To fix this issue, change line 89 in index.js from this:
var formatParts = info[9].split(':')
to this:
var formatParts = info[9 + j].split(':')

The for loop simply needs to loop to each unique individual in each line.
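
A self-contained sketch of the proposed indexing is given below. It assumes `info` is the array of tab-split columns for one line and `sampleNames` holds the individual names from the header; `parseSamples` and the shape of its result are hypothetical, not the library's actual output.

```javascript
// Hypothetical helper: pair each individual with its own genotype column.
function parseSamples(info, sampleNames) {
  const formatIds = info[8].split(':'); // FORMAT column, e.g. "GT:DP"
  const samples = [];
  for (let j = 0; j < sampleNames.length; j++) {
    // Sample columns start at index 9, so individual j is at 9 + j.
    const formatParts = info[9 + j].split(':');
    const values = {};
    formatIds.forEach((id, k) => {
      values[id] = formatParts[k];
    });
    samples.push({ name: sampleNames[j], values: values });
  }
  return samples;
}
```

With a fixed index of 9, every iteration would read the first individual's column, which matches the duplication described in this report.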

Add support to parse stream data provided by user

The current implementation takes a path (string) and uses the fs module to read the file and create a stream for parsing.

REQUEST: A method to parse stream data provided by the user should also be available, so that users can parse vcf data streamed from any other source.

E.g.: Read data from s3 and parse it directly

const fileStream = s3.getObject({
    Bucket: [BUCKETNAME],
    Key: [FILENAME]
}).createReadStream();

vcf.readStream(fileStream)

Also, it should support compressed files along with plain vcf files.

general question

Hi,

I'm looking for a js module that can read and query compressed and indexed vcf files from a remote location (e.g. S3 buckets). I was wondering if this is within the scope of this module. If so, would it be possible to provide some documentation about the possibilities and how to use them?

Alternatively, if my question seems out of scope, do you know of a module that can do such a thing?
Thanks
M

possible race condition if setup of event listeners is delayed?

I may just not be used to node/javascript idioms, but I think there is a race condition since there is no way to make sure that the event listeners are set up before events start getting triggered.

For instance, something like the following:

var VCF = require('./lib/index')

var allFeatures = []

var v = VCF.read('test/sample.vcf')

setTimeout(function () {
  v.on('data', (f) => allFeatures.push(f))
  v.on('end', () => console.log(allFeatures))
}, 1000)

will not write out any data because the vcf object emits all of its 'data' events before the 'data' listener is set up. But if the delay parameter (1000) is changed to 0, then the 'data' and 'end' listeners are set up in time.

I don't know if this is a problem in the real world (if the .on('data', ...) call directly follows the .read call, then maybe there is no way for the race to occur?)

Apologies if I just misunderstand the idiom.
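
The underlying behaviour can be reproduced with Node's built-in EventEmitter alone, which neither buffers nor replays events: a listener only sees events emitted after it was attached. The sketch below uses a hypothetical `emitAll` helper in place of a real parser. Whether bionode-vcf emits only from asynchronous I/O callbacks is an assumption not confirmed here; if it does, listeners attached in the same synchronous block as the read call are registered before the first 'data' event, because Node finishes the current synchronous code before running I/O callbacks.

```javascript
const { EventEmitter } = require('events');

// Hypothetical stand-in for a parser: emit every item, then 'end'.
function emitAll(emitter, items) {
  items.forEach((item) => emitter.emit('data', item));
  emitter.emit('end');
}

// Listener attached after the events were emitted: nothing is received.
const late = new EventEmitter();
const lateSeen = [];
emitAll(late, ['a', 'b']);
late.on('data', (f) => lateSeen.push(f));

// Listener attached before the events are emitted: everything is received.
const early = new EventEmitter();
const earlySeen = [];
early.on('data', (f) => earlySeen.push(f));
emitAll(early, ['a', 'b']);

console.log(lateSeen.length, earlySeen.length);
```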

Document data callback

The README should document the structure of the feature value provided to the callback in:

vcf.on('data', function (feature) {
   ... // what is feature?
});
