stanfordbioinformatics / googva
License: Apache License 2.0
When the end position of a genomic region is specified in the INFO field (END=) of the VCF file, gvcf-mapper-cl.py groups reference loci into incorrect blocks.
Input:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 400470164
1 1 . N <NON_REF> . . END=10033 GT:DP:GQ:MIN_DP:PL 0/0:0:0:0:0,0,0
1 10034 . C <NON_REF> . . END=10035 GT:DP:GQ:MIN_DP:PL 0/0:94:6:91:0,6,90
1 10036 . C <NON_REF> . . END=10036 GT:DP:GQ:MIN_DP:PL 0/0:96:9:96:0,9,135
1 10037 . T <NON_REF> . . END=10037 GT:DP:GQ:MIN_DP:PL 0/0:99:12:99:0,12,180
1 10038 . A <NON_REF> . . END=10038 GT:DP:GQ:MIN_DP:PL 0/0:104:0:104:0,0,2807
1 10039 . A <NON_REF> . . END=10039 GT:DP:GQ:MIN_DP:PL 0/0:106:18:106:0,18,270
1 10040 . C <NON_REF> . . END=10041 GT:DP:GQ:MIN_DP:PL 0/0:111:24:110:0,24,360
1 10042 . C <NON_REF> . . END=10043 GT:DP:GQ:MIN_DP:PL 0/0:117:33:115:0,33,495
1 10044 . A <NON_REF> . . END=10044 GT:DP:GQ:MIN_DP:PL 0/0:120:36:120:0,36,540
Output:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 400470164
1 1 . N <NON_REF> . . END=1 GT:DP:GQ:MIN_DP:PL 0/0:0:0:0:0,0,0
1 10034 . C <NON_REF> . . END=10040 GT:DP:GQ:MIN_DP:PL 0/0:94:6:91:0,6,90
1 10042 . C <NON_REF> . . END=10045 GT:DP:GQ:MIN_DP:PL 0/0:117:33:115:0,33,495
1 10049 . T <NON_REF> . . END=10050 GT:DP:GQ:MIN_DP:PL 0/0:129:45:129:0,45,675
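A reference record in this input can cover more than one position: its span runs from POS to the INFO END value, not just POS. The sketch below shows one way blocking could honor that. It assumes records arrive sorted, and it treats contiguity as the only merge criterion; a real mapper may also split blocks on GQ or other fields. `parse_info`, `record_span`, and `merge_reference_blocks` are hypothetical helpers, not functions from gvcf-mapper-cl.py.

```python
from typing import Dict, Iterable, List, Tuple, Union

def parse_info(info: str) -> Dict[str, Union[str, bool]]:
    """Parse a VCF INFO string such as 'END=10035;DP=5' into a dict.

    Flag keys without '=' map to True; '.' means no INFO fields.
    """
    fields: Dict[str, Union[str, bool]] = {}
    if info in (".", ""):
        return fields
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields

def record_span(pos: int, ref: str, info: str) -> Tuple[int, int]:
    """Return the 1-based, inclusive (start, end) covered by one record.

    Uses INFO END when present; otherwise falls back to the length of REF.
    """
    end = parse_info(info).get("END")
    if isinstance(end, str):
        return pos, int(end)
    return pos, pos + len(ref) - 1

def merge_reference_blocks(
    records: Iterable[Tuple[str, int, str, str]],
) -> List[Tuple[str, int, int]]:
    """Merge sorted (chrom, pos, ref, info) reference records into blocks.

    Assumption: two records are merged only when the second starts exactly
    one base after the previous block's END on the same chromosome.
    """
    blocks: List[Tuple[str, int, int]] = []
    for chrom, pos, ref, info in records:
        start, end = record_span(pos, ref, info)
        if blocks and blocks[-1][0] == chrom and blocks[-1][2] + 1 == start:
            # Extend the current block to this record's END, not its POS.
            blocks[-1] = (chrom, blocks[-1][1], end)
        else:
            blocks.append((chrom, start, end))
    return blocks
```

Under this contiguity-only rule, the nine input records above chain without gaps (1 to 10033, 10034 to 10035, ..., 10044 to 10044), so they collapse into a single block `("1", 1, 10044)`. No record's END is lost or reset to its POS.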
(Adding issue here for the sake of logging)
In some cases, gvcf-mapper-cl.py incorrectly writes the end position as END=1. This causes an error when importing the output into a Google Genomics variant set.
$ gcloud alpha genomics operations describe operations/CJ3k0-yGHBCMn-DDBRiZ-5Hw3vP42kg
done: true
error:
  code: 2
  message: 'Invalid value for field "[variant.start, variant.end)": negative interval
    length, start > end for variant at start: 59118900 referenceName: "19"'
metadata:
  '@type': type.googleapis.com/google.genomics.v1.OperationMetadata
  clientId: ''
  createTime: '2017-01-12T23:21:48Z'
  endTime: '2017-01-12T23:40:35.737Z'
  events: []
  labels: {}
  projectId: gbsc-gcp-project-mvp
  request:
    '@type': type.googleapis.com/google.genomics.v1.ImportVariantsRequest
    format: FORMAT_VCF
    infoMergeConfig:
      FS: MOVE_TO_CALLS
      MQ: MOVE_TO_CALLS
      MQRankSum: MOVE_TO_CALLS
      PL: MOVE_TO_CALLS
      QD: MOVE_TO_CALLS
      ReadPosRankSum: MOVE_TO_CALLS
      VQSLOD: MOVE_TO_CALLS
      culprit: MOVE_TO_CALLS
    normalizeReferenceNames: false
    sourceUris:
    - gs://gbsc-gcp-project-mvp-group/test/dockerflow_test/bigquery_import/gvcf_mapper_out.vcf
    variantSetId: '8016995100636263441'
name: operations/CJ3k0-yGHBCMn-DDBRiZ-5Hw3vP42kg
$ grep '59118901' gvcf_mapper_out.vcf
14 59118900 . T <NON_REF> . . END=59118901 GT:DP:GQ:MIN_DP:PL 0/0:40:96:40:0,96,1440
19 59118901 . G <NON_REF> . . END=1 GT:DP:GQ:MIN_DP:PL 0/0:14:21:14:0,21,315
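The error message reports the record as "[variant.start, variant.end)" with start 59118900 for the chromosome 19 record at VCF POS 59118901. That is a 0-based, half-open interval, so END=1 becomes an end of 1, which is smaller than the start. A pre-import check could catch this before the import job runs. The sketch below is a minimal version. `to_half_open` and `find_negative_intervals` are hypothetical names, and the parser splits on any whitespace because the lines above lost their tabs; real VCF columns are tab-separated.

```python
from typing import Iterable, List, Optional, Tuple

def to_half_open(pos: int, ref: str, info: str) -> Tuple[int, int]:
    """Convert 1-based POS plus inclusive INFO END to a 0-based [start, end)."""
    end_inclusive: Optional[int] = None
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        if key == "END" and sep:
            end_inclusive = int(value)
    if end_inclusive is None:
        # No END in INFO: the record spans the REF allele.
        end_inclusive = pos + len(ref) - 1
    # 1-based inclusive [pos, end] == 0-based half-open [pos - 1, end).
    return pos - 1, end_inclusive

def find_negative_intervals(lines: Iterable[str]) -> List[Tuple[str, int, int]]:
    """Return (chrom, start, end) for data lines whose interval has start > end."""
    bad: List[Tuple[str, int, int]] = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        cols = line.split()  # tolerate tabs or spaces
        chrom, pos, ref, info = cols[0], int(cols[1]), cols[3], cols[7]
        start, end = to_half_open(pos, ref, info)
        if start > end:
            bad.append((chrom, start, end))
    return bad
```

Run on the two grep hits above, this check passes the chromosome 14 record (start 59118899, end 59118901) and flags `("19", 59118900, 1)`. That matches the start and reference name in the import error.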