Comments (5)
Fixed with #8
from germline.
After some more testing, I noticed that `bcftools convert` doesn't do much to change the GVCF to a VCF and still emits a huge file containing non-ref sites. Even after filtering with `bcftools view`, some of these sites are still present, and the sites containing actual variants still carry the additional `<NON_REF>` as an ALT allele.
I'll look for a new tool (e.g. gvcftools) that can perform this conversion in an easier way.
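To make the filtering problem concrete, here is a rough Python sketch (not what the pipeline does) that drops records whose only ALT allele is `<NON_REF>`. The function name `drop_reference_blocks` and the example records are made up for illustration. It deliberately leaves `<NON_REF>` in place at real variant sites, since removing it properly would also mean trimming every per-allele INFO and FORMAT field.

```python
NON_REF = "<NON_REF>"  # symbolic allele GATK adds to every GVCF record


def drop_reference_blocks(lines):
    """Yield header lines and records that have at least one real ALT allele.

    Records whose ALT column is only <NON_REF> (or '.') are skipped.
    """
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        alts = fields[4].split(",")  # column 5 of a VCF record is ALT
        if any(alt not in (NON_REF, ".") for alt in alts):
            yield line


# Hypothetical input: one reference block and one variant site.
example_lines = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tProband\n",
    "chr1\t100\t.\tA\t<NON_REF>\t.\t.\tEND=200\tGT\t0/0\n",
    "chr1\t13613\t.\tT\tA,<NON_REF>\t50\t.\t.\tGT\t0/1\n",
]
kept = list(drop_reference_blocks(example_lines))
```

In this sketch `kept` holds the two header lines and the `chr1:13613` record; the variant record still lists `<NON_REF>` in ALT, which is exactly the leftover described above.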
The `extract_variants` tool from gvcftools isn't able to parse multi-sample GVCF files (sequencing/gvcftools#9), and it doesn't seem like this will be fixed soon...
`bcftools merge` does not seem to work with `GenotypeGVCFs`; it fails every time with the following log:
08:14:49.261 INFO GenotypeGVCFs - HTSJDK Version: 2.24.1
08:14:49.261 INFO GenotypeGVCFs - Picard Version: 2.27.1
08:14:49.261 INFO GenotypeGVCFs - Built for Spark Version: 2.4.5
08:14:49.261 INFO GenotypeGVCFs - HTSJDK Defaults.COMPRESSION_LEVEL : 2
08:14:49.261 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
08:14:49.261 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
08:14:49.261 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
08:14:49.261 INFO GenotypeGVCFs - Deflater: IntelDeflater
08:14:49.261 INFO GenotypeGVCFs - Inflater: IntelInflater
08:14:49.261 INFO GenotypeGVCFs - GCS max retries/reopens: 20
08:14:49.261 INFO GenotypeGVCFs - Requester pays: disabled
08:14:49.261 INFO GenotypeGVCFs - Initializing engine
08:14:49.535 INFO FeatureManager - Using codec VCFCodec to read file file:///vscmnt/gent_kyukon_scratch/_kyukon_scratch_gent/vo/000/gvo00082/vsc44804/nxf.Sq00u7K81e/Proband_12345.vcf.gz
08:14:49.720 INFO GenotypeGVCFs - Done initializing engine
08:14:49.792 INFO ProgressMeter - Starting traversal
08:14:49.793 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
08:14:49.832 WARN ReferenceConfidenceVariantContextMerger - Detected invalid annotations: When trying to merge variant contexts at location chr1:13613 the annotation AC=[1, 0] was not a numerical value and was ignored
08:14:49.890 WARN InbreedingCoeff - InbreedingCoeff will not be calculated at position chr1:13613 and possibly subsequent; at least 10 samples must have called genotypes
08:14:50.177 INFO GenotypeGVCFs - Shutting down engine
[July 28, 2022 at 8:14:50 AM GMT] org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFs done. Elapsed time: 0.02 minutes.
Runtime.totalMemory()=2097152000
java.lang.IllegalStateException: Number of alleles and number of allele-specific entries do not match. Allele-specific annotations should have an entry for each allele including the reference.
at org.broadinstitute.hellbender.tools.walkers.annotator.allelespecific.AS_StrandBiasTest.parseRawDataString(AS_StrandBiasTest.java:158)
at org.broadinstitute.hellbender.tools.walkers.annotator.allelespecific.AS_StrandBiasTest.combineRawData(AS_StrandBiasTest.java:118)
at org.broadinstitute.hellbender.tools.walkers.annotator.VariantAnnotatorEngine.combineAnnotations(VariantAnnotatorEngine.java:234)
at org.broadinstitute.hellbender.tools.walkers.ReferenceConfidenceVariantContextMerger.mergeAttributes(ReferenceConfidenceVariantContextMerger.java:326)
at org.broadinstitute.hellbender.tools.walkers.ReferenceConfidenceVariantContextMerger.merge(ReferenceConfidenceVariantContextMerger.java:150)
at org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFsEngine.callRegion(GenotypeGVCFsEngine.java:133)
at org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFs.apply(GenotypeGVCFs.java:283)
at org.broadinstitute.hellbender.engine.VariantLocusWalker.lambda$traverse$0(VariantLocusWalker.java:135)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.base/java.util.stream.ReferencePipeline.forEachOrdered(ReferencePipeline.java:502)
at org.broadinstitute.hellbender.engine.VariantLocusWalker.traverse(VariantLocusWalker.java:132)
at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1085)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
at org.broadinstitute.hellbender.Main.main(Main.java:289)
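The exception says every allele-specific annotation needs one entry per allele, including the reference. As a hedged way to locate offending records in the merged file, the sketch below counts entries per record. It assumes allele-specific values are pipe-delimited with one entry per allele (as in GATK's `AS_SB_TABLE`); the function name and the default key list are illustrative choices, not part of any tool.

```python
def allele_specific_mismatches(record, keys=("AS_SB_TABLE",)):
    """Return (key, n_entries, n_alleles) for each checked INFO key whose
    number of '|'-separated entries differs from the allele count.

    Assumption: allele-specific annotations hold one '|'-separated entry
    per allele, with the reference allele counted as the first one.
    """
    fields = record.rstrip("\n").split("\t")
    alt, info = fields[4], fields[7]
    n_alleles = 1 + (0 if alt == "." else len(alt.split(",")))
    mismatches = []
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        if sep and key in keys:
            n_entries = len(value.split("|"))
            if n_entries != n_alleles:
                mismatches.append((key, n_entries, n_alleles))
    return mismatches


# Hypothetical records: three alleles (T, A, <NON_REF>) in both cases.
bad = "chr1\t13613\t.\tT\tA,<NON_REF>\t50\t.\tAS_SB_TABLE=10,5|3,2\tGT\t0/1\n"
good = "chr1\t13613\t.\tT\tA,<NON_REF>\t50\t.\tAS_SB_TABLE=10,5|3,2|0,0\tGT\t0/1\n"
bad_result = allele_specific_mismatches(bad)
good_result = allele_specific_mismatches(good)
```

Running this over the records of the `bcftools merge` output would show whether the merge left the allele-specific fields out of step with the ALT column. It does not by itself establish that this is the cause of the failure.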
Comparison of GenotypeGVCFs with bcftools view/convert
=> The GVCFs were merged using CombineGVCFs
Approach | CPU hours | tool(s) runtime | tool(s) memory | tool(s) cpus |
---|---|---|---|---|
GenotypeGVCFs | 12.0 | 0.6m | 1.4GB | 1.5 |
bcftools view & convert | 11.5 | 0.0m + 0.2m = 0.2m | 20MB + 8MB = 28MB | 1.8 + 1.0 = 2.8 |