Comments (5)

nvnieuwk commented on September 2, 2024

Fixed with #8

from germline.

nvnieuwk commented on September 2, 2024

After some more testing, I noticed that bcftools convert does little to turn the GVCF into a VCF: it still emits a huge file containing sites with only the <NON_REF> allele. Even after filtering with bcftools view, some of these sites are still present, and the sites containing actual variants still carry the additional <NON_REF> ALT allele.
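As a rough illustration of the clean-up that was missing here, the following minimal Python sketch reads an uncompressed VCF as text lines, drops records whose only ALT allele is <NON_REF> (reference blocks), and removes <NON_REF> from the ALT column of the remaining records. The function name `strip_non_ref` is hypothetical and this is not what bcftools or gvcftools do internally. It is deliberately incomplete: per-allele fields such as AD and PL are not subset to match the new ALT list, so the output should not be treated as a valid VCF for downstream tools.

```python
from typing import Iterable, Iterator

NON_REF = "<NON_REF>"

def strip_non_ref(lines: Iterable[str]) -> Iterator[str]:
    """Hypothetical helper: drop reference blocks and <NON_REF> ALT alleles.

    Assumes plain-text VCF lines with ALT in the 5th tab-separated column.
    Per-allele FORMAT/INFO values (AD, PL, ...) are NOT adjusted.
    """
    for line in lines:
        if line.startswith("#"):
            yield line  # header lines pass through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        alts = [a for a in fields[4].split(",") if a != NON_REF]
        if not alts:
            continue  # only <NON_REF> as ALT: a reference block, no real variant
        fields[4] = ",".join(alts)
        yield "\t".join(fields) + "\n"
```

For a quick experiment, the function could be wrapped in a small script that calls `sys.stdout.writelines(strip_non_ref(sys.stdin))` and is fed the decompressed GVCF via `zcat`.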

I'll look for another tool (e.g. gvcftools) that can perform this conversion more easily.

nvnieuwk commented on September 2, 2024

The extract_variants tool from gvcftools can't parse multisample GVCF files (sequencing/gvcftools#9), and it doesn't look like this will be fixed soon...

nvnieuwk commented on September 2, 2024

GenotypeGVCFs does not always work on GVCFs merged with bcftools merge; here it failed with the following error:

  08:14:49.261 INFO  GenotypeGVCFs - HTSJDK Version: 2.24.1
  08:14:49.261 INFO  GenotypeGVCFs - Picard Version: 2.27.1
  08:14:49.261 INFO  GenotypeGVCFs - Built for Spark Version: 2.4.5
  08:14:49.261 INFO  GenotypeGVCFs - HTSJDK Defaults.COMPRESSION_LEVEL : 2
  08:14:49.261 INFO  GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
  08:14:49.261 INFO  GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
  08:14:49.261 INFO  GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
  08:14:49.261 INFO  GenotypeGVCFs - Deflater: IntelDeflater
  08:14:49.261 INFO  GenotypeGVCFs - Inflater: IntelInflater
  08:14:49.261 INFO  GenotypeGVCFs - GCS max retries/reopens: 20
  08:14:49.261 INFO  GenotypeGVCFs - Requester pays: disabled
  08:14:49.261 INFO  GenotypeGVCFs - Initializing engine
  08:14:49.535 INFO  FeatureManager - Using codec VCFCodec to read file file:///vscmnt/gent_kyukon_scratch/_kyukon_scratch_gent/vo/000/gvo00082/vsc44804/nxf.Sq00u7K81e/Proband_12345.vcf.gz
  08:14:49.720 INFO  GenotypeGVCFs - Done initializing engine
  08:14:49.792 INFO  ProgressMeter - Starting traversal
  08:14:49.793 INFO  ProgressMeter -        Current Locus  Elapsed Minutes    Variants Processed  Variants/Minute
  08:14:49.832 WARN  ReferenceConfidenceVariantContextMerger - Detected invalid annotations: When trying to merge variant contexts at location chr1:13613 the annotation AC=[1, 0] was not a numerical value and was ignored
  08:14:49.890 WARN  InbreedingCoeff - InbreedingCoeff will not be calculated at position chr1:13613 and possibly subsequent; at least 10 samples must have called genotypes
  08:14:50.177 INFO  GenotypeGVCFs - Shutting down engine
  [July 28, 2022 at 8:14:50 AM GMT] org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFs done. Elapsed time: 0.02 minutes.
  Runtime.totalMemory()=2097152000
  java.lang.IllegalStateException: Number of alleles and number of allele-specific entries do not match.  Allele-specific annotations should have an entry for each allele including the reference.
  	at org.broadinstitute.hellbender.tools.walkers.annotator.allelespecific.AS_StrandBiasTest.parseRawDataString(AS_StrandBiasTest.java:158)
  	at org.broadinstitute.hellbender.tools.walkers.annotator.allelespecific.AS_StrandBiasTest.combineRawData(AS_StrandBiasTest.java:118)
  	at org.broadinstitute.hellbender.tools.walkers.annotator.VariantAnnotatorEngine.combineAnnotations(VariantAnnotatorEngine.java:234)
  	at org.broadinstitute.hellbender.tools.walkers.ReferenceConfidenceVariantContextMerger.mergeAttributes(ReferenceConfidenceVariantContextMerger.java:326)
  	at org.broadinstitute.hellbender.tools.walkers.ReferenceConfidenceVariantContextMerger.merge(ReferenceConfidenceVariantContextMerger.java:150)
  	at org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFsEngine.callRegion(GenotypeGVCFsEngine.java:133)
  	at org.broadinstitute.hellbender.tools.walkers.GenotypeGVCFs.apply(GenotypeGVCFs.java:283)
  	at org.broadinstitute.hellbender.engine.VariantLocusWalker.lambda$traverse$0(VariantLocusWalker.java:135)
  	at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:183)
  	at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
  	at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)
  	at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
  	at java.base/java.util.Iterator.forEachRemaining(Iterator.java:133)
  	at java.base/java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
  	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
  	at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
  	at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
  	at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
  	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
  	at java.base/java.util.stream.ReferencePipeline.forEachOrdered(ReferencePipeline.java:502)
  	at org.broadinstitute.hellbender.engine.VariantLocusWalker.traverse(VariantLocusWalker.java:132)
  	at org.broadinstitute.hellbender.engine.GATKTool.doWork(GATKTool.java:1085)
  	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
  	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
  	at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
  	at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
  	at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
  	at org.broadinstitute.hellbender.Main.main(Main.java:289)
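The stack trace points at AS_StrandBiasTest.parseRawDataString, which complains that the number of allele-specific entries does not match the number of alleles. To locate the offending records in the merged GVCF, a diagnostic along these lines could help. It is a minimal sketch that assumes the raw allele-specific strand-bias annotation is stored in INFO as `AS_SB_TABLE`, with one `|`-separated group per allele (REF first, then each ALT). The function name `find_as_sb_mismatches` is hypothetical.

```python
from typing import Iterable, Iterator, Tuple

def find_as_sb_mismatches(
    lines: Iterable[str], key: str = "AS_SB_TABLE"
) -> Iterator[Tuple[str, str, int, int]]:
    """Hypothetical check: yield (chrom, pos, n_alleles, n_entries) for records
    whose allele-specific INFO value has a different number of '|'-separated
    groups than REF + ALT alleles. Assumes plain-text VCF lines."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, alt, info = fields[0], fields[1], fields[4], fields[7]
        alts = [] if alt == "." else alt.split(",")
        n_alleles = 1 + len(alts)  # REF plus every ALT allele
        for entry in info.split(";"):
            name, _, value = entry.partition("=")
            if name == key:
                n_entries = len(value.split("|"))
                if n_entries != n_alleles:
                    yield chrom, pos, n_alleles, n_entries
```

Any record it reports would be a candidate for the input that trips GenotypeGVCFs; records it does not report are not guaranteed to be safe, since other allele-specific annotations are not checked.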

nvnieuwk commented on September 2, 2024

Comparison of GenotypeGVCFs with bcftools view/convert

=> The GVCFs were merged using CombineGVCFs

Tool(s)                   CPU hours   Runtime               Memory              CPUs
GenotypeGVCFs             12.0        0.6m                  1.4GB               1.5
bcftools view & convert   11.5        0.0m + 0.2m = 0.2m    20MB + 8MB = 28MB   1.8 + 1.0 = 2.8
