graphtyper's Introduction

graphtyper

graphtyper is a graph-based variant caller capable of genotyping population-scale short read data sets. It represents a reference genome and known variants of a genomic region using an acyclic graph structure (a "pangenome reference"), which high-throughput sequence reads are re-aligned to for the purpose of discovering and genotyping SNPs, small indels, and structural variants.

Maintainer: Hannes Pétur Eggertsson ([email protected])

Installation

Static binary release

The easiest way to install GraphTyper is to go to "Releases" and download the latest binary: https://github.com/DecodeGenetics/graphtyper/releases

The binary is linked statically and therefore does not require any runtime libraries. If you prefer, you can also install graphtyper via bioconda: http://bioconda.github.io/recipes/graphtyper/README.html

Building from source

You may also want to build graphtyper from source, for example if you want to make changes to the code. In this case, you'll first need the following:

  • C++ compiler with full C++17 support (GCC 8+)
  • Boost>=1.57.0
  • zlib>=1.2.8
  • libbz2
  • liblzma
  • Autotools, Automake, libtool, Make, and CMake>=3.2 (if you want to use our build system)

All other dependencies are submodules of this repository. Make sure the CXX environment variable is set to the same compiler that which g++ returns (some of the submodules use the compiler named by CXX while others ignore it). Also set the BOOST_ROOT variable to the root of a Boost installation that was compiled with the same compiler. Graphtyper is linked with Boost dynamically, but with the other libraries statically.
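As a minimal sketch, the environment setup described above could look like this before running cmake. The Boost location is a placeholder and not a path from this repository.

```shell
# Use the compiler that `g++` resolves to on the PATH for CXX, so submodules
# that honour CXX and those that ignore it end up with the same compiler.
CXX="$(command -v g++ || true)"
export CXX

# Placeholder path: point this at a Boost build made with that same compiler.
BOOST_ROOT="${BOOST_ROOT:-/path/to/boost}"
export BOOST_ROOT

echo "CXX=${CXX:-<g++ not found>}"
echo "BOOST_ROOT=$BOOST_ROOT"
```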

For the purpose of demonstration, we assume you want to clone graphtyper to ~/git/graphtyper and build it in ~/git/graphtyper/release-build.

mkdir -p ~/git && cd ~/git
git clone --recursive https://github.com/DecodeGenetics/graphtyper.git graphtyper && cd graphtyper
mkdir -p release-build && cd release-build
cmake ..
make -j4 graphtyper # -j sets the number of parallel compilation jobs; raise it if you have more threads available. Compilation takes a while, so consider getting coffee at this point.
bin/graphtyper # Will run graphtyper for the very first time!

And that's all. If you are lucky enough to have administrative access, you can run sudo make install to install graphtyper system-wide.

Usage

The recommended way to genotype small variants (SNPs and indels) is the genotype subcommand:

./graphtyper genotype <REFERENCE.fa> --sams=<BAMLIST_OR_CRAMLIST> --region=<chrA:begin-end> --threads=<T>

Use the genotype_sv subcommand to genotype structural variants:

./graphtyper genotype_sv <REFERENCE.fa> <input.vcf.gz> --sams=<BAMLIST_OR_CRAMLIST> --region=<chrA:begin-end> --threads=<T>

See the graphtyper user guide for more details.

Citation

Small variant genotyping

Hannes P. Eggertsson, Hakon Jonsson, Snaedis Kristmundsdottir, Eirikur Hjartarson, Birte Kehr, Gisli Masson, Florian Zink, Kristjan E. Hjorleifsson, Aslaug Jonasdottir, Adalbjorg Jonasdottir, Ingileif Jonsdottir, Daniel F. Gudbjartsson, Pall Melsted, Kari Stefansson, Bjarni V. Halldorsson. Graphtyper enables population-scale genotyping using pangenome graphs. Nature Genetics 49, 1654–1660 (2017). doi:10.1038/ng.3964

Structural variant genotyping

Eggertsson, H.P., Kristmundsdottir, S., Beyter, D. et al. GraphTyper2 enables population-scale genotyping of structural variation using pangenome graphs. Nature Communications 10, 5402 (2019) doi:10.1038/s41467-019-13341-9

License

MIT License

graphtyper's People

Contributors

h-2, hannespetur, martin-g

graphtyper's Issues

Segmentation fault genotype_sv

Dear Hannes,

I am getting a segmentation fault ("line 17: 23836 Segmentation fault (core dumped)") when genotyping SVs in two samples.

I have previously identified SVs with Manta v1.6.0 (default settings), converted the Manta output into the old inversion format (<v1.4.0 as described here), and merged with svimmer (today's version). I have tried to run genotype_sv v2.2.0 with both the output of svimmer (allGenomes.PunNye3.0.bwamem.SVs.vcf.gz) and a second file after applying the fix from issue #28 (allGenomes.PunNye3.0.bwamem.SVsfixed.vcf.gz), but get the same segfault.

Here is the graphtyper command I used:

ls *.bam > bams
graphtyper genotype_sv PunNye3.0.fasta allGenomes.PunNye3.0.bwamem.SVs.vcf.gz \
--sams=bams --region=chr9:3000000-4000000 --threads=20 --log GraphTyperSV.Run.9.log \
-v --output=allGenomes.SV

While the first regions on the chromosome worked smoothly, the segfault always happened in this interval for which I am attaching the data below:
[2020-03-31 16:14:22.369083] <info> Padded region is: chr9:2999000-4201000

You can find the input files for the problematic interval here (DMtest.tar.gz) and the reference genome here (PunNye3.0.fasta.gz, PunNye3.0.fasta.fai). The input bam files are just subsets covering the interval chr9:2700000-4300000 (samtools view file.bam chr9:2700000-4300000). The issue was the same whether I used the full bam files or just this interval.

Thank you for your help!

Best,
David

genotype with >1 VCF input file

I'm working on bacteria. My use case is that I have more than one VCF file, made with different tools on the same sample and reference genome. They contain calls that conflict with each other (e.g. REF=A, tool 1 says ALT=G, tool 2 says ALT=AT). Is it possible to use VCFs like this as input to graphtyper, so it makes a graph of all the variants and then genotypes them? Or could I cat them (and then sort/bgzip/index) and use the result in the --vcf option of graphtyper genotype?
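The "cat, then sort" idea from the question can be sketched with standard tools on uncompressed VCF text. This is only an illustration: the function name is hypothetical, compression and indexing (bgzip, index) would still have to follow, and nothing here says whether graphtyper accepts conflicting records at the same position.

```shell
# merge_vcfs FIRST.vcf [MORE.vcf ...]
# Print the header of the first file, then the records of all files sorted by
# contig name and position. Exact duplicate lines are dropped; conflicting
# records at the same position are all kept.
merge_vcfs() {
  grep '^#' "$1"
  # sort breaks ties on the whole line, so identical records end up adjacent
  # and uniq can remove them.
  grep -hv '^#' "$@" | LC_ALL=C sort -k1,1 -k2,2n | uniq
}
```

Note that contigs come out in byte order of their names, which may differ from the order in the header.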

Minimum read length needed for genotype_sv?

I've been running tests on various data sets, and have had a few cases where graphtyper genotype_sv makes VCFs with almost every site having zero depth and a genotype of "./.", whereas the identical command for graphtyper genotype results in the calls I would expect. Example output:

NC_000962.3 558 NC_000962.3:558:SG G T 0.0 LowQUAL ABHet=-1;ABHom=-1;AC=0;AF=0;AN=0;CR=0;LOGF=nan;MQ=255;MaxAAS=0;MaxAASR=0;NHet=0;NHomAlt=0;NHomRef=0;PASS_AC=0;PASS_AN=0;PASS_ratio=0;QD=0;RefLen=1;SB=-1;SBAlt=-1;SBF=0,0;SBF1=0,0;SBF2=0,0;SBR=0,0;SBR1=0,0;SBR2=0,0;SeqDepth=0;VarType=SG GT:AD:MD:DP:GQ:PL ./.:0,0:0:0:0:0,0,0

The only common thing I can find between my data sets is that this happens precisely on my 75bp reads data sets, never on my 100bp or longer reads. Is this expected? If so, are there any options that would get it working with 75bp reads?

v2.5 genotype segfault on toy data

Hi there,

I'm trying to run genotype on a tiny toy data set before running on real data, and getting a segfault. I'm using the v2.5 linux binary:

$ graphtyper genotype ref.fa --output graphtyper_out -vverbose --sam 01.map_reads.rmdup.bam --region=ref.1 --threads 1
[2020-05-15 13:43:04.723288] <info> Running the 'genotype' subcommand.
[2020-05-15 13:43:04.724787] <info> Genotyping region ref.1:1-1000
[2020-05-15 13:43:04.724828] <info> Path to genome is 'ref.fa'
[2020-05-15 13:43:04.724836] <info> Running with up to 1 threads.
[2020-05-15 13:43:04.724842] <info> Copying data from 1 input SAM/BAM/CRAMs to local disk.
[2020-05-15 13:43:04.724947] <info> Temporary folder is /tmp/graphtyper_200515_134304_ref.1_000000001.ZV7sLN
[2020-05-15 13:43:04.732289] <info> Finished copying data. Thread work: 1
[2020-05-15 13:43:04.732358] <info> Skipping merging step. Max files open are 1000
[2020-05-15 13:43:04.732375] <info> Number of bamShrinked files are 1 running accross 1 threads.
[2020-05-15 13:43:04.732387] <info> Initial variant discovery step starting.
[2020-05-15 13:43:04.738165] <info> Finished initial variant discovery step. Thread work info: 1
[2020-05-15 13:43:04.740876] <info> Further variant discovery step starting.
[2020-05-15 13:43:04.746890] <info> Finished calling. Thread work: 1
[2020-05-15 13:43:04.749655] <info> Call step 1 starting.
[2020-05-15 13:43:04.754768] <info> Finished calling. Thread work: 1
[2020-05-15 13:43:04.759509] <info> Call step 2 starting.
[2020-05-15 13:43:04.765364] <info> Finished calling. Thread work: 1
[2020-05-15 13:43:04.765483] <info> Merging output VCFs.
Segmentation fault (core dumped)

I've attached a zip containing the ref and bam files. Any ideas?

Thanks,
Martin

test_files.zip

Failure messages running genotype_sv on cram

When running the genotype_sv command, I am getting several failure messages. Despite the failure messages, the program continues to run and creates VCF files. Any suggestions for this issue?

./graphtyper genotype_sv ref/GRCh38_full_analysis_set_plus_decoy_hla.fa  adsp5k.manta/adsp5k.manta.sv.pass.norm.ruth.vcf.gz  --region chr19 --sam /restricted/projectnb/casa/wgs.hg38/adni/cram/ADNI_002_S_0413.hg38.realign.bqsr.cram -O test
[E::cram_get_ref] Failed to populate reference for id 18
[E::cram_decode_slice] Unable to fetch reference #18 60010..87993

[E::cram_next_slice] Failure to decode slice
[E::cram_get_ref] Failed to populate reference for id 18
[E::cram_decode_slice] Unable to fetch reference #18 986715..1021450

[E::cram_next_slice] Failure to decode slice
[E::cram_get_ref] Failed to populate reference for id 18
[E::cram_decode_slice] Unable to fetch reference #18 1975426..2006880

[E::cram_next_slice] Failure to decode slice
[E::cram_get_ref] Failed to populate reference for id 18
[E::cram_decode_slice] Unable to fetch reference #18 2983871..3015930

[E::cram_next_slice] Failure to decode slice
[E::cram_get_ref] Failed to populate reference for id 18
[E::cram_decode_slice] Unable to fetch reference #18 3976507..4007866

[E::cram_next_slice] Failure to decode slice
[E::cram_get_ref] Failed to populate reference for id 18
[E::cram_decode_slice] Unable to fetch reference #18 4979982..5010913

[E::cram_next_slice] Failure to decode slice
[E::cram_get_ref] Failed to populate reference for id 18
[E::cram_decode_slice] Unable to fetch reference #18 5995460..6026559

[E::cram_next_slice] Failure to decode slice
[E::cram_get_ref] Failed to populate reference for id 18
[E::cram_decode_slice] Unable to fetch reference #18 6976042..7005815

[E::cram_next_slice] Failure to decode slice
[E::cram_get_ref] Failed to populate reference for id 18
[E::cram_decode_slice] Unable to fetch reference #18 7977979..8007779

[E::cram_next_slice] Failure to decode slice
[E::cram_get_ref] Failed to populate reference for id 18
[E::cram_decode_slice] Unable to fetch reference #18 8991052..9021057

[E::cram_next_slice] Failure to decode slice
[E::cram_get_ref] Failed to populate reference for id 18
[E::cram_decode_slice] Unable to fetch reference #18 9997526..10028057

Variant INFO/END position is before POS

Graphtyper can create variants where the END position is lower than POS. This results in a warning from bcftools concat:

Concatenating ./chr10/010000001-011000000.vcf.gz[W::vcf_parse]
INFO/END=9321046 is smaller than POS at chr10:9321047

and an error from bcftools index:

[E::hts_idx_push] Invalid record on sequence #4: end 140362273 < begin 140362275
index: failed to create index for "1000G_manta.diploidSV_graphtyper_test.vcf.gz"

The 2 problematic examples from the graphtyper output VCF:

chr4  140362275  chr4:140362275:OG  TA  G]chr11:131789668]  0  LowQUAL  ...END=140362273;...
chr10   9321047  chr10:9321047:OG   T   G]chr8:135381059]   0  LowQUAL  ...END=9321046;...

These variants were not in the svimmer output passed to graphtyper; they were newly created by graphtyper.
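Records like the two above can be located before running bcftools. The following is a minimal sketch using awk on uncompressed VCF text; the function name is hypothetical and it only reports the records, it does not repair them.

```shell
# find_end_before_pos: read uncompressed VCF text on stdin and print
# "CHROM<TAB>POS<TAB>END" for every record whose INFO/END is smaller than POS.
find_end_before_pos() {
  awk -F '\t' '
    /^#/ { next }                      # skip header lines
    {
      n = split($8, info, ";")         # column 8 is INFO
      for (i = 1; i <= n; i++) {
        if (info[i] ~ /^END=/) {
          end = substr(info[i], 5) + 0 # numeric value after "END="
          if (end < $2 + 0) print $1 "\t" $2 "\t" end
        }
      }
    }'
}
```

For a compressed file, the input could be piped in, for example with zcat file.vcf.gz | find_end_before_pos.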

genotype_sv on insertions

Hello,
I am running Graphtyper2 to genotype SVs that were discovered in another cohort. The VCF file from the discovery does not include the inserted sequence (it just uses the <INS> tag in the column for the alternate allele), though it does include the length of the insertion. Is it possible to use graphtyper to genotype such insertions without knowing the insertion sequence?

Thanks,
--Aakrosh

Graphtyper appropriate for low-coverage data (1x) ?

I am studying SVs in non-model species and am interested in analyzing SVs at population scale, which means genotyping a large number of individuals (~1000-2000). In contrast to human studies, we usually save money by sequencing at low coverage (1-2x) so that we can afford many whole-genome samples. For SNPs this works well because more and more tools use genotype likelihoods, but I have failed to find an equivalent for SVs. Graphtyper strikes me as the kind of tool that could work for that purpose and become more widely used in non-model species or ecological studies!

Do you think it can leverage large sample sizes to catalog and genotype SVs, even if certainty at the individual level is poor due to shallow coverage? Have you tried to apply Graphtyper to low-coverage data (1-2x)? Do you think it would perform well, or is it too risky? If I proceed to do so on my data, do you have specific recommendations to account for such shallow coverage? Are you aware of other studies or methods that have done so?

Thanks a lot for your help and attention
Claire

FILTER=PASS but FT=FAIL1

Why is the FILTER column PASS for these 3 variants while the sample's FT is FAIL1? Also, what is the best software to compare these calls with the GIAB benchmark? The REF allele is very different from those found in the GIAB VCF. Is there some software available to make that comparison?

chr10   185506  chr10:185506:DG.2       N       <DEL:SVSIZE=57:AGGREGATED>      24      PASS    ABHet=0.325;ABHom=-1;AC=1;AF=0.5;AN=2;CR=0;END=185580;LOGF=2.214e-12;MaxAAS=13;MaxAASR=0.325;MaxAltPP=0;NHet=1;NHomAlt=0;NHomRef=0;PASS_AC=0;PASS_AN=0;PASS_ratio=0;QD=2.4;RefLen=1;SB=1;SBAlt=-1;SBF=4,0;SBF1=2,0;SBF2=2,0;SBR=0,0;SBR1=0,0;SBR2=0,0;SEQ=AGCACTTTGGGAGGCTG;SVLEN=57;SVMODEL=AGGREGATED;SVSIZE=57;SVTYPE=DEL;SV_ID=140;SeqDepth=40;VarType=DG    GT:FT:AD:MD:DP:RA:PP:GQ:PL      0/1:FAIL1:27,13:0:40:0,0:0:24:24,0,136
chr10   185506  chr10:185506:DG.5       N       <DEL:SVSIZE=74:AGGREGATED>      24      PASS    ABHet=0.325;ABHom=-1;AC=1;AF=0.5;AN=2;CR=0;END=185580;LOGF=2.214e-12;MaxAAS=13;MaxAASR=0.325;MaxAltPP=0;NHet=1;NHomAlt=0;NHomRef=0;PASS_AC=0;PASS_AN=0;PASS_ratio=0;QD=2.4;RefLen=1;SB=1;SBAlt=-1;SBF=4,0;SBF1=2,0;SBF2=2,0;SBR=0,0;SBR1=0,0;SBR2=0,0;SVLEN=74;SVMODEL=AGGREGATED;SVSIZE=74;SVTYPE=DEL;SV_ID=139;SeqDepth=40;VarType=DG  GT:FT:AD:MD:DP:RA:PP:GQ:PL      0/1:FAIL1:27,13:0:40:0,0:0:24:24,0,136
chr10   264506  chr10:264506:DG N       <DEL:SVSIZE=118:AGGREGATED>     33      PASS    ABHet=0.3171;ABHom=-1;AC=1;AF=0.5;AN=2;CR=0;END=264624;LOGF=3.776e-12;MaxAAS=13;MaxAASR=0.3171;MaxAltPP=0;NHet=1;NHomAlt=0;NHomRef=0;PASS_AC=0;PASS_AN=0;PASS_ratio=0;QD=3.3;RefLen=1;SB=0.2143;SBAlt=0.5;SBF=2,1;SBF1=0,1;SBF2=2,0;SBR=10,1;SBR1=6,0;SBR2=4,1;SVLEN=118;SVMODEL=AGGREGATED;SVSIZE=118;SVTYPE=DEL;SV_ID=187;SeqDepth=41;VarType=DG       GT:FT:AD:MD:DP:RA:PP:GQ:PL      0/1:FAIL1:28,13:0:41:0,0:0:33:33,0,213
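In VCF, FILTER (column 7) applies to the site as a whole, while FT is a per-sample field listed in FORMAT (column 9), so the two can disagree as in the records above. A minimal awk sketch for listing such cases from uncompressed VCF text follows. The function name is hypothetical, and the sketch says nothing about how graphtyper decides FAIL1.

```shell
# pass_sites_with_failed_ft: read uncompressed VCF text on stdin and print
# "CHROM<TAB>POS<TAB>SAMPLE_COLUMN<TAB>FT" for every sample whose FT value is
# not PASS at a site whose FILTER column is PASS.
pass_sites_with_failed_ft() {
  awk -F '\t' '
    /^#/ || $7 != "PASS" { next }       # skip headers and non-PASS sites
    {
      nf = split($9, fmt, ":")          # column 9 is FORMAT
      ft = 0
      for (i = 1; i <= nf; i++) if (fmt[i] == "FT") ft = i
      if (ft == 0) next                 # record has no FT field
      for (s = 10; s <= NF; s++) {      # sample columns start at 10
        split($s, val, ":")
        if (val[ft] != "PASS") print $1 "\t" $2 "\t" s "\t" val[ft]
      }
    }'
}
```

A missing FT value (".") is reported as well, since it is not equal to PASS.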

graphtyper genotyper_sv :<error> No regions specified

Hi,

I have 400 WGS samples and use manta+graphtyper2 to call SV. But I got an error.
graphtyper genotyper_sv : <error> No regions specified. Either use --region or --region_file option to specify regions.

I want to know whether --region or --region_file is necessary. If so, how can I get a region file? I have no specific region and just want to call SVs across the whole genome.

Best!
Yi
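One way to cover a whole genome is to derive regions from the FASTA index of the reference. The sketch below assumes a samtools-style .fai file (column 1 is the contig name, column 2 its length) and assumes that a region file lists one chrA:begin-end region per line. The function name and the chunk size are hypothetical; the --region_file option name is taken from the error message above.

```shell
# fai_to_regions FAI [CHUNK]
# Print one "contig:start-end" region per line, covering every contig listed
# in the FASTA index in chunks of at most CHUNK bases (default 1000000).
fai_to_regions() {
  awk -F '\t' -v chunk="${2:-1000000}" '
    {
      for (start = 1; start <= $2; start += chunk) {
        end = start + chunk - 1
        if (end > $2) end = $2          # clip the last chunk to the contig length
        print $1 ":" start "-" end
      }
    }' "$1"
}

# Possible use (file names are placeholders):
#   fai_to_regions REFERENCE.fa.fai 1000000 > regions.txt
#   graphtyper genotype_sv REFERENCE.fa input.vcf.gz --sams=bams --region_file=regions.txt
```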

[bug] 1, 2, 3, contig names not working

(reported via e-mail)

Reference genomes with 1, 2, 3, ... contig names are not handled properly (only reference genomes that use chr1, chr2, chr3, ... work).

Realignment stats

Hi Hannes,

Would it be possible to include a feature for obtaining some mapping statistics after read realignment to the pangenome?

Use case: Comparing the coverage between two regions after the reads have been mapped to the graph structure instead of relying on coverage stats from the original linear reference alignment.

Regards,
David

Segmentation fault when running graphtyper call

Hi,

I've been following the instructions in the Readme for creating a graph, indexing it and running graphtyper call, but I always end up with a segmentation fault after running graphtyper call. I've made a small, reproducible test case that always ends in a segmentation fault, using this small fasta file and this sam file:

wget http://folk.uio.no/ivargry/graphtyper_testdata/test.fa
wget http://folk.uio.no/ivargry/graphtyper_testdata/test.sam

samtools faidx test.fa
graphtyper construct testgraph test.fa 21:1-1,000,000
graphtyper index testgraph
graphtyper call testgraph --sam test.sam 21:1-1,000,000

My guess is that I am using graphtyper wrongly, but I am not able to see how. Have I misunderstood how to run graphtyper? I note that my fasta file has chromosome name "21" and not "chr21", thus I specify 21:1-1000000 when running the commands, but I guess that shouldn't matter?

I am using version 1.4 (50d2a57), but experience the same also when using version 2.0.

Any help is appreciated!

genotype_sv

Hi there,

I ran genotype_sv (v2.5.0) and got a segfault.

The command is
graphtyper genotype_sv $REF $VCF --sam $BAM --threads 10 --region chr20 --vverbose --log=log

SVs were detected by smoove, and I only kept <DEL>, <DUP>, <INV>, or <INS> in vcf.

The last lines of log are
[2020-05-15 17:44:34.458556] Finished calling. Thread work: 1
[2020-05-15 17:44:34.458629] caller.cpp:229 Finished calling all samples.
[2020-05-15 17:44:34.458642] Merging output VCFs.
[2020-05-15 17:44:34.465860] vcf_operations.cpp:317 Read 880 variants.

Thank you so much for your help.

memory corruption error on vcf_merge

I was able to successfully run graphtyper using the pipeline script (make_graphtyper_pipeline.sh) with a set of input bam alignments, which chunked and analyzed my genome (a haploid bacterial genome with two chromosomes) in 100kb fragments. I have the results of this pipeline in two folders, results and haps. I would like to merge the chunked VCF files into one final output with the variant calls, so I ran the vcf_merge command. However, this dies with a memory corruption error:

graphtyper vcf_merge results/*/*.vcf.gz | bgzip -c > calls_merged.vcf.gz

*** Error in `graphtyper': malloc(): memory corruption: 0x0000000005bc01e0 ***
======= Backtrace: =========
/usr/lib64/libc.so.6(+0x7dd4d)[0x2ab3313a7d4d]
/usr/lib64/libc.so.6(__libc_malloc+0x4c)[0x2ab3313a9fbc]
/weisberga/Software/gcc-8.2.0/build/usr/local/lib64/libstdc++.so.6(_Znwm+0x18)[0x2ab330b11888]
/weisberga/Software/gcc-8.2.0/build/usr/local/lib64/libstdc++.so.6(_ZNSs4_Rep9_S_createEmmRKSaIcE+0x59)[0x2ab330b520e9]
/weisberga/Software/gcc-8.2.0/build/usr/local/lib64/libstdc++.so.6(_ZNSs4_Rep8_M_cloneERKSaIcEm+0x1b)[0x2ab330b52f5b]
/weisberga/Software/gcc-8.2.0/build/usr/local/lib64/libstdc++.so.6(_ZNSs7reserveEm+0x34)[0x2ab330b53004]
/weisberga/Software/gcc-8.2.0/build/usr/local/lib64/libstdc++.so.6(_ZNSt15basic_stringbufIcSt11char_traitsIcESaIcEE8overflowEi+0xb7)[0x2ab330b4aa57]
/weisberga/Software/gcc-8.2.0/build/usr/local/lib64/libstdc++.so.6(_ZNSt15basic_streambufIcSt11char_traitsIcEE6xsputnEPKcl+0x7b)[0x2ab330b9dfbb]
/weisberga/Software/gcc-8.2.0/build/usr/local/lib64/libstdc++.so.6(_ZNKSt7num_putIcSt19ostreambuf_iteratorIcSt11char_traitsIcEEE13_M_insert_intImEES3_S3_RSt8ios_basecT_+0xde)[0x2ab330b8425e]
/weisberga/Software/gcc-8.2.0/build/usr/local/lib64/libstdc++.so.6(_ZNSo9_M_insertImEERSoT_+0x75)[0x2ab330b8f895]
graphtyper(_ZN5gyper7Variant14generate_infosEv+0x43e)[0x6d096e]
graphtyper(_ZN5gyper9vcf_mergeERSt6vectorISsSaISsEERKSs+0x179b)[0x6f5b7b]
graphtyper(main+0x6204)[0x555a84]
/usr/lib64/libc.so.6(__libc_start_main+0xf5)[0x2ab33134bb35]
graphtyper[0x616441]
======= Memory map: ========
00400000-00a54000 r-xp 00000000 00:3a 4799808 /weisberga/bin/graphtyper
00c54000-00c55000 r--p 00654000 00:3a 4799808 /weisberga/bin/graphtyper
00c55000-00c57000 rw-p 00655000 00:3a 4799808 /weisberga/bin/graphtyper
00c57000-00c63000 rw-p 00000000 00:00 0
00ed4000-05be7000 rw-p 00000000 00:00 0 [heap]
2ab32fdde000-2ab32fde2000 rw-p 00000000 00:00 0
2ab32fde2000-2ab32fe02000 r-xp 00000000 08:05 1249276 /usr/lib64/ld-2.17.so
2ab32fe02000-2ab32fe08000 rw-p 00000000 00:00 0
2ab330001000-2ab330002000 r--p 0001f000 08:05 1249276 /usr/lib64/ld-2.17.so
2ab330002000-2ab330003000 rw-p 00020000 08:05 1249276 /usr/lib64/ld-2.17.so
2ab330003000-2ab330004000 rw-p 00000000 00:00 0
2ab33000a000-2ab330021000 r-xp 00000000 08:05 1248532 /usr/lib64/libpthread-2.17.so
2ab330021000-2ab330220000 ---p 00017000 08:05 1248532 /usr/lib64/libpthread-2.17.so
2ab330220000-2ab330221000 r--p 00016000 08:05 1248532 /usr/lib64/libpthread-2.17.so
2ab330221000-2ab330222000 rw-p 00017000 08:05 1248532 /usr/lib64/libpthread-2.17.so
2ab330222000-2ab330226000 rw-p 00000000 00:00 0
2ab33022a000-2ab330240000 r-xp 00000000 00:3a 64489206 /weisberga/lib/libz.so.1.2.8
2ab330240000-2ab33043f000 ---p 00016000 00:3a 64489206 /weisberga/lib/libz.so.1.2.8
2ab33043f000-2ab330440000 rw-p 00015000 00:3a 64489206 /weisberga/lib/libz.so.1.2.8
2ab330442000-2ab330451000 r-xp 00000000 08:05 1248868 /usr/lib64/libbz2.so.1.0.6
2ab330451000-2ab330650000 ---p 0000f000 08:05 1248868 /usr/lib64/libbz2.so.1.0.6
2ab330650000-2ab330651000 r--p 0000e000 08:05 1248868 /usr/lib64/libbz2.so.1.0.6
2ab330651000-2ab330652000 rw-p 0000f000 08:05 1248868 /usr/lib64/libbz2.so.1.0.6
2ab330652000-2ab330659000 r-xp 00000000 08:05 1248536 /usr/lib64/librt-2.17.so
2ab330659000-2ab330858000 ---p 00007000 08:05 1248536 /usr/lib64/librt-2.17.so
2ab330858000-2ab330859000 r--p 00006000 08:05 1248536 /usr/lib64/librt-2.17.so
2ab330859000-2ab33085a000 rw-p 00007000 08:05 1248536 /usr/lib64/librt-2.17.so
2ab33085a000-2ab33087e000 r-xp 00000000 08:05 1248832 /usr/lib64/liblzma.so.5.0.99
2ab33087e000-2ab330a7d000 ---p 00024000 08:05 1248832 /usr/lib64/liblzma.so.5.0.99
2ab330a7d000-2ab330a7e000 r--p 00023000 08:05 1248832 /usr/lib64/liblzma.so.5.0.99
2ab330a7e000-2ab330a7f000 rw-p 00024000 08:05 1248832 /usr/lib64/liblzma.so.5.0.99
2ab330a82000-2ab330bf6000 r-xp 00000000 00:3a 373877393 /weisberga/Software/gcc-8.2.0/build/usr/local/lib64/libstdc++.so.6.0.25
2ab330bf6000-2ab330df6000 ---p 00174000 00:3a 373877393 /weisberga/Software/gcc-8.2.0/build/usr/local/lib64/libstdc++.so.6.0.25
2ab330df6000-2ab330e00000 r--p 00174000 00:3a 373877393 /weisberga/Software/gcc-8.2.0/build/usr/local/lib64/libstdc++.so.6.0.25
2ab330e00000-2ab330e02000 rw-p 0017e000 00:3a 373877393 /weisberga/Software/gcc-8.2.0/build/usr/local/lib64/libstdc++.so.6.0.25
2ab330e02000-2ab330e06000 rw-p 00000000 00:00 0
2ab330e0a000-2ab330f0a000 r-xp 00000000 08:05 1248514 /usr/lib64/libm-2.17.so
2ab330f0a000-2ab33110a000 ---p 00100000 08:05 1248514 /usr/lib64/libm-2.17.so
2ab33110a000-2ab33110b000 r--p 00100000 08:05 1248514 /usr/lib64/libm-2.17.so
2ab33110b000-2ab33110c000 rw-p 00101000 08:05 1248514 /usr/lib64/libm-2.17.so
2ab331112000-2ab331129000 r-xp 00000000 00:3a 373876576 /weisberga/Software/gcc-8.2.0/build/usr/local/lib64/libgcc_s.so.1
2ab331129000-2ab331328000 ---p 00017000 00:3a 373876576 /weisberga/Software/gcc-8.2.0/build/usr/local/lib64/libgcc_s.so.1
2ab331328000-2ab331329000 r--p 00016000 00:3a 373876576 /weisberga/Software/gcc-8.2.0/build/usr/local/lib64/libgcc_s.so.1
2ab331329000-2ab33132a000 rw-p 00017000 00:3a 373876576 /weisberga/Software/gcc-8.2.0/build/usr/local/lib64/libgcc_s.so.1
2ab33132a000-2ab3314e0000 r-xp 00000000 08:05 1248506 /usr/lib64/libc-2.17.so
2ab3314e0000-2ab3316e0000 ---p 001b6000 08:05 1248506 /usr/lib64/libc-2.17.so
2ab3316e0000-2ab3316e4000 r--p 001b6000 08:05 1248506 /usr/lib64/libc-2.17.so
2ab3316e4000-2ab3316e6000 rw-p 001ba000 08:05 1248506 /usr/lib64/libc-2.17.so
2ab3316e6000-2ab3317fc000 rw-p 00000000 00:00 0
2ab334000000-2ab334021000 rw-p 00000000 00:00 0
2ab334021000-2ab338000000 ---p 00000000 00:00 0
7ffc8c106000-7ffc8c12a000 rw-p 00000000 00:00 0 [stack]
7ffc8c16a000-7ffc8c16c000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]

construct error

Hi,
I just ran graphtyper on the test data "index_test.fa" with the command graphtyper construct index_test_graph index_test.fa --vcf index_test.vcf.gz. However, I got the following error messages:

ERROR: Argument 'regions' was not passed but is required.
graphtyper  {OPTIONS} [construct] [GRAPH] [REFERENCE.fa] [REGION]

  Graphtyper's graph construction tool

OPTIONS:

    -h, --help                        Display this help.
    construct                         Graphtyper command to execute.
    GRAPH                             Graph file to use.
    REFERENCE.fa                      Reference FASTA file to use.
    REGION                            Region to use.
    --vcf=[FILE.vcf.gz]               VCF bgzipped file with variants.
    --log=[LOG.txt]                   Log filename. If none specified logs are
                                      written to standard output.
    --sv_graph=[GRAPH]                Graph used for SV variant predictions.
    "--" can be used to terminate flag options and force all following
    arguments to be treated as positional options

Also, I tried the command graphtyper construct index_test_graph index_test.fa --vcf=index_test.vcf.gz and received the same error message.

Best
Xiaofei

Call population SVs and Genotype

Dear @hannespetur

Thank you for this nice tool. I have some questions about how to process population-scale short-read sequencing data. My plan is as follows, and I hope you can give me some suggestions.

1, For each sample, I will use 3 independent tools to call SVs: Delly, Lumpy and Manta.

2, I will keep only SVs that pass the quality filters suggested by Delly, Lumpy and Manta (flag PASS).

3, Only SVs called by at least 2 of the 3 tools will be retained. I do not know whether svimmer can do this step. I know SURVIVOR can, but its output cannot be used as input for graphtyper.

Through the above steps, I will get one integrated VCF file for each sample.

4, I will use svimmer to merge the integrated VCFs across samples into a single VCF file for the population.

5, I will genotype the SVs for this population using graphtyper with --sams all.bam.list and the svimmer-merged VCF file.

From the Nature Communications publication (https://doi.org/10.1038/s41467-019-13341-9), I found you only used Manta to call SVs and kept all the variants, including those that did not pass Manta's filter. I do not know if my plan will work.

Sincerely.
Zhuqing

Increasing sensitivity for variants with limited read support

Hi Hannes,

I am running graphtyper genotype. However, there are some heterozygous variants being missed (compared with truth data) probably due to the alt allele depth being low (~10) compared to the ref allele depth (~98 or more). Is there an option I can use to promote sensitivity for such variants?

Thanks

David.

cmake cannot find pthread

I'm having trouble building graphtyper:

Determining if the pthread_create exist failed with the following output:
Change Dir: /lustre/scratch115/realdata/mdt3/projects/graphs/graphtyper/release-build/CMakeFiles/CMakeTmp

Run Build Command:"/usr/bin/make" "cmTC_a4ad3/fast"
/usr/bin/make -f CMakeFiles/cmTC_a4ad3.dir/build.make CMakeFiles/cmTC_a4ad3.dir/build
make[1]: Entering directory `/lustre/scratch115/realdata/mdt3/projects/graphs/graphtyper/release-build/CMakeFiles/CMakeTmp'
Building C object CMakeFiles/cmTC_a4ad3.dir/CheckSymbolExists.c.o
/nfs/users/nfs_e/eg10/graphs/bin/gcc    -o CMakeFiles/cmTC_a4ad3.dir/CheckSymbolExists.c.o   -c /lustre/scratch115/realdata/mdt3/projects/graphs/graphtyper/release-build/CMakeFiles/CMakeTmp/CheckSymbolExists.c
Linking C executable cmTC_a4ad3
/lustre/scratch115/realdata/mdt3/projects/graphs/bin/cmake -E cmake_link_script CMakeFiles/cmTC_a4ad3.dir/link.txt --verbose=1
/nfs/users/nfs_e/eg10/graphs/bin/gcc       CMakeFiles/cmTC_a4ad3.dir/CheckSymbolExists.c.o  -o cmTC_a4ad3 -rdynamic
CMakeFiles/cmTC_a4ad3.dir/CheckSymbolExists.c.o: In function `main':
CheckSymbolExists.c:(.text+0x16): undefined reference to `pthread_create'
collect2: error: ld returned 1 exit status
make[1]: *** [cmTC_a4ad3] Error 1
make[1]: Leaving directory `/lustre/scratch115/realdata/mdt3/projects/graphs/graphtyper/release-build/CMakeFiles/CMakeTmp'
make: *** [cmTC_a4ad3/fast] Error 2

It seems that cmake can't find pthread, but I apparently have it on my system. Has anyone else run into this problem?

Segmentation fault on a small SV example

I'm getting Segmentation fault (core dumped) when I try to genotype SVs on a small simulated dataset.
Do you see anything that could cause this from my input files?

simsv.zip contains:

  • ref.fa
  • truth.vcf.gz
  • s0.bam

I tried running the following commands:

graphtyper genotype_sv ref.fa truth.vcf.gz --sam=s0.bam --region=chr1:1-2852430

and

graphtyper construct my_graph ref.fa chr1 --vcf=truth.vcf.gz --sv_graph
graphtyper index my_graph
mkdir calls
graphtyper call my_graph . --sam=s0.bam --no_new_variants --output=calls

I'm using the v2.1 graphtyper binary.

Fatal error "FAI index has no entry" for GRCh38 HLA alt contig

Running graphtyper on GRCh38 with alt contigs, I get the following fatal error on chr6:

<error> [constructor.cpp:159] FAI index has no entry for contig/chromosome 'HLA-DRB1*13'

Both the FASTA file and its FAI index actually contain these contigs:
HLA-DRB1*13:01:01
HLA-DRB1*13:02:01

Is graphtyper truncating the contig name at the colon?
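The ambiguity behind this question is that a colon both separates a contig from its interval in a region string and appears inside the HLA contig names. The sketch below is not graphtyper's parser (its source is not shown here). It only illustrates one way to split at the last colon and to accept the suffix as an interval only when it looks like begin-end. The function name is hypothetical.

```shell
# split_region REGION
# Print "CONTIG<TAB>INTERVAL". The text after the last colon is treated as an
# interval only if it has the form "begin-end" (digits, one dash, digits);
# otherwise the whole string is the contig name and the interval is empty.
split_region() {
  suffix="${1##*:}"
  case "$1" in
    *:*)
      case "$suffix" in
        "" | *[!0-9-]* | -* | *- | *-*-*) ;;   # suffix is not "begin-end"
        *-*) printf '%s\t%s\n' "${1%:*}" "$suffix"; return ;;
      esac ;;
  esac
  printf '%s\t\n' "$1"
}

split_region 'chr6:1-100'                 # contig chr6, interval 1-100
split_region 'HLA-DRB1*13:01:01'          # whole string is the contig name
split_region 'HLA-DRB1*13:01:01:1-500'    # contig keeps its inner colons
```

A contig name that itself ends in something like :1-100 would still be ambiguous under this rule.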

support for haploid/different ploidy

Hi,

I am really interested in using graphtyper for my bacterial genome SNP calling as there is a lot of diversity in some lineages and typical SNP calling pipelines run into issues when strains are fairly diverged from a reference. I was able to run graphtyper on a bacterial genome dataset, and manual inspection of the output shows several heterozygous variant calls (0/1) which are incorrect with a haploid genome. I was wondering if there is an option to set the ploidy level for SNP calls? Thanks!

Best,
Alex

make error "stdlib.h"

Hello,
I'm trying to make graphtyper, but I get the following error in ubuntu 18.04, although I believe I have all the dependencies installed. Do you have any idea? Am I missing something?

Thanks,
Arda

[ 44%] No install step for 'project_rocksdb'
[ 44%] Completed 'project_rocksdb'
[ 44%] Built target project_rocksdb
Scanning dependencies of target graphtyper_objects
[ 45%] Building CXX object src/CMakeFiles/graphtyper_objects.dir/graph/absolute_position.cpp.o
[ 48%] Building CXX object src/CMakeFiles/graphtyper_objects.dir/graph/constructor.cpp.o
[ 48%] Building CXX object src/CMakeFiles/graphtyper_objects.dir/graph/genomic_region.cpp.o
[ 48%] Building CXX object src/CMakeFiles/graphtyper_objects.dir/graph/genotype.cpp.o
In file included from /usr/include/c++/7/ext/string_conversions.h:41:0,
from /usr/include/c++/7/bits/basic_string.h:6361,
from /usr/include/c++/7/string:52,
from /home/asoylev/apps/graphtyper/include/graphtyper/graph/absolute_position.hpp:3,
from /home/asoylev/apps/graphtyper/src/graph/absolute_position.cpp:4:
/usr/include/c++/7/cstdlib:75:15: fatal error: stdlib.h: No such file or directory
#include_next <stdlib.h>
^~~~~~~~~~
compilation terminated.
src/CMakeFiles/graphtyper_objects.dir/build.make:62: recipe for target 'src/CMakeFiles/graphtyper_objects.dir/graph/absolute_position.cpp.o' failed
make[3]: *** [src/CMakeFiles/graphtyper_objects.dir/graph/absolute_position.cpp.o] Error 1
make[3]: *** Waiting for unfinished jobs....
In file included from /usr/include/c++/7/ext/string_conversions.h:41:0,
from /usr/include/c++/7/bits/basic_string.h:6361,
from /usr/include/c++/7/string:52,
from /usr/include/c++/7/bits/locale_classes.h:40,
from /usr/include/c++/7/bits/ios_base.h:41,
from /usr/include/c++/7/ios:42,
from /usr/include/c++/7/istream:38,
from /usr/include/c++/7/sstream:38,
from /home/asoylev/apps/graphtyper/src/graph/constructor.cpp:2:
/usr/include/c++/7/cstdlib:75:15: fatal error: stdlib.h: No such file or directory
#include_next <stdlib.h>
^~~~~~~~~~
compilation terminated.
src/CMakeFiles/graphtyper_objects.dir/build.make:86: recipe for target 'src/CMakeFiles/graphtyper_objects.dir/graph/constructor.cpp.o' failed
make[3]: *** [src/CMakeFiles/graphtyper_objects.dir/graph/constructor.cpp.o] Error 1
In file included from /usr/include/c++/7/ext/string_conversions.h:41:0,
from /usr/include/c++/7/bits/basic_string.h:6361,
from /usr/include/c++/7/string:52,
from /usr/include/c++/7/bits/locale_classes.h:40,
from /usr/include/c++/7/bits/ios_base.h:41,
from /usr/include/c++/7/ios:42,
from /usr/include/c++/7/ostream:38,
from /home/asoylev/apps/graphtyper/src/graph/genomic_region.cpp:1:
/usr/include/c++/7/cstdlib:75:15: fatal error: stdlib.h: No such file or directory
#include_next <stdlib.h>
^~~~~~~~~~
compilation terminated.
src/CMakeFiles/graphtyper_objects.dir/build.make:110: recipe for target 'src/CMakeFiles/graphtyper_objects.dir/graph/genomic_region.cpp.o' failed
make[3]: *** [src/CMakeFiles/graphtyper_objects.dir/graph/genomic_region.cpp.o] Error 1
In file included from /usr/include/c++/7/ext/string_conversions.h:41:0,
from /usr/include/c++/7/bits/basic_string.h:6361,
from /usr/include/c++/7/string:52,
from /usr/include/c++/7/bits/locale_classes.h:40,
from /usr/include/c++/7/bits/ios_base.h:41,
from /usr/include/c++/7/ios:42,
from /usr/include/c++/7/ostream:38,
from /usr/include/boost/archive/binary_oarchive.hpp:19,
from /home/asoylev/apps/graphtyper/src/graph/genotype.cpp:3:
/usr/include/c++/7/cstdlib:75:15: fatal error: stdlib.h: No such file or directory
#include_next <stdlib.h>
^~~~~~~~~~
compilation terminated.
src/CMakeFiles/graphtyper_objects.dir/build.make:134: recipe for target 'src/CMakeFiles/graphtyper_objects.dir/graph/genotype.cpp.o' failed
make[3]: *** [src/CMakeFiles/graphtyper_objects.dir/graph/genotype.cpp.o] Error 1
CMakeFiles/Makefile2:314: recipe for target 'src/CMakeFiles/graphtyper_objects.dir/all' failed
make[2]: *** [src/CMakeFiles/graphtyper_objects.dir/all] Error 2
CMakeFiles/Makefile2:367: recipe for target 'src/CMakeFiles/graphtyper.dir/rule' failed
make[1]: *** [src/CMakeFiles/graphtyper.dir/rule] Error 2
Makefile:266: recipe for target 'graphtyper' failed
make: *** [graphtyper] Error 2

bioconda package

Hi there,

I am currently working on making graph genome tools more accessible for non-bioinformaticians, together with @AlexanderDilthey. We think that bioconda has gained a lot of support in the community, and so we would like to add a graphtyper package to that channel. Would you be interested in collaborating on this?

One region in Trio has very slow genotyping

Using a large library of breakpoints, a job was submitted on the GIAB HG002 sample for each chromosome using 8 threads. Most chromosomes completed in under 8 hours except chromosome 4 in one region. Chr 4 is still running after 2 more days. This also occurred with the two parents.

Any suggestions on this?

-rw-r--r-- 1 farrell casa  726 Jul  7 01:20 workarea/HG002/chr4/048000001-049000000.vcf.gz.tbi
-rw-r--r-- 1 farrell casa 2.8M Jul  7 01:20 workarea/HG002/chr4/048000001-049000000.vcf.gz

-rw-r--r-- 1 farrell casa  664 Jul  5 14:56 workarea/HG002/chr4/047000001-048000000.vcf.gz.tbi
-rw-r--r-- 1 farrell casa 2.0M Jul  5 14:56 workarea/HG002/chr4/047000001-048000000.vcf.gz

SV end points for "OG" variants

Hi @hannespetur,

Thanks again for fixing #43!

Browsing through my first VCFs, I found five variant classes ("VarType"): OG (very long/interchromosomal), UG (duplications), DG (deletions), FG (insertions), XG (complex indels?). For OG-type variants, I consistently obtain REF allele counts only (i.e. ABHet=-1;ABHom=1). This was very unexpected because some of these calls are definitely true heterozygotes.

So I dipped into the VCF's INFO field a bit more, and realise that for OGs the specified SV "END" point is actually the same as the "POS" identifier in column 2. This is not the case for the other two potential long-range VarTypes UG and DG. If this "END" point is indicative of GraphTyper's internal use, then to me it feels like the script might currently "go circular" for OGs, checking starting point <> starting point combinations which therefore nearly always render 100 % REF.

Could you please have a look at this? Happy to test any changes to this again.

Best,
Max
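
For anyone who wants to check their own output for the same pattern, here is a minimal Python sketch that lists records whose INFO END equals POS. It only assumes standard tab-separated VCF columns and an END key in INFO. The two example records are made up for illustration and are not real GraphTyper output.

```python
def info_dict(info):
    """Parse a VCF INFO column into a dict; flags map to True."""
    out = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out


def records_with_end_at_pos(vcf_lines):
    """Return (CHROM, POS, ALT) for every record whose INFO END equals POS."""
    hits = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        pos = int(fields[1])
        end = info_dict(fields[7]).get("END")
        if isinstance(end, str) and int(end) == pos:
            hits.append((fields[0], pos, fields[4]))
    return hits


# Toy records (hypothetical): the second one has END == POS.
example = [
    "chr1\t100\t.\tN\t<DEL>\t.\t.\tEND=500\n",
    "chr1\t900\t.\tN\t<SV>\t.\t.\tEND=900\n",
]
print(records_with_end_at_pos(example))
```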

genotype_sv [W::find_file_url] Failed to open reference "https://www.ebi.ac.uk/ena/cram/md5/6aef897c3d6ff0c78aff06ac189178dd": Protocol not supported

Hi, my job:
$graphtyper genotype_sv /zfssz2/ST_MCHRI/BIGDATA/USER/lizhichao/cnvnator/testdata/hg38/Homo_sapiens_assembly38.fasta merged.vcf.gz --sams=test_svimmer.list --region_file=/zfssz2/ST_MCHRI/BIGDATA/USER/lizhichao/cnvnator/software/svimmer/region_file
[W::find_file_url] Failed to open reference "https://www.ebi.ac.uk/ena/cram/md5/6aef897c3d6ff0c78aff06ac189178dd": Protocol not supported
[W::find_file_url] Failed to open reference "https://www.ebi.ac.uk/ena/cram/md5/6aef897c3d6ff0c78aff06ac189178dd": Protocol not supported
[W::find_file_url] Failed to open reference "https://www.ebi.ac.uk/ena/cram/md5/6aef897c3d6ff0c78aff06ac189178dd": Protocol not supported
[W::find_file_url] Failed to open reference "https://www.ebi.ac.uk/ena/cram/md5/6aef897c3d6ff0c78aff06ac189178dd": Protocol not supported

Applying variants with symbolic alleles made by genotype_sv

Hi!

I'm using graphtyper genotype_sv for genotyping deletions together with SNPs/indels, and then applying the genotyped variants to the ref genome using

samtools faidx ref.fasta ref:start-stop | bcftools consensus genotype_sv_out.vcf.gz

On a toy example, bcftools fails on a genotyped deletion with the error "Symbolic alleles other than <DEL> are currently not supported: <DEL:SVSIZE=2304:AGGREGATED>".

The deletion in question has three records, all with the same POS and a REF of N, and with the following ALT alleles in turn: <DEL:SVSIZE=2304:AGGREGATED>, <DEL:SVSIZE=2304:BREAKPOINT>, <DEL:SVSIZE=2304:COVERAGE>

i) Have you ever run into such an application (applying variants from genotype_sv), and if so how do you deal with such cases?
ii) I noticed replacing the above <DEL:SVSIZE=2304:AGGREGATED> with <DEL> restores bcftools's ability to apply the deletion to the ref sequence. I'm considering doing that as a hack?

Best
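
The hack in (ii) could be sketched in Python roughly as follows. This is only an illustration under assumptions: it keeps one of the three records per deletion (AGGREGATED by default, which is my choice rather than a documented recommendation), rewrites its ALT to plain <DEL>, and passes every other record through untouched. The function name and the toy records are hypothetical. The VCF header is not modified, so its ALT definitions may still need attention.

```python
import re

# ALT alleles of the form shown in the issue, e.g. <DEL:SVSIZE=2304:AGGREGATED>
DEL_ALT = re.compile(r"^<DEL:SVSIZE=\d+:(AGGREGATED|BREAKPOINT|COVERAGE)>$")


def simplify_del_records(vcf_lines, keep_model="AGGREGATED"):
    """Yield VCF lines with one <DEL> record per genotype_sv deletion."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line  # header lines are passed through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        match = DEL_ALT.match(fields[4])
        if match is None:
            yield line  # SNP/indel or any other record: keep as-is
            continue
        if match.group(1) != keep_model:
            continue  # drop the other models of the same deletion
        fields[4] = "<DEL>"
        yield "\t".join(fields) + "\n"


# Toy input (hypothetical records modelled on the description above).
toy = [
    "##fileformat=VCFv4.2\n",
    "chr1\t10\t.\tA\tG\t.\t.\t.\n",
    "chr1\t50\t.\tN\t<DEL:SVSIZE=2304:AGGREGATED>\t.\t.\tEND=2354\n",
    "chr1\t50\t.\tN\t<DEL:SVSIZE=2304:BREAKPOINT>\t.\t.\tEND=2354\n",
    "chr1\t50\t.\tN\t<DEL:SVSIZE=2304:COVERAGE>\t.\t.\tEND=2354\n",
]
for out_line in simplify_del_records(toy):
    print(out_line, end="")
```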

explanation of parameters in graphtyper genotype

Hi, I am wondering about this argument to graphtyper genotype:

  --avg_cov_by_readlen=value
      File with average coverage by read length.

and several others. Can I leave that empty?

Also, if I specify a --vcf, will graphtyper automatically subset it to the requested --region ?
thank you.

multiple errors while compiling.

Is a bioconda version planned? This would be a great help since I have already spent some time on this and am nowhere near completion.

Any ideas on these errors ? Ubuntu 16.04

Thanks.

CXX=/usr/bin/g++
echo $CXX

BOOST_ROOT=/usr/include/boost
echo $BOOST_ROOT

Ubuntu 16.04 packages

sudo apt install liblz4-dev
sudo apt install liblzma-dev

cd /mnt/ngsnfs/tools
git clone --recursive https://github.com/DecodeGenetics/graphtyper.git

cmake ..
-- The C compiler identification is GNU 5.4.0
-- The CXX compiler identification is GNU 5.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Checking if C linker supports --verbose
-- Checking if C linker supports --verbose - yes
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Checking if CXX linker supports --verbose
-- Checking if CXX linker supports --verbose - yes
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Build type:
-- Building in release mode.
-- Using GCC
-- CXX flags are: -march=core2 -mtune=generic -W -Wall -std=c++11 -DSEQAN_USE_HTSLIB=1 -DSEQAN_HAS_ZLIB=1 -O3 -DNDEBUG -DSEQAN_ENABLE_TESTING=0 -DSEQAN_ENABLE_DEBUG=0 -lrt
-- Checking for zlib
-- Found ZLIB: /usr/lib/x86_64-linux-gnu/libz.so (found version "1.2.8")
-- Checking for bzip2
-- Found BZip2: /mnt/ngsnfs/tools/miniconda2/lib/libbz2.a (found version "1.0.6")
-- Looking for BZ2_bzCompressInit
-- Looking for BZ2_bzCompressInit - found
-- Checking for Boost
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Boost version: 1.58.0
-- Found the following Boost libraries:
-- iostreams
-- log
-- thread
-- serialization
-- system
-- regex
-- log_setup
-- date_time
-- filesystem
-- chrono
-- atomic
-- snappy target location is /mnt/ngsnfs/tools/graphtyper/snappy/.libs/libsnappy.a
-- htslib target location is /mnt/ngsnfs/tools/graphtyper/htslib/libhts.a
-- StatGen target location is /mnt/ngsnfs/tools/graphtyper/statgen/libStatGen.a
-- Performing Test LZ4_GOOD_VERSION
-- Performing Test LZ4_GOOD_VERSION - Success
-- Found LZ4: /usr/lib/x86_64-linux-gnu/liblz4.so
-- Found LZ4: /usr/lib/x86_64-linux-gnu/liblz4.so
-- Compiling graphtyper's source files
-- Configuring done
-- Generating done
-- Build files have been written to: /mnt/ngsnfs/tools/graphtyper/release-build
rcug@hpc01:/mnt/ngsnfs/tools/graphtyper/release-build$
rcug@hpc01:/mnt/ngsnfs/tools/graphtyper/release-build$
rcug@hpc01:/mnt/ngsnfs/tools/graphtyper/release-build$ make -j16 graphtyper
Scanning dependencies of target project_sparsehash
Scanning dependencies of target project_snappy
Scanning dependencies of target project_statgen
Scanning dependencies of target project_htslib
[ 3%] Creating directories for 'project_snappy'
[ 3%] Creating directories for 'project_sparsehash'
[ 3%] Creating directories for 'project_statgen'
[ 5%] Creating directories for 'project_htslib'
[ 7%] No download step for 'project_snappy'
[ 7%] No download step for 'project_sparsehash'
[ 7%] No download step for 'project_statgen'
[ 8%] No download step for 'project_htslib'
[ 16%] No patch step for 'project_sparsehash'
[ 16%] No update step for 'project_statgen'
[ 16%] No patch step for 'project_snappy'
[ 16%] No update step for 'project_sparsehash'
[ 16%] No patch step for 'project_statgen'
[ 16%] No update step for 'project_snappy'
[ 18%] No update step for 'project_htslib'
[ 18%] No patch step for 'project_htslib'
[ 21%] Performing configure step for 'project_snappy'
[ 22%] No configure step for 'project_statgen'
[ 22%] Performing configure step for 'project_sparsehash'
[ 23%] No configure step for 'project_htslib'
[ 25%] Performing build step for 'project_statgen'
[ 26%] Performing build step for 'project_htslib'
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... ar: u' modifier ignored since D' is the default (see U')
yes
checking for a thread-safe mkdir -p... /bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking for g++... g++
checking whether the C++ compiler works... yes
checking for C++ compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C++ compiler... /mnt/ngsnfs/tools/graphtyper/snappy/autogen.sh: 10: /mnt/ngsnfs/tools/graphtyper/snappy/autogen.sh: libtoolize: not found
CMakeFiles/project_snappy.dir/build.make:105: recipe for target '../snappy/src/project_snappy-stamp/project_snappy-configure' failed
make[3]: *** [../snappy/src/project_snappy-stamp/project_snappy-configure] Error 127
CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/project_snappy.dir/all' failed
make[2]: *** [CMakeFiles/project_snappy.dir/all] Error 2
make[2]: *** Waiting for unfinished jobs....
yes
checking whether g++ accepts -g... yes
checking for style of include used by make... GNU
checking dependency style of g++... gcc3
checking for gcc... gcc
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking dependency style of gcc... gcc3
checking how to run the C preprocessor... gcc -E
checking for grep that handles long lines and -e... /bin/grep
checking for egrep... /bin/grep -E
checking for ANSI C header files... yes
checking for memcpy... yes
checking for memmove... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking for uint16_t... yes
checking for u_int16_t... yes
checking for __uint16... no
checking for long long... yes
checking sys/resource.h usability... yes
checking sys/resource.h presence... yes
checking for sys/resource.h... yes
checking for unistd.h... (cached) yes
checking sys/time.h usability... yes
checking sys/time.h presence... yes
checking for sys/time.h... yes
checking sys/utsname.h usability... yes
checking sys/utsname.h presence... yes
checking for sys/utsname.h... yes
checking how to run the C++ preprocessor... g++ -E
checking google/malloc_extension.h usability... no
checking google/malloc_extension.h presence... no
checking for google/malloc_extension.h... no
checking whether the compiler implements namespaces... yes
checking the location of hash_map... <tr1/unordered_map>
checking how to include hash_fun directly... <tr1/functional>
configure: creating ./config.status
config.status: creating Makefile
config.status: creating src/config.h
config.status: executing depfiles commands
test/sam.c: In function ‘faidx1’:
test/sam.c:186:5: warning: ‘faidx_fetch_nseq’ is deprecated: Please use faidx_nseq instead [-Wdeprecated-declarations]
n = faidx_fetch_nseq(fai);
^
In file included from test/sam.c:32:0:
./htslib/faidx.h:94:9: note: declared here
int faidx_fetch_nseq(const faidx_t *fai) HTS_DEPRECATED("Please use faidx_nseq instead");
^
[ 27%] Performing build step for 'project_sparsehash'
/mnt/ngsnfs/tools/graphtyper/sparsehash/missing: line 52: aclocal-1.11: command not found
WARNING: aclocal-1.11' is missing on your system. You should only need it if
you modified acinclude.m4' or configure.ac'. You might want
to install the Automake' and Perl' packages. Grab them from
any GNU archive site.
cd . && /bin/bash /mnt/ngsnfs/tools/graphtyper/sparsehash/missing --run automake-1.11 --gnu
/mnt/ngsnfs/tools/graphtyper/sparsehash/missing: line 52: automake-1.11: command not found
WARNING: automake-1.11' is missing on your system. You should only need it if you modified Makefile.am', acinclude.m4' or configure.ac'.
You might want to install the Automake' and Perl' packages.
Grab them from any GNU archive site.
aclocal.m4:16: warning: this file was generated for autoconf 2.68.
You have another version of autoconf. It may work, but is not guaranteed to.
If you have problems, you may need to regenerate the build system entirely.
To do so, use the procedure documented by the package, typically autoreconf'.
running CONFIG_SHELL=/bin/bash /bin/bash /mnt/ngsnfs/tools/graphtyper/sparsehash/configure --no-create --no-recursion
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... ar: u' modifier ignored since D' is the default (see U')
/bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking for g++... g++
checking whether the C++ compiler works... [ 28%] No install step for 'project_htslib'
[ 30%] Completed 'project_htslib'
yes
checking for C++ compiler default output file name... a.out
checking for suffix of executables... [ 30%] Built target project_htslib

checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C++ compiler... yes
checking whether g++ accepts -g... yes
checking for style of include used by make... GNU
checking dependency style of g++... gcc3
checking for gcc... gcc
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking dependency style of gcc... ar: u' modifier ignored since D' is the default (see U')
gcc3
checking how to run the C preprocessor... gcc -E
checking for grep that handles long lines and -e... ar: u' modifier ignored since D' is the default (see U')
/bin/grep
checking for egrep... /bin/grep -E
checking for ANSI C header files... yes
checking for memcpy... yes
checking for memmove... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... ar: u' modifier ignored since D' is the default (see U')
yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking for uint16_t... yes
checking for u_int16_t... yes
checking for __uint16... no
checking for long long... yes
checking sys/resource.h usability... yes
checking sys/resource.h presence... yes
checking for sys/resource.h... yes
checking for unistd.h... (cached) yes
checking sys/time.h usability... yes
checking sys/time.h presence... yes
checking for sys/time.h... yes
checking sys/utsname.h usability... yes
checking sys/utsname.h presence... yes
checking for sys/utsname.h... yes
checking how to run the C++ preprocessor... g++ -E
checking google/malloc_extension.h usability... no
checking google/malloc_extension.h presence... no
checking for google/malloc_extension.h... no
checking whether the compiler implements namespaces... yes
checking the location of hash_map... <tr1/unordered_map>
checking how to include hash_fun directly... <tr1/functional>
configure: creating ./config.status
ar: u' modifier ignored since D' is the default (see U')
[ 31%] No install step for 'project_statgen'
/bin/bash ./config.status
[ 32%] Completed 'project_statgen'
[ 32%] Built target project_statgen
config.status: creating Makefile
config.status: creating src/config.h
config.status: src/config.h is unchanged
config.status: executing depfiles commands
config.status: creating src/config.h
config.status: src/config.h is unchanged
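
The first hard failure in the log above is "libtoolize: not found", followed by aclocal-1.11 and automake-1.11 not being found, so the build tools listed in the README (Autotools, Automake, libtool) appear to be missing from PATH. A small Python sketch to check for them before running make; the default tool list is an assumption based on the names that show up in this log:

```python
import shutil


def missing_tools(tools=("libtoolize", "aclocal", "automake", "autoconf")):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All listed build tools were found.")
```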

Make errors: No download step for 'project_xxx'

Hi,

We've been getting errors when running make with the source code from v1.4. It looks like it might be related to the submodules.

BOOST_ROOT didn't pick up the non-default boost so I added -DBOOST_ROOT:PATHNAME to cmake.

Versions:

GCC 4.8.5
Boost 1.57.0
cmake 3.0.0

GitHub was having issues so we went for the release download instead of the git clone.

Any ideas on where I'm missing something?

Many thanks in advance!!

wget https://github.com/DecodeGenetics/graphtyper/archive/v1.4.tar.gz
tar -xf v1.4.tar.gz && cd graphtyper-1.4

export PATH=/software/gcc-4.8.1/bin:$PATH
export LD_LIBRARY_PATH=/software/gcc-4.8.1/lib64:$LD_LIBRARY_PATH
export CC=/software/gcc-4.8.1/bin/gcc
export CXX=/software/gcc-4.8.1/bin/g++
export BOOST_ROOT=/software/boost-1.57

mkdir -p release-build && cd release-build
cmake -DBOOST_ROOT:PATHNAME=/software/boost-1.57 ..
make -j4 graphtyper
Scanning dependencies of target project_statgen
Scanning dependencies of target project_snappy
Scanning dependencies of target project_sparsehash
Scanning dependencies of target project_htslib
[  3%] [  4%] [  4%] [  4%] Creating directories for 'project_statgen'
Creating directories for 'project_htslib'
Creating directories for 'project_snappy'
Creating directories for 'project_sparsehash'
[  5%] [  9%] [  9%] [  9%] No download step for 'project_statgen'
No download step for 'project_htslib'
No download step for 'project_sparsehash'
No download step for 'project_snappy'
[ 10%] [ 14%] [ 14%] [ 14%] No patch step for 'project_statgen'
No patch step for 'project_htslib'
No patch step for 'project_sparsehash'
No patch step for 'project_snappy'
[ 15%] [ 19%] [ 19%] [ 19%] No update step for 'project_statgen'
No update step for 'project_sparsehash'
No update step for 'project_snappy'
No update step for 'project_htslib'
[ 20%] [ 23%] [ 23%] [ 23%] No configure step for 'project_statgen'
Performing configure step for 'project_sparsehash'
Performing configure step for 'project_snappy'
No configure step for 'project_htslib'
/bin/sh: /lustre/scratch118/infgen/pathdev/vo1/install_test/graphtyper-1.4/snappy/autogen.sh: No such file or directory
/bin/sh: /lustre/scratch118/infgen/pathdev/vo1/install_test/graphtyper-1.4/sparsehash/configure: No such file or directory
make[3]: *** [../snappy/src/project_snappy-stamp/project_snappy-configure] Error 127
make[3]: *** [../sparsehash/src/project_sparsehash-stamp/project_sparsehash-configure] Error 127
make[2]: *** [CMakeFiles/project_snappy.dir/all] Error 2
make[2]: *** Waiting for unfinished jobs....
make[2]: *** [CMakeFiles/project_sparsehash.dir/all] Error 2
[ 25%] Performing build step for 'project_statgen'
[ 26%] make[4]: *** No targets specified and no makefile found. Stop.
make[3]: *** [../statgen/src/project_statgen-stamp/project_statgen-build] Error 2
make[2]: Performing build step for 'project_htslib'
*** [CMakeFiles/project_statgen.dir/all] Error 2
make[4]: *** No rule to make target `libhts.a'. Stop.
make[3]: *** [../htslib/src/project_htslib-stamp/project_htslib-build] Error 2
make[2]: *** [CMakeFiles/project_htslib.dir/all] Error 2
make[1]: *** [src/CMakeFiles/graphtyper.dir/rule] Error 2
make: *** [graphtyper] Error 2
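
The missing snappy/autogen.sh and sparsehash/configure are consistent with empty submodule directories: GitHub's automatically generated source archives do not include the contents of git submodules, which is why the README clones with --recursive. A minimal Python sketch to confirm this on an unpacked tarball; the directory names are taken from the log above and the function name is hypothetical:

```python
from pathlib import Path

# Submodule directories that appear in the build log above.
SUBMODULES = ("snappy", "sparsehash", "statgen", "htslib")


def empty_submodules(source_root, names=SUBMODULES):
    """Return the submodule directories that are missing or contain no files."""
    root = Path(source_root)
    empty = []
    for name in names:
        path = root / name
        if not path.is_dir() or not any(path.iterdir()):
            empty.append(name)
    return empty


if __name__ == "__main__":
    # "graphtyper-1.4" is the directory name produced by the tar command above.
    print(empty_submodules("graphtyper-1.4"))
```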

core dumped

Hi,
When I run graphtyper, it aborts with the error below. Does this mean the compute node's CPU cannot meet graphtyper's requirements? If so, what computing resources are needed to run the software on 350 samples?
Looking forward to your reply, thank you!

The error:
vimmer_graphtyper.sh: line 6: 9481 Aborted (core dumped) /zfssz2/ST_MCHRI/BIGDATA/USER/lizhichao/pythonenv/python3_pakages/bin/graphtyper genotype_sv /hwfssz1/BIGDATA_COMPUTING/GaeaProject/reference/hg38_noalt_withrandom/hg38.fa $outsvimmer --sams=/zfssz2/ST_MCHRI/BIGDATA/USER/lizhichao/cnvnator/testdata/332_cram_bam.path --region_file=/zfssz2/ST_MCHRI/BIGDATA/USER/lizhichao/cnvnator/software/svimmer/region_file --output=$outdirgraphtyper

How to realign short reads to the graph pan-genome (constructed by GraphTyper construct) ?

Hello graphtyper team,

I want to use graphtyper to genotype SVs (detected from assembly alignments) in a large population (350 accessions, short reads). I can use the construct subcommand to build the graph pangenome; however, there is no subcommand to map the short reads to the graph pangenome and obtain BAM files. Thank you in advance for your response.

Best.
Xu

https instead of git for unprivileged

Hi,

Interesting-sounding tool.

Installation

The https clone will likely work better than the git:// URL for most unprivileged users.

Works fine:
git clone --recursive https://github.com/DecodeGenetics/graphtyper.git
Cloning into 'graphtyper'...

Error:

git clone --recursive git@github.com:DecodeGenetics/graphtyper.git graphtyper && cd graphtyper
Cloning into 'graphtyper'...
Warning: Permanently added the RSA host key for IP address '192.30.253.112' to the list of known hosts.
Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights

cheers

Error while loading some BAM indexes

@hannespetur Graphtyper throws an error for BAM files whose indexes are named file.bai. See the example below:

input files: file.bam, file.bai - instead of file.bam, file.bam.bai

$ graphtyper genotype reference.fasta --sam=file.bam --region=<value> --output=test
Could not load local index file 'file.bam.bai'
<error> bamshrink.cpp:713 Could not read index file file.bam.bai

Would it be possible to support such BAM/index pairs? Some groups name their indexes this way, i.e. file.bai.

Thanks

David.
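
Until both naming schemes are supported, one workaround is to create a file.bam.bai symlink next to each BAM that only has a file.bai. A minimal Python sketch; the function name is hypothetical, and it assumes you have write access to the directory holding the BAM:

```python
import os


def ensure_bam_bai(bam_path):
    """Make sure '<bam_path>.bai' exists, symlinking from '<stem>.bai' if needed.

    Returns the path of the index that graphtyper's error message asks for.
    """
    expected = bam_path + ".bai"  # e.g. file.bam.bai
    if os.path.exists(expected):
        return expected
    alternative = os.path.splitext(bam_path)[0] + ".bai"  # e.g. file.bai
    if not os.path.exists(alternative):
        raise FileNotFoundError(f"no index found for {bam_path}")
    os.symlink(os.path.abspath(alternative), expected)
    return expected
```

Calling ensure_bam_bai("file.bam") once per input before running graphtyper genotype should be enough; calling it again is harmless because an existing file.bam.bai is left alone.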

How do I connect the input and output of the three software (manta+svimmer+graphtyper)

Hi,
Manta produces three variant files: diploidSV.vcf.gz, candidateSV.vcf.gz and candidateSmallIndels.vcf.gz.
Which of these should I merge and pass to the config.sh of make_graphtyper_pipeline.sh? Should I use candidateSV.vcf.gz and discard diploidSV.vcf.gz and candidateSmallIndels.vcf.gz?
In addition, is make_graphtyper_pipeline.sh suitable for running genotyping on the output of Manta + svimmer? Could you give me an example of a config.sh and of running make_graphtyper_pipeline.sh?

Thanks, looking forward to your reply!

Manta Inversion Split

Hi,

Congrats on your new paper and tool.

  1. I would like to genotype SVs identified by Manta. In the new version of Manta (1.6), inversions are reported as breakends by default. "For a simple reciprocal inversion, four breakends will be reported, and they shall share the same EVENT INFO tag". A script is supplied to convert these inversions into a single inverted sequence, as in previous versions. My question is: should the inversions be converted before being used by graphtyper, or is the new format fine?

  2. If I only want to genotype SVs, do I need to include SNPs and indels in the graphs? Also, are crams supported?

Best wishes,
Mo

Hello, I don't think it should need that much, less than 4GB per thread should be enough for 350 samples (assuming 30x coverage).

For me, I typically get a "Killed" message if it fails on memory, though. Could you rerun with --verbose (or even --vverbose) to get a better idea of where in the process it is failing?

Best,
Hannes

Originally posted by @hannespetur in #58 (comment)

Genotyping SNPs that are non-variable within dataset

Hi,

I am using graphtyper to genotype a specific list of SNPs supplied as a VCF (--vcf=) using the genotype option in about 100 samples.

I noticed that not all SNPs that are within the supplied VCFs are genotyped. If the position is not variable within the samples being genotyped, does graphtyper output such a site in the multi-sample VCF? (i.e. non-variable, all SNVs genotype ref/ref).

Thanks
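
To see exactly which supplied sites are absent from the output, the two VCFs can be compared by (CHROM, POS). A minimal Python sketch that works on already-decompressed VCF lines; the function names are hypothetical. It assumes positions are reported unchanged, so a site whose representation was shifted (for example by indel normalization) would show up as missing even though it was genotyped.

```python
def site_keys(vcf_lines):
    """Collect (CHROM, POS) for every non-header record."""
    keys = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos = line.split("\t", 2)[:2]
        keys.add((chrom, int(pos)))
    return keys


def missing_sites(input_lines, output_lines):
    """Sites present in the supplied VCF but absent from the genotyped VCF."""
    return sorted(site_keys(input_lines) - site_keys(output_lines))


# Toy example: the site at chr1:200 was supplied but not reported.
supplied = ["#CHROM\tPOS\n", "chr1\t100\t.\tA\tG\n", "chr1\t200\t.\tC\tT\n"]
genotyped = ["#CHROM\tPOS\n", "chr1\t100\t.\tA\tG\n"]
print(missing_sites(supplied, genotyped))
```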

Using a SV vcf from other callers than Manta

Hello,
I'm trying to use Graphtyper as it seems to be fast and to scale well to many samples. Very easy to install and run so far.. but I'm unsure that I'll be able to make the most of it.

My understanding from issue #42 is that graphtyper mostly expects the output of Manta as its catalogue of variants (which can be filtered using discovery from other tools, as suggested in issue #42).
Is this still up to date, or is it now possible to use VCFs from other tools? I'm thinking in particular of SV detection from long reads (perhaps the output of Sniffles?).

I am also currently trying to use an SV database built with Smoove, but I'm afraid I'm running into a lack of information (SVINSSEQ?), because all SVs genotyped by GraphTyper were flagged "LowQUAL".

Does anyone have suggestions or advice on making the most of an existing SV database for genotyping SVs with GraphTyper?

BonusQuestion: How is the quality of the SV call determined? Is it based on coverage and is it a parameter that we could/should tune depending on the dataset?

Thanks a lot for your help
Best regards
Claire

Run full genotype pipeline with input vcf

When the genotype --vcf option is used, GraphTyper genotypes using only a graph created from that VCF. Another important use case is to run the "normal" genotyping pipeline (with discovery) while also using an input VCF. Suggested in #25.

make error

Dear,
I have cloned graphtyper and tried to build it with make. The GCC version I used is "gcc (GCC) 4.8.5". However, the build fails with the error messages below. Could you please help me solve this?

[  9%] Performing build step for 'project_sparsehash'
[ 29%] Built target project_snappy
[ 29%] [ 29%] Built target project_statgen
Built target project_htslib
[ 30%] Performing build step for 'project_rocksdb'
Makefile:1721: target `utilities/merge_operators/cassandra/test_utils.d' given more than once in the same rule.
  GEN      util/build_version.cc
  CC       db/builder.o
  CC       cache/sharded_cache.o
  CC       cache/clock_cache.o
  CC       db/c.o
In file included from ./db/memtable.h:30:0,
                 from ./db/memtable_list.h:17,
                 from ./db/column_family.h:19,
                 from ./db/version_set.h:33,
                 from ./db/compaction.h:13,
                 from ./db/compaction_iterator.h:16,
                 from db/builder.cc:18:
./util/dynamic_bloom.h: In member function ‘void rocksdb::DynamicBloom::AddHash(uint32_t, const OrFunc&)’:
./util/dynamic_bloom.h:169:18: internal compiler error: Bus error
   const uint32_t delta = (h >> 17) | (h << 15);  // Rotate right 17 bits
                  ^
In file included from cache/clock_cache.cc:32:0:
/usr/include/tbb/concurrent_hash_map.h: In member function ‘bool tbb::interface5::concurrent_hash_map<Key, T, HashCompare, A>::exclude(tbb::interface5::concurrent_hash_map<Key, T, HashCompare, A>::const_accessor&) [with Key = rocksdb::{anonymous}::CacheKey; T = rocksdb::{anonymous}::CacheHandle*; HashCompare = rocksdb::{anonymous}::CacheKey; Allocator = tbb::tbb_allocator<std::pair<rocksdb::{anonymous}::CacheKey, rocksdb::{anonymous}::CacheHandle*> >]’:
/usr/include/tbb/concurrent_hash_map.h:1083:1: internal compiler error: Bus error
 }
 ^
g++: internal compiler error: Bus error (program cc1plus)
g++: internal compiler error: Killed (program cc1plus)
0x409a70 execute
        ../.././gcc/gcc.c:2823
0x409da4 do_spec_1
        ../.././gcc/gcc.c:4615
0x40c715 process_brace_body
        ../.././gcc/gcc.c:5872
0x40c715 handle_braces
        ../.././gcc/gcc.c:5786
0x40acb7 do_spec_1
        ../.././gcc/gcc.c:5269
0x40c715 process_brace_body
        ../.././gcc/gcc.c:5872
0x40c715 handle_braces
        ../.././gcc/gcc.c:5786
0x40acb7 do_spec_1
        ../.././gcc/gcc.c:5269
0x40a9af do_spec_1
        ../.././gcc/gcc.c:5374
0x40c715 process_brace_body
        ../.././gcc/gcc.c:5872
0x40c715 handle_braces
        ../.././gcc/gcc.c:5786
0x40acb7 do_spec_1
        ../.././gcc/gcc.c:5269
0x40c715 process_brace_body
        ../.././gcc/gcc.c:5872
0x40c715 handle_braces
        ../.././gcc/gcc.c:5786
0x40acb7 do_spec_1
        ../.././gcc/gcc.c:5269
0x40c715 process_brace_body
        ../.././gcc/gcc.c:5872
0x40c715 handle_braces
        ../.././gcc/gcc.c:5786
0x40acb7 do_spec_1
        ../.././gcc/gcc.c:5269
0x40c715 process_brace_body
        ../.././gcc/gcc.c:5872
0x40c715 handle_braces
        ../.././gcc/gcc.c:5786
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.
make[4]: *** [db/builder.o] Error 4
make[4]: *** Waiting for unfinished jobs....
0x409a70 execute
        ../.././gcc/gcc.c:2823
0x409da4 do_spec_1
        ../.././gcc/gcc.c:4615
0x40c715 process_brace_body
        ../.././gcc/gcc.c:5872
0x40c715 handle_braces
        ../.././gcc/gcc.c:5786
0x40acb7 do_spec_1
        ../.././gcc/gcc.c:5269
0x40c715 process_brace_body
        ../.././gcc/gcc.c:5872
0x40c715 handle_braces
        ../.././gcc/gcc.c:5786
0x40acb7 do_spec_1
        ../.././gcc/gcc.c:5269
0x40a9af do_spec_1
        ../.././gcc/gcc.c:5374
0x40c715 process_brace_body
        ../.././gcc/gcc.c:5872
0x40c715 handle_braces
        ../.././gcc/gcc.c:5786
0x40acb7 do_spec_1
        ../.././gcc/gcc.c:5269
0x40c715 process_brace_body
        ../.././gcc/gcc.c:5872
0x40c715 handle_braces
        ../.././gcc/gcc.c:5786
0x40acb7 do_spec_1
        ../.././gcc/gcc.c:5269
0x40c715 process_brace_body
        ../.././gcc/gcc.c:5872
0x40c715 handle_braces
        ../.././gcc/gcc.c:5786
0x40acb7 do_spec_1
        ../.././gcc/gcc.c:5269
0x40c715 process_brace_body
        ../.././gcc/gcc.c:5872
0x40c715 handle_braces
        ../.././gcc/gcc.c:5786
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.
make[5]: *** [hashtable_test.o] Error 4
make[4]: *** [all] Error 2
0x8c0c9f crash_signal
        ../.././gcc/toplev.c:332
make[3]: *** [../sparsehash/src/project_sparsehash-stamp/project_sparsehash-build] Error 2
0xa7cdc0 ix86_decompose_address(rtx_def*, ix86_address*)
        ../.././gcc/config/i386/i386.c:11633
0xa7cfc6 ix86_delegitimize_tls_address
        ../.././gcc/config/i386/i386.c:13529
0xa4cd73 vt_expand_1pvar
        ../.././gcc/var-tracking.c:8459
0xa4d21f emit_note_insn_var_location
        ../.././gcc/var-tracking.c:8509
0xce7a67 htab_traverse_noresize
        ../.././libiberty/hashtab.c:784
0xa4dd0c emit_notes_for_changes
        ../.././gcc/var-tracking.c:8873
0xa4dead emit_notes_for_differences
        ../.././gcc/var-tracking.c:8987
0xa4dead vt_emit_notes
        ../.././gcc/var-tracking.c:9367
0xa4e866 variable_tracking_main_1
        ../.././gcc/var-tracking.c:10175
0xa4e866 variable_tracking_main()
        ../.././gcc/var-tracking.c:10189
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.
make[4]: *** [cache/clock_cache.o] Error 1
make[2]: *** [CMakeFiles/project_sparsehash.dir/all] Error 2
make[2]: *** Waiting for unfinished jobs....
g++: internal compiler error: Bus error (program cc1plus)
0x409a70 execute
        ../.././gcc/gcc.c:2823
0x409da4 do_spec_1
        ../.././gcc/gcc.c:4615
0x40c715 process_brace_body
        ../.././gcc/gcc.c:5872
0x40c715 handle_braces
        ../.././gcc/gcc.c:5786
0x40acb7 do_spec_1
        ../.././gcc/gcc.c:5269
0x40c715 process_brace_body
        ../.././gcc/gcc.c:5872
0x40c715 handle_braces
        ../.././gcc/gcc.c:5786
0x40acb7 do_spec_1
        ../.././gcc/gcc.c:5269
0x40a9af do_spec_1
        ../.././gcc/gcc.c:5374
0x40c715 process_brace_body
        ../.././gcc/gcc.c:5872
0x40c715 handle_braces
        ../.././gcc/gcc.c:5786
0x40acb7 do_spec_1
        ../.././gcc/gcc.c:5269
0x40c715 process_brace_body
        ../.././gcc/gcc.c:5872
0x40c715 handle_braces
        ../.././gcc/gcc.c:5786
0x40acb7 do_spec_1
        ../.././gcc/gcc.c:5269
0x40c715 process_brace_body
        ../.././gcc/gcc.c:5872
0x40c715 handle_braces
        ../.././gcc/gcc.c:5786
0x40acb7 do_spec_1
        ../.././gcc/gcc.c:5269
0x40c715 process_brace_body
        ../.././gcc/gcc.c:5872
0x40c715 handle_braces
        ../.././gcc/gcc.c:5786
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.
make[4]: *** [db/c.o] Error 4
make[3]: *** [../rocksdb/src/project_rocksdb-stamp/project_rocksdb-build] Error 2
make[2]: *** [CMakeFiles/project_rocksdb.dir/all] Error 2
make[1]: *** [src/CMakeFiles/graphtyper.dir/rule] Error 2
make: *** [graphtyper] Error 2

genotype_sv crash with v2.4

I tried running genotype_sv with the new release 2.4, but it immediately crashes and issues the following error message "cannot create std::vector larger than max_size()". I tried the same command with the older release 2.2 and it started running fine.

Here is the command I used (I renamed the binary to graphtyper2.4 on my machine):

graphtyper2.4 genotype_sv ref.fasta manta.SVs.vcf.gz --sams=bams --region_file=region.9.file --threads=1 --log GraphTyperSV.Run.9.log -v --output=allGenomes.SV

Because I used all of the BAM files, the data would not be easy to share. Let me know if I should try again on a smaller dataset that I can share with you to reproduce the issue.

Best,
David

Regenotyped samples produce ./. for all genotypes (GT)

Hi, I have regenotyped 9000+ samples, and for a small set of 72 samples every genotype (GT) comes out as unknown (./.).

The other 9000 samples have around 2400 structural variants (SVs) with unknown genotypes, out of a grand total of ~200,000 SV loci.
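
To quantify the pattern described above, the fraction of missing genotypes per sample can be tabulated directly from the merged output VCF. The following is only a minimal sketch in plain Python: the function name `missing_genotype_fraction` is hypothetical, and it assumes an uncompressed, tab-separated VCF whose FORMAT column contains GT.

```python
from collections import Counter


def missing_genotype_fraction(vcf_lines):
    """Return {sample: fraction of records whose GT is missing}.

    Hypothetical helper for illustration; expects an iterable of
    uncompressed VCF text lines including the #CHROM header line.
    """
    samples = []
    missing = Counter()
    n_records = 0
    for line in vcf_lines:
        if line.startswith("##"):
            continue  # meta-information lines carry no genotypes
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        if len(fields) < 10:
            continue  # record without sample columns
        keys = fields[8].split(":")
        if "GT" not in keys:
            continue
        gt_index = keys.index("GT")
        n_records += 1
        for name, column in zip(samples, fields[9:]):
            genotype = column.split(":")[gt_index]
            if genotype in ("./.", ".|.", "."):
                missing[name] += 1
    if n_records == 0:
        return {}
    return {name: missing[name] / n_records for name in samples}
```

Samples with a fraction of 1.0 would correspond to the 72 samples reported here, which makes it easy to list them and compare their inputs against the rest.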

I have attached the log files for one sample from a GraphTyper run with --vverbose, using a scatter-gather approach:
work_Graphtyper2.zip

The command I ran is below; the shell variables point to the respective inputs:
graphtyper genotype_sv $REF_FA $MERGED_VCF \
--region $region --sam $INPUT_CRAM --output $output_dir --log $output_dir/log.txt \
--vverbose

Manta also produced VCFs for these samples, and it did manage to call genotypes for them.

I am unable to provide the VCF file due to a non-disclosure clause covering these samples.

Please advise which other files would be useful for diagnosing the problem.

cmake error

Hi, I tried to build graphtyper with cmake, but it failed. The cmake output follows. I have installed the lz4 library.

cmake ../
-- Build type:
-- Building in release mode.
-- CXX flags are: -Wall -Wextra -Wfatal-errors -pedantic -Wno-variadic-macros -std=c++11 -DSEQAN_HAS_ZLIB=1 -DSEQAN_USE_HTSLIB=1 -DSEQAN_ENABLE_TESTING=0 -O3 -DNDEBUG -DSEQAN_ENABLE_DEBUG=0 -march=core2 -mtune=generic
-- Checking for zlib
-- Checking for bzip2
-- Checking for Boost
CMake Warning at /usr/local/share/cmake-3.10/Modules/FindBoost.cmake:567 (message):
  Imported targets and dependency information not available for Boost version
  (all versions older than 1.33)
Call Stack (most recent call first):
  /usr/local/share/cmake-3.10/Modules/FindBoost.cmake:907 (_Boost_COMPONENT_DEPENDENCIES)
  /usr/local/share/cmake-3.10/Modules/FindBoost.cmake:1542 (_Boost_MISSING_DEPENDENCIES)
  CMakeLists.txt:94 (find_package)


CMake Warning at /usr/local/share/cmake-3.10/Modules/FindBoost.cmake:567 (message):
  Imported targets and dependency information not available for Boost version
  (all versions older than 1.33)
Call Stack (most recent call first):
  /usr/local/share/cmake-3.10/Modules/FindBoost.cmake:907 (_Boost_COMPONENT_DEPENDENCIES)
  /usr/local/share/cmake-3.10/Modules/FindBoost.cmake:1542 (_Boost_MISSING_DEPENDENCIES)
  CMakeLists.txt:94 (find_package)


CMake Warning at /usr/local/share/cmake-3.10/Modules/FindBoost.cmake:567 (message):
  Imported targets and dependency information not available for Boost version
  (all versions older than 1.33)
Call Stack (most recent call first):
  /usr/local/share/cmake-3.10/Modules/FindBoost.cmake:907 (_Boost_COMPONENT_DEPENDENCIES)
  /usr/local/share/cmake-3.10/Modules/FindBoost.cmake:1542 (_Boost_MISSING_DEPENDENCIES)
  CMakeLists.txt:94 (find_package)


CMake Warning at /usr/local/share/cmake-3.10/Modules/FindBoost.cmake:567 (message):
  Imported targets and dependency information not available for Boost version
  (all versions older than 1.33)
Call Stack (most recent call first):
  /usr/local/share/cmake-3.10/Modules/FindBoost.cmake:907 (_Boost_COMPONENT_DEPENDENCIES)
  /usr/local/share/cmake-3.10/Modules/FindBoost.cmake:1542 (_Boost_MISSING_DEPENDENCIES)
  CMakeLists.txt:94 (find_package)


CMake Warning at /usr/local/share/cmake-3.10/Modules/FindBoost.cmake:567 (message):
  Imported targets and dependency information not available for Boost version
  (all versions older than 1.33)
Call Stack (most recent call first):
  /usr/local/share/cmake-3.10/Modules/FindBoost.cmake:907 (_Boost_COMPONENT_DEPENDENCIES)
  /usr/local/share/cmake-3.10/Modules/FindBoost.cmake:1542 (_Boost_MISSING_DEPENDENCIES)
  CMakeLists.txt:94 (find_package)


CMake Warning at /usr/local/share/cmake-3.10/Modules/FindBoost.cmake:567 (message):
  Imported targets and dependency information not available for Boost version
  (all versions older than 1.33)
Call Stack (most recent call first):
  /usr/local/share/cmake-3.10/Modules/FindBoost.cmake:907 (_Boost_COMPONENT_DEPENDENCIES)
  /usr/local/share/cmake-3.10/Modules/FindBoost.cmake:1542 (_Boost_MISSING_DEPENDENCIES)
  CMakeLists.txt:94 (find_package)


CMake Error at /usr/local/share/cmake-3.10/Modules/FindBoost.cmake:1928 (message):
  Unable to find the requested Boost libraries.

  Unable to find the Boost header files.  Please set BOOST_ROOT to the root
  directory containing Boost or BOOST_INCLUDEDIR to the directory containing
  Boost's headers.
Call Stack (most recent call first):
  CMakeLists.txt:94 (find_package)


-- snappy target location is /home/xfyang/software/graphtyper/snappy/.libs/libsnappy.a
-- htslib target location is /home/xfyang/software/graphtyper/htslib/libhts.a
-- StatGen target location is /home/xfyang/software/graphtyper/statgen/libStatGen.a
-- Using GCC
-- Could NOT find LZ4 (missing: LZ4_GOOD_VERSION)
-- Using third-party bundled LZ4
-- ZSTD: /usr/local/include
-- Libraries: rocksdb;snappy;htslib;statgen;-lpthread;/usr/lib/x86_64-linux-gnu/libz.so;/usr/lib/x86_64-linux-gnu/libbz2.so;rt;/usr/lib/x86_64-linux-gnu/liblzma.so;/usr/local/lib/libzstd.so
-- Compiling graphtyper's source files
CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
Please set them or make sure they are set and tested correctly in the CMake files:
Boost_INCLUDE_DIR (ADVANCED)
   used as include directory in directory /home/xfyang/software/graphtyper
   used as include directory in directory /home/xfyang/software/graphtyper
   used as include directory in directory /home/xfyang/software/graphtyper
   used as include directory in directory /home/xfyang/software/graphtyper
   used as include directory in directory /home/xfyang/software/graphtyper
   used as include directory in directory /home/xfyang/software/graphtyper
   used as include directory in directory /home/xfyang/software/graphtyper
   used as include directory in directory /home/xfyang/software/graphtyper
   used as include directory in directory /home/xfyang/software/graphtyper
   used as include directory in directory /home/xfyang/software/graphtyper
   used as include directory in directory /home/xfyang/software/graphtyper
   used as include directory in directory /home/xfyang/software/graphtyper
   used as include directory in directory /home/xfyang/software/graphtyper/src
   used as include directory in directory /home/xfyang/software/graphtyper/src
   used as include directory in directory /home/xfyang/software/graphtyper/src
   used as include directory in directory /home/xfyang/software/graphtyper/src
   used as include directory in directory /home/xfyang/software/graphtyper/src
   used as include directory in directory /home/xfyang/software/graphtyper/src
   used as include directory in directory /home/xfyang/software/graphtyper/src
   used as include directory in directory /home/xfyang/software/graphtyper/src
   used as include directory in directory /home/xfyang/software/graphtyper/src
   used as include directory in directory /home/xfyang/software/graphtyper/src
   used as include directory in directory /home/xfyang/software/graphtyper/test
   used as include directory in directory /home/xfyang/software/graphtyper/test
   used as include directory in directory /home/xfyang/software/graphtyper/test
   used as include directory in directory /home/xfyang/software/graphtyper/test
   used as include directory in directory /home/xfyang/software/graphtyper/test
   used as include directory in directory /home/xfyang/software/graphtyper/test
   used as include directory in directory /home/xfyang/software/graphtyper/test
   used as include directory in directory /home/xfyang/software/graphtyper/test
   used as include directory in directory /home/xfyang/software/graphtyper/test/graph
   used as include directory in directory /home/xfyang/software/graphtyper/test/graph
   used as include directory in directory /home/xfyang/software/graphtyper/test/graph
   used as include directory in directory /home/xfyang/software/graphtyper/test/graph
   used as include directory in directory /home/xfyang/software/graphtyper/test/graph
   used as include directory in directory /home/xfyang/software/graphtyper/test/graph
   used as include directory in directory /home/xfyang/software/graphtyper/test/graph
   used as include directory in directory /home/xfyang/software/graphtyper/test/graph
   used as include directory in directory /home/xfyang/software/graphtyper/test/index
   used as include directory in directory /home/xfyang/software/graphtyper/test/index
   used as include directory in directory /home/xfyang/software/graphtyper/test/index
   used as include directory in directory /home/xfyang/software/graphtyper/test/index
   used as include directory in directory /home/xfyang/software/graphtyper/test/index
   used as include directory in directory /home/xfyang/software/graphtyper/test/index
   used as include directory in directory /home/xfyang/software/graphtyper/test/index
   used as include directory in directory /home/xfyang/software/graphtyper/test/index
   used as include directory in directory /home/xfyang/software/graphtyper/test/typer
   used as include directory in directory /home/xfyang/software/graphtyper/test/typer
   used as include directory in directory /home/xfyang/software/graphtyper/test/typer
   used as include directory in directory /home/xfyang/software/graphtyper/test/typer
   used as include directory in directory /home/xfyang/software/graphtyper/test/typer
   used as include directory in directory /home/xfyang/software/graphtyper/test/typer
   used as include directory in directory /home/xfyang/software/graphtyper/test/typer
   used as include directory in directory /home/xfyang/software/graphtyper/test/typer
   used as include directory in directory /home/xfyang/software/graphtyper/test/utilities
   used as include directory in directory /home/xfyang/software/graphtyper/test/utilities
   used as include directory in directory /home/xfyang/software/graphtyper/test/utilities
   used as include directory in directory /home/xfyang/software/graphtyper/test/utilities
   used as include directory in directory /home/xfyang/software/graphtyper/test/utilities
   used as include directory in directory /home/xfyang/software/graphtyper/test/utilities
   used as include directory in directory /home/xfyang/software/graphtyper/test/utilities
   used as include directory in directory /home/xfyang/software/graphtyper/test/utilities

-- Configuring incomplete, errors occurred!
See also "/home/xfyang/software/graphtyper/release-build/CMakeFiles/CMakeOutput.log".
See also "/home/xfyang/software/graphtyper/release-build/CMakeFiles/CMakeError.log".

The following is the CMakeError.log:

Determining if the pthread_create exist failed with the following output:
Change Dir: /home/xfyang/software/graphtyper/release-build/CMakeFiles/CMakeTmp

Run Build Command:"/usr/bin/make" "cmTC_38143/fast"
/usr/bin/make -f CMakeFiles/cmTC_38143.dir/build.make CMakeFiles/cmTC_38143.dir/build
make[1]: Entering directory `/home/xfyang/software/graphtyper/release-build/CMakeFiles/CMakeTmp'
Building C object CMakeFiles/cmTC_38143.dir/CheckSymbolExists.c.o
/usr/bin/cc    -o CMakeFiles/cmTC_38143.dir/CheckSymbolExists.c.o   -c /home/xfyang/software/graphtyper/release-build/CMakeFiles/CMakeTmp/CheckSymbolExists.c
Linking C executable cmTC_38143
/usr/local/bin/cmake -E cmake_link_script CMakeFiles/cmTC_38143.dir/link.txt --verbose=1
/usr/bin/cc      -rdynamic CMakeFiles/cmTC_38143.dir/CheckSymbolExists.c.o  -o cmTC_38143
CMakeFiles/cmTC_38143.dir/CheckSymbolExists.c.o: In function `main':
CheckSymbolExists.c:(.text+0x16): undefined reference to `pthread_create'
collect2: error: ld returned 1 exit status
make[1]: *** [cmTC_38143] Error 1
make[1]: Leaving directory `/home/xfyang/software/graphtyper/release-build/CMakeFiles/CMakeTmp'
make: *** [cmTC_38143/fast] Error 2

File /home/xfyang/software/graphtyper/release-build/CMakeFiles/CMakeTmp/CheckSymbolExists.c:
/* */
#include <pthread.h>

int main(int argc, char** argv)
{
  (void)argv;
#ifndef pthread_create
  return ((int*)(&pthread_create))[argc];
#else
  (void)argc;
  return 0;
#endif
}

Determining if the function pthread_create exists in the pthreads failed with the following output:
Change Dir: /home/xfyang/software/graphtyper/release-build/CMakeFiles/CMakeTmp

Run Build Command:"/usr/bin/make" "cmTC_d3e66/fast"
/usr/bin/make -f CMakeFiles/cmTC_d3e66.dir/build.make CMakeFiles/cmTC_d3e66.dir/build
make[1]: Entering directory `/home/xfyang/software/graphtyper/release-build/CMakeFiles/CMakeTmp'
Building C object CMakeFiles/cmTC_d3e66.dir/CheckFunctionExists.c.o
/usr/bin/cc   -DCHECK_FUNCTION_EXISTS=pthread_create   -o CMakeFiles/cmTC_d3e66.dir/CheckFunctionExists.c.o   -c /usr/local/share/cmake-3.10/Modules/CheckFunctionExists.c
Linking C executable cmTC_d3e66
/usr/local/bin/cmake -E cmake_link_script CMakeFiles/cmTC_d3e66.dir/link.txt --verbose=1
/usr/bin/cc  -DCHECK_FUNCTION_EXISTS=pthread_create    -rdynamic CMakeFiles/cmTC_d3e66.dir/CheckFunctionExists.c.o  -o cmTC_d3e66 -lpthreads
/usr/bin/ld: cannot find -lpthreads
collect2: error: ld returned 1 exit status
make[1]: *** [cmTC_d3e66] Error 1
make[1]: Leaving directory `/home/xfyang/software/graphtyper/release-build/CMakeFiles/CMakeTmp'
make: *** [cmTC_d3e66/fast] Error 2


Performing C SOURCE FILE Test LZ4_GOOD_VERSION failed with the following output:
Change Dir: /home/xfyang/software/graphtyper/release-build/CMakeFiles/CMakeTmp

Run Build Command:"/usr/bin/make" "cmTC_5db01/fast"
/usr/bin/make -f CMakeFiles/cmTC_5db01.dir/build.make CMakeFiles/cmTC_5db01.dir/build
make[1]: Entering directory `/home/xfyang/software/graphtyper/release-build/CMakeFiles/CMakeTmp'
Building C object CMakeFiles/cmTC_5db01.dir/src.c.o
/usr/bin/cc   -DLZ4_GOOD_VERSION   -o CMakeFiles/cmTC_5db01.dir/src.c.o   -c /home/xfyang/software/graphtyper/release-build/CMakeFiles/CMakeTmp/src.c
Linking C executable cmTC_5db01
/usr/local/bin/cmake -E cmake_link_script CMakeFiles/cmTC_5db01.dir/link.txt --verbose=1
/usr/bin/cc  -DLZ4_GOOD_VERSION    -rdynamic CMakeFiles/cmTC_5db01.dir/src.c.o  -o cmTC_5db01 /usr/lib/x86_64-linux-gnu/liblz4.so
make[1]: Leaving directory `/home/xfyang/software/graphtyper/release-build/CMakeFiles/CMakeTmp'

Return value: 1
Source file was:

#include <lz4.h>
int main() {
  int good = (LZ4_VERSION_MAJOR > 1) ||
    ((LZ4_VERSION_MAJOR == 1) && (LZ4_VERSION_MINOR >= 7));
return !good;
}

The cmake version is 3.10.0-rc5, and the system is 14.04.1-Ubuntu SMP Mon Apr 16 18:40:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux.

I don't know how to solve these problems. Could you please help me solve them?

Thanks

Some variants missed in the genotype_sv results

Dear @hannespetur,

Following your previous post (#42 (comment)), I finished most of the steps. However, I ran into a few issues while running genotype_sv. I ran graphtyper genotype_sv for each reference chromosome separately.

First, some variants that are present in the candidate VCF file are missing from the genotype_sv results.

Second, I cannot find the log file when using the following command:
graphtyper genotype_sv ${ref} ${candidate.vcf.gz} --output=${out} --sams=${bamlist} --region=${chr} --threads=8 --log=log

Third, as you mentioned in the previous post, we can require the FT format field to be "PASS" to get an accurate set of variants and calls. However, I found that few variants meet these conditions. I therefore decided to filter by PASS_ratio using the following command, and then change the GT to ./. if its FT is FAILN.
bcftools view -f "PASS" -i 'INFO/SVMODEL="AGGREGATED" && INFO/PASS_ratio>=0.9'
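
The second step described above (resetting GT to ./. when a sample's FT is not PASS) could be sketched as follows. This is an illustration only, not part of graphtyper or bcftools: the function name `mask_failed_genotypes` is hypothetical, and it assumes tab-separated VCF lines in which the FORMAT column contains both GT and FT.

```python
def mask_failed_genotypes(vcf_line: str) -> str:
    """Set GT to ./. for every sample whose FT value is not PASS.

    Hypothetical helper for illustration; header lines and records
    without GT/FT in FORMAT are returned unchanged.
    """
    if vcf_line.startswith("#"):
        return vcf_line  # header lines pass through untouched
    newline = "\n" if vcf_line.endswith("\n") else ""
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return vcf_line  # no FORMAT/sample columns to edit
    keys = fields[8].split(":")
    if "GT" not in keys or "FT" not in keys:
        return vcf_line
    gt_index, ft_index = keys.index("GT"), keys.index("FT")
    for column in range(9, len(fields)):
        values = fields[column].split(":")
        # Skip truncated sample entries that lack one of the two keys.
        if len(values) <= max(gt_index, ft_index):
            continue
        if values[ft_index] != "PASS":
            values[gt_index] = "./."
            fields[column] = ":".join(values)
    return "\t".join(fields) + newline
```

Applied line by line to the decompressed output of the bcftools command, this leaves the FT value in place, so the reason for each masked call remains visible.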

Sincerely,
Zhuqing

Support for .csi indexed vcfs

Hi Hannes,

Relating to svimmer #2: when genotyping SVs with at least one breakpoint reaching into a chromosome segment > 512Mb, Graphtyper currently stops and raises the following error:

[E::hts_idx_check_range] Region 283784..714781126 cannot be stored in a tbi index. Try using a csi index with min_shift = 14, n_lvls >= 6
[2020-06-26 12:04:41.161796] vcf.cpp:1191 Could not build VCF index
cp: cannot stat '/tmp/graphtyper_200626_120435_1_000280000.7yzvuk/graphtyper.vcf.gz.tbi': No such file or directory
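
The limit in the error message follows from the binning scheme: an index with a given min_shift and n_lvls can store coordinates up to 2^(min_shift + 3 * n_lvls), and a tbi index is fixed at min_shift = 14 with 5 levels, i.e. 2^29 (512 Mb). A small sketch of that arithmetic, with hypothetical function names, shows why the region above needs n_lvls >= 6:

```python
def max_indexable_position(min_shift: int, n_lvls: int) -> int:
    """Largest coordinate a binning index with these parameters can store."""
    return 1 << (min_shift + 3 * n_lvls)


def min_csi_levels(region_end: int, min_shift: int = 14) -> int:
    """Smallest n_lvls whose coordinate range covers region_end."""
    n_lvls = 0
    while max_indexable_position(min_shift, n_lvls) < region_end:
        n_lvls += 1
    return n_lvls


# tbi is equivalent to min_shift = 14, n_lvls = 5
TBI_LIMIT = max_indexable_position(14, 5)

# End coordinate taken from the error message above
region_end = 714781126
print(region_end > TBI_LIMIT)      # region does not fit in a tbi index
print(min_csi_levels(region_end))  # levels needed for a csi index
```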

So it looks like the first task lies in tailoring the Vcf::write_tbi_index() function in vcf.cpp towards a .csi output.

You mentioned that the main challenge lies in Graphtyper's current use of a C++ VCF library that doesn't support CSI indices. Could you please point me to it? I'm happy to get in touch with our team's C++ experts, and I can also share the test data with you.

Many thanks,
Max

LowQUAL for all homozygous reference sites

The VCF FILTER field is always assigned "LowQUAL" for homozygous reference variants larger than 50 bp; not one had FILTER=PASS.
Here is an example. This was not an issue for heterozygous or homozygous non-reference calls.

chr1    21000   chr1:21000:DG   N       <DEL:SVSIZE=5000:AGGREGATED>    0       LowQUAL ABHet=-1;ABHom=1;AC=0;AF=0;AN=2;CR=0;END=26000;LOGF=0.07167;MaxAAS=0;MaxAASR=0;MaxAltPP=0;NHet=0;NHomAlt=0;NHomRef=1;PASS_AC=0;PASS_AN=2;PASS_ratio=1;QD=0;RefLen=1;SB=0.2174;SBAlt=-1;SBF=10,0;SBF1=4,0;SBF2=6,0;SBR=36,0;SBR1=22,0;SBR2=14,0;SVLEN=5000;SVMODEL=AGGREGATED;SVSIZE=5000;SVTYPE=DEL;SV_ID=12;SeqDepth=46;VarType=DG        GT:FT:AD:MD:DP:RA:PP:GQ:PL      0/0:PASS:46,0:0:46:46,0:0:99:0,138,255
chr1    21000   chr1:21000:DG.0 N       <DEL:SVSIZE=5000:BREAKPOINT>    0       LowQUAL ABHet=-1;ABHom=1;AC=0;AF=0;AN=2;CR=0;END=26000;LOGF=0.07167;MaxAAS=0;MaxAASR=0;MaxAltPP=0;NHet=0;NHomAlt=0;NHomRef=1;PASS_AC=0;PASS_AN=2;PASS_ratio=1;QD=0;RefLen=1;SB=0.2174;SBAlt=-1;SBF=10,0;SBF1=4,0;SBF2=6,0;SBR=36,0;SBR1=22,0;SBR2=14,0;SVLEN=5000;SVMODEL=BREAKPOINT;SVSIZE=5000;SVTYPE=DEL;SV_ID=12;SeqDepth=46;VarType=DG        GT:FT:AD:MD:DP:RA:PP:GQ:PL      0/0:PASS:46,0:0:46:46,0:0:99:0,138,255
chr1    21000   chr1:21000:DG.1 N       <DEL:SVSIZE=5000:COVERAGE>      0       LowABHom;LowQUAL        ABHet=-1;ABHom=0.7879;AC=0;AF=0;AN=2;CR=0;END=26000;LOGF=0.000571;MaxAAS=14;MaxAASR=0.2121;MaxAltPP=0;NHet=0;NHomAlt=0;NHomRef=1;PASS_AC=0;PASS_AN=2;PASS_ratio=1;QD=0;RefLen=1;SB=0.2174;SBAlt=-1;SBF=10,0;SBF1=4,0;SBF2=6,0;SBR=36,0;SBR1=22,0;SBR2=14,0;SVLEN=5000;SVMODEL=COVERAGE;SVSIZE=5000;SVTYPE=DEL;SV_ID=12;SeqDepth=66;VarType=DG      GT:FT:AD:MD:DP:RA:PP:GQ:PL      0/0:PASS:52,14:0:66:0,0:0:45:0,45,255

