sauloal commented on June 9, 2024

I'm running it now, but I have a cVCF with 85 samples, so it takes several hours to split. Unfortunately, TileDB-VCF still can't handle cVCF ;-)

Shelnutt2 commented on June 9, 2024

@sauloal Thanks for reporting this; it looks like the segfault happens in htslib. Can you tell us which version of htslib you have installed? It looks like you are using conda, so could you paste the output of conda list?

sauloal commented on June 9, 2024

Thanks for the quick reply.

htslib 1.11 hd3b49d5_2 bioconda

Please find the (rather long) list below.

Shelnutt2 commented on June 9, 2024

@sauloal Thank you for the information. At this time I'm not able to reproduce the crash directly. My suspicion is that something in your VCF file headers is related to the warning htslib prints during stats and export: [W::bcf_hdr_check_sanity] GL should be declared as Number=G. Is there any way you could share an example VCF file that produces this error so we can debug it further? If you can't share the VCF publicly, you can email us at [email protected] and we are happy to take a look.
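For context on that warning: in the VCF specification, Number=G means one value per possible genotype. For diploid samples with n alleles (REF plus ALTs) that count is n(n+1)/2, so a fixed Number=3 only agrees with G at biallelic sites. A small sketch of the arithmetic, assuming diploid calls (the function name is hypothetical, for illustration only):

```shell
#!/bin/sh
# Hypothetical helper: number of values a Number=G field holds at one site,
# assuming diploid genotypes. $1 = number of alleles (REF + ALTs).
diploid_genotype_count() {
  n=$1
  # Unordered allele pairs with repetition: n*(n+1)/2
  echo $(( n * (n + 1) / 2 ))
}

diploid_genotype_count 2   # biallelic site: 3 (RR, RA, AA)
diploid_genotype_count 3   # triallelic site: 6
```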

Since I can't reproduce it, another quick test would be to downgrade htslib from 1.11 to 1.10:
conda install -c bioconda htslib==1.10

sauloal commented on June 9, 2024

I can't install it:

$ mamba install -f --no-deps -c conda-forge -c bioconda -c tiledb htslib==1.10

Problem: package libtiledbvcf-0.8.0-hbab4e3b_0 requires htslib >=1.11,<1.12.0a0, but none of the providers can be installed
UnsatisfiableError: The following specifications were found to be incompatible with each other:

Output in format: Requested package -> Available versions

Package libgcc-ng conflicts for:
htslib==1.10 -> openssl[version='>=1.1.1a,<1.1.2a'] -> libgcc-ng[version='>=7.2.0|>=9.3.0']
htslib==1.10 -> libgcc-ng[version='>=7.3.0']

Package openssl conflicts for:
htslib==1.10 -> openssl[version='>=1.1.1a,<1.1.2a']
htslib==1.10 -> libcurl[version='>=7.64.1,<8.0a0'] -> openssl[version='>=1.1.1b,<1.1.2a|>=1.1.1c,<1.1.2a|>=1.1.1d,<1.1.2a|>=1.1.1g,<1.1.2a']

Package zlib conflicts for:
python=3.8 -> zlib[version='>=1.2.11,<1.3.0a0']
htslib==1.10 -> zlib[version='>=1.2.11,<1.3.0a0']
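The first line of that solver output is the blocker: libtiledbvcf 0.8.0 requires htslib >=1.11,<1.12.0a0, so htslib 1.10 cannot be installed next to it without also changing libtiledbvcf. The range check the solver applies amounts to the following sketch (hypothetical helper; it only looks at major.minor and ignores conda's a0 pre-release suffix):

```shell
#!/bin/sh
# Hypothetical helper: succeed if $1 (e.g. "1.11" or "1.11.1") falls in
# [1.11, 1.12), the htslib range libtiledbvcf 0.8.0 declares above.
htslib_ok_for_tiledbvcf_080() {
  major=${1%%.*}      # text before the first dot
  rest=${1#*.}        # text after the first dot
  minor=${rest%%.*}   # text before the next dot, if any
  [ "$major" -eq 1 ] && [ "$minor" -eq 11 ]
}
```

The version to feed it is the second column of conda list htslib (1.11 in the output earlier in this thread).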

aaronwolen commented on June 9, 2024

@sauloal if it's not possible to share one of your VCF files, could you check which VCF format version they use? It might also help to look at the header of one of the files, or at least the line defining the GL field.

sauloal commented on June 9, 2024

Here is the header of the file.

What is strange is that exporting to TSV works without a problem.

##samtoolsVersion=0.1.14 (r933:170)
##INFO=<ID=CI95,Number=2,Type=Float,Description="Equal-tail Bayesian credible interval of the site allele frequency at the 95% level">
##INFO=<ID=RP,Number=1,Type=Integer,Description="# permutations yielding a smaller PCHI2.">
##SnpEffCmd="SnpEff  -no-upstream -no-downstream -ud 0 -csvStats Slyc2.40 /home/assembly/tomato150/reseq/mapped/Heinz/RF_104_SZAXPI008751-74.vcf.gz "
##samtoolsVersion=0.1.18 (r982:295)
##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">
##INFO=<ID=DP4,Number=4,Type=Integer,Description="# high-quality ref-forward bases, ref-reverse, alt-forward and alt-reverse bases">
##INFO=<ID=MQ,Number=1,Type=Integer,Description="Root-mean-square mapping quality of covering reads">
##INFO=<ID=FQ,Number=1,Type=Float,Description="Phred probability of all samples being the same">
##INFO=<ID=AF1,Number=1,Type=Float,Description="Max-likelihood estimate of the first ALT allele frequency (assuming HWE)">
##INFO=<ID=AC1,Number=1,Type=Float,Description="Max-likelihood estimate of the first ALT allele count (no HWE assumption)">
##INFO=<ID=G3,Number=3,Type=Float,Description="ML estimate of genotype frequencies">
##INFO=<ID=HWE,Number=1,Type=Float,Description="Chi^2 based HWE test P-value based on G3">
##INFO=<ID=CLR,Number=1,Type=Integer,Description="Log ratio of genotype likelihoods with and without the constraint">
##INFO=<ID=UGT,Number=1,Type=String,Description="The most probable unconstrained genotype configuration in the trio">
##INFO=<ID=CGT,Number=1,Type=String,Description="The most probable constrained genotype configuration in the trio">
##INFO=<ID=PV4,Number=4,Type=Float,Description="P-values for strand bias, baseQ bias, mapQ bias and tail distance bias">
##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">
##INFO=<ID=PC2,Number=2,Type=Integer,Description="Phred probability of the nonRef allele frequency in group1 samples being larger (,smaller) than in group2.">
##INFO=<ID=PCHI2,Number=1,Type=Float,Description="Posterior weighted chi^2 P-value for testing the association between group1 and group2 samples.">
##INFO=<ID=QCHI2,Number=1,Type=Integer,Description="Phred scaled PCHI2.">
##INFO=<ID=PR,Number=1,Type=Integer,Description="# permutations yielding a smaller PCHI2.">
##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GL,Number=3,Type=Float,Description="Likelihoods for RR,RA,AA genotypes (R=ref,A=alt)">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="# high-quality bases">
##FORMAT=<ID=SP,Number=1,Type=Integer,Description="Phred-scaled strand bias P-value">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">
##SnpEffVersion="3.2 (build 2013-05-23), by Pablo Cingolani"
##SnpEffCmd="SnpEff  -no-upstream -no-downstream -ud 0 -csvStats Slyc2.40 /home/assembly/tomato150/reseq/mapped/Heinz/RF_105_SZAXPI009358-45.vcf.gz "
##INFO=<ID=EFF,Number=.,Type=String,Description="Predicted effects for this variant.Format: 'Effect ( Effect_Impact | Functional_Class | Codon_Change | Amino_Acid_change| Amino_Acid_length | Gene_Name | Transcript_BioType | Gene_Coding | Transcript_ID | Exon  | GenotypeNum [ | ERRORS | WARNINGS ] )'">
##INFO=<ID=SF,Number=.,Type=String,Description="Source File (index to sourceFiles, f when filtered)">
##INFO=<ID=AC,Number=.,Type=Integer,Description="Allele count in genotypes">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##SnpEffVersion="4.3t (build 2017-11-24 10:18), by Pablo Cingolani"
##SnpEffCmd="SnpEff  S_lycopersicum_v2.50 /vcf-data/merge.vcf.gz "
##INFO=<ID=ANN,Number=.,Type=String,Description="Functional annotations: 'Allele | Annotation | Annotation_Impact | Gene_Name | Gene_ID | Feature_Type | Feature_ID | Transcript_BioType | Rank | HGVS.c | HGVS.p | cDNA.pos / cDNA.length | CDS.pos / CDS.length | AA.pos / AA.length | Distance | ERRORS / WARNINGS / INFO' ">
##INFO=<ID=LOF,Number=.,Type=String,Description="Predicted loss of function effects for this variant. Format: 'Gene_Name | Gene_ID | Number_of_transcripts_in_gene | Percent_of_transcripts_affected'">
##INFO=<ID=NMD,Number=.,Type=String,Description="Predicted nonsense mediated decay effects for this variant. Format: 'Gene_Name | Gene_ID | Number_of_transcripts_in_gene | Percent_of_transcripts_affected'">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  /panfs/ANIMAL/group001/minjiumeng/tomato_reseq/SZAXPI008746-45  /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009284-57        /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009285-62        /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009286-74        /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009287-75        /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009288-79        /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009289-84        /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009290-87        /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009291-88        /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009292-89        /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009293-90        /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009294-93        /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009295-94        /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009296-95        /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009297-102       /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009298-108
       /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009299-109       /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009300-113       /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009301-123       /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009302-129       /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009303-133       /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009304-136       /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009305-140       /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009306-142       /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009307-158       /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009308-166       /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009309-169       /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009310-62        /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009311-74        /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009312-75        /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009313-79        /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009314-84        /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009315-87
        /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009316-88        /panfs/ANIMAL/group001/minjiumeng/tomato_reseq/SZAXPI008747-46  /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009317-89        /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009318-90        /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009319-93        /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009320-94        /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009321-95        /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009322-102       /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009323-108       /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009324-109       /panfs/ANIMAL/group001/minjiumeng/tomato_reseq/SZAXPI008748-47  /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009326-113       /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009327-123       /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009328-129       /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009329-133       /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009330-136       /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009331-140
       /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009332-142       /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009333-158       /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009334-166       /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009359-46        /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009335-169       /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009336-14        /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009337-15        /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009338-16-2      /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009339-17-2      /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009340-18        /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009341-19        /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009342-21        /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009343-22-2      /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009344-23        /panfs/ANIMAL/group001/minjiumeng/tomato_reseq/SZAXPI008749-56  /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009345-24        /panfs/ANIMAL/group001/minjiumeng/tomato_reseq/SZAXPI008752-75  /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009346-25        /panfs/ANIMAL/group001/minjiumeng/tomato_reseq/SZAXPI008753-79  /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009347-26        /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009348-27        /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009349-30        /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009350-31        /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009351-32        /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009352-35        /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009325-56        /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009353-36        /panfs/ANIMAL/group001/minjiumeng/tomato_reseq/SZAXPI008750-57  /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009354-37        /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009355-39        /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009356-41        
/ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009357-44        /panfs/ANIMAL/group001/minjiumeng/tomato_reseq/SZAXPI008751-74  /ifshk5/PC_PA_EU/PMO/Tomato_reseq/01.BWA/SZAXPI009358-45
SL2.50ch00      280     .       A       C       30.8    .       AC1=2;AC=2;AF1=1;AN=2;DP4=0,0,2,0;DP=17;EFF=INTERGENIC(MODIFIER||||||||||1);FQ=-33;MQ=60;SF=50;VDB=0.0198;ANN=C|intergenic_region|MODIFIER|CHR_START-Solyc00g005000.2|CHR_START-gene:Solyc00g005000.2|intergenic_region|CHR_START-gene:Solyc00g005000.2|||n.280A>C||||||        GT:GQ:DP:PL     .       .       .       .       .       .       .       .       .       .       .       .       .       .       .
       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .
       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       1/1:10:2:62,6,0 .
       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .       .
       .       .       .       .       .       .       .       .       .       .       .       .       .

aaronwolen commented on June 9, 2024

Thanks! This might be the same issue discussed here. Could you try running bcftools reheader as suggested there, then re-ingest to see if that fixes the export?
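A minimal sketch of that reheader route, assuming bcftools is on PATH and that the only header change needed is declaring GL as Number=G (function names are hypothetical). It dumps the header with bcftools view -h, rewrites the GL line with sed, and applies the result with bcftools reheader -h. Note that Number=G is only valid if every record carries the matching number of GL values, which for this samtools-style header (Number=3) holds at biallelic diploid sites.

```shell
#!/bin/sh
# Read a VCF header on stdin; rewrite the GL FORMAT declaration to Number=G.
# Every other line passes through unchanged.
fix_gl_header() {
  sed -E 's/^(##FORMAT=<ID=GL,Number=)[^,]*/\1G/'
}

# Hypothetical wrapper: fix_vcf_gl in.vcf.gz out.vcf.gz
# Requires bcftools; only the header is replaced, records are copied as-is.
fix_vcf_gl() {
  src=$1
  dst=$2
  hdr=$(mktemp) || return 1
  bcftools view -h "$src" | fix_gl_header > "$hdr" &&
    bcftools reheader -h "$hdr" -o "$dst" "$src"
  rc=$?
  rm -f "$hdr"
  return $rc
}
```

The output will most likely need to be indexed again (for example with bcftools index) before it is re-ingested.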

If you have vcftools installed, you could also run vcf-validator to make sure there are no other issues with the files.

sauloal commented on June 9, 2024

I had to split my cVCF into single-sample files, so I get a lot of AN/AC errors, but otherwise the file is fine:

Leading or trailing space in attr_key-attr_value pairs is discouraged:
        [Description] [Functional annotations: 'Allele | Annotation | Annotation_Impact | Gene_Name | Gene_ID | Feature_Type | Feature_ID | Transcript_BioType | Rank | HGVS.c | HGVS.p | cDNA.pos / cDNA.length | CDS.pos / CDS.length | AA.pos / AA.length | Distance | ERRORS / WARNINGS / INFO' ]
        INFO=<ID=ANN,Number=.,Type=String,Description="Functional annotations: 'Allele | Annotation | Annotation_Impact | Gene_Name | Gene_ID | Feature_Type | Feature_ID | Transcript_BioType | Rank | HGVS.c | HGVS.p | cDNA.pos / cDNA.length | CDS.pos / CDS.length | AA.pos / AA.length | Distance | ERRORS / WARNINGS / INFO' ">
The header tag 'reference' not present. (Not required but highly recommended.)
SL2.50ch00:3235 .. AN is 118, should be 2
SL2.50ch00:3235 .. AC is 118, should be 2
SL2.50ch00:4314 .. AN is 128, should be 2
SL2.50ch00:4314 .. AC is 128, should be 2
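Those AN/AC messages are expected after splitting: the INFO values presumably still describe all samples of the original cVCF, while each single-sample file carries one genotype. For a single diploid call such as the 1/1 in the example record above, AN is 2. A small awk sketch of how AN and AC follow from one GT value (hypothetical helper; AC here is summed over all ALT alleles, whereas the VCF AC field is per ALT allele, so the two only coincide at biallelic sites):

```shell
#!/bin/sh
# Hypothetical helper: derive AN (called alleles) and AC (non-REF alleles,
# summed over all ALTs) from one GT value such as 0/1, 1|1 or ./.
an_ac_from_gt() {
  printf '%s\n' "$1" | awk -F'[/|]' '{
    an = 0; ac = 0
    for (i = 1; i <= NF; i++) {
      if ($i == ".") continue   # missing allele: not counted in AN
      an++
      if ($i != "0") ac++       # any ALT index counts toward AC
    }
    printf "AN=%d AC=%d\n", an, ac
  }'
}

an_ac_from_gt '1/1'   # AN=2 AC=2
an_ac_from_gt '0/1'   # AN=2 AC=1
an_ac_from_gt './.'   # AN=0 AC=0
```

If the files are split with bcftools view -s SAMPLE, bcftools recalculates AC and AN for the sample subset by default (unless -I/--no-update is given), which should avoid these warnings.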

aaronwolen commented on June 9, 2024

Any luck with bcftools reheader?

sauloal commented on June 9, 2024

In summary: it solved the problem. I re-encoded the file with bcftools and fixed the header manually. Not really high-throughput, but it works.

I've added just a few files and it half works (at least it gives a different error):

tiledbvcf export --uri 150_debug_cli --verbose --output-format z --sample-names SZAXPI008746-45,SZAXPI009284-57,SZAXPI009285-62,SZAXPI009286-74 --regions SL2.50ch00:1-100000 --output-dir 150_debug_cli_query
Sorted 1 regions in 4.62e-05 seconds.
Allocating 11 fields (17 buffers) of size 63161283 bytes (60.2353MB)
Initialized TileDB query with 1 start_pos ranges,4 samples for contig SL2.50ch00 (contig batch 1/1, sample batch 1/1).
Processed 44 cells in 0.0005062 sec. Reported 44 cells.
[E::bcf_fmt_array] Unexpected type 0

real    0m3.985s
user    0m0.331s
sys     0m0.815s

And it exports only 1 file instead of 4.

aaronwolen commented on June 9, 2024

Thanks for the update. This looks like another VCF format error coming from htslib. If you're able to share a couple of your files, we'd be happy to help track down the issue.

sauloal commented on June 9, 2024

So, I've finished adding all 84 samples.

I've consolidated and vacuumed the database (it took 6 hours).

+ tiledbvcf utils consolidate fragment_meta --uri 150_debug_cli

real    0m4.862s
user    0m0.329s
sys     0m1.496s
+ tiledbvcf utils consolidate fragments --uri 150_debug_cli

real    357m27.628s
user    438m14.372s
sys     243m45.570s
+ tiledbvcf utils vaccum fragment_meta --uri 150_debug_cli

real    0m1.039s
user    0m0.161s
sys     0m0.117s
+ tiledbvcf utils vaccum fragments --uri 150_debug_cli

real    0m26.796s
user    0m0.452s
sys     0m21.811s

Still the same result:

tiledbvcf export --uri 150_debug_cli --verbose --output-format z --sample-names SZAXPI008746-45,SZAXPI009284-57,SZAXPI009285-62,SZAXPI009286-74 --regions SL2.50ch00:1-500000 --output-dir 150_debug_cli_query
Sorted 1 regions in 3.73e-05 seconds.
Allocating 11 fields (17 buffers) of size 63161283 bytes (60.2353MB)
Initialized TileDB query with 1 start_pos ranges,4 samples for contig SL2.50ch00 (contig batch 1/1, sample batch 1/1).
Processed 95 cells in 0.0003633 sec. Reported 95 cells.
Processed 92 cells in 0.0003643 sec. Reported 92 cells.
Processed 61 cells in 0.0005126 sec. Reported 61 cells.
Processed 97 cells in 0.0005054 sec. Reported 97 cells.
[E::bcf_fmt_array] Unexpected type 0

real    0m21.077s
user    0m1.040s
sys     0m0.312s

Each file is between 50 and 500 MB compressed. How can I send you, let's say, 5 of them?


sauloal commented on June 9, 2024

The plot thickens: exporting to BCF and TSV works. Only VCF crashes.

Shelnutt2 commented on June 9, 2024

@sauloal thank you for the continued information. TileDB-VCF relies on htslib for both BCF and VCF export: we build the in-memory record structure, then pass it to htslib to write it out in the proper format. TSV export is handled entirely inside TileDB-VCF. Once we get a sample of your VCF files, we should be able to track down the exact cause, push a fix into htslib to prevent the segfault, and potentially also make some adjustments on our side to help this export succeed.

Each file is between 50 and 500 MB compressed. How can I send you, let's say, 5 of them?

If you can upload them to Google Drive or Dropbox, that would work; you can email us at [email protected] with private links. If that isn't an option, we can give you temporary access to an FTPS site where you could upload them. Lastly, if you are an AWS user, we can provide a shared S3 bucket for the upload. Please let us know which you prefer.

I've consolidated and vacuumed the database (it took 6 hours).

One note here: you don't need to consolidate the fragments. Consolidating the fragment metadata is an important step to reduce the overhead of opening the array, but consolidating the fragments themselves is not needed, and this time-consuming step can be skipped for your testing. More generally, you should not need to consolidate the fragments of TileDB-VCF arrays in most use cases: TileDB efficiently prunes fragments that do not intersect a query, so having a large number of them does not hurt read performance in most cases.
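A minimal sketch of the ingestion pattern described above: store each file, then consolidate only the fragment metadata, never the fragments. It assumes tiledbvcf is on PATH and that the dataset at the given URI already exists; the wrapper name is hypothetical, while the subcommands are the ones used earlier in this thread.

```shell
#!/bin/sh
# Hypothetical wrapper: ingest_with_meta_consolidation URI file1.vcf.gz [file2 ...]
# Stores each file into an existing TileDB-VCF dataset, then consolidates
# only the fragment metadata (cheap), not the fragments themselves.
ingest_with_meta_consolidation() {
  uri=$1
  shift
  for f in "$@"; do
    tiledbvcf store --uri "$uri" "$f" || return 1
    tiledbvcf utils consolidate fragment_meta --uri "$uri" || return 1
  done
}
```

Consolidating after every file keeps the metadata compact while ingesting; for large batches, running the metadata consolidation once at the end of each batch would be a reasonable variation.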

sauloal commented on June 9, 2024

@aaronwolen @Shelnutt2
Thanks for your message. I've sent an email with the data.

Regarding the consolidation: I'm evaluating TileDB for a large deployment, so I want to test its speed and reliability. It's mostly curiosity, plus the expectation that I'll need to run it after inserting large amounts of data.

I've also noticed that consolidating the fragment metadata reduced insertion time massively, so I've made my script always do that after each insertion. After that, insertion time remained constant.

Shelnutt2 commented on June 9, 2024

@sauloal We've identified the issue and adjusted TileDB-VCF to avoid the problem in htslib. @aaronwolen and I have validated the fix against your sample data. We are wrapping up a few other open pull requests now and will look to cut a release with the fix tomorrow morning. We'll let you know as soon as the conda package is available.

Fix: #263

sauloal commented on June 9, 2024

@aaronwolen @Shelnutt2

I can confirm it is working and exporting successfully.

Thanks for the great work!

