
tiddit's Introduction

DESCRIPTION

TIDDIT is a tool used to identify chromosomal rearrangements using mate-pair or paired-end sequencing data. TIDDIT identifies intra- and inter-chromosomal translocations, deletions, tandem duplications, and inversions, using supplementary alignments as well as discordant pairs. TIDDIT searches for discordant read pairs and split reads (supplementary alignments). Reads with supplementary alignments are assembled and aligned using a fermikit-like workflow. Next, all signals (contigs, split reads, and discordant pairs) are clustered using DBSCAN. The resulting clusters are filtered and annotated, and reported as SVs depending on their statistics. TIDDIT has two analysis modules: the sv mode, which is used to search for structural variants, and the cov mode, which analyses the read depth of a bam file and generates a coverage report. On a 30X human genome, the TIDDIT sv module typically completes within 5 hours and requires less than 10 Gb of RAM.

INSTALLATION

TIDDIT requires python3 (>= 3.8), cython, pysam, and Numpy.

By default, TIDDIT requires bwa, fermi2, and ropebwt2 for local assembly; local assembly may be disabled through the "--skip_assembly" parameter.

Cloning from GitHub:

git clone https://github.com/SciLifeLab/TIDDIT.git

To install TIDDIT:

cd TIDDIT
pip install -e .

Next, install fermi2, ropebwt2, and bwa; I recommend using conda:

conda install fermi2 ropebwt2 bwa

You may also compile bwa, fermi2, and ropebwt2 yourself. Remember to add the executables to your PATH, or provide their paths through the command-line parameters.
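For example (the executable paths below are placeholders):

tiddit --sv --bam in.bam --ref reference.fa --bwa /path/to/bwa --fermi2 /path/to/fermi2 --ropebwt2 /path/to/ropebwt2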

tiddit --help
tiddit --sv --help
tiddit --cov --help

TIDDIT may be installed using bioconda:

conda install tiddit

or using the Docker image on biocontainers:

docker pull quay.io/biocontainers/tiddit:<tag>

Visit https://quay.io/repository/biocontainers/tiddit?tab=tags for a list of tags.

The SV module

The main TIDDIT module detects structural variants using discordant pairs, split reads, and coverage information:

tiddit --sv [Options] --bam in.bam --ref reference.fa

Here in.bam is the input bam or cram file, and reference.fa is the reference fasta used to align the sequencing data. TIDDIT will crash if the reference fasta differs from the one used to align the reads. The reads of the input bam file must be sorted by genome position.

TIDDIT may be fine-tuned by altering these optional parameters:

-o	output prefix (default=output)
-i	maximum allowed insert size for paired reads. Pairs aligning on the same chromosome at a distance greater than this are considered SV candidates (default=99.9th percentile of the insert size)
-d	expected read pair orientation, either "innie" (-> <-) or "outtie" (<- ->). Default: the major orientation within the dataset
-p	minimum number of supporting pairs needed to call a variant (default=3)
-r	minimum number of supporting split reads needed to call a variant (default=3)
--threads	number of threads (default=1)
-q	minimum mapping quality to consider an alignment (default=5)
-n	the ploidy of the organism (default=2)
-e	clustering distance parameter; discordant pairs closer than this distance are considered to belong to the same variant (default=sqrt(insert-size*2)*12; see the worked example after this list)
-c	average coverage, overrides the estimated average coverage (useful for exome or panel data)
-l	min-pts parameter (default=3), must be set >= 2
-s	number of reads to sample when computing library statistics (default=25000000)
-z	minimum variant size (default=50); variants smaller than this will not be printed (z < 10 is not recommended)
--force_ploidy	force the ploidy to be set to -n across the entire genome (i.e., skip coverage normalisation of chromosomes)
--n_mask	exclude regions from the coverage calculation if they contain more than this fraction of N (default=0.5)
--skip_assembly	skip local assembly; TIDDIT will perform worse, but won't require fermi2, bwa, ropebwt2, or a bwa-indexed reference
--bwa	path to the bwa executable (default=bwa)
--fermi2	path to the fermi2 executable (default=fermi2)
--ropebwt2	path to the ropebwt2 executable (default=ropebwt2)
--p_ratio	minimum discordant pair/normal pair ratio at the breakpoint junction (default=0.1)
--r_ratio	minimum split read/coverage ratio at the breakpoint junction (default=0.1)
--max_coverage	filter calls with coverage more than this many times the chromosome average (default=4)
--min_contig	skip calling on contigs smaller than this size (default=10000 bp)
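As a worked example of the -e default: for a library with a 350 bp insert size, the clustering distance evaluates to sqrt(350*2)*12 = sqrt(700)*12, approximately 318 bp.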

Output:

The TIDDIT sv module produces two output files: a vcf file containing SV calls, and a tab file describing the estimated ploidy and coverage across each contig.

The cov module

Computes the coverage of different regions of the bam file:

tiddit --cov [Options] --bam in.bam

Optional parameters:

-o - the prefix of the output files
-z - compute the coverage within bins of a specified size across the entire genome (default bin size: 500)
-w - generate a wig file instead of a bed file

--ref - reference sequence (fasta), required for reading cram files.
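For example, to compute coverage in 1000 bp bins and write the result as a wig file (file names here are placeholders):

tiddit --cov -z 1000 -w -o sample_cov --bam sample.bam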

Filters

TIDDIT uses a set of filters to detect low-quality calls. The filter field of variants passing these tests is set to "PASS". If a variant fails any of these tests, the filter field is set to the name of the failed filter. These filters include:

Expectedlinks
    Less than <p_ratio> of the spanning pairs or <r_ratio> of the spanning reads support the variant
FewLinks
    The number of discordant pairs supporting the variant is too low compared to the number of discordant pairs within that genomic region
Unexpectedcoverage
    High coverage

Failed variants may be removed using tools such as VCFtools or grep. Removing these variants greatly improves the precision of TIDDIT, but may reduce the sensitivity. It is advised to remove filtered variants or to prioritise the variants that have passed the quality checks. This command may be used to filter the TIDDIT vcf:

grep -E "#|PASS" input.vcf > output.filtered.vcf
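Alternatively, if bcftools is available, the following should produce an equivalent result:

bcftools view -f PASS input.vcf > output.filtered.vcf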

Quality column

The scores in the quality column are calculated using non-parametric sampling: 1000 genomic positions are sampled across each chromosome, and the number of read pairs and reads spanning each of these positions is counted. The variant support of each call is compared to these values, and the quality column is set to the lowest percentile higher than (variant support * ploidy).
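The snippet below is a rough illustration of that scheme; the simulated counts and the helper function are made up for this example, and TIDDIT's actual implementation differs:

import numpy as np

# Illustrative sketch, not TIDDIT's code: hypothetical spanning-pair counts
# at 1000 sampled positions of one chromosome.
rng = np.random.default_rng(1)
sampled_support = rng.poisson(20, 1000)

def qual_score(variant_support, ploidy=2):
    # QUAL = lowest percentile whose sampled support exceeds
    # variant_support * ploidy
    percentiles = np.arange(1, 101)
    values = np.percentile(sampled_support, percentiles)
    above = percentiles[values > variant_support * ploidy]
    return int(above[0]) if len(above) else 100

print(qual_score(8))  # a modest score for this simulated library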

Note: SVs usually occur in repetitive regions, hence these scores are expected to be relatively low. A true variant may have a low score, and the score itself depends on the input data (mate-pair vs paired-end, for instance).

Merging the vcf files

I usually merge vcf files using SVDB (https://github.com/J35P312):

svdb --merge --vcf file1.vcf file2.vcf --bnd_distance 500 --overlap 0.6 > merged.vcf

Merging vcf files can be useful for tumor-normal analysis or for analysing a pedigree, as well as for combining the output of multiple callers.

Tumor normal example

Run the tumor sample using a lower ratio threshold (to allow for subclonal events, and to account for low purity):

tiddit --sv --p_ratio 0.10 --bam tumor.bam -o tumor --ref reference.fasta
grep -E "#|PASS" tumor.vcf > tumor.pass.vcf

Run the normal sample:

tiddit --sv --bam normal.bam -o normal --ref reference.fasta
grep -E "#|PASS" normal.vcf > normal.pass.vcf

Merge the files:

svdb --merge --vcf tumor.pass.vcf normal.pass.vcf --bnd_distance 500 --overlap 0.6 > Tumor_normal.vcf

The output vcf should be filtered further and annotated (using a local frequency database, for instance).

Annotation

Genes may be annotated using VEP or SnpEff. NIRVANA may be used for annotating CNVs, and SVDB may be used as a frequency database.

Algorithm

Discordant pairs, split reads (supplementary alignments), and contigs are extracted. A discordant pair is any pair with an insert size larger than the -i parameter, or a pair whose reads map to different chromosomes. Supplementary alignments and discordant pairs are extracted only if their mapping quality exceeds the -q parameter. Contigs are generated by assembling all reads with supplementary alignments using fermi2.
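The following is a rough sketch of this extraction step using pysam (illustrative only, not TIDDIT's actual code; the file name and thresholds are placeholders):

import pysam

# Sketch of signal extraction, not TIDDIT's implementation.
MAX_INS = 800   # stands in for the -i threshold
MIN_MAPQ = 5    # stands in for the -q threshold

bam = pysam.AlignmentFile("in.bam", "rb")
for read in bam:
    if read.is_unmapped or read.mate_is_unmapped or read.mapping_quality < MIN_MAPQ:
        continue
    # discordant pair: mates on different chromosomes, or insert size above -i
    discordant = (read.reference_id != read.next_reference_id
                  or abs(read.template_length) > MAX_INS)
    # split read: the alignment carries a supplementary alignment (SA tag)
    split = read.has_tag("SA")
    if discordant or split:
        print(read.query_name, read.reference_name, read.reference_start)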

The most recent version of TIDDIT uses an algorithm similar to DBSCAN: a cluster is formed if -l or more signals are located within the -e distance. Once a cluster is formed, more signals may be added if they are within the -e distance of -l signals already in the cluster.

A cluster is rejected if it contains fewer than -r plus -p signals. Rejected clusters are not printed to the vcf file.
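Below is a minimal one-dimensional sketch of this idea (a simplification that chains signals lying within -e of each other and drops clusters with fewer than -l signals; TIDDIT's actual clustering differs):

# Simplified DBSCAN-like clustering of breakpoint positions, for illustration.
EPS = 300      # stands in for -e, the clustering distance
MIN_PTS = 3    # stands in for -l, the min-pts parameter

def cluster_positions(positions):
    clusters, current = [], []
    for pos in sorted(positions):
        if current and pos - current[-1] > EPS:
            if len(current) >= MIN_PTS:
                clusters.append(current)
            current = []
        current.append(pos)
    if len(current) >= MIN_PTS:
        clusters.append(current)
    return clusters

print(cluster_positions([100, 150, 220, 5000, 5100, 5150, 9000]))
# -> [[100, 150, 220], [5000, 5100, 5150]]; the lone signal at 9000 is dropped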

If the cluster is not rejected, it will be printed to file, even if it fails a quality filter.

The sensitivity and precision may be controlled using the -q, -r, -p, and -l parameters.

LICENSE

All the tools distributed with this package are distributed under the GNU General Public License version 3.0 (GPLv3).

tiddit's People

Contributors

countdigi, dnil, j35p312, jemten, jessime, kristinebilgrav, vezzi


tiddit's Issues

Missing SVLEN

It seems like tiddit skips SVLEN now and then. It would be nice if it was included.

[solution] Build failure with gcc-11

Hi,

TIDDIT does not build with GCC-11, leaving this error:

> /usr/bin/c++  -I/usr/include/gtest -Wall -g -O3 -pthread -MD -MT CMakeFiles/runUnitTests.dir/stats_unittest.cpp.o -MF CMakeFiles/runUnitTests.dir/stats_unittest.cpp.o.d -o CMakeFiles/runUnitTests.dir/stats_unittest.cpp.o -c /<<PKGBUILDDIR>>/stats_unittest.cpp
> /<<PKGBUILDDIR>>/fastq_unittest.cpp: In member function ‘virtual void Fastq_ReadFromFile_Test::TestBody()’:
> /<<PKGBUILDDIR>>/fastq_unittest.cpp:53:78: error: taking address of rvalue [-fpermissive]
>    53 |         string expectedName = static_cast<ostringstream*>( &(ostringstream() << counter) )->str();
>       |                                                             ~~~~~~~~~~~~~~~~~^~~~~~~~~~~
> /<<PKGBUILDDIR>>/fasta_unittest.cpp: In member function ‘virtual void Fasta_ReadFromFile_Test::TestBody()’:
> /<<PKGBUILDDIR>>/fasta_unittest.cpp:97:78: error: taking address of rvalue [-fpermissive]
>    97 |         string expectedName = static_cast<ostringstream*>( &(ostringstream() << counter) )->str();
>       |                                                             ~~~~~~~~~~~~~~~~~^~~~~~~~~~~
> make[3]: *** [CMakeFiles/runUnitTests.dir/build.make:93: CMakeFiles/runUnitTests.dir/fastq_unittest.cpp.o] Error 1

This stems from the int2str function, but when I grepped between versions, it looks like this function has not been used since version 2.0.1, hence it could simply be removed. Could you do so?

Patch:

--- a/src/data_structures/Translocation.cpp
+++ b/src/data_structures/Translocation.cpp
@@ -9,11 +9,6 @@
 #include <string>
 #include <cmath>  
 
-string int2str(int to_be_converted){
-	string converted= static_cast<ostringstream*>( &(ostringstream() << to_be_converted) )->str();
-	return(converted);
-}
-
 void Window::initTrans(SamHeader head) {
 	uint32_t contigsNumber = 0;
 	SamSequenceDictionary sequences  = head.Sequences;

and if you don't want to remove it, here is the proper fix:

diff --git a/src/data_structures/Translocation.cpp b/src/data_structures/Translocation.cpp
index b46ee54..691a4ac 100644
--- a/src/data_structures/Translocation.cpp
+++ b/src/data_structures/Translocation.cpp
@@ -10,7 +10,9 @@
 #include <cmath>  
 
 string int2str(int to_be_converted){
-	string converted= static_cast<ostringstream*>( &(ostringstream() << to_be_converted) )->str();
+	ostringstream ret_ostream;
+	ret_ostream << to_be_converted;
+	string converted(ret_ostream.str());
 	return(converted);
 }

Thanks!

CC: @J35P312

VCF header

When I run TIDDIT v2.8.1, the header of the generated VCF has a line describing the INFO field 'OR' as

##INFO=<ID=OR,Number=4,Type=Integer,Description="Orientation of the pairs (FF,FR,RF,FR)">

Should it be

##INFO=<ID=OR,Number=4,Type=Integer,Description="Orientation of the pairs (FF,FR,RF,RR)">

or am I mistaken?

bwa index error

Hi all,

I am using bwa version 0.7.17-r1188, and the output of bwa index for my genome file is the following:

kn99_chr1.fasta.0123         kn99_chr1.fasta.ann          kn99_chr1.fasta.pac
kn99_chr1.fasta.amb          kn99_chr1.fasta.bwt.2bit.64

My working directory from which I am launching TIDDIT looks like this (the index files are all symlinked into the $PWD)

A1_35_8_markdup.bam          kn99_chr1.fasta.0123         kn99_chr1.fasta.pac
A1_35_8_markdup.bam.bai      kn99_chr1.fasta.amb          versions.yml
bwamem2                      kn99_chr1.fasta.ann
kn99_chr1.fasta              kn99_chr1.fasta.bwt.2bit.64

and my TIDDIT command looks like this:

tiddit \
    --sv \
    -n 1 \
    --bam A1_35_8_markdup.bam \
    --ref kn99_chr1.fasta \
    -o A1_35_8

I am getting this error message:

error, The reference must be indexed using bwa index

which occurs in __main__.py between lines 72 and 74:


			if not os.path.isfile(args.ref+".bwt") and not os.path.isfile(args.ref+".64.bwt"):
				print ("error, The reference must be indexed using bwa index")
				quit()
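			# Note: only classic `bwa index` output (.bwt or .64.bwt) passes this
			# check; the .0123 and .bwt.2bit.64 files listed above are produced by
			# bwa-mem2, which this check does not recognize.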

Am I doing something wrong with bwa index, or is TIDDIT expecting a different version than I am using?

FORMAT/RD missing from header

I recently used tiddit but FORMAT/RD was missing from the header:

[...]
##FILTER=<ID=Density,Description="The discordant reads cluster too tightly">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=CN,Number=1,Type=Integer,Description="Copy number genotype for imprecise events">
##FORMAT=<ID=DV,Number=1,Type=Integer,Description="Number of paired-ends that support the event">
##FORMAT=<ID=RV,Number=1,Type=Integer,Description="Number of split reads that support the event">
##FORMAT=<ID=DR,Number=2,Type=Integer,Description="Number of paired-ends that supporting the reference allele (breakpoint A, and B)">
##FORMAT=<ID=RR,Number=2,Type=Integer,Description="Number of reads supporting the reference allele (breakpoint A, and B)">
##FORMAT=<ID=COV,Number=3,Type=Float,Description="Coverage (at A,B, and between)">
##FORMAT=<ID=LQ,Number=2,Type=Float,Description="Fraction of low quality reads">
##LibraryStats=TIDDIT-3.3.0 Coverage=35.559999987483025  ReadLength=150.7220850111166 MeanInsertSize=424.15720315371186 STDInsertSize=794.6088180408016 Reverse_Forward=False
[...]

Assertion failed, Abort trap: 6

When I run this command on my local machine:

$ tiddit --sv --bam /Users/islekbro/Desktop/SV/deneme2/variants.bam -o /Users/islekbro/Desktop/SV/deneme2/out --ref /Users/islekbro/Desktop/SV/deneme2/reference.fa --fermi2 /Users/islekbro/miniconda3/bin/fermi2 --ropebwt2 /Users/islekbro/miniconda3/bin/ropebwt2

I got this error:

LIBRARY STATISTICS
Pair orientation = Forward-Reverse
Average Read length = 150.0
Average insert size = 346.165
Stdev insert size = 50.355182867969674
99.95 percentile insert size = 490.20400000000063

('total', 0.00027108192443847656)
Writing signals to file
extracted signals in:
-0.0012137889862060547
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/numpy/core/fromnumeric.py:3432: RuntimeWarning: Mean of empty slice.
return _methods._mean(a, axis=axis, dtype=dtype,
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/numpy/core/_methods.py:190: RuntimeWarning: invalid value encountered in double_scalars
ret = ret.dtype.type(ret / rcount)
calculated coverage in:
0.0018231868743896484
[M::main_ropebwt2] constructed FM-index in 0.000 sec, 0.000 CPU sec
[M::main_ropebwt2] symbol counts: ($, A, C, G, T, N) = (0, 0, 0, 0, 0, 0)
[M::main_ropebwt2] rld: (tot, $, A, C, G, T, N) = (0, 0, 0, 0, 0, 0, 0)
[M::main] Version: r187
[M::main] CMD: /Users/islekbro/miniconda3/bin/ropebwt2 -dNCr /Users/islekbro/Desktop/SV/deneme2/out_tiddit/clips.fa
[M::main] Real time: 0.004 sec; CPU: 0.007 sec
Assertion failed: (e->mcnt[1] >= n_threads * 2), function fm6_unitig, file unitig.c, line 414.
sh: line 1: 76755 Done /Users/islekbro/miniconda3/bin/ropebwt2 -dNCr /Users/islekbro/Desktop/SV/deneme2/out_tiddit/clips.fa
76756 Abort trap: 6 | /Users/islekbro/miniconda3/bin/fermi2 assemble -t 1 -l 81 - > /Users/islekbro/Desktop/SV/deneme2/out_tiddit/clips.fa.assembly.mag
Clip read assembly in:
0.06505513191223145
generated clusters in
0.0006840229034423828
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.8/bin/tiddit", line 11, in
load_entry_point('tiddit', 'console_scripts', 'tiddit')()
File "/Users/islekbro/TIDDIT/tiddit/main.py", line 167, in main
variants=tiddit_variant.main(bam_file_name,sv_clusters,args,library,min_mapq,samples,coverage_data,contig_number,max_ins_len)
File "tiddit/tiddit_variant.pyx", line 546, in tiddit.tiddit_variant.main
File "<array_function internals>", line 180, in percentile
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/numpy/lib/function_base.py", line 4166, in percentile
return _quantile_unchecked(
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/numpy/lib/function_base.py", line 4424, in _quantile_unchecked
r, k = _ureduce(a,
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/numpy/lib/function_base.py", line 3725, in _ureduce
r = func(a, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/numpy/lib/function_base.py", line 4593, in _quantile_ureduce_func
result = _quantile(arr,
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/numpy/lib/function_base.py", line 4699, in _quantile
take(arr, indices=-1, axis=DATA_AXIS)
File "<array_function internals>", line 180, in take
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 190, in take
return _wrapfunc(a, 'take', indices, axis=axis, out=out, mode=mode)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 57, in _wrapfunc
return bound(*args, **kwds)
IndexError: cannot do a non-empty take from an empty axes.

I am wondering if I am missing a detail; how can we proceed with this run?

wrong outputs?

Hi,
I use the following cmd: "TIDDIT --sv -b NA12878_S10_R1_001.bam -o NA12878_S10_R1_001.bam.out.vcf". The results are NA12878_S10_R1_001.bam.out.vcf.tab and NA12878_S10_R1_001.bam.out.vcf.signals.tab; however, there is no SV vcf output. The NA12878_S10_R1_001.bam.out.vcf.tab file contains the coverage data, and the NA12878_S10_R1_001.bam.out.vcf.signals.tab file seems to be the per-read parsing result of the bam. I think I may be missing something. Is there a vcf of output SVs? Thanks a lot!

Yue

VCF output: mismatch between header and FORMAT fields

Hello,

I am using TIDDIT as part of nf-core sarek pipeline (version 3.0.1). I noted a discrepancy between commented header description and actual FORMAT fields.
Here is the commented header:
##FORMAT=<ID=CN,Number=1,Type=Integer,Description="Copy number genotype for imprecise events">
##FORMAT=<ID=COV,Number=3,Type=Float,Description="Coverage (at A,B, and between)">
##FORMAT=<ID=DR,Number=2,Type=Integer,Description="Number of paired-ends that supporting the reference allele (breakpoint A, and B)">
##FORMAT=<ID=DV,Number=1,Type=Integer,Description="Number of paired-ends that support the event">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=LQ,Number=2,Type=Float,Description="Fraction of low quality reads">
##FORMAT=<ID=RR,Number=2,Type=Integer,Description="Number of reads supporting the reference allele (breakpoint A, and B)">
##FORMAT=<ID=RV,Number=1,Type=Integer,Description="Number of split reads that support the event">

Here is the actual FORMAT line for each event:
GT:CN:COV:DR:SR:LQ:RR:RD

As you can see, DV and RD fields are absent from the description. Also, according to the description DR should have 2 integers; however, each SV event has only one:

1/1:4:83,148,244:0:3:0.0,0.0:39,223:56,285

Thank you,
Vladimir

QUAL missing

Dear all,
is there a reason why SVs called by TIDDIT do not have a QUAL value?
Would it make sense to use SVtyper?

Stuck after Constructed GC wig

Hello,
I'm running TIDDIT sv with this command: "python ./TIDDIT.py --sv -o sim10x.tiddit -p 5 --bam sim_10x.bam --ref human_g1k_v37.fasta". It works fine; however, it gets stuck close to the end, as shown below. Any ideas?

Thanks

working on seqence GL000212.1
working on seqence GL000222.1
working on seqence GL000200.1
working on seqence GL000193.1
working on seqence GL000194.1
working on seqence GL000225.1
working on seqence GL000192.1
signal extraction time consumption= 1116s
Generating GC wig file
Constructed GC wig in 316.929698944 sec

Dependency is Python 3.8 or higher, doesn't work with any python3 version

Hello,

Thank you for the tool, really appreciate it!
After installing all the dependencies, TIDDIT fails to complete its run; it exits with a statistics error:

File "python3.6/statistics.py", line 507, in mode
    'no unique mode; found %d equally common values' % len(table)
statistics.StatisticsError: no unique mode; found 2 equally common values

From what I could find, the behaviour of statistics.mode changed in Python 3.8:

Changed in version 3.8: Now handles multimodal datasets by returning the first mode encountered.
Formerly, it raised StatisticsError when more than one mode was found.

This makes me think that your dependency is Python 3.8 at minimum.
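For reference, a minimal reproduction of the difference (the values here are made up):

import statistics

# Two equally common values: Python >= 3.8 returns the first mode
# encountered ("innie"); Python 3.6/3.7 raise StatisticsError instead.
print(statistics.mode(["innie", "outtie", "innie", "outtie"]))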

Parameters for exome seq data

Hello TIDDIT Developers,

I'm planning to use TIDDIT for structural variant (SV) calling from exome sequencing data. I have a few questions regarding the parameters:

  1. When using TIDDIT for SV calling from exome sequencing data, do I need to specify any additional parameters specific to exomes? Or will the tool handle the analysis automatically based on the input data?

  2. The -c parameter allows setting the average coverage. The help mentions that it would be useful for exome or panel data. How should I set the coverage for exome sequencing data? Are there any guidelines or recommendations?

  3. Apart from the -c parameter, are there any additional parameters I should consider adjusting for exome data?

Please provide guidance on the appropriate parameter settings for exome SV calling. Thank you!

Best regards,
Matthias

Document coverage features

In particular, it would be nice to show the '-a' option and note that span coverage can be produced / is produced by default? 😜

TIDDIT seems to be more sensitive to the version of index files

Hi,

I encountered an issue while using TIDDIT/Manta/smoove in a Nextflow workflow to call variants. During the execution of TIDDIT, I received an error message regarding older versions of BAI files. Interestingly, the other callers worked fine and completed successfully.

To resolve the problem with TIDDIT, I re-indexed and managed to complete the process. I'm curious whether TIDDIT has additional requirements for index files.

Thank you,



Inverted DUP

A random thought: does the current clustering handle inverted dups? How are they classified?

Error when using alt-contigs for Tiddit-sv

Hi,

I'm not sure if this is a bug, but I've had the following error using versions 3.3.1 and 3.4.0 of TIDDIT.

Traceback (most recent call last):
    File "/usr/local/bin/tiddit", line 11, in <module>
      sys.exit(main())
    File "/usr/local/lib/python3.9/site-packages/tiddit/__main__.py", line 151, in main
      tiddit_contig_analysis.main(prefix,sample_id,library,contigs,coverage_data,args)
    File "tiddit/tiddit_contig_analysis.pyx", line 133, in tiddit.tiddit_contig_analysis.main
    File "tiddit/tiddit_contig_analysis.pyx", line 57, in tiddit.tiddit_contig_analysis.read_contigs
  KeyError: 'HLADRB1*04:03:01'

It seems like it's perhaps some sort of parsing error, because it looks like the contig should be formatted as HLA-DRB1*04:03:01 (i.e. it's missing a dash). So HLADRB1*04:03:01 is "missing" from my input file because the query is malformed, possibly? I should mention this only seems to be an issue on one of my inputs (Platinum genome ERR194159, using the fasta reference from Sarek iGenome Homo_sapiens_assembly38.fasta).

Do you have any suggestions or fix for this one? Please let me know if you'd like more info.

Thanks very much!

Sarah

reference mismatch error

Hi.
TIDDIT is running fine for me on GRCh37 bams, but I'm running into a problem with GRCh38.

I'm getting a reference mismatch error; here are the last few lines of output, and the error:
working on seqence HLA-DRB1*08:03:02
working on seqence HLA-DRB1*09:21
working on seqence HLA-DRB1*11:01:02
working on seqence HLA-DRB1*15:02:01
variant calling time consumption= 5609s
error: reference mismatch!

I checked to make sure that the bam is aligned to the reference fasta I'm specifying, and I checked to make sure that the chromosome names are the same in both. The only difference I can see is unaligned reads with chromosome "*" in the bam.

Any ideas?

Thanks!

get_gc creates format restriction for reference fasta

I'm having to work with a version of a reference genome that has been modified internally, and in the process, the description lines for each chromosome etc. have gained an extra > character in the description itself. This breaks get_gc here. Perhaps it would be possible to use re.split("\n>|^>", sequence) or similar instead, so that the string parsing is more resilient to idiosyncrasies of the input fasta? Thanks!
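For illustration, the suggested pattern parses such a file correctly (a minimal sketch with a made-up sequence):

import re

# The '>' inside the first description line does not split the record,
# because the pattern only splits on '>' at the start of the string or
# immediately after a newline.
fasta = ">chr1 modified>description\nACGT\n>chr2 normal\nTTGG\n"
records = [r for r in re.split(r"\n>|^>", fasta) if r]
for record in records:
    header, _, sequence = record.partition("\n")
    print(header, sequence.replace("\n", ""))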

How are genotypes assigned?

Hello,

It seems like a great approach to incorporate coverage along with split reads and discordant reads.
I am trying TIDDIT and came across an issue. When I looked at the genotypes in the VCF, there are 3 categories:
./1, 0/1, 1/1. The latter two are understood, but what does "./1" represent? 50% of deletions have that genotype.

Thanks,
Nick

Some questions

Hello,
the resulting vcf file from SV calling has a REF column that is entirely N, and the ALT column contains N for translocations, e.g. N[21:10775188[

Do you know a reason for such a result?

Best

memory and cpu

Hello,

I'm trying to run TIDDIT. Is there a way I can tell the script how many threads and how much memory to use, or will it auto-detect the maximum on my machine?

What are your recommended threads and memory for running on human whole-genome samples?

Thanks,
lz

Update Dockerfile

Hello!

I was wondering if it would be possible to get an updated Dockerfile for tiddit 3.x? I was hoping it might be as simple as:

FROM continuumio/miniconda3

RUN conda config --add channels defaults &&\
    conda config --add channels bioconda &&\
    conda config --add channels conda-forge &&\
    conda config --set channel_priority strict &&\
    conda install -c bioconda tiddit

but when I do this I end up with:

#5 7.839 PackagesNotFoundError: The following packages are not available from current channels:
#5 7.839
#5 7.839   - tiddit
#5 7.839
#5 7.839 Current channels:
#5 7.839
#5 7.839   - https://conda.anaconda.org/bioconda/linux-aarch64
#5 7.839   - https://conda.anaconda.org/bioconda/noarch
#5 7.839   - https://repo.anaconda.com/pkgs/main/linux-aarch64
#5 7.839   - https://repo.anaconda.com/pkgs/main/noarch
#5 7.839   - https://repo.anaconda.com/pkgs/r/linux-aarch64
#5 7.839   - https://repo.anaconda.com/pkgs/r/noarch

presumably because tiddit is currently only built for linux-64, as seen here, which doesn't match the needed architecture?

Alternatively, it may make sense to build from source in the image, depending on what tradeoffs you want to make.

IndexError: index 13 is out of bounds for axis 0 with size 13

Hi,

I used TIDDIT to call SVs but encountered the following error:

variant calling time consumption= 4878s
Traceback (most recent call last):
  File "/home/work01/tools/TIDDIT/TIDDIT.py", line 54, in <module>
    TIDDIT_clustering.cluster(args)
  File "TIDDIT_clustering.py", line 681, in TIDDIT_clustering.cluster
    header,chromosomes,library_stats=signals(args,coverage_data)
  File "TIDDIT_clustering.py", line 164, in TIDDIT_clustering.signals
    coverage_data[chrB][int(math.floor(posB/100)),2]+=1
IndexError: index 13 is out of bounds for axis 0 with size 13

And another small question: there are two seemingly identical parameters when typing python TIDDIT.py --sv -h:

-m M             minimum variant size,(default = 100)
-z Z             minimum variant size (default=100)

What is the difference between these two?

Thanks in advance!

reference not required

Hi, just to confirm: TIDDIT does not require a reference to run anymore, is that correct?

Thanks,
Hurley

error: reference mismatch!

Hello,

I am using hg38 as the reference, but I get this error: "error: reference mismatch!". When I use another version of hg38 I don't have this problem and the script runs well.
The only difference I can see is that in one version the chromosomes are named 1, 2, 3 ... and in the other chr1, chr2, chr3 ...

How can I solve this problem?

Best regards,

Laura

tiddit_variant error

I used small .bam files to call SVs, but the process aborts every time (I have succeeded just once). Last time, I used Manta's example .bam and reference.fasta files (https://github.com/Illumina/manta/tree/master/src/demo/data) and received the error mentioned below. I should add that I used TIDDIT version 3.6.0. How can we solve this problem, or what is the reason for this issue? Could you please enlighten me?

Run Code

$ tiddit --sv --bam /Users/islekbro/Desktop/SV/bams/hg19/MANTA/G15512.HCC1954.1.COST16011_region.bam -o /Users/islekbro/Desktop/SV/deneme3/manta_out --ref /Users/islekbro/Desktop/SV/genomes/manta.fa --fermi2 /Users/islekbro/miniconda3/bin/fermi2 --ropebwt2 /Users/islekbro/miniconda3/bin/ropebwt2

Error

LIBRARY STATISTICS
Pair orientation = Forward-Reverse
Average Read length = 101.0
Average insert size = 322.6496907216495
Stdev insert size = 57.08645970234663
99.95 percentile insert size = 581.020000000015

Collecting signals on contig: 8
Collecting signals on contig: 11
('total', 0.3015010356903076)
Writing signals to file
extracted signals in:
-0.3036017417907715
calculated coverage in:
7.124830961227417
[M::main_ropebwt2] inserted 129948 symbols in 0.229 sec, 0.021 CPU sec
[M::main_ropebwt2] constructed FM-index in 0.229 sec, 0.021 CPU sec
[M::main_ropebwt2] symbol counts: ($, A, C, G, T, N) = (1274, 38888, 25449, 25449, 38888, 0)
[M::main_ropebwt2] rld: (tot, $, A, C, G, T, N) = (129948, 1274, 38888, 25449, 25449, 38888, 0)
[M::main] Version: r187
[M::main] CMD: /Users/islekbro/miniconda3/bin/ropebwt2 -dNCr /Users/islekbro/Desktop/SV/deneme3/manta_out_tiddit/clips.fa
[M::main] Real time: 0.234 sec; CPU: 0.029 sec
[M::fm6_unitig] choose prime 123457
[M::main] Version: r178
[M::main] CMD: /Users/islekbro/miniconda3/bin/fermi2 assemble -t 1 -l 81 -
[M::main] Real time: 0.309 sec; CPU: 0.078 sec
Clip read assembly in:
0.570451021194458
generated clusters in
0.0017549991607666016
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.8/bin/tiddit", line 11, in
load_entry_point('tiddit', 'console_scripts', 'tiddit')()
File "/Users/islekbro/TIDDIT/tiddit/main.py", line 167, in main
variants=tiddit_variant.main(bam_file_name,sv_clusters,args,library,min_mapq,samples,coverage_data,contig_number,max_ins_len)
File "tiddit/tiddit_variant.pyx", line 546, in tiddit.tiddit_variant.main
File "<array_function internals>", line 180, in percentile
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/numpy/lib/function_base.py", line 4166, in percentile
return _quantile_unchecked(
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/numpy/lib/function_base.py", line 4424, in _quantile_unchecked
r, k = _ureduce(a,
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/numpy/lib/function_base.py", line 3725, in _ureduce
r = func(a, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/numpy/lib/function_base.py", line 4593, in _quantile_ureduce_func
result = _quantile(arr,
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/numpy/lib/function_base.py", line 4699, in _quantile
take(arr, indices=-1, axis=DATA_AXIS)
File "<array_function internals>", line 180, in take
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 190, in take
return _wrapfunc(a, 'take', indices, axis=axis, out=out, mode=mode)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 57, in _wrapfunc
return bound(*args, **kwds)

Error when using CRAM file as input

When running TIDDIT with a CRAM file, I get the error:

BAM file has no @HD SO:<SortOrder> attribute: it is not possible to determine the sort order
Traceback (most recent call last):
  File "/services/tools/tiddit/2.7.0/TIDDIT.py", line 65, in <module>
    TIDDIT_calling.cluster(args)
  File "TIDDIT_calling.py", line 190, in TIDDIT_calling.cluster
    coverage_data=TIDDIT_coverage.coverage(args)
  File "TIDDIT_coverage.py", line 8, in TIDDIT_coverage.coverage
    for line in open(args.o+".tab"):
IOError: [Errno 2] No such file or directory: 'OUTPUT.tab'

Looking a bit more, it seems that it is the cov module that fails, even though the CRAM file has the attribute:

@HD     VN:1.6  SO:coordinate

Does TIDDIT support CRAM files?
thanks,

`ValueError`: file does not contain alignment data

I got the error below. It seems to arise from the read_contigs step. I suspect it might be due to non-canonical chromosomes. Any thoughts on what the problem could be and how to fix it?

$ tiddit --sv \
--threads $threads \
--ref $mount_dir/genome.fa \
--bam $mount_dir/$out_dir/biscuit/$normal_bam \
-o $mount_dir/$out_dir/tiddit/LC_AC00

LIBRARY STATISTICS
	Pair orientation = Forward-Reverse
	Average Read length = 150.0
	Average insert size = -10541.058613317655
	Stdev insert size = 1159626.655535344
	99.95 percentile insert size = 557.0

Collecting signals on contig: chr16_KI270728v1_random
Collecting signals on contig: chr17
Collecting signals on contig: chr1_KI270707v1_random
Collecting signals on contig: chr14_KI270724v1_random
Collecting signals on contig: chr1_KI270708v1_random
Collecting signals on contig: chr14_KI270723v1_random
Collecting signals on contig: chr1_KI270710v1_random
Collecting signals on contig: chr14_GL000009v2_random
Collecting signals on contig: chr15_KI270727v1_random
Collecting signals on contig: chr14
Collecting signals on contig: chr10
Collecting signals on contig: chr14_KI270726v1_random
Collecting signals on contig: chr1_KI270706v1_random
Collecting signals on contig: chr15
Collecting signals on contig: chr17_GL000205v2_random
Collecting signals on contig: chr14_KI270725v1_random
Collecting signals on contig: chr18
Collecting signals on contig: chr11_KI270721v1_random
Collecting signals on contig: chr14_GL000194v1_random
Collecting signals on contig: chr16
Collecting signals on contig: chr14_KI270722v1_random
Collecting signals on contig: chr1_KI270712v1_random
Collecting signals on contig: chr1_KI270709v1_random
Collecting signals on contig: chr12
Collecting signals on contig: chr14_GL000225v1_random
Collecting signals on contig: chr19
Collecting signals on contig: chr13
Collecting signals on contig: chr1
Collecting signals on contig: chr1_KI270711v1_random
Collecting signals on contig: chr17_KI270730v1_random
Collecting signals on contig: chr17_KI270729v1_random
Collecting signals on contig: chr11
Collecting signals on contig: chr1_KI270713v1_random
Collecting signals on contig: chr1_KI270714v1_random
Collecting signals on contig: chr2
Collecting signals on contig: chr20
Collecting signals on contig: chr21
Collecting signals on contig: chr22
Collecting signals on contig: chr22_KI270731v1_random
Collecting signals on contig: chr22_KI270732v1_random
Collecting signals on contig: chr22_KI270733v1_random
Collecting signals on contig: chr22_KI270734v1_random
Collecting signals on contig: chr22_KI270735v1_random
Collecting signals on contig: chr22_KI270736v1_random
Collecting signals on contig: chr22_KI270737v1_random
Collecting signals on contig: chr22_KI270738v1_random
Collecting signals on contig: chr22_KI270739v1_random
Collecting signals on contig: chr2_KI270715v1_random
Collecting signals on contig: chr2_KI270716v1_random
Collecting signals on contig: chr3
Collecting signals on contig: chr3_GL000221v1_random
Collecting signals on contig: chr4
Collecting signals on contig: chr4_GL000008v2_random
Collecting signals on contig: chr5
Collecting signals on contig: chr5_GL000208v1_random
Collecting signals on contig: chr6
Collecting signals on contig: chr7
Collecting signals on contig: chr8
Collecting signals on contig: chr9
Collecting signals on contig: chr9_KI270717v1_random
Collecting signals on contig: chr9_KI270718v1_random
Collecting signals on contig: chr9_KI270719v1_random
Collecting signals on contig: chr9_KI270720v1_random
Collecting signals on contig: chrEBV
Collecting signals on contig: chrM
Collecting signals on contig: chrUn_GL000195v1
Collecting signals on contig: chrUn_GL000213v1
Collecting signals on contig: chrUn_GL000214v1
Collecting signals on contig: chrUn_GL000216v2
Collecting signals on contig: chrUn_GL000218v1
Collecting signals on contig: chrUn_GL000219v1
Collecting signals on contig: chrUn_GL000220v1
Collecting signals on contig: chrUn_GL000224v1
Collecting signals on contig: chrUn_GL000226v1
Collecting signals on contig: chrUn_KI270311v1
Collecting signals on contig: chrUn_KI270317v1
Collecting signals on contig: chrUn_KI270322v1
Collecting signals on contig: chrUn_KI270435v1
Collecting signals on contig: chrUn_KI270438v1
Collecting signals on contig: chrUn_KI270442v1
Collecting signals on contig: chrUn_KI270512v1
Collecting signals on contig: chrUn_KI270519v1
Collecting signals on contig: chrUn_KI270538v1
Collecting signals on contig: chrUn_KI270579v1
Collecting signals on contig: chrUn_KI270589v1
Collecting signals on contig: chrUn_KI270741v1
Collecting signals on contig: chrUn_KI270742v1
Collecting signals on contig: chrUn_KI270743v1
Collecting signals on contig: chrUn_KI270744v1
Collecting signals on contig: chrUn_KI270745v1
Collecting signals on contig: chrUn_KI270746v1
Collecting signals on contig: chrUn_KI270747v1
Collecting signals on contig: chrUn_KI270748v1
Collecting signals on contig: chrUn_KI270749v1
Collecting signals on contig: chrUn_KI270750v1
Collecting signals on contig: chrUn_KI270751v1
Collecting signals on contig: chrUn_KI270752v1
Collecting signals on contig: chrUn_KI270753v1
Collecting signals on contig: chrUn_KI270754v1
Collecting signals on contig: chrUn_KI270755v1
Collecting signals on contig: chrUn_KI270756v1
Collecting signals on contig: chrUn_KI270757v1
Collecting signals on contig: chrX
Collecting signals on contig: chrY
Collecting signals on contig: chrY_KI270740v1_random
total 310.280535697937
Writing signals to file
extracted signals in:
-330.81649899482727
/opt/conda/envs/tiddit/lib/python3.10/site-packages/numpy/core/fromnumeric.py:3504: RuntimeWarning: Mean of empty slice.
  return _methods._mean(a, axis=axis, dtype=dtype,
/opt/conda/envs/tiddit/lib/python3.10/site-packages/numpy/core/_methods.py:129: RuntimeWarning: invalid value encountered in scalar divide
  ret = ret.dtype.type(ret / rcount)
calculated coverage in:
125.14846658706665
[M::main_ropebwt2] inserted 226699924 symbols in 17.377 sec, 40.176 CPU sec
[M::main_ropebwt2] constructed FM-index in 17.902 sec, 40.701 CPU sec
[M::main_ropebwt2] symbol counts: ($, A, C, G, T, N) = (1501324, 85186519, 27412781, 27412781, 85186519, 0)
[M::main_ropebwt2] rld: (tot, $, A, C, G, T, N) = (226699924, 1501324, 85186519, 27412781, 27412781, 85186519, 0)
[M::main] Version: r187
[M::main] CMD: ropebwt2 -dNCr /data/omeran/sv_calling/tiddit/LC_AC00_tiddit/clips.fa
[M::main] Real time: 18.806 sec; CPU: 41.556 sec
[M::fm6_unitig] choose prime 123457
[M::main] Version: r178
[M::main] CMD: fermi2 assemble -t 32 -l 81 -
[M::main] Real time: 25.227 sec; CPU: 202.571 sec
Traceback (most recent call last):
  File "/opt/conda/envs/tiddit/bin/tiddit", line 11, in <module>
    sys.exit(main())
  File "/opt/conda/envs/tiddit/lib/python3.10/site-packages/tiddit/__main__.py", line 148, in main
    tiddit_contig_analysis.main(prefix,sample_id,library,contigs,coverage_data,args)
  File "tiddit/tiddit_contig_analysis.pyx", line 134, in tiddit.tiddit_contig_analysis.main
  File "tiddit/tiddit_contig_analysis.pyx", line 11, in tiddit.tiddit_contig_analysis.read_contigs
  File "pysam/libcalignmentfile.pyx", line 747, in pysam.libcalignmentfile.AlignmentFile.__cinit__
  File "pysam/libcalignmentfile.pyx", line 952, in pysam.libcalignmentfile.AlignmentFile._open
ValueError: file does not contain alignment data

New Command Line Options?

There appear to be more command line options in the software than you document in the repo. Could you please elaborate on what these extra flags do?
~/wgs_resources/bin/TIDDIT.simg TIDDIT.py --sv

usage: TIDDIT --sv --bam inputfile [-o prefix] --ref ref.fasta
[-h] [--sv] --bam BAM [-o O] [-i I] [-d D] [-p P] [-r R] [-q Q] [-Q Q]
[-n N] [-e E] [-l L] [-s S] [-z Z] [--force_ploidy] [--no_cluster]
[--debug] [--n_mask N_MASK] [--ref REF] [--p_ratio P_RATIO]
[--r_ratio R_RATIO]
TIDDIT --sv --bam inputfile [-o prefix] --ref ref.fasta: error: argument --bam is required

Thank you!

John Major

FORMAT/LQ value always the same

FORMAT/LQ is specified as two Floats in the header, but it seems the values are always equal (LQ:0.465656,0.465656). Is this expected? Or are they different in some edge cases?
thanks,

Small homozygous deletion (call het)

[screenshot omitted]

The sample in the middle was called 0/1 by FindSV/TIDDIT.

vcf record in TIDDIT
16 89596675 SV_22209_1 N . PASS SVTYPE=DEL;CIPOS=-303,0;CIEND=0,252;END=89621394;SVLEN=24720;COVM=0.0921832022299;COVA=30.8620200268;COVB=28.228429712;LFA=27;LFB=33;LTE=22;E1=5;E2=16;OR=0,0,0,22;ORSR=0,0;QUALA=54;QUALB=53 GT:CN:DV:RV 0/1:2:22:0

ERROR AFTER INSTALLING sys.path.insert(0,'{}/src/'.format(wd))

Dear developers, I have downloaded and installed TIDDIT as per the details given on this website, and I am receiving the following error:

Traceback (most recent call last):
  File "/home/s1667153/TIDDIT/TIDDIT.py", line 7, in <module>
    sys.path.insert(0, '{}/src/'.format(wd))
ValueError: zero length field name in format

Originally, TIDDIT was installed by my server manager for global use as a module, but this error arose, so we decided to install and compile it locally in my home directory instead; yet the error persists.

We have used the following command to run TIDDIT

$ ~/TIDDIT/TIDDIT.py --sv --bam ../RAW_DATA/BLUE1.bam

Please can you help :)

BWA index error

Similar to the issue #89
I am using a docker image quay.io/biocontainers/tiddit:3.6.1--py38h24c8ff8_0

Created ref genome indexes using bwa ( bwa index -p ref ref.fa) as instructed. Following files are generated:
ref.amb, ref.ann, ref.pac, ref.bwt, ref.sa

However the error still persists:
"error, The reference must be indexed using bwa index; run bwa index, or skip local assembly (--skip_assembly)"

How can I change the glob in the TIDDIT docker image?

Duplicate VCF lines from HG002 BAM

Hello,

I tested your tool on the AshkenazimTrio and noticed that the vcf_id is shared between BND pairs with different qualities, and that there are duplicate vcf lines.
The reference files I used:

https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/

The BAM and index files used:

https://ftp.ncbi.nlm.nih.gov/ReferenceSamples/giab/data/AshkenazimTrio/HG002_NA24385_son/NIST_Illumina_2x250bps/novoalign_bams/

Below are the duplicate vcf lines. I can also share the VCF file if you want. Thanks

chr5	58813706	SV_736_1	N	]chr5:58813779]N	50	PASS	SVTYPE=BND;REGIONA=58813706,58813943;REGIONB=58813723,58813779;LFA=0,0;LFB=0,0;LTE=0,0;CTG=GGACTTAAAGAAGGGACCAGTAAGATGTTGCATAGGCTCAAGGGGATATTCAGTGAGATATTATTTAACTCTGGACTTAAAGAAGGGACCAGTAAGATGTTGCATAGGCTCAAGGGGATATTCAGTGAATGCACACATACAGGCAATCAGGAATGCAGAAATGAATTTACCAAGTTACAAAATGGGTTAACACCCATGGAGCAAGAATCAGATGCATGCCACCAAACACAATTTATTGGCATTTCTTTCTATTTGCAAGAACTTGTATTATTATTGGTTTTCCACCACCTAC	GT:CN:COV:DV:RV:LQ:RR:DR	0/1:2:62.02100840336134,59.54054054054054,59.0:0:0:0.0,0.0:36,46:36,39
chr5	58813706	SV_1066_1	N	]chr5:58813779]N	50	PASS	SVTYPE=BND;REGIONA=58813706,58813755;REGIONB=58813502,58813779;LFA=0,0;LFB=0,0;LTE=0,0;CTG=CCCCCCATGGATCTTTCTACACGCGCGGGGTTGGGTATCTTCTGTGTGCACACTGCTCACCCCCCGTTCTCATAGACAGGTTGTCTAGTCACTCCAAGCACATGCCTTCCTTAGCCATTGTATTGTTAAGTTTTTATGTTTTATTTATATTTATATTTATATATATATATATATATATATATATATATATATATATACACATACACACATATACATATGGTAGAACCACAGCTTTTATCCAAATATAAAATAAACACATGTCAAAGATATTATTTAACTCTGGACTTAAAGAAGGGACCAGTAAGATGTTGCATAGGCTCAAGGGGATATTCAGTGAGATATTATTTAAATCTGGACTTAAAGAAGGGACCAGTAAGATGTTGCAGAGGCACAAGTGGATACTCAGTGAACGCAA	GT:CN:COV:DV:RV:LQ:RR:DR	0/1:2:59.32,59.54054054054054,61.881294964028775:0:0:0.0,0.0:36,46:36,39
chr5	58813779	SV_736_2	N	N[chr5:58813706[	50	PASS	SVTYPE=BND;REGIONA=58813706,58813943;REGIONB=58813723,58813779;LFA=0,0;LFB=0,0;LTE=0,0;CTG=GGACTTAAAGAAGGGACCAGTAAGATGTTGCATAGGCTCAAGGGGATATTCAGTGAGATATTATTTAACTCTGGACTTAAAGAAGGGACCAGTAAGATGTTGCATAGGCTCAAGGGGATATTCAGTGAATGCACACATACAGGCAATCAGGAATGCAGAAATGAATTTACCAAGTTACAAAATGGGTTAACACCCATGGAGCAAGAATCAGATGCATGCCACCAAACACAATTTATTGGCATTTCTTTCTATTTGCAAGAACTTGTATTATTATTGGTTTTCCACCACCTAC	GT:CN:COV:DV:RV:LQ:RR:DR	0/1:2:62.02100840336134,59.54054054054054,59.0:0:0:0.0,0.0:36,46:36,39
chr5	58813779	SV_1066_2	N	N[chr5:58813706[	50	PASS	SVTYPE=BND;REGIONA=58813706,58813755;REGIONB=58813502,58813779;LFA=0,0;LFB=0,0;LTE=0,0;CTG=CCCCCCATGGATCTTTCTACACGCGCGGGGTTGGGTATCTTCTGTGTGCACACTGCTCACCCCCCGTTCTCATAGACAGGTTGTCTAGTCACTCCAAGCACATGCCTTCCTTAGCCATTGTATTGTTAAGTTTTTATGTTTTATTTATATTTATATTTATATATATATATATATATATATATATATATATATATATACACATACACACATATACATATGGTAGAACCACAGCTTTTATCCAAATATAAAATAAACACATGTCAAAGATATTATTTAACTCTGGACTTAAAGAAGGGACCAGTAAGATGTTGCATAGGCTCAAGGGGATATTCAGTGAGATATTATTTAAATCTGGACTTAAAGAAGGGACCAGTAAGATGTTGCAGAGGCACAAGTGGATACTCAGTGAACGCAA	GT:CN:COV:DV:RV:LQ:RR:DR	0/1:2:59.32,59.54054054054054,61.881294964028775:0:0:0.0,0.0:36,46:36,39
