kcleal / dysgu
Toolkit for calling structural variants using short or long reads
License: MIT License
Hello!
I've noticed in the FORMAT field there is a metric that reports the number of reads supporting the structural variant, but I was wondering if there is any metric or a way to report the total number of reads in that region.
Best regards,
Jonatan
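Editor's note: while no such metric exists in the FORMAT field, the total read count over an SV interval can be obtained externally, e.g. with `samtools view -c region` on the BAM. A minimal sketch of the underlying overlap count, with toy coordinate pairs standing in for real alignments:

```python
def reads_in_region(alignments, start, end):
    """Count alignments overlapping the half-open interval [start, end)."""
    return sum(1 for a_start, a_end in alignments
               if a_start < end and a_end > start)

# Toy alignments as (start, end) coordinate pairs
alignments = [(100, 250), (240, 390), (400, 550), (90, 110)]
print(reads_in_region(alignments, 200, 300))  # -> 2
```

With a real BAM, the pairs would come from iterating reads over the region with a library such as pysam rather than being listed by hand.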
Hello,
I was wondering if it would be possible to add an option to list the supporting reads for each SV? Even if only for the PacBio/ONT reads, I think it would be useful information to have.
Thank you in advance,
Andrea
Hi,
I am currently testing dysgu on some chicken samples. For some samples, sequenced with 2*151bp paired-end Illumina reads, I encounter an error if --remap=True:
...
2022-08-09 13:41:43,692 [INFO ] Number of matching SVs from --sites 101804
Traceback (most recent call last):
File "/home/uni08/geibel/chicken/chicken_sv/test/testDysgou/.snakemake/conda/023852c9410a500a0b2051e5e324c328/bin/dysgu", line 8, in <module>
sys.exit(cli())
File "/home/uni08/geibel/chicken/chicken_sv/test/testDysgou/.snakemake/conda/023852c9410a500a0b2051e5e324c328/lib/python3.7/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/home/uni08/geibel/chicken/chicken_sv/test/testDysgou/.snakemake/conda/023852c9410a500a0b2051e5e324c328/lib/python3.7/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/home/uni08/geibel/chicken/chicken_sv/test/testDysgou/.snakemake/conda/023852c9410a500a0b2051e5e324c328/lib/python3.7/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/uni08/geibel/chicken/chicken_sv/test/testDysgou/.snakemake/conda/023852c9410a500a0b2051e5e324c328/lib/python3.7/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/uni08/geibel/chicken/chicken_sv/test/testDysgou/.snakemake/conda/023852c9410a500a0b2051e5e324c328/lib/python3.7/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/home/uni08/geibel/chicken/chicken_sv/test/testDysgou/.snakemake/conda/023852c9410a500a0b2051e5e324c328/lib/python3.7/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "/home/uni08/geibel/chicken/chicken_sv/test/testDysgou/.snakemake/conda/023852c9410a500a0b2051e5e324c328/lib/python3.7/site-packages/dysgu/main.py", line 444, in call_events
cluster.cluster_reads(ctx.obj)
File "dysgu/cluster.pyx", line 1337, in dysgu.cluster.cluster_reads
File "dysgu/cluster.pyx", line 1192, in dysgu.cluster.pipe1
File "/home/uni08/geibel/chicken/chicken_sv/test/testDysgou/.snakemake/conda/023852c9410a500a0b2051e5e324c328/lib/python3.7/site-packages/dysgu/re_map.py", line 427, in remap_soft_clips
gstart, ref_seq_big, idx)
File "/home/uni08/geibel/chicken/chicken_sv/test/testDysgou/.snakemake/conda/023852c9410a500a0b2051e5e324c328/lib/python3.7/site-packages/dysgu/re_map.py", line 279, in process_contig
e.ref_seq = ref_seq_clipped[500 - 1]
IndexError: string index out of range
The chicken reference genome actually has some small contigs < 500 bp, but I'm not sure whether ref_seq_clipped
holds the reference contig. Further, I would then expect the error for all samples when force-calling, but it appears in only 4 out of 6 test samples.
Do you have any idea whether this could be causing the problem?
Thanks,
Johannes
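Editor's note: one quick way to check the small-contig hypothesis above is to list reference contigs shorter than 500 bp from the FASTA index (`samtools faidx` writes a tab-separated `.fai` with contig name and length in the first two columns). A hedged sketch, using made-up index lines:

```python
def short_contigs(fai_lines, min_len=500):
    """Return names of contigs shorter than min_len from .fai index lines."""
    short = []
    for line in fai_lines:
        # .fai columns: name, length, offset, linebases, linewidth
        name, length = line.rstrip("\n").split("\t")[:2]
        if int(length) < min_len:
            short.append(name)
    return short

# Illustrative .fai lines, not real chicken-genome entries
fai = ["chr1\t197608386\t112\t60\t61",
       "scaffold_901\t312\t99999\t60\t61"]
print(short_contigs(fai))  # -> ['scaffold_901']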
Hi @kcleal and thanks for the great tool!
I was playing around with a few settings and ran into the following error when trying out --max-cov auto.
Note that the pipeline runs perfectly when this is set to -1 or the default 200, for instance.
2022-03-23 19:26:25,545 [INFO ] [dysgu-run] Version: 1.3.7
2022-03-23 19:26:25,546 [INFO ] run --mode nanopore --diploid True --min-support 3 --min-size 30 --max-cov auto -o output.vcf -p 18 -c genome.fa /scratch input.bam
2022-03-23 19:26:25,546 [INFO ] Destination: /scratch
[W::hts_idx_load3] The index file is older than the data file: input.bam.bai
Traceback (most recent call last):
File "/opt/venv/bin/dysgu", line 8, in <module>
sys.exit(cli())
File "/opt/venv/lib/python3.9/site-packages/click/core.py", line 1128, in __call__
return self.main(*args, **kwargs)
File "/opt/venv/lib/python3.9/site-packages/click/core.py", line 1053, in main
rv = self.invoke(ctx)
File "/opt/venv/lib/python3.9/site-packages/click/core.py", line 1659, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/opt/venv/lib/python3.9/site-packages/click/core.py", line 1395, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/opt/venv/lib/python3.9/site-packages/click/core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "/opt/venv/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "/opt/venv/lib/python3.9/site-packages/dysgu/main.py", line 270, in run_pipeline
max_cov_value = sv2bam.process(ctx.obj)
File "dysgu/sv2bam.pyx", line 165, in dysgu.sv2bam.process
File "dysgu/coverage.pyx", line 47, in dysgu.coverage.auto_max_cov
TypeError: 'bool' object is not callable
How can I re-use the temp files with the --ibam option?
Hi,
Would you please add dysgu to bioconda?
After trying very hard to install dysgu, I got the following errors; do you know why?
dysgu -h
Traceback (most recent call last):
File "/Bio/User/kxie/software/anaconda3/envs/dysgu/bin/dysgu", line 5, in <module>
from dysgu.main import cli
File "/Bio/User/kxie/software/anaconda3/envs/dysgu/lib/python3.9/site-packages/dysgu/main.py", line 11, in <module>
from dysgu import cluster, view, sv2bam
File "dysgu/cluster.pyx", line 15, in init dysgu.cluster
File "dysgu/coverage.pyx", line 5, in init dysgu.coverage
File "dysgu/io_funcs.pyx", line 19, in init dysgu.io_funcs
File "/Bio/User/kxie/software/anaconda3/envs/dysgu/lib/python3.9/site-packages/numpy/__init__.py", line 284, in __getattr__
raise AttributeError("module {!r} has no attribute "
AttributeError: module 'numpy' has no attribute 'float'
Best,
Kun
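Editor's note: `np.float` was deprecated in NumPy 1.20 and removed in NumPy 1.24, so dysgu builds that still reference it break under newer NumPy. Until an updated build is installed, pinning NumPy in the environment is a common workaround (an environment tweak, not an official fix):

```shell
# Install a NumPy release that still carries the deprecated np.float alias
pip install "numpy<1.24"
```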
I ran three samples simultaneously. One of them processed to completion, but two failed due to a Segmentation fault during the "Building graph" step:
2021-07-27 16:01:08,978 [INFO ] [dysgu-run] Version: 1.1.7
2021-07-27 16:01:08,980 [INFO ] run -p5 GRCh38_full_analysis_set_plus_decoy_hla.fa X_dysgu X.cram.bam
2021-07-27 16:01:08,980 [INFO ] Destination: X_dysgu
2021-07-27 20:52:22,501 [INFO ] dysgu fetch X.cram.bam written to X_dysgu/X.cram.dysgu_reads.bam, n=32435857, time=4:51:13 h:m:s
2021-07-27 20:52:22,501 [INFO ] Input file is: X_dysgu/X.cram.dysgu_reads.bam
[E::idx_find_and_load] Could not retrieve index file for 'X_dysgu/X.cram.dysgu_reads.bam'
2021-07-27 20:52:22,715 [INFO ] Input file has index False
2021-07-27 20:52:23,231 [WARNING] Warning: more than one @RG, using first sample (SM) for output: X
2021-07-27 20:52:23,231 [INFO ] Sample name: X
2021-07-27 20:52:23,231 [INFO ] Writing vcf to stdout
2021-07-27 20:52:23,231 [INFO ] Running pipeline
2021-07-27 20:52:26,642 [INFO ] Removed 55 outliers with insert size >= 1661
2021-07-27 20:52:26,659 [INFO ] Inferred read length 151.0, insert median 444, insert stdev 202
2021-07-27 20:52:26,660 [INFO ] Max clustering dist 1454
2021-07-27 20:52:26,660 [INFO ] Minimum support 3
2021-07-27 20:52:26,660 [INFO ] Building graph with clustering distance 1454 bp, scope length 1454 bp
Segmentation fault
I am running dysgu installed on Linux WSL2 (Windows 10). The command is the default:
dysgu run -p4 GRCh38_full_analysis_set_plus_decoy_hla.fa sample.bam > sample_dysgu_sv.vcf
The data is 30x WGS (Illumina) PE 150. The output is an empty VCF; the folder contains 25 bin files and the reads.bam file, which is about 30 GB. The output log is too long to paste, but here are the last few lines of it.
File "/usr/lib/python3/dist-packages/click/decorators.py", line 17, in new_func
return f(get_current_context(), *args, **kwargs)
File "/home/husamia/.local/lib/python3.8/site-packages/dysgu/main.py", line 220, in run_pipeline
cluster.cluster_reads(ctx.obj)
File "/home/husamia/.local/lib/python3.8/site-packages/dysgu/re_map.py", line 473, in drop_svs_near_reference_gaps
logging.warning("Error fetching reference chromosome: {}".format(chrom), errors)
Message: 'Error fetching reference chromosome: Y'
Arguments: (KeyError("sequence 'Y' not present"),)
2021-07-13 15:33:00,848 [INFO ] N near gaps dropped 0
2021-07-13 15:33:45,354 [INFO ] Loaded n=25 chromosome coverage arrays from /mnt/e/20A0012672_Proband/dysgu
2021-07-13 15:35:37,356 [INFO ] Adding genotype
Traceback (most recent call last):
File "/home/husamia/.local/bin/dysgu", line 8, in <module>
sys.exit(cli())
File "/usr/lib/python3/dist-packages/click/core.py", line 764, in __call__
return self.main(*args, **kwargs)
File "/usr/lib/python3/dist-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/usr/lib/python3/dist-packages/click/core.py", line 1137, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/lib/python3/dist-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/lib/python3/dist-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/usr/lib/python3/dist-packages/click/decorators.py", line 17, in new_func
return f(get_current_context(), *args, **kwargs)
File "/home/husamia/.local/lib/python3.8/site-packages/dysgu/main.py", line 220, in run_pipeline
cluster.cluster_reads(ctx.obj)
File "dysgu/cluster.pyx", line 1051, in dysgu.cluster.cluster_reads
File "dysgu/cluster.pyx", line 970, in dysgu.cluster.pipe1
File "dysgu/post_call_metrics.pyx", line 465, in dysgu.post_call_metrics.ref_repetitiveness
File "pysam/libcfaidx.pyx", line 303, in pysam.libcfaidx.FastaFile.fetch
KeyError: "sequence '1' not present"
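Editor's note: the `KeyError: "sequence '1' not present"` (together with the earlier warning about 'Y') typically means the BAM header and the reference FASTA disagree on contig naming ('1'/'Y' versus 'chr1'/'chrY'). A hedged sketch of a mismatch check; with real files the two name lists would come from the BAM header and the FASTA `.fai` index, and the names below are illustrative:

```python
def naming_mismatches(bam_contigs, fasta_contigs):
    """List BAM contigs absent from the FASTA, hinting at chr-prefix fixes."""
    fasta = set(fasta_contigs)
    report = []
    for name in bam_contigs:
        if name in fasta:
            continue
        # Suggest the chr-prefixed (or de-prefixed) spelling when it exists
        alt = name[3:] if name.startswith("chr") else "chr" + name
        report.append(f"{name} (try '{alt}')" if alt in fasta else name)
    return report

print(naming_mismatches(["1", "Y", "MT"], ["chr1", "chrY", "chrM"]))
# -> ["1 (try 'chr1')", "Y (try 'chrY')", 'MT']
```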
Hi,
I am upgrading to the newest version of dysgu and get the following error with pip3. I created a dedicated conda env especially for this.
Collecting dysgu
Using cached dysgu-1.3.7-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (72.7 MB)
Collecting lightgbm
Using cached lightgbm-3.3.2-py3-none-manylinux1_x86_64.whl (2.0 MB)
Requirement already satisfied: numpy>=1.16.5 in /home/kgagalova/.linuxbrew/lib/python3.9/site-packages (from dysgu) (1.19.5)
Collecting networkx>=2.4
Using cached networkx-2.7.1-py3-none-any.whl (2.0 MB)
Requirement already satisfied: pandas in /home/kgagalova/.linuxbrew/lib/python3.9/site-packages (from dysgu) (1.3.3)
Requirement already satisfied: scipy in /home/kgagalova/.linuxbrew/lib/python3.9/site-packages (from dysgu) (1.7.1)
Collecting click>=8.0
Using cached click-8.0.4-py3-none-any.whl (97 kB)
Collecting scikit-learn>=0.22
Using cached scikit_learn-1.0.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (26.4 MB)
Collecting edlib
Using cached edlib-1.3.9-cp39-cp39-manylinux2010_x86_64.whl (327 kB)
Requirement already satisfied: pysam in /home/kgagalova/.linuxbrew/lib/python3.9/site-packages (from dysgu) (0.17.0)
Collecting cython
Using cached Cython-0.29.28-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (1.9 MB)
Collecting scikit-bio
Using cached scikit-bio-0.5.6.tar.gz (8.4 MB)
Preparing metadata (setup.py) ... done
Collecting sortedcontainers
Using cached sortedcontainers-2.4.0-py2.py3-none-any.whl (29 kB)
Collecting joblib>=0.11
Using cached joblib-1.1.0-py2.py3-none-any.whl (306 kB)
Collecting threadpoolctl>=2.0.0
Using cached threadpoolctl-3.1.0-py3-none-any.whl (14 kB)
Requirement already satisfied: wheel in /home/kgagalova/.linuxbrew/lib/python3.9/site-packages (from lightgbm->dysgu) (0.37.1)
Requirement already satisfied: python-dateutil>=2.7.3 in /home/kgagalova/.linuxbrew/lib/python3.9/site-packages (from pandas->dysgu) (2.8.2)
Requirement already satisfied: pytz>=2017.3 in /home/kgagalova/.linuxbrew/lib/python3.9/site-packages (from pandas->dysgu) (2021.1)
Collecting lockfile>=0.10.2
Using cached lockfile-0.12.2-py2.py3-none-any.whl (13 kB)
Collecting CacheControl>=0.11.5
Using cached CacheControl-0.12.10-py2.py3-none-any.whl (20 kB)
Collecting decorator>=3.4.2
Using cached decorator-5.1.1-py3-none-any.whl (9.1 kB)
Collecting IPython>=3.2.0
Using cached ipython-8.1.1-py3-none-any.whl (750 kB)
Collecting matplotlib>=1.4.3
Using cached matplotlib-3.5.1-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.whl (11.2 MB)
Collecting natsort>=4.0.3
Using cached natsort-8.1.0-py3-none-any.whl (37 kB)
Collecting hdmedians>=0.13
Using cached hdmedians-0.14.2.tar.gz (7.6 kB)
Installing build dependencies ... done
Getting requirements to build wheel ... done
Installing backend dependencies ... done
Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: requests in /home/kgagalova/.linuxbrew/lib/python3.9/site-packages (from CacheControl>=0.11.5->scikit-bio->dysgu) (2.26.0)
Collecting msgpack>=0.5.2
Using cached msgpack-1.0.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (322 kB)
Collecting prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0
Using cached prompt_toolkit-3.0.28-py3-none-any.whl (380 kB)
Requirement already satisfied: setuptools>=18.5 in /home/kgagalova/.linuxbrew/lib/python3.9/site-packages (from IPython>=3.2.0->scikit-bio->dysgu) (60.5.0)
Collecting matplotlib-inline
Using cached matplotlib_inline-0.1.3-py3-none-any.whl (8.2 kB)
Collecting pexpect>4.3
Using cached pexpect-4.8.0-py2.py3-none-any.whl (59 kB)
Collecting stack-data
Using cached stack_data-0.2.0-py3-none-any.whl (21 kB)
Collecting jedi>=0.16
Using cached jedi-0.18.1-py2.py3-none-any.whl (1.6 MB)
Collecting traitlets>=5
Using cached traitlets-5.1.1-py3-none-any.whl (102 kB)
Collecting backcall
Using cached backcall-0.2.0-py2.py3-none-any.whl (11 kB)
Collecting pygments>=2.4.0
Using cached Pygments-2.11.2-py3-none-any.whl (1.1 MB)
Collecting pickleshare
Using cached pickleshare-0.7.5-py2.py3-none-any.whl (6.9 kB)
Collecting kiwisolver>=1.0.1
Using cached kiwisolver-1.4.0-cp39-cp39-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.6 MB)
Collecting fonttools>=4.22.0
Using cached fonttools-4.31.2-py3-none-any.whl (899 kB)
Collecting cycler>=0.10
Using cached cycler-0.11.0-py3-none-any.whl (6.4 kB)
Collecting pillow>=6.2.0
Using cached Pillow-9.0.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.3 MB)
Collecting packaging>=20.0
Using cached packaging-21.3-py3-none-any.whl (40 kB)
Collecting pyparsing>=2.2.1
Using cached pyparsing-3.0.7-py3-none-any.whl (98 kB)
Requirement already satisfied: six>=1.5 in /home/kgagalova/.linuxbrew/lib/python3.9/site-packages (from python-dateutil>=2.7.3->pandas->dysgu) (1.15.0)
Collecting parso<0.9.0,>=0.8.0
Using cached parso-0.8.3-py2.py3-none-any.whl (100 kB)
Collecting ptyprocess>=0.5
Using cached ptyprocess-0.7.0-py2.py3-none-any.whl (13 kB)
Collecting wcwidth
Using cached wcwidth-0.2.5-py2.py3-none-any.whl (30 kB)
Requirement already satisfied: idna<4,>=2.5 in /home/kgagalova/.linuxbrew/lib/python3.9/site-packages (from requests->CacheControl>=0.11.5->scikit-bio->dysgu) (3.2)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/kgagalova/.linuxbrew/lib/python3.9/site-packages (from requests->CacheControl>=0.11.5->scikit-bio->dysgu) (1.26.7)
Requirement already satisfied: charset-normalizer~=2.0.0 in /home/kgagalova/.linuxbrew/lib/python3.9/site-packages (from requests->CacheControl>=0.11.5->scikit-bio->dysgu) (2.0.6)
Requirement already satisfied: certifi>=2017.4.17 in /home/kgagalova/.linuxbrew/lib/python3.9/site-packages (from requests->CacheControl>=0.11.5->scikit-bio->dysgu) (2021.5.30)
Collecting asttokens
Using cached asttokens-2.0.5-py2.py3-none-any.whl (20 kB)
Collecting executing
Using cached executing-0.8.3-py2.py3-none-any.whl (16 kB)
Collecting pure-eval
Using cached pure_eval-0.2.2-py3-none-any.whl (11 kB)
Building wheels for collected packages: scikit-bio, hdmedians
Building wheel for scikit-bio (setup.py) ... error
ERROR: Command errored out with exit status 1:
command: /home/kgagalova/.linuxbrew/opt/[email protected]/bin/python3.9 -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-22gwih1p/scikit-bio_fd5adfa3cbcc4059926676ebae73c81f/setup.py'"'"'; __file__='"'"'/tmp/pip-install-22gwih1p/scikit-bio_fd5adfa3cbcc4059926676ebae73c81f/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-c_4psgrc
cwd: /tmp/pip-install-22gwih1p/scikit-bio_fd5adfa3cbcc4059926676ebae73c81f/
Complete output (663 lines):
running bdist_wheel
running build
[....]
running build_ext
creating build/temp.linux-x86_64-3.9
creating build/temp.linux-x86_64-3.9/skbio
creating build/temp.linux-x86_64-3.9/skbio/metadata
gcc-5 -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -O3 -Wall -fPIC -I/home/kgagalova/.linuxbrew/opt/[email protected]/lib/python3.9/site-packages/numpy/core/include -I/home/kgagalova/.linuxbrew/opt/[email protected]/include/python3.9 -c skbio/metadata/_intersection.c -o build/temp.linux-x86_64-3.9/skbio/metadata/_intersection.o
error: command 'gcc-5' failed: No such file or directory
----------------------------------------
ERROR: Failed building wheel for scikit-bio
Running setup.py clean for scikit-bio
Building wheel for hdmedians (pyproject.toml) ... error
ERROR: Command errored out with exit status 1:
command: /home/kgagalova/.linuxbrew/opt/[email protected]/bin/python3.9 /home/kgagalova/.linuxbrew/opt/[email protected]/lib/python3.9/site-packages/pip/_vendor/pep517/in_process/_in_process.py build_wheel /tmp/tmpke8nnicz
cwd: /tmp/pip-install-22gwih1p/hdmedians_34f0c0b3631144b6aeec2bd7484540bd
Complete output (9 lines):
running bdist_wheel
running build
running build_py
running build_ext
cythoning hdmedians/geomedian.pyx to hdmedians/geomedian.c
/tmp/pip-build-env-b8fpnjvp/overlay/lib/python3.9/site-packages/Cython/Compiler/Main.py:369: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: /tmp/pip-install-22gwih1p/hdmedians_34f0c0b3631144b6aeec2bd7484540bd/hdmedians/geomedian.pyx
tree = Parsing.p_module(s, pxd, full_module_name)
building 'hdmedians.geomedian' extension
error: command 'gcc-5' failed: No such file or directory
----------------------------------------
ERROR: Failed building wheel for hdmedians
Failed to build scikit-bio hdmedians
ERROR: Could not build wheels for hdmedians, which is required to install pyproject.toml-based projects
Looks like I have some issues installing gcc-5 on my system.
ERROR conda.core.link:_execute(699): An error occurred while installing package 'psi4::gcc-5-5.2.0-1'.
Rolling back transaction: done
LinkError: post-link script failed for package psi4::gcc-5-5.2.0-1
location of failed script: /home/kgagalova/miniconda3/envs/py3.7/bin/.gcc-5-post-link.sh
==> script messages <==
<None>
==> script output <==
stdout: Couldn't locate crtXXX.o in default library search paths. You may not have it at all. It is usually packaged in libc6-dev/glibc-devel packages. We will try to locate crtXXX.o with system installed gcc...
Installation failed: gcc is not able to compile a simple 'Hello, World' program.
stderr: ln: failed to create symbolic link '/home/kgagalova/miniconda3/envs/py3.7/lib/gcc/x86_64-unknown-linux-gnu/5.2.0/crt1.o': File exists
ln: failed to create symbolic link '/home/kgagalova/miniconda3/envs/py3.7/lib/gcc/x86_64-unknown-linux-gnu/5.2.0/crti.o': File exists
ln: failed to create symbolic link '/home/kgagalova/miniconda3/envs/py3.7/lib/gcc/x86_64-unknown-linux-gnu/5.2.0/11.2.0': File exists
ln: failed to create symbolic link '/home/kgagalova/miniconda3/envs/py3.7/lib/gcc/x86_64-unknown-linux-gnu/5.2.0/crtn.o': File exists
ln: failed to create symbolic link '/home/kgagalova/miniconda3/envs/py3.7/lib/gcc/x86_64-unknown-linux-gnu/5.2.0/11.2.0': File exists
/home/kgagalova/miniconda3/envs/py3.7/bin/.gcc-5-post-link.sh: line 98: /home/kgagalova/miniconda3/envs/py3.7/lib/gcc/x86_64-conda-linux-gnu/11.2.0 /home/kgagalova/miniconda3/envs/py3.7/lib/gcc/x86_64-unknown-linux-gnu/5.2.0/specs: No such file or directory
sed: can't read /home/kgagalova/miniconda3/envs/py3.7/lib/gcc/x86_64-conda-linux-gnu/11.2.0 /home/kgagalova/miniconda3/envs/py3.7/lib/gcc/x86_64-unknown-linux-gnu/5.2.0/specs: No such file or directory
sed: can't read /home/kgagalova/miniconda3/envs/py3.7/lib/gcc/x86_64-conda-linux-gnu/11.2.0 /home/kgagalova/miniconda3/envs/py3.7/lib/gcc/x86_64-unknown-linux-gnu/5.2.0/specs: No such file or directory
sed: can't read /home/kgagalova/miniconda3/envs/py3.7/lib/gcc/x86_64-conda-linux-gnu/11.2.0 /home/kgagalova/miniconda3/envs/py3.7/lib/gcc/x86_64-unknown-linux-gnu/5.2.0/specs: No such file or directory
/home/kgagalova/miniconda3/envs/py3.7/gcc/libexec/gcc/x86_64-unknown-linux-gnu/5.2.0/cc1: error while loading shared libraries: libisl.so.10: cannot open shared object file: No such file or directory
return code: 1
()
Do you have any suggestions on how to install dysgu with pip3? Thank you in advance.
Just a note for others.
Running manual installation (without conda) worked for me, but only after a little mucking around.
conda deactivate
Then from the github README
git clone --recursive https://github.com/kcleal/dysgu.git
cd dysgu/dysgu/htslib
make
cd ../../
pip install --user -r requirements.txt
pip install .
Successfully installed dysgu-1.3.6
$ dysgu
Traceback (most recent call last):
File "/home/rcug/.local/bin/dysgu", line 5, in <module>
from dysgu.main import cli
File "/home/rcug/.local/lib/python3.8/site-packages/dysgu/main.py", line 106, in <module>
version = pkg_resources.require("dysgu")[0].version
File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 901, in require
needed = self.resolve(parse_requirements(requirements))
File "/usr/lib/python3/dist-packages/pkg_resources/__init__.py", line 792, in resolve
raise VersionConflict(dist, req).with_context(dependent_req)
pkg_resources.ContextualVersionConflict: (Click 7.0 (/usr/lib/python3/dist-packages), Requirement.parse('click>=8.0.0'), {'black'})
Solved with
pip install --user click==8.0
Now it works:
Usage: dysgu [OPTIONS] COMMAND [ARGS]...
Dysgu-SV is a set of tools calling structural variants from bam/cram files
Options:
--version Show the version and exit.
--help Show this message and exit.
Commands:
call Call structural variants from alignment file/stdin
fetch Filters input .bam/.cram for read-pairs that are discordant or...
merge Merge .vcf/csv variant files
run Run the dysgu pipeline.
test Run dysgu tests
Hi
I am working with plant WGS 20X data from more than 300 accessions.
First, I call SV variants with dysgu run and then dysgu merge.
I re-genotyped at the sample level, and when I had to merge the re-genotyped samples I ran into a RAM issue (basically, 500 GB of RAM was not enough).
What I found really weird is that the first merge completed with no problem.
Any idea how I can manage this issue?
Cheers
Apologies if I missed this somewhere, but does the merge command work for merging translocations across input samples? I used the merge command and it didn't seem to merge any, though I could be missing something. If it does merge translocations, is there information on how it does so? Thanks for any assistance.
Hi,
Thanks for the nice tool. Do you have a method (or future plans) for joint calling?
I tried merge to combine multiple samples' VCFs, but I am not sure whether this is the correct way to get reference-homozygous genotypes.
Hi,
I'm working with Human WGS 35X data from multiple patients.
First, I call SV variants with dysgu run (in 26 samples below example for sample 1):
dysgu run \
--procs 12 \
--mode pe --pl pe \
--diploid True \
--drop-gaps True \
--max-cov auto \
--min-support 5 \
--mq 20 \
--exclude ${blacklist} \
--min-size 50 \
--verbosity 2 \
${hg38} \
${output}/tmp1 \
${input}/sample1.bam > ${output_run}/sample1.SV.vcf
Then, I merge the samples into a unified site list:
dysgu merge \
${output_run}/sample1.SV.vcf \
${output_run}/sample2.SV.vcf ... \
> ${output_merge}/merged.vcf
I re-genotype at the sample level:
dysgu run --sites ${output_merge}/merged.vcf ${hg38} tmp1 sample1.bam > ${output_geno}/sample1.re_geno.vcf
dysgu run --sites ${output_merge}/merged.vcf ${hg38} tmp2 sample2.bam > ${output_geno}/sample2.re_geno.vcf
....
Finally, I merge re-genotyped samples:
dysgu merge \
${output_geno}/sample1.re_geno.vcf \
${output_geno}/sample2.re_geno.vcf... \
> ${output_merge}/merged.re_geno.vcf
My questions are:
1) Based on 35X WGS (Illumina paired-end), what is the recommended value for --min-support?
2) Is the last step (merging re-genotyped samples) correct? It is not mentioned in the doc: https://github.com/kcleal/dysgu/blob/master/README.rst
3) Do you have any specific recommendations for calling germline variants?
Thanks for any help you can provide me on this protocol,
Best,
Tarek
Hello,
I ran into an issue while running a basic test with dysgu: it seems to be allocating an extremely large array for some reason. The error is below (some file paths redacted):
2021-07-28 08:59:57,769 [INFO ] [dysgu-run] Version: 1.2.7
2021-07-28 08:59:57,770 [INFO ] run -o output.vcf --mode pe --pl pe /cluster/home/jholt/reference/hg38_asm5_alt/hg38.fa ./working_dir <redacted>/pipeline/merged_alignments/hg38_asm5_alt/sentieon-202010.02/HALB3002753.bam
2021-07-28 08:59:57,770 [INFO ] Destination: ./working_dir
2021-07-28 09:43:45,827 [INFO ] dysgu fetch <redacted>/pipeline/merged_alignments/hg38_asm5_alt/sentieon-202010.02/HALB3002753.bam written to ./working_dir/HALB3002753.dysgu_reads.bam, n=65472206, time=0:43:48 h:m:s
2021-07-28 09:43:45,827 [INFO ] Input file is: ./working_dir/HALB3002753.dysgu_reads.bam
2021-07-28 09:43:48,444 [INFO ] Sample name: HALB3002753
2021-07-28 09:43:48,444 [INFO ] Writing SVs to output.vcf
2021-07-28 09:43:48,446 [INFO ] Running pipeline
2021-07-28 09:43:49,103 [INFO ] Removed 34 outliers with insert size >= 903.0
2021-07-28 09:43:49,124 [INFO ] Inferred read length 151.0, insert median 391, insert stdev 99
2021-07-28 09:43:49,125 [INFO ] Max clustering dist 886
2021-07-28 09:43:49,126 [INFO ] Minimum support 3
2021-07-28 09:43:49,126 [INFO ] Building graph with clustering distance 886 bp, scope length 886 bp
2021-07-28 10:01:48,141 [INFO ] Total input reads 63689262
2021-07-28 10:02:50,425 [INFO ] Graph constructed
(315,)
(132,)
(array([[1.320000e+02, 1.870000e+02],
[1.230000e+02, 1.980000e+02],
[1.400000e+02, 2.810000e+02],
...,
[1.496422e+06, 1.496339e+06],
[1.496425e+06, 1.496419e+06],
[1.496474e+06, 1.496389e+06]]),)
Traceback (most recent call last):
File "dysgu/call_component.pyx", line 663, in dysgu.call_component.partition_single
File "<redacted>/miniconda3/envs/dysgu_test/lib/python3.9/site-packages/scipy/cluster/hierarchy.py", line 1064, in linkage
if not np.all(np.isfinite(y)):
numpy.core._exceptions.MemoryError: Unable to allocate 41.6 GiB for an array with shape (44717395096,) and data type bool
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<redacted>/miniconda3/envs/dysgu_test/bin/dysgu", line 8, in <module>
sys.exit(cli())
File "<redacted>/miniconda3/envs/dysgu_test/lib/python3.9/site-packages/click/core.py", line 1137, in __call__
return self.main(*args, **kwargs)
File "<redacted>/miniconda3/envs/dysgu_test/lib/python3.9/site-packages/click/core.py", line 1062, in main
rv = self.invoke(ctx)
File "<redacted>/miniconda3/envs/dysgu_test/lib/python3.9/site-packages/click/core.py", line 1668, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "<redacted>/miniconda3/envs/dysgu_test/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "<redacted>/miniconda3/envs/dysgu_test/lib/python3.9/site-packages/click/core.py", line 763, in invoke
return __callback(*args, **kwargs)
File "<redacted>/miniconda3/envs/dysgu_test/lib/python3.9/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "<redacted>/miniconda3/envs/dysgu_test/lib/python3.9/site-packages/dysgu/main.py", line 252, in run_pipeline
cluster.cluster_reads(ctx.obj)
File "dysgu/cluster.pyx", line 1110, in dysgu.cluster.cluster_reads
File "dysgu/cluster.pyx", line 901, in dysgu.cluster.pipe1
File "dysgu/cluster.pyx", line 653, in dysgu.cluster.component_job
File "dysgu/call_component.pyx", line 1747, in dysgu.call_component.call_from_block_model
File "dysgu/call_component.pyx", line 1755, in dysgu.call_component.call_from_block_model
File "dysgu/call_component.pyx", line 1736, in dysgu.call_component.multi
File "dysgu/call_component.pyx", line 874, in dysgu.call_component.single
File "dysgu/call_component.pyx", line 668, in dysgu.call_component.partition_single
File "<redacted>/miniconda3/envs/dysgu_test/lib/python3.9/site-packages/scipy/cluster/hierarchy.py", line 1060, in linkage
y = distance.pdist(y, metric)
File "<redacted>/miniconda3/envs/dysgu_test/lib/python3.9/site-packages/scipy/spatial/distance.py", line 2250, in pdist
return pdist_fn(X, out=out, **kwargs)
numpy.core._exceptions.MemoryError: Unable to allocate 333. GiB for an array with shape (44717395096,) and data type float64
It seems like the memory requirements should be much lower according to the docs. Any suggestions?
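Editor's note: the reported numbers are internally consistent. SciPy's `pdist` builds a condensed distance matrix of n·(n−1)/2 entries for n observations, so a single enormous partition explains the allocation. A sketch of the arithmetic (the partition size n below is inferred from the array shape in the traceback, not read from any log):

```python
def condensed_size(n):
    """Entries in SciPy's condensed pairwise-distance matrix for n observations."""
    return n * (n - 1) // 2

n = 299_057  # inferred: the n whose condensed size matches the traceback shape
pairs = condensed_size(n)
print(pairs)                        # 44717395096, the shape reported above
print(round(pairs * 8 / 2**30, 1))  # float64 buffer: 333.2 GiB
print(round(pairs * 1 / 2**30, 1))  # bool buffer: 41.6 GiB
```

So roughly 300k reads landed in one cluster, and the memory use is quadratic in that cluster's size rather than in total input size.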
Towards the end of the SV run command I noticed the following line:
python3.7/site-packages/sklearn/base.py:338: UserWarning: Trying to unpickle estimator LabelEncoder from version 0.23.2 when using version 1.0.2. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to: https://scikit-learn.org/stable/modules/model_persistence.html#security-maintainability-limitations UserWarning
Are you all aware of this? Should I be concerned about the unpickle?
Thanks in advance. Very easy to use tool btw.
Hi
I am using a high-coverage PacBio HiFi dataset with depth 300+.
I noticed that the default --mode pacbio includes --max-cov 150, which is less than my actual average depth:
pacbio:
--mq 20
--paired False
--min-support 2
--max-cov 150
--dist-norm 200
--trust-ins-len True
Can I add --max-cov 500 after --mode pacbio to override that parameter at runtime, or should I manually set all the arguments behind --mode pacbio and replace it altogether?
dysgu run -p 24 --mode pacbio --max-cov 500 -x -c --thresholds 0.45,0.45,0.45,0.45,0.45 Sc.R64-1-1.fa wd mappings.bam > var.vcf
Another point: what values other than 0.45 would one want to use for --thresholds, and what would they mean?
thanks for your help
Hi,
Nice tool! I like that it can run with both short and long reads.
In my case I have a few small families (duos or trios) sequenced using PacBio HiFi reads, for which I want to create family VCFs.
Following your suggestions, I first generated SV calls for each individual using dysgu run --mode pacbio, then for each family I merged calls from the various family members using dysgu merge vcf_sample1.vcf vcf_sample2.vcf | bgzip -c > merged.vcf.gz
Finally, I want to re-call SVs, so I tried something like the following:
dysgu run -p12 --mode pacbio --sites merged.vcf.gz --sites-prob 0.8 --all-sites True GRCh38.fa tmpdir sample1.bam
I've also tried the above using the uncompressed merged file and without --sites-prob, and the results are the same as described below.
At first the tool seems to run fine, and I think it can read variants from my input. Here is an extract of the log:
2021-11-16 15:35:56,906 [INFO ] Writing vcf to stdout
2021-11-16 15:35:56,906 [INFO ] Running pipeline
2021-11-16 15:35:56,940 [INFO ] Minimum support 2
2021-11-16 15:35:56,940 [INFO ] Reading --sites
2021-11-16 15:35:57,623 [INFO ] Building graph with clustering 500000 bp
2021-11-16 15:36:31,347 [INFO ] Total input reads 244912
2021-11-16 15:36:31,417 [INFO ] Added 43336 variants from input sites
2021-11-16 15:36:31,504 [INFO ] Graph constructed
But then just after that I always get this error:
Traceback (most recent call last):
File "/well/gel/HICF2/software/conda_envs/dysgu/bin/dysgu", line 8, in <module>
sys.exit(cli())
File "/gpfs3/well/gel/HICF2/software/conda_envs/dysgu/lib/python3.7/site-packages/click/core.py", line 1128, in __call__
return self.main(*args, **kwargs)
File "/gpfs3/well/gel/HICF2/software/conda_envs/dysgu/lib/python3.7/site-packages/click/core.py", line 1053, in main
rv = self.invoke(ctx)
File "/gpfs3/well/gel/HICF2/software/conda_envs/dysgu/lib/python3.7/site-packages/click/core.py", line 1659, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/gpfs3/well/gel/HICF2/software/conda_envs/dysgu/lib/python3.7/site-packages/click/core.py", line 1395, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/gpfs3/well/gel/HICF2/software/conda_envs/dysgu/lib/python3.7/site-packages/click/core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "/gpfs3/well/gel/HICF2/software/conda_envs/dysgu/lib/python3.7/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "/gpfs3/well/gel/HICF2/software/conda_envs/dysgu/lib/python3.7/site-packages/dysgu/main.py", line 280, in run_pipeline
cluster.cluster_reads(ctx.obj)
File "dysgu/cluster.pyx", line 1115, in dysgu.cluster.cluster_reads
File "dysgu/cluster.pyx", line 919, in dysgu.cluster.pipe1
File "/gpfs3/well/gel/HICF2/software/conda_envs/dysgu/lib/python3.7/multiprocessing/connection.py", line 206, in send
self._send_bytes(_ForkingPickler.dumps(obj))
File "/gpfs3/well/gel/HICF2/software/conda_envs/dysgu/lib/python3.7/multiprocessing/reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <class 'dysgu.sites_utils.Site'>: attribute lookup Site on dysgu.sites_utils failed
Any idea of possible solutions?
Thanks!
Hello Kez,
Exciting work you've put together here with dysgu. Very happy to have come across your preprint as I've been looking for a tool that would generate such useful read metrics for the major SV classes.
I have been trying to get the software to complete on one of my samples and keep running into the same error when running the call step.
dysgu fetch working_dir PD26400a_T.final.bam
2021-12-17 11:44:15,327 [INFO ] [dysgu-fetch] Version: 1.3.0
2021-12-17 11:59:25,375 [INFO ] dysgu fetch PD26400a_T.final.bam written to working_dir/PD26400a_T.final.dysgu_reads.bam, n=6183390, time=0:15:10 h:m:s
dysgu call --ibam PD26400a_T.final.bam --sites PD26400a_T_vs_PD26400b_N.consensus.somatic.sv.vcf Homo_sapiens_assembly38.fasta working_dir working_dir/PD26400a_T.final.dysgu_reads.bam > svs.vcf
2021-12-17 12:01:57,321 [INFO ] [dysgu-call] Version: 1.3.0
2021-12-17 12:01:57,321 [INFO ] Input file is: working_dir/PD26400a_T.final.dysgu_reads.bam
2021-12-17 12:01:57,321 [INFO ] call --ibam PD26400a_T.final.bam --sites PD26400a_T_vs_PD26400b_N.consensus.somatic.sv.vcf Homo_sapiens_assembly38.fasta working_dir working_dir/PD26400a_T.final.dysgu_reads.bam
[W::hts_idx_load3] The index file is older than the data file: PD26400a_T.final.bai
2021-12-17 12:01:57,573 [INFO ] Sample name: PD26400a_T
2021-12-17 12:01:57,573 [INFO ] Writing vcf to stdout
2021-12-17 12:01:57,573 [INFO ] Running pipeline
2021-12-17 12:01:58,283 [INFO ] Calculating insert size. Removed 27 outliers with insert size >= 938.0
2021-12-17 12:01:58,299 [INFO ] Inferred read length 151.0, insert median 458, insert stdev 89
2021-12-17 12:01:58,301 [INFO ] Max clustering dist 903
2021-12-17 12:01:58,301 [INFO ] Minimum support 3
2021-12-17 12:01:58,301 [INFO ] Reading --sites
[W::vcf_parse_info] INFO/END=34143939 is smaller than POS at chr1:84232901
2021-12-17 12:01:58,351 [INFO ] Building graph with clustering 903 bp
2021-12-17 12:03:57,234 [INFO ] Total input reads 6183142
2021-12-17 12:03:59,688 [INFO ] Added 57 variants from input sites
2021-12-17 12:04:01,797 [INFO ] Graph constructed
Traceback (most recent call last):
File "/conda/bin/dysgu", line 8, in <module>
sys.exit(cli())
File "/conda/lib/python3.7/site-packages/click/core.py", line 1128, in __call__
return self.main(*args, **kwargs)
File "/conda/lib/python3.7/site-packages/click/core.py", line 1053, in main
rv = self.invoke(ctx)
File "/conda/lib/python3.7/site-packages/click/core.py", line 1659, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/conda/lib/python3.7/site-packages/click/core.py", line 1395, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/conda/lib/python3.7/site-packages/click/core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "/conda/lib/python3.7/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "/conda/lib/python3.7/site-packages/dysgu/main.py", line 442, in call_events
cluster.cluster_reads(ctx.obj)
File "dysgu/cluster.pyx", line 1115, in dysgu.cluster.cluster_reads
File "dysgu/cluster.pyx", line 898, in dysgu.cluster.pipe1
File "dysgu/cluster.pyx", line 633, in dysgu.cluster.component_job
File "dysgu/call_component.pyx", line 1914, in dysgu.call_component.call_from_block_model
File "dysgu/call_component.pyx", line 1928, in dysgu.call_component.call_from_block_model
File "dysgu/call_component.pyx", line 1880, in dysgu.call_component.multi
File "dysgu/call_component.pyx", line 988, in dysgu.call_component.single
File "dysgu/call_component.pyx", line 608, in dysgu.call_component.make_single_call
File "dysgu/call_component.pyx", line 286, in dysgu.call_component.count_attributes2
ZeroDivisionError: float division
To troubleshoot, I first ran dysgu test to check whether my build was fully successful; this completed without error.
Next, I checked the output BAM from the fetch step with some samtools commands. It wasn't corrupt and had a reasonable number of reads (6183390).
Then I went into the script call_component.pyx and looked into the code block where the error originated:
if clipped_bases > 0 and aligned_bases > 0:
er.clip_qual_ratio = (aligned_base_quals / aligned_bases) / (clipped_base_quals / clipped_bases)
else:
er.clip_qual_ratio = 0
From that I can see it must be one of aligned_bases, clipped_base_quals, or clipped_bases that ends up as zero. I couldn't see a reason why any of these values would be zero based on the BAM that was produced, so I wanted to present the issue here.
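For reference, the kind of guard that sidesteps such a division is easy to sketch; the names below are illustrative, not dysgu's actual variables:

```python
# Illustrative guard only; names do not correspond to dysgu's code.
def safe_ratio(numerator, denominator, default=0.0):
    """Return numerator/denominator, or `default` when the denominator is zero."""
    return numerator / denominator if denominator else default
```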
Let me know if there is any other information or tests you would like me to try.
Best regards,
Patrick
Below are packages installed for build via Anaconda and pip:
Cython 0.29.26
click 8.0.3
numpy 1.21.2
pandas 1.3.5
pysam 0.18.0
networkx 2.6.3
scikit-learn 1.0.1
ncls 0.0.62
scikit-bio 0.5.6
edlib 1.3.9
sortedcontainers 2.4.0
lightgbm 3.3.1
Hi,
I merged multiple samples and used the merged file to genotype each sample. Then I merged the genotyped samples to get a population-level VCF. It contains variants with a mapping quality of zero and a genotype quality of zero; these variants are not present in the original sample genotype files.
Chr: NC_001493.2
Position: 63036-63075
ID: 35
Genotype Information
Sample: ATCC.dysgu_reads
Genotype: T
Quality: 0
Type: HOM_REF
Is Filtered Out: No
Genotype Attributes
MAPQP: 0
SU: 0
PS: 0
BCC: 0
MS: 0
FCC: 0
Genotype Quality: 0
COV: 0
SC: 0
RED: 0
PROB: 0
PE: 0
ICN: 0
NEIGH10: 0
BND: 0
RMS: 0
WR: 0
OCN: 0
SR: 0
How can I filter them out? Thank you
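One stdlib-only way to drop these records is to require non-zero support (SU) in at least one sample. This is a sketch, assuming the FORMAT field always carries an SU key as in the dysgu output above; the file paths are placeholders:

```python
def keep_record(line):
    """Keep a VCF record if at least one sample shows non-zero support (SU)."""
    fields = line.rstrip("\n").split("\t")
    fmt = fields[8].split(":")      # assumes SU is present in FORMAT
    su_idx = fmt.index("SU")
    return any(s.split(":")[su_idx] not in ("0", ".", "") for s in fields[9:])

def filter_vcf(in_path, out_path):
    """Copy a VCF, skipping body records where every sample has SU=0."""
    with open(in_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            if line.startswith("#") or keep_record(line):
                fout.write(line)
```

A similar check on PROB or MAPQP would work the same way if those are the fields you want to threshold on.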
Hello,
I have used dysgu for SV calling before without any errors, but now I am suddenly facing the following error. Can you please help me figure out what's going wrong? Google didn't show any possible solution.
command : dysgu call --mode nanopore -p 20 ../../hg38.fa work_dir/ --ibam HG00733.sortedmappeddedup.bam > HG00733_ONTlee_dysgu.vcf
error :
2022-07-11 12:11:04,948 [INFO ] [dysgu-call] Version: 1.3.11
Traceback (most recent call last):
File "/home/sachin/miniconda3/bin/dysgu", line 8, in <module>
sys.exit(cli())
File "/home/sachin/miniconda3/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
return self.main(*args, **kwargs)
File "/home/sachin/miniconda3/lib/python3.8/site-packages/click/core.py", line 1053, in main
rv = self.invoke(ctx)
File "/home/sachin/miniconda3/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/sachin/miniconda3/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/sachin/miniconda3/lib/python3.8/site-packages/click/core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "/home/sachin/miniconda3/lib/python3.8/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "/home/sachin/miniconda3/lib/python3.8/site-packages/dysgu/main.py", line 431, in call_events
raise ValueError("Could not find '{}'".format(kwargs["sv_aligns"]))
ValueError: Could not find 'None'
Hi,
Thanks for the nice tool
I am trying to merge the VCFs of multiple samples produced by the dysgu run -v2 command into one combined file. I used the following command:
dysgu merge Sample1_SVs.vcf Sample2_SVs.vcf Sample3_SVs.vcf .... Sample8_SVs.vcf > Combined_file.vcf
However, if a variant exists in multiple samples it doesn't actually get merged. Instead, the same variant is written in separate rows:
#CHROM | POS | ID | REF | ALT | QUAL | FILTER | INFO | FORMAT | Sample1 | Sample2 | Sample3 | Sample4 | Sample5 | Sample6 | Sample7 | Sample8 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2 | 34200481 | 18843 | A | . | PASS | SVMETHOD=DYSGUv1.3.14;SVTYPE=DEL;END=34200511;CHR2=2;GRP=151067;NGRP=1;CT=3to5;CIPOS95=0;CIEND95=0;SVLEN=30;CONTIGA=ACTATTGACAATAGTACATATATAATATACAGTATATACACTATTGACAATAGTGTATATAGAGATATATCTCTATATTGATACATATGTAGAGATATATCTCTATATTGATATATATGTACACACACAGGAGATATATACGTATGTATCAAAACATGTAATATACGTA;CONTIGB=agtatatacactattgacaatagtgtataTAGAGATATATCTCTATATTGATACATATGTAGAGATATATCTCTATATTGATATATATGTACACACACAGGAGATATATACGTATGTATCAAAACATGTAATATACGTATACACACGTCTTTTTTATTGTT;GC=24.85;NEXP=0;STRIDE=0;EXPSEQ=;RPOLY=36;OL=0;SU=21;WR=7;PE=0;SR=0;SC=7;BND=0;LPREC=1;RT=pe;MeanPROB=0.892;MaxPROB=0.892 | GT:GQ:MAPQP:SU:WR:PE:SR:SC:BND:COV:NEIGH10:PS:MS:RMS:RED:BCC:FCC:ICN:OCN:PROB | 0/1:129.0:60.0:21:7:0:0:7:0:40.02:3:7:7:0:0:18:0.516:0.513:0.994:0.892 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | |
2 | 34200481 | 72815 | A | . | PASS | SVMETHOD=DYSGUv1.3.14;SVTYPE=DEL;END=34200511;CHR2=2;GRP=137688;NGRP=1;CT=3to5;CIPOS95=0;CIEND95=0;SVLEN=30;CONTIGA=ATACTATTGACAATAGTACATATATAATATACAGTATATACACTATTGACAATAGTGTATATAGAGATATATCTCTATATTGATACATATGTAGAGATATATCTCTATATTGATATATATGTACACACACAGGAGATATATACGTATGTATCAAAACATGTAATATACGTATA;CONTIGB=cagtatatacactattgacaatagtgtataTAGAGATATATCTCTATATTGATACATATGTAGAGATATATCTCTATATTGATATATATGTACACACACAGGAGATATATACGTATGTATCAAAACATGTAATATACGTATACACACGTCTTTTTTATTGTTTCT;GC=24.85;NEXP=0;STRIDE=0;EXPSEQ=;RPOLY=36;OL=0;SU=21;WR=9;PE=0;SR=0;SC=3;BND=0;LPREC=1;RT=pe;MeanPROB=0.89;MaxPROB=0.89 | GT:GQ:MAPQP:SU:WR:PE:SR:SC:BND:COV:NEIGH10:PS:MS:RMS:RED:BCC:FCC:ICN:OCN:PROB | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0/1:101.0:60.0:21:9:0:0:3:0:38.48:3:6:6:0:0:20:0.444:0.421:0.947:0.89 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | |
2 | 34200481 | 127694 | A | . | PASS | SVMETHOD=DYSGUv1.3.14;SVTYPE=DEL;END=34200511;CHR2=2;GRP=138581;NGRP=1;CT=3to5;CIPOS95=0;CIEND95=0;SVLEN=30;CONTIGA=ATACTATTGACAATAGTACATATATAATATACAGTATATACACTATTGACAATAGTGTATATAGAGATATATCTCTATATTGATACATATGTAGAGATATATCTCTATATTGATATATATGTACACACACAGGAGATATATACGTATGTATCAAAACATGTAATATACGTATACAC;GC=24.43;NEXP=0;STRIDE=0;EXPSEQ=;RPOLY=36;OL=0;SU=26;WR=10;PE=0;SR=0;SC=6;BND=0;LPREC=1;RT=pe;MeanPROB=0.896;MaxPROB=0.896 | GT:GQ:MAPQP:SU:WR:PE:SR:SC:BND:COV:NEIGH10:PS:MS:RMS:RED:BCC:FCC:ICN:OCN:PROB | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0/1:78.0:60.0:26:10:0:0:6:0:37.75:3:11:5:0:0:17:0.342:0.342:1.0:0.896 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | |
2 | 34200481 | 182265 | A | . | PASS | SVMETHOD=DYSGUv1.3.14;SVTYPE=DEL;END=34200511;CHR2=2;GRP=148809;NGRP=1;CT=3to5;CIPOS95=0;CIEND95=0;SVLEN=30;CONTIGA=ATACTATTGACAATAGTACATATATAATATACAGTATATACACTATTGACAATAGTGTATATAGAGATATATCTCTATATTGATACATATGTAGAGATATATCTCTATATTGATATATATGTACACACACAGGAGATATATACGTATGTATCAAAACATGTAATATACGTATACACA;CONTIGB=tatacactattgacaatagtgtataTAGAGATATATCTCTATATTGATACATATGTAGAGATATATCTCTATATTGATATATATGTACACACACAGGAGATATATACGTATGTATCAAAACATGTAATATACGTATACACACGTCTTTTT;GC=25.08;NEXP=0;STRIDE=0;EXPSEQ=;RPOLY=36;OL=0;SU=22;WR=10;PE=0;SR=0;SC=2;BND=0;LPREC=1;RT=pe;MeanPROB=0.888;MaxPROB=0.888 | GT:GQ:MAPQP:SU:WR:PE:SR:SC:BND:COV:NEIGH10:PS:MS:RMS:RED:BCC:FCC:ICN:OCN:PROB | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0/1:134.0:60.0:22:10:0:0:2:0:38.75:3:8:4:0:0:20:0.509:0.568:1.115:0.888 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | |
2 | 34200481 | 236622 | A | . | PASS | SVMETHOD=DYSGUv1.3.14;SVTYPE=DEL;END=34200511;CHR2=2;GRP=149759;NGRP=1;CT=3to5;CIPOS95=0;CIEND95=0;SVLEN=30;CONTIGA=TGACAATAGTACATATATAATATACAGTATATACACTATTGACAATAGTGTATATAGAGATATATCTCTATATTGATACATATGTAGAGATATATCTCTATATTGATATATATGTACACACACAGGAGATATATACGTATGTATCAAAACATGT;CONTIGB=ctattgacaatagtgtataTAGAGATATATCTCTATATTGATACATATGTAGAGATATATCTCTATATTGATATATATGTACACACACAGGAGATATATACGTATGTATCAAAACATGTAATATACGTATACACACGTCTTTTTTATTGTTT;GC=25.16;NEXP=0;STRIDE=0;EXPSEQ=;RPOLY=36;OL=0;SU=12;WR=3;PE=0;SR=0;SC=6;BND=0;LPREC=1;RT=pe;MeanPROB=0.865;MaxPROB=0.865 | GT:GQ:MAPQP:SU:WR:PE:SR:SC:BND:COV:NEIGH10:PS:MS:RMS:RED:BCC:FCC:ICN:OCN:PROB | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0/1:92.0:60.0:12:3:0:0:6:0:36.96:3:6:3:0:0:8:0.392:0.389:0.993:0.865 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | |
2 | 34200481 | 290902 | A | . | PASS | SVMETHOD=DYSGUv1.3.14;SVTYPE=DEL;END=34200511;CHR2=2;GRP=154353;NGRP=1;CT=3to5;CIPOS95=0;CIEND95=0;SVLEN=30;CONTIGA=ATACTATTGACAATAGTACATATATAATATACAGTATATACACTATTGACAATAGTGTATATAGAGATATATCTCTATATTGATACATATGTAGAGATATATCTCTATATTGATATATATGTACACACACAGGAGATATATACGTATGTATCAAAACATGTAATATACGTATACACA;CONTIGB=tatacactattgacaatagtgtataTAGAGATATATCTCTATATTGATACATATGTAGAGATATATCTCTATATTGATATATATGTACACACACAGGAGATATATACGTATGTATCAAAACATGTAATATACGTATACACACGTCTTTTTTATT;GC=24.77;NEXP=0;STRIDE=0;EXPSEQ=;RPOLY=36;OL=0;SU=19;WR=8;PE=0;SR=0;SC=3;BND=0;LPREC=1;RT=pe;MeanPROB=0.827;MaxPROB=0.827 | GT:GQ:MAPQP:SU:WR:PE:SR:SC:BND:COV:NEIGH10:PS:MS:RMS:RED:BCC:FCC:ICN:OCN:PROB | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0/1:171.0:60.0:19:8:0:0:3:0:37.45:3:5:6:0:0:13:0.712:0.684:0.961:0.827 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | |
2 | 34200481 | 341988 | A | . | PASS | SVMETHOD=DYSGUv1.3.14;SVTYPE=DEL;END=34200511;CHR2=2;GRP=132226;NGRP=1;CT=3to5;CIPOS95=0;CIEND95=0;SVLEN=30;CONTIGA=ATAGTACATATATAATATACAGTATATACACTATTGACAATAGTGTATATAGAGATATATCTCTATATTGATACATATGTAGAGATATATCTCTATATTGATATATATGTACACACACAGGAGATATATACGTATGTATCAAAACATGTAATATACGTATACACAC;CONTIGB=agtatatacactattgacaatagtgtataTAGAGATATAGCTCTATATTGATACATATGTAGAGATATATCTCTATATTGATATATATGTACACACACAGGAGATATATACGTATGTATCAAAACATGTAATATACGTATACACACGTCTTTTTTATT;GC=25.31;NEXP=0;STRIDE=0;EXPSEQ=;RPOLY=36;OL=0;SU=8;WR=3;PE=0;SR=0;SC=2;BND=0;LPREC=1;RT=pe;MeanPROB=0.842;MaxPROB=0.842 | GT:GQ:MAPQP:SU:WR:PE:SR:SC:BND:COV:NEIGH10:PS:MS:RMS:RED:BCC:FCC:ICN:OCN:PROB | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0/1:71.0:60.0:8:3:0:0:2:0:32.49:3:2:3:0:0:8:0.344:0.333:0.97:0.842 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | |
2 | 34200481 | 391981 | A | . | PASS | SVMETHOD=DYSGUv1.3.14;SVTYPE=DEL;END=34200511;CHR2=2;GRP=154176;NGRP=1;CT=3to5;CIPOS95=0;CIEND95=0;SVLEN=30;CONTIGA=TACTATTGACAATAGTACATATATAATATACAGTATATACACTATTGACAATAGTGTATATAGAGATATATCTCTATATTGATACATATGTAGAGATATATCTCTATATTGATATATATGTACACACACAGGAGATATATACGTATGTATCAAAACATGTAATATACGTATACACAC;CONTIGB=atatacactattgacaatagtgtataTAGAGATATATCTCTATATTGATACATATGTAGAGATATATCTCTATATTGATATATATGTACACACACAGGAGATATATACGTATGTATCAAAACATGTAATATACGTATACACACGTCTTTTTTATTG;GC=25.23;NEXP=0;STRIDE=0;EXPSEQ=;RPOLY=36;OL=0;SU=30;WR=12;PE=0;SR=0;SC=6;BND=0;LPREC=1;RT=pe;MeanPROB=0.92;MaxPROB=0.92 | GT:GQ:MAPQP:SU:WR:PE:SR:SC:BND:COV:NEIGH10:PS:MS:RMS:RED:BCC:FCC:ICN:OCN:PROB | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0:0 | 0/1:138.0:60.0:30:12:0:0:6:0:39.6:3:8:10:0:0:19:0.575:0.579:1.007:0.92 |
Is this what I should expect from the merge command, or am I doing something wrong?
Thanks
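As a workaround sketch, rows describing the same SV could be collapsed by grouping on CHROM, POS, SVTYPE and END and keeping, per sample column, the first non-placeholder genotype. Whether that grouping key is the right notion of "the same variant" is an assumption on my part:

```python
def collapse(lines):
    """Collapse duplicate SV rows (same CHROM/POS/SVTYPE/END), merging
    sample columns so each sample keeps its non-placeholder genotype."""
    header, groups, order = [], {}, []
    for line in lines:
        if line.startswith("#"):
            header.append(line)
            continue
        f = line.rstrip("\n").split("\t")
        info = dict(kv.split("=", 1) for kv in f[7].split(";") if "=" in kv)
        key = (f[0], f[1], info.get("SVTYPE"), info.get("END"))
        if key not in groups:
            groups[key] = f
            order.append(key)
        else:
            kept = groups[key]
            for i in range(9, len(f)):
                # "0:0:0..." is assumed to be the placeholder written for
                # samples without the call, as in the output above
                if kept[i].startswith("0:"):
                    kept[i] = f[i]
    return header + ["\t".join(groups[k]) + "\n" for k in order]
```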
Hi Kez,
While writing the output of the call function to the VCF, I encountered the following error:
dysgu call --ibam PD26400a_T.final.bam --sites PD26400a_T_vs_PD26400b_N.consensus.somatic.sv.vcf --all-sites True --hom-ref-sites True --clip-length 15 --metrics Homo_sapiens_assembly38.fasta working_dir working_dir/PD26400a_T.final.dysgu_reads.bam > svs.vcf
2021-12-21 13:12:33,777 [INFO ] [dysgu-call] Version: 1.3.3
2021-12-21 13:12:33,777 [INFO ] Input file is: working_dir/PD26400a_T.final.dysgu_reads.bam
2021-12-21 13:12:33,777 [INFO ] call --ibam PD26400a_T.final.bam --sites PD26400a_T_vs_PD26400b_N.consensus.somatic.sv.vcf --all-sites True --hom-ref-sites True --clip-length 15 --metrics Homo_sapiens_assembly38.fasta working_dir working_dir/PD26400a_T.final.dysgu_reads.bam
[W::hts_idx_load3] The index file is older than the data file: PD26400a_T.final.bai
2021-12-21 13:12:34,079 [INFO ] Sample name: PD26400a_T
2021-12-21 13:12:34,079 [INFO ] Writing vcf to stdout
2021-12-21 13:12:34,079 [INFO ] Running pipeline
2021-12-21 13:12:34,782 [INFO ] Calculating insert size. Removed 27 outliers with insert size >= 938.0
2021-12-21 13:12:34,798 [INFO ] Inferred read length 151.0, insert median 458, insert stdev 89
2021-12-21 13:12:34,799 [INFO ] Max clustering dist 903
2021-12-21 13:12:34,799 [INFO ] Minimum support 3
2021-12-21 13:12:34,799 [INFO ] Reading --sites
[W::vcf_parse_info] INFO/END=34143939 is smaller than POS at chr1:84232901
2021-12-21 13:12:34,826 [INFO ] Building graph with clustering 903 bp
2021-12-21 13:14:03,262 [INFO ] Total input reads 4894147
2021-12-21 13:14:04,966 [INFO ] Added 57 variants from input sites
2021-12-21 13:14:06,505 [INFO ] Graph constructed
2021-12-21 13:19:44,026 [INFO ] Number of components 2578332. N candidates 416974
2021-12-21 13:19:44,330 [INFO ] Number of matching SVs from --sites 62
2021-12-21 13:20:02,988 [INFO ] Re-alignment of soft-clips done. N candidates 300203
2021-12-21 13:20:07,483 [INFO ] Number of candidate SVs merged: 25814
2021-12-21 13:20:07,483 [INFO ] Number of candidate SVs after merge: 274389
2021-12-21 13:20:07,573 [INFO ] Number of candidate SVs dropped with sv-len < min-size or support < min support: 243030
2021-12-21 13:20:07,893 [INFO ] Number or SVs near gaps dropped 174
2021-12-21 13:20:09,897 [INFO ] Loaded n=24 chromosome coverage arrays from working_dir
/conda/lib/python3.7/site-packages/sklearn/base.py:333: UserWarning: Trying to unpickle estimator LabelEncoder from version 0.23.2 when using version 1.0.1. This might lead to breaking code or invalid results. Use at your own risk. For more info please refer to:
https://scikit-learn.org/stable/modules/model_persistence.html#security-maintainability-limitations
UserWarning,
2021-12-21 13:20:29,021 [INFO ] Model: pe, diploid: True, contig features: True. N features: 43
Traceback (most recent call last):
File "/conda/bin/dysgu", line 8, in <module>
sys.exit(cli())
File "/conda/lib/python3.7/site-packages/click/core.py", line 1128, in __call__
return self.main(*args, **kwargs)
File "/conda/lib/python3.7/site-packages/click/core.py", line 1053, in main
rv = self.invoke(ctx)
File "/conda/lib/python3.7/site-packages/click/core.py", line 1659, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/conda/lib/python3.7/site-packages/click/core.py", line 1395, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/conda/lib/python3.7/site-packages/click/core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "/conda/lib/python3.7/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "/conda/lib/python3.7/site-packages/dysgu/main.py", line 445, in call_events
cluster.cluster_reads(ctx.obj)
File "dysgu/cluster.pyx", line 1190, in dysgu.cluster.cluster_reads
File "dysgu/io_funcs.pyx", line 656, in dysgu.io_funcs.to_vcf
File "dysgu/io_funcs.pyx", line 279, in dysgu.io_funcs.make_main_record
ValueError: could not convert string to float: '.'
There were no issues with the fetch command, and the beginning of svs.vcf appears to be output correctly (i.e. the full header was present, including all FILTER/INFO/FORMAT fields).
The issue stems from the REP metric calculation, specifically at the following code block:
if not small_output:
info_extras += [
f"REP={'%.3f' % float(rep)}",
f"REPSC={'%.3f' % float(repsc)}",
]
I'm running the most recent push to pypi (v1.3.3).
Is the VCF output compatible with genome browsers? When I try to open it in IGV, I get an error that the reference allele is missing. Being able to visualize the results this way would be very useful.
I'm trying to run dysgu with the multiprocessing flag (-p). Some samples run smoothly, but once in a while I get errors like this one:
2022-07-20 18:39:33,269 [INFO ] [dysgu-run] Version: 1.3.11
2022-07-20 18:39:35,108 [INFO ] run -v1 -p4 -c --regions /home/mimmie/glob/dysgu/chr1_1.bed --regions-only True -I 393,95,151 parsed.pabies-2.0.fa P9904-117_temp dedup.ir.P9904-117.bam
2022-07-20 18:39:35,108 [INFO ] Destination: P9904-117_temp
2022-07-20 18:39:35,108 [INFO ] Searching regions from /home/mimmie/glob/dysgu/chr1_1.bed
2022-07-20 18:41:13,640 [INFO ] dysgu fetch dedup.ir.P9904-117.bam written to P9904-117_temp/dedup.ir.P9904-117.dysgu_reads.bam, n=417399, time=0:01:38 h:m:s
2022-07-20 18:41:13,789 [INFO ] Input file is: P9904-117_temp/dedup.ir.P9904-117.dysgu_reads.bam
2022-07-20 18:41:13,791 [INFO ] Input file has no index, but --include was provided, attempting to index
2022-07-20 18:41:14,439 [INFO ] Sample name: P9904-117
2022-07-20 18:41:14,439 [INFO ] Writing vcf to stdout
2022-07-20 18:41:14,439 [INFO ] Running pipeline
2022-07-20 18:41:14,441 [INFO ] Read length 151, insert_median 393, insert stdev 95
2022-07-20 18:41:14,441 [INFO ] Max clustering dist 868
2022-07-20 18:41:14,442 [INFO ] Minimum support 3
2022-07-20 18:41:14,442 [INFO ] Building graph with clustering 868 bp
2022-07-20 18:41:29,817 [INFO ] Total input reads 533293
2022-07-20 18:48:00,408 [INFO ] Graph constructed
Process Process-1:
Traceback (most recent call last):
File "/home/mimmie/.pyenv/versions/3.9.9/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/home/mimmie/.pyenv/versions/3.9.9/lib/python3.9/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "dysgu/cluster.pyx", line 813, in dysgu.cluster.process_job
File "dysgu/cluster.pyx", line 761, in dysgu.cluster.component_job
File "dysgu/call_component.pyx", line 1972, in dysgu.call_component.call_from_block_model
File "dysgu/call_component.pyx", line 1986, in dysgu.call_component.call_from_block_model
File "dysgu/call_component.pyx", line 1933, in dysgu.call_component.multi
File "dysgu/call_component.pyx", line 1822, in dysgu.call_component.get_reads
File "pysam/libcalignmentfile.pyx", line 1876, in pysam.libcalignmentfile.AlignmentFile.__next__
OSError: error -4 while reading file
If I run the sample without multiprocessing, it completes without any problems. Since I have over 1000 samples with massive genomes, running without multiprocessing is not an option, so I'm grateful for any ideas or suggestions as to what could cause this error and how to fix it.
Hi,
First of all, thank you for the great tool, dysgu!
I am running it on a BAM file from 10x Genomics linked-read sequencing, and this error keeps popping up. Could you give me some advice, please?
2022-12-04 15:31:26,573 [INFO ] Sample name: SHR_OlaIpcv
2022-12-04 15:31:26,575 [INFO ] Writing vcf to stdout
2022-12-04 15:31:26,575 [INFO ] Running pipeline
2022-12-04 15:31:27,337 [INFO ] Calculating insert size. Removed 735 outliers with insert size >= 1033.0
2022-12-04 15:31:27,345 [INFO ] Inferred read length 148.0, insert median 302, insert stdev 128
2022-12-04 15:31:27,362 [INFO ] Max clustering dist 942
2022-12-04 15:31:27,362 [INFO ] Minimum support 3
2022-12-04 15:31:27,372 [INFO ] Building graph with clustering 942 bp
Traceback (most recent call last):
File "dysgu/graph.pyx", line 754, in dysgu.graph.alignments_from_sa_tag
TypeError: '<' not supported between instances of 'bool' and 'str'
Exception ignored in: 'dysgu.graph.process_alignment'
Traceback (most recent call last):
File "dysgu/graph.pyx", line 754, in dysgu.graph.alignments_from_sa_tag
TypeError: '<' not supported between instances of 'bool' and 'str'
Traceback (most recent call last):
File "dysgu/graph.pyx", line 754, in dysgu.graph.alignments_from_sa_tag
TypeError: '<' not supported between instances of 'bool' and 'str'
Exception ignored in: 'dysgu.graph.process_alignment'
Traceback (most recent call last):
File "dysgu/graph.pyx", line 754, in dysgu.graph.alignments_from_sa_tag
TypeError: '<' not supported between instances of 'bool' and 'str'
Traceback (most recent call last):
File "dysgu/graph.pyx", line 754, in dysgu.graph.alignments_from_sa_tag
TypeError: '<' not supported between instances of 'bool' and 'str'
Exception ignored in: 'dysgu.graph.process_alignment'
Traceback (most recent call last):
File "dysgu/graph.pyx", line 754, in dysgu.graph.alignments_from_sa_tag
TypeError: '<' not supported between instances of 'bool' and 'str'
Currently, if a CRAM file is supplied, dysgu will try to fetch a FASTA over the network according to the header (which fails in my case).
It would be faster to use:
result = hts_set_fai_filename(f_in, fasta);
...
after sam_open. This will work for BAM as well (in that it will have no effect for BAM files).
Hi,
I have a problem with merging multiple samples and genotyping in dysgu. I have twenty samples: first, SVs were called on each of them and merged. However, genotyping with the merged VCF file failed with the following error.
ATCC BCAHV C02-169 C96-152 C97-256 C98-172 LA87-305 LA90-390 S03-468 S09-363 S09-391 S09-398 S11-140 S11-366 S11-375 S11-462 S11-467 S98-108 S98-595 S99-1170
20
Dysgu is Processing ATCC
2022-07-24 09:03:54,699 [INFO ] [dysgu-call] Version: 1.3.11
2022-07-24 09:03:54,700 [INFO ] Input file is: /ssd2/av724/var_jul19/final2/minimap2_snf/ATCC.bam
2022-07-24 09:03:54,700 [INFO ] call --sites /ssd2/av724/var_jul19/final2/dysgu_np30x/ver2/dy_poplnmerg.vcf /ssd2/av724/varanalysis/reference/atcc_ref_ccv.fna /ssd2/av724/var_jul19/final2/dysgu_np30x/ver2/ATCC-temp /ssd2/av724/var_jul19/final2/minimap2_snf/ATCC.bam
2022-07-24 09:03:54,700 [WARNING] Warning: no @rg, using input file name as sample name for output: ATCC
2022-07-24 09:03:54,700 [INFO ] Sample name: ATCC
2022-07-24 09:03:54,700 [INFO ] Writing vcf to stdout
2022-07-24 09:03:54,700 [INFO ] Running pipeline
2022-07-24 09:03:54,764 [INFO ] Inferred read length 3398.0
2022-07-24 09:03:54,764 [INFO ] Max clustering dist 1050
2022-07-24 09:03:54,764 [INFO ] Minimum support 3
2022-07-24 09:03:54,764 [INFO ] Reading --sites
[E::vcf_parse_format] Invalid character '.' in 'GQ' FORMAT field at NC_001493.2:78
Traceback (most recent call last):
File "/home/av724/miniconda3/envs/dysgu2/bin/dysgu", line 8, in <module>
sys.exit(cli())
File "/home/av724/miniconda3/envs/dysgu2/lib/python3.7/site-packages/click/core.py", line 1134, in __call__
return self.main(*args, **kwargs)
File "/home/av724/miniconda3/envs/dysgu2/lib/python3.7/site-packages/click/core.py", line 1059, in main
rv = self.invoke(ctx)
File "/home/av724/miniconda3/envs/dysgu2/lib/python3.7/site-packages/click/core.py", line 1665, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/av724/miniconda3/envs/dysgu2/lib/python3.7/site-packages/click/core.py", line 1401, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/av724/miniconda3/envs/dysgu2/lib/python3.7/site-packages/click/core.py", line 767, in invoke
return __callback(*args, **kwargs)
File "/home/av724/miniconda3/envs/dysgu2/lib/python3.7/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "/home/av724/miniconda3/envs/dysgu2/lib/python3.7/site-packages/dysgu/main.py", line 444, in call_events
cluster.cluster_reads(ctx.obj)
File "dysgu/cluster.pyx", line 1337, in dysgu.cluster.cluster_reads
File "dysgu/cluster.pyx", line 917, in dysgu.cluster.pipe1
File "/home/av724/miniconda3/envs/dysgu2/lib/python3.7/site-packages/dysgu/sites_utils.py", line 61, in vcf_reader
for idx, r in enumerate(vcf):
File "pysam/libcbcf.pyx", line 4175, in pysam.libcbcf.VariantFile.next
I checked the VCF file. In the record at NC_001493.2:78, GQ is equal to MAPQ. In the header, GQ is defined as an Integer while MAPQ is a Float, and this mismatch is causing the problem. I changed the GQ datatype from Integer to Float, which solved the problem. I hope this will not affect the results. Please let me know your suggestions.
Hi,
I'm interested in seeing how dysgu performs in genotyping a set of SVs discovered from a consensus of SV callers, but when running with --sites and --all-sites True the output includes new SVs discovered by dysgu. Is there a way to report only the sites from the provided vcf?
Thanks,
Will
Hello, while I was trying to clone your repository (using git clone --recursive https://github.com/kcleal/dysgu.git), I got the following error:
fatal: remote error: upload-pack: not our ref b341a74e355bcf4ff295307ed22d1f4905facb11
fatal: Fetched in submodule path 'dysgu/htslib', but it did not contain b341a74e355bcf4ff295307ed22d1f4905facb11. Direct fetching of that commit failed.
fatal:
Hello,
Is there any option to output translocations according to the VCF specification, i.e. two entries per translocation with the alternate allele written in this fashion: ]chr18:53456042]N ?
Best regards,
Jonatan
Based on another issue posted here, I am running dysgu with PacBio CLR reads using the nanopore model. The specific error it outputs is:
Exception ignored in: 'dysgu.assembler.topo_sort2'
ValueError: Graph contains a cycle. Please report this. n=10, w=10, v=0. Node info n was: 10, 0, 0, 32
ValueError: Graph contains a cycle. Please report this. n=22, w=22, v=0. Node info n was: 6, 1, 0, 32
This occurs thousands and thousands of times using both the run and call subcommands.
Any suggestions?
Hi, given a variant like this one (added some newlines for readability):
chr1 54712 5 . <INS> . PASS
SVMETHOD=DYSGUv1.2.3;SVTYPE=INS;END=54712;CHR2=chr1;GRP=426691;NGRP=1;CT=5to3;CIPOS95=0;CIEND95=0;SVLEN=36;
CONTIGA=ttttttttctttctttctttctttctttctttctttctttcTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTCCTCCTTTTCTTTCCTTTTCTTTCTTTCATTCTTTCTTTCTTTTTTAAGTGGCAGGGTCTCACT;
KIND=extra-regional;GC=28.14;NEXP=28;STRIDE=4;
EXPSEQ=ctttctttctttctttctttctttcTTT;
RPOLY=84;OL=0;SU=4;WR=0;PE=0;SR=0;SC=4;BND=4;LPREC=1;RT=pe
GT:GQ:MAPQP:SU:WR:PE:SR:SC:BND:COV:NEIGH10:PS:MS:RMS:RED:BCC:FCC:ICN:OCN:PROB
1/1:10:34.5:4:0:0:0:4:4:16.5:1:1:3:76:1:7:0.829:1.129:0.935:0.65
what is the actual inserted sequence?
SVLEN says 36, EXPSEQ has length 28, and the lowercase sequence in CONTIGA has length 41.
I understand that in some cases we can't get the full inserted sequence, but in those cases, can we get the left end of the inserted sequence from CONTIGA and the right end from CONTIGB? How?
Would also be nice to be able to get deleted sequence for DEL:
chr1 10250 2 . <DEL> . PASS SVMETHOD=DYSGUv1.2.3;SVTYPE=DEL;END=10282;CHR2=chr1;GRP=212728;NGRP=1;CT=3to5;CIPOS95=0;CIEND95=0;SVLEN=32;CONTIGA=CTAACCCTAACCCTAAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCCTAACCCTAACCCTAAACCTCACCCCCACCCCCACCCCCACCCCCACCCCCACCCCAACCCTAACCCCTAACCCTAACCCTAACCCTAacCCCTAACCCTAACCCTAACCCTAACCCTAACC;KIND=extra-regional;GC=56.52;NEXP=0;STRIDE=0;EXPSEQ=;RPOLY=120;OL=0;SU=6;WR=2;PE=1;SR=0;SC=2;BND=1;LPREC=1;RT=pe
In this case, the sequence after the lower-case letters in CONTIGA is length 32. Is that the deleted sequence? Or would I look this up in a fasta and contig A is the haplotype with the deletion?
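If the case convention above holds (an assumption: lowercase appears to mark contig bases that did not align to the reference, uppercase the aligned bases), candidate left/right ends can be pulled out of CONTIGA/CONTIGB with a small helper:

```python
import re

def case_segments(contig):
    """Split a contig string into runs of same-case letters.

    Returns a list of (is_upper, segment) tuples. Assumption (not confirmed
    by the dysgu docs): lowercase marks unaligned/clipped bases, uppercase
    marks bases aligned to the reference.
    """
    return [(m.group(0).isupper(), m.group(0))
            for m in re.finditer(r"[a-z]+|[A-Z]+", contig)]
```

Applied to the CONTIGA of the insertion above, the first (lowercase) segment is the 41-base run the question mentions; running the same helper on a CONTIGB field would give the right-hand candidate.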
Dear dysgu team,
Good afternoon.
I used a ccs.bam file as input to the dysgu program as below,
dysgu call --mode pacbio NK.fasta pb.large.ccs.bam > PbNK.vcf
However, I got the Error like this:
"Traceback (most recent call last):
File "/home/hsiang/miniconda3/envs/Python3_9/bin/dysgu", line 5, in
from dysgu.main import cli
File "/home/hsiang/miniconda3/envs/Python3_9/lib/python3.9/site-packages/dysgu/__init__.py", line 2, in
from dysgu.python_api import DysguSV,
File "/home/hsiang/miniconda3/envs/Python3_9/lib/python3.9/site-packages/dysgu/python_api.py", line 9, in
from dysgu.cluster import pipe1, merge_events
File "dysgu/cluster.pyx", line 1, in init dysgu.cluster
ValueError: numpy.ufunc size changed, may indicate binary incompatibility. Expected 232 from C header, got 216 from PyObject"
Should I run pbmm2 align to get the new alignment bam?
Thank you very much
Sincerely yours,
Clarence
This is more of a request to optimize the code to run faster in a production environment, for example via multithreading or GPU acceleration.
Hi
Thank you for your great tool.
I have a question about the merge function.
I want to merge my different VCF files containing duplicates, and I would like to know how this function behaves:
does it merge the same duplicates present across my different samples, or does it collect all duplicates at once?
Thanks in advance
Hello,
Thanks for your tool. I was wondering if there is any way for translocations to get the location on both chromosomes. At the moment I don't see this information in the VCF file (only CHR2). Thank you.
P.S. I think the VCF format would encode these with SVTYPE=BND for each breakend. For my case I don't need the breakend information, but it would be great to get the location.
The majority of the SV calls are low quality, so it's important to filter the results down to the highest-quality calls. Is there an easy way to filter calls based on the INFO column annotations and the genotype FORMAT keys?
for example:
SU > 10
PE > 5
SC > 10
PROB > 0.95
Filter = PASS
I know this can be done with awk, but it would take me many hours to find the right command.
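One way to apply those thresholds without awk is a short stand-alone script. This is a minimal sketch, not a dysgu feature: the field names and cutoffs are taken from the question above, and SU/PE/SC/PROB are assumed to be read from the first sample's FORMAT column:

```python
def passes(vcf_line, min_su=10, min_pe=5, min_sc=10, min_prob=0.95):
    """Return True if a dysgu VCF record clears the example thresholds.

    FILTER (column 7) must be PASS; SU, PE, SC, and PROB are looked up by
    zipping the FORMAT keys (column 9) with the first sample (column 10).
    """
    cols = vcf_line.rstrip("\n").split("\t")
    if cols[6] != "PASS":
        return False
    fmt = dict(zip(cols[8].split(":"), cols[9].split(":")))
    return (float(fmt["SU"]) > min_su and float(fmt["PE"]) > min_pe
            and float(fmt["SC"]) > min_sc and float(fmt["PROB"]) > min_prob)
```

Looping over a VCF and writing only lines that start with `#` or satisfy `passes()` would then produce the filtered file.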
Hi,
I wanted to test this out while packaging it up for nixpkgs, but I get this error when running the tests:
AttributeError: module 'pysam.libcalignmentfile' has no attribute 'IteratorColumnAll'
What version of pysam do you use? This is the error log:
Traceback (most recent call last):
File "nix_run_setup", line 8, in <module>
exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\\r\\n', '\\n'), __file__, 'exec'))
File "setup.py", line 136, in <module>
setup(
File "/nix/store/sbiym6y0nmyabnh6mz4xzy26l0fhyqy7-python3.8-setuptools-47.3.1/lib/python3.8/site-packages/setuptools/__init__.py", line 161, in setup
return distutils.core.setup(**attrs)
File "/nix/store/qy5z9gcld7dljm4i5hj3z8a9l6p37y81-python3-3.8.8/lib/python3.8/distutils/core.py", line 148, in setup
dist.run_commands()
File "/nix/store/qy5z9gcld7dljm4i5hj3z8a9l6p37y81-python3-3.8.8/lib/python3.8/distutils/dist.py", line 966, in run_commands
self.run_command(cmd)
File "/nix/store/qy5z9gcld7dljm4i5hj3z8a9l6p37y81-python3-3.8.8/lib/python3.8/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/nix/store/sbiym6y0nmyabnh6mz4xzy26l0fhyqy7-python3.8-setuptools-47.3.1/lib/python3.8/site-packages/setuptools/command/test.py", line 238, in run
self.run_tests()
File "/nix/store/sbiym6y0nmyabnh6mz4xzy26l0fhyqy7-python3.8-setuptools-47.3.1/lib/python3.8/site-packages/setuptools/command/test.py", line 256, in run_tests
test = unittest.main(
File "/nix/store/qy5z9gcld7dljm4i5hj3z8a9l6p37y81-python3-3.8.8/lib/python3.8/unittest/main.py", line 100, in __init__
self.parseArgs(argv)
File "/nix/store/qy5z9gcld7dljm4i5hj3z8a9l6p37y81-python3-3.8.8/lib/python3.8/unittest/main.py", line 124, in parseArgs
self._do_discovery(argv[2:])
File "/nix/store/qy5z9gcld7dljm4i5hj3z8a9l6p37y81-python3-3.8.8/lib/python3.8/unittest/main.py", line 244, in _do_discovery
self.createTests(from_discovery=True, Loader=Loader)
File "/nix/store/qy5z9gcld7dljm4i5hj3z8a9l6p37y81-python3-3.8.8/lib/python3.8/unittest/main.py", line 154, in createTests
self.test = loader.discover(self.start, self.pattern, self.top)
File "/nix/store/qy5z9gcld7dljm4i5hj3z8a9l6p37y81-python3-3.8.8/lib/python3.8/unittest/loader.py", line 349, in discover
tests = list(self._find_tests(start_dir, pattern))
File "/nix/store/qy5z9gcld7dljm4i5hj3z8a9l6p37y81-python3-3.8.8/lib/python3.8/unittest/loader.py", line 405, in _find_tests
tests, should_recurse = self._find_test_path(
File "/nix/store/qy5z9gcld7dljm4i5hj3z8a9l6p37y81-python3-3.8.8/lib/python3.8/unittest/loader.py", line 483, in _find_test_path
tests = self.loadTestsFromModule(package, pattern=pattern)
File "/nix/store/sbiym6y0nmyabnh6mz4xzy26l0fhyqy7-python3.8-setuptools-47.3.1/lib/python3.8/site-packages/setuptools/command/test.py", line 55, in loadTestsFromModule
tests.append(self.loadTestsFromName(submodule))
File "/nix/store/qy5z9gcld7dljm4i5hj3z8a9l6p37y81-python3-3.8.8/lib/python3.8/unittest/loader.py", line 154, in loadTestsFromName
module = __import__(module_name)
File "/build/source/dysgu/view.py", line 11, in <module>
from dysgu import io_funcs, cluster
File "dysgu/cluster.pyx", line 16, in init dysgu.cluster
from dysgu import coverage, graph, call_component, assembler, io_funcs, re_map, post_call_metrics
File "dysgu/graph.pyx", line 1, in init dysgu.graph
#cython: language_level=3, boundscheck=False, c_string_type=unicode, c_string_encoding=utf8, infer_types=True
AttributeError: module 'pysam.libcalignmentfile' has no attribute 'IteratorColumnAll'
I see this is also an issue mentioned here:
Hi,
I am trying to merge SVs discovered from Oxford Nanopore data (plant species, 60 samples, 20,000-50,000 SVs/sample). It's been running for 10 hours now. Is that an expected runtime? Would it make sense to try to merge step wise? For example, find most closely related samples, merge those first (say groups 5-10) and then merge the merged files to get final non-redundant SVs?
minimap2 --MD -t 16 -ax map-ont ../short/Express617_v1.fa /vol/agcpgl/jlee/BreedPath_nanopore/${ID}.fq.gz | samtools sort -o ${ID}.bam
dysgu call -p 8 -v 2 --min-support 5 --mode nanopore ../short/Express617_v1.fa temp_dir.$ID $ID.bam > $ID.vcf
python flt_vcf.py $ID.vcf > $ID.pass.vcf
dysgu merge *pass.vcf > long.vcf
Hi,
This is more of a query than an issue. I have twenty samples of a DNA virus. I have both Illumina and nanopore reads. I would like to try Dysgu PE mode, LR mode, and hybrid mode. My samples are full genome data and have coverage of more than 1000x. I would like to know how to set the coverage for the input file. I can set the max coverage option to auto. Is this enough? Alternatively, I can use RASUSA (https://github.com/mbhall88/rasusa) to get 20x data then map with NGMLR or minimap2 or I can use samtools mpileup to reduce the coverage. What is the recommended coverage for PE mode, LR mode, and hybrid mode? My virus is a herpes virus. What do you suggest?
Hi,
Thank you for this tool.
I have a question: can dysgu run on a cluster? I would like to install it on our cluster.
Thank you
Dear @kcleal
I have several questions about the output of dysgu, hope you can take some time to help me. Thank you in advance.
First, I found that the output of dysgu can be redundant; for example, the following deletions could be merged, in my opinion.
1 1136857 78530 C <DEL> . PASS SVMETHOD=DYSGUv1.3.5;SVTYPE=DEL;END=1136943;CHR2=1;GRP=2350686;NGRP=1;CT=5to3;CIPOS95=0;CIEND95=0;SVLEN=86;CONTIGB=gcggggtttattctaagaatgattatttccCATAATTCCTGGTCCTGTGTGAGTGCCAGCCACCGTTTCCTCGTGTCCCTCTGGATGGGTCATTCCCTGGCCTCTGGCCTGTGTGCTGACCAGTCctgagcggccct;KIND=intra_regional;GC=55.47;NEXP=0;STRIDE=0;EXPSEQ;RPOLY=0;OL=0;SU=3;WR=0;PE=0;SR=0;SC=3;BND=3;LPREC=0;RT=pe GT:GQ:MAPQP:SU:WR:PE:SR:SC:BND:COV:NEIGH10:PS:MS:RMS:RED:BCC:FCC:ICN:OCN:PROB 0/1:66:60:3:0:0:0:3:3:16.89:0:1:2:54:0:0:0.938:0.789:0.842:0.559
1 1136860 78532 C <DEL> . PASS SVMETHOD=DYSGUv1.3.5;SVTYPE=DEL;END=1136945;CHR2=1;GRP=1565924;NGRP=1;CT=3to5;CIPOS95=0;CIEND95=0;SVLEN=85;CONTIGA=GTTAATTTGCTTGCAAGAAGTTTGAGCCTTTCTGGTCTCGCTTTTACGATGCATTGAAAGTGAGCCTGGAGCGGGGTTTATTCTAAGAATGATTATTTCCCAtaattcctggtcctgtgtgagtgccagccaccgtttcctcgtgtccctctgg;KIND=intra_regional;GC=46.75;NEXP=0;STRIDE=0;EXPSEQ;RPOLY=0;OL=0;SU=6;WR=0;PE=0;SR=0;SC=6;BND=6;LPREC=0;RT=pe GT:GQ:MAPQP:SU:WR:PE:SR:SC:BND:COV:NEIGH10:PS:MS:RMS:RED:BCC:FCC:ICN:OCN:PROB 0/1:28:60:6:0:0:0:6:6:16.89:0:0:6:76:28:0:0.938:0.789:0.842:0.572
Second, I found that the REF allele was not identical to the reference sequence at that POS.
Finally, for insertions, I would expect the END tag to equal POS; however, they differ in the output, for example:
1 8058312 59 G TAGCTAGCTAGCTAGCTAGATCTATAAATAGATAGATAG . PASS SVMETHOD=DYSGUv1.3.5;SVTYPE=INS;END=8058322;CHR2=1;GRP=2400;NGRP=1;CT=5to3;CIPOS95=0;CIEND95=0;SVLEN=40;KIND=extra-regional;GC=32.03;NEXP=0;STRIDE=0;EXPSEQ;RPOLY=42;OL=0;SU=13;WR=0;PE=1;SR=3;SC=12;BND=9;LPREC=0;RT=pe GT:GQ:MAPQP:SU:WR:PE:SR:SC:BND:COV:NEIGH10:PS:MS:RMS:RED:BCC:FCC:ICN:OCN:PROB 0/1:147:60:13:0:1:3:12:9:16.65:0:11:6:0:0:13:0.655:1.526:1:0.627
Sincerely,
Zheng zhuqing
Dear,
Thanks for this very promising tool.
I fetched the container, but when I run the following commands I get an error; can you please help me?
Other Docker images do run on my machine (Ubuntu 20 server).
Thanks
$ sudo docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
kcleal/dysgu latest 5e3234d8d9f0 3 months ago 1.98GB
$ sudo docker run kcleal/dysgu test # no output and no error
sudo docker run kcleal/dysgu --help
docker: Error response from daemon: failed to create shim: OCI runtime create failed: runc create failed: unable to start container process: exec: "--help": executable file not found in $PATH: unknown.
ERRO[0000] error waiting for container: context canceled
sudo docker run kcleal/dysgu run --help
docker: Error response from daemon: failed to create shim: OCI runtime create failed: runc create failed: unable to start container process: exec: "run": executable file not found in $PATH: unknown.
ERRO[0000] error waiting for container: context canceled
Hi,
And thanks for such a useful tool.
I'm exploring polymorphisms including previous data (in addition to new ones) and I'm trying to feed a VCF with --sites to dysgu run (conda env with python 3.9). I get the following error:
File "/opt/miniconda3/envs/dysgu/lib/python3.9/site-packages/dysgu/sites_utils.py", line 70, in vcf_reader
svt = r.info["SVTYPE"]
File "pysam/libcbcf.pyx", line 2581, in pysam.libcbcf.VariantRecordInfo.getitem
KeyError: 'Unknown INFO field: SVTYPE'
The VCF is v4.2 generated with bcftools 1.13
What can I do to fix the problem?
Thanks!
NP
Hello, thank you for creating this great software. I look forward to using this software in my analyses.
However, I have a question: does Dysgu support PacBio long reads? I know you mention that it supports PacBio HiFi reads, but I was wondering about other PacBio long reads. Thank you in advance, and I apologize if I missed this information.
I was interested in trying Dysgu, and created a fresh conda environment with python 3.10. When I tried to install dysgu within that fresh env, I got the following error:
(dysgu)$ pip install dysgu
ERROR: Could not find a version that satisfies the requirement dysgu (from versions: none)
ERROR: No matching distribution found for dysgu
This seemed odd to me, and may have something to do with the new version released one day before.
However, when I dropped my Python version down to 3.7 it went through (I did have to install numpy first). This is just for your documentation or for others' troubleshooting.
Hello,
I've been trying to run dysgu v1.3.10 using a conda approach (which is what I've done for previous versions). However, I keep encountering a strange error:
Activating conda environment: <redacted>/CSL_pipeline_benchmark/scripts/.snakemake/conda/a45103408c16f97aed2118b0a7687435
2022-04-20 11:11:43,425 [INFO ] [dysgu-run] Version: 1.3.10
2022-04-20 11:11:43,429 [INFO ] run -o <redacted>/CSL_pipeline_benchmark/pipeline/structural_variant_calls/hg38_T2T_masked/sentieon_mm2-202112.01/dysgu-1.3.10/HALB3010452.vcf --mode pacbio --pl pacbio --procs 16 --search <redacted>/CSL_pipeline_benchmark/data_files/hg38_contigs.bed.gz <redacted>/reference/hg38_T2T_masked/hg38.fa <redacted>/CSL_pipeline_benchmark/pipeline/structural_variant_calls/hg38_T2T_masked/sentieon_mm2-202112.01/dysgu-1.3.10/HALB3010452_tmp <redacted>/CSL_pipeline_benchmark/pipeline/merged_alignments/hg38_T2T_masked/sentieon_mm2-202112.01/HALB3010452.bam
2022-04-20 11:11:43,430 [INFO ] Destination: <redacted>/CSL_pipeline_benchmark/pipeline/structural_variant_calls/hg38_T2T_masked/sentieon_mm2-202112.01/dysgu-1.3.10/HALB3010452_tmp
2022-04-20 11:11:43,430 [INFO ] Searching regions from <redacted>/CSL_pipeline_benchmark/data_files/hg38_contigs.bed.gz
Traceback (most recent call last):
File "<redacted>/CSL_pipeline_benchmark/scripts/.snakemake/conda/a45103408c16f97aed2118b0a7687435/bin/dysgu", line 8, in <module>
sys.exit(cli())
File "<redacted>/CSL_pipeline_benchmark/scripts/.snakemake/conda/a45103408c16f97aed2118b0a7687435/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "<redacted>/CSL_pipeline_benchmark/scripts/.snakemake/conda/a45103408c16f97aed2118b0a7687435/lib/python3.8/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "<redacted>/CSL_pipeline_benchmark/scripts/.snakemake/conda/a45103408c16f97aed2118b0a7687435/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "<redacted>/CSL_pipeline_benchmark/scripts/.snakemake/conda/a45103408c16f97aed2118b0a7687435/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "<redacted>/CSL_pipeline_benchmark/scripts/.snakemake/conda/a45103408c16f97aed2118b0a7687435/lib/python3.8/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "<redacted>/CSL_pipeline_benchmark/scripts/.snakemake/conda/a45103408c16f97aed2118b0a7687435/lib/python3.8/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "<redacted>/CSL_pipeline_benchmark/scripts/.snakemake/conda/a45103408c16f97aed2118b0a7687435/lib/python3.8/site-packages/dysgu/main.py", line 270, in run_pipeline
max_cov_value = sv2bam.process(ctx.obj)
File "dysgu/sv2bam.pyx", line 184, in dysgu.sv2bam.process
File "dysgu/sv2bam.pyx", line 58, in dysgu.sv2bam.parse_search_regions
File "dysgu/sv2bam.pyx", line 59, in dysgu.sv2bam.parse_search_regions
File "<redacted>/CSL_pipeline_benchmark/scripts/.snakemake/conda/a45103408c16f97aed2118b0a7687435/lib/python3.8/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
I'm not sure what's causing this; the BAM files have worked for all of my other tests. I am also happy to try the Docker image if you'd prefer that, but I would need it versioned on Docker Hub (I only saw "latest" earlier).
Thanks!
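For what it's worth, byte 0x8b at position 1 matches the gzip magic bytes (0x1f 0x8b), which suggests the gzipped --search BED is being opened as plain text somewhere. A hedged sketch of a reader that handles both cases (this is illustration, not dysgu's actual code):

```python
import gzip

def open_text_auto(path):
    """Open a text file such as a BED, transparently handling gzip.

    The gzip magic bytes are 0x1f 0x8b, consistent with the
    UnicodeDecodeError above ("byte 0x8b in position 1").
    """
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")
    return open(path, "rt")
```

Decompressing the BED (or passing an uncompressed copy to --search) may be a workable interim fix under that assumption.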