Comments (3)
Hi @warthmann,
The documentation is not very clear, apologies. When you filter against a normal sample, the calls returned should be unique to the mutant line. In theory this should mean all the sites in the normal are 0/0. In practice, some sites may have low support. When you supply a --sites file, the whole genome is analysed, not just the regions around the input sites. Any variant in the --sites file will make SV detection more sensitive around that site, but all SVs from the genome will still be reported, which is consistent with your output.
I think the behaviour you were expecting was a bit different: using dysgu only to re-genotype a file (rather than for discovery as well). If you only want re-genotyping around the input sites, you can provide a bed file of regions to search using the --search option.
Hope that helps
from dysgu.
hello,
thank you for this clarification.
Regarding the --search flag in dysgu run: I can parse the required positions from the vcf. When converting these into a .bed file, will I need to pad the positions in some way, or would you recommend using the (same) position of the variant as BEGIN and END in the .bed? Alternatively, can I suggest including a feature similar to --all-sites for dysgu --filter? I.e., when the flag is set, dysgu --filter would return a vcf that also contains entries for a list of specified samples, or for all samples for which a bam file is provided.
thanks a lot
That is a good suggestion, I can try and add that in the next release.
In the meantime, you will need to parse the dysgu vcf to extract the regions surrounding each breakpoint. I recommend using pysam for this. The pad should be similar to the insert size. Pseudocode would be along these lines:
import pysam

vcf = pysam.VariantFile('dysgu.vcf')
insert = 200  # pad, roughly the library insert size
with open('out.bed', 'w') as outbed:
    for v in vcf.fetch():
        # region around the first breakpoint (BED starts cannot be negative)
        outbed.write(f"{v.chrom}\t{max(0, v.pos - insert)}\t{v.pos + insert}\n")
        if v.info['SVTYPE'] != 'TRA':
            # second breakpoint on the same chromosome (v.stop is the record END)
            outbed.write(f"{v.chrom}\t{max(0, v.stop - insert)}\t{v.stop + insert}\n")
        else:
            # translocation: second breakpoint is on the partner chromosome
            pos2 = v.info['CHR2_POS']
            outbed.write(f"{v.info['CHR2']}\t{max(0, pos2 - insert)}\t{pos2 + insert}\n")
Dysgu will sort and merge intervals as needed.
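If you want to inspect the regions before handing them to --search, the padding and merging can be sketched in plain Python. This is only an illustration of what "sort and merge" means for padded breakpoints, not dysgu's own implementation; pad_and_merge is a hypothetical helper name, and the 200 bp pad is just the example insert size from above.

```python
from collections import defaultdict


def pad_and_merge(breakpoints, pad=200):
    """Pad each (chrom, pos) breakpoint by `pad` bases and merge overlaps.

    Hypothetical helper for illustration; returns a sorted list of
    (chrom, start, end) tuples that can be written out as BED lines.
    """
    by_chrom = defaultdict(list)
    for chrom, pos in breakpoints:
        # Clamp the start at 0 so the BED interval stays valid
        by_chrom[chrom].append((max(0, pos - pad), pos + pad))

    merged = []
    for chrom in sorted(by_chrom):
        intervals = sorted(by_chrom[chrom])
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:
                # Overlapping or touching: extend the current interval
                cur_end = max(cur_end, end)
            else:
                merged.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((chrom, cur_start, cur_end))
    return merged


# Two nearby breakpoints on chr1 collapse into a single search region
regions = pad_and_merge([("chr1", 1000), ("chr1", 1150), ("chr2", 50)], pad=200)
for chrom, start, end in regions:
    print(f"{chrom}\t{start}\t{end}")
```

Since dysgu merges intervals itself, this step is optional; it is mainly useful for checking how much of the genome the padded regions actually cover.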