Hi, I am trying to merge SVs discovered from Oxford Nanopore data (plant species,


Merging SV from Oxford Nanopore - expected runtime (dysgu issue, closed)

kcleal commented on May 20, 2024


Comments (3)

kcleal commented on May 20, 2024

Hi @agolicz,
Thanks for reporting this. That is much longer than expected; I will try to help get this fixed. Would you mind checking a few things for me? If possible, could you check how much memory is being consumed? Also, if you have time, would you mind trying to merge just two of your samples rather than the whole cohort? This should only take a minute or so, and it would be useful to know whether it completes in a reasonable time. I have mainly tested merging on larger cohorts of short-read data, so it's possible there is a scaling issue for long-read data.

I had a quick scan of the code, and it looks like there might be a scaling issue if there is a complex region of the genome with lots of SVs that overlap each other, or with lots of diversity. This situation is common near centromeric regions in humans, for example. Merging will essentially be an all-vs-all comparison in these regions, which could account for the long runtime. However, dysgu usually gives these types of rearrangements low probability, so it's possible you have filtered those out with the flt_vcf.py script?
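To get a feel for why dense regions could dominate the runtime, the sketch below pools the input VCFs, chains records whose POS falls within the merge distance of the previous record on the same chromosome, and sums n*(n-1)/2 comparisons per chain, i.e. the cost of an all-vs-all pass inside each cluster. This is a rough, hypothetical estimate: the `estimate_pairwise_work` function and its single-linkage chaining rule are assumptions for illustration, not dysgu's actual merge algorithm.

```shell
# Hypothetical helper: estimate_pairwise_work DIST FILE.vcf...
# Rough estimate of the comparisons a naive all-vs-all merge would make.
# Assumption: records are chained into one cluster while each POS is within
# DIST bp of the previous record on the same chromosome (single linkage).
# This is an illustration only, not dysgu's actual merge algorithm.
estimate_pairwise_work() {
  local dist="$1"; shift
  # Take CHROM and POS from every non-header line of every input VCF.
  grep -hv '^#' "$@" \
    | cut -f1,2 \
    | sort -k1,1 -k2,2n \
    | awk -v d="$dist" -F'\t' '
        BEGIN { n = 0; total = 0; largest = 0; clusters = 0 }
        # Close the current cluster: an all-vs-all pass over n records
        # costs n*(n-1)/2 comparisons.
        function flush() {
          total += n * (n - 1) / 2
          if (n > largest) largest = n
        }
        {
          # Start a new cluster on a new chromosome or a gap larger than d.
          if ($1 != chrom || $2 - last > d) { flush(); n = 0; clusters++ }
          n++; chrom = $1; last = $2
        }
        END {
          flush()
          printf "clusters=%d largest=%d pairwise=%.0f\n", clusters, largest, total
        }'
}
```

For example, `estimate_pairwise_work 500 *pass.vcf` would use the 500 bp merge distance reported in the logs. Because chaining can link long runs of nearby records, the count is an upper-bound style estimate; a very large `largest` cluster would point towards the kind of complex region described above.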


agolicz commented on May 20, 2024

Hi,
Thanks for the reply.
It actually finished:

2022-06-14 21:50:43,619 [INFO   ]  [dysgu-merge] Version: 1.3.11
2022-06-14 21:54:54,489 [INFO   ]  Merge distance: 500 bp
2022-06-15 08:56:46,338 [INFO   ]  SVs output to stdout
2022-06-15 08:56:46,342 [INFO   ]  Input samples: ['a702', 'a703', 'a705', 'a709', 'a711', 'a714', 'a715', 'a716', 'a717', 'a723', 'a724', 'a726', 'a727', 'a728', 'a729', 'a730', 'a731', 'a732', 'a733', 'a734', 'a735', 'a743', 'a748', 'a762', 'a764', 'a765', 'a776', 'a778', 'a779', 'a783', 'a784', 'a786', 'a790', 'a792', 'a796', 'a797', 'a802', 'a810', 'a815', 'a816', 'a817', 'a818', 'a820', 'a823', 'a824', 'a825', 'a827', 'a828', 'a830', 'a833', 'a834', 'a835', 'a836', 'a838', 'a839', 'a840', 'a841', 'a842', 'a843', 'a845']
2022-06-15 09:08:44,000 [INFO   ]  Sample rows before merge [27524, 39428, 36668, 18995, 42091, 42029, 30433, 26344, 27145, 40743, 1269, 36795, 36315, 43578, 39322, 43658, 35629, 37191, 41179, 29260, 34499, 20557, 39690, 36079, 38506, 42710, 49017, 40979, 41427, 44644, 42835, 40906, 42328, 40995, 32513, 51, 41343, 40148, 39025, 46364, 39965, 20092, 31426, 35372, 37474, 14263, 20316, 33790, 44510, 40746, 37944, 19803, 40951, 32969, 24995, 10389, 24662, 31023, 15909, 26496], rows after 375685
2022-06-15 09:08:44,009 [INFO   ]  dysgu merge complete h:m:s, 11:18:00

cat *pass.vcf | grep -v "^#" | wc -l
2013307
grep -v "^#" long.vcf | wc -l
375685
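The two counts above can also be broken down per file, which makes it easier to line them up against the per-sample "Sample rows before merge" values in the log. A minimal sketch, assuming bash and a standard grep; `count_vcf_records` is a hypothetical helper name:

```shell
# Hypothetical helper: count_vcf_records FILE.vcf...
# Prints "<file>\t<records>" per VCF, where records are non-header lines
# (what `grep -v "^#" | wc -l` counts), followed by a "total" line.
count_vcf_records() {
  local f n total=0
  for f in "$@"; do
    # -v inverts the match and -c counts lines: lines not starting with #.
    # grep exits 1 when the count is 0, so "|| true" keeps set -e scripts alive.
    n=$(grep -vc '^#' "$f" || true)
    printf '%s\t%s\n' "$f" "$n"
    total=$((total + n))
  done
  printf 'total\t%s\n' "$total"
}
```

Running `count_vcf_records *pass.vcf` before merging, and `count_vcf_records long.vcf` afterwards, gives the same totals as the `wc -l` pipelines, with the per-sample detail kept.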

Yes, flt_vcf.py only keeps the variants with PASS.
I can't check the exact memory usage, but it must have been under 40 GB, as that was the limit.
Now trying just two files:

dysgu merge a843.pass.vcf a845.pass.vcf > dm.t.vcf
2022-06-15 11:54:15,586 [INFO   ]  [dysgu-merge] Version: 1.3.11
2022-06-15 11:54:18,394 [INFO   ]  Merge distance: 500 bp
2022-06-15 11:54:34,354 [INFO   ]  SVs output to stdout
2022-06-15 11:54:34,392 [INFO   ]  Input samples: ['a843', 'a845']
2022-06-15 11:54:47,542 [INFO   ]  Sample rows before merge [15909, 26496], rows after 36978
2022-06-15 11:54:47,543 [INFO   ]  dysgu merge complete h:m:s, 0:00:31
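Extending the two-sample test, one way to see how runtime grows with cohort size is to merge progressively larger subsets of samples. The sketch below is hypothetical (`time_merge_subsets` is not part of dysgu). It calls `dysgu merge` exactly as in the command above, with the VCFs as arguments and the merged VCF written to stdout, and times each run with `date +%s`, so timings have one-second resolution.

```shell
# Hypothetical helper (not part of dysgu): time_merge_subsets FILE.vcf...
# Requires bash (arrays, arithmetic for-loop).
# Merges the first 2, 4, 8, ... VCFs with `dysgu merge` and reports the
# wall-clock seconds and the number of merged records for each subset size.
time_merge_subsets() {
  local -a vcfs=("$@")
  local k start secs rows out
  out=$(mktemp)
  for ((k = 2; k <= ${#vcfs[@]}; k *= 2)); do
    start=$(date +%s)
    # Same invocation as in the thread: inputs as arguments, VCF on stdout.
    dysgu merge "${vcfs[@]:0:$k}" > "$out"
    secs=$(( $(date +%s) - start ))
    rows=$(grep -vc '^#' "$out" || true)
    printf 'samples=%d seconds=%d rows_after=%s\n' "$k" "$secs" "$rows"
  done
  rm -f "$out"
}
```

With the 60 samples listed above, `time_merge_subsets *pass.vcf` would merge the first 2, 4, 8, 16 and 32 files in glob order; the full cohort is not included because the loop stops at the largest power of two. A runtime that roughly quadruples each time the sample count doubles would be consistent with the all-vs-all behaviour suspected earlier in the thread.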

11 hours is not too bad (we're used to that in plants :)). I was just surprised because merging 100 short-read samples was much quicker.

If you are interested in testing merging for long reads, I am planning to run SVJedi for genotyping and can report any issues (sites with too many missing genotypes, etc.).
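Once the merged sites have been genotyped, sites with too many missing genotypes can be flagged with a generic VCF check like the one below. It is a sketch under a few assumptions: the genotyped output is a multi-sample VCF, `GT` is the first FORMAT key (as the VCF specification requires when GT is present), and a call counts as missing only if all of its alleles are `.` (e.g. `.`, `./.`, `.|.`). `missing_gt_per_site` is a hypothetical helper, not part of SVJedi or dysgu.

```shell
# Hypothetical helper (not part of SVJedi or dysgu): missing_gt_per_site FILE.vcf
# Prints CHROM, POS, ID and the fraction of samples with a missing genotype
# for every record of a multi-sample VCF.
# Assumptions: GT is the first FORMAT key (VCF spec rule when GT is present),
# and a call is missing only if all of its alleles are "." (., ./., .|.).
missing_gt_per_site() {
  awk -F'\t' '
    /^#/ { next }                          # skip header lines
    $9 != "GT" && $9 !~ /^GT:/ { next }    # skip records without GT
    {
      nsamp = NF - 9; missing = 0
      for (i = 10; i <= NF; i++) {
        split($i, f, ":")                  # f[1] is the GT sub-field
        g = f[1]
        gsub("[./|]", "", g)               # drop dots and allele separators
        if (f[1] != "" && g == "") missing++
      }
      printf "%s\t%s\t%s\t%.3f\n", $1, $2, $3, (nsamp > 0 ? missing / nsamp : 0)
    }' "$1"
}
```

For example, `missing_gt_per_site genotyped.vcf | awk -F'\t' '$4 > 0.5'` would list sites where more than half of the samples have no call (the file name is a placeholder).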

minimap2+dysgu have done very well in our in-house comparisons for Brassica napus! :)


kcleal commented on May 20, 2024

Glad it finished! I think the runtime is probably caused by high genome complexity in that case. I would be very interested to hear how you get on; feedback from users is very valuable! If you have not come across it already, Jasmine could also be a useful tool for merging: https://github.com/mkirsche/Jasmine

