Hello,
I downloaded the latest version of all files in the directory /pub/dbVar/sandbox/sv_datasets/nonredundant/deletions.
However, the SVs on chromosomes 1, 2, 9, 10, 11, 12 and X seem to be missing from some files (e.g. GRCh37.nr_deletions.tsv, GRCh37.nr_deletions.bed, GRCh38.nr_deletions.tsv, and GRCh38.nr_deletions.bed).
For example, when I checked the file with:
sed '1,2d' GRCh38.nr_deletions.tsv | cut -f1 | sort -k1,1V | uniq -c
it gives:
183462 3
215779 4
179941 5
186375 6
177445 7
156126 8
100796 13
102341 14
88040 15
103951 16
97261 17
84456 18
89493 19
73053 20
46757 21
51888 22
7323 Y
56 mt
But GRCh38.nr_deletions.pathogenic.tsv (which I believe is a subset of GRCh38.nr_deletions.tsv) contains SVs from all chromosomes:
sed '1,2d' GRCh38.nr_deletions.pathogenic.tsv | cut -f1 | sort -k1,1V | uniq -c
1126 1
1657 2
788 3
499 4
677 5
729 6
863 7
561 8
725 9
452 10
691 11
370 12
409 13
321 14
765 15
1415 16
1175 17
342 18
489 19
272 20
189 21
661 22
1847 X
76 Y
16 mt
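In case it helps to reproduce this, here is a small shell sketch that lists the chromosomes present in the pathogenic subset but absent from the full file. The function names chroms and missing_chroms are my own (hypothetical), and the sketch assumes both files are tab-separated with two header lines and the chromosome in column 1, as in the commands above. It also assumes mktemp and comm are available.

```shell
# Hypothetical helper: print the sorted, unique chromosome names of one file.
# Assumes two header lines and the chromosome name in column 1.
chroms() {
    sed '1,2d' "$1" | cut -f1 | LC_ALL=C sort -u
}

# Hypothetical helper: print chromosomes found in the subset file ($2)
# but not in the full file ($1), one per line.
missing_chroms() {
    full_list=$(mktemp)
    subset_list=$(mktemp)
    chroms "$1" > "$full_list"
    chroms "$2" > "$subset_list"
    # comm -13 suppresses lines unique to the first list and lines common
    # to both, leaving only lines unique to the second (subset) list.
    LC_ALL=C comm -13 "$full_list" "$subset_list"
    rm -f "$full_list" "$subset_list"
}

# Usage:
#   missing_chroms GRCh38.nr_deletions.tsv GRCh38.nr_deletions.pathogenic.tsv
```

Going by the counts above, I would expect this to report 1, 2, 9, 10, 11, 12 and X for the GRCh38 files. The same call could be repeated for the GRCh37 files.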
It would be great if you could help update the files. Thank you!
Best,
Anson