mquinodo / automap
Tool to find regions of homozygosity (ROHs) from sequencing data.
The following warning was observed when running AutoMap_v1.2.sh:
/home/AutoMap/Scripts/common_analysis.sh: line 83: warning: setlocale: LC_NUMERIC: cannot change locale (en_US.UTF-8): No such file or directory
Running "locale" showed the locales configured on the local system:
LANG=en_GB.UTF-8
LANGUAGE=en_GB:en
LC_CTYPE="en_GB.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_PAPER="en_GB.UTF-8"
LC_NAME="en_GB.UTF-8"
LC_ADDRESS="en_GB.UTF-8"
LC_TELEPHONE="en_GB.UTF-8"
LC_MEASUREMENT="en_GB.UTF-8"
LC_IDENTIFICATION="en_GB.UTF-8"
LC_ALL=
Changing en_US.UTF-8 to en_GB.UTF-8 on line 83 of /home/AutoMap/Scripts/common_analysis.sh solved the issue. A similar issue may occur for others where en_US.UTF-8 is not available on the local system.
Hi,
I'm reporting a small misprint in the .HomRegions.tsv output of AutoMap_v1.2.sh. The footer reads:
AutoMap v1.0 used for analysis
Thanks for the great tool. It is easy to set up and run, and it quickly produces useful, easy-to-understand output.
Best,
George
After processing the VCF file, we get the number of variants found in each region (the Nb_variants column below). Can we get a list of those variants?
#Chr Begin End Size(Mb) Nb_variants Percentage_homozygosity
chr1 41513572 44129738 2.62 58 89.66
chr1 86447582 89713973 3.27 42 90.48
chr2 128182225 130580573 2.40 27 88.89
chr3 38698526 42184659 3.49 36 91.67
chr4 81447479 88390810 6.94 48 95.83
chr5 139380057 146062570 6.68 85 89.41
chr5 146515817 148402169 1.89 29 93.10
chr6 46716485 47879229 1.16 37 94.59
chr11 47338502 49154505 1.82 26 88.46
chr15 51377428 55196867 3.82 40 92.50
chr19 18938389 22180787 3.24 42 95.24
chr22 27224553 35082493 7.86 81 93.83
For example, I want to know which 58 variants are in the first region on chromosome 1. Is there a way to get that?
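One way to approach this outside AutoMap is to pull the records that fall inside a region's coordinates straight from the input VCF. The sketch below assumes an uncompressed, tab-separated VCF; variants_in_region is a hypothetical helper and not part of AutoMap. Because AutoMap filters variants before counting them, the result may contain more records than the Nb_variants value.

```shell
# Hypothetical helper (not part of AutoMap): print the VCF records whose
# position lies inside one region of the regions table.
# usage: variants_in_region <file.vcf> <chrom> <begin> <end>
variants_in_region() {
    awk -F'\t' -v chrom="$2" -v begin="$3" -v end="$4" \
        '!/^#/ && $1 == chrom && $2 + 0 >= begin + 0 && $2 + 0 <= end + 0' "$1"
}

# Example with the first region of the table above (the file name is a placeholder):
# variants_in_region sample.vcf chr1 41513572 44129738
```

For a gzip-compressed VCF, the file could first be decompressed to a temporary file with gzip -dc.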
Hi!
I was thinking about how to calculate what percentage of an exome the total sum of ROHs represents (to try to estimate some kind of inbreeding coefficient). My problem is how to estimate the total length of the exome in question according to AutoMap for each sample, and how this depends on the parameters selected.
Any help?
Thanks in advance
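The arithmetic itself can be sketched as below, but the choice of denominator remains the open question. The sketch assumes a tab-separated .HomRegions.tsv with the size in Mb in the fourth column, as in the tables shown on this page; roh_fraction and the denominator argument are hypothetical, and nothing here reflects how AutoMap itself defines the covered length.

```shell
# Hypothetical helper (not part of AutoMap): sum the Size(Mb) column of a
# regions table and divide it by a denominator chosen by the user.
# usage: roh_fraction <sample.HomRegions.tsv> <denominator_mb>
roh_fraction() {
    LC_ALL=C awk -F'\t' -v denom="$2" '
        # Keep only data rows: skip "#" headers and rows without a numeric size.
        !/^#/ && $4 ~ /^[0-9]+(\.[0-9]+)?$/ { total += $4 }
        END { printf "%.2f Mb in ROHs, fraction %.4f of %s Mb\n", total, total / denom, denom }
    ' "$1"
}

# Example (DENOM_MB is a placeholder the user has to decide on):
# roh_fraction sample.HomRegions.tsv "$DENOM_MB"
```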
Hi,
I'm using the latest version (v1.2). When running AutoMap_v1.2.sh with the --multivcf option, I get an error because it tries to call the v1.0 bash file.
Command line :
AutoMap_v1.2.sh --vcf ../multiSample_annotated.vcf.gz --genome hg38 --out multiSample_AutoMap_HM --multivcf
Error :
## Launching analyis for sample: Sample_Name1
bash: /shared/home/user/AutoMap/AutoMap_v1.0.sh: Aucun fichier ou dossier de ce type
## Launching analyis for sample: Sample_Name2
bash: /shared/home/user/AutoMap/AutoMap_v1.0.sh: Aucun fichier ou dossier de ce type
"Aucun fichier ou dossier de ce type" means "No such file or directory".
Best,
Julien
Dear team,
Thanks for the great software!
I have two questions. I have a VCF file generated using GATK.
Regards,
Najeeb
Hi, thank you for making the project open-source!
I was trying to run the test case, but it looks like it is failing with the new version.
The only change I made was replacing grep -P with grep -E in the main script.
bash AutoMap_v1.2.sh --vcf Test/TestSample.vcf --out testing/ --genome hg19
# bcftools higher or equal to v1.9
# bedtools higher or equal to v2.24.0
# perl higher or equal to v5.22.0
# R higher or equal to v3.2.0
## Parameters used by default:
-> No use of --DP option, value set as default: 8
-> No use of --binomial option, value set as default: 0.000001
-> No use of --percaltlow option, value set as default: 0.25
-> No use of --percalthigh option, value set as default: 0.75
-> No use of --window option, value set as default: 7
-> No use of --windowthres option, value set as default: 5
-> No use of --minsize option, value set as default: 1
-> No use of --minvar option, value set as default: 25
-> No use of --minperc option, value set as default: 88
-> No use of --maxgap option, value set as default: 10
-> chrX will NOT be included in the analysis and in the graphics.
-> Homozygosity regions will be extended to nearest variant with maximum of 1 Mb.
AutoMap_v1.2.sh: line 326: [: : integer expression expected
1) Parsing of VCF file and variant filtering
* 111569 variants before filtering
* 80260 variants after filtering
2) Detection of ROHs with sliding window, trimming and extension
* Treatment of the data
* Printing of the homozygous regions
3) Filtering of regions found and output to text file
* 0 regions before filtering
* 0 regions after filtering with 0 Mb in total
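The message "[: : integer expression expected" is what the shell prints when an empty string is used in a numeric test. Line 326 itself is not shown in this report, so the variable name and threshold below are assumptions for illustration only; the sketch shows how such a test can be guarded so that an empty value is handled explicitly.

```shell
# An empty value reproduces the message: [ "" -lt 10000 ] is not a valid integer test.
count=""   # stands in for a result that came back empty from an earlier command

# ${count:-0} substitutes 0 when the variable is empty or unset,
# so the comparison stays valid and the real problem can be reported.
if [ "${count:-0}" -lt 10000 ]; then
    status="too few variants (count='${count}')"
else
    status="ok"
fi
echo "$status"
```

An empty count usually means that the command producing it failed upstream, so that command is the place to look first.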
Hi,
You should also change AutoMap_v1.2.sh to AutoMap_v1.3.sh in the code at lines 283, 286, 289 and 292.
Best
bash /data/software_wdh/AutoMap-master/AutoMap_v1.2.sh --vcf /data/software_wdh/AutoMap-master/Test/TestSample.vcf --out /data/software_wdh/AutoMap-master/test/ --genome hg19
-> No use of --DP option, value set as default: 8
-> No use of --binomial option, value set as default: 0.000001
-> No use of --percaltlow option, value set as default: 0.25
-> No use of --percalthigh option, value set as default: 0.75
-> No use of --window option, value set as default: 7
-> No use of --windowthres option, value set as default: 5
-> No use of --minsize option, value set as default: 1
-> No use of --minvar option, value set as default: 25
-> No use of --minperc option, value set as default: 88
-> No use of --maxgap option, value set as default: 10
-> chrX will NOT be included in the analysis and in the graphics.
-> Homozygosity regions will be extended to nearest variant with maximum of 1 Mb.
grep: this version of PCRE is compiled without UTF support
bash AutoMap_v1.2.sh -h
bcftools higher or equal to v1.9
bedtools higher or equal to v2.24.0
perl higher or equal to v5.22.0
R higher or equal to v3.2.0
ERROR: Usage: AutoMap_v1.2.sh [--vcf ] [--genome <hg19|hg38>] [--out ] [--common] [--id ] [--panel ] [--panelname ] [--DP <0-99>] [--binomial <0-1.0>] [--percaltlow <0-1.0>] [--percalthigh <0-1.0>] [--window <3-999>] [--windowthres <1-999>] [--minsize <0-99>] [--minvar <1-999>] [--minperc <0-100>] [--maxgap <0-1000Mb>] [--chrX] [--extend <0-100Mb>]. Exit.
bash /home/databench/AutoMap_v1.2_nonhuman.sh --vcf Routput_raw_variants.vcf --out Routput_directory --DP 8 --binomial 0.000001 --percaltlow 0.25 --percalthigh 0.75 --window 7 --windowthres 5 --minsize 2 --minvar 2 --minperc 88 --maxgap 10 --chrX No --extend 1
Any suggestions on how to fix this error?
Hi,
I ran AutoMap v1.2, with 2 vcf files, --common and --chrX option.
I've noticed that the output.tsv file is not properly sorted by the Begin position.
#Chr Begin End Size(Mb)
chr2 114204074 115353814 1.15
chr5 130804987 132560051 1.76
chr8 49371142 50469394 1.10
chr11 73507185 74558585 1.05
chrX 2782116 4985215 2.20
chrX 4985295 9404537 4.42
chrX 9414219 49319525 39.91
chrX 49591026 52857711 3.27
chrX 52862903 55145938 2.28
chrX 55146275 62965055 7.82
chrX 62965138 67415295 4.45
chrX 72794567 74583034 1.79
chrX 67415333 72755923 5.34
chrX 74583402 89204206 14.62
chrX 89207472 119469743 30.26
chrX 119471254 131678991 12.21
chrX 131680824 136873868 5.19
chrX 136874416 141007457 4.13
chrX 144085451 149596554 5.51
chrX 141111645 144078700 2.97
chrX 153605174 155469854 1.86
chrX 150215170 153605159 3.39
In this example, the line chrX 74583402 89204206 14.62 is after the line chrX 67415333 72755923 5.34.
Same with chrX 141111645 144078700 2.97 and chrX 144085451 149596554 5.51.
I do not know if it is related to chrX. It does not seem to be important, but I was confused at first.
Another question: how does the tool know whether a sample is female or male?
Best,
Julien
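Until the ordering is addressed in the tool, the table can be re-sorted after the run. The sketch below assumes a tab-separated file whose header lines start with "#" and in which every other line is a region row; sort_roh_table is a hypothetical helper, not part of AutoMap. It keeps the chromosomes in the order in which they first appear and sorts by Begin within each chromosome.

```shell
# Hypothetical helper (not part of AutoMap): sort a tab-separated regions table
# by Begin within each chromosome, keeping chromosomes in their original order.
# usage: sort_roh_table <regions.tsv>
sort_roh_table() {
    tab=$(printf '\t')
    grep '^#' "$1"                              # header lines first, unchanged
    grep -v '^#' "$1" |
        awk -F'\t' -v OFS='\t' '
            !($1 in rank) { rank[$1] = ++n }    # rank = order of first appearance
            { print rank[$1], $0 }' |
        sort -t "$tab" -k1,1n -k3,3n |          # by chromosome rank, then Begin
        cut -f2-                                # drop the helper column
}

# Example (file names are placeholders):
# sort_roh_table output.tsv > output.sorted.tsv
```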
hi there,
Thanks for this tool.
I am interested in using this tool on WES data. I have two cohorts of 4K and 11K samples.
I have one VCF file per chromosome containing all these individuals. I see that the --common option cannot be used with the --multivcf flag.
I use
bash $AUTOMAP_HOME/AutoMap_v1.0.sh --vcf $VCF_file --multivcf --out ROH_output/CHR22 --genome hg19
It generates VCF per sample. With 22 CHRs I will have 22 times 4K VCFs.
Next, I would like to get common ROHs from these.
--vcf VCF1,VCF2,VCF3
Is there a way to provide the list of VCFs in a file? I do not think it is fun to provide a list of 4K/11K VCFs in a bash string.
Let me know if you need any help with code/structuring or testing this.
best,
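As a workaround on the shell side, the comma-separated string can be built from a file with one VCF path per line. This is only a sketch: vcf_list_to_arg and vcf_list.txt are hypothetical names, and whether AutoMap copes with thousands of files in one --vcf value is not something this page confirms. Very long lists may also hit the operating system's limit on command-line length.

```shell
# Hypothetical helper: turn a file with one VCF path per line into the
# comma-separated form used with --vcf (VCF1,VCF2,VCF3).
# usage: vcf_list_to_arg <vcf_list.txt>
vcf_list_to_arg() {
    # Drop blank lines, then join the remaining paths with commas.
    grep -v '^[[:space:]]*$' "$1" | paste -sd, -
}

# Example (paths are placeholders):
# bash "$AUTOMAP_HOME"/AutoMap_v1.0.sh --vcf "$(vcf_list_to_arg vcf_list.txt)" \
#     --genome hg19 --out ROH_output --common
```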
Dear sir,
I tried to use the command:
bash AutoMap_v1.0.sh --vcf 0366.HC.vcf 0367.HC.vcf 0368.HC.vcf /0369.HC.vcf --out /working_space/ --genome hg38
and got the message below:
bcftools higher or equal to v1.9
bedtools higher or equal to v2.24.0
perl higher or equal to v5.22.0
R higher or equal to v3.2.0
Parameters used by default:
ERROR: You need to provide the genome version through --genome option (hg19 or hg38). Exit.
I am sure I provided the "--genome" option.
Hi,
I have WES VCFs for a family (father, mother and two kids). The VCFs were generated using the Illumina DRAGEN Bio-IT Platform. I ran AutoMap as follows:
bash AutoMap_v1.2.sh --vcf GN-21-0017.vcf,GN-21-0018.vcf,GN-21-0019.vcf,GN-21-0022.vcf --out family --genome hg19 --common
I tried changing many AutoMap options, but I usually get 0.00 Mb of common ROHs. As far as I understand, I should get common regions between them, as they are one family. Could you please help me understand this? Does the tool work with WES data? Do I need to use my own BED file?
Thanks
I get the following error. bcftools on my workstation is the latest version, v1.9.1:
root@CNV:/home/mai/automap/AutoMap# bash /home/mai/automap/AutoMap/AutoMap_v1.2.sh
--vcf /home/mai/automap/AutoMapTest/TestSample.vcf
--genome hg19
--out /home/mai/automap/AutoMap/test
--vcf: command not found
--genome: command not found
--out: command not found
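The "command not found" messages indicate that the shell treated each option line as a separate command, which is what happens when a command is split across lines without a trailing backslash. The sketch below reuses the paths from the report; the echo is an addition that turns it into a dry run.

```shell
# A trailing backslash joins the next line to the current command, so all
# options reach AutoMap_v1.2.sh instead of being run as separate commands.
# The echo makes this a dry run: it captures the command line as one string
# instead of executing it.
automap_cmd=$(echo bash /home/mai/automap/AutoMap/AutoMap_v1.2.sh \
    --vcf /home/mai/automap/AutoMapTest/TestSample.vcf \
    --genome hg19 \
    --out /home/mai/automap/AutoMap/test)
echo "$automap_cmd"
```

To run the analysis for real, the same four lines can be typed without the echo and the surrounding $( ), keeping the backslashes as the last character of each continued line.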
I keep facing this issue that AutoMap does not detect all the ROHs and I can't figure out why. The parameters I chose should allow all the large ROHs to get detected. Do you know what can be the issue here?
--DP 10 --percaltlow 0.3 --minsize 5
Thank you!
I suppose that if I replace the repeats file it uses with one for my species, it should work, but would that be wrong? If it really is not possible, please state clearly on the program description page that it only works for human data. I did notice that sentence in the paper, but I couldn't guess that it really doesn't work with any other species. For the online version this makes sense, because you can't have repeat datasets for all species, but for the Unix version?
Another question, what if I don't incorporate repeat coordinates at all?
Thanks.
Hello,
Was curious to know if the tool can be used on RNA-seq data, or if some parameters have to be modified to accommodate RNA data.
When using --common option, after the computation I get this error, just before ending analysis:
AutoMap/Scripts/common_analysis.sh: line 83: printf: 24.8027: invalid number
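A likely cause, assuming the active locale uses a comma as the decimal separator, is that the shell's printf parses numbers according to that locale and therefore rejects 24.8027. The same line 83 also appears in the setlocale warning reported at the top of this page. The sketch below is not the code from common_analysis.sh; it only shows how a single printf call can be pinned to the C locale.

```shell
# Force the C locale for this one call so "24.8027" is parsed with a dot
# as the decimal separator, whatever LANG or LC_NUMERIC are set to.
size_mb=$(LC_ALL=C printf '%.2f' 24.8027)
echo "$size_mb"   # prints 24.80
```

LC_ALL is used rather than LC_NUMERIC because a value of LC_ALL already set in the environment would otherwise take precedence.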
Hi,
I got the following error when analyzing one of my files:
bash /kyukon/data/gent/vo/000/gvo00082/research/mvheetve/temp/CEP162/AutoMap/AutoMap_v1.0.sh --vcf /kyukon/data/gent/vo/000/gvo00082/research/mvheetve/temp/CEP162/fastq/analysis/samples_D1800416-merged/final/18-00416/18-00416-gatk-haplotype_norm_nonref.vcf.gz --genome hg38 --out /kyukon/data/gent/vo/000/gvo00082/research/mvheetve/temp/CEP162/common_reanalysis
# bcftools higher or equal to v1.9
# bedtools higher or equal to v2.24.0
# perl higher or equal to v5.22.0
# R higher or equal to v3.2.0
## Parameters used by default:
-> No use of --DP option, value set as default: 8
-> No use of --binomial option, value set as default: 0.000001
-> No use of --percaltlow option, value set as default: 0.25
-> No use of --percalthigh option, value set as default: 0.75
-> No use of --window option, value set as default: 7
-> No use of --windowthres option, value set as default: 5
-> No use of --minsize option, value set as default: 1
-> No use of --minvar option, value set as default: 25
-> No use of --minperc option, value set as default: 88
-> No use of --maxgap option, value set as default: 10
-> chrX will NOT be included in the analysis and in the graphics.
-> Homozygosity regions will be extended to nearest variant with maximum of 1 Mb.
## WARNING: No sample name provided through --id option, name will be taken from the VCF: 18-00416
## ERROR: Less than 10,000 variants (0 detected variants) with AD and DP available. Exit.
I tried setting --DP to 0, but the same error occurred. The VCF file is attached, along with the output of bcftools query for FORMAT/AD and FORMAT/DP, which clearly shows AD and DP values for all variants (albeit low at times). Any idea what might be the problem?
Regards
mvheetve
I get the bcftools error below when running the AutoMap_v1.2.sh script in a Singularity container:
Homozygosity regions will be extended to nearest variant with maximum of 1 Mb.
/opt/Container/AutoMap/AutoMap_v1.2.sh: line 309: /opt/Container/AutoMap/.log: Read-only file system
When running bash AutoMap_v1.2.sh on the local system, it runs correctly.
The minimum ROH threshold (--minsize) that AutoMap uses by default is 1 Mb. How low can I go in setting this threshold? Is 100 kb or 50 kb too small a threshold to use? Is there a threshold below which its accuracy falls significantly in your experience? Thanks!
Hi
Thanks for creating the tool, it was really helpful.
I have to run the ROH analysis for 455 samples and want to find the common ROHs between them. It takes 20 minutes for each sample, followed by the final common analysis.
Is there a way I can speed up the process?
Best,
Vignesh
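One general option is to run several per-sample analyses at the same time with xargs -P. The sketch below is a dry run under stated assumptions: the sample names, the hg19 genome and the out_ directory naming are placeholders, and it is not confirmed here that concurrent AutoMap runs are safe. Another report on this page shows the script writing a .log file inside its own installation directory, which could clash between parallel runs.

```shell
# Placeholder sample list; in practice this would be the 455 VCF paths.
samples="sample1.vcf sample2.vcf sample3.vcf"
JOBS=2   # samples processed at the same time (assumption: tune to the CPU count)

# Dry run: each worker only prints the command it would execute.
planned=$(printf '%s\n' $samples |
    xargs -P "$JOBS" -I{} sh -c \
        'echo "bash AutoMap_v1.2.sh --vcf $1 --genome hg19 --out out_${1%.vcf}"' _ {})
echo "$planned"
```

The order of the printed lines can vary between runs because the workers finish independently.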
Hi,
I have installed all prerequisites with the latest versions and tried to use the command below:
(automap_analysis) tugcebozkurt@x86_64-apple-darwin13 AutoMap-master % bash AutoMap_v1.2.sh --vcf ../family_2311713/vcf/62524530_PASS_only.vcf --out 62524530 --genome hg19
# bcftools higher or equal to v1.9
# bedtools higher or equal to v2.24.0
# perl higher or equal to v5.22.0
# R higher or equal to v3.2.0
## Parameters used by default:
-> No use of --DP option, value set as default: 8
-> No use of --binomial option, value set as default: 0.000001
-> No use of --percaltlow option, value set as default: 0.25
-> No use of --percalthigh option, value set as default: 0.75
-> No use of --window option, value set as default: 7
-> No use of --windowthres option, value set as default: 5
-> No use of --minsize option, value set as default: 1
-> No use of --minvar option, value set as default: 25
-> No use of --minperc option, value set as default: 88
-> No use of --maxgap option, value set as default: 10
-> chrX will NOT be included in the analysis and in the graphics.
-> Homozygosity regions will be extended to nearest variant with maximum of 1 Mb.
AutoMap_v1.2.sh: line 326: [: : integer expression expected
1) Parsing of VCF file and variant filtering
* 4186573 variants before filtering
* 1842046 variants after filtering
2) Detection of ROHs with sliding window, trimming and extension
* Treatment of the data
* Printing of the homozygous regions
3) Filtering of regions found and output to text file
* 0 regions before filtering
* 0 regions after filtering with 0 Mb in total
4) Generating PDF
However, the output could not be generated and I don't know why. It may be related to the line 326 message shown above (AutoMap_v1.2.sh: line 326: [: : integer expression expected). I got the same problem when I used the test data you provided. When I ran the same command on a Windows computer, it worked without any problem.
Thank you in advance!
Tugce