mquinodo / automap

Tool to find regions of homozygosity (ROHs) from sequencing data.

Shell 62.35% Perl 29.59% R 8.05%

automap's Issues

Unable to find locale file in common_analysis.sh

The following error was observed on running AutoMap_v1.2.sh:

/home/AutoMap/Scripts/common_analysis.sh: line 83: warning: setlocale: LC_NUMERIC: cannot change locale (en_US.UTF-8): No such file or directory

Running "locale" showed the locale settings on the local system:

LANG=en_GB.UTF-8
LANGUAGE=en_GB:en
LC_CTYPE="en_GB.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_PAPER="en_GB.UTF-8"
LC_NAME="en_GB.UTF-8"
LC_ADDRESS="en_GB.UTF-8"
LC_TELEPHONE="en_GB.UTF-8"
LC_MEASUREMENT="en_GB.UTF-8"
LC_IDENTIFICATION="en_GB.UTF-8"
LC_ALL=

Changing en_US.UTF-8 to en_GB.UTF-8 on line 83 of /home/AutoMap/Scripts/common_analysis.sh solved the issue. A similar issue may occur on other systems where en_US.UTF-8 is not installed.
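
Whether a given locale is installed can be checked before a script relies on it. The following is a minimal sketch, assuming the list of installed locales comes from `locale -a`; `has_locale` is a hypothetical helper and not part of AutoMap.

```shell
# has_locale NAME: succeed if NAME appears in the locale list read from stdin
# (one name per line, as printed by `locale -a`). Letter case and the
# "UTF-8" versus "utf8" spelling difference are ignored.
has_locale() {
    want=$(printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/utf-8/utf8/')
    tr '[:upper:]' '[:lower:]' | sed 's/utf-8/utf8/' | grep -qxF "$want"
}

# Usage:
#   locale -a | has_locale en_US.UTF-8 || echo "en_US.UTF-8 is not installed"
```

If the check fails, either installing the locale or switching the script to one that exists (as done above) should avoid the warning.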

Small misprint in AutoMap v1.2.0 TSV output footer

Hi,

I'm reporting a small misprint in the .HomRegions.tsv output of AutoMap_v1.2.sh. The footer reads

AutoMap v1.0 used for analysis

Thanks for the great tool. Easy to set up and run and quickly produces easy to understand and useful output.

Best,
George

Can we get the list of variants in the result?

After processing the VCF file we get the number of variants found in each region. Can we also get a list of those variants?

#Chr	Begin	End	Size(Mb)	Nb_variants	Percentage_homozygosity
chr1	41513572	44129738	2.62	58	89.66
chr1	86447582	89713973	3.27	42	90.48
chr2	128182225	130580573	2.40	27	88.89
chr3	38698526	42184659	3.49	36	91.67
chr4	81447479	88390810	6.94	48	95.83
chr5	139380057	146062570	6.68	85	89.41
chr5	146515817	148402169	1.89	29	93.10
chr6	46716485	47879229	1.16	37	94.59
chr11	47338502	49154505	1.82	26	88.46
chr15	51377428	55196867	3.82	40	92.50
chr19	18938389	22180787	3.24	42	95.24
chr22	27224553	35082493	7.86	81	93.83

For example, I want to know which 58 variants are in the first chromosome 1 region. Is there a way to get that?
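
One way to approximate this outside AutoMap is to filter the input VCF by coordinates. This is only a sketch: `variants_in_region` is a hypothetical helper, and because Nb_variants is counted after AutoMap's own variant filtering, a plain coordinate filter may return more records than the reported count.

```shell
# variants_in_region CHR BEGIN END: read an uncompressed VCF on stdin and
# print the records on CHR whose POS lies within [BEGIN, END].
variants_in_region() {
    awk -F '\t' -v chr="$1" -v begin="$2" -v end="$3" \
        '!/^#/ && $1 == chr && $2 + 0 >= begin + 0 && $2 + 0 <= end + 0'
}

# Usage (coordinates taken from the first row of the table above):
#   variants_in_region chr1 41513572 44129738 < sample.vcf
#   gzip -dc sample.vcf.gz | variants_in_region chr1 41513572 44129738
```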

How to calculate the fraction of the exome in ROH

Hi!
I was wondering how to calculate what percentage of an exome the total sum of ROHs represents (to estimate some kind of inbreeding coefficient). My problem is how to estimate the total length of the exome as AutoMap sees it for each sample, and how this depends on the parameters selected.

Any help?

Thanks in advance
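
The numerator of such a fraction can be taken from the Size(Mb) column of the .HomRegions.tsv output. The sketch below assumes the column layout shown in the table of the previous issue and leaves the denominator to the user, since how AutoMap itself defines the exome length is not stated here; `roh_fraction` is a hypothetical helper.

```shell
# roh_fraction TOTAL_MB: read a HomRegions-style table on stdin, sum the
# fourth column (Size in Mb) over data rows and divide by TOTAL_MB.
# TOTAL_MB is chosen by the user (for example the size of the capture target)
# and must be greater than zero.
roh_fraction() {
    awk -F '\t' -v total="$1" '
        !/^#/ && $4 ~ /^[0-9]+(\.[0-9]+)?$/ { sum += $4 }
        END { printf "%.4f\n", sum / total }'
}

# Usage:
#   roh_fraction 40 < Sample.HomRegions.tsv
```

Rows whose fourth field is not a plain number, such as a footer line, are ignored.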

ERROR: AutoMap_v1.0.sh file does not exist (while using v1.2)

Hi,

I'm using the latest version (v1.2). When I run AutoMap_v1.2.sh with the --multivcf option, I get an error because it tries to call the v1.0 bash script.

Command line :
AutoMap_v1.2.sh --vcf ../multiSample_annotated.vcf.gz --genome hg38 --out multiSample_AutoMap_HM --multivcf

Error :

## Launching analyis for sample: Sample_Name1
bash: /shared/home/user/AutoMap/AutoMap_v1.0.sh: Aucun fichier ou dossier de ce type
 ## Launching analyis for sample: Sample_Name2
bash: /shared/home/user/AutoMap/AutoMap_v1.0.sh: Aucun fichier ou dossier de ce type

"Aucun fichier ou dossier de ce type" is French for "No such file or directory".

Best,
Julien
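
Hard-coded references to another script version can be located with grep. This is a sketch under the assumption that the scripts follow the AutoMap_v<X.Y>.sh naming seen in these reports; `stale_version_refs` is a hypothetical helper.

```shell
# stale_version_refs SCRIPT: print "line:text" for each line of SCRIPT that
# mentions an AutoMap_v<X.Y>.sh file other than SCRIPT itself.
# Limitation: a line naming both the script's own version and another one
# is not reported.
stale_version_refs() {
    script=$1
    own=$(basename "$script")
    grep -n -E 'AutoMap_v[0-9]+\.[0-9]+\.sh' "$script" | grep -v -F "$own"
}

# Usage:
#   stale_version_refs AutoMap_v1.2.sh
```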

PASS filtering

Dear team,

Thanks for the great software!

I have two questions. I have a VCF file generated using GATK.

Does the tool filter out non-PASS calls?
Does the tool filter out calls where ALT is "*"?

Regards,
Najeeb
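
Independently of what AutoMap does internally, both kinds of records can be removed before the VCF is passed to it. The sketch below is a plain awk filter; `pass_no_star` is a hypothetical helper, and dropping an entire multi-allelic record when any ALT allele is "*" is a simplifying assumption.

```shell
# pass_no_star: read an uncompressed VCF on stdin, keep header lines, and
# keep only records whose FILTER is exactly PASS and whose ALT alleles do
# not include "*". Records with FILTER "." are dropped as well.
pass_no_star() {
    awk -F '\t' '
        /^#/ { print; next }
        $7 != "PASS" { next }
        {
            n = split($5, alt, ",")
            for (i = 1; i <= n; i++) if (alt[i] == "*") next
            print
        }'
}

# Usage:
#   pass_no_star < input.vcf > input.PASS_only.vcf
```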

Empty output for the test case

Hi, thank you for making the project open-source!
I was trying to run the test case, but it fails with the new version.
The only change I made was replacing grep -P with grep -E in the main script.

bash AutoMap_v1.2.sh --vcf Test/TestSample.vcf --out testing/ --genome hg19
# bcftools higher or equal to v1.9
# bedtools higher or equal to v2.24.0
# perl higher or equal to v5.22.0
# R higher or equal to v3.2.0
## Parameters used by default:
 -> No use of --DP option, value set as default: 8
 -> No use of --binomial option, value set as default: 0.000001
 -> No use of --percaltlow option, value set as default: 0.25
 -> No use of --percalthigh option, value set as default: 0.75
 -> No use of --window option, value set as default: 7
 -> No use of --windowthres option, value set as default: 5
 -> No use of --minsize option, value set as default: 1
 -> No use of --minvar option, value set as default: 25
 -> No use of --minperc option, value set as default: 88
 -> No use of --maxgap option, value set as default: 10
 -> chrX will NOT be included in the analysis and in the graphics.
 -> Homozygosity regions will be extended to nearest variant with maximum of 1 Mb.
AutoMap_v1.2.sh: line 326: [: : integer expression expected

1) Parsing of VCF file and variant filtering
 *   111569 variants before filtering
 *    80260 variants after filtering

2) Detection of ROHs with sliding window, trimming and extension
 * Treatment of the data
 * Printing of the homozygous regions

3) Filtering of regions found and output to text file
 *        0 regions before filtering
 *        0 regions after filtering with 0 Mb in total
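
The message "[: : integer expression expected" is what bash prints when one operand of a numeric test such as -lt is an empty string. What line 326 of AutoMap_v1.2.sh compares is not shown in this report, so the following is only a sketch of a guard; `is_uint` and the variable `nbvar` are hypothetical.

```shell
# is_uint VALUE: succeed only if VALUE is a non-empty string of digits.
is_uint() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# nbvar stands in for whatever value the script compares (an assumption).
nbvar=""
if is_uint "$nbvar" && [ "$nbvar" -lt 10000 ]; then
    echo "fewer than 10,000 variants"
elif ! is_uint "$nbvar"; then
    echo "variant count is empty or not a number"
fi
```

An empty value at that point usually means an earlier command in the pipeline produced no output, which is worth checking first.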

change v1.2 to v1.3

Hi,
you should change AutoMap_v1.2.sh to AutoMap_v1.3.sh also in the code at lines 283, 286, 289 and 292

Best

Less than 10,000 variants (0 detected variants) with AD (or AO) and DP available

bash /data/software_wdh/AutoMap-master/AutoMap_v1.2.sh --vcf /data/software_wdh/AutoMap-master/Test/TestSample.vcf --out /data/software_wdh/AutoMap-master/test/ --genome hg19

bcftools higher or equal to v1.9

bedtools higher or equal to v2.24.0

perl higher or equal to v5.22.0

R higher or equal to v3.2.0

Parameters used by default:

-> No use of --DP option, value set as default: 8
-> No use of --binomial option, value set as default: 0.000001
-> No use of --percaltlow option, value set as default: 0.25
-> No use of --percalthigh option, value set as default: 0.75
-> No use of --window option, value set as default: 7
-> No use of --windowthres option, value set as default: 5
-> No use of --minsize option, value set as default: 1
-> No use of --minvar option, value set as default: 25
-> No use of --minperc option, value set as default: 88
-> No use of --maxgap option, value set as default: 10
-> chrX will NOT be included in the analysis and in the graphics.
-> Homozygosity regions will be extended to nearest variant with maximum of 1 Mb.
grep: this version of PCRE is compiled without UTF support

WARNING: No sample name provided through --id option, name will be taken from the VCF: TestSample

ERROR: Less than 10,000 variants (0 detected variants) with AD (or AO) and DP available. Exit.
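
The grep message above typically appears when grep -P runs under a UTF-8 locale while the PCRE library it is linked against was built without UTF-8 support. A commonly used workaround is to run grep -P under the C locale. This is a sketch, not a confirmed fix for AutoMap; `grep_pcre_c` is a hypothetical wrapper.

```shell
# grep_pcre_c ARGS...: run grep -P with the C locale so that PCRE is not
# asked for UTF-8 support. Non-ASCII input is then matched byte by byte.
grep_pcre_c() {
    LC_ALL=C grep -P "$@"
}

# Usage:
#   grep_pcre_c -c '^chr[0-9XY]+\t' Test/TestSample.vcf
```

If the failing grep call is what empties the variant list, the "0 detected variants" error may disappear once grep succeeds, but that link is an assumption.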

--vcflist option not available in v1.2

bash AutoMap_v1.2.sh -h
bcftools higher or equal to v1.9
bedtools higher or equal to v2.24.0
perl higher or equal to v5.22.0
R higher or equal to v3.2.0
ERROR: Usage: AutoMap_v1.2.sh [--vcf ] [--genome <hg19|hg38>] [--out ] [--common] [--id ] [--panel ] [--panelname ] [--DP <0-99>] [--binomial <0-1.0>] [--percaltlow <0-1.0>] [--percalthigh <0-1.0>] [--window <3-999>] [--windowthres <1-999>] [--minsize <0-99>] [--minvar <1-999>] [--minperc <0-100>] [--maxgap <0-1000Mb>] [--chrX] [--extend <0-100Mb>]. Exit.

Error

bash /home/databench/AutoMap_v1.2_nonhuman.sh --vcf Routput_raw_variants.vcf --out Routput_directory --DP 8 --binomial 0.000001 --percaltlow 0.25 --percalthigh 0.75 --window 7 --windowthres 5 --minsize 2 --minvar 2 --minperc 88 --maxgap 10 --chrX No --extend 1

bcftools higher or equal to v1.9

bedtools higher or equal to v2.24.0

perl higher or equal to v5.22.0

R higher or equal to v3.2.0

ERROR: Usage: /home/databench/AutoMap_v1.2_nonhuman.sh [--vcf ] [--out ] [--common] [--id ] [--panel ] [--panelname ] [--DP <0-99>] [--binomial <0-1.0>] [--percaltlow <0-1.0>] [--percalthigh <0-1.0>] [--window <3-999>] [--windowthres <1-999>] [--minsize <0-99>] [--minvar <1-999>] [--minperc <0-100>] [--maxgap <0-1000Mb>] [--chrX] [--extend <0-100Mb>]. Exit

Any suggestions on how to fix this error?
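
In the usage string above, --chrX appears without a value placeholder, unlike options such as --DP <0-99>. The value "No" after --chrX may therefore be what the argument parser rejects; this is a guess from the usage text only. The sketch below flags such cases; `stray_switch_values` is a hypothetical helper, and treating --chrX and --common as value-less switches is an assumption based on that usage string.

```shell
# stray_switch_values ARGS...: report any non-option word that directly
# follows --chrX or --common, which the usage string lists without a value.
stray_switch_values() {
    prev=
    for arg in "$@"; do
        case $prev in
            --chrX|--common)
                case $arg in
                    --*) ;;
                    *) printf '%s is a switch but is followed by "%s"\n' "$prev" "$arg" ;;
                esac ;;
        esac
        prev=$arg
    done
}

# Usage:
#   stray_switch_values --vcf in.vcf --out outdir --chrX No --extend 1
```

Since chrX is excluded by default according to the logs on this page, leaving out "--chrX No" altogether should express the same intent.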

output.tsv not well sorted

Hi,

I ran AutoMap v1.2 with 2 VCF files and the --common and --chrX options.
I noticed that the output.tsv file is not properly sorted by the Begin position.

#Chr	Begin	End	Size(Mb)
chr2	114204074	115353814	1.15
chr5	130804987	132560051	1.76
chr8	49371142	50469394	1.10
chr11	73507185	74558585	1.05
chrX	2782116	4985215	2.20
chrX	4985295	9404537	4.42
chrX	9414219	49319525	39.91
chrX	49591026	52857711	3.27
chrX	52862903	55145938	2.28
chrX	55146275	62965055	7.82
chrX	62965138	67415295	4.45
chrX	72794567	74583034	1.79
chrX	67415333	72755923	5.34
chrX	74583402	89204206	14.62
chrX	89207472	119469743	30.26
chrX	119471254	131678991	12.21
chrX	131680824	136873868	5.19
chrX	136874416	141007457	4.13
chrX	144085451	149596554	5.51
chrX	141111645	144078700	2.97
chrX	153605174	155469854	1.86
chrX	150215170	153605159	3.39

In this example, the line chrX 72794567 74583034 1.79 comes before the line chrX 67415333 72755923 5.34.
The same happens with chrX 144085451 149596554 5.51, which comes before chrX 141111645 144078700 2.97.

I do not know if it is related to chrX. It does not seem important, but it confused me at first.

Another question: how does the tool know whether a sample is female or male?

Best,
Julien
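
Until the ordering is fixed in the tool, the table can be re-sorted afterwards. The sketch below keeps the chromosomes in the order in which they first appear and sorts rows by Begin within each chromosome; it assumes a tab-separated file with "#" header lines, and `sort_within_chr` is a hypothetical helper.

```shell
# sort_within_chr FILE: print header lines first, then the data rows with
# chromosomes in first-appearance order and Begin ascending within each one.
sort_within_chr() {
    grep '^#' "$1"
    grep -v '^#' "$1" |
        awk -F '\t' '!($1 in rank) { rank[$1] = ++n } { print rank[$1] "\t" $0 }' |
        sort -t "$(printf '\t')" -k1,1n -k3,3n |
        cut -f2-
}

# Usage:
#   sort_within_chr output.tsv > output.sorted.tsv
```

Ranking by first appearance avoids depending on how a particular sort implementation orders names such as chr11 and chrX.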

multi vcf: how to use common flag and/or, provide list of VCFs with --common

hi there,

Thanks for this tool.
I am interested in using this tool on WES data. I have two cohorts of 4K and 11K samples.

I have one VCF file per chromosome comprising all these individuals. I see that the --common option cannot be used with the --multivcf flag.

I use

bash $AUTOMAP_HOME/AutoMap_v1.0.sh --vcf $VCF_file  --multivcf --out ROH_output/CHR22  --genome hg19

It generates VCF per sample. With 22 CHRs I will have 22 times 4K VCFs.

Next, I would like to get common ROHs from these.

--vcf VCF1,VCF2,VCF3

Is there a way to provide the list of VCFs in a file? I do not think it is fun to provide a list of 4K/11K VCFs in a bash string.

Let me know if you need any help with code/structuring or testing this.
best,
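
As a stopgap, the comma-separated value can be generated from a list file on the shell side. This is a sketch; `vcf_list_arg` is a hypothetical helper, and with thousands of long paths the resulting command line may exceed the system's argument length limit.

```shell
# vcf_list_arg FILE: join the non-blank lines of FILE (one VCF path per
# line) into a single comma-separated string.
vcf_list_arg() {
    grep -v '^[[:space:]]*$' "$1" | paste -sd, -
}

# Usage:
#   bash $AUTOMAP_HOME/AutoMap_v1.0.sh --vcf "$(vcf_list_arg vcfs.txt)" --out ROH_output --genome hg19 --common
```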

--genome error when using multiple vcf files

Dear sir,
I tried to use the command:
bash AutoMap_v1.0.sh --vcf 0366.HC.vcf 0367.HC.vcf 0368.HC.vcf /0369.HC.vcf --out /working_space/ --genome hg38
and got the message below:

bcftools higher or equal to v1.9
bedtools higher or equal to v2.24.0
perl higher or equal to v5.22.0
R higher or equal to v3.2.0
Parameters used by default:
ERROR: You need to provide the genome version through --genome option (hg19 or hg38). Exit.

I am sure I provided the "--genome" option.
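
Other reports on this page pass several VCFs to --vcf as one comma-separated value rather than as space-separated words, so the extra words in the command above may be what confuses the option parsing. Under that assumption, a small helper can build the value; `join_by_comma` is hypothetical.

```shell
# join_by_comma ARGS...: print the arguments joined by commas.
# The body runs in a subshell so the IFS change does not leak out.
join_by_comma() (
    IFS=,
    printf '%s\n' "$*"
)

# Usage:
#   bash AutoMap_v1.0.sh --vcf "$(join_by_comma 0366.HC.vcf 0367.HC.vcf 0368.HC.vcf)" --out /working_space/ --genome hg38
```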

common for a family

Hi,
I have WES VCFs for a family (father, mother and two kids). The VCFs were generated using the Illumina DRAGEN Bio-IT Platform. I used the AutoMap command:
bash AutoMap_v1.2.sh --vcf GN-21-0017.vcf,GN-21-0018.vcf,GN-21-0019.vcf,GN-21-0022.vcf --out family --genome hg19 --common
I tried changing many AutoMap options, but I usually get 0.00 Mb of common ROH. As far as I understand, I should get common regions between them, as they are one family. Could you please help me understand this? Does the tool work with WES data? Do I need to use my own BED file?
Thanks

issues with bcftools version.

I get the following error, although bcftools on my workstation is the latest version, v1.9.1:

root@CNV:/home/mai/automap/AutoMap# bash /home/mai/automap/AutoMap/AutoMap_v1.2.sh

--vcf /home/mai/automap/AutoMapTest/TestSample.vcf

--genome hg19

--out /home/mai/automap/AutoMap/test

ERROR: bcftools lower than v1.9 -> Please Update! Exit.

--vcf: command not found

--genome: command not found

--out: command not found
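
The "command not found" lines suggest that each option was entered as a separate command, so the script itself ran without arguments. How the script compares bcftools versions is not shown here; the sketch below is one way to compare dotted version numbers without relying on string comparison. `version_ge` is a hypothetical helper.

```shell
# version_ge A B: succeed if dotted version A is greater than or equal to B.
# Components are compared numerically, so 1.10 counts as newer than 1.9.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            if (x[i] + 0 > y[i] + 0) exit 0
            if (x[i] + 0 < y[i] + 0) exit 1
        }
        exit 0
    }'
}

# Usage:
#   version_ge 1.9.1 1.9 && echo "bcftools is recent enough"
```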

Why some of the ROH are not detected?

I keep facing this issue that AutoMap does not detect all the ROHs and I can't figure out why. The parameters I chose should allow all the large ROHs to get detected. Do you know what can be the issue here?
--DP 10 --percaltlow 0.3 --minsize 5

Thank you!
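
Going by the default of 1 and the Mb unit mentioned in other reports on this page, --minsize 5 would presumably discard every region shorter than 5 Mb. To see which regions from a default run would survive a stricter size threshold, the output table can be filtered directly. This is a sketch; `regions_at_least` is a hypothetical helper that assumes the Size(Mb) value is in the fourth tab-separated column.

```shell
# regions_at_least MIN_MB: read a HomRegions-style table on stdin, keep
# header lines, and keep data rows whose fourth column is at least MIN_MB.
regions_at_least() {
    awk -F '\t' -v min="$1" '/^#/ { print; next } $4 + 0 >= min + 0'
}

# Usage:
#   regions_at_least 5 < Sample.HomRegions.tsv
```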

can it be used on non-human data?

I suppose that if I replace the repeats file it uses with a repeats file for my species it should work, but would that be wrong? If it is really not possible, please state clearly on the program description page that it only works for human data. I noticed that sentence in the paper but did not expect that it really would not work with any other species. For the online version this makes sense, because you can't have repeat datasets for all species, but for the Unix version?

Another question, what if I don't incorporate repeat coordinates at all?

Thanks.

Can it be used on RNA-seq data?

Hello,

Was curious to know if the tool can be used on RNA-seq data, or if some parameters have to be modified to accommodate RNA data.

--common error

When using --common option, after the computation I get this error, just before ending analysis:
AutoMap/Scripts/common_analysis.sh: line 83: printf: 24.8027: invalid number
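
bash typically reports "invalid number" from printf when the active LC_NUMERIC locale expects a different decimal separator, for example a comma. Forcing the C locale for that one call is a common workaround. The sketch below is not the code at line 83 of common_analysis.sh, which is not shown here; `fmt2` is a hypothetical helper.

```shell
# fmt2 NUMBER: print NUMBER with two decimals, using the C locale so that
# "." is always accepted as the decimal separator.
fmt2() {
    LC_ALL=C printf '%.2f\n' "$1"
}

fmt2 24.8027   # prints 24.80
```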

## ERROR: Less than 10,000 variants (0 detected variants) with AD and DP available. Exit.

Hi,

I got the following error when analyzing one of my files:

bash /kyukon/data/gent/vo/000/gvo00082/research/mvheetve/temp/CEP162/AutoMap/AutoMap_v1.0.sh --vcf /kyukon/data/gent/vo/000/gvo00082/research/mvheetve/temp/CEP162/fastq/analysis/samples_D1800416-merged/final/18-00416/18-00416-gatk-haplotype_norm_nonref.vcf.gz --genome hg38 --out /kyukon/data/gent/vo/000/gvo00082/research/mvheetve/temp/CEP162/common_reanalysis
# bcftools higher or equal to v1.9
# bedtools higher or equal to v2.24.0
# perl higher or equal to v5.22.0
# R higher or equal to v3.2.0
## Parameters used by default:
 -> No use of --DP option, value set as default: 8
 -> No use of --binomial option, value set as default: 0.000001
 -> No use of --percaltlow option, value set as default: 0.25
 -> No use of --percalthigh option, value set as default: 0.75
 -> No use of --window option, value set as default: 7
 -> No use of --windowthres option, value set as default: 5
 -> No use of --minsize option, value set as default: 1
 -> No use of --minvar option, value set as default: 25
 -> No use of --minperc option, value set as default: 88
 -> No use of --maxgap option, value set as default: 10
 -> chrX will NOT be included in the analysis and in the graphics.
 -> Homozygosity regions will be extended to nearest variant with maximum of 1 Mb.
## WARNING: No sample name provided through --id option, name will be taken from the VCF: 18-00416
## ERROR: Less than 10,000 variants (0 detected variants) with AD and DP available. Exit.

I tried setting --DP to 0, but the same error occurred. I have attached the VCF file and the output of bcftools query for FORMAT/AD and FORMAT/DP, which clearly shows AD and DP values for all variants (albeit low at times). Any idea what the problem might be?

Regards
mvheetve
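
A quick independent check is to count the records whose FORMAT column declares both a depth field and an allele-depth field. This is a sketch that only inspects the FORMAT keys and does not reproduce AutoMap's own parsing; `count_ad_dp` is a hypothetical helper.

```shell
# count_ad_dp: read an uncompressed VCF on stdin and print how many records
# list DP together with AD (or AO) among their FORMAT keys (column 9).
count_ad_dp() {
    awk -F '\t' '
        /^#/ { next }
        {
            n = split($9, key, ":"); has_ad = has_dp = 0
            for (i = 1; i <= n; i++) {
                if (key[i] == "AD" || key[i] == "AO") has_ad = 1
                if (key[i] == "DP") has_dp = 1
            }
            if (has_ad && has_dp) c++
        }
        END { print c + 0 }'
}

# Usage:
#   gzip -dc sample.vcf.gz | count_ad_dp
```

If this count is well above 10,000 while AutoMap reports 0, the problem is more likely in how the script extracts the fields than in the VCF itself.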

getting bcftools error when running AutoMap_v1.2.sh script on singularity container

I get the bcftools error below when running the AutoMap_v1.2.sh script in a Singularity container:

Homozygosity regions will be extended to nearest variant with maximum of 1 Mb.
/opt/Container/AutoMap/AutoMap_v1.2.sh: line 309: /opt/Container/AutoMap/.log: Read-only file system

ERROR: The input VCF format is incorrect ('bcftools query -l' was unsuccessful). Exit.

Running bash AutoMap_v1.2.sh on the local system works correctly.
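
The first message shows the script trying to write a .log file inside its own directory, which is read-only in the container; the bcftools error that follows may be a consequence of that failed write. One possible workaround, sketched here under that assumption, is to run AutoMap from a writable copy; `writable_or_copy` is a hypothetical helper.

```shell
# writable_or_copy DIR: print DIR if it is writable; otherwise copy its
# contents to a fresh temporary directory and print that path instead.
writable_or_copy() {
    src=$1
    if [ -w "$src" ]; then
        printf '%s\n' "$src"
        return 0
    fi
    dst=$(mktemp -d) && cp -R "$src/." "$dst" && printf '%s\n' "$dst"
}

# Usage:
#   automap_dir=$(writable_or_copy /opt/Container/AutoMap)
#   bash "$automap_dir/AutoMap_v1.2.sh" --vcf sample.vcf --out results --genome hg19
```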

How small the minimum ROH size threshold can be without losing reliability?

The minimum ROH threshold (--minsize) that automap uses by default is 1 Mb. How low can I go in setting this threshold? Is 100kb or 50kb too small a threshold to use? Is there a threshold below which its accuracy falls significantly in your experience? Thanks!

Help in running automap

Thanks for creating the tool, it has been really helpful.

I have to detect ROHs for 455 samples and want to find the common ROHs between them. It takes 20 minutes per sample, followed by the common analysis.

Is there a way to speed up the process?

Best,
Vignesh
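
If the per-sample runs are independent, they can be started in parallel from the shell. This is a sketch that uses the -P option of xargs, which GNU and BSD xargs provide; `run_parallel` is a hypothetical helper. Whether concurrent AutoMap runs are safe is not confirmed: another report on this page shows the script writing a .log file in its own directory, which parallel runs might share.

```shell
# run_parallel JOBS CMD...: read one item per line on stdin and run
# "CMD... item" for each, with at most JOBS processes at a time.
# Items containing whitespace are not handled.
run_parallel() {
    jobs=$1
    shift
    xargs -P "$jobs" -n 1 "$@"
}

# Usage (the trailing _ fills $0 so that each VCF path becomes $1):
#   ls samples/*.vcf | run_parallel 4 sh -c 'bash AutoMap_v1.2.sh --vcf "$1" --out results --genome hg19' _
```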

AutoMap_v1.2.sh: line 326: [: : integer expression expected

Hi,

I have installed all prerequisites with the latest versions and tried to use the command below:

(automap_analysis) tugcebozkurt@x86_64-apple-darwin13 AutoMap-master % bash AutoMap_v1.2.sh --vcf ../family_2311713/vcf/62524530_PASS_only.vcf --out 62524530 --genome hg19
# bcftools higher or equal to v1.9
# bedtools higher or equal to v2.24.0
# perl higher or equal to v5.22.0
# R higher or equal to v3.2.0
## Parameters used by default:
 -> No use of --DP option, value set as default: 8
 -> No use of --binomial option, value set as default: 0.000001
 -> No use of --percaltlow option, value set as default: 0.25
 -> No use of --percalthigh option, value set as default: 0.75
 -> No use of --window option, value set as default: 7
 -> No use of --windowthres option, value set as default: 5
 -> No use of --minsize option, value set as default: 1
 -> No use of --minvar option, value set as default: 25
 -> No use of --minperc option, value set as default: 88
 -> No use of --maxgap option, value set as default: 10
 -> chrX will NOT be included in the analysis and in the graphics.
 -> Homozygosity regions will be extended to nearest variant with maximum of 1 Mb.
AutoMap_v1.2.sh: line 326: [: : integer expression expected

1) Parsing of VCF file and variant filtering
 *  4186573 variants before filtering
 *  1842046 variants after filtering

2) Detection of ROHs with sliding window, trimming and extension
 * Treatment of the data
 * Printing of the homozygous regions

3) Filtering of regions found and output to text file
 *        0 regions before filtering
 *        0 regions after filtering with 0 Mb in total

4) Generating PDF

However, no output was generated and I don't know why. It may be related to line 326, as the message above suggests (AutoMap_v1.2.sh: line 326: [: : integer expression expected). I got the same problem with the test data you provided. When I ran the same command on a Windows computer, it worked without any problem.

Thank you in advance!
Tugce
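
The prompt above indicates macOS, where the stock BSD grep does not support the -P option that, according to another report on this page, the main script uses. Whether that is the cause here is an assumption. The sketch below probes for -P support and shows a pattern that needs only grep -E by inserting a literal tab; `grep_has_pcre` and `count_chr_records` are hypothetical helpers.

```shell
# grep_has_pcre: succeed if the grep on PATH accepts -P and understands \t.
grep_has_pcre() {
    printf 'a\tb\n' | grep -qP 'a\tb' 2>/dev/null
}

# A literal tab character, usable inside any grep -E pattern.
TAB=$(printf '\t')

# count_chr_records: count stdin lines that start with "chr<name>", a tab,
# a run of digits and another tab, without needing grep -P.
count_chr_records() {
    grep -c -E "^chr[^${TAB}]*${TAB}[0-9]+${TAB}"
}

# Usage:
#   grep_has_pcre || echo "this grep lacks -P; consider installing GNU grep"
```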
