freeseek / gtc2vcf
Tools to convert Illumina IDAT/BPM/EGT/GTC and Affymetrix CEL/CHP files to VCF
License: MIT License
I'm using bcftools Version: 1.10.2 (using htslib 1.10.2)
bcftools +gtc2vcf -c HumanOmniExpressExome-8-v1-0-B.csv -f human_g1k_v37.fasta test.gtc -o test.vcf
================================================================================
Reading CSV file HumanOmniExpressExome-8-v1-0-B.csv
BPM manifest file version = 0
Name of manifest = HumanOmniExpressExome-8v1_B.bpm
Number of loci = 951117
================================================================================
Reading GTC files
GTC file test.gtc format identifier is bad
First couple of lines in my gtc file
[Header]
Autocall Version 1.6.2.2
Processing Date 8/24/2012 9:10 PM
Content HumanOmniExpressExome-8v1_B.bpm
Cluster File StCtrCEPH_OMXEX_B.egt
Gender F
Num SNPs 951117
Total SNPs 951117
Num Samples 1
Total Samples 1
[Data]
SNP Name Chromosome Position GC Score Allele1 - Top Allele2 - Top Allele1 - AB Allele2 - AB X Y Raw X Raw Y R Illumina Theta Illumina bAllele Freq Log R Ratio Illumina
200610-104 MT 212 0.4097353 A A A A 3.1761038 0.037173282 23245.0 433.0 3.213277 0.0074506905 0.0020603272 0.27923167
200610-106 MT 246 0.3716166 A A A A 3.1220326 0.1416725 22856.0 1400.0 3.2637053 0.028868914 0.0 0.3078592
my manifest file
HumanOmniExpressExome-8-v1-0-B.csv
Illumina, Inc.
[Heading]
Descriptor File Name,HumanOmniExpressExome-8v1_B.bpm
Assay Format,Infinium HD Super
Date Manufactured,4/21/2014
Loci Count ,951117
[Assay]
IlmnID,Name,IlmnStrand,SNP,AddressA_ID,AlleleA_ProbeSeq,AddressB_ID,AlleleB_ProbeSeq,GenomeBuild,Chr,MapInfo,Ploidy,Species,Source,SourceVersion,SourceStrand,SourceSeq,TopGenomicSeq,BeadSetID,RefStrand,Exp_Clusters
200610-104-0_B_F_1867864664,200610-104,BOT,[T/C],0095685332,CGCACCTACGTTCAATATTACAGGCGAACATACTTACTAAAGTGTGTTAA,,,37,MT,212,diploid,Homo sapiens,BGI,0,BOT,TTATTTATCGCACCTACGTTCAATATTACAGGCGAACATACTTACTAAAGTGTGTTAA[T/C]TAATTAATGCTTGTAGGACATAATAATAACAATTGAATGTCTGCACAGCCACTTTCCACACAGACATCATAACAA,TTGTTATGATGTCTGTGTGGAAAGTGGCTGTGCAGACATTCAATTGTTATTATTATGTCCTACAAGCATTAATTA[A/G]TTAACACACTTTAGTAAGTATGTTCGCCTGTAATATTGAACGTAGGTGCGATAAATAA,485,+,2
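The file excerpt posted above begins with `[Header]`, so it is a text report, whereas a binary GTC file starts with the three ASCII bytes `gtc`. A quick sanity check before calling the plugin could look like the sketch below. `looks_like_gtc` is a hypothetical helper name and is not part of gtc2vcf.

```shell
#!/bin/sh
# Hypothetical helper (not part of gtc2vcf): succeed only if the file begins
# with the 3-byte "gtc" identifier that binary GTC files start with.
looks_like_gtc() {
  # head -c 3 reads only the first three bytes of the file
  [ "$(head -c 3 "$1" 2>/dev/null)" = "gtc" ]
}

# Example use:
# if looks_like_gtc test.gtc; then
#   echo "binary GTC"
# else
#   echo "not a binary GTC file (perhaps a text report?)"
# fi
```

A file that fails this check would presumably need a different input route than the one used for GTC files.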
After installing bcftools, I followed the README and ran the following code, but got an error message.
path_to_output_folder="..."
cel_list_file="..."
apt-probeset-genotype \
  --analysis-files-path . \
  --xml-file GenomeWideSNP_6.apt-probeset-genotype.AxiomGT1.xml \
  --out-dir $path_to_output_folder \
  --cel-files $cel_list_file \
  --special-snps GenomeWideSNP_6.specialSNPs \
  --chip-type GenomeWideEx_6 \
  --chip-type GenomeWideSNP_6 \
  --table-output false \
  --cc-chp-output \
  --write-models \
  --read-models-brlmmp GenomeWideSNP_6.generic_prior.txt
My question is: which software needs to be installed in order to use apt-probeset-genotype?
Hi,
When I'm changing from CHP files to BCF this is the command:
I was wondering: if I want the output in VCF format instead, do I need to change lines 2, 8 and 9 to "-Ov", "-Ov" and "-Oz", respectively? As I understand it, "-Ov" and "-Oz" produce VCF, whereas "-Ou" and "-Ob" produce BCF.
If this is correct, it would look like this:
When I run it this way, I do get the VCF file at the end, but I also get this message:
index: "NAME.vcf" is in a format that cannot be usefully indexed
I just want to know whether the change is correct and, if it is, whether there is any way to index the file usefully.
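The index message concerns the file format: `bcftools index` works on BCF and on bgzip-compressed VCF, not on plain-text VCF. A minimal sketch of a pre-check follows. `is_gzip_compressed` is a hypothetical helper, and the commented commands assume `bgzip` and `bcftools` are on the PATH.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the file starts with the gzip magic bytes
# 1f 8b. bgzip (BGZF) output starts with these bytes too, so this separates
# plain-text VCF from compressed VCF; it cannot tell gzip from bgzip.
is_gzip_compressed() {
  [ "$(od -An -tx1 -N2 "$1" | tr -d ' \n')" = "1f8b" ]
}

# Example use (assumes bgzip and bcftools are installed):
# is_gzip_compressed NAME.vcf || bgzip NAME.vcf   # replaces NAME.vcf with NAME.vcf.gz
# bcftools index -t NAME.vcf.gz                   # writes a .tbi index
```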
Hi, thanks for the great tool. I have a question: I want to remove some poor-quality SNPs from the VCF file. The filtering should be based on the following thresholds:
I can see that AA T Mean and BB T Dev are in the VCF file, but I am unable to find Call Freq, AA Freq, BB Freq and AB Freq.
Please let me know how I can get these values.
Awaiting your reply.
Thanks
Hi Giulio,
I'm trying to cite this tool in my manuscripts but I did not find a related paper. Could you please share a citation format?
Great thanks!
Xiaotong
Hello,
I got the .xcl.bcf file after this step:
/bcftools annotate --no-version -Ob -o $pfx.unphased.bcf -x ID,QUAL,INFO,^FMT/GT,^FMT/BAF,^FMT/LRR $pfx.vcf &&
/bcftools index -f $pfx.unphased.bcf
n=$(/bcftools query -l $pfx.unphased.bcf|wc -l);
ns=$((n*98/100));
echo '##INFO=<ID=JK,Number=1,Type=Float,Description="Jukes Cantor">' |
/bcftools annotate --no-version -Ou -a $dup -c CHROM,FROM,TO,JK -h /dev/stdin $pfx.unphased.bcf |
/bcftools +/fill-tags.so --no-version -Ou -- -t NS,ExcHet |
bcftools +mochatools.so --no-version -Ou -- -x $sex -G |
bcftools annotate --no-version -Ob -o $pfx.xcl.bcf \
-i 'FILTER!="." && FILTER!="PASS" || JK<.02 || NS<'$ns' || ExcHet<1e-6 || AC_Sex_Test>6' \
-x FILTER,^INFO/JK,^INFO/NS,^INFO/ExcHet,^INFO/AC_Sex_Test &&
bcftools index -f $pfx.xcl.bcf
Then, when I ran eagle:
for chr in {1..22} X; do
eagle \
--geneticMapFile $map \
--chrom $chr \
--outPrefix $pfx.chr$chr \
--numThreads 4 \
--vcfRef \
--vcfTarget $pfx.unphased.bcf \
--vcfOutFormat b \
--noImpMissing \
--outputUnphased \
--vcfExclude $dir/$pfx.xcl.bcf && bcftools index -f $pfx.chr$chr.bcf
done
I get the following: ERROR: Could not open X.xcl.bcf for reading: unknown file type.
I have full permissions on the file. I am not sure if it's the eagle problem or it's the file generating issues.
Can you please help me with this?
Thank you.
Dear gtc2vcf team
I was wondering whether a prebuilt binary file for mac exists?
if not, is there any recipe/instruction to build the package from source in macos?
Thank you in advance.
Regards,
Sina
Hello, freeseek,
I cannot seem to convert my .gtc files to a vcf file using the following code:
bpm_manifest_file="InfiniumOmni2-5-8v1-5_A1.bpm"
csv_manifest_file="InfiniumOmni2-5-8v1-5_A1.csv"
egt_cluster_file="InfiniumOmni2-5-8v1-5_A1_ClusterFile.egt"
ref="$HOME/GRCh37/human_g1k_v37.fasta"
out_prefix="batch1_vcf"
bcftools +gtc2vcf --no-version -Ou --bpm $bpm_manifest_file --csv $csv_manifest_file --egt $egt_cluster_file --gtcs $path_to_gtc_folder --fasta-ref $ref --extra $out_prefix.tsv | bcftools sort -Ou -T ./bcftools-sort.XXXXXX | bcftools norm --no-version -Ob -c x -f $ref | tee $out_prefix.bcf | bcftools index --force --output $out_prefix.bcf.csi
The error that I receive is:
[E::hts_open_format] Failed to open file "Ou" : No such file or directory
Reading BPM file InfiniumOmni2-5-8v1-5_A1.bpm
Could not read Ou
Failed to read from standard input: unknown file type
index: "-" is in a format that cannot be usefully indexed
I've tried adapting the command by reading through the other issues that have come up, but have had no luck creating a bcf file that has > 0 bytes. May I ask for assistance in resolving this issue? I should mention that the manifest and cluster files provided by illumina are in the same directory in which I am running this command.
Thank you,
Chris
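In the command as posted above, `$path_to_gtc_folder` is used but never assigned. Whether that explains the error is not established here. As a general precaution, a guard like the following sketch fails early when a variable is unset or empty. `require_vars` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical guard: fail early if any named shell variable is unset or empty.
require_vars() {
  for name in "$@"; do
    # eval performs an indirect lookup of the variable called $name
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing variable: $name" >&2
      return 1
    fi
  done
}

# Example use before the bcftools pipeline:
# require_vars bpm_manifest_file csv_manifest_file egt_cluster_file \
#   path_to_gtc_folder ref out_prefix || exit 1
```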
Hello, I have a list of idat files. I can read them in R using https://github.com/HenrikBengtsson/illuminaio
But how can I convert them into a vcf file? If I use +gtc2vcf plugin as follows:
bcftools +gtc2vcf -c /shire/databases/InfiniumOmni2-5-8v1-5_A1.csv -f /shire/databases/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna -i EGAF00000868323.idat -o test.vcf
I am getting this:
IDAT file only allowed when converting to CSV
Any help? Best, Zillur
Dear freeseek,
I have some issues running the R script gtc2vcf_plot.R to generate plots. My input was first a .vcf file, but I got an error about the file format, so I converted it with bgzip to a .vcf.gz file (as suggested in the message) with the following command: bgzip file.vcf. After converting the file to the .vcf.gz format, I got the error below.
gtc2vcf_plot.R 2020-09-01 https://github.com/freeseek/gtc2vcf
Command: bcftools query --format [%CHROM\t%POS\t%ID\t%INFO/meanR_AA\t%INFO/meanR_AB\t%INFO/meanR_BB\t%INFO/meanTHETA_AA\t%INFO/meanTHETA_AB\t%INFO/meanTHETA_BB\t%INFO/devR_AA\t%INFO/devR_AB\t%INFO/devR_BB\t%INFO/devTHETA_AA\t%INFO/devTHETA_AB\t%INFO/devTHETA_BB\t%GT\t%X\t%Y\t%NORMX\t%NORMY\t%R\t%THETA\t%BAF\t%LRR\n]" all_qc.unphased_extra.vcf.gz -r 11:66328095-66328095
Error in names(object) <- nm :
'names' attribute [24] must be the same length as the vector [0]
Calls: setNames
In addition: Warning message:
In fread(cmd = cmd, sep = "\t", header = FALSE, na.strings = ".", :
File '/tmp/RtmpaEu4mo/file13974573efd5' has size 0. Returning a NULL data.frame.
Execution halted
Thanks in advance!
Hi Giulio,
Quick question, I see affy2vcf can convert cel to chp and chp to vcf. I am just wondering if this is required to do two steps to get from cel to vcf? I don't see in description requiring this and I know PennCNV goes from cel to vcf but requires multiple steps. Let me know whether we can go straight from CEL to VCF. Thanks.
Brian
I have approximately 4000 gtcs that I am trying to convert to vcf files using the gtc2vcf plugin but even though the script reads gtcs correctly and writes the vcf file - no output is produced. I have tried to run it by reducing the number of gtcs to 8 and get the same result.
I get this output;
Writing to ./bcftools-sort.XXXXXXMMTHoa
gtc2vcf 2022-01-12 https://github.com/freeseek/gtc2vcf
Reading BPM file /bochica/shared/numom/raw_babies/GUER_20211019_MEGA_1001_1002/Multi-EthnicGlobal_D2.bpm
Reading EGT file /bochica/shared/numom/raw_babies/GUER_20211019_MEGA_1001_1002/Multi-EthnicGlobal_D1_ClusterFile.egt
Reading GTC file /home5/maamir/mfgitry/somegtc/206043240081_R02C01.gtc
Reading GTC file /home5/maamir/mfgitry/somegtc/206043240081_R07C01.gtc
Reading GTC file /home5/maamir/mfgitry/somegtc/206043240081_R06C01.gtc
Reading GTC file /home5/maamir/mfgitry/somegtc/206043240081_R01C01.gtc
Reading GTC file /home5/maamir/mfgitry/somegtc/206043240081_R08C01.gtc
Reading GTC file /home5/maamir/mfgitry/somegtc/206043240081_R03C01.gtc
Reading GTC file /home5/maamir/mfgitry/somegtc/206043240081_R05C01.gtc
Reading GTC file /home5/maamir/mfgitry/somegtc/206043240081_R04C01.gtc
Writing VCF file
Lines total/missing-reference/skipped: 1748250/23814/14885
Merging 2 temporary files
Cleaning
Lines total/split/realigned/skipped: 1733365/0/0/23817
But no sub directory of bcftools-sort.XXXXXXMMTHoa is present in my directory when the programme has stopped running.
Below is the code I am using -
ref="/home5/maamir/GRCh38/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna"
bcftools +gtc2vcf \
  --no-version -Ou \
  --bpm /bochica/shared/numom/raw_babies/GUER_20211019_MEGA_1001_1002/Multi-EthnicGlobal_D2.bpm \
  --egt /bochica/shared/numom/raw_babies/GUER_20211019_MEGA_1001_1002/Multi-EthnicGlobal_D1_ClusterFile.egt \
  --gtcs /home5/maamir/mfgi \
  --fasta-ref $ref \
  --extra $out_prefix.tsv | \
  bcftools sort -Ou -T ./bcftools-sort.XXXXXX | \
  bcftools norm --no-version -Ob -c x -f $ref && \
  bcftools index --f $out_prefix.bcf
I have wine64 installed, but the commands I ran from the README also needed wine32. So I switched to a 32-bit VM, and then it needed wine64. I am pretty frustrated with installing this pipeline, as I would much prefer it to running Beeline 100x to get from IDAT to GTC.
Hi,
I have succeeded in converting CEL files to VCF files, but I found that the ID column of the VCF files still contains probeset IDs. I have tried
bcftools annotate -a 00-All.vcf.gz -c ID xxx.vcf.gz
to annotate the ID column with rsIDs, but some SNPs still fail to be annotated because 00-All.vcf.gz does not contain all the SNPs from GenomeWideSNP_6.na35.annot.csv. Is there any way to annotate the IDs in the step
bcftools +affy2vcf \
--no-version -Ou \
--csv $csv_manifest_file \
--fasta-ref $ref \
--chps $path_to_chp_folder \
--snp $path_to_txt_folder/AxiomGT1.snp-posteriors.txt \
--extra $out_prefix.tsv | \
bcftools sort -Ou -T ./bcftools-sort.XXXXXX | \
bcftools norm --no-version -Ob -o $out_prefix.bcf -c x -f $ref && \
bcftools index -f $out_prefix.bcf
or transform the GenomeWideSNP_6.na35.annot.csv to vcf annotation file? Thank you!
Hi, I have data from dbGaP that is in 'plink matrix format'. Can I use this tool? If not, what is the best way to prepare this data for MoCHa?
Dear freeseek,
I installed the gtc2vcf plugin yesterday in docker:
https://gitlab.com/intelliseq/workflows/-/blob/BIOINFO-998-genotype-source/src/main/docker/task/task_gtc-to-vcf/Dockerfile
(the reference is added later).
The plugin works without raising any error, but some vcf lines don't have the GT tag:
chr1 30345446 22:24375752_CNV_GSTT1 A C . . GC=0.4625;ALLELE_A=0;ALLELE_B=1;FRAC_A=0.360656;FRAC_C=0.262295;FRAC_G=0.229508;FRAC_T=0.147541;NORM_ID=1;BEADSET_ID=1705;INTENSITY_ONLY;ASSAY_TYPE=0;GenTrain_Score=0;Orig_Score=0.68275;Cluster_Sep=0.948275;N_AA=1236;N_AB=0;N_BB=0;devR_AA=0.30742;devR_AB=0.39422;devR_BB=0.20131;devTHETA_AA=0.0121041;devTHETA_AB=0.0223607;devTHETA_BB=0.0223607;meanR_AA=2.73401;meanR_AB=3.4935;meanR_BB=2.30665;meanTHETA_AA=0.130089;meanTHETA_AB=0.554171;meanTHETA_BB=0.978252;Intensity_Threshold=0.05 GQ:IGC:BAF:LRR:NORMX:NORMY:R:THETA:X:Y 0:0:0.0246714:-0.31685:1.76758:0.427339:2.19492:0.151014:32616:2228 0:0:0.0246714:-0.31685:1.76758:0.427339:2.19492:0.151014:32616:2228
chr1 109685814 1:110228436_CNV_GSTM1 T C . . GC=0.385;ALLELE_A=0;ALLELE_B=1;FRAC_A=0.147541;FRAC_C=0;FRAC_G=0.180328;FRAC_T=0.672131;NORM_ID=0;BEADSET_ID=1625;INTENSITY_ONLY;ASSAY_TYPE=0;GenTrain_Score=0;Orig_Score=0.376871;Cluster_Sep=0.173743;N_AA=0;N_AB=0;N_BB=1239;devR_AA=0.1;devR_AB=0.1;devR_BB=0.1;devTHETA_AA=0.0223607;devTHETA_AB=0.0223607;devTHETA_BB=0.140788;meanR_AA=0.17845;meanR_AB=0.194985;meanR_BB=0.207459;meanTHETA_AA=0.0145364;meanTHETA_AB=0.297995;meanTHETA_BB=0.581454;Intensity_Threshold=0.05 GQ:IGC:BAF:LRR:NORMX:NORMY:R:THETA:X:Y 0:0:1.28708:-0.170678:0.0549625:0.129349:0.184312:0.744207:1017:614 0:0:1.28708:-0.170678:0.0549625:0.129349:0.184312:0.744207:1017:614
The program is run with this wdl:
https://gitlab.com/intelliseq/workflows/-/blob/dev/src/main/wdl/tasks/gtc-to-vcf/gtc-to-vcf.wdl
Is it intentional? This has not happened with the previous installation (bcftools11-54-gaf54707, htslib1.11-74-gb8dcbd1
and gtc2vcf cloned on 2021-01-20).
Best,
Kasia
Dear Giulio,
Thank you for developing such a good tool for dealing with IDAT files. I have successfully converted IDAT files to GTC files; thank you for your suggestion. When I ran the code as in the guide, an error occurred. I saw that someone had a similar issue (#13), but the solution there does not apply to me. I used --gtcs; the folder contains 103 GTC files, and using fewer files still gives the same error.
$bcftools +gtc2vcf \
--no-version -Ou \
--bpm $bpm_manifest_file \
--csv $csv_manifest_file \
--egt $egt_cluster_file \
--gtcs $path_to_gtc_folder \
--fasta-ref $ref \
--extra $out_prefix.tsv
gtc2vcf 2020-08-26 https://github.com/freeseek/gtc2vcf
Reading BPM file /media/EXTend2018/Wanghe2019/GEO/GSE113093/InfiniumPsychArray-24v1-1_A1.bpm
Reading CSV file /media/EXTend2018/Wanghe2019/GEO/GSE113093/InfiniumPsychArray-24v1-1_A1.csv
Reading EGT file /media/EXTend2018/Wanghe2019/GEO/GSE113093/InfiniumPsychArray-24v1-1_A1_ClusterFile.egt
Reading GTC file /media/EXTend2018/Wanghe2019/GEO/GSE113093/GSE113093_GTC/GSM3096512_200687150051.gtc
Failed to read 1359180426 bytes from stream
Best wishes,
Crane
Hello,
Thank you for the handy tool! I'm able to generate GTC files from IDAT files using your software. However, I'd like to know whether the results are the same as those from Beeline's AutoConvert function. I don't have a Windows machine with Illumina software, so I can't compare them myself. I would really appreciate any input.
Thanks
Fan
Hello -- We could not patch the +gtc2vcf plugin using bcftools/1.9 on CentOS 6.
In another install attempt (our first), "MODE_SWAP" was reported as undefined in the C code.
bcftools-1.9/plugins]$ patch < fixref.patch
patching file fixref.c
Hunk #1 FAILED at 91.
Hunk #2 succeeded at 104 (offset -1 lines).
Hunk #3 FAILED at 134.
Hunk #4 FAILED at 155.
Hunk #5 succeeded at 180 (offset -5 lines).
Hunk #6 succeeded at 193 with fuzz 2 (offset -6 lines).
Hunk #7 succeeded at 236 (offset -6 lines).
Hunk #8 succeeded at 428 with fuzz 2 (offset -14 lines).
Hunk #9 succeeded at 586 (offset -14 lines).
3 out of 9 hunks FAILED -- saving rejects to file fixref.c.rej
This is with bcftools-1.9 etc.
somehow we something without the patch and it gave vcf version 3ish not 4.2?
Any plans to do more with this plugin maybe cover indels and some updating for the vcf spec?
I like the concept of making a bcftools plugin - that's kinda nifty :-)
Hello, I tried to obtain gtc files from idat using the command line in the tutorial :
mono $HOME/bin/autoconvert/AutoConvert.exe $path_to_idat_folder $path_to_output_folder $manifest_file $egt_file
Unfortunately the process gives me, as you said, the normalization error. I tried to use a custom cluster file and a custom manifest file with a ".csv" extension; could the error arise because of this? It is mandatory for me to use custom EGT and CSV or BPM files because of some added SNPs. Is there a solution to this issue?
Thank you
Hi,
I am currently using the gtc2vcf tools to convert ~1800 GTC files into a single BCF file. I got the error reported below. However, when I tried a small number of samples (e.g. 20), including the sample named in the error, 9479477122_R04C01.gtc, the pipeline worked and produced a correct BCF file. Is this a memory problem? Could you please help me figure this out? Thank you very much for your help!
"Could not open 9479477122_R04C01.gtc: Too many open files
[E::bcf_hdr_read] Input is not detected as bcf or vcf format
Could not read VCF/BCF headers from -
Cleaning
Failed to read from standard input: unknown file type"
Best regards,
Qidi
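"Too many open files" is the operating system's message for hitting the per-process limit on open file descriptors, which is a separate limit from memory. A rough pre-check could look like the sketch below. `fits_open_file_limit` is a hypothetical helper, and the headroom of 16 descriptors is an arbitrary assumption.

```shell
#!/bin/sh
# Hypothetical check: compare the number of input files with the soft limit
# on open file descriptors (as reported by `ulimit -n`).
fits_open_file_limit() {
  n_files=$1
  limit=${2:-$(ulimit -n)}
  [ "$limit" = "unlimited" ] && return 0
  # Keep some headroom for stdin/stdout/stderr and the reference files;
  # 16 is an arbitrary illustrative margin.
  [ "$n_files" -lt "$((limit - 16))" ]
}

# Example use:
# n=$(ls "$path_to_gtc_folder"/*.gtc | wc -l)
# fits_open_file_limit "$n" || echo "raise the limit with ulimit -n, or convert in batches" >&2
```

Raising the soft limit with `ulimit -n` only works up to the hard limit configured on the machine.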
I'm sorry to bother you.
I got the error "Probe Set AX-82929059 not found in models file" when I ran bcftools +affy2vcf.
How do I fix it? Thanks!
Can I simply omit the --models xxxxxx.snp-posteriors.txt option when running bcftools +affy2vcf? Does it make a difference? If I leave out --models, I do get a VCF file.
bcftools +affy2vcf \
--csv ../APT-library/biobank/Axiom_BioBank1.na35.annot.csv \
--fasta-ref ../resource-humanv37/human_g1k_v37.fasta \
--calls ./GPS-step7-output/AxiomGT1.calls.txt \
--confidences ./GPS-step7-output/AxiomGT1.confidences.txt \
--summary ./GPS-step7-output/AxiomGT1.summary.txt \
--models ./GPS-step7-output/AxiomGT1.snp-posteriors.txt \
--output ./bcf-output/AxiomGT1.vcf
--- RUNNING LOG ---
Reading CSV file ../APT-library/biobank/Axiom_BioBank1.na35.annot.csv
Reading SNP file ./GPS-step7-output/AxiomGT1.snp-posteriors.txt
Writing VCF file
Probe Set AX-82929059 not found in models file
bcftools +affy2vcf \
--csv ../APT-library/biobank/Axiom_BioBank1.na35.annot.csv \
--fasta-ref ../resource-humanv37/human_g1k_v37.fasta \
--chps ./GPS-step7-output/cc-chp/ \
--models ./GPS-step7-output/AxiomGT1.snp-posteriors.txt \
--output bcf0517chp.vcf
--- RUNNING LOG ---
Reading CSV file ../APT-library/biobank/Axiom_BioBank1.na35.annot.csv
Reading CHP file ./GPS-step7-output/cc-chp//xxxxxxxxxxx.chp
...
Reading SNP file ./GPS-step7-output/AxiomGT1.snp-posteriors.txt
Writing VCF file
Probe Set AX-82929059 not found in models file
I used the GenCall algorithm to generate gtc files. My generated gtc files are not in a readable format- is this supposed to be the case? (I have set the LANG variable as instructed). My egt and bpm files are also correctly called on, and GenCall seems to run fine.
However, the gtc2vcf plugin is also unable to read in these gtc files.
This is the command I have used to generate gtc files:
LANG="en_US.UTF-8" $HOME/bin/iaap-cli/iaap-cli gencall /path/to/manifest/file.bpm /path/to/cluster/file.egt /path/to/output/folder --idat-folder /path/to/idat/folder/--output-gtc --gender-estimate-call-rate-threshold -0.1
Am I generating gtc files incorrectly?
Hi,
Can you tell me if there is a way to install wine64 without using sudo? I don't have sudo rights on the cluster that I use.
Thanks.
Hello, I am trying to use the "Convert Illumina GTC files to VCF" example shown in the README, but I am getting this error:
Writing to .
Could not initialize gtc2vcf.so, neither run or init found
[E::bcf_hdr_read] Input is not detected as bcf or vcf format
Failed to open -: unknown file type
Failed to open -: unknown file type
Looking in the file, there is a run function defined, but no init function, and bcftools vcfplugin.c appears to be checking for both.
I am using bcftools version 1.9. Any idea what could be causing this?
I would like to use IDAT files rather than GTC files. Do you have an example command line?
bcftools +gtc2vcf -Ou --bpm .bpm --egt egt --idat filelink --fasta-ref fasta --extra gtc2vcf_idat".tsv" --output gtc2vcf_idat".vcf.gz" --threads 35 --output-type z
where filelink lists each IDAT file.
The error that I obtained:
The --idat option can only be used alone or with option --gtcs
Could you explain in more detail how to use IDAT files with gtc2vcf? Which algorithms are used? What is the benefit?
thank you
Hi there,
I'm looking for a way to include the "cluster separation" [0-1] metric to the output vcf produced using the gtc2vcf method. Could someone please tell me if this would be possible and how I could change the code to achieve this goal?
Thank you!
Hi!
I get this error when using your pretty tool trying to convert gtcs to vcfs:
$HOME/bin/bcftools +$HOME/bin/gtc2vcf.so --no-version -Ou -b $manifest_file -e $egt_file -g $gtc_list -f $ref -x $out.sex
...cannot open more than 4096 files at once while 30546 is required
Do I need another machine with more than 4 GB of RAM, or is there something I can do about the RAM capacity?
Thank you in advance Dr. Genovese!!
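The message above counts files that must be open at once (4096 allowed, 30546 required), so it reads as a file-descriptor limit, not a RAM limit. One possible workaround, not confirmed by the tool's author in this thread, is to convert the GTC list in batches and combine the results afterwards. A minimal sketch of the batching step, with `make_batches` as a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: split a list of GTC paths (one per line) into batch
# files of at most <size> lines each, named <prefix>aa, <prefix>ab, ...
make_batches() {
  list=$1
  size=$2
  prefix=$3
  split -l "$size" "$list" "$prefix"
}

# Example use (a batch size below the 4096 limit; the exact margin is an assumption):
# make_batches "$gtc_list" 4000 batch.
# for b in batch.??; do
#   ... run the bcftools +gtc2vcf command with -g "$b" and a per-batch output ...
# done
# The per-batch outputs could then be combined, for example with bcftools merge.
```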
Hi Giulio,
I met some issues when installing the tools. I'm using Ubuntu 16.04 and I'm not experienced at Ubuntu installation. Could you help me with them?
sudo apt install libicu66
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Unable to locate package libicu66
I did some search on Google but did not find a package named libicu66. If I just want to convert .idat files into .vcf files (do not have .bpm files), do I need to install this package?
git clone --branch=develop --recurse-submodules git://github.com/samtools/htslib.git
git clone --branch=develop git://github.com/samtools/bcftools.git
/bin/rm -f bcftools/plugins/{gtc2vcf.{c,h},affy2vcf.c}
wget -P bcftools/plugins https://raw.githubusercontent.com/freeseek/gtc2vcf/master/{gtc2vcf.{c,h},affy2vcf.c}
cd htslib && autoheader && (autoconf || autoconf) && ./configure --disable-bz2 --disable-gcs --disable-lzma && make && cd ..
cd bcftools && make && cd ..
/bin/cp bcftools/{bcftools,plugins/{gtc,affy}2vcf.so} $HOME/bin/
export PATH="$HOME/bin:$PATH"
export BCFTOOLS_PLUGINS="$HOME/bin"
These commands all run correctly but when I tried to use
gtc2vcf
I got
gtc2vcf: command not found
When I tried
gtc2vcf.so
I got
Segmentation fault (core dumped)
My system has 16 GB of RAM and 8 cores. Do you think this is due to a lack of RAM?
When I tried to use the alternative method to install gtc2vcf, I got :
sudo apt install ./{libhts3_1.11-4,bcftools_1.11-1,gtc2vcf_1.11-dev}_amd64.deb
Reading package lists... Done
Building dependency tree
Reading state information... Done
Note, selecting 'libhts3' instead of './libhts3_1.11-4_amd64.deb'
Note, selecting 'bcftools' instead of './bcftools_1.11-1_amd64.deb'
Note, selecting 'gtc2vcf' instead of './gtc2vcf_1.11-dev_amd64.deb'
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:
The following packages have unmet dependencies:
bcftools : Depends: libc6 (>= 2.29) but 2.23-0ubuntu11.2 is to be installed
gtc2vcf : Depends: libc6 (>= 2.29) but 2.23-0ubuntu11.2 is to be installed
libhts3 : Depends: libc6 (>= 2.29) but 2.23-0ubuntu11.2 is to be installed
Depends: libdeflate0 (>= 1.0) but it is not installable
Depends: libssl1.1 (>= 1.1.0) but it is not installable
E: Unable to correct problems, you have held broken packages.
bcftools + gtc2vcf -i -g ~/Desktop/test/
Could not initialize , neither run or init found
Any suggestions would be greatly appreciated!
Thank you!
Xiaotong
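The install commands above copy `gtc2vcf.so` into `$HOME/bin` and point `BCFTOOLS_PLUGINS` there. The `.so` file is a plugin that bcftools loads when invoked as `bcftools +gtc2vcf` (with no space after the `+`); it is not a standalone executable. A small sketch for checking that the plugin file is where bcftools will look follows. `find_plugin` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the path of <name>.so found in the
# colon-separated directories of BCFTOOLS_PLUGINS, or fail if it is absent.
find_plugin() {
  old_ifs=$IFS
  IFS=:
  for dir in ${BCFTOOLS_PLUGINS:-}; do
    if [ -f "$dir/$1.so" ]; then
      IFS=$old_ifs
      echo "$dir/$1.so"
      return 0
    fi
  done
  IFS=$old_ifs
  return 1
}

# Example use:
# find_plugin gtc2vcf || echo "gtc2vcf.so not found in BCFTOOLS_PLUGINS" >&2
# bcftools +gtc2vcf ...    # note: '+gtc2vcf' is one word
```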
Hi freeseek,
Thank you for your help yesterday; I installed gtc2vcf successfully. But when I convert IDAT to GTC, the IDAT files are always reported as not found at the location. I tried many times but could not solve it. I checked the BPM and EGT files and I am sure they are right:
Chip Prefix (Guess),InfiniumPsychArray-24v1-1
I don't know why the IDAT files are not found. My IDAT files are named like this:
GSM3096512_200687150051_R01C01_Grn.idat
GSM3096512_200687150051_R01C01_Red.idat
This is the log:
ArrayAnalysis.NormToGenCall.CLI.App[0]
[10:25:21 2352]: Crawling /media/EXTend2018/Wanghe2019/GEO/GSE113093/GSE113093_RAW for samples ...
info: ArrayAnalysis.NormToGenCall.CLI.App[0]
[10:25:21 3578]: Number of samples to process: 103
info: ArrayAnalysis.NormToGenCall.Services.NormToGenCallSvc[0]
[10:25:21 3714]:
Starting processing...
Manifest file: /media/EXTend2018/Wanghe2019/GEO/GSE113093/InfiniumPsychArray-24v1-1_A1.bpm
Cluster file: /media/EXTend2018/Wanghe2019/GEO/GSE113093/InfiniumPsychArray-24v1-1_A1_ClusterFile.egt
Include file:
Output directory: /media/EXTend2018/Wanghe2019/GEO/GSE113093
GenCall score cutoff: 0.15
GenTrain ID: 3
Gender Estimate Settings:
Version: 2
MinAutosomalLoci : 100
MaxAutosomalLoci : 10000
MinXLoci : 20
MinYLoci : 20
AutosomalCallRateThreshold : 0.97
YIntensityThreshold : 0.3
XIntensityThreshold : 0.9
XHetRateThreshold : 0.1
Output Settings:
Output GTC: True
Output PED: False
PED tab delmited: False
PED use customer strand: False
Number of threads: 1
Buffer size: 131072
info: ArrayAnalysis.NormToGenCall.Services.NormToGenCallSvc[0]
ArrayAnalysis.NormToGenCall.Services.NormToGenCallSvc[0]
[12:33:32 8929]: Failed to normalize or gencall - GSM3096512_200687150051_R01C01: IDAT not found at location: /media/EXTend2018/Wanghe2019/GEO/GSE113093/GSE113093_RAW/GSM3096512_200687150051_Red.idat
at ArrayAnalysis.NormToGenCall.Services.SampleNormToGenCallSvc.LoadIdat(String idatPath, Manifest manifest) in /src/ArrayAnalysis.NormToGenCall.Services/Services/SampleNormToGenCallSvc.cs:line 63
at ArrayAnalysis.NormToGenCall.Services.SampleNormToGenCallSvc.Normalize(NormalizationBase normAlg, Manifest manifest, Byte[] transformLookups, Boolean needGreen, Boolean needRed, SampleData sample, String[] includeLociNames) in /src/ArrayAnalysis.NormToGenCall.Services/Services/SampleNormToGenCallSvc.cs:line 106
at ArrayAnalysis.NormToGenCall.Services.NormToGenCallSvc.<>c__DisplayClass7_0.b__2(SampleData sample) in /src/ArrayAnalysis.NormToGenCall.Services/Services/NormToGenCallSvc.cs:line 113
... Many IDAT files fail like this.
Best wishes,
Crane
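In the log above, the files on disk are named `GSM3096512_200687150051_R01C01_Red.idat`, while the tool looks for `GSM3096512_200687150051_Red.idat`. One plausible reading, which is an assumption and not something confirmed in this thread, is that the leading `GSM..._` accession prefix shifts the fields the tool expects in a `<barcode>_<position>_<Grn|Red>.idat` name. A sketch that strips such a prefix follows. `strip_geo_prefix` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: drop a leading "GSM<digits>_" accession prefix so the
# name follows the <barcode>_<position>_<Grn|Red>.idat pattern.
strip_geo_prefix() {
  echo "$1" | sed -E 's/^GSM[0-9]+_//'
}

# Example use: create prefix-free symlinks in a separate folder
# mkdir -p renamed
# for f in GSE113093_RAW/*.idat; do
#   src="$(cd "$(dirname "$f")" && pwd)/$(basename "$f")"
#   ln -s "$src" "renamed/$(strip_geo_prefix "$(basename "$f")")"
# done
```

Using symlinks keeps the downloaded files untouched, so the original names remain available.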
Hi!
While following the README for your fantastic tool, I got stuck at this step and do not know how to proceed. For now I skipped to the next step (compile htslib and bcftools...). In the end I can use the IDAT-to-GTC converter for Illumina, but I want to run the whole tool.
Could you please help me with this?
I paste the error and some additional information below.
/bin/rm -f bcftools/plugins/{gtc2vcf.c,affy2vcf.c,fixref.patch}
wget -P bcftools/plugins https://raw.githubusercontent.com/freeseek/gtc2vcf/master/{gtc2vcf.c,affy2vcf.c,fixref.patch}
cd bcftools/plugins && patch < fixref.patch && cd ../..
File to patch:
Dear Giulio,
Thanks a lot for such a nice workflow for converting GTC files to VCF.
Because of some limitations, I wasn't able to install everything. I tried to package the whole thing in a Docker container, and I failed there too.
Do you have any plans to make a Docker container that does the whole process?
Really appreciate it.
Regards
Dear Freeseek,
The conversion from a GenomeStudio file to a VCF file works fine, but a lot of SNPs are missing after this conversion. I looked into this and it appears that only the SNPs without any missing values are in the VCF file, but I am not sure about this yet, so I have some questions.
Is it true that the gtc2vcf tool keeps only the complete SNPs, without any missing values, after conversion? Or is there another way to handle them in this tool? And is it right to use -- for missing values in the GenomeStudio file?
Thanks in advance!
Hi,
Thank you for the wonderful set of tools for converting Illumina reports to VCF files.
I am getting an error when using matrix-format Illumina reports.
The error is as follows:
Reading GTC file /Users/vikrants/Desktop/testvcf/ILHC24-12806_FinalReport.txt
GTC file /Users/vikrants/Desktop/testvcf/ILHC24-12806_FinalReport.txt format identifier is bad
Can you please have a look and let me know why I am getting this error?
P.S. I generated the matrix-format report from GenomeStudio.
Thanks in advance,
Vikrant
Hi,
I am getting the error message "GTC files cannot be listed through both command interface and file list" even though I am only submitting a single .txt file with a list of the gtc file names. I have tried this where the actual gtc files are in the directory where I am running the script, and also where they are in their own directory. I am running on a google cloud instance and using a singularity container. Here is the code, and I have attached the gtc_list file.
`bpm_manifest_file="./GDA_PGx-8v1-0_20042614_A2.bpm"
csv_manifest_file="./ProjectDetailReport ILMN GDA 07-11-22 AMS1.csv"
egt_cluster_file="./GDA FINAL 3 plate validation reclustered 06302022.egt"
path_to_gtc_folder="./gtc_file_list.csv"
ref="./GRCh38_full_analysis_set_plus_decoy_hla.fa" # or ref="$HOME/GRCh37/human_g1k_v37.fasta"
out_prefix="206486390022"
singularity exec gtc2vcf_072922.sif bcftools +gtc2vcf \
--no-version -Ou \
--bpm $bpm_manifest_file \
--csv $csv_manifest_file \
--egt $egt_cluster_file \
--gtcs ./gtc_list_file.txt \
--fasta-ref $ref \
--output $out_prefix.vcf \
--output-type v \
--extra $out_prefix.tsv \
--verbose
`
Thank you
Harry
gtc_list_file.txt
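In the command above, `$csv_manifest_file` and `$egt_cluster_file` hold paths containing spaces and are expanded without quotes, so the shell splits each path into several arguments. The extra words may then be what the plugin treats as GTC files listed on the command line; that link is an assumption. The splitting itself can be seen with the sketch below, where `count_args` is a hypothetical helper that prints how many arguments it received.

```shell
#!/bin/sh
# Count how many separate arguments a command actually receives.
count_args() {
  echo "$#"
}

csv_manifest_file="./ProjectDetailReport ILMN GDA 07-11-22 AMS1.csv"

count_args --csv $csv_manifest_file     # unquoted: prints 6 (path split at each space)
count_args --csv "$csv_manifest_file"   # quoted: prints 2 (path stays one argument)
```

Quoting every expansion (`"$csv_manifest_file"`, `"$egt_cluster_file"`, and so on) keeps each path as a single argument.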
Hello,
I have tried to use the following command to convert Illumina reports to VCF.
bcftools +gtc2vcf --genome-studio FinalReport24.txt -o GenotypeReport24.vcf
Output from the run in the terminal is only one line:
gtc2vcf 2021-06-01 https://github.com/freeseek/gtc2vcf
And the GenotypeReport24.vcf file is created but with no contents in it.
An extract from the Illumina report:
[Header]
GSGT Version 2.0.4
Processing Date 3/29/2021 4:13 PM
Content GSA-24v3-0_A2.bpm
Num SNPs 654027
Total SNPs 654027
Num Samples 24
Total Samples 24
File 24 of 24
[Data]
Sample Index Sample ID Sample Name SNP Index SNP Name Chr Position GT Score GC Score Allele1 - AB Allele2 - AB Allele1 - Top Allele2 - Top Allele1 - Forward Allele2 - Forward Allele1 - Design Allele2 - Design Theta R X Raw Y Raw X Y B Allele Freq Log R Ratio SNP Aux SNP ILMN Strand Top Genomic Sequence Customer Strand
24 03-031 1 1:103380393 1 102914837 0.7987 0.8136 B B G G G G C C 0.963 0.722 1101 3453 0.040 0.682 1.0000 0.3609 0 [T/C] BOT TOP
24 03-031 2 1:109439680 1 108897058 0.8792 0.4803 A A A A A A A A 0.039 0.895 11409 497 0.843 0.052 0.0000 0.4173 0 [A/G] TOP TOP
I spent some hours trying to figure out what I might be doing wrong, but couldn't.
Any tips on what might be going wrong with my steps are appreciated.
Thanks,
Rashindrie
Update
Tried with below command
ref="/tmp/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna"
bcftools +gtc2vcf --no-version -Ov -o $out_prefix --genome-studio "FinalReport24.txt" -f $ref
Output on terminal
gtc2vcf 2021-06-01 https://github.com/freeseek/gtc2vcf
Writing VCF file
Could not recognize INFO field: [Header]
Thank you for developing this tool! The one single Windows dependency we have is in running GenomeStudio, and getting rid of this is a huge help.
I am wondering if it would be possible to extract SNP table metrics using this tool. For instance, we often need to extract e.g. Log R Ratio and B Allele Frequency values when using PennCNV (http://penncnv.openbioinformatics.org/en/latest/user-guide/input/), among other minor interactions with GenomeStudio. Would it be possible to extract these starting from IDAT files without ever having to interact with GenomeStudio?
Thanks again for your work!!
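One possible route, sketched under assumptions: VCF output quoted elsewhere on this page carries BAF and LRR as FORMAT fields, so once such a VCF exists the per-sample values can be pulled into a table. The helper below is hypothetical (not part of gtc2vcf or bcftools), reads uncompressed VCF text, and produces an illustrative column layout that is not claimed to match what PennCNV expects.

```python
def baf_lrr_rows(vcf_lines):
    """Yield (id, chrom, pos, sample, baf, lrr) for every record and sample."""
    samples = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        keys = fields[8].split(":")
        if "BAF" not in keys or "LRR" not in keys:
            continue  # record has no intensity-derived fields
        baf_idx, lrr_idx = keys.index("BAF"), keys.index("LRR")
        for sample, column in zip(samples, fields[9:]):
            values = column.split(":")
            # Trailing FORMAT values may be omitted in VCF; report them as missing.
            baf = values[baf_idx] if baf_idx < len(values) else "."
            lrr = values[lrr_idx] if lrr_idx < len(values) else "."
            yield fields[2], fields[0], int(fields[1]), sample, baf, lrr


# Two lines taken from VCF output shown elsewhere on this page.
example = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t59_1",
    "chr12\t126890980\trs1000000\tA\tG\t.\t.\t.\tGT:IGC:BAF:LRR\t0/1:0.787888:0.507545:0.00840554",
]
for row in baf_lrr_rows(example):
    print("\t".join(str(x) for x in row))
```

Values are kept as strings so that a missing value (".") passes through unchanged.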
Hello,
when I try to convert .gtc files to .vcf I get the error "[E::bcf_hdr_read] Input is not detected as bcf or vcf format". It seems like the .gtc header size is bigger than expected. Can you please help me to fix this error?
Thank you.
Hello,
I just tried to compile bcftools with your new plugin and got this error:
gcc -fPIC -shared -g -Wall -O2 -I. -I../htslib -o plugins/affy2vcf.so version.c plugins/affy2vcf.c
In file included from plugins/affy2vcf.c:39:0:
plugins/gtc2vcf.h: In function ‘flank_reverse_complement’:
plugins/gtc2vcf.h:186:2: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (size_t i = 0; i < len / 2; i++) {
^
plugins/gtc2vcf.h:186:2: note: use option -std=c99 or -std=gnu99 to compile your code
plugins/gtc2vcf.h: In function ‘flank_left_shift’:
plugins/gtc2vcf.h:215:2: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (const char *ptr = middle + 2; ptr < right; ptr++)
^
plugins/gtc2vcf.h: In function ‘get_position’:
plugins/gtc2vcf.h:306:4: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int k = 0; k < n_cigar && qlen > 1; k++) {
^
plugins/affy2vcf.c: In function ‘read_bytes’:
plugins/affy2vcf.c:79:3: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int i = 0; i < nbytes; i++)
^
plugins/affy2vcf.c: In function ‘read_string16’:
plugins/affy2vcf.c:132:3: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int i = 0; i < len; i++) {
^
plugins/affy2vcf.c: In function ‘xda_cel_print’:
plugins/affy2vcf.c:298:3: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int i = 0; i < xda_cel->num_cells; i++)
^
plugins/affy2vcf.c:308:12: error: redefinition of ‘i’
for (int i = 0; i < xda_cel->num_masked_cells; i++)
^
plugins/affy2vcf.c:298:12: note: previous definition of ‘i’ was here
for (int i = 0; i < xda_cel->num_cells; i++)
^
plugins/affy2vcf.c:308:3: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int i = 0; i < xda_cel->num_masked_cells; i++)
^
plugins/affy2vcf.c:317:12: error: redefinition of ‘i’
for (int i = 0; i < xda_cel->num_outlier_cells; i++)
^
plugins/affy2vcf.c:308:12: note: previous definition of ‘i’ was here
for (int i = 0; i < xda_cel->num_masked_cells; i++)
^
plugins/affy2vcf.c:317:3: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int i = 0; i < xda_cel->num_outlier_cells; i++)
^
plugins/affy2vcf.c: In function ‘agcc_read_data_header’:
plugins/affy2vcf.c:459:2: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int i = 0; i < data_header->n_parameters; i++)
^
plugins/affy2vcf.c:465:11: error: redefinition of ‘i’
for (int i = 0; i < data_header->n_parents; i++)
^
plugins/affy2vcf.c:459:11: note: previous definition of ‘i’ was here
for (int i = 0; i < data_header->n_parameters; i++)
^
plugins/affy2vcf.c:465:2: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int i = 0; i < data_header->n_parents; i++)
^
plugins/affy2vcf.c: In function ‘agcc_read_data_set’:
plugins/affy2vcf.c:477:2: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int i = 0; i < data_set->n_parameters; i++)
^
plugins/affy2vcf.c:482:11: error: redefinition of ‘i’
for (int i = 0; i < data_set->n_cols; i++) {
^
plugins/affy2vcf.c:477:11: note: previous definition of ‘i’ was here
for (int i = 0; i < data_set->n_parameters; i++)
^
plugins/affy2vcf.c:482:2: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int i = 0; i < data_set->n_cols; i++) {
^
plugins/affy2vcf.c:492:11: error: redefinition of ‘i’
for (int i = 0; i < data_set->n_cols; i++) {
^
plugins/affy2vcf.c:482:11: note: previous definition of ‘i’ was here
for (int i = 0; i < data_set->n_cols; i++) {
^
plugins/affy2vcf.c:492:2: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int i = 0; i < data_set->n_cols; i++) {
^
plugins/affy2vcf.c: In function ‘agcc_read_data_group’:
plugins/affy2vcf.c:514:2: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int i = 0; i < data_group->num_data_sets; i++)
^
plugins/affy2vcf.c: In function ‘agcc_init’:
plugins/affy2vcf.c:548:2: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int i = 0; i < agcc->num_data_groups; i++)
^
plugins/affy2vcf.c: In function ‘agcc_destroy_parameters’:
plugins/affy2vcf.c:576:2: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int i = 0; i < n_parameters; i++) {
^
plugins/affy2vcf.c: In function ‘agcc_destroy_data_header’:
plugins/affy2vcf.c:591:2: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int i = 0; i < data_header->n_parents; i++)
^
plugins/affy2vcf.c: In function ‘agcc_destroy_data_set’:
plugins/affy2vcf.c:600:2: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int i = 0; i < data_set->n_cols; i++)
^
plugins/affy2vcf.c: In function ‘agcc_destroy_data_group’:
plugins/affy2vcf.c:610:2: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int i = 0; i < data_group->num_data_sets; i++)
^
plugins/affy2vcf.c: In function ‘agcc_destroy’:
plugins/affy2vcf.c:623:2: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int i = 0; i < agcc->num_data_groups; i++)
^
plugins/affy2vcf.c: In function ‘agcc_print_parameters’:
plugins/affy2vcf.c:639:2: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int i = 0; i < n_parameters; i++) {
^
plugins/affy2vcf.c:674:4: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int j = 0; j < parameters[i].n_value / 2; j++)
^
plugins/affy2vcf.c: In function ‘agcc_print_data_header’:
plugins/affy2vcf.c:694:2: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int i = 0; i < data_header->n_parents; i++)
^
plugins/affy2vcf.c: In function ‘agcc_print_data_set’:
plugins/affy2vcf.c:731:2: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int i = 0; i < data_set->n_cols; i++)
^
plugins/affy2vcf.c:749:11: error: redefinition of ‘i’
for (int i = 0; i < data_set->n_cols; i++) {
^
plugins/affy2vcf.c:731:11: note: previous definition of ‘i’ was here
for (int i = 0; i < data_set->n_cols; i++)
^
plugins/affy2vcf.c:749:2: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int i = 0; i < data_set->n_cols; i++) {
^
plugins/affy2vcf.c:774:11: error: redefinition of ‘i’
for (int i = 0; i < data_set->n_rows; i++) {
^
plugins/affy2vcf.c:749:11: note: previous definition of ‘i’ was here
for (int i = 0; i < data_set->n_cols; i++) {
^
plugins/affy2vcf.c:774:2: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int i = 0; i < data_set->n_rows; i++) {
^
plugins/affy2vcf.c:776:3: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int j = 0; j < data_set->n_cols; j++) {
^
plugins/affy2vcf.c: In function ‘agcc_print_data_group’:
plugins/affy2vcf.c:788:2: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int i = 0; i < data_group->num_data_sets; i++)
^
plugins/affy2vcf.c: In function ‘agcc_print’:
plugins/affy2vcf.c:799:2: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int i = 0; i < agcc->num_data_groups; i++)
^
plugins/affy2vcf.c: In function ‘agccs_to_tsv’:
plugins/affy2vcf.c:826:2: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int j = 0; j < 20; j++)
^
plugins/affy2vcf.c:829:2: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int i = 0; i < n; i++) {
^
plugins/affy2vcf.c:833:3: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int j = 0, k = 0; j < 20; j++) {
^
plugins/affy2vcf.c: In function ‘cels_to_tsv’:
plugins/affy2vcf.c:976:2: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int i = 0; i < n; i++) {
^
plugins/affy2vcf.c:1004:4: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int k = 0; k < data_header->parameters[j].n_value / 2; k++)
^
plugins/affy2vcf.c: In function ‘models_init’:
plugins/affy2vcf.c:1119:2: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int i = 0; i < 2; i++) {
^
plugins/affy2vcf.c: In function ‘models_destroy’:
plugins/affy2vcf.c:1225:2: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int i = 0; i < 2; i++) {
^
plugins/affy2vcf.c:1227:3: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int j = 0; j < models->n_snps[i]; j++)
^
plugins/affy2vcf.c: In function ‘annot_init’:
plugins/affy2vcf.c:1316:2: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int i = 0; i < ncols; i++) {
^
plugins/affy2vcf.c:1421:5: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int i = 1; i < ncols; i++) {
^
plugins/affy2vcf.c: In function ‘annot_destroy’:
plugins/affy2vcf.c:1538:2: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int i = 0; i < annot->n_records; i++) {
^
plugins/affy2vcf.c: In function ‘report_destroy’:
plugins/affy2vcf.c:1594:2: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int i = 0; i < report->n_samples; i++)
^
plugins/affy2vcf.c: In function ‘varitr_init_cc’:
plugins/affy2vcf.c:1645:2: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int i = 0; i < n; i++) {
^
plugins/affy2vcf.c: In function ‘varitr_init_txt’:
plugins/affy2vcf.c:1700:3: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int i = 1; i < ncols; i++) {
^
plugins/affy2vcf.c:1716:4: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int i = 1; i < ncols; i++) {
^
plugins/affy2vcf.c:1733:4: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int i = 1; i < ncols; i++) {
^
plugins/affy2vcf.c: In function ‘varitr_loop’:
plugins/affy2vcf.c:1782:3: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int i = 0; i < varitr->nsmpl; i++) {
^
plugins/affy2vcf.c:1839:4: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int i = 1; i < 1 + varitr->nsmpl; i++)
^
plugins/affy2vcf.c:1852:4: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int i = 1; i < 1 + varitr->nsmpl; i++)
^
plugins/affy2vcf.c:1885:4: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int i = 1; i < 1 + varitr->nsmpl; i++)
^
plugins/affy2vcf.c:1895:13: error: redefinition of ‘i’
for (int i = 1; i < 1 + varitr->nsmpl; i++) {
^
plugins/affy2vcf.c:1885:13: note: previous definition of ‘i’ was here
for (int i = 1; i < 1 + varitr->nsmpl; i++)
^
plugins/affy2vcf.c:1895:4: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int i = 1; i < 1 + varitr->nsmpl; i++) {
^
plugins/affy2vcf.c: In function ‘hdr_init’:
plugins/affy2vcf.c:1949:2: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int i = 0; i < n; i++) {
^
plugins/affy2vcf.c: In function ‘adjust_clusters’:
plugins/affy2vcf.c:2139:2: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int i = 0; i < n; i++) {
^
plugins/affy2vcf.c: In function ‘compute_baf_lrr’:
plugins/affy2vcf.c:2228:2: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int i = 0; i < n; i++) {
^
plugins/affy2vcf.c: In function ‘process’:
plugins/affy2vcf.c:2340:5: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int i = 0; i < nsmpl; i++) {
^
plugins/affy2vcf.c:2389:4: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int i = 0; i < 2; i++) {
^
plugins/affy2vcf.c: In function ‘run’:
plugins/affy2vcf.c:2708:3: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int i = 0; i < report->n_samples; i++) {
^
plugins/affy2vcf.c:2729:2: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int i = 0; i < nfiles; i++) {
^
plugins/affy2vcf.c:2825:3: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int i = 0; i < nfiles; i++)
^
plugins/affy2vcf.c:2829:11: error: redefinition of ‘i’
for (int i = 0; i < nfiles; i++) {
^
plugins/affy2vcf.c:2729:11: note: previous definition of ‘i’ was here
for (int i = 0; i < nfiles; i++) {
^
plugins/affy2vcf.c:2829:2: error: ‘for’ loop initial declarations are only allowed in C99 mode
for (int i = 0; i < nfiles; i++) {
Any suggestions for solving this compilation problem? I have CentOS 7, and compilation of all the other plugins works perfectly fine.
Best,
Petr.
Hi @freeseek ,
I don't have the BPM manifest file, but I have the CSV manifest file. Is there any option to convert the IDAT file to GTC and VCF?
Regards,
Karthick
Hello,
after I obtained GTC files from IDAT files using the Human CNV 370 manifest (.egt and .bpm files), I ran this code to get a VCF file:
source /software/bcftools/1.9/start_bcftools.sh
bpm_manifest_file="humancnv370v1_c.bpm"
egt_cluster_file="HumanCNV370v1_C.egt"
gtc_list_file="gtc_370.txt"
ref="human_g1k_v37.fasta"
out_prefix="X"
bcftools +gtc2vcf \
  --no-version -Ov \
  -b $bpm_manifest_file \
  -e $egt_cluster_file \
  -g $gtc_list_file \
  -f $ref \
-x $out_prefix.sex |
bcftools sort -Ov -T ./bcftools-sort.XXXXXX |
bcftools norm --no-version -Ov -o $out_prefix.vcf -c x -f $ref &&
bcftools index -f $out_prefix.vcf
I get the following error:
Reading EGT file HumanCNV370v1_C.egt
Data block version 5 in cluster file not supported
[E::bcf_hdr_read] Input is not detected as bcf or vcf format
Could not read VCF/BCF headers from -
Cleaning
Failed to read from standard input: unknown file type
Can you please help me with this?
Thank you.
Thanks for an excellent tool! I have been trying to use it to generate input for a CNV calling pipeline, and was pleased to discover the -Ot option for GenomeStudio text format export, which looked close enough to the format I needed. However, it seems some fields that make it to the VCF output are not exported to the text format.
Specifically, the ones I miss are NORMX/NORMY/R/THETA. I checked the code of gtcs_to_gs, and all the missing fields seem to depend on BPM_LOOKUPS being set. I couldn't see a reason why it wouldn't be though, so maybe this is the wrong track.
Exporting the same collection of GTCs to VCF had the proper format tags included.
This call:
bcftools +${GTC2VCF} \
-Ot \
--bpm ${BATCH1_MFT_BPM} \
--csv ${BATCH1_MFT_CSV} \
--egt ${BATCH1_EGT} \
--gtcs ${GTCDIR}/${BATCH1_NAME} \
--fasta-ref ${REF} > ${OUT_PREFIX}.FDT.tsv
Produces output with these columns (truncated):
Index
Name
Address
Chr
Position
GenTrain Score
Frac A
Frac C
Frac G
Frac T
204379800081_R02C02.GType
204379800081_R02C02.Score
204379800081_R02C02.B Allele Freq
204379800081_R02C02.Log R Ratio
204379800081_R02C02.X Raw
204379800081_R02C02.Y Raw
204379800081_R02C02.Top Alleles
204379800081_R02C02.Plus/Minus Alleles
204379800081_R02C01.GType
204379800081_R02C01.Score
204379800081_R02C01.B Allele Freq
204379800081_R02C01.Log R Ratio
204379800081_R02C01.X Raw
204379800081_R02C01.Y Raw
...
While an equivalent call requesting vcf output:
bcftools +${GTC2VCF} \
-Ou \
--bpm ${BATCH1_MFT_BPM} \
--csv ${BATCH1_MFT_CSV} \
--egt ${BATCH1_EGT} \
--gtcs ${GTCDIR}/${BATCH1_NAME} \
--fasta-ref ${REF} \
--extra ${OUT_PREFIX}.tsv | \
bcftools sort -Ou -T $TMPDIR/bcftools-sort.XXXXXX | \
bcftools norm -Oz -o ${OUT_PREFIX}.vcf.gz -c x -f $REF
produces a VCF with the expected format tags:
GT:GQ:IGC:BAF:LRR:NORMX:NORMY:R:THETA:X:Y
Tested on the stable version from http://software.broadinstitute.org/software/gtc2vcf/ and the current github version getting the same results.
I can query the VCF to get the data I need, but thought I should report this since the behavior was unexpected.
I already have the call, intensities, and confidence file. I am running the gtc2vcf on my Affymetrix genotype calls and intensities with the code provided but it returns with this error message:
[W::bcf_record_check] Bad BCF record: Invalid CONTIG id -1
Hello, and thanks for a great tool!
I am working on some older genotype data (on the PsychChip) where the IDAT files have unfortunately been lost to time, but where we do have a reasonably rich GenomeStudio text format export, and the original csv manifest file used when generating the export. I want to combine this with newer genotyping waves where we do have the IDATs, and would like to remap the markers using gtc2vcf to hopefully be done with strand and allele issues once and for all. But currently gtc2vcf does not permit --genome-studio to be used with --csv and/or --sam-flank.
Would it be possible to extend gtc2vcf to this use case, or is there some vital information I am missing that makes it a bad idea or impossible?
The GS export has columns (followed by 6-15 repeated for each sample):
1: Index
2: Name
3: Address
4: Chr
5: Position
6: S1.GType
7: S1.Score
8: S1.Theta
9: S1.R
10: S1.X Raw
11: S1.Y Raw
12: S1.X
13: S1.Y
14: S1.B Allele Freq
15: S1.Log R Ratio
16: ...
My csv manifest has columns:
1: IlmnID
2: Name
3: IlmnStrand
4: SNP
5: AddressA_ID
6: AlleleA_ProbeSeq
7: AddressB_ID
8: AlleleB_ProbeSeq
9: GenomeBuild
10: Chr
11: MapInfo
12: Ploidy
13: Species
14: Source
15: SourceVersion
16: SourceStrand
17: SourceSeq
18: TopGenomicSeq
19: BeadSetID
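To make the export layout above concrete, here is a small Python sketch that groups the per-sample columns of such a header row by sample name. It assumes every per-sample column is named `<sample>.<field>`, that field names contain no dot, and that the fixed leading columns contain no dot either; `group_sample_columns` is a hypothetical helper, not a gtc2vcf function.

```python
def group_sample_columns(columns):
    """Map sample name -> list of (column index, field name) for '<sample>.<field>' columns."""
    groups = {}
    for idx, name in enumerate(columns):
        if "." not in name:
            continue  # fixed columns such as Index, Name, Chr, Position
        # Split on the last dot so that sample names containing dots stay intact.
        sample, field = name.rsplit(".", 1)
        groups.setdefault(sample, []).append((idx, field))
    return groups


# Header row following the column list above (first sample only).
columns = [
    "Index", "Name", "Address", "Chr", "Position",
    "S1.GType", "S1.Score", "S1.Theta", "S1.R", "S1.X Raw", "S1.Y Raw",
    "S1.X", "S1.Y", "S1.B Allele Freq", "S1.Log R Ratio",
]
for sample, fields in group_sample_columns(columns).items():
    print(sample, [field for _, field in fields])
```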
Dear freeseek,
I am having some trouble installing gtc2vcf. When I tried to install htslib, something went wrong.
$./configure
checking for gcc... /usr/local/anaconda3/bin/x86_64-conda_cos6-linux-gnu-cc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether /usr/local/anaconda3/bin/x86_64-conda_cos6-linux-gnu-cc accepts -g... yes
checking for /usr/local/anaconda3/bin/x86_64-conda_cos6-linux-gnu-cc option to accept ISO C89... none needed
checking for ranlib... /usr/local/anaconda3/bin/x86_64-conda_cos6-linux-gnu-ranlib
checking for grep that handles long lines and -e... /usr/bin/grep
checking for C compiler warning flags... -Wall
checking for pkg-config... /usr/bin/pkg-config
checking pkg-config is at least version 0.9.0... yes
checking for special C compiler options needed for large files... no
checking for _FILE_OFFSET_BITS value needed for large files... no
checking shared library type for unknown-Linux... plain .so
checking whether the compiler accepts -fvisibility=hidden... yes
checking how to run the C preprocessor... /usr/local/anaconda3/bin/x86_64-conda_cos6-linux-gnu-cpp
checking for egrep... /usr/bin/grep -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking for stdlib.h... (cached) yes
checking for unistd.h... (cached) yes
checking for sys/param.h... yes
checking for getpagesize... yes
checking for working mmap... yes
checking for gmtime_r... yes
checking for fsync... yes
checking for drand48... yes
checking for srand48_deterministic... no
checking whether fdatasync is declared... yes
checking for fdatasync... yes
checking for library containing log... -lm
checking for zlib.h... no
checking for inflate in -lz... no
configure: error: zlib development files not found
HTSlib uses compression routines from the zlib library http://zlib.net.
Building HTSlib requires zlib development files to be installed on the build
machine; you may need to ensure a package such as zlib1g-dev (on Debian or
Ubuntu Linux) or zlib-devel (on RPM-based Linux distributions or Cygwin)
is installed.
FAILED. This error must be resolved in order to build HTSlib successfully.
But zlib-devel is already installed; the version is zlib-devel-1.2.7-18.el7.x86_64.
I hope you can give me some help. Thank you.
Best wishes,
Crane
Commit messages contain phrases like new release or new version, but there are no versioned releases/tags for this repo. That makes it hard to create a reproducible deployment for reproducible science...
Hello
I see that two steps are necessary to convert from .CEL to .VCF. In the first step, xxxxx.AxiomGT1.chp files are generated (where xxxxx is the name of the original file). Is this correct?
Now, I'm having a problem with the second step. When I run that part of the program I get no errors, but I also can't find the VCF files. This is the code I'm running:
bcftools +affy2vcf \
  --no-version -Ou \
  --csv "GenomeWideSNP_6.na35.annot.csv" \
  --fasta-ref "human_g1k_v37.fasta" \
  --chps /home/adrianib/Proyecto/cc-chp \
  --snp /home/adrianib/Proyecto/AxiomGT1.snp-posteriors.txt \
--extra result.tsv |
bcftools sort -Ou -T ./bcftools-sort.XXXXXX |
bcftools norm --no-version -Ob -o result.bcf -c x -f "human_g1k_v37.fasta" &&
bcftools index -f result.bcf
I see there is no option to indicate the output folder as in the first step. Could this be the reason I don't have output VCF files?
In summary, I have this:
Original file: xxxxx.CEL
1st step (CEL to CHP): xxxxx.AxiomGT1.chp
2nd step (CHP to VCF): ?
And my question is: Should I have a xxxxx.VCF file at the end of the second step?
Thanks for your help
Adrian
Hi,
I have attempted the thankless task of using a GenomeStudio .txt file. I don't have other options.
This is my GenomeStudio header:
Index Name Address Chr Position GenTrain Score 59_1.GType 59_1.Score 59_1.Theta 59_1.R 59_1.X Raw 59_1.Y Raw 59_1.X 59_1.Y 59_1.B Allele Freq 59_1.Log R Ratio 59_1.Top Alleles 59_1.Import Calls 59_1.Concordance 59_1.Orig Call 59_1.CNV Value 59_1.CNV Confidence 59_1.Plus/Minus Alleles
1 rs1000000 95775890 12 126890980 0.7825049 AB 0.7878883 0.4333902 2.230212 14921 7256 1.232208 0.9980044 0.5075449 0.008405539 AG -1 AG
2 rs1000002 20798118 3 183635768 0.8463691 AB 0.879837 0.4056498 1.041987 7707 3384 0.5987776 0.4432094 0.4658202 -0.06460849 AG -1 TC
This is what I get after running with the --genome-studio option.
As you can see, the VCF almost exclusively has A/N as reference and G/N as alternative.
Counts REF: A=400K+, N=270K+, C=817. No G or T
Counts ALT: G=380K+, C=80K+, N=230K+, T=600. No A
I assume something went wrong there. If there is a fix, I would be rather grateful for advice.
Jakub
> ##contig=<ID=chrUn_GL000218v1,length=161147>
> ##contig=<ID=chrEBV,length=171823>
> ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
> ##FORMAT=<ID=IGC,Number=1,Type=Float,Description="Illumina GenCall Confidence Score">
> ##FORMAT=<ID=BAF,Number=1,Type=Float,Description="B Allele Frequency">
> ##FORMAT=<ID=LRR,Number=1,Type=Float,Description="Log R Ratio">
> ##bcftools_+gtc2vcfVersion=1.9+htslib-1.9
> ##bcftools_+gtc2vcfCommand=gtc2vcf -f GCA_000001405.15_GRCh38_no_alt_analysis_set.fna --genome-studio P150645.txt -o P150645.vcf; Date=Sun Apr 19 16:38:13 2020
> #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 59_1
> chr12 126890980 rs1000000 A G . . . GT:IGC:BAF:LRR 0/1:0.787888:0.507545:0.00840554
> chr3 183635768 rs1000002 A G . . . GT:IGC:BAF:LRR 0/1:0.879837:0.46582:-0.0646085
> chr4 95733906 rs10000023 A N . . . GT:IGC:BAF:LRR 0/0:0.755057:0.0114496:0.0626198
> chr3 98342907 rs1000003 A N . . . GT:IGC:BAF:LRR 0/0:0.790033:0.0316309:0.157232
> chr4 103374154 rs10000030 N G . . . GT:IGC:BAF:LRR 1/1:0.7819:0.989484:0.0112141
> chr4 38924330 rs10000037 A G . . . GT:IGC:BAF:LRR 0/1:0.899376:0.512505:-0.0105628
> chr4 165621955 rs10000041 A N . . . GT:IGC:BAF:LRR 0/0:0.923617:0.0272044:0.23714
> chr4 5237152 rs10000042 N G . . . GT:IGC:BAF:LRR 1/1:0.784419:1:0.0955165
> chr4 118948220 rs10000049 A N . . . GT:IGC:BAF:LRR 0/0:0.432164:0.00305451:0.0203382
> chr2 237752054 rs1000007 A N . . . GT:IGC:BAF:LRR 0/0:0.908865:0.0369862:-0.0206654
> chr4 43022222 rs10000073 A N . . . GT:IGC:BAF:LRR 0/0:0.925892:0.0150725:0.0507056
> chr4 17348363 rs10000081 A N . . . GT:IGC:BAF:LRR 0/0:0.905235:0:0.0410943
> chr4 21895517 rs10000092 A G . . . GT:IGC:BAF:LRR 0/1:0.839045:0.558548:-0.333746
> chr4 53623677 rs10000105 N G . . . GT:IGC:BAF:LRR 1/1:0.864878:0.981329:0.114554
> chr4 37796830 rs10000119 N G . . . GT:IGC:BAF:LRR 1/1:0.907763:1:-0.069378
> chr4 109106451 rs10000124 N C . . . GT:IGC:BAF:LRR 1/1:0.810363:0.995153:-0.108182
> chr4 80666077 rs10000154 A N . . . GT:IGC:BAF:LRR 0/0:0.926977:0:-0.175949
> chr2 235690982 rs1000016 A N . . . GT:IGC:BAF:LRR 0/0:0.870474:0:-0.068737
> chr4 69033099 rs10000160 N G . . . GT:IGC:BAF:LRR 1/1:0.901467:1:0.174004
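Allele tallies like the REF/ALT counts reported above can be reproduced with a few lines of Python. This is only a sketch: `tally_ref_alt` is a hypothetical helper, and it assumes plain tab-separated VCF lines (without the "> " quote prefix shown in the paste).

```python
from collections import Counter


def tally_ref_alt(vcf_lines):
    """Count REF alleles and ALT alleles over all records of an uncompressed VCF."""
    ref_counts, alt_counts = Counter(), Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        ref_counts[fields[3]] += 1
        # ALT may hold several comma-separated alleles.
        for allele in fields[4].split(","):
            alt_counts[allele] += 1
    return ref_counts, alt_counts


# Three records from the output above, with tabs restored (assumption).
example = [
    "chr12\t126890980\trs1000000\tA\tG\t.\t.\t.\tGT:IGC:BAF:LRR\t0/1:0.787888:0.507545:0.00840554",
    "chr4\t95733906\trs10000023\tA\tN\t.\t.\t.\tGT:IGC:BAF:LRR\t0/0:0.755057:0.0114496:0.0626198",
    "chr4\t103374154\trs10000030\tN\tG\t.\t.\t.\tGT:IGC:BAF:LRR\t1/1:0.7819:0.989484:0.0112141",
]
ref_counts, alt_counts = tally_ref_alt(example)
print(dict(ref_counts), dict(alt_counts))
```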
Hi, developer. After installing bcftools-1.11 and gtc2vcf, I ran the following code:
/data_6t/lizhan/02.software/bcftools-1.11/bcftools +affy2vcf \
  --no-version -Ou \
  --csv $csv_manifest_file \
  --fasta-ref $ref \
  --chps $path_to_chp_folder \
  --snp $path_to_txt_folder/AxiomGT1.snp-posteriors.txt \
  --extra $out_prefix.tsv
but I get the following error messages:
Writing to ./bcftools-sort.ribgu4
/data_6t/lizhan/02.software/bcftools-1.11/plugins/affy2vcf.so:
dlopen .. /data_6t/lizhan/02.software/bcftools-1.11/plugins/affy2vcf.so: undefined symbol: set_wmode
affy2vcf:
dlopen .. affy2vcf: cannot open shared object file: No such file or directory
The bcftools plugin "affy2vcf" was not found or is not functional in
BCFTOOLS_PLUGINS="/data_6t/lizhan/02.software/bcftools-1.11/plugins".
Is the plugin path correct?
Run "bcftools plugin -l" or "bcftools plugin -lvv" for a list of available plugins.
Could not load "affy2vcf".
First of all, thank you very much for this excellent pipeline!
I have been able to convert IDAT files to GTC successfully, and during the conversion iaap-cli correctly picks up the sample ID from the samples file. However, when converting from GTC to VCF, the ID is set back to "SentrixBarcode_A_SentrixPosition_A".
Samples CSV file is structured as follows:
[Data]
Sample_ID,SentrixBarcode_A,SentrixPosition_A,Path
During the iaap-cli conversion I get this message:
info: ArrayAnalysis.NormToGenCall.Services.NormToGenCallSvc[0]
[07:09:03 1893]: Writing [Sample_ID_Obfuscated] to gtc...
When I query the IDs from the converted VCF file with bcftools query -l,
I get:
[SentrixBarcode_A]_[SentrixPosition_A]
[SentrixBarcode_A]_[SentrixPosition_A]
[SentrixBarcode_A]_[SentrixPosition_A]
.....
I know I can annotate the VCF IDs again, but I would rather build a pipeline where this is not necessary.
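For the re-annotation route mentioned above, a minimal Python sketch that turns the samples CSV into "old_name new_name" pairs. It assumes the [Data] section has the columns listed above, that the VCF sample names follow the barcode_position pattern, and that sample IDs contain no whitespace. `reheader_pairs` is a hypothetical helper, and the Sample_ID and Path values in the example are made up for illustration.

```python
import csv
import io


def reheader_pairs(sample_sheet_text):
    """Build 'old_name new_name' lines from the [Data] section of a samples CSV."""
    lines = sample_sheet_text.splitlines()
    # The column names are assumed to sit on the line right after "[Data]".
    start = next(i for i, line in enumerate(lines) if line.strip() == "[Data]") + 1
    reader = csv.DictReader(io.StringIO("\n".join(lines[start:])))
    pairs = []
    for row in reader:
        old_name = f"{row['SentrixBarcode_A']}_{row['SentrixPosition_A']}"
        pairs.append(f"{old_name} {row['Sample_ID']}")
    return pairs


# Illustrative sheet: "sampleA" and the path are invented values.
sheet = (
    "[Data]\n"
    "Sample_ID,SentrixBarcode_A,SentrixPosition_A,Path\n"
    "sampleA,204379800081,R02C02,/path/to/idats\n"
)
print("\n".join(reheader_pairs(sheet)))
```

The resulting lines could be written to a file and passed to bcftools reheader with its --samples option, which, as far as I know, accepts such old/new name pairs.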