Comments (15)
I ran into this problem, too. The reason is that version 154 is no longer available from 'ftp.ncbi.nih.gov'; 156 is now offered as the latest version. You can work around it by editing the GGD recipe manually to update the URL and version.
from bcbio-nextgen.
I encountered this problem too. I located all the 154 entries and changed them to 156 in <path_to-bcbio>/tmpbcbio-install/cloudbiolinux/ggd-recipes//dbsnp.yaml
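For anyone who prefers to script the manual edit described above, here is a minimal sketch. It assumes the build number appears in the recipe as a standalone "154"; the function name `bump_dbsnp_build` and the sample text are hypothetical and do not reflect the real recipe contents. It does not touch the separate `version=GCF_...` accession discussed elsewhere in this thread.

```python
import re


def bump_dbsnp_build(recipe_text: str, old: str = "154", new: str = "156") -> str:
    """Replace standalone occurrences of the old dbSNP build number."""
    # Only match when the number is not part of a longer number,
    # so something like "1540" or "2154" is left untouched.
    pattern = re.compile(rf"(?<!\d){re.escape(old)}(?!\d)")
    return pattern.sub(new, recipe_text)


# Hypothetical recipe fragment, for illustration only.
sample = "build: 154\nurl=https://example.org/dbsnp/b154/file_154.vcf.gz\nport=1540\n"
print(bump_dbsnp_build(sample))
```

To apply it to a real file, read the recipe into a string, pass it through the function, review the diff, and write it back.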
from bcbio-nextgen.
Thanks for figuring it out!
I've updated the recipe:
https://github.com/chapmanb/cloudbiolinux/blob/master/ggd-recipes/hg38/dbsnp.yaml
from bcbio-nextgen.
Thanks Serhiy!
I believe that also https://github.com/chapmanb/cloudbiolinux/blob/master/ggd-recipes/hg19/dbsnp.yaml needs to be changed accordingly
/Linda
from bcbio-nextgen.
Thanks, Linda!
I've updated the hg19 recipe!
SN
from bcbio-nextgen.
Thanks Serhiy!
I think the gnomad genome recipe should also be updated to match the new field names in the gnomad files.
from bcbio-nextgen.
Hi Peter @wangpenhok!
I see the 3.1 vcf sites are still there?
https://console.cloud.google.com/storage/browser/gcp-public-data--gnomad/release/3.1/vcf/genomes
What is the issue you are encountering?
SN
from bcbio-nextgen.
Hi Serhiy @naumenko-sa,
The vcf sites are okay, but the file downloaded from $gnomad_fields_to_keep_url may be outdated.
As I mentioned here, [https://github.com/chapmanb/cloudbiolinux/pull/400#issuecomment-1475735769], the majority of field names in the latest vcf files are not compatible with the names in the file gnomad_fields_to_keep.
Here are some headers from the latest vcfs:
##INFO=<ID=nhomalt-sas-XY,Number=A,Type=Integer,Description="Count of homozygous individuals in XY samples of South Asian ancestry">
##INFO=<ID=AC-fin-XX,Number=A,Type=Integer,Description="Alternate allele count for XX samples of Finnish ancestry">
##INFO=<ID=AN-fin-XX,Number=1,Type=Integer,Description="Total number of alleles in XX samples of Finnish ancestry">
##INFO=<ID=AF-fin-XX,Number=A,Type=Float,Description="Alternate allele frequency in XX samples of Finnish ancestry">
##INFO=<ID=nhomalt-fin-XX,Number=A,Type=Integer,Description="Count of homozygous individuals in XX samples of Finnish ancestry">
##INFO=<ID=AC-nfe-XX,Number=A,Type=Integer,Description="Alternate allele count for XX samples of Non-Finnish European ancestry">
##INFO=<ID=AN-nfe-XX,Number=1,Type=Integer,Description="Total number of alleles in XX samples of Non-Finnish European ancestry">
##INFO=<ID=AF-nfe-XX,Number=A,Type=Float,Description="Alternate allele frequency in XX samples of Non-Finnish European ancestry">
##INFO=<ID=nhomalt-nfe-XX,Number=A,Type=Integer,Description="Count of homozygous individuals in XX samples of Non-Finnish European ancestry">
##INFO=<ID=AC-sas,Number=A,Type=Integer,Description="Alternate allele count for samples of South Asian ancestry">
##INFO=<ID=AN-sas,Number=1,Type=Integer,Description="Total number of alleles in samples of South Asian ancestry">
##INFO=<ID=AF-sas,Number=A,Type=Float,Description="Alternate allele frequency in samples of South Asian ancestry">
##INFO=<ID=nhomalt-sas,Number=A,Type=Integer,Description="Count of homozygous individuals in samples of South Asian ancestry">
##INFO=<ID=AC-oth-XX,Number=A,Type=Integer,Description="Alternate allele count for XX samples of Other ancestry">
##INFO=<ID=AN-oth-XX,Number=1,Type=Integer,Description="Total number of alleles in XX samples of Other ancestry">
##INFO=<ID=AF-oth-XX,Number=A,Type=Float,Description="Alternate allele frequency in XX samples of Other ancestry">
##INFO=<ID=nhomalt-oth-XX,Number=A,Type=Integer,Description="Count of homozygous individuals in XX samples of Other ancestry">
##INFO=<ID=AC-amr-XX,Number=A,Type=Integer,Description="Alternate allele count for XX samples of Latino ancestry">
##INFO=<ID=AN-amr-XX,Number=1,Type=Integer,Description="Total number of alleles in XX samples of Latino ancestry">
##INFO=<ID=AF-amr-XX,Number=A,Type=Float,Description="Alternate allele frequency in XX samples of Latino ancestry">
##INFO=<ID=nhomalt-amr-XX,Number=A,Type=Integer,Description="Count of homozygous individuals in XX samples of Latino ancestry">
##INFO=<ID=AC-XX,Number=A,Type=Integer,Description="Alternate allele count for XX samples">
##INFO=<ID=AN-XX,Number=1,Type=Integer,Description="Total number of alleles in XX samples">
It is obvious that _ has been replaced with -. In addition, male/female seems to be XY/XX now.
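The renaming described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the old-style names in `old_fields` are hypothetical examples (not copied from the real gnomad_fields_to_keep file), and the helpers `to_new_style` and `info_ids` are made up for this sketch. It assumes every `_` becomes `-`, which would mangle any field that legitimately keeps an underscore.

```python
import re

# Assumed mapping of old sex labels to new ones, inferred from the headers above.
SEX_LABELS = {"male": "XY", "female": "XX"}


def to_new_style(old_name: str) -> str:
    """Translate an old-style name such as 'AF_nfe_female' to 'AF-nfe-XX'."""
    parts = old_name.split("_")
    return "-".join(SEX_LABELS.get(part, part) for part in parts)


def info_ids(header_lines):
    """Collect the ID of every ##INFO line in a list of VCF header lines."""
    pattern = re.compile(r"^##INFO=<ID=([^,>]+)")
    ids = set()
    for line in header_lines:
        match = pattern.match(line)
        if match:
            ids.add(match.group(1))
    return ids


# Three of the header lines quoted above.
header = [
    '##INFO=<ID=AF-nfe-XX,Number=A,Type=Float,Description="Alternate allele frequency in XX samples of Non-Finnish European ancestry">',
    '##INFO=<ID=nhomalt-sas-XY,Number=A,Type=Integer,Description="Count of homozygous individuals in XY samples of South Asian ancestry">',
    '##INFO=<ID=AC-sas,Number=A,Type=Integer,Description="Alternate allele count for samples of South Asian ancestry">',
]

# Hypothetical old-style entries; 'AC_xyz_male' stands in for a field that no longer exists.
old_fields = ["AF_nfe_female", "nhomalt_sas_male", "AC_sas", "AC_xyz_male"]

available = info_ids(header)
missing = [name for name in old_fields if to_new_style(name) not in available]
print("Fields with no match in the header:", missing)
```

Running the same check against the full header of a downloaded vcf would show which entries of the fields file still need attention after renaming.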
I solved this problem by manually updating the gnomad_fields_to_keep file, but this could be annoying for those who rely on the pipeline to install bcbio.
Would you please update the file gnomad_fields_to_keep to the latest version?
Thanks~
from bcbio-nextgen.
Hello, I think there is still a small issue with the hg38 recipe for dbsnp: the version number should also be updated.
version=GCF_000001405.38 should be version=GCF_000001405.40
Thanks a lot for all the work put into bcbio.
from bcbio-nextgen.
Thanks @wangpenhok!
I've updated the fields file and the recipe.
I have not changed the recipe version, to avoid forcing gnomad updates on users.
@matthdsm I think you compiled that list years ago, could you please review?
Smaller populations are not in Gnomad anymore.
from bcbio-nextgen.
@matthdsm I think you compiled that list years ago, could you please review?
Hi Serhiy,
It seems you already recompiled the list? Do you need me to do something more?
from bcbio-nextgen.
Thanks Serhiy!
I believe that also https://github.com/chapmanb/cloudbiolinux/blob/master/ggd-recipes/hg19/dbsnp.yaml needs to be changed accordingly
/Linda
Hi Serhiy,
I have the same problem, but for GRCh37. Would you mind also updating the dbsnp.yaml for GRCh37?
TT
from bcbio-nextgen.
Just to note that I have now hit the same problem: version=GCF_000001405.38 should be version=GCF_000001405.40, but I see this has already been noted in the comments.
from bcbio-nextgen.