imputation-ukb-ref-panel's Introduction

UK Biobank imputation pipelines

About

Genotype imputation is a computational technique that estimates missing genotypes in SNP array data using a reference panel of haplotypes. The same approach extends to low-coverage whole-genome sequencing (WGS) data, where it fills in missing genotypes and refines uncertain genotype calls derived from sequencing reads.

We provide two distinct pipelines, one for SNP array data and one for low-coverage WGS data. Both impute against the UK Biobank reference panel (>200,000 samples; 700M variants). To keep the implementation cost-effective, the pipelines rely on efficient state-of-the-art tools: IMPUTE5 (Rubinacci et al., 2020) for SNP array imputation and GLIMPSE2 (Rubinacci et al., 2023) for low-coverage WGS imputation.

The pipelines take as input either a multi-sample VCF/BCF file with SNP array genotypes or a set of low-coverage BAM/CRAM files. Imputation against the UK Biobank reference panel runs as applets and dx command jobs tailored to the UK Biobank Research Analysis Platform (UKB RAP). Each pipeline ends by producing a single multi-sample BCF file per chromosome that contains genotype posteriors, dosages, and phased best-guess genotypes. Further outputs, such as haploid dosages, can be obtained by setting the appropriate options in the imputation software.
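To illustrate how the three output quantities relate, here is a minimal Python sketch for a single diploid sample at a biallelic site. The function names are hypothetical and are not part of the pipelines. It assumes the posteriors are ordered as P(0/0), P(0/1), P(1/1) and uses the standard definition of dosage as the expected alternate-allele count. Phase is not modelled here.

```python
def dosage_from_posteriors(gp):
    """Expected alternate-allele count for one diploid sample.

    gp is a tuple (P(0/0), P(0/1), P(1/1)); the ordering is an assumption
    of this sketch, so check the header of the actual BCF file.
    """
    p_ref, p_het, p_alt = gp
    total = p_ref + p_het + p_alt
    if abs(total - 1.0) > 1e-3:
        raise ValueError(f"posteriors should sum to 1, got {total}")
    # 0 copies * P(0/0) + 1 copy * P(0/1) + 2 copies * P(1/1)
    return p_het + 2.0 * p_alt


def best_guess_allele_count(gp):
    """Alternate-allele count (0, 1 or 2) of the most probable genotype."""
    return max(range(3), key=lambda i: gp[i])


if __name__ == "__main__":
    gp = (0.1, 0.3, 0.6)
    print(dosage_from_posteriors(gp), best_guess_allele_count(gp))
```

The dosage keeps the uncertainty that the best-guess genotype discards, which is why both are usually written to the output.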

Website and tutorials

Tutorials on how to use the pipelines can be found at:

https://srubinacci.gitbook.io/uk-biobank-imputation-pipelines/

Citation

If you use the pipelines in your research work, please cite the following papers:

Reference panel

Hofmeister RJ, Ribeiro DM, Rubinacci S, Delaneau O. Accurate rare variant phasing of whole-genome and whole-exome sequencing data in the UK Biobank. Nat Genet 55, 1243–1249 (2023).

Low-coverage WGS imputation

Rubinacci et al., Imputation of low-coverage sequencing data from 150,119 UK Biobank genomes. Nat Genet 55, 1088–1090 (2023).

SNP array imputation

Rubinacci et al., Genotype imputation using the Positional Burrows Wheeler Transform. PLoS Genet. 16, e1009049 (2020).

About the project

The UK Biobank imputation pipelines are developed by Simone Rubinacci & Olivier Delaneau.

License

The UK Biobank imputation pipelines are distributed under the MIT license.

imputation-ukb-ref-panel's Issues

How to interpret quality score outputs from low-pass WGS imputation on UKB

Thank you so much for this incredible imputation tool! After following the steps of the pipeline for low-pass WGS on UK Biobank data, I am wondering how to interpret the quality score returned in the BCF file. The info scores all appear to be 0.99 or 1; is this an expected result, and how are these info scores calculated?
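For context on the question above, the following Python sketch computes the widely used IMPUTE-style info measure from genotype posteriors at one biallelic site. It is an assumption of this sketch that the score in the BCF file follows this definition. The source does not confirm the formula, so the file header and the GLIMPSE2 documentation remain the authority. The function name is hypothetical.

```python
def impute_info_score(posteriors):
    """IMPUTE-style info score for one biallelic variant.

    posteriors: list of (P(0/0), P(0/1), P(1/1)) tuples, one per diploid sample.
    Assumed definition: 1 - sum_i(f_i - e_i^2) / (2 * N * theta * (1 - theta)),
    where e_i is the expected dosage, f_i the expected squared dosage,
    and theta the estimated alternate-allele frequency.
    """
    n = len(posteriors)
    if n == 0:
        raise ValueError("need at least one sample")
    sum_dosage = 0.0
    sum_variance = 0.0
    for _, p_het, p_alt in posteriors:
        e = p_het + 2.0 * p_alt   # expected dosage
        f = p_het + 4.0 * p_alt   # expected squared dosage
        sum_dosage += e
        sum_variance += f - e * e  # per-sample posterior variance
    theta = sum_dosage / (2.0 * n)
    if theta <= 0.0 or theta >= 1.0:
        # Monomorphic site: the measure is conventionally set to 1.
        return 1.0
    return 1.0 - sum_variance / (2.0 * n * theta * (1.0 - theta))


if __name__ == "__main__":
    confident = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 0, 0)]
    uninformative = [(0.25, 0.5, 0.25)] * 4
    print(impute_info_score(confident), impute_info_score(uninformative))
```

Under this definition, a score near 1 means the posteriors are nearly certain relative to the allele frequency, and a score near 0 means they carry no more information than the allele frequency alone. It measures the certainty of the posteriors, not their accuracy against a truth set.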

AC/AN INFO fields in VCF are inconsistent with GT field, update the values in the VCF in chunk 000 and 022

Hi Simone,
I'm trying to set up your uk-biobank-imputation-pipelines/low-coverage-pipeline on DNAnexus in the US. Since I do not have access to the resources on RAP, I'm pulling the genetic map from here: https://github.com/odelaneau/GLIMPSE/tree/master/maps/genetic_maps.b38 and the phased VCFs from here: http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000G_2504_high_coverage/working/20201028_3202_phased and I then remove multiallelic records. Everything for step 2 (in https://srubinacci.gitbook.io/uk-biobank-imputation-pipelines/low-coverage-pipeline/running-the-low-coverage-pipeline 'First-time usage - Binary reference panel representation') went smoothly for the autosomes, but for chrX two of the chunks, convert_ref_chrX_000 and convert_ref_chrX_022, are failing with the exact error in this discussion: odelaneau/GLIMPSE#189. Maybe it is related to chrX imputation and the splitting of the chromosome into PAR and non-PAR regions; if so, I can't seem to locate the BAM file with the non-PAR chrX reads downsampled to 1x. The link in https://odelaneau.github.io/GLIMPSE/glimpse1/tutorial_chrX.html that says “The data and the scripts for this tutorial can be downloaded HERE” is broken, and the git page tutorial doesn't seem to have that BAM. I also downloaded GLIMPSE 2 and 1, and the tutorials there are the same as on the git page. Any ideas where I could locate tutorial_chrX?
Also, how would I pass the merged VCF generated in step 3.3 to glimpse_split_reference? In the linked tutorial, the merged VCF from step 3.3 is passed to glimpse_phase with the --input parameter (probably --input-gl in version 2), with the reference from step 2 as the reference. The split_reference tool doesn't have an --input option. Should I just pass it as the reference?
Also, since the pipeline must have been tested successfully for chrX with the resources made available on RAP, I wonder if we could get the split VCFs hosted there. That might be the quickest resolution.
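The inconsistency named in the issue title can be checked by recomputing AC and AN directly from the GT values of one record and comparing them with the INFO fields. The sketch below is a hypothetical helper, not part of GLIMPSE or the pipelines. It assumes GT strings have already been extracted from the VCF and handles haploid calls (as on non-PAR chrX in males) and missing alleles.

```python
import re


def count_alleles(genotypes, n_alt=1):
    """Recompute (AC, AN) from GT strings such as '0|1', '1/1', '1' or './.'.

    AC is a list with one count per alternate allele; AN is the number of
    called (non-missing) alleles. Haploid calls contribute one allele.
    """
    ac = [0] * n_alt
    an = 0
    for gt in genotypes:
        for allele in re.split(r"[|/]", gt):
            if allele == ".":
                continue  # missing alleles do not count towards AN
            idx = int(allele)
            if idx > n_alt:
                raise ValueError(f"allele index {idx} exceeds n_alt={n_alt}")
            an += 1
            if idx > 0:
                ac[idx - 1] += 1
    return ac, an


if __name__ == "__main__":
    # Mixed diploid and haploid calls, as might occur on chrX.
    print(count_alleles(["0|1", "1|1", "0", "1", "./."]))
```

On whole files, the bcftools fill-tags plugin is commonly used to rewrite AC and AN from the genotypes. Whether that resolves the chunk failures described above is not established by the source.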
