Ancestry-Inference

This repository provides tools for ancestry inference. We implement a support vector machine (SVM)-based method that identifies the most likely ancestral group(s) for an individual by leveraging known ancestry in a reference dataset (e.g., the 1000 Genomes Project data).

Reference datasets

We prepared three reference datasets.

  • KGref: cleaned 1000 Genomes Project samples with five super-population groups (AFR, EUR, EAS, SAS and AMR).
  • KGeurref: cleaned EUR samples from the 1000 Genomes Project, grouped as NEUR, SEUR and FIN.
  • HGDP_AsianRef: cleaned Asian samples from HGDP, grouped as Central_South_Asia, Est_Asia and Middle_Est.

The related files are available at https://www.dropbox.com/sh/fanfst7lyc1kn9u/AAAPyJhwiYdHc8H-31I-xbZua?dl=0
After opening the Dropbox link, click the Download button in the top right corner to download the files.
Use unzip to extract the Reference.zip file.

unzip Reference.zip

Then use unxz to decompress the xz-compressed files. For example:

unxz KGeurref.bed.xz

File format

PC file: a text file with a header line. The following columns are required: FID, IID, AFF, PC1, PC2, PC3, ..., PC10.

  • FID: Family ID
  • IID: Within-family ID
  • AFF: Affection status code ('1' = Reference Sample, '2' = Study Sample)
  • PC1 to PC10: Principal component values. Study samples are projected onto the reference PC space.

Example PC file from KING. The FA, MO and SEX columns are not required for the analysis.

FID IID FA MO SEX AFF PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10
HG00096 HG00096 0 0 0 1 0.0110 -0.0271 0.0098 0.0198 -0.0017 -0.0097 -0.0003 0.0010 0.0031 -0.0152
HG00097 HG00097 0 0 0 1 0.0107 -0.0275 0.0090 0.0189 -0.0008 -0.0097 -0.0012 -0.0014 -0.0024 -0.0245
HG00099 HG00099 0 0 0 1 0.0111 -0.0276 0.0102 0.0183 -0.0025 -0.0151 0.0014 0.0079 -0.0090 -0.0121
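Before running the analysis, it can help to confirm that a PC file carries every required column. The sketch below is only an illustration and is not part of the repository: the helper name `missing_pc_columns` and the temporary file are hypothetical, and the sample rows are copied from the example above. It reads the header line and prints any required column name that is absent.

```shell
# Work in a temporary directory so no existing files are touched.
workdir=$(mktemp -d)

# Small PC file built from the example rows above (hypothetical file name).
cat > "$workdir/example_check_pc.txt" <<'EOF'
FID IID FA MO SEX AFF PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10
HG00096 HG00096 0 0 0 1 0.0110 -0.0271 0.0098 0.0198 -0.0017 -0.0097 -0.0003 0.0010 0.0031 -0.0152
HG00097 HG00097 0 0 0 1 0.0107 -0.0275 0.0090 0.0189 -0.0008 -0.0097 -0.0012 -0.0014 -0.0024 -0.0245
EOF

# Hypothetical helper: print each required column missing from the header of file $1.
missing_pc_columns() {
  awk 'NR == 1 {
         for (i = 1; i <= NF; i++) seen[$i] = 1
         n = split("FID IID AFF PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10", req, " ")
         for (j = 1; j <= n; j++) if (!(req[j] in seen)) print req[j]
         exit
       }' "$1"
}

missing=$(missing_pc_columns "$workdir/example_check_pc.txt")
if [ -z "$missing" ]; then
  echo "all required columns present"
else
  echo "missing columns: $missing"
fi
```

Checking columns by name rather than by position means the extra FA, MO and SEX columns written by KING do not affect the result.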

Popref file: a text file with a header line. It contains three columns: FID, IID and Population. Users need to create this file before the analysis.

FID IID Population
HG00096 HG00096 EUR
HG00097 HG00097 EUR
HG00099 HG00099 EUR
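One possible way to create a popref file is sketched below. It assumes a hypothetical two-column input (`sample_labels.txt`, one sample ID and one population label per line, no header) and assumes that FID equals IID, as in the example above. Neither assumption comes from the repository, so adjust both to match your own reference data.

```shell
# Work in a temporary directory so no existing files are touched.
workdir=$(mktemp -d)

# Hypothetical input: sample ID and population label, whitespace-separated, no header.
cat > "$workdir/sample_labels.txt" <<'EOF'
HG00096 EUR
HG00097 EUR
HG00099 EUR
EOF

# Write the header, then reuse the sample ID as both FID and IID (an assumption).
awk 'BEGIN { print "FID IID Population" } { print $1, $1, $2 }' \
  "$workdir/sample_labels.txt" > "$workdir/example_popref.txt"

cat "$workdir/example_popref.txt"
```

If your reference uses family IDs that differ from the individual IDs, take FID and IID from the reference .fam file instead.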

Quickstart

Download KING from https://www.kingrelatedness.com/Download.shtml

Get PCs from KING PCA projection. The affection status (6th column) in the study fam file needs to be 2. The reference samples' affection status should be 1 or missing. No changes are needed if the reference is KGref.

king -b KGref,studydata --pca --projection --prefix example
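If the study fam file does not already have affection status 2, the sixth column can be rewritten before running KING. The following is a minimal sketch, assuming a standard six-column PLINK .fam file; the sample rows are made up for illustration. It overwrites any phenotype stored in that column, so a backup copy is kept.

```shell
# Work in a temporary directory so no existing files are touched.
workdir=$(mktemp -d)

# Made-up study .fam file with columns FID IID FA MO SEX AFF.
cat > "$workdir/studydata.fam" <<'EOF'
FAM1 S1 0 0 1 -9
FAM1 S2 0 0 2 1
FAM2 S3 0 0 0 0
EOF

# Keep a backup, then set the sixth column to 2 for every study sample.
cp "$workdir/studydata.fam" "$workdir/studydata.fam.bak"
awk '{ $6 = 2; print }' "$workdir/studydata.fam.bak" > "$workdir/studydata.fam"

cat "$workdir/studydata.fam"
```

Note that awk rejoins the fields with single spaces, which PLINK-style readers generally accept in place of tabs.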

Run the R code for ancestry inference. Three arguments are required: the PC file (examplepc.txt), the popref file (example_popref.txt) and the prefix (example). Package 'e1071' is required. Packages 'ggplot2' and 'doParallel' are optional.

Rscript Ancestry_Inference.R examplepc.txt example_popref.txt example

Alternatively, we can run ancestry inference directly in KING from the binary files with a single command.

king -b KGref,studydata --pca --projection --pngplot

European only inference

Keep European samples only. Get PCs from KING PCA projection.

king -b KGeurref,EurStudy --pca --projection --prefix EurStudy

Run the R code to get the ancestry inference results. Three arguments are required, in this order: PC file, popref file and prefix name.

Rscript Ancestry_Inference.R EurStudypc.txt KGeurref_popref.txt prefixname

Output file

example_InferredAncestry.txt

FID	IID	PC1	PC2	Anc_1st	Pr_1st	Anc_2nd	Pr_2nd	Ancestry
2427	NA19919	-0.0299	0.0012	AFR	0.9973	AMR	0.0016	AFR
2425	NA19902	-0.0295	0.0008	AFR	0.997	AMR	0.0017	AFR
2484	NA20335	-0.0239	-0.0029	AFR	0.9954	AMR	0.0016	AFR
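A quick summary of the output file can be obtained by counting samples per inferred group. The sketch below is an illustration only: the helper name `count_by_ancestry` is hypothetical, and the rows are copied from the example above. It assumes the file is tab-separated and finds the Ancestry column by its header name.

```shell
# Work in a temporary directory so no existing files are touched.
workdir=$(mktemp -d)
out="$workdir/example_InferredAncestry.txt"

# Rows copied from the example output above, written tab-separated.
printf 'FID\tIID\tPC1\tPC2\tAnc_1st\tPr_1st\tAnc_2nd\tPr_2nd\tAncestry\n' > "$out"
printf '2427\tNA19919\t-0.0299\t0.0012\tAFR\t0.9973\tAMR\t0.0016\tAFR\n' >> "$out"
printf '2425\tNA19902\t-0.0295\t0.0008\tAFR\t0.997\tAMR\t0.0017\tAFR\n' >> "$out"
printf '2484\tNA20335\t-0.0239\t-0.0029\tAFR\t0.9954\tAMR\t0.0016\tAFR\n' >> "$out"

# Hypothetical helper: count samples per value of the Ancestry column in file $1.
count_by_ancestry() {
  awk -F '\t' 'NR == 1 { for (i = 1; i <= NF; i++) if ($i == "Ancestry") col = i; next }
               { n[$col]++ }
               END { for (g in n) print g, n[g] }' "$1" | sort
}

count_by_ancestry "$out"
```

For the three example rows this prints `AFR 3`.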

PNG file

Interactive plots for ancestry inference results.

Run the following R code to get interactive plots. Packages 'shiny' and 'ggplot2' are required. The related R files are in the Rshiny folder.
Please upload a text file with ancestry information. Three columns are required: PC1, PC2 and Ancestry.

library(shiny)
runGitHub("AncestryInference_KING", "chenlab-uva", ref = "main", subdir = "Rshiny")

We will see the study samples' PC1 and PC2 information after we upload the *InferredAncestry txt file.

The second plot (interactive plot) shows only samples from the chosen ancestry group.

Detailed information is listed when we click a dot in the interactive plot.

We can also type a family ID of interest to see the detailed information for its samples.

Reference

Manichaikul A, Mychaleckyj JC, Rich SS, Daly K, Sale M, Chen WM (2010) Robust relationship inference in genome-wide association studies. Bioinformatics 26(22):2867-2873
