copenhagen

Summer course in analysis of high throughput data for population genetics

Tuesday - Analysis of NGS data and population structure

  • 9am - 12pm: Estimation of allele frequencies, SNP calling and genotype calling from NGS data (theory 9-10.15 + practical 10.30-12.00)

Wednesday - Research lecture

  • 4.15pm - 5pm: Research lecture

Thursday - Demography and selection

  • 1pm - 4pm: Selection scans (theory 1-2.15 + practical 2.15-4)

Material

Slides can be found here.

The data for the practicals have already been downloaded and are provided in /ricco/data/matteo/Data. These instructions, including all relevant files and scripts, can be found at /home/matteo/Copenhagen. In short, you do not need to prepare anything for the practicals.

Data

As an illustration, we will use 40 BAM files of human samples (of African, European, East Asian, and Native American descent), a reference genome, and a putative ancestral sequence. We will also use 10 BAM files of Latino individuals in one example. To make things more interesting, we have downsampled our data to an average depth of 2X.
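To see why 2X makes genotype calling hard, here is a minimal back-of-the-envelope sketch. It assumes (our simplification, not part of the course material) that read depth is Poisson-distributed, that each read comes from either allele with probability 0.5, and that there are no sequencing errors. Under these assumptions the number of reads per allele is Poisson(d/2), so the chance of seeing both alleles at a heterozygous site is (1 - e^(-d/2))^2.

```python
import math


def prob_both_alleles_seen(mean_depth: float, max_depth: int = 150) -> float:
    """Probability that reads from both alleles are observed at a heterozygous site.

    Simplifying assumptions: depth ~ Poisson(mean_depth), each read carries either
    allele with probability 0.5, and there are no sequencing errors.
    """
    total = 0.0
    p_depth = math.exp(-mean_depth)  # P(depth = 0); this case never shows both alleles
    for n in range(1, max_depth + 1):
        p_depth *= mean_depth / n  # Poisson recurrence: P(n) = P(n - 1) * mean / n
        # with n reads, both alleles appear unless all n reads carry the same allele
        total += p_depth * (1.0 - 2.0 * 0.5 ** n)
    return total


for depth in (2, 10, 30):
    print(f"{depth}X: {prob_both_alleles_seen(depth):.3f}")
```

At 2X this prints 0.400: roughly 60% of heterozygous sites show reads from one allele only (or no reads at all), whereas at 10X the figure drops to about 1%. This is why the practicals work with genotype likelihoods rather than hard genotype calls.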

We will also use VCF files for 120 individuals from the same populations. The human data represent a small genomic region (1 Mb on chromosome 2) extracted from the 1000 Genomes Project.

Preparation

For each day, we will indicate which software and scripts you will be using. However, before doing anything else, please create the folders where you will put all the results and some temporary data:

mkdir Results
mkdir Data

That's it.

Case study

MOTIVATION

Detecting signatures of natural selection in the genome serves two purposes: (i) understanding which adaptive processes shaped genetic variation and (ii) identifying putative functional variants. In the case of humans, biological pathways enriched with selection signatures include pigmentation, immune-system regulation and metabolic processes. The latter may be related to human adaptation to different diets, depending on local food availability (e.g. lactase persistence in dairying populations).

The human Ectodysplasin A receptor gene, EDAR, is part of the EDA signalling pathway, which prenatally specifies the location, size and shape of ectodermal appendages (such as hair follicles, teeth and glands). EDAR is a textbook example of positive selection in East Asians (Sabeti et al. 2007 Nature), with phenotypic effects tested in transgenic mice.

Recently, a genome-wide association study found the same functional variant in EDAR associated with several human facial traits (ear shape, chin protrusion, ...) in Native American populations (Adhikari et al. Nat Commun 2016).

HYPOTHESIS

  • Is the functional allele found in East Asians also at high frequency in other human populations (e.g. Native Americans)?
  • Can we identify signatures of natural selection on EDAR in Native Americans?
  • Is selection targeting the same functional variant?

CHALLENGES

  • Admixed population
  • Low-depth sequencing data
  • Effect of genetic drift
  • ...

PLAN OF ACTION

Goal day 1:

  • Estimate allele frequencies of the tested variant in African, European, East Asian and Native American samples from low-depth sequencing data
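A common way to estimate an allele frequency from low-depth data without calling genotypes is to maximise the likelihood of the frequency given each individual's genotype likelihoods, typically with an expectation-maximisation (EM) algorithm under Hardy-Weinberg equilibrium. The sketch below is a minimal, illustrative version of that idea for a single site, not the exact procedure or software used in the practical. It assumes each individual contributes a triplet of likelihoods P(reads | AA), P(reads | AB), P(reads | BB), with B the allele of interest; the function and type names are our own.

```python
from typing import Sequence, Tuple

# P(reads | AA), P(reads | AB), P(reads | BB) for one individual (hypothetical layout)
GenotypeLikelihoods = Tuple[float, float, float]


def em_allele_frequency(gls: Sequence[GenotypeLikelihoods],
                        tol: float = 1e-8, max_iter: int = 1000) -> float:
    """Maximum-likelihood frequency of allele B at one site, via EM under HWE.

    Likelihoods do not need to be normalised; only their ratios matter.
    """
    f = 0.2  # arbitrary starting value strictly between 0 and 1
    for _ in range(max_iter):
        priors = ((1 - f) ** 2, 2 * f * (1 - f), f ** 2)  # HWE genotype frequencies
        expected_b = 0.0
        for gl in gls:
            w = [prior * lik for prior, lik in zip(priors, gl)]
            # E-step: posterior expected number of B copies (0, 1 or 2) in this individual
            expected_b += (w[1] + 2 * w[2]) / sum(w)
        new_f = expected_b / (2 * len(gls))  # M-step: expected B copies / total copies
        if abs(new_f - f) < tol:
            return new_f
        f = new_f
    return f


# Sanity check with certain genotypes AA, AB, BB, AA: 3 B alleles out of 8
print(em_allele_frequency([(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 0, 0)]))  # 0.375
```

With uncertain (low-depth) likelihoods, individuals are weighted by how informative their reads are instead of being forced into a single genotype, which avoids the bias of counting alleles from hard calls.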

Optional:

  • Investigate population structure of American samples relative to Europeans and Africans
  • Select individuals with high Native American ancestry

Goal day 2:

  • Perform a sliding-window scan based on allele frequency differentiation
  • Assess statistical significance of selection signatures through simulations
  • Test for extended haplotype homozygosity on high-depth sequencing data
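One common way to run a sliding-window scan based on allele frequency differentiation is to compute FST between pairs of populations in each window and combine the three pairwise values into the population branch statistic (PBS; Yi et al. 2010), which measures how much differentiation is specific to one population. The sketch below uses Hudson's FST estimator (Bhatia et al. 2013) on per-site allele frequencies that are assumed to be already estimated; methods designed for low-depth data usually work from genotype likelihoods instead. The choice of estimator, the FST cap and all function names are illustrative assumptions, not the course's exact pipeline.

```python
import math
from typing import List, Sequence, Tuple


def hudson_fst(p1: Sequence[float], p2: Sequence[float], n1: int, n2: int) -> float:
    """Hudson's FST over a set of sites, as a ratio of averages (Bhatia et al. 2013).

    p1, p2: per-site allele frequencies; n1, n2: number of sampled chromosomes.
    """
    num = den = 0.0
    for a, b in zip(p1, p2):
        num += (a - b) ** 2 - a * (1 - a) / (n1 - 1) - b * (1 - b) / (n2 - 1)
        den += a * (1 - b) + b * (1 - a)
    return num / den if den > 0 else float("nan")


def pbs(fst_ab: float, fst_ac: float, fst_bc: float) -> float:
    """Population branch statistic for population A (Yi et al. 2010)."""
    def branch(fst: float) -> float:
        return -math.log(1 - min(fst, 0.99999))  # cap avoids log(0) when FST == 1
    return (branch(fst_ab) + branch(fst_ac) - branch(fst_bc)) / 2


def sliding_window_pbs(positions: Sequence[int],
                       freqs: Tuple[Sequence[float], Sequence[float], Sequence[float]],
                       sizes: Tuple[int, int, int],
                       window: int, step: int) -> List[Tuple[int, int, int, float]]:
    """Return (start, end, n_sites, PBS of population A) for each non-empty window.

    positions must be sorted; freqs holds per-site frequencies for populations A, B, C.
    """
    pa, pb, pc = freqs
    na, nb, nc = sizes
    results = []
    start = positions[0]
    while start <= positions[-1]:
        idx = [i for i, pos in enumerate(positions) if start <= pos < start + window]
        if idx:
            a = [pa[i] for i in idx]
            b = [pb[i] for i in idx]
            c = [pc[i] for i in idx]
            value = pbs(hudson_fst(a, b, na, nb), hudson_fst(a, c, na, nc),
                        hudson_fst(b, c, nb, nc))
            results.append((start, start + window, len(idx), value))
        start += step
    return results


# Toy data: population A is strongly differentiated only in the first window
positions = [100, 200, 300, 1100, 1200, 1300]
freqs = ([0.9] * 3 + [0.3] * 3, [0.1] * 3 + [0.3] * 3, [0.1] * 3 + [0.3] * 3)
for row in sliding_window_pbs(positions, freqs, (20, 20, 20), window=1000, step=1000):
    print(row)
```

In a real scan, windows with the highest PBS in the target population (e.g. Native Americans, with East Asians and Europeans as the other two) are the candidates whose significance then needs to be assessed.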

Agenda

Tuesday morning - introduction to NGS data

Lecture

  • Maximum likelihood and Bayesian estimation
  • Genotype likelihoods
  • Allele frequency estimation, SNP calling and genotype calling
  • Basic data filtering
  • Estimation of allele frequencies and SNP calling
  • Genotype calling
  • Example: estimation of allele frequencies from low-depth sequencing data: the case of EDAR genetic variation in Native Americans
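Genotype likelihoods and genotype calling can be illustrated with a deliberately simple error model: each read samples one of the two alleles with probability 1/2, a base with Phred quality Q is wrong with probability 10^(-Q/10), and errors are equally likely to be any other base. The sketch below uses that model (an assumption; real tools use more refined models) and then calls a genotype from the posterior with a Hardy-Weinberg prior, leaving the genotype missing when no posterior reaches a threshold. Function names and the 0.95 threshold are hypothetical.

```python
import math
from itertools import combinations_with_replacement
from typing import Dict, Optional, Sequence

BASES = "ACGT"


def genotype_likelihoods(bases: str, quals: Sequence[int]) -> Dict[str, float]:
    """log10 P(reads | genotype) for all 10 diploid genotypes at one site."""
    gls = {}
    for a1, a2 in combinations_with_replacement(BASES, 2):
        loglik = 0.0
        for base, q in zip(bases, quals):
            e = 10 ** (-q / 10)  # error probability from Phred quality
            p1 = 1 - e if base == a1 else e / 3
            p2 = 1 - e if base == a2 else e / 3
            loglik += math.log10(0.5 * p1 + 0.5 * p2)  # read drawn from either allele
        gls[a1 + a2] = loglik  # keys are alphabetical, e.g. "AG" not "GA"
    return gls


def call_genotype(gls: Dict[str, float], ref: str, alt: str, alt_freq: float,
                  min_posterior: float = 0.95) -> Optional[int]:
    """Number of alt alleles (0, 1, 2) under a HWE prior, or None if uncertain."""
    genotypes = [ref + ref, "".join(sorted(ref + alt)), alt + alt]
    priors = [(1 - alt_freq) ** 2, 2 * alt_freq * (1 - alt_freq), alt_freq ** 2]
    top = max(gls[g] for g in genotypes)  # rescale to avoid underflow at high depth
    weights = [p * 10 ** (gls[g] - top) for p, g in zip(priors, genotypes)]
    posteriors = [w / sum(weights) for w in weights]
    best = max(range(3), key=lambda i: posteriors[i])
    return best if posteriors[best] >= min_posterior else None


print(call_genotype(genotype_likelihoods("AAG", [30, 30, 30]), "A", "G", 0.3))  # 1
print(call_genotype(genotype_likelihoods("A", [30]), "A", "G", 0.3))  # None
```

With a single read the best posterior is only about 0.7, so the genotype is left missing; at 2X this is the typical situation, which is why downstream estimates are better computed directly from the likelihoods.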

Research lecture

  • TBA

Thursday afternoon - detecting selection

Lecture

  • The effect of selection on the genome
  • Methods to detect selection signals
  • The problem of assessing significance
  • Bias introduced by NGS data
  • Summary statistics from low-depth data
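A standard way to assess significance through simulations is to compare an observed statistic (for example, a window's differentiation score) with its distribution under a neutral demographic model. The minimal sketch below computes a one-sided empirical p-value with the usual (r + 1) / (n + 1) estimator; how the null values are simulated, and under which demographic model, is left open here and is the hard part in practice.

```python
from typing import Sequence


def empirical_p_value(observed: float, null_values: Sequence[float]) -> float:
    """One-sided empirical p-value of `observed` against simulated neutral values.

    Counts simulations at least as extreme as the observation; adding 1 to both
    numerator and denominator means the p-value is never exactly zero.
    """
    r = sum(1 for value in null_values if value >= observed)
    return (r + 1) / (len(null_values) + 1)


null = list(range(100))  # placeholder for 100 statistics from neutral simulations
print(f"{empirical_p_value(95, null):.3f}")  # 0.059
```

With n simulations the smallest attainable p-value is 1/(n + 1), so claiming significance at the 0.001 level requires at least 999 neutral replicates, and the result is only as reliable as the simulated demographic model.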

Practical

  • Selection scan based on genetic differentiation and diversity from low-depth data
  • Assessing significance through simulations
  • Selection test based on haplotype diversity
  • Example: detection of natural selection from low-depth sequencing data and haplotype data: the case of EDAR genetic variation in Native Americans
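Extended haplotype homozygosity (EHH) at a given distance from a core SNP is the probability that two randomly chosen haplotypes carrying the same core allele are identical over the whole interval; a recent sweep leaves long stretches of high EHH around the selected allele. The sketch below computes EHH from phased haplotypes encoded as strings of '0'/'1' alleles over the same ordered SNPs (an assumed input format; phasing is one reason this test is run on high-depth data). Statistics such as iHS integrate curves like this over genetic distance; that step is not shown.

```python
from collections import Counter
from math import comb
from typing import Sequence


def ehh(haplotypes: Sequence[str], core: int, allele: str, end: int) -> float:
    """EHH between the core SNP and SNP `end` among haplotypes carrying `allele` at the core."""
    carriers = [h for h in haplotypes if h[core] == allele]
    n = len(carriers)
    if n < 2:
        return float("nan")  # undefined with fewer than two carrier haplotypes
    lo, hi = sorted((core, end))
    # group carriers by their allele sequence over the interval [lo, hi]
    counts = Counter(h[lo:hi + 1] for h in carriers)
    # fraction of carrier pairs that are identical over the interval
    return sum(comb(c, 2) for c in counts.values()) / comb(n, 2)


haps = ["0000", "0001", "0000", "1111", "1110"]
print([ehh(haps, core=0, allele="0", end=e) for e in range(4)])
# [1.0, 1.0, 1.0, 0.3333333333333333]
```

Comparing how slowly EHH decays around the derived versus the ancestral allele of the tested variant is the intuition behind haplotype-based selection tests.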

Credits

Thanks to Thorfinn Korneliussen, Anders Albrechtsen, Tyler Linderoth, Filipe G. Vieira, Dean Ousby, Javier Mendoza, Ryan Waples, and possibly many others I forgot to mention.

Contributors

  • mfumagalli