Git Product home page Git Product logo

bayescan's Introduction

Codes for the paper : "Seascape genomics provides evidence for thermal adaptation and current‐mediated population structure in American lobster (Homarus americanus)"

Laura Benestan

Québec, 2016

Published in Molecular Ecology, 2016, see Benestan et al

Description

Bayescan is a command-line program that aim to identifying putative candidate loci under natural selection from genetic data, using differences in allele frequencies between populations.

BayeScan is based on the Multinomial-Dirichlet model.

This program can define three categories of putative candidate loci:

  • under diversifying selection
  • under balancing selection
  • under neutrality

For each locus, BayeScan calculates a posterior probability (Posterior odds) - available through the parameter pr_odds - for the model including selection. These posterior probabilities indicate how more likely the model with selection is compared to the neutral model. For instance a pr_odds of 10 means that there is 1/10 probability for a marker to be under selection. This number would be too high when considering a dataset with up to 10,000 markers.

In the context multiple testing such as large number of markers (up to 10,000), run BAYESCAN with appropriate parameter as recommended in Whitlock and Lotterhos, 2015. To do so, you should consider the number of loci in your dataset. You can also consult the bayescan exercice to learn more about how to interpret Bayescan files and outputs.

1. Prepare dataset in the .geste format

Most of the Next Generation Sequencing (NGS) projects generate a VCF (.vcf) or a PLINK file (.tped and .tfam) after aligning the sequences in STACKS.

The first step is to prepare files in an appropriate .geste format for BAYESCAN. Convert the .vcf file to a .geste dataset with the function genomic_converter available in the elegant radiatorpackage in R (see Thierry Gosselin github page for more details).

lobster <- genomic_converter(
  data = "13688snps-562ind.recode.vcf", strata = "population_map_lobster.txt",
  output = c("bayescan"), filename="13688snps-562ind.geste")

2. Run Bayescan in a terminal

Preparing .geste file

BayeScan uses its own input file formats that have the extension .geste. To transform the vcf file (or other formats such as genepop and structure) in a .geste file, you can easily use PGDSpider2

To run Bayescan in bash, use the following command:

./BayeScan2.1_macos64bits -n 5000 -burn 50000 -pr_odds 10000

By running Bayescan, you should check three output files: *Verif.txt: Use this file to verify that BayeScan read the input files properly *.sel: Record the parameters estimated by the model *fst.txt: Use to identify outliers

3. Use R to identify outliers from Bayescan analyses

Using R to visualize the outputs First, download libraries

library(ggplot2) 

Open the bayescan output file with the "_fst.txt" extension.

bayescan=read.table("bayescan-13688snps-562ind.g_fst.txt") 

The first column of the bayescan object is a SNP ID. The next three (prob, log10(P0), and qval) are related to the test of local adaptation considering the logarithm of the posterior odds - log10(PO) - and the q-value for the model with selection. The fifth column gives the size of the locus-specific effect (alpha parameter). The last one provides the locus-specific FST averaged over all populations.

Download the list of SNPs in the right order. The .este format has SNPs in the same order than the vcf used to produce the .geste format. Tehrefore, you can use this command in bash to extract the third column containing the ID info of each SNPs in your vcf:

 grep -v "#" 13688snps-562ind.recode.vcf | cut -f 3 > 13688snps-562ind_id_vcf.txt

Then

SNPb=read.table("13688snps-562ind_id_vcf.txt",header=FALSE) 

Merge the name of the outliers with the results from bayescan.

bayescan=cbind(SNPb, bayescan) 
colnames(bayescan)=c("SNP","PROB","LOG_PO","Q_VALUE","ALPHA","FST") 
write.table(bayescan, "13688snps-bayescan-results.txt", quote=FALSE, sep="\t", row.names=FALSE) 

Change the value of the Q_VALUE column: 0 == 0.0001.  

attach(bayescan)
class(bayescan$Q_VALUE)  
bayescan$Q_VALUE <- as.numeric(bayescan$Q_VALUE) 
bayescan[bayescan$Q_VALUE<=0.0001,"Q_VALUE"]=0.0001 

Round the values.  

bayescan$LOG_PO <- (round(bayescan$LOG_PO, 4)) 
bayescan$Q_VALUE <- (round(bayescan$Q_VALUE, 4)) 
bayescan$ALPHA <- (round(bayescan$ALPHA, 4)) 
bayescan$FST <- (round(bayescan$FST, 6))

Add a column for the type of selection grouping based on a Q-VALUE < 0.05 (you can also choose a Q-VALUE < 0.01).

bayescan$SELECTION <- ifelse(bayescan$ALPHA>=0&bayescan$Q_VALUE<=0.05,"diversifying",ifelse(bayescan$ALPHA>=0&bayescan$Q_VALUE>0.05,"neutral","balancing")) 
bayescan$SELECTION<- factor(bayescan$SELECTION)
levels(bayescan$SELECTION) 

Save the results of the SNPs potentially under positive (divergent) and balancing selection (qvalue < 0.05).

positive <- bayescan[bayescan$SELECTION=="diversifying",] 
neutral <- bayescan[bayescan$SELECTION=="neutral",] 
balancing <- bayescan[bayescan$SELECTION=="balancing",]  

Check the number of SNPs belonging to each category.

xtabs(data=bayescan, ~SELECTION) 

Write the results of the SNPs potentially under selection (qvalue < 0.05).

write.table(neutral, "neutral.txt", row.names=F, quote=F)  
write.table(balancing, "balancing.txt", row.names=F, quote=F) 
write.table(positive, "positive.txt", row.names=F, quote=F) 

4. Use R to visualize Bayescan results

Transformation Log of the Q value in order to create te ggplot graph.

range(bayescan$Q_VALUE) 
bayescan$LOG10_Q <- -log10(bayescan$Q_VALUE) 

Create title for the ggplot graph.

x_title="Log(q-value)" 
y_title="Fst" 

Make the ggplot graph.

graph_1<-ggplot(bayescan,aes(x=LOG10_Q,y=FST, label=bayescan$POS)) 
graph_1+geom_point(aes(fill=bayescan$SELECTION), pch=21, size=2)+ 
  #geom_text()+ 
  scale_fill_manual(name="Selection",values=c("white","red","orange"))+ 
  labs(x=x_title)+ 
  labs(y=y_title)+ 
  theme(axis.title=element_text(size=12, family="Helvetica",face="bold"), legend.position="none")+ 
  theme(axis.text.x=element_text(colour="black"))+ 
  theme(axis.text.y=element_text(colour="black",size=12))+ 
  theme(axis.text.x=element_text(colour="black",size=12))+ 
  theme(panel.border = element_rect(colour="black", fill=NA, size=3),  
        axis.title=element_text(size=18,colour="black",family="Helvetica",face="bold")) +
        theme_classic()

Save the file in a pdf format

ggsave("bayescan_13688_562ind.pdf", dpi=600, width=5, height=5) 
dev.off()

Bayescan_Benestan_2016

You can also simply use the function already available in Bayescan. First load the function in R.

source("plot_R.r")

Make a nice graph using this plot_bayescan function.

plot_bayescan("bayescan-13688snps-562ind.g_fst.txt")

bayescan's People

Contributors

laurabenestan avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.