Ag1000G selection atlas

This repository contains code for building the Ag1000G selection atlas.

Information for developers

Here's how to set up a local development environment to build the site.

Cloning the repo

The site is deployed via GitHub Pages. For a local setup that is compatible with the live deployment, the repo needs to be cloned in a particular way.

Clone the GitHub repo for source code development:

$ git clone --recursive git@github.com:malariagen/agam-selection-atlas.git

Make a directory representing the "malariagen.github.io" site:

$ mkdir malariagen.github.io
$ cd malariagen.github.io

Clone the GitHub repo again for deployment of the built site (via the gh-pages branch):

$ git clone git@github.com:malariagen/agam-selection-atlas.git
$ cd agam-selection-atlas
$ git checkout gh-pages

Local directory structure should now look something like:

$ cd ../..
$ tree -L 2
.
├── agam-selection-atlas
│   ├── agam-report-base
│   ├── annotation
│   ├── build
│   ├── deps
│   ├── docs
│   ├── env.sh
│   ├── LICENSE
│   ├── notebooks
│   ├── README.md
│   ├── scripts
│   ├── snakemake
│   └── templates
└── malariagen.github.io
    └── agam-selection-atlas

So the repo has been cloned twice, into two different locations, one of which will be used for site development, the other for site deployment (checked out to gh-pages branch).
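
As a quick sanity check of this layout, here is a minimal Python sketch. The names EXPECTED and missing_paths are hypothetical and not part of the repo. The sketch only checks that the two directories exist under the parent directory; it does not verify which branch is checked out.

```python
from pathlib import Path

# Directories expected under the parent directory, taken from the tree listing above.
EXPECTED = [
    "agam-selection-atlas",
    "malariagen.github.io/agam-selection-atlas",
]


def missing_paths(root):
    """Return the expected directories that do not exist under root."""
    root = Path(root)
    return [p for p in EXPECTED if not (root / p).is_dir()]
```

Calling missing_paths(".") from the parent directory should return an empty list once both clones are in place.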

Setting up external data resources

To perform the selection scans and build the signals, some external data resources from Ag1000G and VectorBase are needed in the agam-selection-atlas development directory. A utility script downloads these to your local system via FTP:

$ cd agam-selection-atlas
$ ./download-external-data.sh

This will download files to a "data" folder in the repo root directory. If you want to store these elsewhere, create a symlink.
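
One way to create such a symlink from Python, as a minimal sketch. The helper name link_data_dir and the storage location are hypothetical. It assumes the link is created before running the download script and that the script writes through an existing "data" symlink, which is not confirmed here.

```python
import os
from pathlib import Path


def link_data_dir(repo_root, storage_dir):
    """Create repo_root/data as a symlink pointing at storage_dir."""
    link = Path(repo_root) / "data"
    target = Path(storage_dir).resolve()
    # Refuse to overwrite an existing folder or link.
    if link.is_symlink() or link.exists():
        raise FileExistsError(f"{link} already exists; move or remove it first")
    # Make sure the real storage directory exists before linking to it.
    target.mkdir(parents=True, exist_ok=True)
    os.symlink(target, link, target_is_directory=True)
    return link
```

For example, link_data_dir(".", "/path/to/large/disk/agam-data") run from the repo root would make "data" point at that (hypothetical) location.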

Setting up the execution environment

To set up a local execution environment, do:

$ ./binder/install-conda.sh

To activate the environment, do:

$ source binder/env.sh

Building the signals

Building of the selection signals data is controlled via the snakemake/data.yml rules file. Building the signals data can take a bit of time, so only rebuild if necessary. E.g., to build all H12 signals:

$ snakemake -s snakemake/data.yml all_h12

Building the site

Building of the static web site is controlled via the snakemake/site.yml rules file. To completely rebuild the site, do:

$ snakemake -s snakemake/site.yml all

The first build takes some time, because many pages (especially the gene pages) need to be built. Subsequent incremental builds are faster, particularly when the gene pages do not need rebuilding.

Previewing the site

The final step in the site build copies all built files across to the malariagen.github.io/agam-selection-atlas folder where the gh-pages branch is checked out. To preview the site, do:

$ cd ../malariagen.github.io
$ python -m http.server

...then browse to http://0.0.0.0:8000/agam-selection-atlas/dev/.

Note that some changes may need an empty cache and hard reload in the browser.
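
If stale browser caches become a nuisance during development, one option is a small preview server that asks the browser not to cache responses. This is a sketch using only the Python standard library, not part of the repo. The names NoCacheHandler and make_preview_server are hypothetical, and whether the header removes the need for a hard reload may depend on the browser.

```python
import functools
import http.server


class NoCacheHandler(http.server.SimpleHTTPRequestHandler):
    """Static file handler that adds a Cache-Control header to every response."""

    def end_headers(self):
        # Ask the browser not to store responses, so edits show up on reload.
        self.send_header("Cache-Control", "no-store")
        super().end_headers()


def make_preview_server(directory, port=8000, host="127.0.0.1"):
    """Return a server for the files under directory (not yet started)."""
    handler = functools.partial(NoCacheHandler, directory=directory)
    return http.server.ThreadingHTTPServer((host, port), handler)


if __name__ == "__main__":
    # Run from the malariagen.github.io directory, as with python -m http.server.
    make_preview_server(".").serve_forever()
```

With this running, the site would be served at http://127.0.0.1:8000/agam-selection-atlas/dev/.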

agam-selection-atlas's Issues

Postpone loading of bokeh js?

There is currently a noticeable lag when loading a page, especially one that contains a table: the table first renders with one style, then changes, presumably once some CSS resource has loaded. Maybe this is due to the bokeh JS and CSS being loaded before bootstrap?

Use raw not windowed values to compute percentiles

Currently the maximum percentile for each signal is computed as the percentile of the maximum statistic value within the peak against values for all windows. It would be more informative, and more appropriate for communicating the outlier status of a peak, if percentiles were computed against the raw (i.e., per-SNP) statistic values.
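
To illustrate why the choice of reference distribution matters, here is a toy sketch in plain Python. The data are made up, the helper names are hypothetical, and windowing by the mean of non-overlapping windows is an assumption for illustration only; it is not a description of how the atlas computes windowed statistics.

```python
from bisect import bisect_right


def percentile_of_score(values, score):
    """Percentage of values that are less than or equal to score."""
    ordered = sorted(values)
    return 100.0 * bisect_right(ordered, score) / len(ordered)


def window_means(values, size):
    """Mean of each non-overlapping window of `size` values (remainder dropped)."""
    return [
        sum(values[i:i + size]) / size
        for i in range(0, len(values) - size + 1, size)
    ]


# Toy per-SNP statistic values: 1, 2, ..., 20.
raw = list(range(1, 21))
# Windowed values: 2.5, 6.5, 10.5, 14.5, 18.5.
windowed = window_means(raw, 4)
# Suppose the maximum value within a peak is the fourth window.
peak_max = windowed[3]

print(percentile_of_score(windowed, peak_max))  # 80.0, against all windows
print(percentile_of_score(raw, peak_max))       # 70.0, against raw values
```

The same peak value lands at a different percentile depending on whether it is ranked against windowed or raw values, which is the distinction this issue is about.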

Scatter plots of signals by peak model fit score (delta AIC) and max percentile

Consider adding some interactive scatter plots to the site to provide an overview of signals, where each signal is plotted as a point, with the peak model fit score on the X axis and the max percentile on the Y axis. Signals with strong evidence on both scores should then appear in the top right of the plot. If the plot were done with bokeh, a hover could be added to each signal, similar to the hover currently used in the signal summary plots over the genome, including a preview of the peak data and fit. Clicking on a signal could also link through to the signal page.

Chromosome/signal summary plots

Produce some kind of overview plot showing a whole chromosome (arm) with all signals marked. Possibly use colours for populations and marker shapes for statistics. Even better if it could be interactive to some extent, i.e., the user can hover over signals and follow links to signal pages.

This would be good for the all signals page as well as for the population pages.

Tutorial

Add a tutorial section with a walk-through of how to use the site.

Executive summary

Consider writing an executive summary, as a page of prose with brief explanation plus hand-written summary of the major selection signals we think are important and which populations and genes they involve, for anyone coming to the site who isn't a genetics geek.

Add GNS XPEHH

The site currently has GNS scans for H12 and IHS, but there was no GNS comparison in the raw XPEHH set. If we want to be able to compare the number of signals per population including Guinea, we should add XPEHH for GNS (probably using the same comparisons as BFS).

Methods

Add content to methods section.

  • Data sources (i.e., what data were used, where did they come from)
  • IHS
  • XPEHH
  • H12
  • XPCLR
  • Delta Tajima's D
  • Peak finding algorithm

Browse windowed data

Consider adding pages and interactive plots to allow browsing of the windowed data for each selection statistic and population, plotted over whole chromosome arms. This would allow unconstrained browsing of the data, for those who want to see what else may be there that is not getting picked up by the peak finding algorithm.

Sortable signal tables

Investigate use of data tables for selection signals, so user can change the sort order (and possibly also create filters/queries).
