
scVAE: Single-cell variational auto-encoders

This software tool implements two variants of variational auto-encoders: one with a Gaussian prior and one with a Gaussian-mixture prior. In addition, several discrete probability distributions and derived versions (such as zero-inflated variants) are included to model sparse count data like single-cell gene expression data. Easy access to recent single-cell and traditional gene expression data sets is also provided. Lastly, the tool can produce relevant analytics of the data sets and the models.
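
As a minimal illustration of the difference between the two priors, the NumPy sketch below samples a latent point under each of them. This is explanatory pseudocode, not the tool's actual implementation, and the mixture parameters shown here would be learned in practice:

    import numpy as np

    number_of_components = 10   # number of mixture components (K)
    latent_size = 25            # dimensionality of the latent space

    # Mixture weights and per-component Gaussian parameters (learned in practice):
    mixture_weights = np.full(number_of_components, 1.0 / number_of_components)
    component_means = np.random.randn(number_of_components, latent_size)
    component_std_devs = np.ones((number_of_components, latent_size))

    # Gaussian prior: z ~ N(0, I).
    z_gaussian = np.random.randn(latent_size)

    # Gaussian-mixture prior: first pick a component y, then sample z from it.
    y = np.random.choice(number_of_components, p=mixture_weights)
    z_mixture = (component_means[y]
                 + component_std_devs[y] * np.random.randn(latent_size))

    # In both cases, z is decoded into the parameters of a count distribution
    # (for example, a negative binomial) over the gene-expression profile.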

The methods used by this tool are described and examined in the paper "scVAE: Variational auto-encoders for single-cell gene expression data" by Christopher Heje Grønbech, Maximillian Fornitz Vording, Pascal Nordgren Timshel, Casper Kaae Sønderby, Tune Hannes Pers, and Ole Winther.

The tool has been developed by Christopher and Maximillian at the Section for Cognitive Systems at DTU Compute with help from Casper and Lars Maaløe, supervised by Ole and in collaboration with the Pers Lab. It is being further developed by Christopher at the Unit for Genomic Medicine (web page only available in Danish) at Rigshospitalet.

Setup

The tool is implemented in Python (versions 3.3–3.6) using TensorFlow. Pandas, PyTables, Beautiful Soup, and the stemming module are used for importing and preprocessing data. Analyses of the models and results are performed using NumPy, SciPy, and scikit-learn, and figures are made using matplotlib, Seaborn, and Pillow.

All included data sets are downloaded and processed automatically as needed.

Installation

This tool is not yet available as a Python package. In the meantime, you will need to install its dependencies yourself. This can be done by running

$ pip install numpy scipy scikit-learn kneed tensorflow-gpu tensorflow-probability-gpu pandas tables beautifulsoup4 stemming matplotlib seaborn pillow

(If you do not have a GPU to use with TensorFlow, install the standard CPU versions by replacing tensorflow-gpu with tensorflow and tensorflow-probability-gpu with tensorflow-probability.)
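
As an optional sanity check after installation (this is not part of the tool itself), you can verify that TensorFlow can see your GPU:

    import tensorflow as tf

    # Prints True if TensorFlow was built with GPU support and a GPU is visible.
    print(tf.test.is_gpu_available())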

After this, you can clone this tool to an appropriate folder:

$ git clone https://github.com/chgroenbech/scVAE.git

Running

The standard configuration of the model, which uses the synthetic data set, can be run by simply running the main.py script. Be aware that loading and preprocessing the data for the first time might take a while for large data sets. Also note that to load and analyse the largest data set, which is made available by 10x Genomics and consists of 1.3 million mouse brain cells, 47 GB of memory is required (32 GB for the original data set in sparse representation and 15 GB for the reconstructed test set).
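
If you are unsure whether a data set will fit in memory, one rough way to gauge it is to inspect the footprint of a sparse count matrix directly with SciPy. The snippet below is a generic illustration only; the small random matrix is a stand-in for an actual data set, not an object exposed by the tool:

    import scipy.sparse

    # Stand-in for a count matrix (cells x genes) in sparse CSR representation.
    count_matrix = scipy.sparse.random(1000, 20000, density=0.05, format="csr")

    # A CSR matrix stores its values plus two index arrays; summing their sizes
    # gives the in-memory footprint of the sparse representation.
    footprint_in_bytes = (count_matrix.data.nbytes
                          + count_matrix.indices.nbytes
                          + count_matrix.indptr.nbytes)
    print("Sparse matrix footprint: {:.1f} MB".format(footprint_in_bytes / 1e6))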

By default, data is downloaded to the subfolder data/, models are saved in the subfolder log/, and results are saved in the subfolder results/.

To see how to change the standard configuration or use another data set, run the following command:

$ ./main.py -h

Examples

To reproduce the main results from our paper, you can run the following commands:

  • Purified immune cells data set from 10x Genomics:

      $ ./main.py -i 10x-PBMC-PP -m GMVAE -r negative_binomial -l 100 -H 100 100 -e 500 --decomposition-methods pca tsne
    
  • Mouse brain cells data set from 10x Genomics:

      $ ./main.py -i 10x-MBC -m GMVAE -K 10 -r zero_inflated_negative_binomial -l 25 -H 250 250 -e 500 --decomposition-methods pca tsne
    
  • TCGA data set:

      $ ./main.py -i TCGA-RSEM --map-features -m GMVAE -r negative_binomial -l 50 -H 500 500 -e 500 --decomposition-methods pca tsne
    

You can also model the MNIST data set. Three different versions are supported: the original, the normalised, and the binarised. To run the GMVAE model for, e.g., the binarised version, issue the following command:

$ ./main.py -i mnist_binarised -m GMVAE -r bernoulli -l 10 -H 200 200 -e 500 --decomposition-methods pca tsne

Comparisons

The script cross_analysis.py is provided to compare different models. After training several models with different network architectures and likelihood functions, run this script to compare them.

It requires the relative path to the results folder; with the standard configuration, it is run using the following command:

$ ./cross_analysis.py -R results/

Logs can be saved by adding the -s argument; they are saved together with the produced figures in the specified results folder. Data sets, models, and prediction methods can also be included or excluded using specific arguments. For documentation on these, use the command ./cross_analysis.py -h.
