lab-conrad / resvae

resVAE is a restricted latent variational autoencoder that we wrote to uncover hidden structures in gene expression data, especially using single-cell RNA sequencing. In principle it can be used with any hierarchically structured data though, so feel free to play around with it.

License: GNU General Public License v3.0

Languages: Jupyter Notebook 58.50%, Python 41.50%

Topics: single-cell, single-cell-analysis, single-cell-omics, single-cell-sequencing, single-cell-rna-seq, gene-sets, gene-expression, nature-machine-intelligence

resvae's Introduction

resVAE - a restricted latent variational autoencoder

Paper on Nature Machine Intelligence | Preprint on BioRxiv

How does resVAE work?

Briefly, resVAE is not too different from a standard variational autoencoder. If you are not familiar with artificial neural networks, imagine an algorithm that compresses data, forces the compressed representation to follow a specific distribution (in our case, a Gaussian), and then decompresses the data again. If you are more familiar with (variational) autoencoders but fancy a quick reminder, have a look at this excellent explanation by Kingma and Welling.

resVAE deviates from this idea in some aspects. One is that we use preclustered data and feed the identities of those clusters to the latent space of our network. We thus reserve latent dimensions for individual classes such as cell types, while keeping the encoder and decoder parts of the network the same. In the context of gene expression, this forces the network to learn features that are shared across cell types but may be more or less active in one cell type than in another. Having reserved dimensions in the latent space lets us map these features directly to, say, cell types or disease states. As it turns out, this enables the identification of functional gene sets, and the gene set inference can be corrected for batch effects or treatment groups by encoding these in the latent variable space as well.
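To make the idea of reserved latent dimensions more concrete, here is a minimal pure-Python sketch. It is not taken from the resVAE code base: it assumes one latent dimension per cluster and assumes the restriction is an element-wise product between the sampled latent vector and the one-hot cluster identity. The function name `sample_restricted_latent` and the toy numbers are hypothetical.

```python
import math
import random


def sample_restricted_latent(mu, log_var, one_hot, rng=random):
    """Sample z ~ N(mu, exp(log_var)) and mask it with a one-hot cluster vector.

    Assumption (not confirmed by the resVAE source): one latent dimension is
    reserved per cluster, and dimensions of other clusters are set to zero.
    """
    if not (len(mu) == len(log_var) == len(one_hot)):
        raise ValueError("mu, log_var and one_hot must have the same length")
    z = []
    for m, lv, keep in zip(mu, log_var, one_hot):
        eps = rng.gauss(0.0, 1.0)              # standard normal noise
        sample = m + math.exp(0.5 * lv) * eps  # reparameterisation trick
        z.append(sample * keep)                # zero out other clusters' dimensions
    return z


# Toy example: three clusters, and this cell belongs to the second one.
z = sample_restricted_latent(
    mu=[0.2, -1.0, 0.5],
    log_var=[0.0, -2.0, 0.1],
    one_hot=[0, 1, 0],
    rng=random.Random(0),
)
print(z)  # only the second entry is non-zero
```

Because every cell passes through the same decoder, the decoder weights attached to a given latent dimension can then be read as the features associated with that cluster, which is the property the paragraph above relies on.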

For more information regarding resVAE, please read our BioRxiv preprint or Nature Machine Intelligence paper.

Getting started

Prerequisites and installation

Although we tried to keep the list of dependencies short, resVAE does require you to download some Python packages. If you are using the install script we provide, these should be handled automatically. This is as easy as downloading the project using git clone on the project link (or pressing the button on the top right of the main project page), opening the downloaded folder using a command line, and running:

pip install -e .

In case of any issues, you can try forcing pip to install any dependencies first:

pip install -r requirements.txt

If this still does not work, try installing the failing packages individually using conda install <package> or pip install <package>.

You can also run resVAE from the project directory without installing it; the options in the example notebook should be fully compatible with this.

⚠️ Please note that you need tensorflow or tensorflow-gpu version 1.x and keras version < 2.4 to run the current version of resVAE.
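If you are unsure whether an existing environment meets these constraints, a small check along the following lines may help. This is only a sketch: `parse_version` and `is_supported` are hypothetical helper names, not part of resVAE, and the check only looks at the leading numeric components of each version string.

```python
def parse_version(version):
    """Turn '1.15.0' or '2.4.0rc1' into a tuple of leading integers, e.g. (1, 15, 0)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_supported(tensorflow_version, keras_version):
    """Return True for TensorFlow 1.x combined with Keras below 2.4."""
    tf_ok = parse_version(tensorflow_version)[:1] == (1,)
    keras_ok = parse_version(keras_version)[:2] < (2, 4)
    return tf_ok and keras_ok


print(is_supported("1.15.0", "2.3.1"))
print(is_supported("2.4.0", "2.4.3"))

# In a real environment you could pass the installed versions instead:
# import tensorflow, keras
# print(is_supported(tensorflow.__version__, keras.__version__))
```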

Required input files

To run resVAE, you will need at the very minimum a single-cell gene expression matrix or something similar where you have a format of samples x features or features x samples. This should be in some delimited text format, with row and column names included. Ideally, you will also have a file with cluster identities. For a somewhat more complex use case, this could also be several different classes, such as cell type, treatment, disease status, and batch. Each of these variables can then be converted to a one-hot format (resVAE contains utility functions to do this) and concatenated.
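The one-hot conversion and concatenation described above can be sketched as follows. resVAE ships its own utility functions for this step; since their names are not given here, the sketch uses a plain-Python stand-in. The function `encode_classes`, the column naming scheme, and the example annotations are all hypothetical.

```python
def encode_classes(annotations):
    """One-hot encode several class variables and concatenate them column-wise.

    annotations maps a variable name (e.g. "cell_type") to a list with one
    label per sample. Returns (column_names, matrix), where matrix holds one
    row per sample.
    """
    lengths = {len(labels) for labels in annotations.values()}
    if len(lengths) != 1:
        raise ValueError("all variables need one label per sample")
    n_samples = lengths.pop()

    column_names = []
    matrix = [[] for _ in range(n_samples)]
    for variable, labels in annotations.items():
        categories = sorted(set(labels))
        column_names.extend(f"{variable}={category}" for category in categories)
        for row, label in zip(matrix, labels):
            # Append one block of 0/1 indicators per variable.
            row.extend(1 if label == category else 0 for category in categories)
    return column_names, matrix


columns, one_hot = encode_classes({
    "cell_type": ["B cell", "T cell", "B cell"],
    "batch": ["1", "2", "2"],
})
print(columns)
for row in one_hot:
    print(row)
```

Each sample ends up with exactly one active column per variable, so the width of the concatenated matrix equals the total number of distinct classes across all variables.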

Running resVAE

An example project demonstrating the resVAE workflow is included. More complete API documentation (still very much a work in progress) can be found in the docs/_build/html/ subfolder.

Contributing

We welcome any contributions to the project, but ask you to adhere to some rules laid out in CONTRIBUTING.md.

Licensing

This work is released under the terms of the GNU GPLv3 license. Note that this is a copyleft license, so by using resVAE in your own projects, you agree to license them under a compatible license. In case you are interested in including resVAE or parts of its code in your own projects but cannot comply with this, please contact us directly.

resvae's People

Contributors

fwten, lab-conrad

resvae's Issues

ImportError: cannot import name 'calinski_harabasz_score' from 'sklearn.metrics' (C:\Users\dkodira\AppData\Roaming\Python\Python37\site-packages\sklearn\metrics\__init__.py)

I get this error when I run this:
import numpy as np
import pandas as pd
from resVAE.resvae import resVAE
import resVAE.utils as cutils
from resVAE.config import config
import resVAE.reporting as report

from matplotlib import pyplot as plt
import seaborn as sns
import os
Using TensorFlow backend.

ImportError Traceback (most recent call last)
in
1 import numpy as np
2 import pandas as pd
----> 3 from resVAE.resvae import resVAE
4 import resVAE.utils as cutils
5 from resVAE.config import config

~\Documents\Python Scripts\resVAE\resVAE\resvae.py in
39 import h5py
40
---> 41 from sklearn.metrics import calinski_harabasz_score
42 from keras import constraints
43

ImportError: cannot import name 'calinski_harabasz_score' from 'sklearn.metrics' (C:\Users\dkodira\AppData\Roaming\Python\Python37\site-packages\sklearn\metrics\__init__.py)
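A possible cause, not confirmed in this thread: scikit-learn releases before 0.20 exposed this metric only under the older spelling `calinski_harabaz_score`, so an outdated scikit-learn installation would produce exactly this ImportError. Upgrading scikit-learn is probably the simplest route. If that is not an option, a fallback import along these lines might work as a local workaround; `import_first_available` is a hypothetical helper, not part of resVAE or scikit-learn.

```python
import importlib


def import_first_available(module_name, candidate_names):
    """Return the first attribute from candidate_names that the module provides."""
    module = importlib.import_module(module_name)
    for name in candidate_names:
        if hasattr(module, name):
            return getattr(module, name)
    raise ImportError(
        f"none of {list(candidate_names)} could be imported from {module_name!r}"
    )


# Demonstration with a standard-library module:
sqrt = import_first_available("math", ["no_such_function", "sqrt"])
print(sqrt(9.0))

# Assumed usage for the error above (requires scikit-learn to be installed):
# calinski_harabasz_score = import_first_available(
#     "sklearn.metrics",
#     ["calinski_harabasz_score", "calinski_harabaz_score"],
# )
```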
