
pypairs's Introduction

PyPairs - A python scRNA-Seq classifier

This is a Python reimplementation of the Pairs algorithm as described by A. Scialdone et al. (2015). The original paper is available at: https://doi.org/10.1016/j.ymeth.2015.06.021

A supervised machine learning algorithm aiming to classify single cells based on their transcriptomic signal. Initially created to predict cell cycle phase from scRNA-Seq data, the algorithm can be used for various applications.

Built to be fully compatible with Scanpy. For more details, see the full documentation.

Getting Started

Note: Version 3 is still under development.

Installation

This package is hosted on PyPI ( https://pypi.org/project/pypairs/ ) and can be installed on any system running Python 3 via pip with:

pip install pypairs

Alternatively, pypairs can be installed using Conda (most easily obtained via the Miniconda Python distribution):

conda install -c bioconda pypairs
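
After either install route, one way to confirm which version ended up in the active environment is to query the package metadata from Python. This is a minimal sketch using only the standard library (Python 3.8 or newer); the helper name `installed_version` is made up for illustration and is not part of pypairs.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string if pypairs is installed, otherwise None.
print(installed_version("pypairs"))
```

Quoting this version in bug reports makes them easier to reproduce.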

Minimal Example

Assuming you have two scRNA count files (csv, columns = samples, rows = genes) and one annotation file (csv, no header, two rows: "gene, class"), a minimal example would look like this:

from pypairs import pairs, datasets

# Load samples from the oscope scRNA-Seq dataset with known cell cycle
training_data = datasets.leng15(mode='sorted')

# Run sandbag() to identify marker pairs
marker_pairs = pairs.sandbag(training_data, fraction=0.6)

# Load samples from the oscope scRNA-Seq dataset without known cell cycle
testing_data = datasets.leng15(mode='unsorted')

# Run cyclone() to score and predict cell cycle classes
result = pairs.cyclone(testing_data, marker_pairs)

# Further downstream analysis
print(result)
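
To give a feel for what a "marker pair" is, here is a toy sketch of the pair-based idea: a gene pair (i, j) marks a class if gene i is expressed above gene j in at least a given fraction of that class's cells, and below it in at least that fraction of cells in every other class. This illustrates the concept only and is not the pypairs implementation. The function `toy_marker_pairs`, the tiny expression matrix, and the reading of `fraction` as this threshold are all assumptions made for the example.

```python
import numpy as np


def toy_marker_pairs(expr, labels, target, fraction=0.6):
    """Return gene index pairs (i, j) that flip sign between `target` and all other classes.

    expr   : cells x genes array of expression values
    labels : one class label per cell
    """
    expr = np.asarray(expr, dtype=float)
    labels = np.asarray(labels)
    others = [c for c in np.unique(labels) if c != target]
    in_target = expr[labels == target]
    pairs = []
    for i in range(expr.shape[1]):
        for j in range(expr.shape[1]):
            if i == j:
                continue
            # Gene i must exceed gene j in enough cells of the target class ...
            if np.mean(in_target[:, i] > in_target[:, j]) < fraction:
                continue
            # ... and fall below gene j in enough cells of every other class.
            if all(
                np.mean(expr[labels == c][:, i] < expr[labels == c][:, j]) >= fraction
                for c in others
            ):
                pairs.append((i, j))
    return pairs


# Made-up data: three "G1" cells followed by three "S" cells, three genes.
expr = [[5, 1, 3], [6, 0, 3], [4, 2, 3],
        [1, 5, 3], [0, 6, 3], [2, 4, 3]]
labels = ["G1", "G1", "G1", "S", "S", "S"]
print(toy_marker_pairs(expr, labels, target="G1"))
```

The brute-force double loop is only practical for a handful of genes; it is written for readability, not speed.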

Core Dependencies

Authors

  • Antonio Scialdone - original algorithm
  • Ron Fechtner - implementation and extension in Python

License

This project is licensed under the BSD 3-Clause License. See the LICENSE file for details.

pypairs's People

Contributors

  • bebatut
  • fbnrst
  • flying-sheep
  • rfechtner

pypairs's Issues

Reproducibility issue

If I calculate cell cycle scores twice, I do not get the same results. Minimal example:

from pypairs import pairs, datasets

# Load samples from the oscope scRNA-Seq dataset with known cell cycle
training_data = datasets.leng15(mode='sorted')

# Run sandbag() to identify marker pairs
marker_pairs = pairs.sandbag(training_data, fraction=0.6)

# Load samples from the oscope scRNA-Seq dataset without known cell cycle
testing_data = datasets.leng15(mode='unsorted')

# Run cyclone() to score and predict cell cycle classes (twice, with identical inputs)
result = pairs.cyclone(testing_data, marker_pairs)
result2 = pairs.cyclone(testing_data, marker_pairs)

# Further downstream analysis
print(result.head())
print(result2.head())

Output

               G2M      S   G1 max_class cc_prediction
H1_Exp1.001  0.028  0.134  0.0         S             S
H1_Exp1.002  0.016  0.969  0.0         S             S
H1_Exp1.003  1.000  0.000  0.0       G2M           G2M
H1_Exp1.004  0.000  1.000  0.0         S             S
H1_Exp1.006  0.000  0.996  0.0         S             S
               G2M      S     G1 max_class cc_prediction
H1_Exp1.001  0.035  0.147  0.000         S             S
H1_Exp1.002  0.004  0.978  0.000         S             S
H1_Exp1.003  1.000  0.000  0.000       G2M           G2M
H1_Exp1.004  0.001  1.000  0.000         S             S
H1_Exp1.006  0.000  0.995  0.001         S             S

Notice that the scores in the two tables are not the same.
I'm working with pypairs version v3.2 installed via pip.
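
To quantify the discrepancy rather than eyeball it, the two printed tables can be compared numerically. The sketch below copies the five rows shown above into NumPy arrays by hand (it does not call pypairs) and reports the largest score difference and whether the highest-scoring phase agrees row by row.

```python
import numpy as np

# Scores copied from the two printed tables above (columns: G2M, S, G1).
run1 = np.array([
    [0.028, 0.134, 0.0],
    [0.016, 0.969, 0.0],
    [1.000, 0.000, 0.0],
    [0.000, 1.000, 0.0],
    [0.000, 0.996, 0.0],
])
run2 = np.array([
    [0.035, 0.147, 0.000],
    [0.004, 0.978, 0.000],
    [1.000, 0.000, 0.000],
    [0.001, 1.000, 0.000],
    [0.000, 0.995, 0.001],
])
phases = np.array(["G2M", "S", "G1"])

# Largest absolute difference between corresponding scores.
max_abs_diff = np.abs(run1 - run2).max()

# Does the highest-scoring phase agree in every row?
same_calls = np.array_equal(phases[run1.argmax(axis=1)], phases[run2.argmax(axis=1)])

print(f"largest score difference: {max_abs_diff:.3f}")
print(f"same highest-scoring phase in every row: {same_calls}")
```

For these five rows the largest difference is 0.013 and the highest-scoring phase is identical in both runs, so the predictions shown agree even though the scores do not.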

pypairs breaks if adata contains sparse X

When X in an adata object is sparse, pypairs throws a ValueError. I think it would be great if pypairs also worked with sparse data, or at least showed a clearer error message.

from pypairs import pairs, datasets
from scipy.sparse import csr_matrix

# Load samples from the oscope scRNA-Seq dataset with known cell cycle
training_data = datasets.leng15(mode='sorted')

# Run sandbag() to identify marker pairs
marker_pairs = pairs.sandbag(training_data, fraction=0.6)

# Load samples from the oscope scRNA-Seq dataset without known cell cycle
testing_data = datasets.leng15(mode='unsorted')

testing_data.X = csr_matrix(testing_data.X)
# Run cyclone() to score and predict cell cycle classes
result = pairs.cyclone(testing_data, marker_pairs)

# Further downstream analysis
print(result)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-38-2081ad7235a9> in <module>
      1 # Run cyclone() score and predict cell cycle classes
----> 2 result = pairs.cyclone(testing_data, marker_pairs)
      3 
      4 # Further downstream analysis
      5 print(result)

/opt/conda/lib/python3.7/site-packages/pypairs/tools/cyclone.py in cyclone(data, marker_pairs, gene_names, sample_names, iterations, min_iter, min_pairs, quantile_transform)
    153             pbar.set_postfix(phase=cat, no_marker=len(pairs), no_samples=raw_data.shape[0])
    154 
--> 155         scores[cat] = get_phase_scores(raw_data, iterations, min_iter, min_pairs, pairs, used[cat])
    156 
    157         if settings.verbosity >= 3:

/opt/conda/lib/python3.7/site-packages/pypairs/tools/cyclone.py in get_phase_scores(matrix, iterations, min_iter, min_pairs, pairs, used)
    253 def get_phase_scores(matrix, iterations, min_iter, min_pairs, pairs, used):
    254     phase_scores = np.full(matrix.shape[0], np.nan)
--> 255     get_sample_score_guv(matrix, used, iterations, min_iter, min_pairs, pairs, phase_scores)
    256 
    257     return phase_scores

ValueError: get_sample_score_guv: Input operand 0 does not have enough dimensions (has 0, gufunc core with signature (s),(b),(),(),(),(p,x)->() requires 1)
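
Until sparse input is handled, a possible workaround is to densify the matrix before it is passed on. The sketch below shows only the conversion step with SciPy and NumPy. The helper name `ensure_dense` is hypothetical, and the assumption that `cyclone()` then behaves as it does with the original dense data has not been verified here.

```python
import numpy as np
from scipy import sparse


def ensure_dense(matrix):
    """Return a dense numpy array, converting from a SciPy sparse matrix if needed."""
    if sparse.issparse(matrix):
        return matrix.toarray()
    return np.asarray(matrix)


# Small stand-in for a sparse count matrix (2 samples x 3 genes).
counts = sparse.csr_matrix(np.array([[0, 2, 0], [1, 0, 3]]))
dense = ensure_dense(counts)
print(type(dense).__name__, dense.shape)
```

In the example above, this would amount to `testing_data.X = ensure_dense(testing_data.X)` before the `cyclone()` call. Note that densifying a large count matrix can use a lot of memory.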

pairs.sandbag crashes

I tried to run

training_data = datasets.leng15(mode='sorted')
marker_pairs = pairs.sandbag(training_data, fraction=0.6)

And got the following output:

*** Error in `/bin/python': double free or corruption (!prev): 0x0000562d2d171400 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x7c619)[0x7fe997268619]
/lib64/libc.so.6(+0x7e918)[0x7fe99726a918]
/lib64/libc.so.6(realloc+0x1b2)[0x7fe99726c752]
apps/conda3/lib/python3.6/site-packages/numba/runtime/_nrt_python.cpython-36m-x86_64-linux-gnu.so(+0x3b73)[0x7fe9077e5b73]
[0x7fe96d3896de]
[0x7fe968b8c301]
python3.6/site-packages/numba/npyufunc/tbbpool.cpython-36m-x86_64-linux-gnu.so(+0x53a6)[0x7fe9669903a6]
python3.6/site-packages/numba/npyufunc/tbbpool.cpython-36m-x86_64-linux-gnu.so(+0x5a80)[0x7fe966990a80]
python3.6/site-packages/numba/npyufunc/../../../../libtbb.so.2(+0x2669a)[0x7fe965c7069a]
python3.6/site-packages/numba/npyufunc/../../../../libtbb.so.2(+0x1ff50)[0x7fe965c69f50]
python3.6/site-packages/numba/npyufunc/../../../../libtbb.so.2(+0x1e9f3)[0x7fe965c689f3]
python3.6/site-packages/numba/npyufunc/../../../../libtbb.so.2(+0x1aa67)[0x7fe965c64a67]
python3.6/site-packages/numba/npyufunc/../../../../libtbb.so.2(+0x1ac89)[0x7fe965c64c89]
/lib64/libpthread.so.0(+0x7e25)[0x7fe9975b6e25]
/lib64/libc.so.6(clone+0x6d)[0x7fe9972e434d]

I am searching around to try to find the problem, but I am posting here in case someone has an idea.
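
Since the backtrace passes through Numba's TBB thread pool (`tbbpool` and `libtbb.so.2`), one diagnostic that may be worth trying is asking Numba for a different threading layer before pypairs is imported. `NUMBA_THREADING_LAYER` is a documented Numba environment variable. Whether switching it avoids this particular crash is an untested assumption.

```python
import os

# Set this before numba (and therefore pypairs) is imported,
# so that Numba picks it up when it reads its configuration.
os.environ["NUMBA_THREADING_LAYER"] = "workqueue"

# from pypairs import pairs, datasets  # import only after the variable is set
print(os.environ["NUMBA_THREADING_LAYER"])
```

If the crash disappears with `workqueue`, that would point to the TBB installation rather than to pypairs itself.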

cache file required

I had an issue with pypairs when running:

from pypairs import pairs, datasets
training_data = datasets.leng15(mode='sorted')

The program hung after a while, giving a warning that it could not write into a cache folder.

The issue was solved by creating a cache folder in my working directory.

Maybe you could create this folder automatically.
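
Until then, the folder can be created from the script itself before the dataset is loaded. This is a minimal sketch using the standard library. The folder name `cache` and its location in the working directory follow the description above and are assumptions about what pypairs expects, and `ensure_cache_dir` is a hypothetical helper.

```python
from pathlib import Path


def ensure_cache_dir(base_dir=".", name="cache"):
    """Create the cache folder if it is missing and return its path."""
    cache_dir = Path(base_dir) / name
    # exist_ok=True makes repeated calls harmless.
    cache_dir.mkdir(parents=True, exist_ok=True)
    return cache_dir


# Create ./cache before calling datasets.leng15(...).
print(ensure_cache_dir().resolve())
```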
