
sccaf's Introduction


SCCAF: Single Cell Clustering Assessment Framework

Single Cell Clustering Assessment Framework (SCCAF) is a novel method for automated identification of putative cell types from single cell RNA-seq (scRNA-seq) data. By iteratively applying clustering and a machine learning approach to gene expression profiles of a given set of cells, SCCAF simultaneously identifies distinct cell groups and a weighted list of feature genes for each group. The feature genes, which are overexpressed in the particular cell group, jointly discriminate the given cell group from other cells. Each such group of cells corresponds to a putative cell type or state, characterised by the feature genes as markers.

Requirements

The requirements for this package vary depending on how you want to install it. The options below are independent, so you only need the requirements for the option you choose:

  • pip: if you install through pip, you will need Python 3 and pip3.
  • Bioconda: if you install through Bioconda, you will need conda installed and configured to use the bioconda channels.
  • Docker container: to use SCCAF from its Docker container, you will need Docker installed.
  • Source code: to install directly from the source code, you will need git, Python 3 and pip.

The tool depends on other Python/conda packages, but these are automatically resolved by the different installation methods.

The tool has been tested with the following versions:

  • conda: versions 4.7.5 and 4.7.10, but it should work with most other versions.
  • Docker: version 18.09.2, but should work with most other versions.
  • Python: versions 3.6.5 and 3.7. We don't expect this to work with Python 2.x.
  • Pip3: version 9.0.3, but any version of pip3 should work.

This software doesn't require any non-standard hardware.

Installation

pip

You can install SCCAF with pip:

pip install sccaf

Installation time on a laptop with 16 GB of RAM and an academic (LAN) internet connection: <10 minutes.

Bioconda

You can install SCCAF with bioconda (if you haven't already, first set up conda and the bioconda channel as explained in the Bioconda documentation):

conda install sccaf

Installation time on a laptop with 16 GB of RAM and an academic (LAN) internet connection: <5 minutes.

Available as a container

You can use SCCAF already set up in a Docker container. Choose one of the available tags and substitute it for <tag> in the call below.

docker pull quay.io/biocontainers/sccaf:<tag>

Note: BioContainers images do not have a latest tag, so a docker pull/run without an explicit tag will fail. For instance, a valid call (for version 0.0.10) would be:

docker run -it quay.io/biocontainers/sccaf:0.0.10--py_0

Inside the container, you can either use the Python interactive shell or the command line version (see below).

Installation (pull) time on a laptop with 16 GB of RAM and an academic (LAN) internet connection: ~10 minutes.

Use latest source code

Alternatively, for the latest version, clone this repo and go into its directory, then execute pip3 install .:

git clone https://github.com/SCCAF/sccaf
cd sccaf
# you might want to create a virtualenv for SCCAF before installing
pip3 install .

If your Python environment is configured for Python 3, you should be able to use python instead of python3 (although pip3 needs to be kept). In time, this will be simplified to a single pip call.

Installation time on a laptop with 16 GB of RAM and an academic (LAN) internet connection: ~10 minutes.

Usage within Python environment

Use with pre-clustered anndata object in the SCANPY package

The main method of SCCAF can be applied directly in Python to an AnnData object (AnnData is the main data format used by Scanpy).

Before applying SCCAF, please make sure that doublets have been excluded and that the batch effect has been effectively regressed out.

Assessment of the quality of a clustering

Given a clustering stored in an anndata object adata under the key louvain, we would like to understand the quality (discrimination between clusters) with SCCAF:

from SCCAF import SCCAF_assessment, plot_roc
import scanpy as sc

adata = sc.read("path-to-clusterised-and-umapped-anndata-file")
y_prob, y_pred, y_test, clf, cvsm, acc = SCCAF_assessment(adata.X, adata.obs['louvain'], n=100)

The returned accuracy is stored in the acc variable.

The ROC curve can be plotted:

import matplotlib.pyplot as plt

plot_roc(y_prob, y_test, clf, cvsm=cvsm, acc=acc)
plt.show()

Higher accuracy indicates better discrimination, and the ROC curve highlights the problematic clusters.

Optimize an over-clustering

Given an over-clustered result, SCCAF optimises the clustering by merging the cell clusters that cannot be discriminated by machine learning.

Selecting the starting clustering

The selection of the starting clustering (or pre-clustering, which is an over-clustering) aims to find a clustering that contains only over-clustering and no under-clustering. To achieve this, we suggest combining a well-established clustering method (e.g., louvain clustering in SCANPY, K-means or SC3) with data visualization (tSNE). We can assume that all the discriminative cell clusters should be detectable in the tSNE plot. Then we can find a clustering (e.g., louvain with a chosen resolution, 1.5 in the example case) that separates all the "cell islands" in the tSNE plot. For higher speed, we also suggest using as few cell clusters as possible. For example, if neither resolution 1.5 nor resolution 2.0 includes under-clustering, we suggest using the resolution 1.5 result as the starting clustering.

import scanpy as sc
from SCCAF import SCCAF_optimize_all

# The batch effect MUST be regressed out before applying SCCAF
adata = sc.read("path-to-clusterised-and-umapped-anndata-file")

# An initial over-clustering must be stored under a key consistent with the optimisation prefix:
# e.g., if the prefix is `L2`, the starting clustering is read from `'%s_Round0' % prefix`, i.e. `L2_Round0`.
sc.tl.louvain(adata, resolution=1.5, key_added='L2_Round0')

# e.g., aim for an accuracy >90% on the whole dataset, optimising in the PCA space:
SCCAF_optimize_all(ad=adata, plot=False, min_acc=0.9, prefix='L2', use='pca')

In the above run, all changes are stored in the adata AnnData object and no plots are generated. If you want to see the plots (which block progress until you close them), remove plot=False.

Within the anndata object, assignments of cells to clusters will be left in adata.obs['<prefix>_Round<roundNumber>'].

Notebook demo

You can find some demonstrative Jupyter Notebooks here:

Usage from the command line

We have added convenience commands to run SCCAF from the shell. This also facilitates its inclusion in workflow systems.

Optimisation and general purpose usage

Given an AnnData dataset with louvain clustering pre-calculated (and batch-corrected if needed):

sccaf -i <ann-data-input-file> --optimise --skip-assessment -s louvain -a 0.89 -c 8 --produce-rounds-summary

This will write the result to a new file named output.h5 (the name can be set via -o). With the settings above, it will also produce a file named rounds.txt listing the names of all optimisation rounds left in the output. This file is used later to parallelise (across different machines) an assessment process that determines which round to choose as the final clustering.

To understand all options, simply execute sccaf --help.

Parallel run of assessments

Once the optimisation has taken place, one strategy for choosing the round to use as the final result is to observe the distribution of accuracies for each round over multiple iterations of the assessment process. How the process is distributed depends on the local HPC or cloud system. Essentially, the process that can be repeated for each round is:

round=<name-of-the-round-in-the-output>
sccaf-assess -i output.h5 -o results/sccaf_assess_$round.txt --slot-for-existing-clustering $round --iterations 20 --cores 8

Running the above for a number of different rounds will leave files in the results folder.
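As a rough sketch of how those per-round calls could be generated for a scheduler, the Python snippet below reads rounds.txt and builds one sccaf-assess command per round, using the same flags as the call above. It assumes rounds.txt contains one round name per line; assess_commands is a hypothetical helper, and submitting the commands (for example to an HPC queue) is left to your system.

```python
import shlex
from pathlib import Path
from typing import List


def assess_commands(rounds_file: str, h5: str = "output.h5", outdir: str = "results",
                    iterations: int = 20, cores: int = 8) -> List[str]:
    """Hypothetical helper: one sccaf-assess command line per round listed in rounds_file.

    Assumes one round name per line; blank lines are ignored.
    """
    rounds = [line.strip() for line in Path(rounds_file).read_text().splitlines() if line.strip()]
    commands = []
    for name in rounds:
        args = [
            "sccaf-assess", "-i", h5,
            "-o", f"{outdir}/sccaf_assess_{name}.txt",
            "--slot-for-existing-clustering", name,
            "--iterations", str(iterations),
            "--cores", str(cores),
        ]
        # Quote each argument so unusual round names cannot break the shell command
        commands.append(" ".join(shlex.quote(a) for a in args))
    return commands
```

Each returned string can be written to a job script or passed to whatever job-submission tool your cluster uses.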

Merging parallel runs to produce plot

Once all assessment runs are done, the merging and plotting step can be run:

sccaf-assess-merger -i results -r rounds.txt -o rounds-acc-comparison-plot.png

This will produce a plot comparing the accuracy distributions of the rounds.

sccaf's People

Contributors

chichaumiau, pcm32


sccaf's Issues

error: write h5 file fail

ad.write(filename='ab.h5')
I used SCCAF but ran into this problem. Can you help me? Thank you.

OSError Traceback (most recent call last)
d:\program files\python37\lib\site-packages\anndata\_io\utils.py in func_wrapper(elem, key, val, *args, **kwargs)
187 try:
--> 188 return func(elem, key, val, *args, **kwargs)
189 except Exception as e:

d:\program files\python37\lib\site-packages\anndata\_io\h5ad.py in write_series(group, key, series, dataset_kwargs)
265 dtype=h5py.special_dtype(vlen=str),
--> 266 **dataset_kwargs,
267 )

d:\program files\python37\lib\site-packages\h5py\_hl\group.py in create_dataset(self, name, shape, dtype, data, **kwds)
138 if name is not None:
--> 139 self[name] = dset
140 return dset

d:\program files\python37\lib\site-packages\h5py\_hl\group.py in __setitem__(self, name, obj)
372 if isinstance(obj, HLObject):
--> 373 h5o.link(obj.id, self.id, name, lcpl=lcpl, lapl=self._lapl)
374

h5py\_objects.pyx in h5py._objects.with_phil.wrapper()

h5py\_objects.pyx in h5py._objects.with_phil.wrapper()

h5py\h5o.pyx in h5py.h5o.link()

OSError: Unable to create link (name already exists)

The above exception was the direct cause of the following exception:

Outdated scanpy dependency

Thank you for your cool package. I've been experimenting with it on my own data lately.

The issue that arose after installing it (from PyPI) is that the 'scanpy==1.4.6' dependency is quite outdated. Downgrading scanpy reintroduced some issues that the scanpy team has already fixed, such as scverse/scanpy#1486.

Could you bump the dependencies in order to avoid this issue?

NameError: name 'default_26' is not defined

After I installed the SCCAF, I got an error as follows:
from SCCAF import *


NameError Traceback (most recent call last)
in
----> 1 from SCCAF import *

/opt/anaconda3/lib/python3.8/site-packages/SCCAF/__init__.py in
73 '#000080',
74 '#808080',
---> 75 '#000000', ] + default_26
76
77

NameError: name 'default_26' is not defined


Does anyone know the cause and how to fix it?

how to use original Dimension Reduction information from Seurat.

Hello, I'm glad to be using your software. I know the SCCAF input is a 'pre-clustered anndata object in the SCANPY package'. I have converted an rds object (from Seurat) to an AnnData object and hope to optimise the clustering with SCCAF. However, the pre-processed dimensionality reduction information from Seurat has been added to the 'obsm' slot, so I can't use the original Seurat dimensionality reduction.
I would be grateful if you could give me your advice at your convenience.
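A minimal sketch of one way to reuse the Seurat embedding, assuming (as in the low_res issue further down, where the user copies pca_cell_embeddings into X_pca) that SCCAF reads adata.obsm['X_pca'] when called with use='pca'. The helper name use_external_embedding is hypothetical, and 'pca_cell_embeddings' is only the key produced by one conversion route; check adata.obsm.keys() for the name in your own object.

```python
import numpy as np


def use_external_embedding(obsm, source_key, target_key="X_pca"):
    """Hypothetical helper: copy a precomputed embedding into the slot read for use='pca'.

    `obsm` can be adata.obsm or any dict-like mapping of key -> (cells x components) array.
    """
    if source_key not in obsm:
        raise KeyError(f"{source_key!r} not found; available keys: {list(obsm.keys())}")
    emb = np.asarray(obsm[source_key], dtype=float)
    if emb.ndim != 2:
        raise ValueError("expected a 2-D (cells x components) embedding")
    obsm[target_key] = emb  # overwrite X_pca with the Seurat-derived coordinates
    return emb.shape
```

Usage would then be use_external_embedding(adata.obsm, 'pca_cell_embeddings') before calling SCCAF_optimize_all(..., use='pca').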

Error when running with plot=True

Hello,

When I execute SCCAF_optimize_all() with plot=True, it eventually seems to hang after plotting the ROC plots. I have to close the figure window in order to regain control of the terminal, and then the run terminates with an error (see below). I am not sure what the problem could be.

Thanks for your help,
shui

SCCAF_optimize_all(ad=adata, plot=True, min_acc=0.9, prefix ='L1', use='pca',n_iter=150)
R1norm_cutoff: 0.500000
R2norm_cutoff: 0.050000
Accuracy: 0.000000
======================
Round1 ...
Mean CV accuracy: 0.8820
Accuracy on the training set: 0.9449
Accuracy on the hold-out set: 0.8499
Accuracy on the training set: 0.9427
Accuracy on the hold-out set: 0.8489
Accuracy on the training set: 0.9464
Accuracy on the hold-out set: 0.8447
Max R1mat: 0.691489
Max R2mat: 0.005358
min_acc: 0.844728
IGRAPH U-W- 50 2 --
+ attr: weight (e)
+ edges:
11--45 11--45
... storing 'L1_Round1' as categorical
Round2 ...
Mean CV accuracy: 0.8823
Accuracy on the training set: 0.9440
Accuracy on the hold-out set: 0.8455
... storing 'L1_Round1_self-projection' as categorical
Accuracy on the training set: 0.9432
Accuracy on the hold-out set: 0.8524
Accuracy on the training set: 0.9438
Accuracy on the hold-out set: 0.8514
Max R1mat: 0.394737
Max R2mat: 0.005553
min_acc: 0.845477
IGRAPH U-W- 49 0 --
+ attr: weight (e)
Converge SCCAF_optimize no. cluster!
m1: 0.394737
m2: 0.005553
Accuracy: 0.845477
start_iter: 1
****R1norm_cutoff: 0.384737
R2norm_cutoff: 0.004553
Accuracy: 0.845477
======================
Round2 ...
Mean CV accuracy: 0.8808
Accuracy on the training set: 0.9489
Accuracy on the hold-out set: 0.8495
... storing 'L1_Round1_self-projection' as categorical
Accuracy on the training set: 0.9415
Accuracy on the hold-out set: 0.8468
Accuracy on the training set: 0.9444
Accuracy on the hold-out set: 0.8499
Max R1mat: 1.125000
Max R2mat: 0.005300
min_acc: 0.846835
IGRAPH U-W- 49 8 --
+ attr: weight (e)
+ edges:
0--2 0--2 2--21 3--14 13--48 3--14 2--21 13--48
... storing 'L1_Round2' as categorical
Traceback (most recent call last):
  File "", line 1, in
  File "/usr/local/bin/miniconda2/envs/myPythonEnv/lib/python3.7/site-packages/SCCAF/__init__.py", line 654, in SCCAF_optimize_all
    *args, **kwargs)
  File "/usr/local/bin/miniconda2/envs/myPythonEnv/lib/python3.7/site-packages/SCCAF/__init__.py", line 876, in SCCAF_optimize
    show=(mplotlib_backend is None))
  File "/usr/local/bin/miniconda2/envs/myPythonEnv/lib/python3.7/site-packages/scanpy/plotting/_anndata.py", line 120, in scatter
    return _scatter_obs(**args)
  File "/usr/local/bin/miniconda2/envs/myPythonEnv/lib/python3.7/site-packages/scanpy/plotting/_anndata.py", line 388, in _scatter_obs
    alpha=alpha,
  File "/usr/local/bin/miniconda2/envs/myPythonEnv/lib/python3.7/site-packages/scanpy/plotting/_utils.py", line 542, in scatter_group
    color = adata.uns[key + '_colors'][imask]
IndexError: list index out of range

many SCCAF methods can only be used from inside iPython/Jupyter notebooks

This clearly reduces the usability of the tool. Executing something like from SCCAF import * on a pure python environment produces:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/pmoreno/venvs/sccaf.sccaf/lib/python3.6/site-packages/SCCAF/__init__.py", line 1549, in <module>
    plotly.offline.init_notebook_mode(connected=True)
  File "/Users/pmoreno/venvs/sccaf.sccaf/lib/python3.6/site-packages/plotly/offline/offline.py", line 272, in init_notebook_mode
    raise ImportError('`iplot` can only run inside an IPython Notebook.')
ImportError: `iplot` can only run inside an IPython Notebook.

There should be a check that detects whether the code is running inside an IPython notebook, so that iplot is used there and an alternative plotting device is used otherwise.
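A minimal sketch of such a check, based on the common heuristic that Jupyter kernels run a ZMQInteractiveShell while plain Python has no IPython shell at all. in_notebook and maybe_init_plotly_notebook_mode are hypothetical helpers, not part of SCCAF; the point is to call plotly.offline.init_notebook_mode only when it can succeed, instead of unconditionally at import time.

```python
def in_notebook() -> bool:
    """Return True when running inside a Jupyter/IPython notebook kernel."""
    try:
        from IPython import get_ipython  # IPython may not be installed at all
    except ImportError:
        return False
    shell = get_ipython()  # None when not running inside any IPython shell
    # Jupyter kernels use ZMQInteractiveShell; terminal IPython uses TerminalInteractiveShell
    return shell is not None and type(shell).__name__ == "ZMQInteractiveShell"


def maybe_init_plotly_notebook_mode() -> bool:
    """Hypothetical helper: enable plotly notebook mode only inside a notebook."""
    if not in_notebook():
        return False  # caller should fall back to a non-notebook plotting device
    import plotly.offline
    plotly.offline.init_notebook_mode(connected=True)
    return True
```

Because plotly is only imported inside the notebook branch, a plain Python session can import the module even without a notebook front end.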

Errors when using low_res parameter to constrain optimization

Hello,

Thanks for developing this useful tool!

We currently use Seurat to preprocess our single cell data and I wanted to try using your tool and took the following steps.

  1. wrote my Seurat object to a loom file
  2. read the loom file into Python:

adata = sc.read_loom(FILE)

  3. loaded the over-clustering assignments @ res=1.2 (computed previously in R/Seurat) into the anndata object:

adata.obs['L1_Round0'] = adata.obs['SCT_snn_res_1.2']

  4. updated X_pca to contain the Seurat-computed PCA embeddings:

adata.obsm['X_pca'] = adata.obsm['pca_cell_embeddings']

  5. executed the following (adding a lower constraint, res = 0.3):

SCCAF_optimize_all(ad=adata, plot=False, min_acc=0.9, prefix='L1', low_res='SCT_snn_res_0.3', use='pca', n_iter=150)

I get the errors below and I am not sure what is wrong. When I run the above without the low_res parameter, it completes 10-12 rounds successfully (however, it seems to prefer under-clustering, hence my wanting to run it with low_res set).

OUTPUT:

R1norm_cutoff: 0.500000
R2norm_cutoff: 0.050000
Accuracy: 0.000000

Round1 ...
Mean CV accuracy: 0.8799
Accuracy on the training set: 0.9441
Accuracy on the hold-out set: 0.8487
Accuracy on the training set: 0.9506
Accuracy on the hold-out set: 0.8477
Accuracy on the training set: 0.9435
Accuracy on the hold-out set: 0.8421
Max R1mat: 0.652632
Max R2mat: 0.006597
min_acc: 0.842117
Traceback (most recent call last):
  File "", line 1, in
  File "/usr/local/bin/miniconda2/envs/myPythonEnv/lib/python3.7/site-packages/SCCAF/__init__.py", line 654, in SCCAF_optimize_all
    *args, **kwargs)
  File "/usr/local/bin/miniconda2/envs/myPythonEnv/lib/python3.7/site-packages/SCCAF/__init__.py", line 861, in SCCAF_optimize
    zmat = np.minimum.reduce([(R1mat > R1norm_cutoff), conn_mat.values])
ValueError: operands could not be broadcast together with shapes (50,50) (93,93)

Thanks for your help,
shui

SCCAF cli should exit with an error when no UMAP or tSNE embeddings are available

SCCAF expects the AnnData object passed to it to have tSNE or UMAP embeddings; if they are not available, the SCCAF CLI should exit with an error code.

INFO:root:Read ann data file: DONE
INFO:root:First assesment: DONE
INFO:root:Run louvain for starting point: DONE
... storing 'L1_Round0_self-projection' as categorical
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/scanpy/plotting/_anndata.py", line 312, in _scatter_obs
    Y = adata.obsm['X_' + basis][:, components]
  File "/usr/local/lib/python3.7/site-packages/anndata/core/alignedmapping.py", line 159, in __getitem__
    return self._data[key]
KeyError: 'X_umap'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/sccaf", line 129, in <module>
    low_res=args.undercluster_boundary, mplotlib_backend=backend, c_iter=args.conf_sampling_iterations)
  File "/usr/local/lib/python3.7/site-packages/SCCAF/__init__.py", line 647, in SCCAF_optimize_all
    *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/SCCAF/__init__.py", line 804, in SCCAF_optimize
    color_map="RdYlBu_r", legend_loc='on data', frameon=False)
  File "/usr/local/lib/python3.7/site-packages/scanpy/plotting/_anndata.py", line 118, in scatter
    ax=ax)
  File "/usr/local/lib/python3.7/site-packages/scanpy/plotting/_anndata.py", line 317, in _scatter_obs
    .format(basis))
KeyError: 'compute coordinates using visualization tool umap first'
cat: can't open 'rounds.txt': No such file or directory
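A minimal sketch of such a pre-flight check, meant to run right after the AnnData file is read and before any optimisation starts, so that a wrapper script stops on the non-zero exit status instead of continuing to cat rounds.txt. has_embedding and require_embedding_or_exit are hypothetical helpers, not part of the SCCAF CLI; in practice they would be called with adata.obsm.keys().

```python
import sys
from typing import Iterable, Sequence


def has_embedding(obsm_keys: Iterable[str], bases: Sequence[str] = ("umap", "tsne")) -> bool:
    """True if at least one X_<basis> embedding (e.g. X_umap or X_tsne) is present."""
    keys = set(obsm_keys)
    return any(f"X_{basis}" in keys for basis in bases)


def require_embedding_or_exit(obsm_keys: Iterable[str],
                              bases: Sequence[str] = ("umap", "tsne")) -> None:
    """Hypothetical CLI guard: print an error and exit with status 1 if no embedding exists."""
    if not has_embedding(obsm_keys, bases):
        wanted = ", ".join(f"X_{b}" for b in bases)
        print(f"ERROR: AnnData object has none of the required embeddings: {wanted}",
              file=sys.stderr)
        sys.exit(1)
```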

Pandas cannot set attribute

Hi!

I was trying to use your package on a pre-clustered anndata object as described in your manual.

from SCCAF import *

sc.tl.leiden(adata_subset, resolution=.5, key_added='L2_Round0')
SCCAF_optimize_all(ad=adata_subset, plot=False, min_acc=0.9, prefix = 'L2', use='pca')

and I get the following output with an error:

R1norm_cutoff: 0.500000
R2norm_cutoff: 0.050000
Accuracy: 0.000000
======================
Round1 ...
Mean CV accuracy: 0.9070
Accuracy on the training set: 0.9648
Accuracy on the hold-out set: 0.8084
Accuracy on the training set: 0.9820
Accuracy on the hold-out set: 0.8063
Accuracy on the training set: 0.9695
Accuracy on the hold-out set: 0.8161
Max R1mat: 0.772727
Max R2mat: 0.021877
min_acc: 0.806326
IGRAPH U-W- 14 4 --
+ attr: weight (e)
+ edges:
0--8 0--9 0--8 0--9
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[102], line 4
      1 import SCCAF
      3 sc.tl.leiden(adata_subset, resolution=.5, key_added='L2_Round0')
----> 4 SCCAF_optimize_all(ad=adata_subset, plot=False, min_acc=0.9, prefix = 'L2', use='pca')

File /zaira/miniconda3/envs/xenium/lib/python3.9/site-packages/SCCAF/__init__.py:648, in SCCAF_optimize_all(ad, min_acc, R1norm_cutoff, R2norm_cutoff, R1norm_step, R2norm_step, prefix, min_i, start, start_iter, *args, **kwargs)
    646 print("Accuracy: %f" % acc)
    647 print("======================")
--> 648 ad, m1, m2, acc, start_iter = SCCAF_optimize(ad=ad,
    649                                              R1norm_cutoff=R1norm_cutoff,
    650                                              R2norm_cutoff=R2norm_cutoff,
    651                                              start_iter=start_iter,
    652                                              min_acc=min_acc, 
    653                                              prefix=prefix,
    654                                              *args, **kwargs)
    655 print("m1: %f" % m1)
    656 print("m2: %f" % m2)

File /zaira/miniconda3/envs/xenium/lib/python3.9/site-packages/SCCAF/__init__.py:872, in SCCAF_optimize(ad, prefix, use, use_projection, R1norm_only, R2norm_only, dist_only, dist_not, plot, basis, plot_dist, plot_cmat, mod, low_res, c_iter, n_iter, n_jobs, start_iter, sparsity, n, fraction, R1norm_cutoff, R2norm_cutoff, dist_cutoff, classifier, mplotlib_backend, min_acc)
    869     print("Converge SCCAF_optimize no. cluster!")
    870     break
--> 872 merge_cluster(ad, old_id1, new_id, groups)
    874 if plot:
    875     sc.pl.scatter(ad, basis=basis, color=[new_id], color_map="RdYlBu_r", legend_loc='on data',
    876                   show=(mplotlib_backend is None))

File /zaira/miniconda3/envs/xenium/lib/python3.9/site-packages/SCCAF/__init__.py:553, in merge_cluster(ad, old_id, new_id, groups)
    551 ad.obs[new_id] = ad.obs[old_id]
    552 ad.obs[new_id] = ad.obs[new_id].astype('category')
--> 553 ad.obs[new_id].cat.categories = make_unique(groups.astype(str))
    554 ad.obs[new_id] = ad.obs[new_id].str.split('_').str[0]
    555 return ad

File /zaira/miniconda3/envs/xenium/lib/python3.9/site-packages/pandas/core/base.py:178, in NoNewAttributesMixin.__setattr__(self, key, value)
    172 if getattr(self, "__frozen", False) and not (
    173     key == "_cache"
    174     or key in type(self).__dict__
    175     or getattr(self, key, None) is not None
    176 ):
    177     raise AttributeError(f"You cannot add any new attribute '{key}'")
--> 178 object.__setattr__(self, key, value)

File /zaira/miniconda3/envs/xenium/lib/python3.9/site-packages/pandas/core/accessor.py:99, in PandasDelegate._add_delegate_accessors.<locals>._create_delegator_property.<locals>._setter(self, new_values)
     98 def _setter(self, new_values):
---> 99     return self._delegate_property_set(name, new_values)

File /zaira/miniconda3/envs/xenium/lib/python3.9/site-packages/pandas/core/arrays/categorical.py:2460, in CategoricalAccessor._delegate_property_set(self, name, new_values)
   2459 def _delegate_property_set(self, name, new_values):
-> 2460     return setattr(self._parent, name, new_values)

AttributeError: can't set attribute

Thank you for your help!
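The failing line assigns directly to ad.obs[new_id].cat.categories. That in-place setter was deprecated in later pandas 1.x releases and removed in pandas 2.0, which appears to be the cause here. Below is a sketch of the same merge step written with Series.cat.rename_categories instead. merge_cluster_compat and make_unique_labels are hypothetical stand-ins for SCCAF's merge_cluster and make_unique (the real make_unique may differ in detail), and, like the original code, the sketch assumes the group labels contain no underscores.

```python
import numpy as np
import pandas as pd


def make_unique_labels(labels):
    """Hypothetical stand-in for make_unique: repeats become label_1, label_2, ..."""
    seen = {}
    out = []
    for lab in labels:
        if lab in seen:
            seen[lab] += 1
            out.append(f"{lab}_{seen[lab]}")
        else:
            seen[lab] = 0
            out.append(lab)
    return out


def merge_cluster_compat(obs: pd.DataFrame, old_id: str, new_id: str, groups) -> pd.DataFrame:
    """Map each category of obs[old_id] to its group label without mutating .cat.categories."""
    col = obs[old_id].astype("category")
    new_names = make_unique_labels(np.asarray(groups).astype(str))
    # rename_categories returns a new Series instead of setting the accessor attribute
    col = col.cat.rename_categories(new_names)
    # Strip the uniqueness suffix so clusters assigned to the same group collapse together
    obs[new_id] = col.astype(str).str.split("_").str[0].astype("category")
    return obs
```

Here groups holds one group label per existing category, in category order, so categories that share a label end up merged in the new column.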

galaxy typos

Hi,

just installed the Galaxy repo for this - thanks.

I noticed a few typos and thought I'd mention them here since I don't know where to find the galaxy wrapper.

Words in bold were corrected.

NGS: single cell
SCCAF Assessment Merger brings together distributed assessments.
Run SCCAF to assess and optimise clustering
SCCAF Assessment runs an assessment of an SCCAF optimisation result or an existing clustering.
SCCAF multiple regress out with multiple categorical keys on an AnnData object.

Require psutil as a dependency to gracefully kill processes when OOM

Pods killed by OOM fail before being gracefully killed with:

/usr/local/lib/python3.6/site-packages/joblib/externals/loky/backend/utils.py:55: UserWarning: Failed to kill subprocesses on this platform. Pleaseinstall psutil: https://github.com/giampaolo/psutil
  warnings.warn("Failed to kill subprocesses on this platform. Please"
/usr/local/lib/python3.6/site-packages/joblib/externals/loky/backend/utils.py:55: UserWarning: Failed to kill subprocesses on this platform. Pleaseinstall psutil: https://github.com/giampaolo/psutil
  warnings.warn("Failed to kill subprocesses on this platform. Please"
/usr/local/lib/python3.6/site-packages/joblib/externals/loky/backend/utils.py:55: UserWarning: Failed to kill subprocesses on this platform. Pleaseinstall psutil: https://github.com/giampaolo/psutil
  warnings.warn("Failed to kill subprocesses on this platform. Please"
Traceback (most recent call last):
  File "/usr/local/bin/sccaf-assess", line 71, in <module>
    y_prob, y_pred, y_test, clf, cvsm, acc = sf.SCCAF_assessment(X, y, n_jobs=args.cores)
  File "/usr/local/lib/python3.6/site-packages/SCCAF/__init__.py", line 265, in SCCAF_assessment
    return self_projection(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/SCCAF/__init__.py", line 352, in self_projection
    cvs = cross_val_score(clf, X_train, np.array(y_train), cv=cv, scoring='accuracy', n_jobs=n_jobs)
  File "/usr/local/lib/python3.6/site-packages/sklearn/model_selection/_validation.py", line 391, in cross_val_score
    error_score=error_score)
  File "/usr/local/lib/python3.6/site-packages/sklearn/model_selection/_validation.py", line 232, in cross_validate
    for train, test in cv.split(X, y, groups))
  File "/usr/local/lib/python3.6/site-packages/joblib/parallel.py", line 1016, in __call__
    self.retrieve()
  File "/usr/local/lib/python3.6/site-packages/joblib/parallel.py", line 908, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "/usr/local/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 554, in wrap_future_result
    return future.result(timeout=timeout)
  File "/usr/local/lib/python3.6/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/usr/local/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
joblib.externals.loky.process_executor.TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker. The exit codes of the workers are {SIGKILL(-9)}

How to collect and plot the accuracies of all SCCAF rounds

I was wondering how can I collect and plot the accuracies results of all rounds after I successfully run the SCCAF_optimize_all function?

I like the plot produced by the CLI sccaf-assess-merger, but I failed to run the CLI sccaf, so I cannot run sccaf-assess-merger.
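One way to do this from Python, as a rough sketch: loop over the '<prefix>_Round<N>' columns that SCCAF_optimize_all leaves in adata.obs (skipping the '_self-projection' columns seen in the logs above), re-run the assessment a few times per round, and draw a box plot. round_keys, collect_accuracies and plot_accuracies are hypothetical helpers; the assess argument is expected to behave like SCCAF_assessment, which returns the accuracy as its last value, and the box plot only approximates what sccaf-assess-merger draws.

```python
import re
from typing import Callable, Dict, List

import pandas as pd


def round_keys(obs: pd.DataFrame, prefix: str) -> List[str]:
    """Return '<prefix>_Round<N>' columns sorted by N, ignoring '..._self-projection' columns."""
    pattern = re.compile(rf"^{re.escape(prefix)}_Round(\d+)$")
    found = []
    for col in obs.columns:
        m = pattern.match(str(col))
        if m:
            found.append((int(m.group(1)), col))
    return [col for _, col in sorted(found)]


def collect_accuracies(X, obs: pd.DataFrame, prefix: str, assess: Callable,
                       n_repeats: int = 5) -> Dict[str, List[float]]:
    """Run `assess(X, labels)` n_repeats times per round; accuracy is its last return value."""
    return {key: [assess(X, obs[key])[-1] for _ in range(n_repeats)]
            for key in round_keys(obs, prefix)}


def plot_accuracies(results: Dict[str, List[float]], path: str = "rounds-acc.png") -> None:
    """Box plot of accuracy per round, saved to `path`."""
    import matplotlib.pyplot as plt
    fig, ax = plt.subplots()
    ax.boxplot(list(results.values()))
    ax.set_xticklabels(list(results.keys()), rotation=45, ha="right")
    ax.set_ylabel("accuracy")
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)  # release the figure so repeated calls do not pile up open figures
```

For example: res = collect_accuracies(adata.X, adata.obs, 'L1', SCCAF_assessment), then plot_accuracies(res).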

Citation

Hi everyone!

First of all congratulations on this wonderful tool!
I would like to know how could I cite the tool in a paper that I'm finishing.

Kind regards,
Carlos

plot_roc and gray heatmap are producing large number of matplotlib figures open warning

plot_roc: line 1017
heatmap: line 992

/usr/local/lib/python3.6/site-packages/SCCAF/__init__.py:1017: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (`matplotlib.pyplot.figure`) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam `figure.max_open_warning`).
  plt.figure()
... storing 'L1_Round2_self-projection' as categorical
/usr/local/lib/python3.6/site-packages/scanpy/plotting/utils.py:411: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (`matplotlib.pyplot.figure`) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam `figure.max_open_warning`).
  subplotpars=sppars(left=0, right=1, bottom=bottom_offset))
/usr/local/lib/python3.6/site-packages/SCCAF/__init__.py:992: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (`matplotlib.pyplot.figure`) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam `figure.max_open_warning`).
  fig = plt.figure()
/usr/local/lib/python3.6/site-packages/SCCAF/__init__.py:1017: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (`matplotlib.pyplot.figure`) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam `figure.max_open_warning`).
  plt.figure()
... storing 'L1_Round2_self-projection' as categorical
/usr/local/lib/python3.6/site-packages/scanpy/plotting/utils.py:411: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (`matplotlib.pyplot.figure`) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam `figure.max_open_warning`).
  subplotpars=sppars(left=0, right=1, bottom=bottom_offset))
/usr/local/lib/python3.6/site-packages/SCCAF/__init__.py:992: RuntimeWarning: More than 20 figures have been opened. Figures created through the pyplot interface (`matplotlib.pyplot.figure`) are retained until explicitly closed and may consume too much memory. (To control this warning, see the rcParam `figure.max_open_warning`).
  fig = plt.figure()
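A general matplotlib pattern that avoids this warning, sketched below, is to close each figure once it has been saved or shown, which is presumably what the plot_roc and heatmap code would need. save_and_close and render_many are hypothetical helpers for illustration only; raising the rcParam figure.max_open_warning would merely silence the warning without freeing the memory.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend, as in batch or cluster runs
import matplotlib.pyplot as plt


def save_and_close(fig, target) -> None:
    """Save a figure, then close it so pyplot no longer keeps a reference to it."""
    fig.savefig(target, format="png")
    plt.close(fig)


def render_many(n: int):
    """Create n figures in a loop without accumulating open figures."""
    images = []
    for i in range(n):
        fig, ax = plt.subplots()
        ax.plot([0, 1], [0, i])
        buf = io.BytesIO()
        save_and_close(fig, buf)
        images.append(buf.getvalue())
    return images
```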

SCCAF should exit with error code when missing anndata construct

Calling SCCAF on an anndata object that doesn't contain UMAP coordinates currently only raises an exception and does not exit with an error code. This is confusing for automated setups:

INFO:root:Read ann data file: DONE
INFO:root:Run louvain for starting point: DONE
... storing 'L1_Round0_self-projection' as categorical
Traceback (most recent call last):
  File "/Users/pmoreno/miniconda3/envs/[email protected]/lib/python3.6/site-packages/scanpy/plotting/anndata.py", line 304, in _scatter_obs
    Y = adata.obsm['X_' + basis][:, components]
  File "/Users/pmoreno/miniconda3/envs/[email protected]/lib/python3.6/site-packages/anndata/core/alignedmapping.py", line 159, in __getitem__
    return self._data[key]
KeyError: 'X_umap'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/pmoreno/miniconda3/envs/[email protected]/bin/sccaf", line 109, in <module>
    low_res=args.undercluster_boundary, mplotlib_backend=backend)
  File "/Users/pmoreno/miniconda3/envs/[email protected]/lib/python3.6/site-packages/SCCAF/__init__.py", line 647, in SCCAF_optimize_all
    *args, **kwargs)
  File "/Users/pmoreno/miniconda3/envs/[email protected]/lib/python3.6/site-packages/SCCAF/__init__.py", line 804, in SCCAF_optimize
    color_map="RdYlBu_r", legend_loc='on data', frameon=False)
  File "/Users/pmoreno/miniconda3/envs/[email protected]/lib/python3.6/site-packages/scanpy/plotting/anndata.py", line 114, in scatter
    ax=ax)
  File "/Users/pmoreno/miniconda3/envs/[email protected]/lib/python3.6/site-packages/scanpy/plotting/anndata.py", line 309, in _scatter_obs
    .format(basis))
KeyError: 'compute coordinates using visualization tool umap first'
cat: rounds.txt: No such file or directory

IndexError when performing SCCAF cluster analysis

Hi, I'm new to the field, so I apologise for any errors I may have made. My problem is that an IndexError occurs when I perform the cluster analysis step with SCCAF:


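import scanpy as sc
from SCCAF import SCCAF_optimize_all

ad.raw = ad  # keep a copy of the full matrix in .raw
sc.tl.leiden(ad, resolution=2)  # Leiden clustering at resolution 2
sc.pl.umap(ad, color=['leiden', 'nnet2'])
ad.obs['L1_Round0'] = ad.obs['leiden']  # starting clustering for SCCAF
SCCAF_optimize_all(min_acc=0.9, ad=ad, use='pca', basis='umap')
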
541 mask = adata.obs[key].cat.categories[imask] == adata.obs[key].values
--> 542 color = adata.uns[key + '_colors'][imask]
543 if not isinstance(color[0], str):
544 from matplotlib.colors import rgb2hex

IndexError: list index out of range

The following list shows what is stored in ad.uns['L1_Round0_self-projection_colors']:

['#1f77b4',
'#ff7f0e',
'#279e68',
'#d62728',
'#aa40fc',
'#8c564b',
'#e377c2',
'#b5bd61',
'#17becf',
'#aec7e8',
'#ffbb78',
'#98df8a',
'#ff9896',
'#c5b0d5',
'#c49c94',
'#f7b6d2',
'#dbdb8d',
'#9edae5',
'#ad494a']

I'm not sure if I've made a mistake somewhere in the analysis, but I would be grateful for any hints.
Thank you!
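The failing line indexes adata.uns[key + '_colors'] with a category index. An IndexError there is consistent with the stored palette, which has 19 colours here, being shorter than the number of categories in the column. A possible workaround is to drop the short palette so that the plotting code generates a fresh one. This is a minimal sketch: drop_short_palette is a hypothetical helper, and it assumes that scanpy assigns default colours when the _colors entry is absent.

```python
from typing import MutableMapping, Sequence


def drop_short_palette(uns: MutableMapping, key: str, categories: Sequence) -> bool:
    """Delete uns[key + '_colors'] if it has fewer colours than categories.

    Returns True when a stale palette was removed. Assumes the plotting
    library regenerates a default palette when the entry is missing.
    """
    colors_key = key + "_colors"
    colors = uns.get(colors_key)
    if colors is not None and len(colors) < len(categories):
        del uns[colors_key]
        return True
    return False
```

For this report, the call would be drop_short_palette(ad.uns, 'L1_Round0_self-projection', ad.obs['L1_Round0_self-projection'].cat.categories).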

ValueError: At least one label specified must be in y_true

Hi, I have converted the integrated assay of my Seurat object to AnnData format using SeuratDisk, and I want to run SCCAF. The data loads without errors using sc.read. However, when I run SCCAF_optimize_all, I get the following warning and error:

anaconda3/lib/python3.7/site-packages/sklearn/metrics/classification.py:261: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
if np.all([l not in y_true for l in labels]):

ValueError: At least one label specified must be in y_true

It seems that the Seurat cluster IDs (numbers starting from 0) are treated as scalars.
I was wondering whether there is any way to solve this and get SCCAF_optimize_all to run.
Thanks
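The FutureWarning comes from a membership test between the requested labels and y_true. One plausible cause, which is an assumption and not confirmed in the thread, is that the labels are numeric on one side and strings on the other. In that case no label is ever "found". The sketch below reproduces that check in plain Python and normalises IDs to strings. Both helper names are hypothetical.

```python
from typing import Iterable, List


def labels_found(labels: Iterable, y_true: Iterable) -> bool:
    """True if at least one label occurs in y_true (the error fires when False)."""
    truth = set(y_true)
    return any(label in truth for label in labels)


def as_string_labels(values: Iterable) -> List[str]:
    """Normalise integer or string cluster IDs to plain strings."""
    return [str(v) for v in values]
```

labels_found([0, 1, 2], ["0", "1", "2"]) is False, while labels_found(as_string_labels([0, 1, 2]), ["0", "1", "2"]) is True. On the AnnData side, the equivalent pandas step would be something like ad.obs['L1_Round0'] = ad.obs['seurat_clusters'].astype(str).astype('category'), where the column name seurat_clusters is an assumption about the converted object.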

SCCAF for scATAC-seq

A super helpful tool for scRNA-seq analysis!

Could SCCAF be used to solve the "cluster resolution problem" in scATAC-seq as well?

Instead of genes, SCCAF could be used with variable peaks. Alternatively, SCCAF could be run on the pseudo-gene activity matrix created from the scATAC-seq dataset. This would give similar features for SCCAF to work with.

Methods for pseudo-gene activity quantification:
  • Seurat: promoter and gene body accessibility only
  • Cicero: promoter, gene body and linked distal element accessibility

Thanks in advance!
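The Seurat-style score, which uses the promoter plus the gene body, can be sketched for a single cell. For each gene, the sketch sums the counts of the peaks that overlap the gene body extended 2 kb upstream. The 2 kb window, the tuple layouts and the function names are assumptions for illustration only. They are not SCCAF, Seurat or Cicero API, and Cicero's linked distal elements are not modelled.

```python
from typing import Dict, List, Tuple

Peak = Tuple[str, int, int]            # (chrom, start, end), half-open interval
Gene = Tuple[str, str, int, int, str]  # (name, chrom, start, end, strand)


def gene_region(gene: Gene, upstream: int = 2000) -> Tuple[str, int, int]:
    """Gene body extended by `upstream` bp on its 5' side (strand-aware)."""
    _, chrom, start, end, strand = gene
    if strand == "+":
        return chrom, max(0, start - upstream), end
    return chrom, start, end + upstream


def gene_activity(peak_counts: Dict[Peak, int], genes: List[Gene],
                  upstream: int = 2000) -> Dict[str, int]:
    """Per-gene sum of counts of peaks overlapping promoter + gene body (one cell)."""
    activity = {}
    for gene in genes:
        chrom, lo, hi = gene_region(gene, upstream)
        total = 0
        for (p_chrom, p_start, p_end), count in peak_counts.items():
            # Half-open intervals overlap when each starts before the other ends.
            if p_chrom == chrom and p_start < hi and p_end > lo:
                total += count
        activity[gene[0]] = total
    return activity
```

Computing this for every cell produces a cells × genes matrix that could, in principle, replace the expression matrix as SCCAF input. The loop is quadratic, so a real implementation would use an interval index.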

Run SCCAF "total error" in Galaxy

Hello,

Currently I'm trying to work with SCCAF in Galaxy. When I try to run SCCAF on the data, I get the following error:
cannot find 'skip_init_assessment' while searching for 'mode.skip_init_assessment'

The data is preprocessed and clustered. Here is a screenshot for more information:
[Screenshot: SCCAF_Galaxy_Error]

Thank you in advance!

problem with SCCAF installation using conda

Hello,
I have a problem when SCCAF is installed using conda. "conda install SCCAF" seems to succeed, but I get an error when importing SCCAF in Python:
>>> import SCCAF
/home/ubuntu/miniconda3/envs/python_env/lib/python3.7/site-packages/anndata/_core/anndata.py:21: FutureWarning: pandas.core.index is deprecated and will be removed in a future version. The public classes are available in the top-level namespace.
  from pandas.core.index import RangeIndex
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ubuntu/miniconda3/envs/python_env/lib/python3.7/site-packages/SCCAF/__init__.py", line 75, in <module>
    '#000000', ] + default_26
NameError: name 'default_26' is not defined

Do you have any idea what could cause this issue? It is installed under Python 3.7.7.
Installation with pip works fine; the problem only occurs with conda.
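Since the pip environment works and the conda one does not, a useful first step is to compare the versions that each environment resolved. The NameError suggests that a name SCCAF expects to obtain from a dependency is missing, but this is an inference. The sketch below uses importlib.metadata, which requires Python 3.8+. On 3.7, the importlib_metadata backport offers the same calls. installed_versions is a hypothetical helper.

```python
from importlib import metadata  # Python 3.8+; use the importlib_metadata backport on 3.7
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    report = {}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report
```

Run print(installed_versions(["sccaf", "scanpy", "anndata", "pandas"])) in both environments and compare the two outputs.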

Scanpy==1.4.6 dependency while working with SCCAF

Thanks.
I am using this module for my analysis, but it requires scanpy==1.4.6; if I use an upgraded version of scanpy, the program fails.

Can you please update the scanpy dependency to a recent version?

With this version I am not able to calculate "pts" in scanpy; it only works with version 1.8 and above.
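Until the pin is relaxed, code that has to run under both the pinned and a newer scanpy can test for the option itself rather than for a version number. This is a small sketch. accepts_parameter is a hypothetical helper. The option the report refers to is the pts argument of sc.tl.rank_genes_groups.

```python
import inspect
from typing import Callable


def accepts_parameter(func: Callable, name: str) -> bool:
    """True if `func` declares a parameter called `name`."""
    try:
        return name in inspect.signature(func).parameters
    except (TypeError, ValueError):  # some callables expose no signature
        return False
```

For example, pass pts=True only when accepts_parameter(sc.tl.rank_genes_groups, "pts") is true.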

louvain not declared as dependency

On a fresh installation, I get the following error:

>>> from SCCAF import *
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/pmoreno/Development/sccaf.sccaf/SCCAF/__init__.py", line 25, in <module>
    import louvain
ModuleNotFoundError: No module named 'louvain'
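The real fix is to declare louvain in the package's dependency list. To diagnose which imports an environment is missing before importing SCCAF, a check based on importlib.util.find_spec can be sketched as follows. missing_modules is a hypothetical helper, and it handles top-level module names only.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

For example, missing_modules(["louvain", "joblib", "scanpy"]) lists which of these still need to be installed.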

joblib needs to be added as dependency

When executing from SCCAF import * you get:

/Users/pmoreno/venvs/sccaf.sccaf/lib/python3.6/site-packages/sklearn/externals/joblib/__init__.py:15: DeprecationWarning: sklearn.externals.joblib is deprecated in 0.21 and will be removed in 0.23. Please import this functionality directly from joblib, which can be installed with: pip install joblib. If this warning is raised when loading pickled models, you may need to re-serialize those models with scikit-learn 0.21+.

Fixing this in the dev branch.
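Besides declaring joblib as a dependency, the import can prefer the standalone package and fall back to the deprecated sklearn.externals.joblib only when needed. This is a minimal sketch. import_first is a hypothetical helper, not part of SCCAF.

```python
import importlib
from types import ModuleType
from typing import Iterable


def import_first(candidates: Iterable[str]) -> ModuleType:
    """Import and return the first importable module from `candidates`."""
    errors = []
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError as exc:  # ModuleNotFoundError is a subclass
            errors.append(f"{name}: {exc}")
    raise ImportError("none of the candidates could be imported: " + "; ".join(errors))
```

For example, joblib = import_first(["joblib", "sklearn.externals.joblib"]) uses standalone joblib whenever it is installed, which avoids the deprecation warning.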
