krishnaswamylab / saucie
License: MIT License
Dear author,
Thanks for developing the software! I have a few questions but could not find clear answers in your paper:
Thanks!
Updating NumPy to 1.13.3 solved the issue.
It looks like tensorflow-gpu==1.4.0 requires CUDA 8, but CUDA 8 is not supported on my system (Ubuntu 18.04).
Could you update your code to be compatible with a modern environment,
e.g. tensorflow-gpu==1.14 and Python 3?
Alternatively, could you provide Docker images?
Hello!
I am trying to install SAUCIE and run it on my data; however, software compatibility problems are causing errors in the analysis. TensorFlow 1.12.0 is available for Python 3.6, but scanpy and other packages do not install properly under Python 3.6. In addition, TensorFlow 2.12 raises an error when running this:
tf.reset_default_graph()
#build the SAUCIE model
model = SAUCIE.SAUCIE(n_features)
AttributeError: module 'tensorflow' has no attribute 'placeholder'
Can you please suggest a solution to this? Thanks in advance.
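The AttributeError arises because TensorFlow 2.x removed tf.placeholder and tf.reset_default_graph from its top-level namespace; the TF1 API is still reachable as tensorflow.compat.v1. A commonly used workaround, sketched here under the assumption that you edit SAUCIE's model.py (and your own script) so the plain `import tensorflow as tf` line is replaced by the shim below, is to import the compat module and switch off TF2 behaviour. This may not be enough if the code relies on modules that TF2 dropped entirely; the tf_upgrade_v2 route described in another post on this page is the more thorough option.

```python
# Compatibility shim (a sketch, not part of SAUCIE): use it in place of
# `import tensorflow as tf` so TF1-style graph code runs on TensorFlow 2.x.
try:
    import tensorflow.compat.v1 as tf

    # Re-enable graph mode, placeholders and sessions.
    tf.disable_v2_behavior()
except ImportError:
    # TensorFlow (or its compat.v1 module) is not installed.
    tf = None
```

After the shim, tf.placeholder and tf.reset_default_graph() resolve to their TF1 versions.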
Hello there, I'm just wondering what type of license the tool has. Thank you so much in advance. Many thanks, Carmen
I'm trying to use SAUCIE.py with the --batch_correct argument.
In the train_batch_correction method, it looks like the method expects a series of raw files.
Does that mean each raw file should contain only one sample's RNA-seq data?
If I want to correct batches among 1000 samples, should 1000 CSV files be prepared in the input_dir?
Or should it be one batch per CSV file?
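The training logs posted elsewhere on this page ("Starting to train 1 batch correction models", "Starting to train 10 batch correction models") suggest that SAUCIE.py trains one model per non-reference file, i.e. each CSV in input_dir is treated as one batch. Assuming that reading is correct, samples from the same batch belong in the same file rather than one file per sample. A minimal sketch for splitting one combined table into one CSV per batch; split_by_batch and the batch column are hypothetical names:

```python
import csv
import os
from collections import defaultdict


def split_by_batch(header, rows, batch_col, out_dir):
    """Write one CSV per batch label (hypothetical helper).

    The batch column itself is dropped so every written column is a numeric
    feature. Returns the written paths, sorted by batch label.
    """
    idx = header.index(batch_col)
    keep = [i for i in range(len(header)) if i != idx]
    groups = defaultdict(list)
    for row in rows:
        groups[row[idx]].append([row[i] for i in keep])

    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for label in sorted(groups):
        path = os.path.join(out_dir, f"batch_{label}.csv")
        with open(path, "w", newline="") as fh:
            writer = csv.writer(fh)
            writer.writerow([header[i] for i in keep])
            writer.writerows(groups[label])
        paths.append(path)
    return paths
```

With 1000 samples drawn from, say, three sequencing runs, this would produce three files rather than 1000.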
I am attempting to use SAUCIE in Python 3.7.4 with tensorflow 1.4.0. My goal is to batch-correct and cluster CyTOF data. FCS files are the input data and are in folder fcs.
SAUCIE.py --format fcs --input_dir fcs --output_dir out --batch_correct --cluster
Can you tell me why I might be getting this error (which also occurs if I use CSV instead of FCS files)? Below is the terminal output.
Thanks.
P.S. SAUCIE appears to be great software. I really hope it gets some good documentation.
Training batch correction models.
Starting to train 1 batch correction models...
Training model 0
Traceback (most recent call last):
File "/software/SAUCIE/SAUCIE.py", line 338, in <module>
train_batch_correction(rawfiles)
File "/software/SAUCIE/SAUCIE.py", line 128, in train_batch_correction
raise(ex)
File "/software/SAUCIE/SAUCIE.py", line 111, in train_batch_correction
alldata = np.concatenate([refx.as_matrix(), nonrefx.as_matrix()], axis=0)
File "/lib/python3.7/site-packages/pandas/core/generic.py", line 5274, in __getattr__
return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'as_matrix'
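DataFrame.as_matrix() was deprecated in pandas 0.23 and removed in pandas 1.0, which is why this call fails. The FutureWarning in another report on this page suggests .values instead; .to_numpy(), available since pandas 0.24, is the currently recommended replacement. Pinning an older pandas also works, but patching SAUCIE.py is simpler. A minimal standard-library sketch that rewrites every .as_matrix() call in a source string; replace_as_matrix is a hypothetical helper:

```python
import re


def replace_as_matrix(source):
    """Rewrite `.as_matrix()` calls to `.to_numpy()`; returns (new_source, count)."""
    return re.subn(r"\.as_matrix\(\)", ".to_numpy()", source)


# Example (edit the path to point at your copy of SAUCIE.py):
# from pathlib import Path
# path = Path("SAUCIE.py")
# patched, n = replace_as_matrix(path.read_text())
# path.write_text(patched)
```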
First of all, thank you for making all of these cutting-edge tools for single-cell data analysis. I am analysing several CyTOF datasets with PBMCs from four groups of volunteers that were stimulated under three different conditions. Given the complex design, I think SAUCIE would be a good option to try.
Unfortunately, the lack of documentation really impedes my attempt to use this tool. For example, I would like to know how you actually made the plots where you compared the cellular manifolds of acute/convalescent dengue patients etc.
In addition to how to use this tool, knowing what to do afterwards would be really helpful.
Thank you for your assistance!
Best regards,
Mikhael
Greetings,
I am able to run the program without any errors, but the output annotates every cell in the dataset as cluster "0.0". I noticed that the SAUCIE embeddings, when plotted, form just a diagonal line with very little range across the axes. I then tried lowering lambda_c and lambda_d (from the default values to 0.001), but the cluster annotations were still all "0.0", and the SAUCIE embedding values were no longer output (i.e. the columns were blank).
My dataset consists of ~7700 cells X ~16000 genes in each of 3 samples. Using standard tools like Seurat, the clustering is clear and I have had no issues identifying different cell types using Seurat clustering.
I am running this locally on my mac (OS 10.13) using python 3.5 and tensorflow 1.4.0.
Do you happen to have any suggestions for why this might be happening?
Thanks!
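One thing worth checking for scRNA-seq input of this size: the project's example notebook 02_Exploratory_analysis_of_single_cell_data_with_SAUCIE.ipynb (quoted in a later issue on this page) runs sklearn.decomposition.PCA(100) on the data before SAUCIE, rather than feeding all ~16000 genes directly. Below is a minimal NumPy-only sketch of that kind of preprocessing, assuming a cells x genes count matrix; the library-size normalisation and log1p step are our assumption, not necessarily what the notebook does, and log_pca is a hypothetical name.

```python
import numpy as np


def log_pca(counts, n_components=100):
    """Library-size normalise, log1p-transform, then project onto top principal components.

    counts: cells x genes array of non-negative counts. Returns cells x k scores,
    where k = min(n_components, n_cells, n_genes).
    """
    counts = np.asarray(counts, dtype=np.float64)
    lib = counts.sum(axis=1, keepdims=True)
    scale = np.median(lib)
    lib[lib == 0] = 1.0  # avoid dividing by zero for empty cells
    logged = np.log1p(counts / lib * scale)
    centred = logged - logged.mean(axis=0)
    k = min(n_components, *centred.shape)
    # Thin SVD of the centred matrix: principal component scores are U * S.
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    return u[:, :k] * s[:k]
```

For the full 7700 x 16000 matrix an exact SVD is memory-hungry; sklearn.decomposition.PCA, as used in the notebook, is the lighter choice.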
Hello,
First off, I want to congratulate Matthew Amodio on getting his paper accepted in Nature Methods. SAUCIE has the potential to make a major impact on CyTOF data analysis, and I am very excited to be using this software!
I am characterizing a dataset with SAUCIE that is composed of 780,000 cells with 32 markers. I am currently running the following settings:
saucie = SAUCIE(data.shape[1], layers=[512, 256, 128, 2], layer_c=2, lambda_c=0.1, lambda_d=0.2, limit_gpu_fraction=0.5)
And I am training for 40,000 steps with a minibatch size of 256.
The CyTOF data has been normalized, compensated, debarcoded, and arcsinh5-transformed.
I am concerned that my clustering parameters are not well tuned or appropriate.
My clustering results look like the following:
SAUCIE Clusters on a UMAP Embedding (Pardon the axes, I just noticed this as I uploaded.)
SAUCIE Clusters on the SAUCIE Embedding
The clusters seem arbitrary with respect to the data manifold and ignore major features of the cells.
What am I missing and how can I best optimize SAUCIE to work with my data? What are the recommended optimization steps for SAUCIE?
Thank you!
Brian
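There is no documented tuning procedure on this page, but one systematic approach is a small grid over the two clustering weights used in the settings above, lambda_c and lambda_d, recording how many clusters each setting yields. The sketch keeps the training step behind a caller-supplied function so it runs on its own; the commented fit_and_count is hypothetical and shows one way to write it with the SAUCIE calls used elsewhere on this page, assuming those calls behave as they appear there.

```python
from itertools import product


def sweep_lambdas(fit_and_count, lambda_c_values, lambda_d_values):
    """Call fit_and_count(lambda_c, lambda_d) for every pair in the grid.

    fit_and_count is supplied by the caller: it should train a fresh model
    with the given weights and return the number of clusters found.
    """
    return {
        (lc, ld): fit_and_count(lc, ld)
        for lc, ld in product(lambda_c_values, lambda_d_values)
    }


# One possible fit_and_count (hypothetical; mirrors calls used on this page):
# def fit_and_count(lc, ld):
#     tf.reset_default_graph()
#     model = SAUCIE.SAUCIE(data.shape[1], lambda_c=lc, lambda_d=ld)
#     model.train(SAUCIE.Loader(data, shuffle=True), steps=1000)
#     n_clusters, _ = model.get_clusters(SAUCIE.Loader(data, shuffle=False))
#     return n_clusters
```

Comparing the resulting cluster counts (and the clusters themselves) against known cell populations is one way to pick a setting.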
Hi,
I am trying to run SAUCIE and am getting the following error:
Training batch correction models.
Starting to train 10 batch correction models...
Training model 0
SAUCIE.py:111: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead.
alldata = np.concatenate([refx.as_matrix(), nonrefx.as_matrix()], axis=0)
Traceback (most recent call last):
File "SAUCIE.py", line 338, in
train_batch_correction(rawfiles)
File "SAUCIE.py", line 128, in train_batch_correction
raise(ex)
File "SAUCIE.py", line 111, in train_batch_correction
alldata = np.concatenate([refx.as_matrix(), nonrefx.as_matrix()], axis=0)
ValueError: all the input array dimensions except for the concatenation axis must match exactly
Also, it shows only "Training model 0". A sample of the expected input format and data would be nice. Here is the command I used:
python SAUCIE.py --input_dir sample --output_dir sample_out --cluster
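The ValueError is raised by np.concatenate, which needs the reference and non-reference tables to have the same number of columns, so the files in sample/ most likely do not share an identical set of columns. ("Training model 0" is simply the index of the first of the ten models being trained.) A sketch that rewrites every CSV to the columns they all share, in the first file's order; align_columns is a hypothetical helper, and it assumes each file starts with a header row of column names:

```python
import csv
import os


def align_columns(paths, out_dir):
    """Rewrite each CSV keeping only columns shared by all files, in the first file's order."""
    tables = []
    for path in paths:
        with open(path, newline="") as fh:
            rows = list(csv.reader(fh))
        tables.append((path, rows[0], rows[1:]))

    shared = set(tables[0][1]).intersection(*(set(h) for _, h, _ in tables[1:]))
    columns = [c for c in tables[0][1] if c in shared]

    os.makedirs(out_dir, exist_ok=True)
    for path, header, body in tables:
        idx = [header.index(c) for c in columns]
        out_path = os.path.join(out_dir, os.path.basename(path))
        with open(out_path, "w", newline="") as fh:
            writer = csv.writer(fh)
            writer.writerow(columns)
            writer.writerows([row[i] for i in idx] for row in body)
    return columns
```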
Line 154 in c2e5968
---- Num clusters: 9 ---- Percent clustered: 1.000 ----
In your paper, you have the following diagram, which seems to suggest that ID regularization is done prior to the last layer of the decoder.
However, in the paper you also write: "The ID regularization was applied to the final decoder layer, which uses a rectified linear unit." So is the ID regularization applied to the last layer or to the layer before it?
Hi,
I'm wondering if there's a way to fix the seed. I've already tried fixing the seeds for TensorFlow and NumPy, but when I rerun the analysis with the same parameters I get a different number of clusters.
Another reproducibility-related topic is a Docker release of this tool, which could be useful for running the analysis without warnings and errors.
Best regards,
Simone
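A common pattern is to seed every random number generator in play at the start of each run. Two caveats that may explain varying cluster counts, both assumptions rather than documented SAUCIE behaviour: in TF1 the seed set by tf.set_random_seed belongs to the current default graph, so calling tf.reset_default_graph() afterwards discards it; and some GPU kernels are non-deterministic, so results may still differ slightly on a GPU. If the loader shuffles with NumPy's global generator (not verified here), np.random.seed also has to be called before the loader is created. set_seeds is a hypothetical helper:

```python
import random

import numpy as np


def set_seeds(seed):
    """Seed Python, NumPy and (if installed) TensorFlow. A sketch, not SAUCIE's API."""
    random.seed(seed)
    np.random.seed(seed)
    try:
        import tensorflow as tf
    except ImportError:
        return
    if hasattr(tf, "set_random_seed"):
        # TF1: seeds the *current* default graph, so call this after
        # tf.reset_default_graph(), not before.
        tf.set_random_seed(seed)
    else:
        tf.random.set_seed(seed)  # TF2
```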
Line 159 in c2e5968
Hi, I am trying to use SAUCIE to find clusters. I can run the code, but it finds only one cluster. I checked, and after log-transforming the data the maximum is near +11 and the minimum near -4. Does anyone know why this could happen?
Hi,
Great stuff! I was wondering whether you plan to turn SAUCIE into a proper Python package (a proper setup.py, eventually published on PyPI), or are already working on it? I would be happy to help out with it.
Cheers
The FCS files of my CyTOF data have variables (columns) of different types.
I want to use SAUCIE for both batch correction and clustering. For clustering, I want to use only the cell lineage markers.
Thanks.
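One way to organise this, as a sketch under our own assumptions: keep every marker for the batch-correction step, then build the clustering input from the lineage-marker columns only. select_columns is a hypothetical helper that picks columns by name from the batch-corrected array and its header, and raises if a requested marker is missing, which also catches naming mismatches between FCS files. The marker names in the test are placeholders.

```python
import numpy as np


def select_columns(data, header, wanted):
    """Return the columns of `data` named in `wanted`, in the order given by `wanted`."""
    missing = [name for name in wanted if name not in header]
    if missing:
        raise KeyError(f"columns not found: {missing}")
    idx = [header.index(name) for name in wanted]
    return np.asarray(data)[:, idx]
```

The lineage-only array can then be given to a separate SAUCIE model for clustering, e.g. one built as SAUCIE.SAUCIE(x_lineage.shape[1]).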
I am stuck at batch correction; please advise how to proceed here. I'd appreciate it!
My code:
x1 = np.concatenate([np.random.uniform(-3, -2, (1000, 40)), np.random.uniform(2, 3, (1000, 40))], axis=0)
x2 = np.concatenate([np.random.uniform(-3, -2, (1000, 40)), np.random.uniform(2, 3, (1000, 40))], axis=0)
x = np.concatenate([x1,x2],axis=0)
load = SAUCIE.Loader(x, shuffle=False)
saucie = SAUCIE.SAUCIE(x.shape[1], lambda_b=.1)
labels=[0 for i in range(2000)]+[1 for i in range(2000)]
saucie.batches=np.array(labels)
saucie.train(load, 100)
Exception Traceback (most recent call last)
in
----> 1 saucie.train(load, 100)
2 embedding = saucie.get_embedding(load)
3 num_clusters, clusters = saucie.get_clusters(load)
/hpcdata/bcbb/yunhua/F20/SAUCIE/model.py in train(self, load, steps, batch_size)
407 # if using batch-correction, must have labels
408 if (self.lambda_b and len(batch) < 2):
--> 409 raise Exception("If using lambda_b (batch correction), you must provide each point's batch as a label")
410
411 ops = [obn('train_op')]
Exception: If using lambda_b (batch correction), you must provide each point's batch as a label
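In the traceback, the check at model.py line 408 looks at the batch labels that arrive with each minibatch, which as far as we can tell come from the loader; setting saucie.batches on the model afterwards therefore never reaches it. Another report on this page mentions a labels = None setting together with an error about a placeholder named 'batches' (dtype int32), so the sketch below assumes SAUCIE.Loader takes a labels= keyword. stack_batches is a hypothetical helper that stacks the per-batch arrays and builds one integer label per cell:

```python
import numpy as np


def stack_batches(*batches):
    """Stack per-batch (cells x features) arrays and build one int32 batch label per cell."""
    data = np.concatenate(batches, axis=0)
    labels = np.concatenate(
        [np.full(len(b), i, dtype=np.int32) for i, b in enumerate(batches)]
    )
    return data, labels


# Hypothetical usage (assumes SAUCIE.Loader accepts labels=):
# x, labels = stack_batches(x1, x2)
# load = SAUCIE.Loader(x, labels=labels, shuffle=True)
# saucie = SAUCIE.SAUCIE(x.shape[1], lambda_b=.1)
# saucie.train(load, 100)
```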
Hi,
I'm very new to TensorFlow. It seems to me that the example on the page does not deal with batch correction. I tried to include a label array and set a value between 0 and 1 for lambda_b, but I still couldn't get it right: the values in the embedding variable became non-numeric. Could someone post an example that handles batch correction?
Thanks in advance for your help.
Allen
Hi,
I am trying to run SAUCIE. From the code, it turns out that there is no --fcs parameter. There is a parameter called --format, but it is undocumented. Similarly, it is not entirely clear what the lambda values are. Any elaboration would be very useful.
Best,
Sameet
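As we read the SAUCIE paper, the training objective is a reconstruction loss plus three weighted regularization terms, and the constructor arguments lambda_b, lambda_c and lambda_d (used elsewhere on this page) are those weights. This is a summary, not a substitute for the paper's definitions:

```latex
\mathcal{L} = \mathcal{L}_{r} + \lambda_{b}\,\mathcal{L}_{b} + \lambda_{c}\,\mathcal{L}_{c} + \lambda_{d}\,\mathcal{L}_{d}
```

Here \mathcal{L}_r is the reconstruction error, \mathcal{L}_b the maximum mean discrepancy (MMD) batch-correction penalty on the embedding, \mathcal{L}_c the information dimension (ID) regularization on the clustering layer, and \mathcal{L}_d the intra-cluster distance penalty. A weight of 0 drops its term; the model.py check quoted in another issue only demands batch labels when lambda_b is non-zero.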
Hello, I recently read your paper, and it is excellent work. However, I cannot find the code for the evaluation metrics used in each experiment, such as the mixing score. Could you share that code? Thanks.
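The paper's own evaluation code is not on this page, so the following is not the paper's mixing score; it is a generic k-nearest-neighbour batch-mixing measure of the kind often used to benchmark batch correction, offered as a sketch. For each cell it computes the fraction of its k nearest neighbours in the embedding that come from a different batch, then averages over cells; higher means better mixed. For two equal-sized, perfectly mixed batches the expected value is about 0.5. knn_batch_mixing is a hypothetical name.

```python
import numpy as np


def knn_batch_mixing(embedding, batches, k=10):
    """Average fraction of each cell's k nearest neighbours that belong to another batch.

    Brute-force Euclidean distances: O(n^2) memory, so subsample large datasets.
    """
    x = np.asarray(embedding, dtype=float)
    b = np.asarray(batches).reshape(-1)
    k = min(k, len(x) - 1)
    sq = (x ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    np.fill_diagonal(d2, np.inf)  # a cell is not its own neighbour
    neighbours = np.argsort(d2, axis=1)[:, :k]
    return float((b[neighbours] != b[:, None]).mean())
```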
Does SAUCIE not require any normalization of the data at all (e.g. TMM/RPKM/TPM)? Does it just take raw counts?
I ran SAUCIE.py with default settings on two datasets (with clustering and batch correction), which gave only the "0.0" cluster annotation for all cells. The SAUCIE embedding was just a linear embedding of all the cells (pretty much on the y=x line). Are there additional parameters that must be tuned, e.g. the lambdas? What are they?
Dear author,
I have applied SAUCIE, which is a wonderful tool, to my scRNA-seq data. However, the imputation result contains some negative values, ranging from -1e-01 to -1e-10, even though the input I feed to the model has no negative values.
Can I simply set these small negative values to 0, or is something wrong in my code?
Thanks.
import sys
import SAUCIE
import numpy as np
import tensorflow as tf
tf.reset_default_graph()
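# data is assumed to be a pandas DataFrame (cells x genes) loaded earlier
data = data.values  # convert to a NumPy array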
saucie = SAUCIE.SAUCIE(data.shape[1])
loadtrain = SAUCIE.Loader(data, shuffle=True)
saucie.train(loadtrain, steps=1000)
loadeval = SAUCIE.Loader(data, shuffle=False)
embedding = saucie.get_embedding(loadeval)
number_of_clusters, clusters = saucie.get_clusters(loadeval)
reconstruction = saucie.get_reconstruction(loadeval)
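Since the input is non-negative, negatives of this size in the reconstruction are most plausibly numerical artifacts of a decoder output that is not constrained to be non-negative (an assumption; we have not checked model.py). Clipping at zero is a common post-processing step for imputed values; it is worth recording how many entries were affected first, as in this sketch (clip_negative is a hypothetical helper):

```python
import numpy as np


def clip_negative(reconstruction):
    """Set negative entries of an imputed matrix to 0 and report how many were changed."""
    rec = np.asarray(reconstruction, dtype=float)
    n_negative = int((rec < 0).sum())
    return np.clip(rec, 0.0, None), n_negative
```

For example: reconstruction, n_changed = clip_negative(saucie.get_reconstruction(loadeval)).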
Hi,
I was trying to install SAUCIE as described in the README. However, I ran into the following:
(CytofWorkflow)[sm2556@c16n09 WNV-SMART_tube_data]$ pip3 install -r SAUCIE/requirements.txt --upgrade
Collecting tensorflow==1.12 (from -r SAUCIE/requirements.txt (line 1))
Could not find a version that satisfies the requirement tensorflow==1.12 (from -r SAUCIE/requirements.txt (line 1)) (from versions: 1.13.0rc1, 1.13.0rc2, 1.13.1, 1.13.2, 1.14.0rc0, 1.14.0rc1, 1.14.0, 2.0.0a0, 2.0.0b0, 2.0.0b1)
No matching distribution found for tensorflow==1.12 (from -r SAUCIE/requirements.txt (line 1))
This is especially strange, as I was able to use it just fine last week. This error makes it look like TensorFlow version 1.12 never existed!
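The versions listed by pip (1.13.0rc1 and later only) are what pip shows when no TensorFlow 1.12 wheel matches the running interpreter. To our knowledge TensorFlow 1.12 wheels were published only for Python 3.6 and older, with Python 3.7 support arriving in 1.13, so the likeliest cause is that this environment now runs Python 3.7 or newer. A trivial check, as a sketch (the version bound is our assumption, not taken from SAUCIE's documentation; tf112_wheel_expected is a hypothetical name):

```python
import sys


def tf112_wheel_expected(version_info=sys.version_info):
    """True if pip should find a TensorFlow 1.12 wheel for this CPython version.

    Assumption: 1.12 wheels exist only for Python 3.6 and older.
    """
    return tuple(version_info[:2]) <= (3, 6)
```

If it returns False, creating a fresh Python 3.6 environment (for example with conda) is the usual fix.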
It looks like the module fcsparser is another requirement of this program. Perhaps add it to the README.
https://github.com/eyurtsev/fcsparser
Also fcswrite:
https://pypi.org/project/fcswrite/
pip install fcswrite
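Both packages are separate installs (pip install fcsparser fcswrite). A small pre-flight check, sketched with only the standard library, reports which of them the current interpreter cannot find; the default module list is taken from this post, and missing_modules is a hypothetical name:

```python
import importlib.util


def missing_modules(names=("fcsparser", "fcswrite")):
    """Return the top-level module names in `names` that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```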
WARNING:tensorflow:From ./SAUCIE/model.py:342: calling reduce_sum (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
This is not an Issue, just a brief on how I was able to use SAUCIE (master version as of today) with current versions of Tensorflow (2.1.0) and Numpy (1.18.1). It may be helpful to others.
System
CentOS Linux 7 with Linux 3.10.0-514.10.2.el7.x86_64
Conda 4.8.3 with Python 3.7.4
Installation of required packages using Conda
SAUCIE requires certain packages. Here, the packages are installed in the Conda environment that I will use to run SAUCIE.
conda activate base
conda create -n tensorflo fcsparser numpy pandas python tensorflow
conda activate tensorflo
pip install -U fcswrite matplotlib scikit-learn tensorboard
Upgrade SAUCIE's Tensorflow 1 code to work with Tensorflow 2
The tf_upgrade_v2 Python script installed during Tensorflow installation (above) is used. All SAUCIE code files are processed and, with any necessary changes, are put in a new folder that the script creates.
miniconda3/envs/tensorflo/bin/tf_upgrade_v2 --intree SAUCIE/ --outtree SAUCIEnew/
Modify the new SAUCIE.py file
This is for compatibility with recent pandas, which removed DataFrame.as_matrix() in version 1.0. Replace the four instances of .as_matrix() with .to_numpy().
Use the new SAUCIE
E.g.:
python SAUCIEnew/SAUCIE.py --format fcs --input_dir fcs --output_dir out --batch_correct --cluster
Versions of installed software in the Conda environment
fcsparser 0.2.0
fcswrite 0.5.2
matplotlib 3.2.1
numpy 1.18.1
opt-einsum 3.2.1
pandas 1.0.3
pip 20.0.2
python 3.7.6
scipy 1.4.1
tensorboard 2.2.1
tensorflow 2.1.0
During batch correction, I get the following warning:
/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py:96: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory. "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
The message appears just after the 'Training model 0' log line and does not recur during subsequent model training.
My environment is,
How can I resolve this warning?
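TensorFlow typically emits this UserWarning when the gradient of a gather-style op produces sparse IndexedSlices that must be converted to a dense tensor. As far as we know it is informational and only matters if memory runs out. If training completes normally, one option is to filter it with Python's standard warnings module, as sketched below; this hides the message but does not change what TensorFlow does. silence_indexedslices_warning is a hypothetical name.

```python
import warnings


def silence_indexedslices_warning():
    """Ignore TensorFlow's 'Converting sparse IndexedSlices...' UserWarning (cosmetic only)."""
    warnings.filterwarnings(
        "ignore",
        message="Converting sparse IndexedSlices to a dense Tensor",
        category=UserWarning,
    )
```

Call it once before training starts.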
It would be nice to have examples of how to run this with sample data. I was trying to run with input.csv and cols_to_use.txt.
I tried:
python ./SAUCIE-master/SAUCIE.py --input_dir ./input --output_dir ./output
I get:
Finished training models and outputing data!
Except that nothing is output.
If I try:
python ./SAUCIE-master/SAUCIE.py --input_dir ./input --output_dir ./output --batch_correct --cluster
I get:
Found 0 batch-corrected models (out of 0 total models)
Found batch correction models.
Found 0 batch-corrected files (out of 1 total files)
Outputing batch corrected data.
Starting to output 1 batch corrected files...
Training cluster model.
C:\Users\kqwt693\AppData\Local\Continuum\Anaconda3\lib\site-packages\h5py\__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
Traceback (most recent call last):
File "./SAUCIE-master/SAUCIE.py", line 359, in <module>
train_cluster(input_files)
File "./SAUCIE-master/SAUCIE.py", line 221, in train_cluster
raise(ex)
File "./SAUCIE-master/SAUCIE.py", line 202, in train_cluster
x = get_data(inputfiles[0], sample=2)
IndexError: list index out of range
The only one that output something was:
python ./SAUCIE-master/SAUCIE.py --input_dir ./input --output_dir ./output --cluster
It would be nice to have instructions/tutorial on what to do next with this output.
Hello,
I am trying the script with labels = None which raises the following error:
You must feed a value for placeholder tensor 'batches' with dtype int32 and shape [?]
Is it correct to set labels to None for unlabeled data?
Thank you in advance!
Good morning, I am trying to run a batch correction. The input directory contains 2 CSV files in which the first column holds the row names and the first row holds the column names. The column names (genes) are the same in both datasets.
python SAUCIE.py --input_dir input --output_dir output --batch_correct
Training batch correction models.
Traceback (most recent call last):
File "SAUCIE.py", line 338, in
train_batch_correction(rawfiles)
File "SAUCIE.py", line 128, in train_batch_correction
raise(ex)
File "SAUCIE.py", line 103, in train_batch_correction
refx = get_data(ref)
File "SAUCIE.py", line 74, in get_data
newvals = asinh(x)
File "C:\Users\Federico\Desktop\Tesi\Elaborazione\Clustering SAUCIE\utils.py", line 8, in asinh
return f(x)
File "C:\Users\Federico\Anaconda3\envs\Saucie\lib\site-packages\numpy\lib\function_base.py", line 2739, in call
return self._vectorize_call(func=func, args=vargs)
File "C:\Users\Federico\Anaconda3\envs\Saucie\lib\site-packages\numpy\lib\function_base.py", line 2809, in _vectorize_call
ufunc, otypes = self._get_ufunc_and_otypes(func=func, args=args)
File "C:\Users\Federico\Anaconda3\envs\Saucie\lib\site-packages\numpy\lib\function_base.py", line 2769, in _get_ufunc_and_otypes
outputs = func(*inputs)
File "C:\Users\Federico\Desktop\Tesi\Elaborazione\Clustering SAUCIE\utils.py", line 7, in
f = np.vectorize(lambda y: math.asinh(y / scale))
TypeError: unsupported operand type(s) for /: 'str' and 'float'
Do you know how I can fix it?
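The traceback ends in utils.asinh dividing a str by a float, which suggests a text column reached the arcsinh transform, most likely the row-name column. Assuming SAUCIE.py expects a header row of column names followed by purely numeric rows (an inference from this traceback, not from documentation), a sketch that strips the first column and fails early with a readable message if any non-numeric value remains; drop_row_names is a hypothetical helper:

```python
import csv


def drop_row_names(in_path, out_path):
    """Copy a CSV without its first (row-name) column, checking the rest is numeric."""
    with open(in_path, newline="") as src:
        rows = list(csv.reader(src))
    header = rows[0][1:]
    body = [row[1:] for row in rows[1:]]
    for line_no, row in enumerate(body, start=2):
        for value in row:
            try:
                float(value)
            except ValueError:
                raise ValueError(f"non-numeric value {value!r} on line {line_no}") from None
    with open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(header)
        writer.writerows(body)
```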
If I instead try batch correction in a Jupyter notebook, using a dataset made by combining the previous two plus a second NumPy array that indicates the batches, I receive this other error:
saucie.train(load, steps=100,batch_size=256)
ValueError Traceback (most recent call last)
in
----> 1 saucie.train(load, steps=10)
~\Desktop\Tesi\Elaborazione\Clustering SAUCIE\model.py in train(self, load, steps, batch_size)
411 ops = [obn('train_op')]
412
--> 413 self.sess.run(ops, feed_dict=feed)
414
415 def get_loss(self, load, batch_size=256):
c:\users\federico\anaconda3\envs\saucie\lib\site-packages\tensorflow\python\client\session.py in run(self, fetches, feed_dict, options, run_metadata)
893 try:
894 result = self._run(None, fetches, feed_dict, options_ptr,
--> 895 run_metadata_ptr)
896 if run_metadata:
897 proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)
c:\users\federico\anaconda3\envs\saucie\lib\site-packages\tensorflow\python\client\session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
1098 'Cannot feed value of shape %r for Tensor %r, '
1099 'which has shape %r'
-> 1100 % (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
1101 if not self.graph.is_feedable(subfeed_t):
1102 raise ValueError('Tensor %s may not be fed.' % subfeed_t)
ValueError: Cannot feed value of shape (256, 1) for Tensor 'batches:0', which has shape '(?,)'
Do you know how I can fix it?
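The 'batches' placeholder expects a flat vector of shape (?,) (and, per the labels = None report above, dtype int32), but the labels here arrive as a column of shape (256, 1). Flattening them before building the loader should resolve the shape mismatch. A sketch that also maps arbitrary labels, strings included, to consecutive integers; as_batch_vector is a hypothetical helper:

```python
import numpy as np


def as_batch_vector(labels, n_cells):
    """Return batch labels as a flat int32 vector of length n_cells."""
    flat = np.asarray(labels).reshape(-1)  # (n, 1) -> (n,)
    if flat.shape[0] != n_cells:
        raise ValueError(f"expected {n_cells} labels, got {flat.shape[0]}")
    # Map each distinct label to 0..k-1 (in sorted order), so strings work too.
    _, codes = np.unique(flat, return_inverse=True)
    return codes.astype(np.int32)
```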
When I run the 02_Exploratory_analysis_of_single_cell_data_with_SAUCIE.ipynb example, I get the error below at this line:
'data = pca_op.fit_transform(data_raw)'
TypeError Traceback (most recent call last)
in
1 pca_op = sklearn.decomposition.PCA(100)
----> 2 data = pca_op.fit_transform(data_raw)
3 data
/usr/local/lib/python3.6/dist-packages/sklearn/decomposition/_pca.py in fit_transform(self, X, y)
374 C-ordered array, use 'np.ascontiguousarray'.
375 """
--> 376 U, S, V = self._fit(X)
377 U = U[:, :self.n_components_]
378
/usr/local/lib/python3.6/dist-packages/sklearn/decomposition/_pca.py in _fit(self, X)
396
397 X = self._validate_data(X, dtype=[np.float64, np.float32],
--> 398 ensure_2d=True, copy=self.copy)
399
400 # Handle n_components==None
/usr/local/lib/python3.6/dist-packages/sklearn/base.py in _validate_data(self, X, y, reset, validate_separately, **check_params)
418 f"requires y to be passed, but the target y is None."
419 )
--> 420 X = check_array(X, **check_params)
421 out = X
422 else:
/usr/local/lib/python3.6/dist-packages/sklearn/utils/validation.py in inner_f(*args, **kwargs)
70 FutureWarning)
71 kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 72 return f(**kwargs)
73 return inner_f
74
/usr/local/lib/python3.6/dist-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
576 dtype=dtype, copy=copy,
577 force_all_finite=force_all_finite,
--> 578 accept_large_sparse=accept_large_sparse)
579 else:
580 # If np.array(..) gives ComplexWarning, then we convert the warning
/usr/local/lib/python3.6/dist-packages/sklearn/utils/validation.py in _ensure_sparse_format(spmatrix, accept_sparse, dtype, copy, force_all_finite, accept_large_sparse)
351
352 if accept_sparse is False:
--> 353 raise TypeError('A sparse matrix was passed, but dense '
354 'data is required. Use X.toarray() to '
355 'convert to a dense numpy array.')
TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array.
How can I fix this problem?
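The error message already names the fix: data_raw is a SciPy sparse matrix and sklearn's PCA, in the version shown, needs a dense array, so densify it first if memory permits (a dense float64 matrix of 10,000 cells by 20,000 genes takes about 1.6 GB). If that is too large, sklearn.decomposition.TruncatedSVD accepts sparse input directly, though unlike PCA it does not centre the data. A small helper, as a sketch (to_dense is a hypothetical name):

```python
import numpy as np


def to_dense(x):
    """Return a dense ndarray; objects with .toarray() (e.g. scipy.sparse matrices) are densified."""
    if hasattr(x, "toarray"):
        return x.toarray()
    return np.asarray(x)
```

In the notebook this would become data = pca_op.fit_transform(to_dense(data_raw)).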