krishnaswamylab / saucie

98.0 6.0 29.0 5.76 MB

License: MIT License

Python 100.00%

saucie's Issues

Input and Output

Dear author,

Thanks for developing the software! I have a few questions that I could not find clear answers to in your paper:

Is the input a count matrix?
Is the output a count matrix? Has it been normalized by library size, or log2-transformed?

Thanks!

TypeError: unique() got an unexpected keyword argument 'axis'

Updating NumPy to 1.13.3 solved the issue.
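The `axis` keyword of `np.unique` was added in NumPy 1.13, which is why upgrading removes the error. A minimal sketch that checks the installed version before asking for unique rows; `numpy_version_tuple` and `unique_rows` are hypothetical helpers, not SAUCIE functions:

```python
import numpy as np

def numpy_version_tuple():
    # "1.18.1" -> (1, 18); only the major and minor fields are parsed
    major, minor = np.__version__.split(".")[:2]
    return int(major), int(minor)

def unique_rows(x):
    """Return the distinct rows of a 2-D array (needs NumPy >= 1.13)."""
    if numpy_version_tuple() < (1, 13):
        raise RuntimeError("np.unique(..., axis=0) requires NumPy >= 1.13")
    return np.unique(x, axis=0)

x = np.array([[1, 2], [3, 4], [1, 2]])
print(unique_rows(x))  # two distinct rows, sorted
```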

Incompatible environment.

It looks like tensorflow-gpu==1.4.0 requires CUDA 8, but CUDA 8 is not supported on my system (Ubuntu 18.04).

Can you update your work to be compatible with a modern environment, such as tensorflow-gpu==1.14 and Python 3?

Or, can you provide Docker images?

Error in installation and usage of SAUCIE

Hello!
I am trying to install SAUCIE and run it on my data, but software compatibility problems are causing errors in the analysis. TensorFlow 1.12.0 is available for Python 3.6, but scanpy and other packages do not install properly in that environment. In addition, TensorFlow 2.12 raises an error when running this:

tf.reset_default_graph()
#build the SAUCIE model
model = SAUCIE.SAUCIE(n_features)
AttributeError: module 'tensorflow' has no attribute 'placeholder'

Can you please suggest the solution to this. Thanks in advance

License

Hello there, just wondering what type of license you have for the tool. Thank you so much in advance. Many thanks Carmen

How should the input files be prepared for SAUCIE.py? (CSV)

I'm trying to use SAUCIE.py with the --batch_correct argument.

In the train_batch_correction method, it looks like the method expects a series of raw files.
Does that mean each raw file should contain only one sample's RNA-seq data?

If I want to correct batches among 1000 samples, should 1000 CSV files be prepared in the input_dir?

Or one batch per CSV file?

train_batch_correction AttributeError: 'DataFrame' object has no attribute 'as_matrix'

I am attempting to use SAUCIE in Python 3.7.4 with tensorflow 1.4.0. My goal is to batch-correct and cluster CyTOF data. FCS files are the input data and are in folder fcs.

SAUCIE.py --format fcs --input_dir fcs --output_dir out --batch_correct --cluster

Can you tell me why I might be getting this error (which also occurs if I use CSV instead of FCS files)? Below is the terminal output.

Thanks.

P.S. SAUCIE appears to be great software. I really hope it gets some good documentation.

Training batch correction models.
Starting to train 1 batch correction models...
Training model 0
Traceback (most recent call last):
  File "/software/SAUCIE/SAUCIE.py", line 338, in <module>
    train_batch_correction(rawfiles)
  File "/software/SAUCIE/SAUCIE.py", line 128, in train_batch_correction
    raise(ex)
  File "/software/SAUCIE/SAUCIE.py", line 111, in train_batch_correction
    alldata = np.concatenate([refx.as_matrix(), nonrefx.as_matrix()], axis=0)
  File "/lib/python3.7/site-packages/pandas/core/generic.py", line 5274, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'as_matrix'
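`DataFrame.as_matrix()` was deprecated in pandas 0.23 and removed in pandas 1.0, so line 111 of SAUCIE.py fails on current pandas. `DataFrame.to_numpy()` (available since pandas 0.24) or `.values` returns the same array. A sketch of the failing line rewritten that way, with small stand-in DataFrames in place of `refx` and `nonrefx`:

```python
import numpy as np
import pandas as pd

# Stand-ins for the reference and non-reference batches loaded in SAUCIE.py
refx = pd.DataFrame(np.ones((3, 4)), columns=list("abcd"))
nonrefx = pd.DataFrame(np.zeros((2, 4)), columns=list("abcd"))

# Old form (pandas < 1.0 only):
#   alldata = np.concatenate([refx.as_matrix(), nonrefx.as_matrix()], axis=0)
alldata = np.concatenate([refx.to_numpy(), nonrefx.to_numpy()], axis=0)
print(alldata.shape)  # (5, 4)
```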

A Vignette is Needed

First of all, thank you for making all of these cutting edge tools for single-cell data analysis. I am analysing various CyTOF data sets where I have PBMCs from four groups of volunteers that were stimulated under three different conditions. Due to the complex design, I think SAUCIE would be a good option to try.

Unfortunately, the lack of documentation really impedes my attempt to use this tool. For example, I would like to know how you actually made the plots where you compared the cellular manifolds of acute/convalescent dengue patients etc.

In addition to how to use this tool, knowing what to do afterwards would be really helpful.

Thank you for your assistance!

Best regards,
Mikhael

Cluster annotation only "0.0"

Greetings,

I am able to run the program without any errors, but the output annotates every cell in the dataset as cluster "0.0". I noticed that the SAUCIE embeddings, when plotted, form just a diagonal line with very little range across the axes. I then tried changing lambda_c and lambda_d to lower values (from the defaults to 0.001), but the cluster annotations were still all "0.0" and it no longer output SAUCIE embedding values (i.e., the columns were blank).
My dataset consists of ~7700 cells X ~16000 genes in each of 3 samples. Using standard tools like Seurat, the clustering is clear and I have had no issues identifying different cell types using Seurat clustering.
I am running this locally on my mac (OS 10.13) using python 3.5 and tensorflow 1.4.0.

Do you happen to have any suggestions for why this might be happening?
Thanks!

SAUCIE Produces Arbitrary Clusters

Hello,

First off, I want to congratulate Matthew Amodio on getting his paper accepted in Nature Methods. SAUCIE has the potential to make a major impact in CyTOF data analysis and I am very excited to be using this software!

I am characterizing a dataset with SAUCIE that is composed of 780,000 cells with 32 markers. I am currently running the following settings:

saucie = SAUCIE(data.shape[1], layers=[512, 256, 128, 2], layer_c=2, lambda_c=0.1, lambda_d=0.2, limit_gpu_fraction=0.5)

And I am training for 40,000 steps with a minibatch size of 256.
The CyTOF data has been normalized, compensated, debarcoded, and arcsinh-transformed (cofactor 5).

I am concerned that my clustering parameters are not well tuned or appropriate.

My clustering results look like the following:
SAUCIE Clusters on a UMAP Embedding (Pardon the axes, I just noticed this as I uploaded.)

SAUCIE Clusters on the SAUCIE Embedding

The clusters seem arbitrary with respect to the data manifold and ignore major features of the cells.

What am I missing and how can I best optimize SAUCIE to work with my data? What are the recommended optimization steps for SAUCIE?

Thank you!

Brian

ValueError: all the input array dimensions except for the concatenation axis must match exactly

Hi,
I am trying to run SAUCIE and am getting the following error,

Training batch correction models.
Starting to train 10 batch correction models...
Training model 0
SAUCIE.py:111: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead.
  alldata = np.concatenate([refx.as_matrix(), nonrefx.as_matrix()], axis=0)
Traceback (most recent call last):
  File "SAUCIE.py", line 338, in <module>
    train_batch_correction(rawfiles)
  File "SAUCIE.py", line 128, in train_batch_correction
    raise(ex)
  File "SAUCIE.py", line 111, in train_batch_correction
    alldata = np.concatenate([refx.as_matrix(), nonrefx.as_matrix()], axis=0)
ValueError: all the input array dimensions except for the concatenation axis must match exactly

Also, it shows "Training model 0". A sample of the expected input format and example data would be helpful. Here is the command I used:
python SAUCIE.py --input_dir sample --output_dir sample_out --cluster
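`np.concatenate(..., axis=0)` stacks the cells of two files row-wise, so it fails whenever the files have different numbers of columns (markers or genes). One way to check and fix this before running SAUCIE.py is to restrict every file to the columns they share, in a common order. A minimal pandas sketch; `align_columns` is a hypothetical helper, not part of SAUCIE, and the marker names are placeholders:

```python
import numpy as np
import pandas as pd

def align_columns(frames):
    """Keep only the columns present in every DataFrame, in the first frame's order."""
    shared = [c for c in frames[0].columns if all(c in f.columns for f in frames[1:])]
    dropped = {i: sorted(set(f.columns) - set(shared)) for i, f in enumerate(frames)}
    return [f[shared] for f in frames], dropped

ref = pd.DataFrame(np.ones((2, 3)), columns=["CD3", "CD19", "Time"])
nonref = pd.DataFrame(np.zeros((4, 2)), columns=["CD19", "CD3"])

(ref_a, nonref_a), dropped = align_columns([ref, nonref])
alldata = np.concatenate([ref_a.to_numpy(), nonref_a.to_numpy()], axis=0)
print(alldata.shape, dropped)  # (6, 2) {0: ['Time'], 1: []}
```

Printing `dropped` makes it visible which columns were discarded, so a misspelled marker name does not silently disappear.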

The activation of 'h2' is 'sigmoid' instead of 'lrelu'. Why?

SAUCIE/model.py

Line 154 in c2e5968

 h2 = tf.layers.dense(h1, self.layers[1], activation=tf.nn.sigmoid, name='encoder1') 

"Percent clustered" appears to actually be a proportion

---- Num clusters: 9 ---- Percent clustered: 1.000 ----

Inconsistent paper diagram and model description

In your paper, you have the following diagram, which seems to suggest that ID regularization is done prior to the last layer of the decoder.

However, in the paper, you also mention: "The ID regularization was applied to the final decoder layer, which uses a rectified linear unit." So is the ID regularization applied to the last layer or to the layer before it?

Reproducibility and Seed

Hi,

I'm wondering if there's a way to fix the seed. I've already tried fixing the seeds for TensorFlow and NumPy, but when I rerun the analysis with the same parameters I get a different number of clusters.

Another topic related to reproducibility is a Docker image of this tool, which would be useful for running the analysis without warnings and errors.

Best regards,

Simone
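In TensorFlow 1.x graph code, the graph-level seed belongs to the default graph, so a seed set before `tf.reset_default_graph()` is lost when the graph is reset. A sketch of a helper that seeds Python, NumPy and TensorFlow together; `set_all_seeds` is hypothetical, the call order in the comment assumes SAUCIE builds its model in the default graph, and GPU kernels may stay nondeterministic even with fixed seeds:

```python
import random

import numpy as np

def set_all_seeds(seed):
    """Seed Python, NumPy and, if installed, TensorFlow with the same value."""
    random.seed(seed)
    np.random.seed(seed)
    try:
        import tensorflow as tf
    except ImportError:
        return  # TensorFlow not installed; nothing more to seed
    if hasattr(tf, "set_random_seed"):
        tf.set_random_seed(seed)  # TensorFlow 1.x: seed of the current default graph
    else:
        tf.random.set_seed(seed)  # TensorFlow 2.x

# Assumed order in a TensorFlow 1.x script:
#   tf.reset_default_graph()
#   set_all_seeds(42)
#   saucie = SAUCIE.SAUCIE(n_features)
```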

About 'sinh'

1. The cell-gene matrix from the same batch should be stored in the same '.csv' file located in 'input_dir'. Right?
2. The values of the cell-gene matrix stored in the '.csv' represent raw counts, because the 'get_data' function transforms them with 'math.asinh'. Right?
3. Why doesn't the 'output_batch_correction' function apply 'sinh' to get the imputed raw count matrix?

SAUCIE/SAUCIE.py

Line 159 in c2e5968

#recon = sinh(recon)
If point 3 holds, the data used for clustering will be transformed by 'math.asinh' twice: once in the batch correction stage and again in the cluster stage.
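The arcsinh transform with a cofactor is invertible: multiplying `sinh(y)` by the cofactor recovers the original values, while applying arcsinh a second time compresses the range further. A minimal NumPy sketch of both; the cofactor of 5 is an assumption (a common CyTOF choice) and should match the `scale` used in SAUCIE's `utils.py`:

```python
import numpy as np

COFACTOR = 5.0  # assumed; use the same value as `scale` in SAUCIE's utils.asinh

def asinh_transform(x, cofactor=COFACTOR):
    # Vectorised equivalent of np.vectorize(lambda y: math.asinh(y / scale))
    return np.arcsinh(np.asarray(x, dtype=float) / cofactor)

def sinh_inverse(y, cofactor=COFACTOR):
    # Undo asinh_transform: sinh(arcsinh(x / c)) * c == x
    return np.sinh(y) * cofactor

raw = np.array([0.0, 10.0, 1000.0])
once = asinh_transform(raw)
twice = asinh_transform(once)  # transformed data transformed again
print(np.allclose(sinh_inverse(once), raw))  # True
print(twice.max() < once.max())              # True: the range shrinks a second time
```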

Unique cluster

Hi, I am trying to use SAUCIE to find clusters. I can run the code, but it finds only one cluster. I checked, and after log-transforming the data the maximum is near +11 and the minimum near -4. Does anyone know why this could happen?

Turn SAUCIE into python package?

Hi,

great stuff! I was wondering if you would like to (or are already working to) turn SAUCIE into a proper Python package (with a proper setup.py, eventually published on PyPI)? I would be happy to help with it.

Cheers

Compatibility Issues

Hi,

I tried to install SAUCIE, but ran into the issues mentioned in Issue #32. I found that the workaround mentioned in #23 generally works. However, with that I ran into compatibility issues. The only solution that I found to work is here. I think SAUCIE needs to be updated to reflect the

Preprocessing of input FCS files

The FCS files of my CyTOF data have variables (columns) of different types:

for immune cell lineage markers like CD3 and CD19
for cell state markers like Ki67 and pAkt
for other variables like 'Time' and 'Event length'

I want to use SAUCIE for both batch correction and clustering. For clustering, I want to use only the cell lineage markers.

Should I preprocess the FCS files to remove any column that is not pertinent for clustering (such as for 'Time' and cell status markers)?
Should the data be processed in any other way, such as log2-transformation or arcsinh transformation? Raw data values in the files range from 0 to >15,000.

Thanks.
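One way to give batch correction and clustering different inputs is to select the columns explicitly with pandas before writing the files. This sketch only selects columns; whether to transform as well depends on the entry point, since the "About 'sinh'" issue reports that SAUCIE.py's `get_data` already applies asinh. The channel names are placeholders, and `select_columns` is a hypothetical helper:

```python
import pandas as pd

# Placeholder channel names; replace them with the names in your FCS files
LINEAGE = ["CD3", "CD19", "CD4", "CD8"]
NON_MARKER = ["Time", "Event_length"]

def select_columns(df, columns):
    """Return the requested columns in order, failing loudly if any are missing."""
    missing = [c for c in columns if c not in df.columns]
    if missing:
        raise KeyError("columns not found: %s" % missing)
    return df[columns]

events = pd.DataFrame({
    "Time": [1, 2], "Event_length": [20, 22],
    "CD3": [150.0, 3.0], "CD19": [2.0, 900.0], "CD4": [80.0, 1.0], "CD8": [5.0, 0.0],
    "Ki67": [10.0, 40.0],
})

# Batch correction: all markers, acquisition metadata removed
batch_input = events.drop(columns=NON_MARKER)
# Clustering: lineage markers only
clustering_input = select_columns(events, LINEAGE)
print(list(clustering_input.columns))  # ['CD3', 'CD19', 'CD4', 'CD8']
```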

How should I "provide each point's batch as a label"?

I am stuck at batch correction; please advise how to proceed. Thanks!

My code:

x1 = np.concatenate([np.random.uniform(-3, -2, (1000, 40)), np.random.uniform(2, 3, (1000, 40))], axis=0)
x2 = np.concatenate([np.random.uniform(-3, -2, (1000, 40)), np.random.uniform(2, 3, (1000, 40))], axis=0)
x = np.concatenate([x1,x2],axis=0)
load = SAUCIE.Loader(x, shuffle=False)
saucie = SAUCIE.SAUCIE(x.shape[1], lambda_b=.1)
labels=[0 for i in range(2000)]+[1 for i in range(2000)]
saucie.batches=np.array(labels)
saucie.train(load, 100)

Error:

Exception Traceback (most recent call last)
in
----> 1 saucie.train(load, 100)
2 embedding = saucie.get_embedding(load)
3 num_clusters, clusters = saucie.get_clusters(load)

/hpcdata/bcbb/yunhua/F20/SAUCIE/model.py in train(self, load, steps, batch_size)
407 # if using batch-correction, must have labels
408 if (self.lambda_b and len(batch) < 2):
--> 409 raise Exception("If using lambda_b (batch correction), you must provide each point's batch as a label")
410
411 ops = [obn('train_op')]

Exception: If using lambda_b (batch correction), you must provide each point's batch as a label
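The check in `train()` inspects what the loader yields (`len(batch) < 2`), so assigning `saucie.batches` after creating the loader does not satisfy it; the labels apparently have to travel with the data. A sketch assuming `SAUCIE.Loader` accepts a `labels` argument (the `labels = None` usage reported in another issue suggests so; verify against your copy of model.py). The labels are built as a 1-D integer array with one entry per row; `make_batch_labels` is a hypothetical helper:

```python
import numpy as np

def make_batch_labels(batch_sizes):
    """One integer batch id per row: sizes [2000, 2000] -> 2000 zeros, then 2000 ones."""
    return np.repeat(np.arange(len(batch_sizes)), batch_sizes).astype(np.int32)

x1 = np.random.uniform(-3, -2, (2000, 40))
x2 = np.random.uniform(2, 3, (2000, 40))
x = np.concatenate([x1, x2], axis=0)
labels = make_batch_labels([len(x1), len(x2)])
print(labels.shape, labels.dtype)  # (4000,) int32

# Assumed usage (requires SAUCIE and TensorFlow 1.x; not executed here):
#   load = SAUCIE.Loader(x, labels=labels, shuffle=False)
#   saucie = SAUCIE.SAUCIE(x.shape[1], lambda_b=.1)
#   saucie.train(load, 100)
```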

Batch correction

Hi,

I'm very new to TensorFlow. It seems to me that the example on the page does not deal with batch correction. I tried to include a label array and set lambda_b to a value between 0 and 1, but I still couldn't get it right: the values in the embedding output became non-numeric. Can anyone post an example that handles batch correction?

Thanks in advance for your help.

Allen

Better documentation required.

Hi,

I am trying to run SAUCIE. From the code, it turns out there is no parameter --fcs. There is a parameter called --format, but it is undocumented. Similarly, it is not entirely clear what the lambda values are. Any elaboration would be very useful.

Best,
Sameet

Can you supply the relevant evaluation index code?

Hello, I recently read your paper and it is excellent work. However, I cannot find the evaluation code for each experiment, such as the mix score. Could you share the code for these metrics? Thanks.

Is data normalization not required?

Does SAUCIE not require any normalization of the data at all (e.g., TMM, RPKM, TPM)? Does it just take raw counts?
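If you choose to normalize scRNA-seq counts before running SAUCIE, a widely used option is library-size normalization followed by a log transform. This is a sketch of that generic preprocessing, not a documented SAUCIE requirement; the target sum of 10,000 and `log1p` are conventions, and `library_size_normalize` is a hypothetical helper:

```python
import numpy as np

def library_size_normalize(counts, target_sum=1e4):
    """Scale each cell (row) to the same total count, then apply log(1 + x)."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # leave empty cells at zero instead of dividing by zero
    return np.log1p(counts / totals * target_sum)

counts = np.array([[1, 3, 0], [10, 30, 0], [0, 0, 0]])
norm = library_size_normalize(counts)
print(np.allclose(norm[0], norm[1]))  # True: same composition, different depth
```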

Uninteresting results - additional parameters?

I ran the default python SAUCIE.py (default settings) on two datasets (with clustering and batch correction), which gave only "0.0" cluster annotation for all cells. The SAUCIE embedding was just a linear embedding of all the cells (pretty much on the y=x line). Are there additional parameters that must be tuned, i.e. the lambdas? What are they?

Negative value in the imputation result.

Dear author,

I have applied SAUCIE, which is a wonderful tool, to my scRNA-seq data. However, the imputation result contains some negative values, ranging from -1e-01 to -1e-10, even though the data I feed into the model has no negative values.

Could I set these small negative values to 0, or is there something wrong in my code?

Thanks.

import SAUCIE
import numpy as np
import tensorflow as tf

tf.reset_default_graph()

# `data` is a pandas DataFrame (cells x genes) loaded earlier
data = data.values
saucie = SAUCIE.SAUCIE(data.shape[1])
loadtrain = SAUCIE.Loader(data, shuffle=True)
saucie.train(loadtrain, steps=1000)

# Evaluate on an unshuffled loader so outputs stay in the original row order
loadeval = SAUCIE.Loader(data, shuffle=False)
embedding = saucie.get_embedding(loadeval)
number_of_clusters, clusters = saucie.get_clusters(loadeval)
reconstruction = saucie.get_reconstruction(loadeval)
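If the downstream analysis needs non-negative values, clipping the reconstruction at zero is a simple post-processing step on the user's side (it is not a SAUCIE option). Counting the affected entries first helps judge whether the negatives are negligible; `clip_negative` is a hypothetical helper and the array is a stand-in for `saucie.get_reconstruction(...)`:

```python
import numpy as np

def clip_negative(reconstruction):
    """Return (clipped, n_negative): negatives set to 0, plus how many there were."""
    recon = np.asarray(reconstruction, dtype=float)
    n_negative = int((recon < 0).sum())
    return np.clip(recon, 0.0, None), n_negative

recon = np.array([[0.5, -1e-10], [-0.1, 2.0]])  # stand-in for the real reconstruction
clipped, n_negative = clip_negative(recon)
print(n_negative, clipped.min())  # 2 0.0
```

If the negative values are large relative to the data scale, clipping hides them rather than explaining them, so it is worth inspecting them first.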

Problems while installing

Hi,
I was trying to install SAUCIE as given in the Readme. However, I ran into the following:

   (CytofWorkflow)[sm2556@c16n09 WNV-SMART_tube_data]$ pip3 install -r SAUCIE/requirements.txt --upgrade
Collecting tensorflow==1.12 (from -r SAUCIE/requirements.txt (line 1))
  Could not find a version that satisfies the requirement tensorflow==1.12 (from -r SAUCIE/requirements.txt (line 1)) (from versions: 1.13.0rc1, 1.13.0rc2, 1.13.1, 1.13.2, 1.14.0rc0, 1.14.0rc1, 1.14.0, 2.0.0a0, 2.0.0b0, 2.0.0b1)
No matching distribution found for tensorflow==1.12 (from -r SAUCIE/requirements.txt (line 1))

This is especially strange, as I was able to use it just fine just last week. This error makes it look like tensorFlow version 1.12 never existed!
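pip only lists releases that have a wheel for the running interpreter. The list starting at 1.13.0rc1 suggests the environment runs Python 3.7, while the TensorFlow 1.12 wheels on PyPI appear to target Python 3.6 and earlier (an assumption worth confirming on the PyPI release page). A stdlib sketch that checks the interpreter before installing; `can_install_tf_1_12` and the version bound are hypothetical:

```python
import sys

# Assumption: TensorFlow 1.12 wheels on PyPI exist only for Python <= 3.6
TF_1_12_MAX_PYTHON = (3, 6)

def can_install_tf_1_12(version_info=sys.version_info):
    """Return True if major.minor does not exceed the assumed upper bound."""
    return tuple(version_info[:2]) <= TF_1_12_MAX_PYTHON

if not can_install_tf_1_12():
    print("Python %d.%d: create a Python 3.6 environment for tensorflow==1.12"
          % sys.version_info[:2])
```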

Another requirement

It looks like the module fcsparser is another requirement of this program. Possibly add it to the README.
https://github.com/eyurtsev/fcsparser

Also fcswrite:
https://pypi.org/project/fcswrite/

pip install fcswrite
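A quick way to see which of these modules are missing from the active environment before running SAUCIE.py. The module names are the ones mentioned across these issues, not an official requirements list, and `missing_modules` is a hypothetical helper:

```python
import importlib.util

# Import names mentioned in the issues; not an official SAUCIE requirements list
MODULES = ["numpy", "pandas", "tensorflow", "fcsparser", "fcswrite", "sklearn", "matplotlib"]
# pip package names that differ from the import name
PIP_NAMES = {"sklearn": "scikit-learn"}

def missing_modules(names=MODULES):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_modules()
if missing:
    print("pip install " + " ".join(PIP_NAMES.get(name, name) for name in missing))
```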

Tensorflow 1.12 Deprecation Warning

WARNING:tensorflow:From ./SAUCIE/model.py:342: calling reduce_sum (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead

[info] Use with Tensorflow 2

This is not an Issue, just a brief on how I was able to use SAUCIE (master version as of today) with current versions of Tensorflow (2.1.0) and Numpy (1.18.1). It may be helpful to others.

System

CentOS Linux 7 with Linux 3.10.0-514.10.2.el7.x86_64
Conda 4.8.3 with Python 3.7.4

Installation of required packages using Conda

SAUCIE requires certain packages. Here, the packages are installed in the Conda environment that I will use to run SAUCIE.

conda activate base
conda create -n tensorflo fcsparser numpy pandas python tensorflow 
conda activate tensorflo
pip install -U fcswrite matplotlib scikit-learn tensorboard

Upgrade SAUCIE's Tensorflow 1 code to work with Tensorflow 2

The tf_upgrade_v2 Python script installed during Tensorflow installation (above) is used. All SAUCIE code files are processed and, with any necessary changes, are put in a new folder that the script creates.

miniconda3/envs/tensorflo/bin/tf_upgrade_v2 --intree SAUCIE/ --outtree SAUCIEnew/

Modify the new SAUCIE.py file

This is for compatibility with recent pandas, where DataFrame.as_matrix() was removed (in pandas 1.0). Replace the four instances of .as_matrix() with .to_numpy().
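The replacement can also be scripted. A minimal sketch that rewrites argument-free `.as_matrix()` calls (the form seen in the SAUCIE tracebacks) to `.to_numpy()`; `convert_source` and `replace_as_matrix` are hypothetical helpers, meant to be run on the converted copy rather than the original checkout:

```python
import re
from pathlib import Path

AS_MATRIX = re.compile(r"\.as_matrix\(\)")

def convert_source(text):
    """Return (new_text, count) with every `.as_matrix()` rewritten to `.to_numpy()`."""
    return AS_MATRIX.subn(".to_numpy()", text)

def replace_as_matrix(path):
    """Apply convert_source to a file in place and return the number of replacements."""
    file = Path(path)
    updated, count = convert_source(file.read_text())
    if count:
        file.write_text(updated)
    return count

# Usage on the converted copy (path from the commands above):
#   print(replace_as_matrix("SAUCIEnew/SAUCIE.py"))
```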

Use the new SAUCIE

E.g.:

python SAUCIEnew/SAUCIE.py --format fcs --input_dir fcs --output_dir out --batch_correct --cluster

Versions of installed software in the Conda environment

fcsparser 0.2.0
fcswrite 0.5.2
matplotlib 3.2.1
numpy 1.18.1
opt-einsum 3.2.1
pandas 1.0.3
pip 20.0.2
python 3.7.6
scipy 1.4.1
tensorboard 2.2.1
tensorflow 2.1.0

There's a warning that may cause slower training time.

During batch correction, I got a warning like the one below:

/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py:96: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory. "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
The message appears just after the 'Training model 0' log line and does not appear during training of the subsequent models.

My environment is,

Tensorflow 1.14.0
python2.7
and the data contains almost 20,000 features.

How can I resolve this warning?

Add examples

It would be nice to have examples of how to run this with sample data. I was trying to run it with input.csv and cols_to_use.txt.

I tried:
python ./SAUCIE-master/SAUCIE.py --input_dir ./input --output_dir ./output
I get:

Finished training models and outputing data!

Except nothing is output.

If I try:
python ./SAUCIE-master/SAUCIE.py --input_dir ./input --output_dir ./output --batch_correct --cluster
I get:

Found 0 batch-corrected models (out of 0 total models)
Found batch correction models.

Found 0 batch-corrected files (out of 1 total files)
Outputing batch corrected data.
Starting to output 1 batch corrected files...
Training cluster model.
C:\Users\kqwt693\AppData\Local\Continuum\Anaconda3\lib\site-packages\h5py\__init__.py:34: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Traceback (most recent call last):
  File "./SAUCIE-master/SAUCIE.py", line 359, in <module>
    train_cluster(input_files)
  File "./SAUCIE-master/SAUCIE.py", line 221, in train_cluster
    raise(ex)
  File "./SAUCIE-master/SAUCIE.py", line 202, in train_cluster
    x = get_data(inputfiles[0], sample=2)
IndexError: list index out of range

The only one that output something was:

python ./SAUCIE-master/SAUCIE.py --input_dir ./input --output_dir ./output --cluster

It would be nice to have instructions/tutorial on what to do next with this output.

Analyzing unlabeled data raises error.

Hello,

I am trying the script with labels = None, which raises the following error:

You must feed a value for placeholder tensor 'batches' with dtype int32 and shape [?]

Is it correct to set labels to None for unlabeled data?
Thank you in advance!

error during batch correction

Good morning, I am trying to run batch correction. The input directory contains 2 CSV files in which the first column holds the row index and the first row holds the column names. The gene columns are the same in both datasets.

python SAUCIE.py --input_dir input --output_dir output --batch_correct

Training batch correction models.
Traceback (most recent call last):
  File "SAUCIE.py", line 338, in <module>
    train_batch_correction(rawfiles)
  File "SAUCIE.py", line 128, in train_batch_correction
    raise(ex)
  File "SAUCIE.py", line 103, in train_batch_correction
    refx = get_data(ref)
  File "SAUCIE.py", line 74, in get_data
    newvals = asinh(x)
  File "C:\Users\Federico\Desktop\Tesi\Elaborazione\Clustering SAUCIE\utils.py", line 8, in asinh
    return f(x)
  File "C:\Users\Federico\Anaconda3\envs\Saucie\lib\site-packages\numpy\lib\function_base.py", line 2739, in __call__
    return self._vectorize_call(func=func, args=vargs)
  File "C:\Users\Federico\Anaconda3\envs\Saucie\lib\site-packages\numpy\lib\function_base.py", line 2809, in _vectorize_call
    ufunc, otypes = self._get_ufunc_and_otypes(func=func, args=args)
  File "C:\Users\Federico\Anaconda3\envs\Saucie\lib\site-packages\numpy\lib\function_base.py", line 2769, in _get_ufunc_and_otypes
    outputs = func(*inputs)
  File "C:\Users\Federico\Desktop\Tesi\Elaborazione\Clustering SAUCIE\utils.py", line 7, in <lambda>
    f = np.vectorize(lambda y: math.asinh(y / scale))
TypeError: unsupported operand type(s) for /: 'str' and 'float'

Do you know how I can fix it?
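The traceback shows `asinh` receiving a string, which is what happens when a non-numeric column, such as a row-name column, is read as data. A pandas sketch for checking a CSV and, if needed, writing a numeric-only copy; whether SAUCIE.py expects a header row is not confirmed here, and the cell and gene names are placeholders:

```python
import io

import pandas as pd

# Placeholder CSV: first column holds row names, first row holds gene names
csv_text = "cell,GeneA,GeneB\ncell1,0,5\ncell2,3,1\n"

# Read naively, the row-name column becomes a string data column
raw = pd.read_csv(io.StringIO(csv_text))
print(raw.select_dtypes(exclude="number").columns.tolist())  # ['cell']

# index_col=0 turns the first column into the row index instead
df = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(df.select_dtypes(exclude="number").columns.tolist())  # []

# Hypothetical cleanup step: save a numeric-only copy without row names
out = io.StringIO()
df.to_csv(out, index=False)
print(out.getvalue().splitlines()[0])  # GeneA,GeneB
```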

If I try batch correction in a Jupyter notebook, using a dataset made by combining the previous two and a second NumPy array that indicates the batches, I receive this other error:

saucie.train(load, steps=100,batch_size=256)

ValueError Traceback (most recent call last)
in
----> 1 saucie.train(load, steps=10)

~\Desktop\Tesi\Elaborazione\Clustering SAUCIE\model.py in train(self, load, steps, batch_size)
411 ops = [obn('train_op')]
412
--> 413 self.sess.run(ops, feed_dict=feed)
414
415 def get_loss(self, load, batch_size=256):

c:\users\federico\anaconda3\envs\saucie\lib\site-packages\tensorflow\python\client\session.py in run(self, fetches, feed_dict, options, run_metadata)
893 try:
894 result = self._run(None, fetches, feed_dict, options_ptr,
--> 895 run_metadata_ptr)
896 if run_metadata:
897 proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

c:\users\federico\anaconda3\envs\saucie\lib\site-packages\tensorflow\python\client\session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
1098 'Cannot feed value of shape %r for Tensor %r, '
1099 'which has shape %r'
-> 1100 % (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
1101 if not self.graph.is_feedable(subfeed_t):
1102 raise ValueError('Tensor %s may not be fed.' % subfeed_t)

ValueError: Cannot feed value of shape (256, 1) for Tensor 'batches:0', which has shape '(?,)'

Do you know how I can fix it?
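The `batches` placeholder has shape `(?,)`, one integer per cell, while the fed labels have shape `(256, 1)`, a column vector. Flattening the label array before building the loader should produce the expected shape. A NumPy sketch; the `Loader(..., labels=...)` call in the comment is an assumption to check against model.py:

```python
import numpy as np

# A column vector like this (e.g. from df[["batch"]].values) has shape (n, 1)
batch_column = np.array([[0], [0], [1], [1]])

# Flatten to shape (n,) and match the int32 dtype of the 'batches' placeholder
labels = np.asarray(batch_column).ravel().astype(np.int32)
print(batch_column.shape, labels.shape)  # (4, 1) (4,)

# Assumed usage: load = SAUCIE.Loader(data, labels=labels, shuffle=True)
```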

Problem with Preparing the data

I ran the 02_Exploratory_analysis_of_single_cell_data_with_SAUCIE.ipynb example and got the error below at this line:
'data = pca_op.fit_transform(data_raw)'
TypeError Traceback (most recent call last)
in
1 pca_op = sklearn.decomposition.PCA(100)
----> 2 data = pca_op.fit_transform(data_raw)
3 data

/usr/local/lib/python3.6/dist-packages/sklearn/decomposition/_pca.py in fit_transform(self, X, y)
374 C-ordered array, use 'np.ascontiguousarray'.
375 """
--> 376 U, S, V = self.fit(X)
377 U = U[:, :self.n_components]
378

/usr/local/lib/python3.6/dist-packages/sklearn/decomposition/_pca.py in _fit(self, X)
396
397 X = self._validate_data(X, dtype=[np.float64, np.float32],
--> 398 ensure_2d=True, copy=self.copy)
399
400 # Handle n_components==None

/usr/local/lib/python3.6/dist-packages/sklearn/base.py in _validate_data(self, X, y, reset, validate_separately, **check_params)
418 f"requires y to be passed, but the target y is None."
419 )
--> 420 X = check_array(X, **check_params)
421 out = X
422 else:

/usr/local/lib/python3.6/dist-packages/sklearn/utils/validation.py in inner_f(*args, **kwargs)
70 FutureWarning)
71 kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 72 return f(**kwargs)
73 return inner_f
74

/usr/local/lib/python3.6/dist-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
576 dtype=dtype, copy=copy,
577 force_all_finite=force_all_finite,
--> 578 accept_large_sparse=accept_large_sparse)
579 else:
580 # If np.array(..) gives ComplexWarning, then we convert the warning

/usr/local/lib/python3.6/dist-packages/sklearn/utils/validation.py in _ensure_sparse_format(spmatrix, accept_sparse, dtype, copy, force_all_finite, accept_large_sparse)
351
352 if accept_sparse is False:
--> 353 raise TypeError('A sparse matrix was passed, but dense '
354 'data is required. Use X.toarray() to '
355 'convert to a dense numpy array.')

TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array.
How can I fix this problem?
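Here `data_raw` is a SciPy sparse matrix, and this PCA call requires dense input. For a matrix that fits in memory, converting with `.toarray()`, as the error message suggests, is the direct fix; for very large matrices, `sklearn.decomposition.TruncatedSVD` accepts sparse input without densifying. A sketch of the conversion with a small stand-in matrix; `to_dense` is a hypothetical helper:

```python
import numpy as np
import scipy.sparse as sp

def to_dense(matrix):
    """Return a dense NumPy array whether `matrix` is sparse or already dense."""
    return matrix.toarray() if sp.issparse(matrix) else np.asarray(matrix)

data_raw = sp.csr_matrix(np.array([[0.0, 1.0], [2.0, 0.0]]))  # stand-in for the notebook's matrix
dense = to_dense(data_raw)
print(type(dense).__name__, dense.shape)  # ndarray (2, 2)

# Then, as in the notebook:
#   pca_op = sklearn.decomposition.PCA(100)
#   data = pca_op.fit_transform(dense)
```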
