
cell2location's Introduction

Comprehensive mapping of tissue cell architecture via integrated single cell and spatial transcriptomics (cell2location model)


If you use cell2location please cite our paper:

Kleshchevnikov, V., Shmatko, A., Dann, E. et al. Cell2location maps fine-grained cell types in spatial transcriptomics. Nat Biotechnol (2022). https://doi.org/10.1038/s41587-021-01139-4 https://www.nature.com/articles/s41587-021-01139-4

Please note that cell2location requires 2 user-provided hyperparameters (N_cells_per_location and detection_alpha). For detailed guidance on setting these hyperparameters and their impact, see the flow diagram and the note. Many real datasets (especially human) show within-slide variability in RNA detection sensitivity, so you should try both recommended settings of the detection_alpha parameter: detection_alpha=200 for low within-slide technical variability and detection_alpha=20 for high within-slide technical variability.
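As a small sketch of how the two recommended settings could be organised, the snippet below builds one set of keyword arguments per detection_alpha value. The helper name and the example value of N_cells_per_location are illustrative assumptions, not part of the package.

```python
# Hypothetical helper (not part of cell2location): collects the two
# recommended detection_alpha settings quoted in the note above.
def detection_alpha_settings(n_cells_per_location):
    """Return one keyword-argument dict per recommended detection_alpha value."""
    recommended = {
        "low within-slide technical variability": 200,
        "high within-slide technical variability": 20,
    }
    return {
        label: {
            "N_cells_per_location": n_cells_per_location,
            "detection_alpha": alpha,
        }
        for label, alpha in recommended.items()
    }


# N_cells_per_location=8 is only a placeholder; choose it from your tissue.
settings = detection_alpha_settings(n_cells_per_location=8)
for label, kwargs in settings.items():
    print(label, kwargs)
    # Each dict could then be passed to the model constructor, e.g.
    # cell2location.models.Cell2location(adata_vis, cell_state_df=inf_aver, **kwargs)
```

Training one model per setting and comparing the resulting maps is one way to follow the advice to try both values.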

Cell2location is a principled Bayesian model that can resolve fine-grained cell types in spatial transcriptomic data and create comprehensive cellular maps of diverse tissues. Cell2location accounts for technical sources of variation and borrows statistical strength across locations, thereby enabling the integration of single cell and spatial transcriptomics with higher sensitivity and resolution than existing tools. This is achieved by estimating which combination of cell types, and at what cell abundance, could have produced the mRNA counts in the spatial data, while modelling technical effects (platform/technology effect, contaminating RNA, unexplained variance).

Overview of the spatial mapping approach and the workflow enabled by cell2location. From left to right: Single-cell RNA-seq and spatial transcriptomics profiles are generated from the same tissue (1). Cell2location takes scRNA-seq derived cell type reference signatures and spatial transcriptomics data as input (2, 3). The model then decomposes spatially resolved multi-cell RNA counts matrices into the reference signatures, thereby establishing a spatial mapping of cell types (4).
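To give an intuition for the decomposition step, here is a toy sketch that recovers cell abundances at a single location from two made-up reference signatures. It uses plain non-negative least squares with multiplicative updates, which is a deliberately simplified stand-in and not the Bayesian cell2location model. All numbers and the function name are assumptions for illustration.

```python
def estimate_abundance(signatures, counts, n_iter=500):
    """Toy non-negative least squares via multiplicative updates.

    signatures: one expression vector per cell type (one value per gene)
    counts: observed counts per gene at a single location
    """
    n_types = len(signatures)
    n_genes = len(counts)
    # Precompute G^T d and G^T G, where G has one column per cell type
    gtd = [sum(signatures[k][g] * counts[g] for g in range(n_genes))
           for k in range(n_types)]
    gtg = [[sum(signatures[k][g] * signatures[j][g] for g in range(n_genes))
            for j in range(n_types)] for k in range(n_types)]
    w = [1.0] * n_types  # positive starting point keeps every update positive
    for _ in range(n_iter):
        gtgw = [sum(gtg[k][j] * w[j] for j in range(n_types))
                for k in range(n_types)]
        w = [w[k] * gtd[k] / gtgw[k] for k in range(n_types)]
    return w


# Made-up signatures for two cell types across four genes
signatures = [[10, 1, 0, 2], [0, 3, 8, 2]]
# Counts generated from 3 cells of type A and 5 cells of type B, no noise
counts = [3 * a + 5 * b for a, b in zip(*signatures)]
abundance = estimate_abundance(signatures, counts)
print([round(x, 3) for x in abundance])
```

The real model additionally handles count noise, technical effects and sharing of information across locations, none of which this sketch attempts.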

Usage and Tutorials

The tutorial covering the estimation of expression signatures of reference cell types, spatial mapping with cell2location and the downstream analysis can be found here and tried on Google Colab: https://cell2location.readthedocs.io/en/latest/

Please report bugs via https://github.com/BayraktarLab/cell2location/issues and ask any usage questions about cell2location, scvi-tools or Visium data in scverse community discourse.

The cell2location package is implemented in a general way (using https://pyro.ai/ and https://scvi-tools.org/) to support multiple related models for spatial mapping, estimating reference cell type signatures, and downstream analysis.

Installation

We suggest using a separate conda environment for installing cell2location.

Create conda environment and install cell2location package

conda create -y -n cell2loc_env python=3.9

conda activate cell2loc_env
pip install cell2location[tutorials]

Finally, to use this environment in jupyter notebook, add jupyter kernel for this environment:

conda activate cell2loc_env
python -m ipykernel install --user --name=cell2loc_env --display-name='Environment (cell2loc_env)'

If you do not have conda please install Miniconda first:

cd /path/to/software
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
# use prefix /path/to/software/miniconda3

Before installing cell2location and its dependencies, it may be necessary to make sure that you are creating a fully isolated conda environment by telling python NOT to use the user site for installing packages. Run this line before creating the conda environment and every time before activating the conda environment in a new terminal session:

export PYTHONNOUSERSITE="literallyanyletters"
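To confirm from inside Python that the setting took effect, a quick check along these lines may help. It uses only the standard library, and the function name is a hypothetical choice for this sketch.

```python
import os
import site
import sys


def user_site_is_disabled():
    """True when Python was started with the user site directory disabled,
    for example because PYTHONNOUSERSITE was set to a non-empty string."""
    return bool(sys.flags.no_user_site)


print("PYTHONNOUSERSITE =", os.environ.get("PYTHONNOUSERSITE"))
print("user site disabled:", user_site_is_disabled())
# Location that would be searched if the user site were enabled
print("user site path:", site.getusersitepackages())
```

If the second line prints False in a session where you expected isolation, the variable was probably exported after Python had already started, or in a different shell.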

Documentation and API details

User documentation is available at https://cell2location.readthedocs.io/en/latest/.

The cell2location architecture is designed to simplify building extended versions of the model that account for additional technical and biological information. We plan to provide a tutorial showing how to add new model classes, but please get in touch if you would like to contribute or build on top of our package.

Acknowledgements

We thank all paper authors for their contributions: Vitalii Kleshchevnikov, Artem Shmatko, Emma Dann, Alexander Aivazidis, Hamish W King, Tong Li, Artem Lomakin, Veronika Kedlian, Mika Sarkin Jain, Jun Sung Park, Lauma Ramona, Liz Tuck, Anna Arutyunyan, Roser Vento-Tormo, Moritz Gerstung, Louisa James, Oliver Stegle, Omer Ali Bayraktar

We also thank Pyro developers (Fritz Obermeyer, Martin Jankowiak), Krzysztof Polanski, Luz Garcia Alonso, Carlos Talavera-Lopez, Ni Huang for feedback on the package, Martin Prete for dockerising cell2location and other software support.

FAQ

See https://github.com/BayraktarLab/cell2location/discussions

Future development and experimental features

Future developments of cell2location are focused on 1) scalability to 100k-mln+ locations using amortised inference of cell abundance (same ideas as used in VAE), 2) extending cell2location to related spatial analysis tasks that require modification of the model (such as using cell type hierarchy information), and 3) incorporating features presented by more recently proposed methods (such as CAR spatial proximity modelling). We are also experimenting with Numpyro and JAX (https://github.com/vitkl/cell2location_numpyro).

Tips

Conda environment for A100 GPUs

export PYTHONNOUSERSITE="literallyanyletters"
conda create -y -n test_scvi16_cuda113 python=3.9
conda activate test_scvi16_cuda113
conda install -y -c anaconda hdf5 pytables git
pip install scvi-tools
pip install git+https://github.com/BayraktarLab/cell2location.git#egg=cell2location[tutorials]
pip3 install torch==1.11.0+cu113 torchvision==0.12.0+cu113 torchaudio==0.11.0 -f https://download.pytorch.org/whl/torch_stable.html
conda activate test_scvi16_cuda113
python -m ipykernel install --user --name=test_scvi16_cuda113 --display-name='Environment (test_scvi16_cuda113)'

Issues with package version mismatches often originate from the python user site, rather than the conda environment, being used to install a subset of packages

Before installing cell2location and its dependencies, it may be necessary to make sure that you are creating a fully isolated conda environment by telling python NOT to use the user site for installing packages. Run this line before creating the conda environment and every time before activating the conda environment in a new terminal session:

export PYTHONNOUSERSITE="literallyanyletters"

Useful code for reading and combining multiple Visium sections

Keeping info on distinct sections in a csv file (Google Sheet).

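import os
from glob import glob
from re import sub

import pandas as pd

sample_annot = pd.read_csv('./sample_annot.csv')

# list the Space Ranger output folders once, so paths and index stay aligned
paths = glob(f'{sp_data_folder}*')
# index each folder by its sample ID, then reorder to match sample_annot
sample_annot['path'] = pd.Series(
    paths,
    index=[sub('^.+WTSI_', '', sub('_GRCh38-2020-A$', '', i)) for i in paths]
)[sample_annot['Sample_ID']].values
sample_annot['file'] = [os.path.basename(i) for i in sample_annot['path']]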

sample_annot['Sample_ID'].unique()
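The two nested sub() calls above strip a fixed suffix and everything up to the WTSI_ prefix from each folder name. A standalone sketch of that step, using a made-up folder path, looks like this (the function name and the example path are assumptions):

```python
from re import sub


def sample_id_from_folder(folder):
    """Extract the sample ID from a folder name, mirroring the snippet above."""
    # drop the reference-genome suffix at the end of the name
    without_suffix = sub('_GRCh38-2020-A$', '', folder)
    # drop everything up to and including the 'WTSI_' prefix
    return sub('^.+WTSI_', '', without_suffix)


example = '/data/visium/WTSI_sampleA_GRCh38-2020-A'  # hypothetical path
print(sample_id_from_folder(example))  # sampleA
```

If your folders follow a different naming scheme, the two patterns need to be adapted so that the result matches the Sample_ID column exactly.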

Reading and concatenating samples.

import scanpy as sc

def read_and_qc(sample_name, file, path=sp_data_folder):
    """
    Read one Visium file and add minimum metadata and QC metrics to adata.obs
    NOTE: var_names is ENSEMBL ID as it should be, you can always plot with sc.pl.scatter(gene_symbols='SYMBOL')
    """
    
    adata = sc.read_visium(path + str(file) +'/',
                           count_file='filtered_feature_bc_matrix.h5',
                           load_images=True)
    adata.obs['sample'] = sample_name
    adata.var['SYMBOL'] = adata.var_names
    adata.var.rename(columns={'gene_ids': 'ENSEMBL'}, inplace=True)
    adata.var_names = adata.var['ENSEMBL']
    adata.var.drop(columns='ENSEMBL', inplace=True)
    
    # just in case there are non-unique ENSEMBL IDs
    adata.var_names_make_unique()

    # Calculate QC metrics
    sc.pp.calculate_qc_metrics(adata, inplace=True)
    adata.var['mt'] = [gene.startswith('mt-') for gene in adata.var['SYMBOL']]
    adata.obs['mt_frac'] = adata[:, adata.var['mt'].tolist()].X.sum(1).A.squeeze()/adata.obs['total_counts']
    
    # add sample name to obs names
    adata.obs["sample"] = [str(i) for i in adata.obs['sample']]
    adata.obs_names = 's' + adata.obs["sample"] \
                          + '_' + adata.obs_names
    adata.obs.index.name = 'spot_id'
    
    file = list(adata.uns['spatial'].keys())[0]
    adata.uns['spatial'][sample_name] = adata.uns['spatial'][file].copy()
    del adata.uns['spatial'][file]
    print(adata.uns['spatial'].keys())
    
    return adata

def read_all_and_qc(
    sample_annot, Sample_ID_col, file_col, sp_data_folder, 
    count_file='filtered_feature_bc_matrix.h5',
):
    """
    Read and concatenate all Visium files.
    """
    # read first sample
    adata = read_and_qc(
        sample_annot[Sample_ID_col][0], sample_annot[file_col][0], 
        path=sp_data_folder
    ) 

    # read the remaining samples
    slides = {}
    # start=1 keeps i aligned with the row of each remaining sample
    for i, s in enumerate(sample_annot[Sample_ID_col][1:], start=1):
        adata_1 = read_and_qc(s, sample_annot[file_col][i], path=sp_data_folder) 
        slides[str(s)] = adata_1

    adata_0 = adata.copy()

    # combine individual samples
    #adata = adata.concatenate(list(slides.values()), index_unique=None)
    adata = adata.concatenate(
        list(slides.values()),
        batch_key="sample",
        uns_merge="unique",
        batch_categories=sample_annot[Sample_ID_col], 
        index_unique=None
    )

    sample_annot.index = sample_annot[Sample_ID_col]
    for c in sample_annot.columns:
        sample_annot.loc[:, c] = sample_annot[c].astype(str)
    adata.obs[sample_annot.columns] = sample_annot.reindex(index=adata.obs['sample']).values
    
    return adata
    
adata = read_all_and_qc(
    sample_annot=sample_annot, 
    Sample_ID_col='Sample_ID', 
    file_col='file', 
    sp_data_folder=sp_data_folder, 
    count_file='filtered_feature_bc_matrix.h5',
)

adata_incl_nontissue = read_all_and_qc(
    sample_annot=sample_annot, 
    Sample_ID_col='Sample_ID', 
    file_col='file', 
    sp_data_folder=sp_data_folder, 
    count_file='raw_feature_bc_matrix.h5',
)

Since anndata version 0.9.0 (released on 2023-04-11), the function AnnData.concatenate() has been deprecated in favour of anndata.concat() as per the official release notes (Reference). Here is the updated code snippet of read_all_and_qc:

from anndata import concat

def read_all_and_qc(
    sample_annot, Sample_ID_col, file_col, sp_data_folder, 
    count_file='filtered_feature_bc_matrix.h5',
):
    """
    Read and concatenate all Visium files.
    """

    # read all samples and store them in a list
    adatas = []
    for i, s in enumerate(sample_annot[Sample_ID_col]):
        adata_i = read_and_qc(s, sample_annot[file_col][i], path=sp_data_folder) 
        adatas.append(adata_i)
    # combine individual samples
    adata = concat(
        adatas,
        merge="unique",
        uns_merge="unique",
        label="batch",
        keys=sample_annot[Sample_ID_col].tolist(), 
        index_unique=None
    )

    sample_annot.index = sample_annot[Sample_ID_col]
    for c in sample_annot.columns:
        sample_annot.loc[:, c] = sample_annot[c].astype(str)
    adata.obs[sample_annot.columns] = sample_annot.reindex(index=adata.obs['sample']).values

    return adata

adata = read_all_and_qc(
    sample_annot=sample_annot, 
    Sample_ID_col='Sample_ID', 
    file_col='file', 
    sp_data_folder=sp_data_folder, 
    count_file='filtered_feature_bc_matrix.h5',
)

cell2location.models.Cell2location.setup_anndata(
    adata=adata_vis,
    batch_key="batch")


cell2location's Issues

Error in run_regression

Hi, I tried following your first vignette (edit: using my own data) and ran into an error at the run_regression step. I'm getting this error:
AttributeError: 'numpy.ndarray' object has no attribute 'toarray'.

I noticed that if I don't run a previous step (adata_snrna_raw.raw = adata_snrna_raw) then it runs fine but then my gene filtering doesn't go through. It seems like the matrix is already dense, so it errors on .toarray?

Similarly, on these previous steps gene filtering steps, if I remove .toarray it works, but otherwise gives me the same error:

adata_snrna_raw.var['n_cells'] = (adata_snrna_raw.X.toarray() > 0).sum(0)
adata_snrna_raw.var['nonz_mean'] = adata_snrna_raw.X.toarray().sum(0) / adata_snrna_raw.var['n_cells']

Please advise. Thanks!
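One possible workaround, sketched here as an assumption rather than as the maintainers' fix, is to densify only when the matrix is actually sparse. The helper below is hypothetical and duck-typed so that it runs without extra dependencies. With scipy installed, scipy.sparse.issparse would be a more explicit check.

```python
def to_dense(matrix):
    """Return a dense array whether `matrix` is sparse or already dense."""
    # scipy.sparse matrices expose .toarray(); numpy arrays do not
    return matrix.toarray() if hasattr(matrix, "toarray") else matrix


class FakeSparse:
    """Stand-in for a sparse matrix, used only to keep this demo self-contained."""

    def __init__(self, rows):
        self.rows = rows

    def toarray(self):
        return self.rows


dense_input = [[0, 2], [1, 0]]
print(to_dense(dense_input))              # already dense, returned unchanged
print(to_dense(FakeSparse(dense_input)))  # converted through .toarray()
```

In the filtering lines above, adata_snrna_raw.X.toarray() could then become to_dense(adata_snrna_raw.X).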

Estimating reference expression signatures from very large dataset

Hi,

I am trying to estimate reference expression signatures from this very large dataset from the Linnarsson lab:

http://mousebrain.org/downloads.html

I am loading their l5_all.loom file as an AnnData object:

loom = sc.read_loom('../data/Zeissel/l5_all.loom')

Then I am selecting some relevant brain regions and go through your recommended preprocessing steps:

loom.obs['sample_name_col'] = loom.obs_names
loom.obs['CombinedClusterName'] = [loom.obs['Class'].iloc[i] + '' + str(loom.obs['Clusters'].iloc[i]) + '' + loom.obs['ClusterName'].iloc[i] for i in range(len(loom.obs['ClusterName']))]
loom.var['SYMBOL'] = loom.var.index
irrelevant_regions = np.array(('Dorsal root ganglion', 'Dorsal root ganglion,Sympathetic ganglion', 'Enteric nervous system',
'Medulla', 'Spinal cord', 'Sympathetic ganglion'))
subset_regions =[r not in irrelevant_regions for r in loom.obs['Region']]
loom = loom[subset_regions,:]
print(np.shape(loom))
import matplotlib as mpl

# remove cells and genes with 0 counts everywhere

sc.pp.filter_cells(loom, min_genes=1)
sc.pp.filter_genes(loom, min_cells=1)

# calculate the mean of each gene across non-zero cells

loom.var['n_cells'] = (loom.X.toarray() > 0).sum(0)
loom.var['nonz_mean'] = loom.X.toarray().sum(0) / loom.var['n_cells']

plt.hist2d(np.log10(loom.var['nonz_mean']),
np.log10(loom.var['n_cells']), bins=100,
norm=mpl.colors.LogNorm(),
range=[[0,0.5], [1,4.5]]);

nonz_mean_cutoff = np.log10(1.12) # cut off for expression in non-zero cells
cell_count_cutoff = np.log10(loom.shape[0] * 0.0005) # cut off percentage for cells with higher expression
cell_count_cutoff2 = np.log10(loom.shape[0] * 0.03)# cut off percentage for cells with small expression

plt.vlines(nonz_mean_cutoff, cell_count_cutoff, cell_count_cutoff2, color = 'orange');
plt.hlines(cell_count_cutoff, nonz_mean_cutoff, 1, color = 'orange');
plt.hlines(cell_count_cutoff2, 0, nonz_mean_cutoff, color = 'orange');
plt.xlabel('Mean count in cells with mRNA count > 0 (log10)');
plt.ylabel('Count of cells with mRNA count > 0 (log10)');
loom[:,(np.array(np.log10(loom.var['nonz_mean']) > nonz_mean_cutoff)
| np.array(np.log10(loom.var['n_cells']) > cell_count_cutoff2))
& np.array(np.log10(loom.var['n_cells']) > cell_count_cutoff)].shape

# select genes based on mean expression in non-zero cells

loom = loom[:,(np.array(np.log10(loom.var['nonz_mean']) > nonz_mean_cutoff)
| np.array(np.log10(loom.var['n_cells']) > cell_count_cutoff2))
& np.array(np.log10(loom.var['n_cells']) > cell_count_cutoff)
& np.array(~loom.var['SYMBOL'].isna())]

The size of the final dataset is (142160, 11812).
Now when I run your regression it just gets stuck:

r, adata_snrna_raw = run_regression(loom, # input data object

               verbose=True, return_all=True,

               train_args={
                'covariate_col_names': ['CombinedClusterName'], # column listing cell type annotation
                'sample_name_col': 'sample_name_col', # column listing sample ID for each cell
                # column listing technology, e.g. 3' vs 5',
                # when integrating multiple single cell technologies corresponding
                # model is automatically selected
                'tech_name_col': None,

                'stratify_cv': 'CombinedClusterName', # stratify cross-validation by cell type annotation

                'n_epochs': 100, 'minibatch_size': 1024, 'learning_rate': 0.01,

                'use_cuda': True, # use GPU?

                'train_proportion': 0.9, # proportion of cells in the training set (for cross-validation)
                'l2_weight': True,  # uses defaults for the model

                'readable_var_name_col': 'SYMBOL', 'use_raw': False},

               model_kwargs={}, # keep defaults
               posterior_args={}, # keep defaults

               export_args={'path': results_folder + 'regression_model/', # where to save results
                            'save_model': False, # save pytorch model?
                            'run_name_suffix': ''})

reg_mod = r['mod']

In particular it just shows:

Observation names are not unique. To make them unique, call .obs_names_make_unique.
Variable names are not unique. To make them unique, call .var_names_make_unique.

Creating model ### - time 0.11 min

And nothing else. If I subset the loom object to 1000genes x 1000cells everything runs fine and I see the optimization progress bar straight away. I have increased the RAM to 500GB, but still it just gets stuck and does not show the optimization progress. This is now still the same after 12+ hours after starting. I am using the GPU so that should not be the problem.

So I wonder if you have any suggestions for this. Otherwise, splitting the data up into smaller chunks would be an option.

Thanks!

Alexander
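For a dataset of this size it may be worth estimating how much memory a dense copy of the matrix needs, since the preprocessing above calls .toarray(). The sketch below is only back-of-envelope arithmetic and not a diagnosis of where the run stalls. The helper name is hypothetical, and 4 and 8 bytes per value are assumed for float32 and float64.

```python
def dense_size_gb(n_obs, n_vars, bytes_per_value):
    """Approximate size of a dense n_obs x n_vars matrix in gigabytes (10^9 bytes)."""
    return n_obs * n_vars * bytes_per_value / 1e9


n_obs, n_vars = 142160, 11812  # shape reported above after gene selection
print(f"float32: {dense_size_gb(n_obs, n_vars, 4):.2f} GB")
print(f"float64: {dense_size_gb(n_obs, n_vars, 8):.2f} GB")
```

Each temporary dense copy costs this much again, so several copies held at once can add up quickly even on a large machine.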

pin_memory option should be explicit in cell2location

Dear developers,
I was trying to run the following code:

import scvi
scvi.data.setup_anndata(adata=A1_fil, batch_key="in_tissue")
scvi.data.view_anndata_setup(A1_fil)

modA1fil = cell2location.models.Cell2location(
  A1_fil, cell_state_df=inf_aver_filraw,
  N_cells_per_location=8,
    detection_alpha=200)
modA1fil.train(max_epochs=40000,
          batch_size=None,
          train_size=1,
          use_gpu=True)

After the output of scvi.data.view_anndata_setup() is printed, cell2location recognizes the GPU:

GPU available: True, used: True
TPU available: False, using: 0 TPU cores

but it stops right afterwards with the following error:
RuntimeError: cannot pin 'torch.cuda.FloatTensor' only dense CPU tensors can be pinned.
I saw that this should be easy to overcome if I can set pin_memory=False (see here) in the cell2location model. How can I do that?

Best,
Carlo

'Variable type field must be a TensorType.' error when running cell2location

Hi,

I've followed the tutorial, however I'm stuck during the last part of 2/3. When I run the code below, I get an error.

os.environ["THEANO_FLAGS"] = 'device=cpu,floatX=float32,openmp=True,force_device=True'
r = cell2location.run_cell2location(

      # Single cell reference signatures as pd.DataFrame
      # (could also be data as anndata object for estimating signatures
      #  as cluster average expression - `sc_data=adata_snrna_raw`)
      sc_data=inf_aver,
      # Spatial data as anndata object
      sp_data=adata_vis,

      # the column in sc_data.obs that gives cluster identity of each cell
      summ_sc_data_args={'cluster_col': "TaxonomyRank4",
                         # select marker genes of cell types by specificity of their expression signatures
                         'selection': "cluster_specificity",
                         # specificity cutoff (1 = max, 0 = min)
                         'selection_specificity': 0.5
                        },

      train_args={'use_raw': True, # By default uses raw slots in both of the input datasets.
                  'n_iter': 40000, # Increase the number of iterations if needed (see QC below)

                  # When analysing the data that contains multiple experiments,
                  # cell2location automatically enters the mode which pools information across experiments
                  'sample_name_col': 'sample'}, # Column in sp_data.obs with experiment ID (see above)


      export_args={'path': data_dir + 'cell2location_model/'
                  }

)

And the error I get is:

### Summarising single cell clusters ###
### Creating model ### - time 0.0 min
Traceback (most recent call last):
  File "run_cell2location.py", line 191, in <module>
    export_args={'path': data_dir + 'cell2location_model/'
  File "/home/tsztank/miniconda3/envs/cellpymc/lib/python3.7/site-packages/cell2location/run_c2l.py", line 345, in run_cell2location
    **model_kwargs)
  File "/home/tsztank/miniconda3/envs/cellpymc/lib/python3.7/site-packages/cell2location/models/LocationModelLinearDependentWMultiExperiment.py", line 278, in __init__
    total_size=self.X_data.shape)
  File "/home/tsztank/miniconda3/envs/cellpymc/lib/python3.7/site-packages/pymc3/distributions/distribution.py", line 83, in __new__
    return model.Var(name, dist, data, total_size, dims=dims)
  File "/home/tsztank/miniconda3/envs/cellpymc/lib/python3.7/site-packages/pymc3/model.py", line 1117, in Var
    model=self,
  File "/home/tsztank/miniconda3/envs/cellpymc/lib/python3.7/site-packages/pymc3/model.py", line 1737, in __init__
    data = as_tensor(data, name, model, distribution)
  File "/home/tsztank/miniconda3/envs/cellpymc/lib/python3.7/site-packages/pymc3/model.py", line 1691, in as_tensor
    data = tt.as_tensor_variable(data, name=name)
  File "/home/tsztank/miniconda3/envs/cellpymc/lib/python3.7/site-packages/theano/tensor/basic.py", line 158, in as_tensor_variable
    "Variable type field must be a TensorType.", x, x.type)
theano.tensor.var.AsTensorError: ('Variable type field must be a TensorType.', SparseVariable{csr,int16}, Sparse[int16, csr])

Normalization differences between spatial and scRNAseq dataset and model adjustment using only one spatial dataset

Dear Author,

First, thank you for this great tool and this new API
I'm very interested in applying the cell2location method to my analysis.
I already have an integrated Seurat object (15 integrated samples) that I generated through the standard Seurat workflow (i.e. normalization method = log-normalization) and that will be used as the reference (note that this reference dataset has cell type annotations).
In this object, I have 2 assays : RNA and integrated (I don't have SCT assay).

I also have one spatial dataset ( a seurat object) as query data.
In this spatial object, I have used sctransform normalization. Thus, it has 2 assays : Spatial and SCT.

Now, I want to link these 2 datasets: the spatial dataset and the scRNA-seq dataset.
Here, the spatial object has an SCT slot while the scRNA-seq object does not (as the two were normalized using 2 different methods).

Just a few beginner's question.
In the part of Model-based estimation of reference expression signatures of cell types (1/3).
1/ Must the reference data have an SCT assay? As far as I understand, the model uses raw counts (thus not having an SCT slot in the reference should not be a problem).

In Spatially mapping cell types(2/3), you have selected 2 Visium sections to speed up the analysis.
2/ I have only one Visium, Is this a problem?
3/ The overall question is: is cell2location suited to my situation?

I'm a beginner in Scanpy.
4/ If the answer to question 3 is yes, how do I convert a Seurat object to AnnData?

Thanks in advance
Kind regards,
Chuang

error in run_regression with default model in non-minibatch mode

Hi @yozhikoff ,

could you please look into this when you have time?

`### Creating model ### - time 0.01 min

Analysis name: RegressionGeneBackgroundCoverageTorch_122covariates_6948cells_13565genes

Training model to determine n_epochs with CV

0%
0/10000 [00:00<?, ?it/s]


RuntimeError Traceback (most recent call last)
in
30 export_args={'path': results_folder + 'regression_model/', # where to save results
31 'save_model': True, # save pytorch model?
---> 32 'run_name_suffix': ''})
33
34 reg_mod = r['mod']

/nfs/team205/vk7/sanger_projects/BayraktarLab/cell2location/cell2location/run_regression.py in run_regression(sc_data, model_name, verbose, return_all, train_args, model_kwargs, posterior_args, export_args)
196 print('### Training model to determine n_epochs with CV ###')
197 if train_args['mode'] == 'normal':
--> 198 mod.fit_advi_iterative(**fit_kwards)
199 elif train_args['mode'] == 'tracking':
200 mod.fit_advi_iterative(tracking=True, **fit_kwards)

/nfs/team205/vk7/sanger_projects/BayraktarLab/cell2location/cell2location/models/torch_model.py in fit_advi_iterative(self, n, n_type, n_iter, learning_rate, num_workers, train_proportion, l2_weight)
246 optim.zero_grad()
247 y_pred = self.model.forward(**extra_data_train)
--> 248 loss = self.loss(y_pred, x_data, l2_weight=l2_weight)
249 loss.backward()
250 optim.step()

/nfs/team205/vk7/sanger_projects/BayraktarLab/cell2location/cell2location/models/RegressionGeneBackgroundCoverageTorch.py in loss(self, param, data, l2_weight)
260 l2_reg = l2_reg + l2_weight['l2_weight'] * i.pow(2).sum()
261
--> 262 return -self.nb_log_prob(param, data).sum() + l2_reg
263
264 # =====================Other functions======================= #

/nfs/team205/vk7/sanger_projects/BayraktarLab/cell2location/cell2location/models/torch_model.py in nb_log_prob(param, data, eps)
134 + torch.lgamma(data + theta)
135 - torch.lgamma(theta)
--> 136 - torch.lgamma(data + 1)
137 )
138

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!`

Install error

Hi guys,

I have tried to install your package many times (using both method 1 and method 2). The installation process completed successfully, but an error is returned during import.

Here is the error:

>>> import cell2location
/miniconda3/envs/cellpymc/lib/python3.7/site-packages/theano/configdefaults.py:17: TheanoConfigWarning: Config key '/raid1/qinshishang/tmp_cache/' has no value, ignoring it
  from theano.configparser import (AddConfigVar, BoolParam, ConfigParam, EnumStr,
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/miniconda3/envs/cellpymc/lib/python3.7/site-packages/cell2location/__init__.py", line 2, in <module>
    from .run_c2l import run_cell2location
  File "/miniconda3/envs/cellpymc/lib/python3.7/site-packages/cell2location/run_c2l.py", line 9, in <module>
    import theano
  File "/miniconda3/envs/cellpymc/lib/python3.7/site-packages/theano/__init__.py", line 110, in <module>
    from theano.compile import (
  File "/miniconda3/envs/cellpymc/lib/python3.7/site-packages/theano/compile/__init__.py", line 28, in <module>
    from theano.compile.function import function, function_dump
  File "/miniconda3/envs/cellpymc/lib/python3.7/site-packages/theano/compile/function/__init__.py", line 7, in <module>
    from theano.compile.function.pfunc import pfunc
  File "/miniconda3/envs/cellpymc/lib/python3.7/site-packages/theano/compile/function/pfunc.py", line 10, in <module>
    from theano.compile.function.types import UnusedInputError, orig_function
  File "/miniconda3/envs/cellpymc/lib/python3.7/site-packages/theano/compile/function/types.py", line 23, in <module>
    from theano.gof.toolbox import is_same_graph
ImportError: cannot import name 'is_same_graph' from 'theano.gof.toolbox' (/miniconda3/envs/cellpymc/lib/python3.7/site-packages/theano/gof/toolbox.py)

So, I wonder how to deal with that? could anyone give me some suggestions?

PYMC3 has been renamed to PyMC

Hi,
PyMC3 has been renamed PyMC. If this affects you and you have questions, or you want someone to direct your rage at, I'm available! Do let me know how I, or any of the PyMC devs, can help.

Ravin

Issue installing the latest cell2location

Hi guys,

I've been trying to install the very latest cell2location out-of-the-box with your recommended installation procedure:

conda create -y -n cell2loc_env python=3.9

conda activate cell2loc_env
pip install git+https://github.com/BayraktarLab/cell2location.git#egg=cell2location[tutorials]

I'm doing this in an entirely fresh and isolated conda environment, with the recommended export PYTHONNOUSERSITE="someletters" in my bashrc so that user site-dependence doesn't hamper the installation. However, when I try to import cell2location (either as a python script or in an ipython session) I get the following:

In [1]: from cell2location import run_regression
WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
Global seed set to 0
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-1-cdd635c2fe96> in <module>
----> 1 from cell2location import run_regression

/nfs/research/gerstung/nelson/miniconda3/envs/cell2loc_env/lib/python3.9/site-packages/cell2location/__init__.py in <module>
      3 from torch.distributions import biject_to, transform_to
      4
----> 5 from .run_c2l import run_cell2location
      6 from .run_colocation import run_colocation
      7 from .run_regression import run_regression

/nfs/research/gerstung/nelson/miniconda3/envs/cell2loc_env/lib/python3.9/site-packages/cell2location/run_c2l.py in <module>
     16 import theano
     17
---> 18 import cell2location.models as models
     19 import cell2location.plt as c2lpl
     20 from cell2location.cluster_averages import compute_cluster_averages, select_features

/nfs/research/gerstung/nelson/miniconda3/envs/cell2loc_env/lib/python3.9/site-packages/cell2location/models/__init__.py in <module>
----> 1 from ._cell2location_model import Cell2location
      2 from ._cell2location_module import (
      3     LocationModelLinearDependentWMultiExperimentLocationBackgroundNormLevelGeneAlphaPyroModel,
      4 )
      5 from .downstream import CoLocatedGroupsSklearnNMF

/nfs/research/gerstung/nelson/miniconda3/envs/cell2loc_env/lib/python3.9/site-packages/cell2location/models/_cell2location_model.py in <module>
     17     LocationModelLinearDependentWMultiExperimentLocationBackgroundNormLevelGeneAlphaPyroModel,
     18 )
---> 19 from cell2location.models.base._pyro_base_loc_module import Cell2locationBaseModule
     20 from cell2location.models.base._pyro_mixin import PltExportMixin, QuantileMixin
     21 from cell2location.utils import select_slide

/nfs/research/gerstung/nelson/miniconda3/envs/cell2loc_env/lib/python3.9/site-packages/cell2location/models/base/_pyro_base_loc_module.py in <module>
      2 from scvi.module.base import PyroBaseModuleClass
      3
----> 4 from ._pyro_mixin import AutoGuideMixinModule, init_to_value
      5
      6

/nfs/research/gerstung/nelson/miniconda3/envs/cell2loc_env/lib/python3.9/site-packages/cell2location/models/base/_pyro_mixin.py in <module>
     16 from scvi.model._utils import parse_use_gpu_arg
     17
---> 18 from ...distributions.AutoNormalEncoder import AutoGuideList, AutoNormalEncoder
     19
     20

/nfs/research/gerstung/nelson/miniconda3/envs/cell2loc_env/lib/python3.9/site-packages/cell2location/distributions/AutoNormalEncoder.py in <module>
     10 from pyro.infer.autoguide import AutoGuide
     11 from pyro.infer.autoguide import AutoGuideList as PyroAutoGuideList
---> 12 from pyro.infer.autoguide.guides import _deep_getattr, _deep_setattr
     13 from pyro.infer.autoguide.utils import helpful_support_errors
     14 from pyro.nn import PyroModule, PyroParam

ImportError: cannot import name '_deep_getattr' from 'pyro.infer.autoguide.guides' (/nfs/research/gerstung/nelson/miniconda3/envs/cell2loc_env/lib/python3.9/site-packages/pyro/infer/autoguide/guides.py)

Perhaps this is some issue related to the newer versions of Pyro? Are you guys able to reproduce this issue in a fresh conda environment?
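
To narrow this down, it may help to check whether the installed Pyro still exposes the private helpers named in the ImportError. The sketch below is only a diagnostic under that assumption; `missing_names` is a hypothetical helper, not part of cell2location or Pyro.

```python
import importlib


def missing_names(module_name, names):
    """Return the names that `module_name` does not define.

    Returns None when the module itself cannot be imported.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return [name for name in names if not hasattr(module, name)]


# Module and attribute names are taken from the traceback above.
print(missing_names("pyro.infer.autoguide.guides",
                    ["_deep_getattr", "_deep_setattr"]))
```

A non-empty list would suggest that the installed Pyro release no longer provides these private functions at that location, which would be consistent with a version mismatch.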

error in training regression model [models require unnormalised integer counts as input]

Hi,

I am trying cell2location on my own data.
I followed the tutorial and got an error when estimating the reference signatures.

Here is the code I ran.
My computer only has a CPU, so I do not use the GPU option.

# create and train the regression model
from cell2location.models import RegressionModel
mod = RegressionModel(scrna_raw)

# Use all data for training (validation not implemented yet, train_size=1)
mod.train(max_epochs=250, batch_size=2500, train_size=1, lr=0.002, use_gpu=False)

The error message is too long, so I paste only the final part.


ValueError: Expected parameter rate (Tensor of shape (5, 11163)) of distribution Gamma(concentration: torch.Size([5, 11163]), rate: torch.Size([5, 11163])) to satisfy the constraint GreaterThan(lower_bound=0.0), but found invalid values:
tensor([[ -990.3860,  -367.4243,  -503.9861,  ...,  -152.0115,   176.6569,
            83.2535],
        [ -779.4346,  -123.4887,  -820.1381,  ...,   220.1533,  -531.6564,
           587.0056],
        [ -779.4346,  -123.4887,  -820.1381,  ...,   220.1533,  -531.6564,
           587.0056],
        [ -741.7203,   160.4668,  -708.0273,  ...,  -152.0129,  -166.7664,
          -116.5899],
        [-1287.2499,  -676.4868, -1519.5457,  ...,  -169.2898,  -195.1531,
          -122.2516]])
                Trace Shapes:           
                 Param Sites:           
                Sample Sites:           
       per_cluster_mu_fg dist | 12 11163
                        value | 12 11163
      detection_mean_y_e dist |  5     1
                        value |  5     1
  s_g_gene_add_alpha_hyp dist |         
                        value |         
       s_g_gene_add_mean dist |  5     1
                        value |  5     1
s_g_gene_add_alpha_e_inv dist |  5     1
                        value |  5     1
            s_g_gene_add dist |  5 11163
                        value |  5 11163
         alpha_g_phi_hyp dist |         
                        value |         
         alpha_g_inverse dist |  1 11163
                        value |  1 11163

I understand that the error occurs because the rate parameter contains negative values.
Could you please tell me how I can fix this?
Looking forward to your reply.

Regards,
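
Since the issue title notes that the models require unnormalised integer counts, a quick check of the input matrix may be useful before training. This is a minimal sketch, assuming the counts are available as a NumPy array or a SciPy sparse matrix; `check_raw_counts` is a hypothetical helper, not a cell2location function.

```python
import numpy as np


def check_raw_counts(X):
    """Report whether all values in X are non-negative and integer-valued."""
    # Sparse matrices expose tocsr(); only their stored values need checking
    # because the implicit entries are zeros.
    values = X.tocsr().data if hasattr(X, "tocsr") else np.asarray(X)
    values = np.asarray(values, dtype=float).ravel()
    return {
        "non_negative": bool((values >= 0).all()),
        "integer_valued": bool(np.all(values == np.round(values))),
    }


# Toy examples: raw counts pass, scaled or log-transformed data do not.
print(check_raw_counts(np.array([[0, 3], [1, 2]])))
print(check_raw_counts(np.array([[0.5, -1.2]])))
```

In the post above this would be called on the matrix of the object passed to the model, for example `check_raw_counts(scrna_raw.X)`. If either check fails, the matrix has probably been normalised, scaled, or log-transformed, and the raw counts would need to be restored from wherever they were kept.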

inconsistent tensor dimensions when giving exact cell numbers

Dear developers,

I ran into an issue when giving C2L the exact number of cells per spot (calculated by segmentation).
When I run:

modA1fil = cell2location.models.Cell2location(
    A1_fil, cell_state_df=inf_aver_filraw_A1,
    N_cells_per_location=numcellsA['N_cells'],
    detection_alpha=200
)

modA1fil.train(max_epochs=15000,
          batch_size=None,
          train_size=1,
          use_gpu=True)

where A1_fil is the spatial dataset (an AnnData object set up with scvi-tools), inf_aver_filraw_A1 is the dataframe produced by the regression model, and numcellsA['N_cells'] is an int64 vector with one entry per spot in A1_fil, containing the exact number of cells in each spot.

The error is the following:


GPU available: True, used: True
TPU available: False, using: 0 TPU cores
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/opt/common/tools/ric.tiget/anaconda3/envs/Ostuni_cell2loc2/lib/python3.9/site-packages/pyro/poutine/trace_messenger.py in __call__(self, *args, **kwargs)
    173             try:
--> 174                 ret = self.fn(*args, **kwargs)
    175             except (ValueError, RuntimeError) as e:

/opt/common/tools/ric.tiget/anaconda3/envs/Ostuni_cell2loc2/lib/python3.9/site-packages/pyro/poutine/messenger.py in _context_wrap(context, fn, *args, **kwargs)
     11     with context:
---> 12         return fn(*args, **kwargs)
     13 

/opt/common/tools/ric.tiget/anaconda3/envs/Ostuni_cell2loc2/lib/python3.9/site-packages/pyro/poutine/messenger.py in _context_wrap(context, fn, *args, **kwargs)
     11     with context:
---> 12         return fn(*args, **kwargs)
     13 

/opt/common/tools/ric.tiget/anaconda3/envs/Ostuni_cell2loc2/lib/python3.9/site-packages/pyro/nn/module.py in __call__(self, *args, **kwargs)
    425         with self._pyro_context:
--> 426             return super().__call__(*args, **kwargs)
    427 

/opt/common/tools/ric.tiget/anaconda3/envs/Ostuni_cell2loc2/lib/python3.9/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1101                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1102             return forward_call(*input, **kwargs)
   1103         # Do not call functions when jit is used

/opt/common/tools/ric.tiget/anaconda3/envs/Ostuni_cell2loc2/lib/python3.9/site-packages/cell2location/models/_cell2location_module.py in forward(self, x_data, idx, batch_index)
    279         shape = self.ones_1_n_groups * b_s_groups_per_location / self.n_groups_tensor
--> 280         rate = self.ones_1_n_groups / (n_s_cells_per_location / b_s_groups_per_location)
    281         with obs_plate:

RuntimeError: The size of tensor a (50) must match the size of tensor b (3274) at non-singleton dimension 1

The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
/beegfs/scratch/tmp/ipykernel_70698/2057392656.py in <module>
     10 )
     11 
---> 12 modA1fil.train(max_epochs=15000,
     13           # train using full data (batch_size=None)
     14           batch_size=None,

/opt/common/tools/ric.tiget/anaconda3/envs/Ostuni_cell2loc2/lib/python3.9/site-packages/cell2location/models/_cell2location_model.py in train(self, max_epochs, batch_size, train_size, lr, **kwargs)
    181         kwargs["lr"] = lr
    182 
--> 183         super().train(**kwargs)
    184 
    185     def export_posterior(

/opt/common/tools/ric.tiget/anaconda3/envs/Ostuni_cell2loc2/lib/python3.9/site-packages/scvi/model/base/_pyromixin.py in train(self, max_epochs, use_gpu, train_size, validation_size, batch_size, early_stopping, lr, plan_kwargs, **trainer_kwargs)
    143             **trainer_kwargs,
    144         )
--> 145         return runner()
    146 
    147 

/opt/common/tools/ric.tiget/anaconda3/envs/Ostuni_cell2loc2/lib/python3.9/site-packages/scvi/train/_trainrunner.py in __call__(self)
     70             self.training_plan.n_obs_training = self.data_splitter.n_train
     71 
---> 72         self.trainer.fit(self.training_plan, self.data_splitter)
     73         self._update_history()
     74 

/opt/common/tools/ric.tiget/anaconda3/envs/Ostuni_cell2loc2/lib/python3.9/site-packages/scvi/train/_trainer.py in fit(self, *args, **kwargs)
    175                     message="`LightningModule.configure_optimizers` returned `None`",
    176                 )
--> 177             super().fit(*args, **kwargs)

/opt/common/tools/ric.tiget/anaconda3/envs/Ostuni_cell2loc2/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py in fit(self, model, train_dataloader, val_dataloaders, datamodule)
    458         )
    459 
--> 460         self._run(model)
    461 
    462         assert self.state.stopped

/opt/common/tools/ric.tiget/anaconda3/envs/Ostuni_cell2loc2/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py in _run(self, model)
    756 
    757         # dispatch `start_training` or `start_evaluating` or `start_predicting`
--> 758         self.dispatch()
    759 
    760         # plugin will finalized fitting (e.g. ddp_spawn will load trained model)

/opt/common/tools/ric.tiget/anaconda3/envs/Ostuni_cell2loc2/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py in dispatch(self)
    797             self.accelerator.start_predicting(self)
    798         else:
--> 799             self.accelerator.start_training(self)
    800 
    801     def run_stage(self):

/opt/common/tools/ric.tiget/anaconda3/envs/Ostuni_cell2loc2/lib/python3.9/site-packages/pytorch_lightning/accelerators/accelerator.py in start_training(self, trainer)
     94 
     95     def start_training(self, trainer: 'pl.Trainer') -> None:
---> 96         self.training_type_plugin.start_training(trainer)
     97 
     98     def start_evaluating(self, trainer: 'pl.Trainer') -> None:

/opt/common/tools/ric.tiget/anaconda3/envs/Ostuni_cell2loc2/lib/python3.9/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py in start_training(self, trainer)
    142     def start_training(self, trainer: 'pl.Trainer') -> None:
    143         # double dispatch to initiate the training loop
--> 144         self._results = trainer.run_stage()
    145 
    146     def start_evaluating(self, trainer: 'pl.Trainer') -> None:

/opt/common/tools/ric.tiget/anaconda3/envs/Ostuni_cell2loc2/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py in run_stage(self)
    807         if self.predicting:
    808             return self.run_predict()
--> 809         return self.run_train()
    810 
    811     def _pre_training_routine(self):

/opt/common/tools/ric.tiget/anaconda3/envs/Ostuni_cell2loc2/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py in run_train(self)
    855 
    856         # hook
--> 857         self.train_loop.on_train_start()
    858 
    859         try:

/opt/common/tools/ric.tiget/anaconda3/envs/Ostuni_cell2loc2/lib/python3.9/site-packages/pytorch_lightning/trainer/training_loop.py in on_train_start(self)
     99     def on_train_start(self):
    100         # hook
--> 101         self.trainer.call_hook("on_train_start")
    102 
    103     def on_train_end(self):

/opt/common/tools/ric.tiget/anaconda3/envs/Ostuni_cell2loc2/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py in call_hook(self, hook_name, *args, **kwargs)
   1226             if hasattr(self, hook_name):
   1227                 trainer_hook = getattr(self, hook_name)
-> 1228                 trainer_hook(*args, **kwargs)
   1229 
   1230             # next call hook in lightningModule

/opt/common/tools/ric.tiget/anaconda3/envs/Ostuni_cell2loc2/lib/python3.9/site-packages/pytorch_lightning/trainer/callback_hook.py in on_train_start(self)
    150         """Called when the train begins."""
    151         for callback in self.callbacks:
--> 152             callback.on_train_start(self, self.lightning_module)
    153 
    154     def on_train_end(self):

/opt/common/tools/ric.tiget/anaconda3/envs/Ostuni_cell2loc2/lib/python3.9/site-packages/scvi/model/base/_pyromixin.py in on_train_start(self, trainer, pl_module)
     45             tens = {k: t.to(pl_module.device) for k, t in tensors.items()}
     46             args, kwargs = pl_module.module._get_fn_args_from_batch(tens)
---> 47             pyro_guide(*args, **kwargs)
     48             break
     49 

/opt/common/tools/ric.tiget/anaconda3/envs/Ostuni_cell2loc2/lib/python3.9/site-packages/pyro/nn/module.py in __call__(self, *args, **kwargs)
    424     def __call__(self, *args, **kwargs):
    425         with self._pyro_context:
--> 426             return super().__call__(*args, **kwargs)
    427 
    428     def __getattr__(self, name):

/opt/common/tools/ric.tiget/anaconda3/envs/Ostuni_cell2loc2/lib/python3.9/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1100         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1101                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1102             return forward_call(*input, **kwargs)
   1103         # Do not call functions when jit is used
   1104         full_backward_hooks, non_full_backward_hooks = [], []

/opt/common/tools/ric.tiget/anaconda3/envs/Ostuni_cell2loc2/lib/python3.9/site-packages/pyro/infer/autoguide/guides.py in forward(self, *args, **kwargs)
    529         # if we've never run the model before, do so now so we can inspect the model structure
    530         if self.prototype_trace is None:
--> 531             self._setup_prototype(*args, **kwargs)
    532 
    533         plates = self._create_plates(*args, **kwargs)

/opt/common/tools/ric.tiget/anaconda3/envs/Ostuni_cell2loc2/lib/python3.9/site-packages/pyro/infer/autoguide/guides.py in _setup_prototype(self, *args, **kwargs)
    479 
    480     def _setup_prototype(self, *args, **kwargs):
--> 481         super()._setup_prototype(*args, **kwargs)
    482 
    483         self._event_dims = {}

/opt/common/tools/ric.tiget/anaconda3/envs/Ostuni_cell2loc2/lib/python3.9/site-packages/pyro/infer/autoguide/guides.py in _setup_prototype(self, *args, **kwargs)
    169         # run the model so we can inspect its structure
    170         model = poutine.block(self.model, prototype_hide_fn)
--> 171         self.prototype_trace = poutine.block(poutine.trace(model).get_trace)(
    172             *args, **kwargs
    173         )

/opt/common/tools/ric.tiget/anaconda3/envs/Ostuni_cell2loc2/lib/python3.9/site-packages/pyro/poutine/messenger.py in _context_wrap(context, fn, *args, **kwargs)
     10 def _context_wrap(context, fn, *args, **kwargs):
     11     with context:
---> 12         return fn(*args, **kwargs)
     13 
     14 

/opt/common/tools/ric.tiget/anaconda3/envs/Ostuni_cell2loc2/lib/python3.9/site-packages/pyro/poutine/trace_messenger.py in get_trace(self, *args, **kwargs)
    196         Calls this poutine and returns its trace instead of the function's return value.
    197         """
--> 198         self(*args, **kwargs)
    199         return self.msngr.get_trace()

/opt/common/tools/ric.tiget/anaconda3/envs/Ostuni_cell2loc2/lib/python3.9/site-packages/pyro/poutine/trace_messenger.py in __call__(self, *args, **kwargs)
    178                 exc = exc_type(u"{}\n{}".format(exc_value, shapes))
    179                 exc = exc.with_traceback(traceback)
--> 180                 raise exc from e
    181             self.msngr.trace.add_node(
    182                 "_RETURN", name="_RETURN", type="return", value=ret

/opt/common/tools/ric.tiget/anaconda3/envs/Ostuni_cell2loc2/lib/python3.9/site-packages/pyro/poutine/trace_messenger.py in __call__(self, *args, **kwargs)
    172             )
    173             try:
--> 174                 ret = self.fn(*args, **kwargs)
    175             except (ValueError, RuntimeError) as e:
    176                 exc_type, exc_value, traceback = sys.exc_info()

/opt/common/tools/ric.tiget/anaconda3/envs/Ostuni_cell2loc2/lib/python3.9/site-packages/pyro/poutine/messenger.py in _context_wrap(context, fn, *args, **kwargs)
     10 def _context_wrap(context, fn, *args, **kwargs):
     11     with context:
---> 12         return fn(*args, **kwargs)
     13 
     14 

/opt/common/tools/ric.tiget/anaconda3/envs/Ostuni_cell2loc2/lib/python3.9/site-packages/pyro/poutine/messenger.py in _context_wrap(context, fn, *args, **kwargs)
     10 def _context_wrap(context, fn, *args, **kwargs):
     11     with context:
---> 12         return fn(*args, **kwargs)
     13 
     14 

/opt/common/tools/ric.tiget/anaconda3/envs/Ostuni_cell2loc2/lib/python3.9/site-packages/pyro/nn/module.py in __call__(self, *args, **kwargs)
    424     def __call__(self, *args, **kwargs):
    425         with self._pyro_context:
--> 426             return super().__call__(*args, **kwargs)
    427 
    428     def __getattr__(self, name):

/opt/common/tools/ric.tiget/anaconda3/envs/Ostuni_cell2loc2/lib/python3.9/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1100         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1101                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1102             return forward_call(*input, **kwargs)
   1103         # Do not call functions when jit is used
   1104         full_backward_hooks, non_full_backward_hooks = [], []

/opt/common/tools/ric.tiget/anaconda3/envs/Ostuni_cell2loc2/lib/python3.9/site-packages/cell2location/models/_cell2location_module.py in forward(self, x_data, idx, batch_index)
    278         # cell group loadings
    279         shape = self.ones_1_n_groups * b_s_groups_per_location / self.n_groups_tensor
--> 280         rate = self.ones_1_n_groups / (n_s_cells_per_location / b_s_groups_per_location)
    281         with obs_plate:
    282             z_sr_groups_factors = pyro.sample(

RuntimeError: The size of tensor a (50) must match the size of tensor b (3274) at non-singleton dimension 1
               Trace Shapes:                    
                Param Sites:                    
               Sample Sites:                    
               m_g_mean dist           | 1     1
                       value           | 1     1
        m_g_alpha_e_inv dist           | 1     1
                       value           | 1     1
                    m_g dist           | 1 13301
                       value           | 1 13301
 n_s_cells_per_location dist 3274 3274 |        
                       value 3274 3274 |        
b_s_groups_per_location dist 3274    1 |        
                       value 3274    1 |   

This error does not appear if I specify N_cells_per_location=8 (or any other single number).
How can I proceed? I noticed that providing the exact number of cells per spot is what you call 'advanced mode', where you also recommend adding 0.1 as a pseudocount (though here numcellsA['N_cells'] is not zero in any spot) and modifying vn to make the prior more informative. How can I do that as well?

Thanks a lot,
Best,
Carlo
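
The trace shapes above show n_s_cells_per_location with shape 3274 x 3274, which is what broadcasting produces when a flat vector of length n meets an (n, 1) column. The NumPy sketch below reproduces that behaviour with small stand-in sizes; it illustrates the broadcasting rule only and is not the model's actual code.

```python
import numpy as np

n_spots, n_groups = 6, 4  # small stand-ins for 3274 spots and 50 groups

per_spot = np.arange(1, n_spots + 1)   # flat vector, shape (n_spots,)
column = np.ones((n_spots, 1))         # per-location quantity stored as a column

flat = column * per_spot                # broadcasts to (n_spots, n_spots)
col = column * per_spot.reshape(-1, 1)  # stays (n_spots, 1)

ones_groups = np.ones((1, n_groups))
ok = ones_groups / col                  # (n_spots, n_groups), as intended

try:
    ones_groups / flat                  # (1, 4) against (6, 6) cannot broadcast
    flat_ok = True
except ValueError:
    flat_ok = False

print(flat.shape, col.shape, ok.shape, flat_ok)
```

If the model accepts per-location values at all, passing the counts as a column, for example `numcellsA['N_cells'].values.reshape(-1, 1)`, might avoid the mismatch. Whether this argument supports arrays in the installed version is an assumption that should be checked against the cell2location documentation.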

Error training model

Good day!

I am trying to run cell2location using the integrated pipeline with scvi. When I reached the step that creates and trains the model on the ST data (cell2location.models.Cell2location), I got the error below:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/scratch/11488722/ipykernel_67679/1717331508.py in <module>
      1 # create and train the model
----> 2 mod = cell2location.models.Cell2location(
      3     adata_vis, cell_state_df=inf_aver,
      4     # the expected average cell abundance: tissue-dependent
      5     # hyper-prior which can be estimated from paired histology:

/hpc/pmc_stunnenberg/cruiz/miniconda3/envs/cell2loc_env/lib/python3.9/site-packages/cell2location/models/_cell2location_model.py in __init__(self, adata, cell_state_df, model_class, detection_mean_per_sample, detection_mean_correction, **model_kwargs)
    115         )
    116         self._model_summary_string = f'cell2location model with the following params: \nn_factors: {self.n_factors_} \nn_batch: {self.summary_stats["n_batch"]} '
--> 117         self.init_params_ = self._get_init_params(deepcopy(locals()))
    118 
    119     def train(

/hpc/pmc_stunnenberg/cruiz/miniconda3/envs/cell2loc_env/lib/python3.9/copy.py in deepcopy(x, memo, _nil)
    144     copier = _deepcopy_dispatch.get(cls)
    145     if copier is not None:
--> 146         y = copier(x, memo)
    147     else:
    148         if issubclass(cls, type):

/hpc/pmc_stunnenberg/cruiz/miniconda3/envs/cell2loc_env/lib/python3.9/copy.py in _deepcopy_dict(x, memo, deepcopy)
    228     memo[id(x)] = y
    229     for key, value in x.items():
--> 230         y[deepcopy(key, memo)] = deepcopy(value, memo)
    231     return y
    232 d[dict] = _deepcopy_dict

/hpc/pmc_stunnenberg/cruiz/miniconda3/envs/cell2loc_env/lib/python3.9/copy.py in deepcopy(x, memo, _nil)
    170                     y = x
    171                 else:
--> 172                     y = _reconstruct(x, memo, *rv)
    173 
    174     # If is its own copy, don't memoize.

/hpc/pmc_stunnenberg/cruiz/miniconda3/envs/cell2loc_env/lib/python3.9/copy.py in _reconstruct(x, memo, func, args, state, listiter, dictiter, deepcopy)
    268     if state is not None:
    269         if deep:
--> 270             state = deepcopy(state, memo)
    271         if hasattr(y, '__setstate__'):
    272             y.__setstate__(state)

/hpc/pmc_stunnenberg/cruiz/miniconda3/envs/cell2loc_env/lib/python3.9/copy.py in deepcopy(x, memo, _nil)
    144     copier = _deepcopy_dispatch.get(cls)
    145     if copier is not None:
--> 146         y = copier(x, memo)
    147     else:
    148         if issubclass(cls, type):

/hpc/pmc_stunnenberg/cruiz/miniconda3/envs/cell2loc_env/lib/python3.9/copy.py in _deepcopy_dict(x, memo, deepcopy)
    228     memo[id(x)] = y
    229     for key, value in x.items():
--> 230         y[deepcopy(key, memo)] = deepcopy(value, memo)
    231     return y
    232 d[dict] = _deepcopy_dict

/hpc/pmc_stunnenberg/cruiz/miniconda3/envs/cell2loc_env/lib/python3.9/copy.py in deepcopy(x, memo, _nil)
    170                     y = x
    171                 else:
--> 172                     y = _reconstruct(x, memo, *rv)
    173 
    174     # If is its own copy, don't memoize.

/hpc/pmc_stunnenberg/cruiz/miniconda3/envs/cell2loc_env/lib/python3.9/copy.py in _reconstruct(x, memo, func, args, state, listiter, dictiter, deepcopy)
    268     if state is not None:
    269         if deep:
--> 270             state = deepcopy(state, memo)
    271         if hasattr(y, '__setstate__'):
    272             y.__setstate__(state)

/hpc/pmc_stunnenberg/cruiz/miniconda3/envs/cell2loc_env/lib/python3.9/copy.py in deepcopy(x, memo, _nil)
    144     copier = _deepcopy_dispatch.get(cls)
    145     if copier is not None:
--> 146         y = copier(x, memo)
    147     else:
    148         if issubclass(cls, type):

/hpc/pmc_stunnenberg/cruiz/miniconda3/envs/cell2loc_env/lib/python3.9/copy.py in _deepcopy_dict(x, memo, deepcopy)
    228     memo[id(x)] = y
    229     for key, value in x.items():
--> 230         y[deepcopy(key, memo)] = deepcopy(value, memo)
    231     return y
    232 d[dict] = _deepcopy_dict

/hpc/pmc_stunnenberg/cruiz/miniconda3/envs/cell2loc_env/lib/python3.9/copy.py in deepcopy(x, memo, _nil)
    170                     y = x
    171                 else:
--> 172                     y = _reconstruct(x, memo, *rv)
    173 
    174     # If is its own copy, don't memoize.

/hpc/pmc_stunnenberg/cruiz/miniconda3/envs/cell2loc_env/lib/python3.9/copy.py in _reconstruct(x, memo, func, args, state, listiter, dictiter, deepcopy)
    268     if state is not None:
    269         if deep:
--> 270             state = deepcopy(state, memo)
    271         if hasattr(y, '__setstate__'):
    272             y.__setstate__(state)

/hpc/pmc_stunnenberg/cruiz/miniconda3/envs/cell2loc_env/lib/python3.9/copy.py in deepcopy(x, memo, _nil)
    144     copier = _deepcopy_dispatch.get(cls)
    145     if copier is not None:
--> 146         y = copier(x, memo)
    147     else:
    148         if issubclass(cls, type):

/hpc/pmc_stunnenberg/cruiz/miniconda3/envs/cell2loc_env/lib/python3.9/copy.py in _deepcopy_dict(x, memo, deepcopy)
    228     memo[id(x)] = y
    229     for key, value in x.items():
--> 230         y[deepcopy(key, memo)] = deepcopy(value, memo)
    231     return y
    232 d[dict] = _deepcopy_dict

/hpc/pmc_stunnenberg/cruiz/miniconda3/envs/cell2loc_env/lib/python3.9/copy.py in deepcopy(x, memo, _nil)
    144     copier = _deepcopy_dispatch.get(cls)
    145     if copier is not None:
--> 146         y = copier(x, memo)
    147     else:
    148         if issubclass(cls, type):

/hpc/pmc_stunnenberg/cruiz/miniconda3/envs/cell2loc_env/lib/python3.9/copy.py in _deepcopy_dict(x, memo, deepcopy)
    228     memo[id(x)] = y
    229     for key, value in x.items():
--> 230         y[deepcopy(key, memo)] = deepcopy(value, memo)
    231     return y
    232 d[dict] = _deepcopy_dict

/hpc/pmc_stunnenberg/cruiz/miniconda3/envs/cell2loc_env/lib/python3.9/copy.py in deepcopy(x, memo, _nil)
    151             copier = getattr(x, "__deepcopy__", None)
    152             if copier is not None:
--> 153                 y = copier(memo)
    154             else:
    155                 reductor = dispatch_table.get(cls)

/hpc/pmc_stunnenberg/cruiz/miniconda3/envs/cell2loc_env/lib/python3.9/site-packages/anndata/_core/views.py in __deepcopy__(self, memo)
     55     # TODO: This makes `deepcopy(obj)` return `obj._view_args.parent._adata_ref`, fix it
     56     def __deepcopy__(self, memo):
---> 57         parent, attrname, keys = self._view_args
     58         return deepcopy(getattr(parent._adata_ref, attrname))
     59 

TypeError: cannot unpack non-iterable NoneType object

What do you think the problem might be?

Thanks in advance!
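
The traceback ends inside anndata's view handling during deepcopy, which suggests that adata_vis, or something stored in it, is a view created by subsetting. A commonly used workaround is to turn the object into an independent copy before constructing the model. The sketch below assumes an AnnData-like object with an `is_view` attribute and a `copy()` method; `materialise_if_view` is a hypothetical helper, and this may not resolve every case.

```python
def materialise_if_view(adata):
    """Return an independent copy when `adata` reports itself as a view."""
    # AnnData objects created by slicing report is_view == True;
    # copy() then returns a standalone object.
    if getattr(adata, "is_view", False):
        return adata.copy()
    return adata
```

In the workflow above this would be applied as `adata_vis = materialise_if_view(adata_vis)` before calling cell2location.models.Cell2location. If the view sits in a nested container rather than in the AnnData itself, an unconditional `adata_vis = adata_vis.copy()` may be the simpler thing to try.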

Human lymphoid organ single cell dataset integration (used in tutorial notebook)

Hello, thank you for creating this great tool!

I'd like to add a dataset to your scRNA-seq reference for lymph node but want to make sure my integration process is consistent with yours. Could you post the code used to integrate the three published scRNA-seq datasets?

Also, I noticed the Park (2020) study identified fibroblasts, but I do not see them in your integrated single cell reference data - were these excluded?

Issue setting up GPU

Good day!

Thanks for developing this great tool. I have been trying to set up my environment to run on a GPU to speed up the computation but have not succeeded. I get two types of errors with different graphics cards.

Initially, with a Tesla V100, I got the following error:

Can not use cuDNN on context None: cannot compile with cuDNN. We got this error:
b'/scratch/4102320/try_flags_92ylr6tr.c:4:19: fatal error: cudnn.h: No such file or directory\n #include <cudnn.h>\n                   ^\ncompilation terminated.\n'
Mapped name None to device cuda0: Tesla V100-SXM2-16GB (0000:04:00.0)

I thought it was because the conda env lacked the cudnn package. I installed it, and the error is now different:

Can not use cuDNN on context None: cannot compile with cuDNN. We got this error:
b'/scratch/4102320/try_flags_92ylr6tr.c:4:19: fatal error: cudnn.h: No such file or directory\n #include <cudnn.h>\n                   ^\ncompilation terminated.\n'
ERROR (theano.gpuarray): Could not initialize pygpu, support disabled
Traceback (most recent call last):
  File "/hpc/pmc_stunnenberg/cruiz/miniconda3/envs/python_pHGG_project/lib/python3.7/site-packages/theano/gpuarray/__init__.py", line 227, in <module>
    use(config.device)
  File "/hpc/pmc_stunnenberg/cruiz/miniconda3/envs/python_pHGG_project/lib/python3.7/site-packages/theano/gpuarray/__init__.py", line 214, in use
    init_dev(device, preallocate=preallocate)
  File "/hpc/pmc_stunnenberg/cruiz/miniconda3/envs/python_pHGG_project/lib/python3.7/site-packages/theano/gpuarray/__init__.py", line 159, in init_dev
    pygpu.blas.gemm(0, tmp, tmp, 0, tmp, overwrite_c=True)
  File "pygpu/blas.pyx", line 149, in pygpu.blas.gemm
  File "pygpu/blas.pyx", line 47, in pygpu.blas.pygpu_blas_rgemm
pygpu.gpuarray.GpuArrayException: (b'cublasCreate: (cublas) Library not initialized. (Possibly because the driver version is too old for the cuda version)', 11)

However, if I use another GPU (GeForce RTX2080i, Quadro RTX 6000), the error is the same as the first one for the Tesla GPU:

Can not use cuDNN on context None: cannot compile with cuDNN. We got this error:
b'/scratch/4103662/try_flags_q9s6i39u.c:4:19: fatal error: cudnn.h: No such file or directory\n #include <cudnn.h>\n                   ^\ncompilation terminated.\n'
Mapped name None to device cuda: Quadro RTX 6000 (0000:86:00.0)

Info of my packages/modules:

sys 3.7.10 | packaged by conda-forge | (default, Feb 19 2021, 16:07:37) 
[GCC 9.3.0]
ipykernel 5.5.3
ipykernel._version 5.5.3
json 2.0.9
re 2.2.1
IPython 7.22.0
IPython.core.release 7.22.0
logging 0.5.1.2
zlib 1.0
traitlets 5.0.5
traitlets._version 5.0.5
argparse 1.1
ipython_genutils 0.2.0
ipython_genutils._version 0.2.0
platform 1.0.8
IPython.core.crashhandler 7.22.0
pygments 2.8.1
pexpect 4.8.0
ptyprocess 0.7.0
decorator 5.0.5
pickleshare 0.7.5
backcall 0.2.0
sqlite3 2.6.0
sqlite3.dbapi2 2.6.0
_sqlite3 2.6.0
prompt_toolkit 3.0.18
wcwidth 0.2.5
jedi 0.18.0
parso 0.8.2
colorama 0.4.4
ctypes 1.1.0
_ctypes 1.1.0
IPython.core.magics.code 7.22.0
urllib.request 3.7
jupyter_client 6.1.12
jupyter_client._version 6.1.12
zmq 22.0.3
zmq.backend.cython 40304
zmq.backend.cython.constants 40304
zmq.sugar 22.0.3
zmq.sugar.constants 40304
zmq.sugar.version 22.0.3
jupyter_core 4.7.1
jupyter_core.version 4.7.1
tornado 6.1
_curses b'2.2'
dateutil 2.8.1
dateutil._version 2.8.1
six 1.15.0
decimal 1.70
_decimal 1.70
distutils 3.7.10
scanpy 1.7.1
scanpy._metadata 1.7.1
packaging 20.9
packaging.__about__ 20.9
pkg_resources._vendor.six 1.10.0
pkg_resources.extern.six 1.10.0
pkg_resources._vendor.appdirs 1.4.3
pkg_resources.extern.appdirs 1.4.3
pkg_resources._vendor.packaging 20.4
pkg_resources._vendor.packaging.__about__ 20.4
pkg_resources.extern.packaging 20.4
pkg_resources._vendor.pyparsing 2.2.1
pkg_resources.extern.pyparsing 2.2.1
csv 1.0
_csv 1.0
numpy 1.20.2
numpy.version 1.20.2
numpy.core 1.20.2
numpy.core._multiarray_umath 3.1
numpy.lib 1.20.2
numpy.linalg._umath_linalg 0.1.5
scipy 1.6.2
scipy.version 1.6.2
anndata 0.7.5
anndata._metadata 0.7.5
h5py 3.1.0
h5py.version 3.1.0
cached_property 1.5.2
natsort 7.1.1
pandas 1.2.3
pytz 2021.1
pandas.compat.numpy.function 1.20.2
zarr 2.7.0
numcodecs 0.7.3
numcodecs.version 0.7.3
numcodecs.blosc 1.18.1
numcodecs.zstd 1.4.4
numcodecs.lz4 1.9.2
zarr.version 2.7.0
dask 2021.04.0
yaml 5.4.1
tlz 0.11.1
toolz 0.11.1
psutil 5.8.0
cloudpickle 1.6.0
fsspec 0.9.0
scipy._lib._uarray 0.5.1+49.g4c3f1d7.scipy
sinfo 0.3.1
stdlib_list v0.7.0
numba 0.53.1
llvmlite 0.36.0
numba.misc.appdirs 1.4.1
sklearn 0.24.1
sklearn.base 0.24.1
joblib 1.0.1
joblib.externals.loky 2.9.0
joblib.externals.cloudpickle 1.6.0
scipy._lib.decorator 4.0.5
scipy.linalg._fblas b'$Revision: $'
scipy.linalg._flapack b'$Revision: $'
scipy.linalg._flinalg b'$Revision: $'
scipy.special.specfun b'$Revision: $'
scipy.ndimage 2.0
scipy.optimize.minpack2 b'$Revision: $'
scipy.sparse.linalg.isolve._iterative b'$Revision: $'
scipy.sparse.linalg.eigen.arpack._arpack b'$Revision: $'
scipy.optimize._lbfgsb b'$Revision: $'
scipy.optimize._cobyla b'$Revision: $'
scipy.optimize._slsqp b'$Revision: $'
scipy.optimize._minpack  1.10 
scipy.optimize.__nnls b'$Revision: $'
scipy.linalg._interpolative b'$Revision: $'
scipy.integrate._odepack  1.9 
scipy.integrate._quadpack  1.13 
scipy.integrate._ode $Id$
scipy.integrate.vode b'$Revision: $'
scipy.integrate._dop b'$Revision: $'
scipy.integrate.lsoda b'$Revision: $'
scipy.interpolate._fitpack  1.7 
scipy.interpolate.dfitpack b'$Revision: $'
scipy.stats.statlib b'$Revision: $'
scipy.stats.mvn b'$Revision: $'
sklearn.utils._joblib 1.0.1
leidenalg 0.8.3
igraph 0.9.1
texttable 1.6.3
igraph.version 0.9.1
louvain 0.7.0
matplotlib 3.4.1
PIL 8.1.2
PIL._version 8.1.2
PIL.Image 8.1.2
xml.etree.ElementTree 1.3.0
cffi 1.14.5
pyparsing 2.4.7
cycler 0.10.0
kiwisolver 1.3.1
tables 3.6.1
numexpr 2.7.3
numexpr.version 2.7.3
legacy_api_wrap 0.0.0
get_version 2.1
umap 0.5.1
_cffi_backend 1.14.5
pycparser 2.20
pycparser.ply 3.9
pycparser.ply.yacc 3.10
pycparser.ply.lex 3.10
pynndescent 0.5.2
theano 1.0.5
theano.version 1.0.5
mkl 2.3.0
scipy.signal.spline 0.2
pygpu 0.7.6
mako 1.1.4
markupsafe 1.1.1
plotnine 0.7.0
patsy 0.5.1
patsy.version 0.5.1
mizani 0.7.3
palettable 3.3.0
mizani.external.husl 4.0.3
statsmodels 0.12.2
statsmodels.api 0.12.2
statsmodels.__init__ 0.12.2
statsmodels.tools.web 0.12.2
pymc3 3.9.3
xarray 0.17.0
netCDF4 1.5.6
netCDF4._netCDF4 1.5.6
cftime 1.4.1
cftime._cftime 1.4.1
arviz 0.10.0
arviz.data.base 0.10.0
fastprogress 0.2.7
tqdm 4.59.0
tqdm.cli 4.59.0
tqdm.version 4.59.0
tqdm._dist_ver 4.59.0
ipywidgets 7.6.3
ipywidgets._version 7.6.3
torch 1.8.1+cu102
torch.version 1.8.1+cu102
tarfile 0.9.0
torch.cuda.nccl 2708
torch.backends.cudnn 7605
seaborn 0.11.1
seaborn.external.husl 2.1.0

Do you know what the issue might be?

Thanks in advance for your help!

run_regression out of memory for only 19k cells

I tried running the run_regression model on a reference dataset of 19k cells. However, I got a memory error.

RuntimeError: CUDA out of memory. Tried to allocate 2.48 GiB (GPU 0; 31.75 GiB total capacity; 23.38 GiB already allocated; 2.46 GiB free; 27.29 GiB reserved in total by PyTorch)

plot_spatial does not work when one of the cell types has exactly 0 abundance


@yozhikoff please have a look at this

Changing the condition to if min_value > max_value: results in no errors but an empty plot for all cell types.

I think this can be resolved if we change how max_color_quantile threshold is applied for this special case of all 0.

You can reproduce this by taking the demo notebook results and setting all values for one cell type to 0 (slide.obs[sel_clust_col[0]] = 0).
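One way to picture the all-zero special case is a color-limit helper that falls back to a fixed range when the quantile threshold collapses onto the minimum. This is only a sketch: `color_limits`, `fallback_span` and the 0.98 quantile are hypothetical names and values chosen for illustration, not the actual plot_spatial code.

```python
import numpy as np

def color_limits(values, max_color_quantile=0.98, fallback_span=1.0):
    """Return (vmin, vmax) for a color scale, guarding the all-equal case.

    Hypothetical helper: the quantile default and the fallback span are
    assumptions made for this example.
    """
    values = np.asarray(values, dtype=float)
    vmin = float(values.min())
    vmax = float(np.quantile(values, max_color_quantile))
    if vmax <= vmin:
        # All values identical (e.g. a cell type with exactly 0 abundance):
        # widen the range so the color scale is still well defined.
        vmax = vmin + fallback_span
    return vmin, vmax

all_zero = color_limits(np.zeros(100))
mixed = color_limits(np.linspace(0.0, 10.0, 101))
```

With this guard an all-zero cell type gets a valid range instead of a degenerate one, so it would be drawn at the bottom of the color scale rather than triggering an error.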

ValueError: shapes not aligned

Dear Vitalii,

Following private communication, I post this issue as a reminder of the following bug.

After training the regression model, I am getting the following error

ValueError: shapes (1000,20) and (21,12931) not aligned: 20 (dim 1) != 21 (dim 0)

when trying to plot the QC.

As you mentioned, the problem is that the QC is plotted for a random subset of cells and if not all cell types are represented (20/21) this error occurs.

The way to address this on the user side is to increase the number of cells in the subset with the parameter use_n_obs (for example, if mod is the regression model, using mod.plot_QC(use_n_obs=5000)).
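The mechanism described above can be reproduced with a toy example. The sketch below is not the cell2location code: it assumes a one-hot design matrix built from the cell type labels of a subset, and all names (`labels`, `signatures`, `design_matrix`) are made up for illustration. When the subset misses one of the 21 cell types, the design matrix has only 20 columns and the matrix product fails.

```python
import numpy as np
import pandas as pd

n_types, n_genes = 21, 50

# 2000 cells from 20 common types plus a single cell of a rare 21st type.
labels = np.array([f"type_{i:02d}" for i in range(20)] * 100 + ["type_20"])
signatures = np.ones((n_types, n_genes))  # cell types x genes

def design_matrix(subset_labels):
    """One-hot encode labels; a column exists only for types present."""
    return pd.get_dummies(subset_labels).to_numpy(dtype=float)

small = design_matrix(labels[:1000])  # subset misses the rare type: 20 columns
full = design_matrix(labels)          # every type present: 21 columns

try:
    np.dot(small, signatures)
    error_message = None
except ValueError as err:
    error_message = str(err)          # the "not aligned" shape error

predicted = np.dot(full, signatures)  # works once all types are represented
```

A larger subset makes it more likely that every cell type is sampled, which is why raising use_n_obs avoids the error.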

Thank you so much for your help.

All the best,

Daniele

Question: Extract cell type contribution per spot

Good day,

I would like to extract and export (csv) the deconvoluted cell proportions of each spot to visualize it in a scatter pie chart.

I was planning on using the q05_cell_abundance_w_sf matrix. However, I am not sure how I can transform the values of each cell abundance into a proportion, since each spot will have a different enrichment value per predicted cell type.

Could you guide me on how I can do that, please?

Thanks in advance!
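A possible approach, sketched under assumptions: if the abundance matrix is available as a pandas DataFrame with spots as rows and cell types as columns, dividing each row by its sum gives per-spot proportions. The toy values below merely stand in for q05_cell_abundance_w_sf, and `to_proportions` is a hypothetical helper, not part of cell2location.

```python
import pandas as pd

# Toy stand-in for the abundance matrix (spots x cell types).
abundance = pd.DataFrame(
    {
        "B_cells": [2.0, 0.5, 0.0],
        "T_cells": [1.0, 0.5, 0.0],
        "Fibroblasts": [1.0, 1.0, 0.0],
    },
    index=["spot_1", "spot_2", "spot_3"],
)

def to_proportions(abundance):
    """Divide each row by its total; spots with zero total stay at 0."""
    totals = abundance.sum(axis=1)
    # Zero totals become NaN before dividing, then the NaN rows are set to 0.
    return abundance.div(totals.where(totals > 0), axis=0).fillna(0.0)

proportions = to_proportions(abundance)
csv_text = proportions.to_csv()  # pass a file path instead to write to disk
```

Each row of `proportions` sums to 1 (or to 0 for an empty spot), which is the input a scatter pie chart typically expects.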

ERROR: Could not find a version that satisfies the requirement pygpu (from versions: none)

I followed the instructions to install cell2location, but at the pip install git+https://github.com/BayraktarLab/cell2location.git step I got this error:

Collecting git+https://github.com/BayraktarLab/cell2location.git
  Cloning https://github.com/BayraktarLab/cell2location.git to /tmp/pip-req-build-_9qxj1su
Collecting pymc3
  Using cached pymc3-3.11.2-py3-none-any.whl (869 kB)
Collecting torch
  Using cached torch-1.9.0-cp37-cp37m-manylinux1_x86_64.whl (831.4 MB)
Processing /home/nicolas/.cache/pip/wheels/26/68/6f/745330367ce7822fe0cd863712858151f5723a0a5e322cc144/Theano-1.0.5-py3-none-any.whl
ERROR: Could not find a version that satisfies the requirement pygpu (from cell2location==0.5) (from versions: none)
ERROR: No matching distribution found for pygpu (from cell2location==0.5)

I'm using python 3.7.3 through anaconda.

Looking in the cellpymc environment, pygpu is installed correctly though...

Any idea how to solve this ?

Thanks

Google colab tutorial gives error on 'import cell2location'

The colab tutorial linked from the cell2location documentation page gives an error when trying to import cell2location at cell 3, line 15 (DistributionNotFound: The 'pynndescent' distribution was not found and is required by the application). Is there any way to resolve this?

Bug related to top_n in run_colocation

Hi guys,

Thanks for this nice package and very helpful tutorial. In run_colocation.py you save the top 10 ranking cell types for each factor to a dataframe using print_gene_loadings on L245:

https://github.com/BayraktarLab/cell2location/blob/master/cell2location/run_colocation.py#L245

This leads to an error when returning the dataframe whenever the user has fewer than 10 cell types because I see that the top_n variable is set to 10 by default in the source of print_gene_loadings:

https://cell2location.readthedocs.io/en/latest/_modules/cell2location/models/base_model.html#BaseModel.print_gene_loadings

Can I suggest that you add a parameter for top_n when run_colocation is called, so that top_n can only take integer values up to the total number of annotations being mapped with cell2location, and then use this top_n value at L245 in run_colocation.py?

Cheers!
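The suggestion above amounts to capping top_n at the number of cell types. A minimal sketch of that idea follows; `top_loadings` and the toy `loadings` table are hypothetical and do not reproduce print_gene_loadings.

```python
import pandas as pd

def top_loadings(loadings, top_n=10):
    """Return the top-ranked cell types per factor.

    `loadings` is assumed to be a cell types x factors table; top_n is
    capped so that fewer than 10 cell types cannot cause a length error.
    """
    top_n = min(int(top_n), loadings.shape[0])
    return pd.DataFrame(
        {
            factor: loadings[factor]
            .sort_values(ascending=False)
            .index[:top_n]
            .tolist()
            for factor in loadings.columns
        }
    )

loadings = pd.DataFrame(
    {"fact_0": [0.1, 0.7, 0.2], "fact_1": [0.6, 0.1, 0.3]},
    index=["B_cells", "T_cells", "Fibroblasts"],
)
top = top_loadings(loadings)  # default top_n=10 exceeds the 3 cell types
```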

Problem importing cell2location after setting THEANO_FLAGS

Hi, I'm trying to run cell2location from the singularity image (cell2location-v0.05-alpha.sif) but I'm getting the following error:

ImportError: libpython3.7m.so.1.0: cannot open shared object file: No such file or directory

This is what I'm trying to do:

import os
os.environ["THEANO_FLAGS"] = 'device=cuda0,floatX=float32,force_device=True'
import cell2location

If I change the above to:

import os
os.environ["THEANO_FLAGS"] = 'device=cpu,floatX=float32,openmp=True,force_device=True'
import cell2location

cell2location is successfully imported but I would like to take advantage of GPU acceleration.

Thanks in advance!

cell2location not running

Hi,

I am using cell2location, and when I reach Step 4 (Estimating expression signatures, the model training step), the program sometimes stops responding entirely or freezes at 4% or some other percentage. The only change I made to the commands was setting 'use_cuda' to False, because we do not have a GPU.

I hope to get some help here.

Thanks!
Sen

Support for multiplicative effects for arbitrary covariates in the spatial data

This feature would enable accounting for various covariates such as donor, age, potentially batch.
This should be added as a separate multiplicative parameter y_{t,g} with a regularising prior centered at 1, i.e. exactly like the technical/extra categorical covariates in the regression model.

\mu_{s,g} = (m_{g} \left (\sum_{f} {w_{s,f} \: g_{f,g}} \right) + s_{e,g}) y_{s} y_{t,g}

Does this parameter need to have a gene-specific prior?
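To make the shapes in the formula concrete, here is a NumPy sketch of the expected counts with the extra covariate term. It assumes each location s belongs to one experiment e and one covariate level t; the array names, sizes and gamma draws are illustrative assumptions, not the model's actual priors.

```python
import numpy as np

def expected_counts(w, g, m, s_add, y_s, y_tg, experiment, covariate):
    """mu[s, g] = (m[g] * sum_f w[s, f] g[f, g] + s_add[e(s), g]) * y_s[s] * y_tg[t(s), g]."""
    signal = m * (w @ g) + s_add[experiment]        # locations x genes
    return signal * y_s[:, None] * y_tg[covariate]  # apply both multiplicative terms

rng = np.random.default_rng(0)
n_spots, n_factors, n_genes = 6, 3, 4

w = rng.gamma(2.0, 1.0, size=(n_spots, n_factors))  # w_{s,f}
g = rng.gamma(2.0, 1.0, size=(n_factors, n_genes))  # g_{f,g}
m = rng.gamma(2.0, 0.5, size=n_genes)               # m_g
s_add = rng.gamma(1.0, 0.1, size=(2, n_genes))      # s_{e,g}, 2 experiments
y_s = rng.gamma(10.0, 0.1, size=n_spots)            # y_s
y_tg = rng.gamma(100.0, 0.01, size=(2, n_genes))    # y_{t,g}, values near 1

experiment = np.array([0, 0, 0, 1, 1, 1])  # experiment index of each location
covariate = np.array([0, 1, 0, 1, 0, 1])   # covariate level of each location

mu = expected_counts(w, g, m, s_add, y_s, y_tg, experiment, covariate)
```

Setting y_tg to all ones recovers the formula without the covariate, which is why a prior centered at 1 acts as a regulariser towards "no effect".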

Error from import cell2location

Hi, I was just wondering whether anyone could interpret this error or give me any tips how to fix it? I've installed cell2location in a conda environment as per the instructions to do so manually but am getting this error upon trying to import it:

In [1]: import sys
   ...: import scanpy as sc
   ...: import anndata
   ...: import pandas as pd
   ...: import numpy as np
   ...: import os
   ...:
   ...: data_type = 'float32'
   ...:
   ...: import cell2location
   ...:

LoweringError: Failed in nopython mode pipeline (step: nopython mode backend)
Storing i64 to ptr of i32 ('dim'). FE type int32

File "../../../.local/lib/python3.7/site-packages/umap/layouts.py", line 52:
def rdist(x, y):
    <source elided>
    result = 0.0
    dim = x.shape[0]
    ^

During: lowering "dim = static_getitem(value=$8load_attr.2, index=0, index_var=$const10.3, fn=<built-in function getitem>)" at /home/camerongw/.local/lib/python3.7/site-packages/umap/layouts.py (52)

I'm using python-3.7.10.

error run_regression model

I was wondering if you have any insight into where this is coming from. The model ran fine and this happened at the plotting stage.


ValueError Traceback (most recent call last)
in
29
30 export_args={'path': results_folder + 'regression_model/', # where to save results
---> 31 'save_model': True, # save pytorch model?
32 })
33

~/ENTER/envs/cellpymc/lib/python3.7/site-packages/cell2location/run_regression.py in run_regression(sc_data, model_name, verbose, return_all, train_args, model_kwargs, posterior_args, export_args)
248 for tr in mod.validation_hist.values():
249 deriv = np.gradient(tr, 1)
--> 250 new_n_epochs.append(np.max(np.arange(deriv.shape[0])[(deriv < 0)]))
251 new_n_epochs = np.min(new_n_epochs) + 1
252

<array_function internals> in amax(*args, **kwargs)

~/ENTER/envs/cellpymc/lib/python3.7/site-packages/numpy/core/fromnumeric.py in amax(a, axis, out, keepdims, initial, where)
2732 """
2733 return _wrapreduction(a, np.maximum, 'max', axis, None, out,
-> 2734 keepdims=keepdims, initial=initial, where=where)
2735
2736

~/ENTER/envs/cellpymc/lib/python3.7/site-packages/numpy/core/fromnumeric.py in _wrapreduction(obj, ufunc, method, axis, dtype, out, **kwargs)
85 return reduction(axis=axis, out=out, **passkwargs)
86
---> 87 return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
88
89

ValueError: zero-size array to reduction operation maximum which has no identity
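The traceback points at np.max being called on an empty selection, which happens when no element of the gradient is negative. The sketch below shows one way to guard that step. `last_decreasing_epoch` is a hypothetical helper written for this example, not the fix used in cell2location.

```python
import numpy as np

def last_decreasing_epoch(history):
    """Index of the last epoch where the validation trace still decreases.

    Returns None when the trace never decreases, instead of calling
    np.max on an empty array.
    """
    deriv = np.gradient(np.asarray(history, dtype=float), 1)
    decreasing = np.flatnonzero(deriv < 0)
    if decreasing.size == 0:
        return None
    return int(decreasing.max())

improving = last_decreasing_epoch([3.0, 2.0, 1.0, 1.5])  # decreases up to index 2
never = last_decreasing_epoch([1.0, 2.0, 3.0])           # never decreases
```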

cuDNN version on AWS instances [probably theano does not use the GPU]

I'm unsure how the dependencies work in Theano but it is really inconvenient that cell2location cannot be simply installed on some broadly available hardware. Not sure if this is because you pinned down a particular version of Theano for the software, or because Theano is less maintained than pytorch or tensorflow (is it?). I hope that the new implementation in scvi-tools will help avoid those issues :)

I simply run some custom code that calls cell2location

Output of nvidia-smi:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.05    Driver Version: 450.51.05    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla K80           On   | 00000000:00:1E.0 Off |                    0 |
| N/A   50C    P0    70W / 149W |    842MiB / 11441MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

Output of nvcc:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130

Output of cell2location:

### Summarising single cell clusters ###
### Creating model ### - time 0.04 min
### Analysis name: LocationModelLinearDependentW_1experiments_5clusters_1600locations_2000genes
### Training model ###
/home/ubuntu/anaconda3/envs/scVI/lib/python3.7/site-packages/theano/gpuarray/dnn.py:184: UserWarning: Your cuDNN version is more recent than Theano. If you encounter problems, try updating Theano or downgrading cuDNN to a version >= v5 and <= v7.

Questions about the difference between pyro-based versus old versions of cell2location

Hi @vitkl,

Thanks for creating this wonderful package! I just noticed that there is an updated version of cell2location which was integrated with scvi tools. When I tried the latest tutorial on the example data you provided (Mapping human lymph node cell types to 10X Visium), for the model training part on the scRNAseq reference, it seems that it will take about 30 mins to train when I use GPU. Is this a normal time for this procedure?

Another question: since I can run the previous version of cell2location smoothly, will the cell type proportion estimates differ between the newest version and the old version (that uses theano)?

Also, is cell2location applicable to higher-resolution spatial transcriptomic data with more than 50k spatial locations?

Thank you so much for your time!

Issue while saving regression result

Hi, I was running regression on my training data, and after it finished, saving failed halfway through.

### Evaluating parameters / sampling posterior ### - time 212.25 min
### Saving results ###
... storing 'sample_id' as categorical
... <And other storing stuff>

Traceback (most recent call last):
  File "/anaconda3/envs/cellpymc/lib/python3.7/site-packages/anndata/_io/utils.py", line 188, in func_wrapper
    return func(elem, key, val, *args, **kwargs)
  File "/anaconda3/envs/cellpymc/lib/python3.7/site-packages/anndata/_io/h5ad.py", line 254, in write_dataframe
    raise ValueError(f"{reserved!r} is a reserved name for dataframe columns.")
ValueError: '_index' is a reserved name for dataframe columns.
The above exception was the direct cause of the following exception:

Then I read the sc.h5ad file. It only has its raw matrix X; all the other metadata (obs, var, uns) are empty, let alone the model.
I used a Python script to run it, so everything is lost.
Is there a way to save the results without running into this issue?
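The error message says that a dataframe column is literally named '_index'. One workaround to try, sketched here with pandas only, is renaming that column before writing. `rename_reserved_column` and the replacement name are hypothetical, and which AnnData table actually holds the column is an assumption that has to be checked on the real object.

```python
import pandas as pd

def rename_reserved_column(df, reserved="_index", new_name="original_index"):
    """Return a copy of df with the reserved column renamed, if present."""
    if reserved in df.columns:
        return df.rename(columns={reserved: new_name})
    return df

# Toy stand-in for a gene annotation table that carries the reserved name.
var = pd.DataFrame({"_index": ["gene_a", "gene_b"], "n_cells": [10, 20]})
cleaned = rename_reserved_column(var)
```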

ERROR (theano.gpuarray): Could not initialize pygpu, support disabled

Hi

Thanks for developing the cell2location package. When I run the tutorial and import cell2location, I met this error:

os.environ["THEANO_FLAGS"] = 'device=cuda0,floatX=' + data_type + ',force_device=True'
import cell2location
/home/cici/.conda/envs/cellpymc/lib/python3.7/site-packages/theano/gpuarray/dnn.py:184: UserWarning: Your cuDNN version is more recent than Theano. If you encounter problems, try updating Theano or downgrading cuDNN to a version >= v5 and <= v7.
  warnings.warn("Your cuDNN version is more recent than "
Using cuDNN version 7600 on context None
ERROR (theano.gpuarray): Could not initialize pygpu, support disabled
Traceback (most recent call last):
  File "/home/cici/.conda/envs/cellpymc/lib/python3.7/site-packages/theano/gpuarray/__init__.py", line 227, in <module>
    use(config.device)
  File "/home/cici/.conda/envs/cellpymc/lib/python3.7/site-packages/theano/gpuarray/__init__.py", line 214, in use
    init_dev(device, preallocate=preallocate)
  File "/home/cici/.conda/envs/cellpymc/lib/python3.7/site-packages/theano/gpuarray/__init__.py", line 159, in init_dev
    pygpu.blas.gemm(0, tmp, tmp, 0, tmp, overwrite_c=True)
  File "pygpu/blas.pyx", line 149, in pygpu.blas.gemm
  File "pygpu/blas.pyx", line 47, in pygpu.blas.pygpu_blas_rgemm
pygpu.gpuarray.GpuArrayException: (b'cublasCreate: (cublas) Library not initialized. (Possibly because the driver version is too old for the cuda version)', 11)

And this is the output when I type 'nvidia-smi':

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:05:00.0 Off |                  N/A |
|  0%   32C    P8    16W / 280W |    291MiB / 11176MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Quadro K2000        Off  | 00000000:06:00.0 Off |                  N/A |
| 30%   44C    P0    N/A /  N/A |      0MiB /  1999MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

It seems that the driver is not too old for the CUDA version. I also tried reinstalling theano, but it does not help. Could you please give me some suggestions on how to fix the error?

Thanks!

Install error

The readme states to use pip install plotnine pymc3>=3.8,<3.10 torch pyro-ppl for the plotnine install but on my system

$ ps -p $$
    PID TTY          TIME CMD
3713966 pts/574  00:00:00 bash

pip install plotnine "pymc3>=3.8,<3.10" torch pyro-ppl is required, because bash otherwise treats > and < as redirections. You might consider updating the readme.

Error saving results in run_regression

Hi,

Thanks very much for developing this nice tool.

I am following the tutorial to estimate the cell type signatures from my own scRNAseq dataset. I get an error when calling the run_regression function. Everything looks fine: the epoch vs ELBO loss plots are generated, as are the ones for the UMI counts. However, I get this error when saving the results:

### Saving results ###
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
~/.conda/envs/cellpymc/lib/python3.7/site-packages/anndata/_io/utils.py in func_wrapper(elem, key, val, *args, **kwargs)
    187         try:
--> 188             return func(elem, key, val, *args, **kwargs)
    189         except Exception as e:

~/.conda/envs/cellpymc/lib/python3.7/site-packages/anndata/_io/h5ad.py in write_dataframe(f, key, df, dataset_kwargs)
    257     group.attrs["encoding-version"] = EncodingVersions.dataframe.value
--> 258     group.attrs["column-order"] = list(df.columns)
    259 

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

~/.conda/envs/cellpymc/lib/python3.7/site-packages/h5py/_hl/attrs.py in __setitem__(self, name, value)
    102         """
--> 103         self.create(name, data=value)
    104 

~/.conda/envs/cellpymc/lib/python3.7/site-packages/h5py/_hl/attrs.py in create(self, name, data, shape, dtype)
    196             try:
--> 197                 attr = h5a.create(self._id, self._e(tempname), htype, space)
    198             except:

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/h5a.pyx in h5py.h5a.create()

RuntimeError: Unable to create attribute (object header message is too large)

The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
<ipython-input-601-cf8487f6980c> in <module>
     30                    export_args={'path': results_folder + 'regression_model/', # where to save results
     31                                 'save_model': True, #save pytorch model?
---> 32                                 'run_name_suffix': ''})
     33 
     34 reg_mod = r['mod']

~/.conda/envs/cellpymc/lib/python3.7/site-packages/cell2location/run_regression.py in run_regression(sc_data, model_name, verbose, return_all, train_args, model_kwargs, posterior_args, export_args)
    325 
    326     # save anndata with exported posterior
--> 327     sc_data.write(filename=path + 'sc.h5ad', compression='gzip')
    328 
    329     # save model object and related annotations

~/.conda/envs/cellpymc/lib/python3.7/site-packages/anndata/_core/anndata.py in write_h5ad(self, filename, compression, compression_opts, force_dense, as_dense)
   1848             compression_opts=compression_opts,
   1849             force_dense=force_dense,
-> 1850             as_dense=as_dense,
   1851         )
   1852 

~/.conda/envs/cellpymc/lib/python3.7/site-packages/anndata/_io/h5ad.py in write_h5ad(filepath, adata, force_dense, as_dense, dataset_kwargs, **kwargs)
    115             )
    116         else:
--> 117             write_attribute(f, "raw", adata.raw, dataset_kwargs=dataset_kwargs)
    118         write_attribute(f, "obs", adata.obs, dataset_kwargs=dataset_kwargs)
    119         write_attribute(f, "var", adata.var, dataset_kwargs=dataset_kwargs)

~/.conda/envs/cellpymc/lib/python3.7/functools.py in wrapper(*args, **kw)
    838                             '1 positional argument')
    839 
--> 840         return dispatch(args[0].__class__)(*args, **kw)
    841 
    842     funcname = getattr(func, '__name__', 'singledispatch function')

~/.conda/envs/cellpymc/lib/python3.7/site-packages/anndata/_io/h5ad.py in write_attribute_h5ad(f, key, value, *args, **kwargs)
    137     if key in f:
    138         del f[key]
--> 139     _write_method(type(value))(f, key, value, *args, **kwargs)
    140 
    141 

~/.conda/envs/cellpymc/lib/python3.7/site-packages/anndata/_io/h5ad.py in write_raw(f, key, value, dataset_kwargs)
    146     group.attrs["shape"] = value.shape
    147     write_attribute(f, "raw/X", value.X, dataset_kwargs=dataset_kwargs)
--> 148     write_attribute(f, "raw/var", value.var, dataset_kwargs=dataset_kwargs)
    149     write_attribute(f, "raw/varm", value.varm, dataset_kwargs=dataset_kwargs)
    150 

~/.conda/envs/cellpymc/lib/python3.7/functools.py in wrapper(*args, **kw)
    838                             '1 positional argument')
    839 
--> 840         return dispatch(args[0].__class__)(*args, **kw)
    841 
    842     funcname = getattr(func, '__name__', 'singledispatch function')

~/.conda/envs/cellpymc/lib/python3.7/site-packages/anndata/_io/h5ad.py in write_attribute_h5ad(f, key, value, *args, **kwargs)
    137     if key in f:
    138         del f[key]
--> 139     _write_method(type(value))(f, key, value, *args, **kwargs)
    140 
    141 

~/.conda/envs/cellpymc/lib/python3.7/site-packages/anndata/_io/utils.py in func_wrapper(elem, key, val, *args, **kwargs)
    193                 f"Above error raised while writing key {key!r} of {type(elem)}"
    194                 f" from {parent}."
--> 195             ) from e
    196 
    197     return func_wrapper

RuntimeError: Unable to create attribute (object header message is too large)

Above error raised while writing key 'raw/var' of <class 'h5py._hl.files.File'> from /.

So far, I have found something like this:

HDF5 has a header limit of 64kb for all metadata of the columns. This include name, types, etc. When you go about roughly 2000 columns, you will run out of space to store all the metadata. This is a fundamental limitation of pytables. I don't think they will make workarounds on their side any time soon. You will either have to split the table up or choose another storage format.

from: https://stackoverflow.com/questions/16639503/unable-to-save-dataframe-to-hdf5-object-header-message-is-too-large

Do you have any idea why the object header could become so long?

Best regards and thank you very much,
Alberto.
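The traceback fails while writing the list of column names of 'raw/var' as a single attribute, so a table with very many or very long column names is a plausible cause. The sketch below is a rough heuristic only: it sums the UTF-8 size of the column names and compares it with the 64 KB figure from the quoted answer. The real HDF5 encoding adds overhead, and both helper names are hypothetical.

```python
import pandas as pd

HDF5_HEADER_LIMIT = 64 * 1024  # bytes, figure taken from the quoted answer

def column_names_bytes(df):
    """Total UTF-8 size of all column names (lower bound on attribute size)."""
    return sum(len(str(col).encode("utf-8")) for col in df.columns)

def may_exceed_header_limit(df, limit=HDF5_HEADER_LIMIT):
    """True when the column names alone are already larger than the limit."""
    return column_names_bytes(df) > limit

narrow = pd.DataFrame(columns=["gene_id", "n_cells"])
wide = pd.DataFrame(columns=[f"gene_{i:025d}" for i in range(3000)])

narrow_flag = may_exceed_header_limit(narrow)  # a few bytes of names
wide_flag = may_exceed_header_limit(wide)      # 3000 names x 30 bytes each
```

If such a check flags a table, dropping the extra columns from it before saving is one thing to try.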

run_regression giving an error

I am getting this error when running the model training step:

Traceback (most recent call last):
File "/scratch/ms/visium_08_10_21/brain_visium_data/Allen_brain/allen_convert.py", line 44, in
r, adata_snrna_raw = run_regression(adata_snrna_raw, # input data object]
File "/home/ms/.local/lib/python3.9/site-packages/cell2location/run_regression.py", line 144, in run_regression
X_data = sc_data.raw.X.toarray()
AttributeError: 'numpy.ndarray' object has no attribute 'toarray'

Not sure if it is still the issue that this only works with sparse.matrix and not numpy array?
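The traceback shows .toarray() being called on a NumPy array, which only sparse matrices provide. A small duck-typed helper that accepts both kinds of input is sketched below. `to_dense` and `FakeSparse` are hypothetical names; `FakeSparse` only stands in for a SciPy sparse matrix so that the example runs without SciPy.

```python
import numpy as np

def to_dense(X):
    """Return a dense ndarray for either a sparse matrix or an array."""
    if hasattr(X, "toarray"):
        return X.toarray()  # sparse matrices expose toarray()
    return np.asarray(X)    # already dense: pass through unchanged

class FakeSparse:
    """Minimal stand-in for a sparse matrix (illustration only)."""

    def __init__(self, data):
        self._data = np.asarray(data)

    def toarray(self):
        return self._data

dense_input = np.array([[1, 0], [0, 2]])
from_dense = to_dense(dense_input)
from_sparse = to_dense(FakeSparse([[1, 0], [0, 2]]))
```

On the user side, converting the raw matrix to a sparse format before calling run_regression would presumably sidestep the same error.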

Update readme, tutorial and acknowledgements

Big changes to code:

Changes to defaults:

  • Changing our default detection_alpha to 20, because within-slide technical effects seem to be more prevalent than their absence, and recommending in the tutorial to run both detection_alpha=20 and detection_alpha=200.

Changes to docs

  • Template issue: bug (version, are you following scvi-tools tutorial)
  • Template issue: usage question (version, are you following scvi-tools tutorial, which technology was used to generate reference data / spatial data, how many cell types, how many batches, how many genes, how many spatial locations)
  • Update tutorial to match scvi-tools tutorial
  • Update tutorial according to this feedback #65 (comment)
  • Delete pymc3 tutorials from documentation website (reorder the rest to make it easier to understand that pymc3 should not be used)
  • Fix documentation website for various classes including user-facing Cell2location class (automodule->autoclass)
  • Update readme - shorten what's possible
  • Update readme - future work
  • Update readme - add link to Nat Biotech paper
  • Update acknowledgements
  • Update copyright date
  • Address this issue #74
  • Add badge with number of downloads [![Downloads](https://pepy.tech/badge/cell2location)](https://pepy.tech/project/cell2location) - but we don't have a lot of downloads on pip because we are not using pip
  • Revise colab version of the notebook (less posterior samples)
  • Add this line to tutorial: "The values are stored in adata.uns[f"mod_coloc_n_fact{n_fact}"] in an output format similar to the main cell2location results."

Tutorial not running on Docker version

Hello,
Congratulations on this tool, which seems impressive! I would like to run it on my data, so I installed the docker image. Before running it on my data, I wanted to try the tutorial, so I opened the "cell2location_estimating_signatures" jupyter notebook and tried to run it. In the second block of code (Loading single-cell reference data), I get a file-not-found error:


OSError Traceback (most recent call last)
in
3
4 ## snRNA reference (raw counts)
----> 5 adata_snrna_raw = anndata.read_h5ad(sc_data_folder + "rawdata/all_cells_20200625.h5ad")
6
7 ## Cell type annotations

/opt/conda/envs/cellpymc/lib/python3.7/site-packages/anndata/_io/h5ad.py in read_h5ad(filename, backed, as_sparse, as_sparse_fmt, chunk_size)
398 )
399
--> 400 with h5py.File(filename, "r") as f:
401 d = {}
402 for k in f.keys():

/opt/conda/envs/cellpymc/lib/python3.7/site-packages/h5py/_hl/files.py in init(self, name, mode, driver, libver, userblock_size, swmr, rdcc_nslots, rdcc_nbytes, rdcc_w0, track_order, **kwds)
406 fid = make_fid(name, mode, userblock_size,
407 fapl, fcpl=make_fcpl(track_order=track_order),
--> 408 swmr=swmr)
409
410 if isinstance(libver, tuple):

/opt/conda/envs/cellpymc/lib/python3.7/site-packages/h5py/_hl/files.py in make_fid(name, mode, userblock_size, fapl, fcpl, swmr)
171 if swmr and swmr_support:
172 flags |= h5f.ACC_SWMR_READ
--> 173 fid = h5f.open(name, flags, fapl=fapl)
174 elif mode == 'r+':
175 fid = h5f.open(name, h5f.ACC_RDWR, fapl=fapl)

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/_objects.pyx in h5py._objects.with_phil.wrapper()

h5py/h5f.pyx in h5py.h5f.open()

OSError: Unable to open file (unable to open file: name = '/sanger_projects/cell2location_paper/notebooks/selected_data/mouse_visium_snrna/rawdata/all_cells_20200625.h5ad', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)

I also tried to download the files using the link in the preprint and change the path to the local path on my computer, with the same result.
I am new to docker images, so I am not sure whether the path is related to an online location or locally.

Is there an easy fix for this, or should I try to use another way of accessing cell2location than the docker image?

Thank you in advance.

Best regards,
Florent

Import error

hi,
I have installed the cell2location package with Conda on CentOS 7. But when I try to import the package in Python, it fails with this error:
You can find the C code in this temporary file: /tmp/theano_compilation_error_5q0kyagy
library inux-gnu/4.9.3/cc1plus: is not found.
Traceback (most recent call last):
File "path/miniconda/envs/cellpymc/lib/python3.7/site-packages/theano/gof/lazylinker_c.py", line 81, in
actual_version, force_compile, _need_reload))
ImportError: Version check of the existing lazylinker compiled file. Looking for version 0.211, but found None. Extra debug information: force_compile=False, _need_reload=True

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "path/miniconda/envs/cellpymc/lib/python3.7/site-packages/theano/gof/lazylinker_c.py", line 105, in
actual_version, force_compile, _need_reload))
ImportError: Version check of the existing lazylinker compiled file. Looking for version 0.211, but found None. Extra debug information: force_compile=False, _need_reload=True

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "", line 1, in
File "path/miniconda/envs/cellpymc/lib/python3.7/site-packages/cell2location/init.py", line 2, in
from .run_c2l import run_cell2location
File "path/miniconda/envs/cellpymc/lib/python3.7/site-packages/cell2location/run_c2l.py", line 9, in
import theano
File "path/miniconda/envs/cellpymc/lib/python3.7/site-packages/theano/init.py", line 110, in
from theano.compile import (
File "path/miniconda/envs/cellpymc/lib/python3.7/site-packages/theano/compile/init.py", line 12, in
from theano.compile.mode import *
File "path/miniconda/envs/cellpymc/lib/python3.7/site-packages/theano/compile/mode.py", line 11, in
import theano.gof.vm
File "path/miniconda/envs/cellpymc/lib/python3.7/site-packages/theano/gof/vm.py", line 674, in
from . import lazylinker_c
File "path/miniconda/envs/cellpymc/lib/python3.7/site-packages/theano/gof/lazylinker_c.py", line 140, in
preargs=args)
File "path/miniconda/envs/cellpymc/lib/python3.7/site-packages/theano/gof/cmodule.py", line 2411, in compile_str
(status, compile_stderr.replace('\n', '. ')))
Exception: Compilation failed (return status=1): path/gcc-4.9.3/libexec/gcc/x86_64-unknown-linux-gnu/4.9.3/cc1plus: error while loading shared libraries: libmpfr.so.1: cannot open shared object file: No such file or directory.

Is this error caused by my system environment or by something else? How can I fix it?
Thanks

Multi-GPU training

I am trying cell2loc on GCP with multiple GPUs, but it seems to use only one. How can I train on multiple GPUs? Multiple cheaper GPUs such as the K80 could potentially fit bigger spatial data. If you have a recommended GPU setting for many thousands of spots, I'd be happy to hear about it.
