statbiomed / spatialdm Goto Github PK

View Code? Open in Web Editor NEW

23.0 2.0 4.0 54.37 MB

Spatial direct messaging detected by bivariate Moran

Home Page: https://spatialdm.readthedocs.io

License: Apache License 2.0

Python 0.42% Jupyter Notebook 99.58%

cell-cell-communication ligand-receptor-interaction spatial-correlation spatial-transcriptomics

spatialdm's People

Contributors

Stargazers

Watchers

Forkers

wyq977 yuifu hiraksarkar joan-yanqiong

spatialdm's Issues

time consuming about permutation!

adata_sub is about 4k cells and 25k genes,
sdm.spatialdm_global(adata_sub, n_perm = 1000, specified_ind=None, method="z-score", nproc=6) cost about 2 minutes,
however,
sdm.spatialdm_global(adata_sub, n_perm = 1000, specified_ind=None, method="permutation", nproc=6) cost about 2 hours!

The process of 'Spatial Clustering of Local Spots' using SpatialDE is taking an excessive amount of time. Could I have overlooked some parameters?

I ran the 'Spatial Clustering of Local Spots' step using the following code:

from threadpoolctl import threadpool_limits
with threadpool_limits(limits=10, user_api='blas'):
    results = SpatialDE.run(adata.obsm['spatial'], bin_spots.transpose())
    histology_results, patterns = SpatialDE.aeh.spatial_patterns(
        adata.obsm['spatial'], bin_spots.transpose(), results, C=3, l=3, verbosity=1)

For 7,000 spots, it took approximately 4 hours. However, for 30,000 spots, it has been running for more than 48 hours and hasn't finished yet. Does anyone have insights into why it takes so long, or did I miss some parameters?

Load 10xgenomics Visium data using scanpy

Hello,

Thank you for sharing the package.

I am trying to load some Visium data following the tutorial. In there, I see that the data is loaded as adata1 = sdm.datasets.dataset.melanoma(), I thought that I could just do adata = sc.read_visium("visium_ov_orig") to load my own files(using a Visium h5 file, this can be replicated with this file ), but then, I am getting errors in other parts of the tutorial, it starts failing when I call:
sdm.extract_lr(adata, 'human', min_cell=3)
with the error:
pandas.errors.InvalidIndexError: Reindexing only valid with uniquely valued Index objects

Is there a different way to load this data? I tried checking the codebase and couldn't find a method that seemed more suitable.

Thank you

Supp Note 1 and Example

Hi Zhuoxuan Li (@leeyoyohku).

When I am reading the paper on bioarxic, I couldn't find the Supp. Note 1 that contains the derivation for null distribution of Global Moran's R and other methodology. Can you send me a pdf or update the manuscript? That would be really helpful.

BTW, when I was running the examples (3 jupyter notebook), NaiveDE and SpatialDE needs to installed.

It would be helpful that you could add a short note for the requirement (just for the example not the requirement for the package)

e.g.

To run the examples notebooks, NaiveDE and SpatialDE needs to be installed.

NaiveDE

# latest version
pip install -U git+https://github.com/Teichlab/NaiveDE
# or pypi
# pip install NaiveDE

SpatialDE

pip install spatialde

hilearn

# latest version
pip install -U git+https://github.com/huangyh09/hilearn
# or pypi
# pip install hilearn

Code Update Request: Avoid 'np.float' Deprecation Error in melanoma.ipynb

Thank you for creating such a wonderful software!

I immediately installed SpatialDM and tried running melanoma.ipynb, but I got an error.
https://github.com/StatBiomed/SpatialDM/blob/main/tutorial/melanoma.ipynb

Here is the code cell where the error occurred:

import time
start = time.time()
sdm.spatialdm_global(adata, 1000, specified_ind=None, method='both', nproc=1)     # global Moran selection
sdm.sig_pairs(adata, method='permutation', fdr=True, threshold=0.1)     # select significant pairs
print("%.3f seconds" %(time.time()-start))

The error is as follows:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[8], line 3
      1 import time
      2 start = time.time()
----> 3 sdm.spatialdm_global(adata, 1000, specified_ind=None, method='both', nproc=1)     # global Moran selection
      4 sdm.sig_pairs(adata, method='permutation', fdr=True, threshold=0.1)     # select significant pairs
      5 print("%.3f seconds" %(time.time()-start))

File /opt/conda/lib/python3.11/site-packages/spatialdm/main.py:205, in spatialdm_global(adata, n_perm, specified_ind, method, nproc)
    202     raise ValueError("Only one of ['z-score', 'both', 'permutation'] is supported")
    204 with threadpool_limits(limits=nproc, user_api='blas'):
--> 205     pair_selection_matrix(adata, n_perm, specified_ind, method)
    207 adata.uns['global_res'] = pd.concat((adata.uns['ligand'], adata.uns['receptor']),axis=1)
    208 # adata.uns['global_res'].columns = ['Ligand1', 'Ligand2', 'Ligand3', 'Receptor1', 'Receptor2', 'Receptor3', 'Receptor4']

File /opt/conda/lib/python3.11/site-packages/spatialdm/utils.py:114, in pair_selection_matrix(adata, n_perm, sel_ind, method)
    112 # averaged ligand values
    113 L1 = [pd.Series(x[0]).dropna().values for x in ligand.values]
--> 114 L_mat = [adata[:, L1[l]].X.astype(np.float)[:, 0] for l in range(len(L1))]
    115 for i, k in enumerate(ligand.index):
    116     if len(ligand.loc[k].dropna()) > 1:

File /opt/conda/lib/python3.11/site-packages/spatialdm/utils.py:114, in <listcomp>(.0)
    112 # averaged ligand values
    113 L1 = [pd.Series(x[0]).dropna().values for x in ligand.values]
--> 114 L_mat = [adata[:, L1[l]].X.astype(np.float)[:, 0] for l in range(len(L1))]
    115 for i, k in enumerate(ligand.index):
    116     if len(ligand.loc[k].dropna()) > 1:

File /opt/conda/lib/python3.11/site-packages/numpy/__init__.py:305, in __getattr__(attr)
    300     warnings.warn(
    301         f"In the future `np.{attr}` will be defined as the "
    302         "corresponding NumPy scalar.", FutureWarning, stacklevel=2)
    304 if attr in __former_attrs__:
--> 305     raise AttributeError(__former_attrs__[attr])
    307 # Importing Tester requires importing all of UnitTest which is not a
    308 # cheap import Since it is mainly used in test suits, we lazy import it
    309 # here to save on the order of 10 ms of import time for most users
    310 #
    311 # The previous way Tester was imported also had a side effect of adding
    312 # the full `numpy.testing` namespace
    313 if attr == 'testing':

AttributeError: module 'numpy' has no attribute 'float'.
`np.float` was a deprecated alias for the builtin `float`. To avoid this error in existing code, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
    https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations

My environment is

Python 3.11.4
numpy 1.24.4

So, I think the above issue would occur when users use numpy >1.20.

I would appreciate it if you could update the code to avoid deprecation concerning np.float.

Load 10xgenomics Visium data using scanpy

Hello,

Thank you for sharing the package.

Is there a different way to load this data? I tried checking the codebase and couldn't find a method that seemed more suitable.

Thank you

positive mask commented out + other questions

Hi,

Congrats for the well-timed and efficient tool!

I wanted to ask in a previous version you had a positive local mask implemented, which is now commented out:

SpatialDM/spatialdm/utils.py

Line 191 in 20d28c2

#adata.uns['local_z_p'] = np.where(pos.T == False, 1, adata.uns['local_z_p'])

What is the reasoning for this?

Also, were the correlations between analytical and permutation-based local p-values calculated before or after applying the mask?

Thanks in advance for your response!

Daniel

Help with the simulations from the paper

Hi,

Great paper and package. Thanks alot for making the code open source. I am trying to simulate the dataset for benchmarking cell-cell interaction tools following the Figure 1 f,g,h of the paper. I read the an adapted version of SVCA is being used for simulation. However I am unable to find the source code (or scripts for running SVCA) to generate or reproduce the ROC plots from the paper. It would be great to get some pointers to simulate the data as described.

Thanks again,
Best

InvalidIndexError when running sdm.extract_lr(adata, 'human', min_cell=3)

I encountered an error when running sdm.extract_lr(adata, 'human', min_cell=3), following is the detailed issue:

sdm.weight_matrix(adata, l=1.2, cutoff=0.2, single_cell=False) # weight_matrix by rbf kernel
----> 3 sdm.extract_lr(adata, 'human', min_cell=3)      # find overlapping LRs from CellChatDB
      5 sdm.spatialdm_global(adata, 1000, specified_ind=None, method='both', nproc=1)     # global Moran selection
      6 sdm.sig_pairs(adata, method='permutation', fdr=True, threshold=0.1)     # select significant pairs

File ~\AppData\Roaming\Python\Python39\site-packages\spatialdm\main.py:149, in extract_lr(adata, species, mean, min_cell, datahost)
    147     meanR = gmean(adata[:, receptor[i]].X, axis=1)
    148 else:
--> 149     meanL = adata[:, ligand[i]].X.mean(axis=1)
    150     meanR = adata[:, receptor[i]].X.mean(axis=1)
    151 if (sum(meanL > 0) >= min_cell) * \
    152         (sum(meanR > 0) >= min_cell):

File ~\AppData\Roaming\Python\Python39\site-packages\anndata\_core\anndata.py:1108, in AnnData.__getitem__(self, index)
   1106 def __getitem__(self, index: Index) -> "AnnData":
   1107     """Returns a sliced view of the object."""
-> 1108     oidx, vidx = self._normalize_indices(index)
   1109     return AnnData(self, oidx=oidx, vidx=vidx, asview=True)

File ~\AppData\Roaming\Python\Python39\site-packages\anndata\_core\anndata.py:1089, in AnnData._normalize_indices(self, index)
   1088 def _normalize_indices(self, index: Optional[Index]) -> Tuple[slice, slice]:
-> 1089     return _normalize_indices(index, self.obs_names, self.var_names)

File ~\AppData\Roaming\Python\Python39\site-packages\anndata\_core\index.py:33, in _normalize_indices(index, names0, names1)
     31 ax0, ax1 = unpack_index(index)
     32 ax0 = _normalize_index(ax0, names0)
---> 33 ax1 = _normalize_index(ax1, names1)
     34 return ax0, ax1

File ~\AppData\Roaming\Python\Python39\site-packages\anndata\_core\index.py:95, in _normalize_index(indexer, index)
     93     return positions  # np.ndarray[int]
     94 else:  # indexer should be string array
---> 95     positions = index.get_indexer(indexer)
     96     if np.any(positions < 0):
     97         not_found = indexer[positions < 0]

File ~\AppData\Roaming\Python\Python39\site-packages\pandas\core\indexes\base.py:3904, in Index.get_indexer(self, target, method, limit, tolerance)
   3901 self._check_indexing_method(method, limit, tolerance)
   3903 if not self._index_as_unique:
-> 3904     raise InvalidIndexError(self._requires_unique_msg)
   3906 if len(target) == 0:
   3907     return np.array([], dtype=np.intp)

InvalidIndexError: Reindexing only valid with uniquely valued Index objects

Can you help?

spatialdm_global does not work with a list of specified index, pair_selection_matrix would change adata

Hi I found out there's a bug in spatialdm_global

SpatialDM/spatialdm/main.py

Line 155 in aa17624

adata.uns['global_stat']['z']['st'] = globle_st_compute(adata)

For example, if there are 400 lr pairs after extract_lr, one select 10 pairs for computation.

User cannot specify specified_ind as in globle_st_compute(adata) would also be still 400 (or 400x400)

That would throw a error in calculation due to

adata.uns['global_stat']['z']['st'].shape != np.zeros(total_len).shape

Also in this function, it would change/remove the ligand, receptor and geneInter in adata. Namely, the changes to adata would NOT be local.

For example, if one only specify 2 pairs, after calculation, ligand, receptor and geneInter would be reduced to 2.

You have to re-run extract_lr to get the full list.

SpatialDM/spatialdm/utils.py

Line 92 in aa17624

def pair_selection_matrix(adata, n_perm, sel_ind, method):

Question about global permutation test

Hello, I noticed that the package compares the actual global I with the permuted global I's to compute the permutation p-value:

    adata.uns['global_stat']['perm']['global_p'] = 1 - (adata.uns['global_I'] \
                         > adata.uns['global_stat']['perm']['global_perm'].T).sum(axis=0) / n_perm

My question is why do you compare only one direction (">"), i.e., greater than the mean under NULL? Global I can be negative, i.e., less than the mean under NULL indicating negative spatial association.

Please let me know if I misunderstood the package and the paper!

weight_matrix with n_nearest_neighbors and n_neighbors

Hi, I am wondering why there are two parameters for nearest neighbours in weight matrix calculation.

SpatialDM/spatialdm/main.py

Line 34 in aa17624

 def weight_matrix(adata, l, cutoff=None, n_neighbors=None, n_nearest_neighbors=6, single_cell=False): 

It seems to me that there is no need for both or am I missing sth?

Best

I keep getting error when running spatialdm_local AttributeError: 'NoneType' object has no attribute 'to_adata'

Hi,

I followed the spatial dm example but I keep getting errors when running spatialdm_local.
Below is the code I am running.

start = time.time()
sdm.spatialdm_local(adata, n_perm=1000, method='both', specified_ind=None, nproc=1)     # local spot selection
sdm.sig_spots(adata, method='permutation', fdr=False, threshold=0.1)     # significant local spots
print("%.3f seconds" %(time.time()-start))

And here is the error I am getting.

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[47], line 2
      1 start = time.time()
----> 2 sdm.spatialdm_local(adata, n_perm=1000, method='both', specified_ind=None, nproc=1)     # local spot selection
      3 sdm.sig_spots(adata, method='permutation', fdr=False, threshold=0.1)     # significant local spots
      4 print("%.3f seconds" %(time.time()-start))

File /n/data1/mgh/neuro/petti/lab/Users/khan.saad/Conda_envs/python_3.10/lib/python3.10/site-packages/spatialdm/main.py:275, in spatialdm_local(adata, n_perm, method, specified_ind, nproc, scale_X)
    273 ## different approaches
    274 with threadpool_limits(limits=nproc, user_api='blas'):
--> 275     spot_selection_matrix(adata, ligand, receptor, ind, n_perm, method, scale_X)

File /n/data1/mgh/neuro/petti/lab/Users/khan.saad/Conda_envs/python_3.10/lib/python3.10/site-packages/spatialdm/utils.py:179, in spot_selection_matrix(adata, ligand, receptor, ind, n_perm, method, scale_X)
    176 def spot_selection_matrix(adata, ligand, receptor, ind, n_perm, method, scale_X=True):
    177     # local variables (only live in this function scope)
    178     # normalize raw counts
--> 179     raw_norm = adata.raw.to_adata()
    180     raw_norm.X = csr_matrix([norm_max(X) for X in raw_norm.X.T]).T
    181     import scanpy as sc

AttributeError: 'NoneType' object has no attribute 'to_adata'

reading in h5ad file

Hi. I'm trying to read in a h5ad file which was generated from seurat R package. I am using this function: adata = sdm.read_spatialdm_h5ad(filename), but I get KeyError: 'geneInter'. Is this not the right way to read in the data? Thanks.

Bugs discovery for single-cell level spatialDM

Hi, I met a bug when inferring cci in image-based spatial data:

sdm.weight_matrix(adata, l=1.2, cutoff=0.2, single_cell=True) # weight_matrix by rbf kernel

Could you please take a look? Thanks.