scverse / muon Goto Github PK

View Code? Open in Web Editor NEW

208.0 14.0 30.0 5.17 MB

muon is a multimodal omics Python framework

Home Page: https://muon.scverse.org/

License: BSD 3-Clause "New" or "Revised" License

Python 100.00%

multimodal-data multimodal-omics-analysis multi-omics anndata scanpy mudata muon scrna-seq scatac-seq cite-seq

muon's People

Contributors

Stargazers

Watchers

muon's Issues

In-place filtering is ambiguous for Views

Muon introduced in-place filtering API such as mu.pp.filter_obs(). It modifies all the attributes in place according to the specified axis (e.g. observations). When operated on the view of an object — AnnData or MuData — , in-place filtering behaviour is not strictly defined.

Considering the Zen, in-place filtering should only work for objects that are not views, and throw an error with a clear message on views since there's already an explicit way to get a filtered object that is not a view:

# Should not work on views
ad2 = mu.pp.filter_obs(ad_view, some_obs)
# => TypeError

# Works on views and has explicit behaviour
ad2 = ad_view[some_obs].copy()

Copying MuData objects

Hi! I am wondering if there is a way to copy MuData objects (i.e. MuData_copy = MuData.copy()) without getting TypeError: cannot unpack non-iterable NoneType object

Originally posted by @SamuelAMiller1 in #24 (comment)

Exporting or saving plots to pdf file

Hi there!

I wanted to thank you for this fantastic tool and beautiful tutorials. It has saved me SO much time and headache.

I wondered if there is a way to export the plots I make with muon to pdf, tiff, or png files? I apologize if this was already covered and I missed it; I don't see a 'save' parameter like the one in Scanpy, nor can I use plt.savefig() function as one can with seaborn plots.

I apologize for this trivial question/request, but I hoped to export said pdfs into illustrator to edit the axis labels.

Thank you for your time!

Cheers,
Fran

muon.atac.tl.count_fragments_features atac bug

Hi there,

I think that there is some bug in the count_fragments_features atac features. The gene activity I created from those are very bad looking. This is issue is separate from the other issue where the strand issue is considered. I think the issue might have been with the use of the lil matrix.

The first is with the muon function ,while the second picture is the one which i construct it with a coo matrix(with the strand fix).

Can't save MuData object to h5mu file

I created a MuData object that contains the AnnData for 2 modalities, did some basic filtering of the datasets and then tried to save it with: joint.write("joint_data.h5mu") but this throws the following error:

TypeError                                 Traceback (most recent call last)
/data/leuven/miniconda3/envs/pytorch2/lib/python3.8/site-packages/anndata/_io/utils.py in func_wrapper(elem, key, val, *args, **kwargs)
    213         try:
--> 214             return func(elem, key, val, *args, **kwargs)
    215         except Exception as e:

/data/leuven/miniconda3/envs/pytorch2/lib/python3.8/site-packages/anndata/_io/specs/registry.py in write_elem(f, k, elem, modifiers, *args, **kwargs)
    174     else:
--> 175         _REGISTRY.get_writer(dest_type, t, modifiers)(f, k, elem, *args, **kwargs)
    176 

/data/leuven/miniconda3/envs/pytorch2/lib/python3.8/site-packages/anndata/_io/specs/registry.py in get_writer(self, dest_type, typ, modifiers)
     63         if (dest_type, typ, modifiers) not in self.write:
---> 64             raise TypeError(
     65                 f"No method has been defined for writing {typ} elements to {dest_type}"

TypeError: No method has been defined for writing <class 'mudata._core.mudata.MuAxisArrays'> elements to <class 'h5py._hl.group.Group'>

The above exception was the direct cause of the following exception:

TypeError                                 Traceback (most recent call last)
/tmp/efec76988f/ipykernel_20272/4022115007.py in <module>
----> 1 joint.write("../Merged/929_cancer/929_cancer_joint_data.h5mu")

/data/leuven/miniconda3/envs/pytorch2/lib/python3.8/site-packages/mudata/_core/mudata.py in write_h5mu(self, filename, **kwargs)
   1084             raise ValueError("Provide a filename!")
   1085         else:
-> 1086             write_h5mu(filename, self, **kwargs)
   1087             if self.isbacked:
   1088                 self.file.filename = filename

/data/leuven/miniconda3/envs/pytorch2/lib/python3.8/site-packages/mudata/_core/io.py in write_h5mu(filename, mdata, **kwargs)
    207 
    208     with h5py.File(filename, "w", userblock_size=512) as f:
--> 209         _write_h5mu(f, mdata, **kwargs)
    210     with open(filename, "br+") as f:
    211         nbytes = f.write(

/data/leuven/miniconda3/envs/pytorch2/lib/python3.8/site-packages/mudata/_core/io.py in _write_h5mu(file, mdata, write_data, **kwargs)
     44         dataset_kwargs=kwargs,
     45     )
---> 46     write_attribute(file, "obsm", mdata.obsm, dataset_kwargs=kwargs)
     47     write_attribute(file, "varm", mdata.varm, dataset_kwargs=kwargs)
     48     write_attribute(file, "obsp", mdata.obsp, dataset_kwargs=kwargs)

/data/leuven/miniconda3/envs/pytorch2/lib/python3.8/site-packages/anndata/_io/utils.py in write_attribute(*args, **kwargs)
    132         DeprecationWarning,
    133     )
--> 134     return write_elem(*args, **kwargs)
    135 
    136 

/data/leuven/miniconda3/envs/pytorch2/lib/python3.8/site-packages/anndata/_io/utils.py in func_wrapper(elem, key, val, *args, **kwargs)
    218             else:
    219                 parent = _get_parent(elem)
--> 220                 raise type(e)(
    221                     f"{e}\n\n"
    222                     f"Above error raised while writing key {key!r} of {type(elem)} "

TypeError: No method has been defined for writing <class 'mudata._core.mudata.MuAxisArrays'> elements to <class 'h5py._hl.group.Group'>

Above error raised while writing key 'obsm' of <class 'h5py._hl.files.File'> to /

I also tried to save only a MuData object with just the raw matrices (no more metadata), but it throws the same error, also when trying to save each of the modalities alone (in a MuData object with only 1 modality).

I am using python '3.8.12', scanpy '1.9.1' and muon '0.1.2'

Thank you for your help, this is a very useful tool.

ATAC QC error

I got this error from running ac.pl.fragment_histogram(atac, region='chr1:1-2000000')
KeyError Traceback (most recent call last)
in
----> 1 ac.pl.fragment_histogram(atac, region='chr1:1-2000000')

~\Anaconda3\lib\site-packages\muon_atac\plot.py in fragment_histogram(data, region, groupby)
330 raise TypeError("Expected AnnData or MuData object with 'atac' modality")
331
--> 332 fragment_path = adata.uns["files"]["fragments"]
333 fragments = tools.fetch_regions_to_df(fragment_path=fragment_path, features=region)
334

~\Anaconda3\lib\site-packages\anndata\compat_overloaded_dict.py in getitem(self, key)
98 return self.overloaded[key].get()
99 else:
--> 100 return self.data[key]
101
102 def setitem(self, key, value):

KeyError: 'files'

I got another error by running ac.tl.nucleosome_signal(atac, n=1e6)

KeyError Traceback (most recent call last)
in
----> 1 ac.tl.nucleosome_signal(atac, n=1e6)

~\Anaconda3\lib\site-packages\muon_atac\tools.py in nucleosome_signal(data, n, nucleosome_free_upper_bound, mononuleosomal_upper_bound)
1031
1032 if "files" not in adata.uns or "fragments" not in adata.uns["files"]:
-> 1033 raise KeyError(
1034 "There is no fragments file located yet. Run muon.atac.tl.locate_fragments first."
1035 )

KeyError: 'There is no fragments file located yet. Run muon.atac.tl.locate_fragments first.'

Not sure what to do?

View distal/promoter peaks after RNA/ATAC integration

Is there a way to get the UMAP with distal vs promoter peaks on the integrated dataset? Similar to the function ac.pl.umap(atac, color=["KLF4"], average="peak_type") but using mu.pl.umap(mdata, color=["KLF4", "chr9:107480158-107492721"]).

Seurat flavored CLR

Description

The implementation of CLR here performs a within feature normalization. Seurat normalizes across features (see here). It would be good to have the option to choose between the two approaches, e.g. via an attribute flavor. Alternatively, the argument axis could be used to indicate along which axis to normalize.

High Cpu Usage

Hello,
when I ran mu.pp.neighbors(mdata),cpu was used 100%.
So how to set up same parameters to prevent HIGH CPU USAGE?

python 3.7 with latest muon

Thanks!

concatenate muon objects

Hi,
Is it possible to concatenate two mudata objects. I do not see a method in the reference API.
Presently I concatenate two objects separately for each modality using anndata.concatenate and then create a new mudata obj.
It would be nice to have an anndata.concatenate like facility for mudata too.

Vijay

mu.pp.intersect_obs introduces Nans to mdata.obs

Running mu.pp.intersect_obs erroneously introduces NaNs into mdata.obs

Minimal working example:

import numpy as np
import muon as mu

def test_for_nans():
    assert mdata.obs['batch'].isna().sum() == 0
    
x = mu.AnnData(X=np.random.normal(size=1000).reshape(-1, 10))
y = mu.AnnData(X=np.random.normal(size=1000).reshape(-1, 10))

batches = np.random.choice(["a", "b", "c"], size=100, replace=True)

mdata = mu.MuData({"rna": x, "prot": y})

mdata.obs['batch'] = batches
test_for_nans() # no error

mdata['rna'].obs['total_count'] = mdata['rna'].X.sum(axis=1)
mdata['rna'].obs['min_count'] = mdata['rna'].X.min(axis=1)
mdata.update()

# filter one of the modalities.
mu.pp.filter_obs(mdata['rna'], 'min_count', lambda x: (x < -2))
mu.pp.intersect_obs(mdata)

test_for_nans() # assert is False so it returns an error, in fact all of mdata.obs['batch'] are nans

In my tests above the mdata.obs['batch'] are all NaNs after running intersect obs.
Weirdly in bigger datasets, sometimes a really small number of data entries are not NaNs.

System
OS: CentOS Linux release 7.8.2003 (Core)
Python 3.9.5
Versions of libraries involved
numpy 1.20.3
muon 0.1.1 (installed from github today (2021-11-30) using pip install git+https://github.com/gtca/muon )

Illegal instruction: 4

When I try to import muon I get this error
"Illegal instruction: 4"

I've read this happened in other packages like tensor flow in users with M1 and M1 Pro chips, although there is no clear solution. I've updated my OS to the latest (macOS Monterey 12.6) but the bug persists. My laptop has an M1 Pro chip. Not sure if this is something that the developers can fix but just wanted to put it out there in case someone else encounters the same issue.

Documentation for `muon.atac.tl.scan_sequences` not up to date

Hi,

The documentation for muon.atac.tl.scan_sequences (https://muon.readthedocs.io/en/latest/api/generated/muon.atac.tl.scan_sequences.html) does not seem to be up to date. The function's signature and the parameter description do not match.

Best

Can't save MuData object to h5mu file due to OrderedDict

Hi,

When trying to write the mudata object, h5py crashes due to some .uns values being OrderedDict instead of normal dictionaries.

TypeError: No method has been defined for writing <class 'collections.OrderedDict'> elements to <class 'h5py._hl.group.Group'>

Here is how to reproduce the bug:

import muon as mu
import os


# Read and create mdata object
data_dir = 'pbmc10k/'
mdata = mu.read_10x_h5(os.path.join(data_dir, "filtered_feature_bc_matrix.h5"))
mdata.var_names_make_unique()

# Write
mdata.write("test.h5mu")

The pbmc10k is the data used in this tutorial.

Apparently, when reading the inputs, mu.read_10x_h5 creates two OrderedDicts in mdata.mod['atac'].uns. If I set them to dict then it saves the object correctly but it would be nice to handle this automatically.

mdata.mod['atac'].uns['files'] = dict(mdata.mod['atac'].uns['files'])
mdata.mod['atac'].uns['atac'] = dict(mdata.mod['atac'].uns['atac'])

Thanks you for your time!

System

OS: Ubuntu 22
Python 3.8.13
anndata 0.8.0
mudata 0.2.0
muon 0.1.3

Can't get copy for a view

Describe the bug
Thanks developers for providing such a great tool.
I found that it will triger a AttributeError when I make a copy for a view.

To Reproduce
Steps to reproduce the behaviour.

a = mu.MuData(
    {
        "test": sc.AnnData(
            X=np.ones([2, 2]),
            obs=pd.DataFrame(index=["a", "b"]),
            var=pd.DataFrame(index=["a", "b"]),
        ),
    }
)
a.copy()
b = a['a',:]
# b._adata_ref = b._mudata_ref
b.copy()

Then we will get a AttributeError:

AttributeError: 'MuData' object has no attribute '_adata_ref'

And if we set _adata_ref variable for the view, this error would be avoided.

Implement `backed` for AnnData and .h5mu files, MuData

AnnData supports backing attributes instead of loading the full file into the memory.

Implementing this functionality for muon's I/O, akin to read_h5ad_backed here, will make it more comfortable to use .h5mu files.

This implementation should probably include a new backed: bool argument for I/O functions without adding new public functions.
Backed functionality for MuData objects implies using backed=True for all AnnData objects inside it.

Implement dsb normalisation for CITE-seq data

Feature barcoding is one of the approaches to complement gene expression data with other laters of information from the same cell.
For instance, CITE-seq allows to quantitatively detect surface proteins using antibody-bound oligos.
This layer of information corresponds to the Antibody Capture feature type in a 10X Genomics-formatted output.

It seems there is still no consensus on the best performing normalisation method for CITE-seq. In addition to the ones discussed at the linked thread, there is also a dsb method for normalising and denoising CITE-seq data but it is only available in R.

As a pre-processing method for a specific modality, it would live in the muon/_prot/preproc.py file, which would be importable and called as expected:

from muon import prot as pt
pt.pp.dsb(...)

This method uses information from the raw (non-filtered) matrix so there's also a question of designing an API that is not too confusing. Ideally, we would want to stick to a familiar interface for preprocessing. To make things simple, this method might operate on two MuData objects, one with filtered counts and another one with raw counts:

pt.pp.dsb(mdata, mdata_raw, ...)

The normalisation and denoising would be performed on the mdata_raw object and normalised protein counts for respective cells would be copied into the mdata object. Threshold for negative and positive (cells) droplets can be expected to be provided manually (e.g. empty_counts_min & empty_counts_max, cell_counts_min & cell_counts_max).

It should also be made possible to work with a single mdata object but it has to be the raw data. Since it is usually not the case, a warning should be displayed.

pt.pp.dsb(mdata_raw, ...)
# => display warning

Indentation contains mixed spaces and tabs

Description

Indentations in muon/_atac/preproc.py contains mixed spaces and tabs.

Use a representation from mudata.obsm to calculate neighbors

Hi,

is there an easy way to use e.g. mdata.obsm['X_mofa'] representation to calculate neighbors? Now I create a new AnnData object from the representation, calculate neighbors with sc.pp.neighbors(), and then store the connectivities and distances in the mdata.obsp, but it would be nice to have a wrapper around this in muon. Thanks!

Introduce explicit syntax to detach backed modalities

If a modality has been backed inside .h5mu, on copy/write operations we might want to have a clearly defined behaviour. In particular the proposal is to differentiate between modalities linked to the MuData object and modalities detached from it:

mdata = mu.read('rna_atac.h5mu', backed=True)

rna = mdata['rna']
#          ^__ linked
rna.copy('rna.h5ad')
# => Error referring the user to the MuData object

rna = mdata.mod['rna']
#          ^__ detached
rna.copy('rna.h5ad')
# => Success

This proposal has been brought forward by @ilia-kats and has been originally discussed in the issue #17.

Plotting function should document parameters better

As discussed in #49, documentation should be more clear that mu.pl.* support arguments from corresponding sc.pl.* functions.

Datasets module

I think it would be useful to add a datasets module. It's very useful for prototyping and debugging.

For prototyping, you've always got a couple extra objects to try a function on. This is also great for downstream library authors.

For debugging: you can quickly grab a dataset no matter what system you're on. And when trying to debug remotely with a user, it's an easy shared source of truth for replication.

Improve mofa interface

GPU interface
- Choose GPU device
- Stochastic VI

Drawing embedding of variables shared in multiple mods will result in an error, even if axis=-1 is set.

Describe the bug
Drawing umap graphs with the same variables in multiple mods will result in errors, even if axis=-1 is set.

To Reproduce

import muon as mu
import scanpy as sc
import pandas as pd

df = pd.DataFrame([[1,2,3]]*3, index=['a','b', 'c'], columns=['e','f','g'])
ad = sc.AnnData(df)
ad.obsm['X_umap'] = pd.DataFrame([[1,2]] * 3).values
md = mu.MuData({'aa': ad, 'bb': ad[:, :2]}, axis=-1)

print(md)
#MuData object with n_obs × n_vars = 3 × 3
#  2 modalities
#    aa:	3 x 3
#      obsm:	'X_umap'
#    bb:	3 x 2
#      obsm:	'X_umap'

mu.pl.embedding(md, 'aa:umap', color='e')

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-21-820ff8a24f5e> in <module>
----> 1 mu.pl.embedding(md, 'aa:umap', color='e')

~/softwares/anaconda3/lib/python3.8/site-packages/muon/_core/plot.py in embedding(data, basis, color, use_raw, layer, **kwargs)
    228                         )
    229                 x = fmod_adata.X.toarray() if issparse(fmod_adata.X) else fmod_adata.X
--> 230                 obs = obs.join(
    231                     pd.DataFrame(x, columns=mod_keys, index=fmod_adata.obs_names),
    232                     how="left",

~/softwares/anaconda3/lib/python3.8/site-packages/pandas/core/frame.py in join(self, other, on, how, lsuffix, rsuffix, sort)
   9097         5  K5  A5  NaN
   9098         """
-> 9099         return self._join_compat(
   9100             other, on=on, how=how, lsuffix=lsuffix, rsuffix=rsuffix, sort=sort
   9101         )

~/softwares/anaconda3/lib/python3.8/site-packages/pandas/core/frame.py in _join_compat(self, other, on, how, lsuffix, rsuffix, sort)
   9128                     sort=sort,
   9129                 )
-> 9130             return merge(
   9131                 self,
   9132                 other,

~/softwares/anaconda3/lib/python3.8/site-packages/pandas/core/reshape/merge.py in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
    119         validate=validate,
    120     )
--> 121     return op.get_result()
    122 
    123 

~/softwares/anaconda3/lib/python3.8/site-packages/pandas/core/reshape/merge.py in get_result(self)
    715         join_index, left_indexer, right_indexer = self._get_join_info()
    716 
--> 717         llabels, rlabels = _items_overlap_with_suffix(
    718             self.left._info_axis, self.right._info_axis, self.suffixes
    719         )

~/softwares/anaconda3/lib/python3.8/site-packages/pandas/core/reshape/merge.py in _items_overlap_with_suffix(left, right, suffixes)
   2306 
   2307     if not lsuffix and not rsuffix:
-> 2308         raise ValueError(f"columns overlap but no suffix specified: {to_rename}")
   2309 
   2310     def renamer(x, suffix):

ValueError: columns overlap but no suffix specified: Index(['e'], dtype='object')

System

muon 0.1.3

Additional context
I noticed that in 0.1.3 there is support for setting the layers used by each mod in a dictionary way. But that doesn't get rid of the error.I suppose this is due to when iterating over each mod, if a variable is shared across multiple mods, it will result in the variable not being successfully joined. This may also be the reason why mudata emphasizes ensuring the uniqueness of variables in the mod as much as possible.
But when a mod is only saving a subset of another mod (i.e. axis=-1), this error should be got around. I think it is possible to simply add a mod parameter, so that when iterating, it only iterates over the manually specified mod.

Error in mu.tl.mofa

Describe the bug
Value passed for key 'LFs' is of incorrect shape. Values of varm must match dimensions (1,) of parent. Value had shape (61452, 10) while it should have had (150850,)

To Reproduce
mu.tl.mofa(mudata)

Additional context
The mudata object is not updated if I subset either the rna or atac slot. As you can see here, the MuData is still 150k features but atac + rna is only 60k features. The object is only updated when its written and read again

MuData object with n_obs × n_vars = 7098 × 150850
rna: 7098 x 21894
obs: 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'celltype'
var: 'gene_ids', 'feature_types', 'genome', 'interval', 'mt', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'highly_variable', 'highly_variable_rank', 'means', 'variances', 'variances_norm', 'mean', 'std'
uns: 'hvg', 'log1p', 'pca', 'neighbors', 'umap'
obsm: 'X_pca', 'X_umap'
varm: 'PCs'
layers: 'counts'
obsp: 'distances', 'connectivities'
atac: 7098 x 39558
obs: 'n_genes_by_counts', 'total_counts', 'nucleosome_signal', 'tss_score'
var: 'gene_ids', 'feature_types', 'genome', 'interval', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
uns: 'atac', 'files', 'hvg', 'lsi', 'neighbors', 'umap'
obsm: 'X_lsi', 'X_umap'
varm: 'LSI'
layers: 'counts'
obsp: 'distances', 'connectivities'

CITE-seq scatter plot of pairwise RNA-prot correlations

Thanks for muon. Following the CITE-seq tutorial, I'd like to do simple scatter plots and regression to show the RNA-prot correlation for particular markers, something like in the Seurat muli-modal tutorial.:

I tried sc.pl.scatter(mdata, x="prot:CD3D", y="rna:CD3D") This raises: KeyError: 'There is no key prot:CD3D in MuData .obs or in .obs of any modalities.' Of course the gene/protein names are stored in .var not .obs.

Suggestions appreciated, thank you

unify logging

We didn't establish a clear policy around logging early on, so now some parts of muon use Scanpy's logger, while others issue plain warnings using the warnings module. Ideally, this should be unified, one way or another.

.copy() doesn't work for backed files

Modalities backed inside a .h5mu file can't be copied to a new location as expected:

>>> mdata = mu.read("rna_atac.h5mu", backed=True)
>>> mdata['rna'].copy("rna.h5ad")
KeyError: "Unable to open object (object 'mod' doesn't exist)"

can't perform sequential mu.pp.filter_obs

I want to perform multiple filtering steps on my object e.g.:

mu.pp.filter_obs(mdata, 'n_genes_by_counts', lambda x: (x >= 100))
mu.pp.filter_obs(mdata,  'total_counts', lambda x: (x >= 500) & (x <= 50000)

Error message:

muon/_core/preproc.py in filter_obs(data, var, func)
    733         # filter_obs() for each modality
    734         for m, mod in data.mod.items():
--> 735             obsmap = data.obsmap[m][obs_subset]
    736             obsmap = obsmap[obsmap != 0] - 1
    737             filter_obs(mod, mod.obs_names[obsmap])

anndata/_core/aligned_mapping.py in __getitem__(self, key)
    146 
    147     def __getitem__(self, key: str) -> V:
--> 148         return self._data[key]
    149 
    150     def __setitem__(self, key: str, value: V):

KeyError: 'x'

Minimal working example:

import numpy as np
import muon as mu
x = mu.AnnData(X=np.random.normal(size=1000).reshape(-1, 100))
y = mu.AnnData(X=np.random.normal(size=2000).reshape(-1, 100))

md = mu.MuData({"x": x, "y": y})

md['x'].obs['total_count'] = md['x'].X.sum(axis=1)
md['x'].obs['min_count'] = md['x'].X.min(axis=1)
md.update()

# filter one of the modalities.
mu.pp.filter_obs(md, 'x:min_count', lambda x: (x < -2))
mu.pp.filter_obs(md, 'x:total_count', lambda x: (x >0))

If you put md.update() between the two filter statements it does work. But to me this is not ideal if you are filtering on lots of categories sequentially

Is this desired behaviour ot something that could be updated such that one could just one run update command after sequential filters??

OS: CentOS Linux release 7.8.2003 (Core)
Python 3.9.5
Versions of libraries involved
- numpy 1.20.3
- muon 0.1.1

Thanks!

Implement louvain and leiden algorithms accepting multiple graphs

Both louvain.find_partition_multiplex() and leidenalg.find_partition_multiplex() accept multiple graphs as input. That would enable mu.tl.louvain and mu.tl.leiden that would take .obsp[neighbors_key] (see louvain and leiden interfaces in scanpy) from each modality — or cast to sc.tl.louvain / sc.tl.leiden when an AnnData object is provided.

This becomes slightly more sophisticated when MuData itself has .obsp['connectivities']. For the beginning, we could probably assume that mu.tl.louvain / mu.tl.leiden are multiplex-only algorithms when operating on MuData objects, and for taking use of mdata.obsp['connectivities'], it's sc.tl.louvain / sc.tl.leiden that should be used.

Incorrect behavior in muon.atac.tl.count_fragments_features

Describe the bug
The extend_upstream and extend_downstream arguments in muon.atac.tl.count_fragments_features do not work as expected because they do not factor in strands. For genes on both strands, the upstream region is extended to Start - extend_upstream, but for genes on the - strand, it should be extended to End + extend_upstream.

To Reproduce
I created a debugging dataset consisting of 2 cells using this fragments file below:

1	63082	63282	cell1	2
1	65082	65282	cell1	2
1	67082	67282	cell1	1
1	69082	69282	cell1	1
1	71082	71282	cell1	1
1	83678	83878	cell1	10
1	85678	85878	cell1	10
1	87678	87878	cell1	10
1	89678	89878	cell1	100
1	91678	91878	cell1	100
2	131043	131243	cell2	200
2	133043	133243	cell2	200
2	135043	135243	cell2	20
2	137043	137243	cell2	20
2	139043	139243	cell2	20
2	215701	215901	cell2	2
2	217701	217901	cell2	2
2	219701	219901	cell2	2
2	221701	221901	cell2	4
2	223701	223901	cell2	4

for these two genes (in GFF3 format); ENSMMUG00000000634 is on the + strand and ENSMMUG00000000051 is on the - strand.

1	ensembl	gene	71582	83178	.	+	.	ID=gene:ENSMMUG00000000634;Name=ZNF692;biotype=protein_coding;description=zinc finger protein 692 [Source:VGNC Symbol%3BAcc:VGNC:100195];gene_id=ENSMMUG00000000634;logic_name=ensembl;version=4
2	ensembl	gene	139543	215201	.	-	.	ID=gene:ENSMMUG00000000051;Name=LMLN;biotype=protein_coding;description=leishmanolysin like peptidase [Source:VGNC Symbol%3BAcc:VGNC:74296];gene_id=ENSMMUG00000000051;logic_name=ensembl;version=4

Expected behaviour
On the test dataset, using extend_upstream = 5000 should count 3 fragments for cell1 upstream of ENSMMUG00000000634 (+ strand) and 6 fragments for cell2 upstream of ENSMMUG00000000051 (- strand). Instead, it counts 3 fragments and 60 fragments, respectively. extend_downstream = 5000 actually has the desired behavior for the - strand (given that I am interested in the upstream region). So I was able to work around this by running muon.atac.tl.count_fragments_features separately on the + strand (using extend_upstream = 5000) and - strand genes in my data (using extend_downstream = 5000). But it would be preferable for the function to directly incorporate strand information.

System

OS: CentOS Linux release 7.9.2009
Python version 3.7.2
Versions of libraries involved: AnnData 0.8.0, pandas 1.3.5, muon 0.1.2

Additional context
I attached the following files for the trial described above

test-anndata.h5ad: AnnData object in h5ad format.
test-fragments.txt.gz: fragments file compressed with bgzip
test-fragments.txt.gz.tbi: Tabix index for fragments file
test-annotation.pkl: pandas data frame containing gene annotations (two genes only) in pickle format

My results can be reproduced with the following:

import anndata as ad
import pandas as pd
import muon as mu

test = ad.read_h5ad('test-anndata.h5ad')
annotation = pd.read_pickle('test-annotation.pkl')
mu.atac.tl.locate_fragments(test,fragments='test-fragments.txt.gz')
result = mu.atac.tl.count_fragments_features(test,annotation,extend_upstream=5000,extend_downstream=0)

print(result['cell1','ENSMMUG00000000634'].X)
#  (0, 0)	3.0
print(result['cell2','ENSMMUG00000000051'].X)
#  (0, 0)	60.0

result = mu.atac.tl.count_fragments_features(test,annotation,extend_upstream=0,extend_downstream=5000)

print(result['cell1','ENSMMUG00000000634'].X)
#  (0, 0)	30.0
print(result['cell2','ENSMMUG00000000051'].X)
#  (0, 0)	6.0

test.tar.gz

Aggregating peaks doesn't work for backed objects

Aggregating peak counts from the .raw slot when plotting ATAC-seq data is not supported when the AnnData object is backed:

mdata = mu.read('data/pbmc10k.h5mu', backed=True)
ac.pl.pca(mdata['atac'], color="KLF4")
# => IndexError: tuple index out of range

It is expected to work, and it should be straightforward to fix.

Implement MuData object slicing

To match prospective user's expectations, it should be possible to slice MuData objects.

While the actual implementation of this feature is expected to be quite verbose, the syntax is straightforward:

mdata[:100,:]  # => mdata object with first 100 samples/cells and all features
mdata[list_of_sample_ids]  # => mdata object with respective samples/cells subsetted
mdata[:,list_of_features]  # => mdata object with respective features subsetted

The implementation would alter MuData.__getitem__ method.

An additional point of consideration is whether a view should be returned. In that case, a view functionality has to be implemented for MuData (e.g. see AnnData._init_as_view). That would probably mean referencing each AnnData inside MuData object as a view.

error in function ac.pl.fragment_histogram(atac, region=''")

Hello, thanks for muon!

I was trying to reproduce the steps described in the ATAC tutorial (https://muon-tutorials.readthedocs.io/en/latest/single-cell-rna-atac/pbmc10k/2-Chromatin-Accessibility-Processing.html) on my independent dataset, but I get an error from the ac.pl.fragment_histogram function.

The code:

import numpy as np
import pandas as pd
import scanpy as sc
import anndata as ad
import scipy
import anndata2ri

## Plotting utils
import matplotlib.pyplot as plt
import matplotlib
sc.settings.verbosity = 3  # verbosity: errors (0), warnings (1), info (2), hints (3)
sc.settings.set_figure_params(dpi=150, color_map='viridis', transparent=True,
                              ipython_format="png2x")
sc.logging.print_header()

import muon as mu
# Import a module with ATAC-seq-related functions
from muon import atac as ac

atac = mu.read(path + "output/file_postrnatutorial.h5mu")

atac.var['mt'] = atac.var_names.str.startswith('MT-')  # annotate the group of mitochondrial genes as 'mt'

sc.pp.calculate_qc_metrics(atac, qc_vars=['mt'], percent_top=None, log1p=False, inplace=True)
mu.pp.filter_var(atac, 'n_cells_by_counts', lambda x: x >= 10)
mu.pp.filter_obs(atac, 'n_genes_by_counts', lambda x: (x >= 2000) & (x <= 40000))
mu.pp.filter_obs(atac, 'total_counts', lambda x: (x >= 4000) & (x <= 100000))
atac.obs['NS']=1
ac.pl.fragment_histogram(atac, region='chr1:9808-10702')

The error is:

ValueError                                Traceback (most recent call last)
<ipython-input-16-0ef9424cd717> in <module>
----> 1 ac.pl.fragment_histogram(K29TIL_UD_atac, region='chr1:9808-10702')

/usr/local/lib/python3.6/dist-packages/muon/_atac/plot.py in fragment_histogram(data, region, groupby, barcodes)
    372     else:
    373         # Handle sns.distplot deprecation and sns.histplot addition
--> 374         g = hist(fragments.length, **kwargs)
    375         g.set_xlabel("Fragment length (bp)")
    376     g.set(xlim=(0, 1000))

/usr/local/lib/python3.6/dist-packages/seaborn/distributions.py in histplot(data, x, y, hue, weights, stat, bins, binwidth, binrange, discrete, cumulative, common_bins, common_norm, multiple, element, fill, shrink, kde, kde_kws, line_kws, thresh, pthresh, pmax, cbar, cbar_ax, cbar_kws, palette, hue_order, hue_norm, color, log_scale, legend, ax, **kwargs)
   1434             estimate_kws=estimate_kws,
   1435             line_kws=line_kws,
-> 1436             **kwargs,
   1437         )
   1438 

/usr/local/lib/python3.6/dist-packages/seaborn/distributions.py in plot_univariate_histogram(self, multiple, element, fill, common_norm, common_bins, shrink, kde, kde_kws, color, legend, line_kws, estimate_kws, **plot_kws)
    435 
    436             # Do the histogram computation
--> 437             heights, edges = estimator(observations, weights=weights)
    438 
    439             # Rescale the smoothed curve to match the histogram

/usr/local/lib/python3.6/dist-packages/seaborn/_statistics.py in __call__(self, x1, x2, weights)
    369         """Count the occurrances in each bin, maybe normalize."""
    370         if x2 is None:
--> 371             return self._eval_univariate(x1, weights)
    372         else:
    373             return self._eval_bivariate(x1, x2, weights)

/usr/local/lib/python3.6/dist-packages/seaborn/_statistics.py in _eval_univariate(self, x, weights)
    346         bin_edges = self.bin_edges
    347         if bin_edges is None:
--> 348             bin_edges = self.define_bin_edges(x, weights=weights, cache=False)
    349 
    350         density = self.stat == "density"

/usr/local/lib/python3.6/dist-packages/seaborn/_statistics.py in define_bin_edges(self, x1, x2, weights, cache)
    264 
    265             bin_edges = self._define_bin_edges(
--> 266                 x1, weights, self.bins, self.binwidth, self.binrange, self.discrete,
    267             )
    268 

/usr/local/lib/python3.6/dist-packages/seaborn/_statistics.py in _define_bin_edges(self, x, weights, bins, binwidth, binrange, discrete)
    252         elif binwidth is not None:
    253             step = binwidth
--> 254             bin_edges = np.arange(start, stop + step, step)
    255         else:
    256             bin_edges = np.histogram_bin_edges(

ValueError: arange: cannot compute length

The correct fragment file is stored in atac.uns['files'] and I've installed the relevant dependencies with pip install 'muon[atac]'. Any idea what the issue might be?

System

OS: [macOS Catalina]
Python version [3.6.9]
Versions of libraries involved [scanpy==1.7.2 anndata==0.7.5 umap==0.4.6 numpy==1.19.5 scipy==1.5.4 pandas==1.1.5 scikit-learn==0.24.2 statsmodels==0.12.1 python-igraph==0.8.2 louvain==0.7.0 leidenalg==0.8.2]

Thanks!
Rasa

a small BUG in mu.tl.mofa

when

import jax
import numpy as np
import anndata as ad
import muon as mu
from muon import MuData

z = jax.random.normal(key=jax.random.PRNGKey(1), shape=(100,3)) * jax.numpy.array([2, 3, 4])
w = jax.random.normal(key=jax.random.PRNGKey(1), shape=(3,200))
y = z @ w

a1 = ad.AnnData(np.array(y[:,:150]))
a2 = ad.AnnData(np.array(y[:,150:]))
mdata = MuData({"a1": a1, "a2": a2})
mdata.var_names_make_unique()

mdata.obs["group"] = jax.random.choice(key=jax.random.PRNGKey(1), a=jax.numpy.array([0, 1]), shape=(100,))
mdata.obs.group = mdata.obs.group.astype("category")
mu.tl.mofa(mdata, groups_label="group")

        #########################################################
        ###           __  __  ____  ______                    ###
        ###          |  \/  |/ __ \|  ____/\    _             ###
        ###          | \  / | |  | | |__ /  \ _| |_           ###
        ###          | |\/| | |  | |  __/ /\ \_   _|          ###
        ###          | |  | | |__| | | / ____ \|_|            ###
        ###          |_|  |_|\____/|_|/_/    \_\              ###
        ###                                                   ###
        #########################################################



Loaded view='a1' group='0' with N=59 samples and D=150 features...
Loaded view='a1' group='1' with N=41 samples and D=150 features...
Loaded view='a2' group='0' with N=59 samples and D=50 features...
Loaded view='a2' group='1' with N=41 samples and D=50 features...


Model options:
- Automatic Relevance Determination prior on the factors: True
- Automatic Relevance Determination prior on the weights: True
- Spike-and-slab prior on the factors: False
- Spike-and-slab prior on the weights: True
Likelihoods:
- View 0 (a1): gaussian
- View 1 (a2): gaussian




######################################
## Training the model with seed 1 ##
######################################



Converged!



#######################
## Training finished ##
#######################


Saving model in /tmp/mofa_20220802-101431.hdf5...
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [6], in <cell line: 15>()
     13 mdata.obs["group"] = jax.random.choice(key=jax.random.PRNGKey(1), a=jax.numpy.array([0, 1]), shape=(100,))
     14 mdata.obs.group = mdata.obs.group.astype("category")
---> 15 mu.tl.mofa(mdata, groups_label="group")

File ~/miniconda3/envs/scvi-env/lib/python3.9/site-packages/muon/_core/tools.py:581, in mofa(data, groups_label, use_raw, use_layer, use_var, use_obs, likelihoods, n_factors, scale_views, scale_groups, center_groups, ard_weights, ard_factors, spikeslab_weights, spikeslab_factors, n_iterations, convergence_mode, gpu_mode, use_float32, smooth_covariate, smooth_warping, smooth_kwargs, save_parameters, save_data, save_metadata, seed, outfile, expectations, save_interrupted, verbose, quiet, copy)
    579 else:
    580     if groups_label:
--> 581         z = pd.DataFrame(z, index=zs).loc[mdata.obs.index.values].to_numpy()
    582     data.obsm["X_mofa"] = z
    584 # Weights

File ~/miniconda3/envs/scvi-env/lib/python3.9/site-packages/pandas/core/frame.py:694, in DataFrame.__init__(self, data, index, columns, dtype, copy)
    684         mgr = dict_to_mgr(
    685             # error: Item "ndarray" of "Union[ndarray, Series, Index]" has no
    686             # attribute "name"
   (...)
    691             typ=manager,
    692         )
    693     else:
--> 694         mgr = ndarray_to_mgr(
    695             data,
    696             index,
    697             columns,
    698             dtype=dtype,
    699             copy=copy,
    700             typ=manager,
    701         )
    703 # For data is list-like, or Iterable (will consume into list)
    704 elif is_list_like(data):

File ~/miniconda3/envs/scvi-env/lib/python3.9/site-packages/pandas/core/internals/construction.py:351, in ndarray_to_mgr(values, index, columns, dtype, copy, typ)
    346 # _prep_ndarray ensures that values.ndim == 2 at this point
    347 index, columns = _get_axes(
    348     values.shape[0], values.shape[1], index=index, columns=columns
    349 )
--> 351 _check_values_indices_shape_match(values, index, columns)
    353 if typ == "array":
    355     if issubclass(values.dtype.type, str):

File ~/miniconda3/envs/scvi-env/lib/python3.9/site-packages/pandas/core/internals/construction.py:418, in _check_values_indices_shape_match(values, index, columns)
    414 if values.shape[1] != len(columns) or values.shape[0] != len(index):
    415     # Could let this raise in Block constructor, but we get a more
    416     #  helpful exception message this way.
    417     if values.shape[0] == 0:
--> 418         raise ValueError("Empty data passed with indices specified.")
    420     passed = values.shape
    421     implied = (len(index), len(columns))

ValueError: Empty data passed with indices specified.

HOWEVER,

mdata.obs.group = mdata.obs.group.astype(str)

can run !

So,it is a type error.

Simplify sparsity check

Description

I may be missing something: Is there a reason why scipy's issparse function is not used e.g. here:

https://github.com/PMBio/muon/blob/912b52ae874b7ddc028b3ea0fc166871fd9e41b1/muon/_atac/preproc.py#L96

Empty uns after reloading causes mu.pp.neighbors error

Thanks for the great work on this toolkit!

I came across a potential bug while testing mu.pp.neighbors: when I construct a MuData object, save it and re-load it with mu.read, if the .uns slot was empty it gets set as None when reloaded. This causes mu.pp.neighbors to throw an error.

Example:

I make a new MuData with nothing in mdata.uns:

> mdata = mu.MuData({'rna': rna_adata, 'atac': atac_adata})
> mdata.uns
{}

Saving and re-loading:

> mdata.write("/path/to/mdata.h5mu")
> mdata = mu.read("/path/to/mdata.h5mu")
> mdata.uns

Running WNN (having saved KNN graphs for each modality):

> mu.pp.neighbors(mdata)
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_669/3502185260.py in <module>
----> 1 mu.pp.neighbors(mdata)

~/my-conda-envs/sc2021-multiomics/lib/python3.9/site-packages/muon/_core/preproc.py in neighbors(mdata, n_neighbors, n_bandwidth_neighbors, n_multineighbors, neighbor_keys, metric, low_memory, key_added, weight_key, add_weights_to_modalities, eps, copy, random_state)
    598     mdata.obsp[dists_key] = neighbordistances
    599     mdata.obsp[conns_key] = connectivities
--> 600     mdata.uns[key_added] = neighbors_dict
    601 
    602     mdata.update_obs()

TypeError: 'NoneType' object does not support item assignment

The issue seems to be solved if I reset the .uns slot manually

> mdata.uns = {}
> mu.pp.neighbors(mdata)

System

-----
muon 		0.1.0
anndata     0.7.6
scanpy      1.8.1
sinfo       0.3.4
-----
PIL                 8.3.1
anndata2ri          1.0.6
backcall            0.2.0
beta_ufunc          NA
binom_ufunc         NA
cffi                1.14.6
cycler              0.10.0
cython_runtime      NA
dateutil            2.8.2
decorator           5.0.9
dunamai             1.5.5
get_version         3.5
google              NA
h5py                3.3.0
ipykernel           6.0.3
ipython_genutils    0.2.0
jedi                0.18.0
jinja2              3.0.1
joblib              1.0.1
kiwisolver          1.3.1
llvmlite            0.36.0
markupsafe          2.0.1
matplotlib          3.4.2
matplotlib_inline   NA
mpl_toolkits        NA
muon                0.1.0
natsort             7.1.1
nbinom_ufunc        NA
numba               0.53.1
numexpr             2.7.3
numpy               1.21.1
packaging           21.0
pandas              1.3.1
parso               0.8.2
pexpect             4.8.0
pickleshare         0.7.5
pkg_resources       NA
prompt_toolkit      3.0.19
ptyprocess          0.7.0
pycparser           2.20
pyexpat             NA
pygments            2.9.0
pynndescent         0.5.4
pyparsing           2.4.7
pytz                2021.1
rpy2                3.4.2
scipy               1.7.1
seaborn             0.11.1
six                 1.16.0
sklearn             0.24.2
statsmodels         0.12.2
storemagic          NA
tables              3.6.1
threadpoolctl       2.2.0
tornado             6.1
tqdm                4.62.0
traitlets           5.0.5
tzlocal             NA
umap                0.5.1
wcwidth             0.2.5
zmq                 22.2.1
-----
IPython             7.26.0
jupyter_client      6.1.12
jupyter_core        4.7.1
-----
Python 3.9.6 | packaged by conda-forge | (default, Jul 11 2021, 03:39:48) [GCC 9.3.0]
Linux-4.15.0-112-generic-x86_64-with-glibc2.31
26 logical CPU cores, x86_64
-----
Session information updated at 2021-08-24 12:49

pysam installation issue on Windows

I got a warning and an error regarding the dependency of pysam when I tried to run the atac-seq tutorial. I tried installing pysam a couple of times but keep having issues. After a little search on google, I found that pysam cannot be installed on windows OS. Was wondering whether is it true or is there any solution for this.

Best,
Thank you,

Aggregating peaks doesn't work at all

Describe the bug

While going through the tutorial on the 10xMultiome data and plotting the PCA with values for cut counts in peaks corresponding to different genes all plots are colored the same.

Expected behaviour
Correct aggregation across specific peak sets.

System

OS: Linux 5.13.12-arch1-1 #1 SMP PREEMPT Wed, 18 Aug 2021 20:49:03 +0000 x86_64 GNU/Linux
Python version [3.9]

Versions of libraries involved

alabaster==0.7.12
anndata==0.7.6
argon2-cffi==20.1.0
async-generator==1.10
attrs==21.2.0
Babel==2.9.1
backcall==0.2.0
bleach==3.3.1
certifi==2021.5.30
cffi==1.14.6
charset-normalizer==2.0.4
click==8.0.1
cycler==0.10.0
debugpy==1.4.1
decorator==4.4.2
defusedxml==0.7.1
docutils==0.16
entrypoints==0.3
flit==3.3.0
flit_core==3.3.0
h5py==3.3.0
idna==3.2
imagesize==1.2.0
ipykernel==6.0.3
ipython==7.25.0
ipython-genutils==0.2.0
ipywidgets==7.6.3
jedi==0.18.0
Jinja2==3.0.1
joblib==1.0.1
jsonschema==3.2.0
jupyter==1.0.0
jupyter-client==6.1.12
jupyter-console==6.4.0
jupyter-core==4.7.1
jupyterlab-pygments==0.1.2
jupyterlab-widgets==1.0.0
jupyterthemes==0.20.0
kiwisolver==1.3.1
leidenalg==0.8.7
lesscpy==0.15.0
llvmlite==0.36.0
loompy==3.0.6
MarkupSafe==2.0.1
matplotlib==3.4.2
matplotlib-inline==0.1.2
mistune==0.8.4
muon @ file:///[my local path]/muon
natsort==7.1.1
nbclient==0.5.3
nbconvert==6.1.0
nbformat==5.1.3
nbsphinx==0.8.7
nest-asyncio==1.5.1
networkx==2.5.1
notebook==6.4.0
numba==0.53.1
numexpr==2.7.3
numpy==1.21.1
numpy-groupies==0.9.13
packaging==21.0
pandas==1.3.1
pandocfilters==1.4.3
parso==0.8.2
patsy==0.5.1
pexpect==4.8.0
pickleshare==0.7.5
Pillow==8.3.1
ply==3.11
prometheus-client==0.11.0
prompt-toolkit==3.0.19
protobuf==3.17.3
ptyprocess==0.7.0
pycparser==2.20
Pygments==2.9.0
pynndescent==0.5.4
pyparsing==2.4.7
pyrsistent==0.18.0
pysam==0.16.0.1
python-dateutil==2.8.2
python-igraph==0.9.6
pytz==2021.1
pyzmq==22.1.0
qtconsole==5.1.1
QtPy==1.9.0
readthedocs-sphinx-search==0.1.0
requests==2.26.0
scanpy==1.8.1
scikit-learn==0.24.2
scipy==1.7.0
seaborn==0.11.1
Send2Trash==1.7.1
sinfo==0.3.4
six==1.16.0
sklearn==0.0
snowballstemmer==2.1.0
Sphinx==4.1.2
sphinx-automodapi==0.13
sphinx-rtd-theme==0.5.2
sphinxcontrib-applehelp==1.0.2
sphinxcontrib-devhelp==1.0.2
sphinxcontrib-htmlhelp==2.0.0
sphinxcontrib-jsmath==1.0.1
sphinxcontrib-qthelp==1.0.3
sphinxcontrib-serializinghtml==1.1.5
statsmodels==0.12.2
stdlib-list==0.8.0
tables==3.6.1
terminado==0.10.1
testpath==0.5.0
texttable==1.6.4
threadpoolctl==2.2.0
toml==0.10.2
tornado==6.1
tqdm==4.61.2
traitlets==5.0.5
umap-learn==0.5.1
urllib3==1.26.6
wcwidth==0.2.5
webencodings==0.5.1
widgetsnbextension==3.5.1
xlrd==1.2.0

Additional context

I believe it's related to #21 or it's fix. In fact reverting the b2347af produces correct behavior.

I've checked manually and the subsetting used in b2347af doesn't work, i.e. returns full matrix (top of the picture) with AnnData 0.7.6.

missing index file atac fragments

Hi, probably a long shot but I need to re-analyse the Teaseq dataset that you use in your multimodal vignette but I can't find the .tbi index (data/teaseq/GSM5123951_X066-MP0C1W3_leukopak_perm-cells_tea_200M_atac_filtered_fragments.tsv.gz.tbi) for the atac fragment files on geo. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE158013

do you happen to have these files and could share them?
thank you!

ATACseq+RNAseq Integration Tutorial Error Loading Data

Describe the bug
Hi there, I am not sure if this is a bug, but I cannot seem to run this tutorial locally.

See screenshots below:
Not loading the 'files' ATAC unstructured annotation
Correct importing of data, according to tutorial
Everything works fine up until the ATACseq analysis part.

I get a key error related to 'files'. Which is supposed to be an unstructured annotation in Anndata.

atac.obs['NS'] = 1 
ac.pl.fragment_histogram(atac, region='chr1:1-2000000')

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Input In [37], in <cell line: 2>()
      1 atac.obs['NS'] = 1 
----> 2 ac.pl.fragment_histogram(atac, region='chr1:1-2000000')

File ~/mambaforge/envs/bioinfo/lib/python3.10/site-packages/muon/_atac/plot.py:341, in fragment_histogram(data, region, groupby, barcodes)
    338 else:
    339     raise TypeError("Expected AnnData or MuData object with 'atac' modality")
--> 341 fragment_path = adata.uns["files"]["fragments"]
    342 fragments = tools.fetch_regions_to_df(fragment_path=fragment_path, features=region)
    344 fragments["length"] = fragments.End - fragments.Start

File ~/mambaforge/envs/bioinfo/lib/python3.10/site-packages/anndata/compat/_overloaded_dict.py:100, in OverloadedDict.__getitem__(self, key)
     98     return self.overloaded[key].get()
     99 else:
--> 100     return self.data[key]

KeyError: 'files'

I assume this is related to the fact that I have a different directory structure, or that somehow I have downloaded the wrong files from 10X Genomics.

System

OS: Ubuntu 20.04
Python version: 3.8
Versions of libraries involved: Mamba environment with latest Muon version.

Any help is appreciated.
Thanks,
Giuliano.

Enable .obsp / .varp for MuData

Having functional .obsp / .varp attributes would for instance allow to store neighbourhood graphs generated based on multiple modalities. Will likely require implementing AnnData's PairwiseArrays analogue.

MOFA: samples_names and samples_groups do not match if samples in AnnData are not sorted by group

Example:
Samples in AnnData are sample1_group1, sample2_group1, sample1_group2, sample2_group2, sample3_group2, sample3_group1.

Resulting mofa samples_groups and samples_names
modal.data_opts['samples_groups'] = ["group1", "group1", "group2", "group2", "group2", "group1"]
modal.data_opts['samples_names'] = [["sample1_group1", "sample2_group1", "sample3_group1"],
["sample1_group2", "sample2_group2", "sample3_group2"]]

As samples_groups are concatenated later in entry_point.py the correspondence of samples to groups is wrong.

HTML representation for Jupyter notebooks

Jupyter notebooks allow objects to have rich representation https://ipython.readthedocs.io/en/stable/config/integrating.html by implementing a spacial formatter, e.g. _repr_html_() for HTML.

This is something that xarray uses to display rich and interactive object representation. Having this feature will enhance user experience with MuData objects when working in Jupyter notebooks and similar environments.

To switch between 'text' and 'html' representations, global options for muon might need to be introduced, e.g. mu.set_options(display_style='html').

Triger AttributeError "X not found" when use layer parameter in mu.tl.embedding

Describe the bug
Triger AttributeError "X not found" when use layer parameter in mu.tl.embedding.

To Reproduce
mu.pl.embedding( md, "X_umap", color=[*some gene*], cmap="Reds", layer="sct_corrected", )

then a attribute error was triggered

AttributeError Traceback (most recent call last)

~/softwares/anaconda3/lib/python3.8/site-packages/muon/_core/plot.py in embedding(data, basis, color, use_raw, layer, >**kwargs)
120 if layer is not None:
121 if layer in data.mod[m].layers:
--> 122 fmod_adata.X = data.mod[m][:, mod_keys].layers[layer].X
123 if use_raw:
124 warnings.warn(f"Layer='{layer}' superseded use_raw={use_raw}")

~/softwares/anaconda3/lib/python3.8/site-packages/scipy/sparse/base.py in getattr(self, attr)
685 return self.getnnz()
686 else:
--> 687 raise AttributeError(attr + " not found")
688
689 def transpose(self, axes=None, copy=False):

AttributeError: X not found

Additional context
This error appears to be caused by a typo. In fact a simple deletion of the .X will fix this bug.https://github.com/PMBio/muon/blob/e77bbf3a2acb1c92dae4c13419cd7335602c2735/muon/_core/plot.py#L122

Thanks developers for providing such a great tool.

Shape Issue with mu.tl.mofa

Dear Muon Team,

I used your program with great success (really great usability!), up until I wanted to integrate the RNA and ATAC data into a joint latent space.
My object looks as following:

muData object with n_obs × n_vars = 9817 × 304944
obs: 'sample'
var: 'feature_types', 'genome'
2 modalities
rna: 9817 x 16660
obs: 'sample', 'int_id', 'seq_id_gex_id', 'seq_id_atac', 'reporter', 'n_counts', 'log_cell_probs', 'cell_confirmed', 'log_counts', 'n_genes', [...]
var: 'gene_ids', 'feature_types', 'genome', 'interval', 'n_cells-0', 'ambient_genes_values-0', 'is_ambient-0', 'is_ambi_key-0',
[...]
uns: 'final_doublets_cat_colors', 'louvain', 'louvain_3_colors', 'louvain_colors', 'neighbors', 'reporter_colors', 'sample_colors', 'batch_colors', 'log1p', 'umap'
obsm: 'X_pca', 'X_umap'
layers: 'counts'
obsp: 'connectivities', 'distances'
atac: 9817 x 288284
obs: 'seurat_id', 'aggr_barcodes', 'sample', 'int_id', 'seq_id_gex_id', 'seq_id_atac', 'reporter', 'size_factors', 'louvain'
var: 'feature_types', 'modality', 'genome'
uns: 'files', 'log1p', 'neighbors', 'louvain', 'umap', 'sample_colors'
obsm: 'X_pca', 'X_umap'
layers: 'counts'
obsp: 'distances', 'connectivities'

When I then try to run
mu.tl.mofa(mdata2, groups_label = 'sample')
the result is always the same with:

ValueError:` shapes (9817,16660) and (9817,16660) not aligned: 16660 (dim 1) != 9817 (dim 0)

And I just cant make sense of it. Thank you very much for your help!

conda package

Hey,

I can't seem to find muon on any conda channel. Would suggest to add muon to conda-forge.

Cheers

Peak GC content quantification in ATAC module

Hi there,

First of all, every time I go back to working with muon it gets better and better, so thanks for the great work!

I was wondering whether you had thought of adding a muon.atac.pp function to compute GC content of peaks. This is frequently used for peak-gene association testing (e.g. here), and as far as I am aware there is no easy functionality in python for this (I've always used addGCBias from chromVAR in R).

Reading a backed AnnData from the .h5mu fails

Expected to be able to read a modality in a backed mode but it currently fails:

mu.read('pbmc10k.h5mu/atac', backed=True)
# or 
mu.read_h5ad('pbmc10k.h5mu', mod='atac', backed=True)
# => TypeError: __init__() got an unexpected keyword argument 'mode'

It seems like mu.read_h5ad() passes {mode: hdf5_mode} to the AnnData constructor, which can't handle it.

Update dsb interface

dsb normalisation has been implemented in #9.

The reference implementation now features an updated interface (DSBNormalizeProtein), which we should consider to adopt. This should improve user experience when applying dsb normalisation.

scverse / muon Goto Github PK

muon's People

Contributors

Stargazers

Watchers

Forkers

muon's Issues

Description

Description

Description

Recommend Projects

Recommend Topics

Recommend Org