scverse / muon Goto Github PK
View Code? Open in Web Editor NEWmuon is a multimodal omics Python framework
Home Page: https://muon.scverse.org/
License: BSD 3-Clause "New" or "Revised" License
muon is a multimodal omics Python framework
Home Page: https://muon.scverse.org/
License: BSD 3-Clause "New" or "Revised" License
Muon introduced in-place filtering API such as mu.pp.filter_obs()
. It modifies all the attributes in place according to the specified axis (e.g. observations). When operated on the view of an object — AnnData
or MuData
— , in-place filtering behaviour is not strictly defined.
Considering the Zen, in-place filtering should only work for objects that are not views, and throw an error with a clear message on views since there's already an explicit way to get a filtered object that is not a view:
# Should not work on views
ad2 = mu.pp.filter_obs(ad_view, some_obs)
# => TypeError
# Works on views and has explicit behaviour
ad2 = ad_view[some_obs].copy()
Hi! I am wondering if there is a way to copy MuData objects (i.e. MuData_copy = MuData.copy()
) without getting TypeError: cannot unpack non-iterable NoneType object
Originally posted by @SamuelAMiller1 in #24 (comment)
Hi there!
I wanted to thank you for this fantastic tool and beautiful tutorials. It has saved me SO much time and headache.
I wondered if there is a way to export the plots I make with muon to pdf, tiff, or png files? I apologize if this was already covered and I missed it; I don't see a 'save' parameter like the one in Scanpy, nor can I use plt.savefig() function as one can with seaborn plots.
I apologize for this trivial question/request, but I hoped to export said pdfs into illustrator to edit the axis labels.
Thank you for your time!
Cheers,
Fran
Hi there,
I think that there is some bug in the count_fragments_features atac features. The gene activity I created from those are very bad looking. This is issue is separate from the other issue where the strand issue is considered. I think the issue might have been with the use of the lil matrix.
The first is with the muon function ,while the second picture is the one which i construct it with a coo matrix(with the strand fix).
I created a MuData object that contains the AnnData for 2 modalities, did some basic filtering of the datasets and then tried to save it with: joint.write("joint_data.h5mu")
but this throws the following error:
TypeError Traceback (most recent call last)
/data/leuven/miniconda3/envs/pytorch2/lib/python3.8/site-packages/anndata/_io/utils.py in func_wrapper(elem, key, val, *args, **kwargs)
213 try:
--> 214 return func(elem, key, val, *args, **kwargs)
215 except Exception as e:
/data/leuven/miniconda3/envs/pytorch2/lib/python3.8/site-packages/anndata/_io/specs/registry.py in write_elem(f, k, elem, modifiers, *args, **kwargs)
174 else:
--> 175 _REGISTRY.get_writer(dest_type, t, modifiers)(f, k, elem, *args, **kwargs)
176
/data/leuven/miniconda3/envs/pytorch2/lib/python3.8/site-packages/anndata/_io/specs/registry.py in get_writer(self, dest_type, typ, modifiers)
63 if (dest_type, typ, modifiers) not in self.write:
---> 64 raise TypeError(
65 f"No method has been defined for writing {typ} elements to {dest_type}"
TypeError: No method has been defined for writing <class 'mudata._core.mudata.MuAxisArrays'> elements to <class 'h5py._hl.group.Group'>
The above exception was the direct cause of the following exception:
TypeError Traceback (most recent call last)
/tmp/efec76988f/ipykernel_20272/4022115007.py in <module>
----> 1 joint.write("../Merged/929_cancer/929_cancer_joint_data.h5mu")
/data/leuven/miniconda3/envs/pytorch2/lib/python3.8/site-packages/mudata/_core/mudata.py in write_h5mu(self, filename, **kwargs)
1084 raise ValueError("Provide a filename!")
1085 else:
-> 1086 write_h5mu(filename, self, **kwargs)
1087 if self.isbacked:
1088 self.file.filename = filename
/data/leuven/miniconda3/envs/pytorch2/lib/python3.8/site-packages/mudata/_core/io.py in write_h5mu(filename, mdata, **kwargs)
207
208 with h5py.File(filename, "w", userblock_size=512) as f:
--> 209 _write_h5mu(f, mdata, **kwargs)
210 with open(filename, "br+") as f:
211 nbytes = f.write(
/data/leuven/miniconda3/envs/pytorch2/lib/python3.8/site-packages/mudata/_core/io.py in _write_h5mu(file, mdata, write_data, **kwargs)
44 dataset_kwargs=kwargs,
45 )
---> 46 write_attribute(file, "obsm", mdata.obsm, dataset_kwargs=kwargs)
47 write_attribute(file, "varm", mdata.varm, dataset_kwargs=kwargs)
48 write_attribute(file, "obsp", mdata.obsp, dataset_kwargs=kwargs)
/data/leuven/miniconda3/envs/pytorch2/lib/python3.8/site-packages/anndata/_io/utils.py in write_attribute(*args, **kwargs)
132 DeprecationWarning,
133 )
--> 134 return write_elem(*args, **kwargs)
135
136
/data/leuven/miniconda3/envs/pytorch2/lib/python3.8/site-packages/anndata/_io/utils.py in func_wrapper(elem, key, val, *args, **kwargs)
218 else:
219 parent = _get_parent(elem)
--> 220 raise type(e)(
221 f"{e}\n\n"
222 f"Above error raised while writing key {key!r} of {type(elem)} "
TypeError: No method has been defined for writing <class 'mudata._core.mudata.MuAxisArrays'> elements to <class 'h5py._hl.group.Group'>
Above error raised while writing key 'obsm' of <class 'h5py._hl.files.File'> to /
I also tried to save only a MuData object with just the raw matrices (no more metadata), but it throws the same error, also when trying to save each of the modalities alone (in a MuData object with only 1 modality).
I am using python '3.8.12', scanpy '1.9.1' and muon '0.1.2'
Thank you for your help, this is a very useful tool.
I got this error from running ac.pl.fragment_histogram(atac, region='chr1:1-2000000')
KeyError Traceback (most recent call last)
in
----> 1 ac.pl.fragment_histogram(atac, region='chr1:1-2000000')
~\Anaconda3\lib\site-packages\muon_atac\plot.py in fragment_histogram(data, region, groupby)
330 raise TypeError("Expected AnnData or MuData object with 'atac' modality")
331
--> 332 fragment_path = adata.uns["files"]["fragments"]
333 fragments = tools.fetch_regions_to_df(fragment_path=fragment_path, features=region)
334
~\Anaconda3\lib\site-packages\anndata\compat_overloaded_dict.py in getitem(self, key)
98 return self.overloaded[key].get()
99 else:
--> 100 return self.data[key]
101
102 def setitem(self, key, value):
KeyError: 'files'
I got another error by running ac.tl.nucleosome_signal(atac, n=1e6)
KeyError Traceback (most recent call last)
in
----> 1 ac.tl.nucleosome_signal(atac, n=1e6)
~\Anaconda3\lib\site-packages\muon_atac\tools.py in nucleosome_signal(data, n, nucleosome_free_upper_bound, mononuleosomal_upper_bound)
1031
1032 if "files" not in adata.uns or "fragments" not in adata.uns["files"]:
-> 1033 raise KeyError(
1034 "There is no fragments file located yet. Run muon.atac.tl.locate_fragments first."
1035 )
KeyError: 'There is no fragments file located yet. Run muon.atac.tl.locate_fragments first.'
Not sure what to do?
Is there a way to get the UMAP with distal vs promoter peaks on the integrated dataset? Similar to the function ac.pl.umap(atac, color=["KLF4"], average="peak_type") but using mu.pl.umap(mdata, color=["KLF4", "chr9:107480158-107492721"]).
The implementation of CLR here performs a within feature normalization. Seurat normalizes across features (see here). It would be good to have the option to choose between the two approaches, e.g. via an attribute flavor
. Alternatively, the argument axis
could be used to indicate along which axis to normalize.
Hi,
Is it possible to concatenate two mudata objects. I do not see a method in the reference API.
Presently I concatenate two objects separately for each modality using anndata.concatenate and then create a new mudata obj.
It would be nice to have an anndata.concatenate like facility for mudata too.
Vijay
Running mu.pp.intersect_obs erroneously introduces NaNs into mdata.obs
Minimal working example:
import numpy as np
import muon as mu
def test_for_nans():
assert mdata.obs['batch'].isna().sum() == 0
x = mu.AnnData(X=np.random.normal(size=1000).reshape(-1, 10))
y = mu.AnnData(X=np.random.normal(size=1000).reshape(-1, 10))
batches = np.random.choice(["a", "b", "c"], size=100, replace=True)
mdata = mu.MuData({"rna": x, "prot": y})
mdata.obs['batch'] = batches
test_for_nans() # no error
mdata['rna'].obs['total_count'] = mdata['rna'].X.sum(axis=1)
mdata['rna'].obs['min_count'] = mdata['rna'].X.min(axis=1)
mdata.update()
# filter one of the modalities.
mu.pp.filter_obs(mdata['rna'], 'min_count', lambda x: (x < -2))
mu.pp.intersect_obs(mdata)
test_for_nans() # assert is False so it returns an error, in fact all of mdata.obs['batch'] are nans
In my tests above the mdata.obs['batch'] are all NaNs after running intersect obs.
Weirdly in bigger datasets, sometimes a really small number of data entries are not NaNs.
System
OS: CentOS Linux release 7.8.2003 (Core)
Python 3.9.5
Versions of libraries involved
numpy 1.20.3
muon 0.1.1 (installed from github today (2021-11-30) using pip install git+https://github.com/gtca/muon
)
When I try to import muon I get this error
"Illegal instruction: 4"
I've read this happened in other packages like tensor flow in users with M1 and M1 Pro chips, although there is no clear solution. I've updated my OS to the latest (macOS Monterey 12.6) but the bug persists. My laptop has an M1 Pro chip. Not sure if this is something that the developers can fix but just wanted to put it out there in case someone else encounters the same issue.
Hi,
The documentation for muon.atac.tl.scan_sequences
(https://muon.readthedocs.io/en/latest/api/generated/muon.atac.tl.scan_sequences.html) does not seem to be up to date. The function's signature and the parameter description do not match.
Best
GJ
Hi,
When trying to write the mudata
object, h5py
crashes due to some .uns
values being OrderedDict
instead of normal dictionaries.
TypeError: No method has been defined for writing <class 'collections.OrderedDict'> elements to <class 'h5py._hl.group.Group'>
Here is how to reproduce the bug:
import muon as mu
import os
# Read and create mdata object
data_dir = 'pbmc10k/'
mdata = mu.read_10x_h5(os.path.join(data_dir, "filtered_feature_bc_matrix.h5"))
mdata.var_names_make_unique()
# Write
mdata.write("test.h5mu")
The pbmc10k
is the data used in this tutorial.
Apparently, when reading the inputs, mu.read_10x_h5
creates two OrderedDict
s in mdata.mod['atac'].uns
. If I set them to dict
then it saves the object correctly but it would be nice to handle this automatically.
mdata.mod['atac'].uns['files'] = dict(mdata.mod['atac'].uns['files'])
mdata.mod['atac'].uns['atac'] = dict(mdata.mod['atac'].uns['atac'])
Thanks you for your time!
System
Describe the bug
Thanks developers for providing such a great tool.
I found that it will triger a AttributeError
when I make a copy for a view.
To Reproduce
Steps to reproduce the behaviour.
a = mu.MuData(
{
"test": sc.AnnData(
X=np.ones([2, 2]),
obs=pd.DataFrame(index=["a", "b"]),
var=pd.DataFrame(index=["a", "b"]),
),
}
)
a.copy()
b = a['a',:]
# b._adata_ref = b._mudata_ref
b.copy()
Then we will get a AttributeError:
AttributeError: 'MuData' object has no attribute '_adata_ref'
And if we set _adata_ref
variable for the view, this error would be avoided.
AnnData supports backing attributes instead of loading the full file into the memory.
Implementing this functionality for muon's I/O, akin to read_h5ad_backed
here, will make it more comfortable to use .h5mu
files.
This implementation should probably include a new backed: bool
argument for I/O functions without adding new public functions.
Backed functionality for MuData objects implies using backed=True
for all AnnData objects inside it.
Feature barcoding is one of the approaches to complement gene expression data with other laters of information from the same cell.
For instance, CITE-seq allows to quantitatively detect surface proteins using antibody-bound oligos.
This layer of information corresponds to the Antibody Capture
feature type in a 10X Genomics-formatted output.
It seems there is still no consensus on the best performing normalisation method for CITE-seq. In addition to the ones discussed at the linked thread, there is also a dsb
method for normalising and denoising CITE-seq data but it is only available in R.
As a pre-processing method for a specific modality, it would live in the muon/_prot/preproc.py
file, which would be importable and called as expected:
from muon import prot as pt
pt.pp.dsb(...)
This method uses information from the raw (non-filtered) matrix so there's also a question of designing an API that is not too confusing. Ideally, we would want to stick to a familiar interface for preprocessing. To make things simple, this method might operate on two MuData objects, one with filtered counts and another one with raw counts:
pt.pp.dsb(mdata, mdata_raw, ...)
The normalisation and denoising would be performed on the mdata_raw
object and normalised protein counts for respective cells would be copied into the mdata
object. Threshold for negative and positive (cells) droplets can be expected to be provided manually (e.g. empty_counts_min
& empty_counts_max
, cell_counts_min
& cell_counts_max
).
It should also be made possible to work with a single mdata
object but it has to be the raw data. Since it is usually not the case, a warning should be displayed.
pt.pp.dsb(mdata_raw, ...)
# => display warning
Indentations in muon/_atac/preproc.py
contains mixed spaces and tabs.
Hi,
is there an easy way to use e.g. mdata.obsm['X_mofa']
representation to calculate neighbors? Now I create a new AnnData object from the representation, calculate neighbors with sc.pp.neighbors()
, and then store the connectivities and distances in the mdata.obsp
, but it would be nice to have a wrapper around this in muon. Thanks!
If a modality has been backed inside .h5mu, on copy/write operations we might want to have a clearly defined behaviour. In particular the proposal is to differentiate between modalities linked to the MuData object and modalities detached from it:
mdata = mu.read('rna_atac.h5mu', backed=True)
rna = mdata['rna']
# ^__ linked
rna.copy('rna.h5ad')
# => Error referring the user to the MuData object
rna = mdata.mod['rna']
# ^__ detached
rna.copy('rna.h5ad')
# => Success
This proposal has been brought forward by @ilia-kats and has been originally discussed in the issue #17.
As discussed in #49, documentation should be more clear that mu.pl.*
support arguments from corresponding sc.pl.*
functions.
I think it would be useful to add a datasets module. It's very useful for prototyping and debugging.
For prototyping, you've always got a couple extra objects to try a function on. This is also great for downstream library authors.
For debugging: you can quickly grab a dataset no matter what system you're on. And when trying to debug remotely with a user, it's an easy shared source of truth for replication.
Describe the bug
Drawing umap graphs with the same variables in multiple mods will result in errors, even if axis=-1 is set.
To Reproduce
import muon as mu
import scanpy as sc
import pandas as pd
df = pd.DataFrame([[1,2,3]]*3, index=['a','b', 'c'], columns=['e','f','g'])
ad = sc.AnnData(df)
ad.obsm['X_umap'] = pd.DataFrame([[1,2]] * 3).values
md = mu.MuData({'aa': ad, 'bb': ad[:, :2]}, axis=-1)
print(md)
#MuData object with n_obs × n_vars = 3 × 3
# 2 modalities
# aa: 3 x 3
# obsm: 'X_umap'
# bb: 3 x 2
# obsm: 'X_umap'
mu.pl.embedding(md, 'aa:umap', color='e')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-21-820ff8a24f5e> in <module>
----> 1 mu.pl.embedding(md, 'aa:umap', color='e')
~/softwares/anaconda3/lib/python3.8/site-packages/muon/_core/plot.py in embedding(data, basis, color, use_raw, layer, **kwargs)
228 )
229 x = fmod_adata.X.toarray() if issparse(fmod_adata.X) else fmod_adata.X
--> 230 obs = obs.join(
231 pd.DataFrame(x, columns=mod_keys, index=fmod_adata.obs_names),
232 how="left",
~/softwares/anaconda3/lib/python3.8/site-packages/pandas/core/frame.py in join(self, other, on, how, lsuffix, rsuffix, sort)
9097 5 K5 A5 NaN
9098 """
-> 9099 return self._join_compat(
9100 other, on=on, how=how, lsuffix=lsuffix, rsuffix=rsuffix, sort=sort
9101 )
~/softwares/anaconda3/lib/python3.8/site-packages/pandas/core/frame.py in _join_compat(self, other, on, how, lsuffix, rsuffix, sort)
9128 sort=sort,
9129 )
-> 9130 return merge(
9131 self,
9132 other,
~/softwares/anaconda3/lib/python3.8/site-packages/pandas/core/reshape/merge.py in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
119 validate=validate,
120 )
--> 121 return op.get_result()
122
123
~/softwares/anaconda3/lib/python3.8/site-packages/pandas/core/reshape/merge.py in get_result(self)
715 join_index, left_indexer, right_indexer = self._get_join_info()
716
--> 717 llabels, rlabels = _items_overlap_with_suffix(
718 self.left._info_axis, self.right._info_axis, self.suffixes
719 )
~/softwares/anaconda3/lib/python3.8/site-packages/pandas/core/reshape/merge.py in _items_overlap_with_suffix(left, right, suffixes)
2306
2307 if not lsuffix and not rsuffix:
-> 2308 raise ValueError(f"columns overlap but no suffix specified: {to_rename}")
2309
2310 def renamer(x, suffix):
ValueError: columns overlap but no suffix specified: Index(['e'], dtype='object')
System
Additional context
I noticed that in 0.1.3 there is support for setting the layers used by each mod in a dictionary way. But that doesn't get rid of the error.I suppose this is due to when iterating over each mod, if a variable is shared across multiple mods, it will result in the variable not being successfully joined. This may also be the reason why mudata
emphasizes ensuring the uniqueness of variables in the mod as much as possible.
But when a mod is only saving a subset of another mod (i.e. axis=-1), this error should be got around. I think it is possible to simply add a mod parameter, so that when iterating, it only iterates over the manually specified mod.
Describe the bug
Value passed for key 'LFs' is of incorrect shape. Values of varm must match dimensions (1,) of parent. Value had shape (61452, 10) while it should have had (150850,)
To Reproduce
mu.tl.mofa(mudata)
Additional context
The mudata object is not updated if I subset either the rna or atac slot. As you can see here, the MuData is still 150k features but atac + rna is only 60k features. The object is only updated when its written and read again
MuData object with n_obs × n_vars = 7098 × 150850
rna: 7098 x 21894
obs: 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'celltype'
var: 'gene_ids', 'feature_types', 'genome', 'interval', 'mt', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'highly_variable', 'highly_variable_rank', 'means', 'variances', 'variances_norm', 'mean', 'std'
uns: 'hvg', 'log1p', 'pca', 'neighbors', 'umap'
obsm: 'X_pca', 'X_umap'
varm: 'PCs'
layers: 'counts'
obsp: 'distances', 'connectivities'
atac: 7098 x 39558
obs: 'n_genes_by_counts', 'total_counts', 'nucleosome_signal', 'tss_score'
var: 'gene_ids', 'feature_types', 'genome', 'interval', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
uns: 'atac', 'files', 'hvg', 'lsi', 'neighbors', 'umap'
obsm: 'X_lsi', 'X_umap'
varm: 'LSI'
layers: 'counts'
obsp: 'distances', 'connectivities'
Thanks for muon. Following the CITE-seq tutorial, I'd like to do simple scatter plots and regression to show the RNA-prot correlation for particular markers, something like in the Seurat muli-modal tutorial.:
I tried sc.pl.scatter(mdata, x="prot:CD3D", y="rna:CD3D")
This raises: KeyError: 'There is no key prot:CD3D in MuData .obs or in .obs of any modalities.'
Of course the gene/protein names are stored in .var
not .obs
.
Suggestions appreciated, thank you
We didn't establish a clear policy around logging early on, so now some parts of muon use Scanpy's logger, while others issue plain warnings using the warnings
module. Ideally, this should be unified, one way or another.
Modalities backed inside a .h5mu file can't be copied to a new location as expected:
>>> mdata = mu.read("rna_atac.h5mu", backed=True)
>>> mdata['rna'].copy("rna.h5ad")
KeyError: "Unable to open object (object 'mod' doesn't exist)"
I want to perform multiple filtering steps on my object e.g.:
mu.pp.filter_obs(mdata, 'n_genes_by_counts', lambda x: (x >= 100))
mu.pp.filter_obs(mdata, 'total_counts', lambda x: (x >= 500) & (x <= 50000)
Error message:
muon/_core/preproc.py in filter_obs(data, var, func)
733 # filter_obs() for each modality
734 for m, mod in data.mod.items():
--> 735 obsmap = data.obsmap[m][obs_subset]
736 obsmap = obsmap[obsmap != 0] - 1
737 filter_obs(mod, mod.obs_names[obsmap])
anndata/_core/aligned_mapping.py in __getitem__(self, key)
146
147 def __getitem__(self, key: str) -> V:
--> 148 return self._data[key]
149
150 def __setitem__(self, key: str, value: V):
KeyError: 'x'
Minimal working example:
import numpy as np
import muon as mu
x = mu.AnnData(X=np.random.normal(size=1000).reshape(-1, 100))
y = mu.AnnData(X=np.random.normal(size=2000).reshape(-1, 100))
md = mu.MuData({"x": x, "y": y})
md['x'].obs['total_count'] = md['x'].X.sum(axis=1)
md['x'].obs['min_count'] = md['x'].X.min(axis=1)
md.update()
# filter one of the modalities.
mu.pp.filter_obs(md, 'x:min_count', lambda x: (x < -2))
mu.pp.filter_obs(md, 'x:total_count', lambda x: (x >0))
If you put md.update()
between the two filter statements it does work. But to me this is not ideal if you are filtering on lots of categories sequentially
Is this desired behaviour ot something that could be updated such that one could just one run update
command after sequential filters??
Thanks!
Both louvain.find_partition_multiplex()
and leidenalg.find_partition_multiplex()
accept multiple graphs as input. That would enable mu.tl.louvain
and mu.tl.leiden
that would take .obsp[neighbors_key]
(see louvain and leiden interfaces in scanpy) from each modality — or cast to sc.tl.louvain
/ sc.tl.leiden
when an AnnData object is provided.
This becomes slightly more sophisticated when MuData itself has .obsp['connectivities']
. For the beginning, we could probably assume that mu.tl.louvain
/ mu.tl.leiden
are multiplex-only algorithms when operating on MuData objects, and for taking use of mdata.obsp['connectivities']
, it's sc.tl.louvain
/ sc.tl.leiden
that should be used.
Describe the bug
The extend_upstream
and extend_downstream
arguments in muon.atac.tl.count_fragments_features do not work as expected because they do not factor in strands. For genes on both strands, the upstream region is extended to Start - extend_upstream
, but for genes on the -
strand, it should be extended to End + extend_upstream
.
To Reproduce
I created a debugging dataset consisting of 2 cells using this fragments file below:
1 63082 63282 cell1 2
1 65082 65282 cell1 2
1 67082 67282 cell1 1
1 69082 69282 cell1 1
1 71082 71282 cell1 1
1 83678 83878 cell1 10
1 85678 85878 cell1 10
1 87678 87878 cell1 10
1 89678 89878 cell1 100
1 91678 91878 cell1 100
2 131043 131243 cell2 200
2 133043 133243 cell2 200
2 135043 135243 cell2 20
2 137043 137243 cell2 20
2 139043 139243 cell2 20
2 215701 215901 cell2 2
2 217701 217901 cell2 2
2 219701 219901 cell2 2
2 221701 221901 cell2 4
2 223701 223901 cell2 4
for these two genes (in GFF3 format); ENSMMUG00000000634 is on the +
strand and ENSMMUG00000000051 is on the -
strand.
1 ensembl gene 71582 83178 . + . ID=gene:ENSMMUG00000000634;Name=ZNF692;biotype=protein_coding;description=zinc finger protein 692 [Source:VGNC Symbol%3BAcc:VGNC:100195];gene_id=ENSMMUG00000000634;logic_name=ensembl;version=4
2 ensembl gene 139543 215201 . - . ID=gene:ENSMMUG00000000051;Name=LMLN;biotype=protein_coding;description=leishmanolysin like peptidase [Source:VGNC Symbol%3BAcc:VGNC:74296];gene_id=ENSMMUG00000000051;logic_name=ensembl;version=4
Expected behaviour
On the test dataset, using extend_upstream = 5000
should count 3 fragments for cell1 upstream of ENSMMUG00000000634 (+
strand) and 6 fragments for cell2 upstream of ENSMMUG00000000051 (-
strand). Instead, it counts 3 fragments and 60 fragments, respectively. extend_downstream = 5000
actually has the desired behavior for the -
strand (given that I am interested in the upstream region). So I was able to work around this by running muon.atac.tl.count_fragments_features
separately on the +
strand (using extend_upstream = 5000
) and -
strand genes in my data (using extend_downstream = 5000
). But it would be preferable for the function to directly incorporate strand information.
System
Additional context
I attached the following files for the trial described above
test-anndata.h5ad
: AnnData object in h5ad format.test-fragments.txt.gz
: fragments file compressed with bgziptest-fragments.txt.gz.tbi
: Tabix index for fragments filetest-annotation.pkl
: pandas data frame containing gene annotations (two genes only) in pickle formatMy results can be reproduced with the following:
import anndata as ad
import pandas as pd
import muon as mu
test = ad.read_h5ad('test-anndata.h5ad')
annotation = pd.read_pickle('test-annotation.pkl')
mu.atac.tl.locate_fragments(test,fragments='test-fragments.txt.gz')
result = mu.atac.tl.count_fragments_features(test,annotation,extend_upstream=5000,extend_downstream=0)
print(result['cell1','ENSMMUG00000000634'].X)
# (0, 0) 3.0
print(result['cell2','ENSMMUG00000000051'].X)
# (0, 0) 60.0
result = mu.atac.tl.count_fragments_features(test,annotation,extend_upstream=0,extend_downstream=5000)
print(result['cell1','ENSMMUG00000000634'].X)
# (0, 0) 30.0
print(result['cell2','ENSMMUG00000000051'].X)
# (0, 0) 6.0
Aggregating peak counts from the .raw
slot when plotting ATAC-seq data is not supported when the AnnData object is backed:
mdata = mu.read('data/pbmc10k.h5mu', backed=True)
ac.pl.pca(mdata['atac'], color="KLF4")
# => IndexError: tuple index out of range
It is expected to work, and it should be straightforward to fix.
To match prospective user's expectations, it should be possible to slice MuData objects.
While the actual implementation of this feature is expected to be quite verbose, the syntax is straightforward:
mdata[:100,:] # => mdata object with first 100 samples/cells and all features
mdata[list_of_sample_ids] # => mdata object with respective samples/cells subsetted
mdata[:,list_of_features] # => mdata object with respective features subsetted
The implementation would alter MuData.__getitem__
method.
An additional point of consideration is whether a view should be returned. In that case, a view functionality has to be implemented for MuData (e.g. see AnnData._init_as_view
). That would probably mean referencing each AnnData inside MuData object as a view.
Hello, thanks for muon!
I was trying to reproduce the steps described in the ATAC tutorial (https://muon-tutorials.readthedocs.io/en/latest/single-cell-rna-atac/pbmc10k/2-Chromatin-Accessibility-Processing.html) on my independent dataset, but I get an error from the ac.pl.fragment_histogram function.
The code:
import numpy as np
import pandas as pd
import scanpy as sc
import anndata as ad
import scipy
import anndata2ri
## Plotting utils
import matplotlib.pyplot as plt
import matplotlib
sc.settings.verbosity = 3 # verbosity: errors (0), warnings (1), info (2), hints (3)
sc.settings.set_figure_params(dpi=150, color_map='viridis', transparent=True,
ipython_format="png2x")
sc.logging.print_header()
import muon as mu
# Import a module with ATAC-seq-related functions
from muon import atac as ac
atac = mu.read(path + "output/file_postrnatutorial.h5mu")
atac.var['mt'] = atac.var_names.str.startswith('MT-') # annotate the group of mitochondrial genes as 'mt'
sc.pp.calculate_qc_metrics(atac, qc_vars=['mt'], percent_top=None, log1p=False, inplace=True)
mu.pp.filter_var(atac, 'n_cells_by_counts', lambda x: x >= 10)
mu.pp.filter_obs(atac, 'n_genes_by_counts', lambda x: (x >= 2000) & (x <= 40000))
mu.pp.filter_obs(atac, 'total_counts', lambda x: (x >= 4000) & (x <= 100000))
atac.obs['NS']=1
ac.pl.fragment_histogram(atac, region='chr1:9808-10702')
The error is:
ValueError Traceback (most recent call last)
<ipython-input-16-0ef9424cd717> in <module>
----> 1 ac.pl.fragment_histogram(K29TIL_UD_atac, region='chr1:9808-10702')
/usr/local/lib/python3.6/dist-packages/muon/_atac/plot.py in fragment_histogram(data, region, groupby, barcodes)
372 else:
373 # Handle sns.distplot deprecation and sns.histplot addition
--> 374 g = hist(fragments.length, **kwargs)
375 g.set_xlabel("Fragment length (bp)")
376 g.set(xlim=(0, 1000))
/usr/local/lib/python3.6/dist-packages/seaborn/distributions.py in histplot(data, x, y, hue, weights, stat, bins, binwidth, binrange, discrete, cumulative, common_bins, common_norm, multiple, element, fill, shrink, kde, kde_kws, line_kws, thresh, pthresh, pmax, cbar, cbar_ax, cbar_kws, palette, hue_order, hue_norm, color, log_scale, legend, ax, **kwargs)
1434 estimate_kws=estimate_kws,
1435 line_kws=line_kws,
-> 1436 **kwargs,
1437 )
1438
/usr/local/lib/python3.6/dist-packages/seaborn/distributions.py in plot_univariate_histogram(self, multiple, element, fill, common_norm, common_bins, shrink, kde, kde_kws, color, legend, line_kws, estimate_kws, **plot_kws)
435
436 # Do the histogram computation
--> 437 heights, edges = estimator(observations, weights=weights)
438
439 # Rescale the smoothed curve to match the histogram
/usr/local/lib/python3.6/dist-packages/seaborn/_statistics.py in __call__(self, x1, x2, weights)
369 """Count the occurrances in each bin, maybe normalize."""
370 if x2 is None:
--> 371 return self._eval_univariate(x1, weights)
372 else:
373 return self._eval_bivariate(x1, x2, weights)
/usr/local/lib/python3.6/dist-packages/seaborn/_statistics.py in _eval_univariate(self, x, weights)
346 bin_edges = self.bin_edges
347 if bin_edges is None:
--> 348 bin_edges = self.define_bin_edges(x, weights=weights, cache=False)
349
350 density = self.stat == "density"
/usr/local/lib/python3.6/dist-packages/seaborn/_statistics.py in define_bin_edges(self, x1, x2, weights, cache)
264
265 bin_edges = self._define_bin_edges(
--> 266 x1, weights, self.bins, self.binwidth, self.binrange, self.discrete,
267 )
268
/usr/local/lib/python3.6/dist-packages/seaborn/_statistics.py in _define_bin_edges(self, x, weights, bins, binwidth, binrange, discrete)
252 elif binwidth is not None:
253 step = binwidth
--> 254 bin_edges = np.arange(start, stop + step, step)
255 else:
256 bin_edges = np.histogram_bin_edges(
ValueError: arange: cannot compute length
The correct fragment file is stored in atac.uns['files'] and I've installed the relevant dependencies with pip install 'muon[atac]'. Any idea what the issue might be?
System
Thanks!
Rasa
when
import jax
import numpy as np
import anndata as ad
import muon as mu
from muon import MuData
z = jax.random.normal(key=jax.random.PRNGKey(1), shape=(100,3)) * jax.numpy.array([2, 3, 4])
w = jax.random.normal(key=jax.random.PRNGKey(1), shape=(3,200))
y = z @ w
a1 = ad.AnnData(np.array(y[:,:150]))
a2 = ad.AnnData(np.array(y[:,150:]))
mdata = MuData({"a1": a1, "a2": a2})
mdata.var_names_make_unique()
mdata.obs["group"] = jax.random.choice(key=jax.random.PRNGKey(1), a=jax.numpy.array([0, 1]), shape=(100,))
mdata.obs.group = mdata.obs.group.astype("category")
mu.tl.mofa(mdata, groups_label="group")
#########################################################
### __ __ ____ ______ ###
### | \/ |/ __ \| ____/\ _ ###
### | \ / | | | | |__ / \ _| |_ ###
### | |\/| | | | | __/ /\ \_ _| ###
### | | | | |__| | | / ____ \|_| ###
### |_| |_|\____/|_|/_/ \_\ ###
### ###
#########################################################
Loaded view='a1' group='0' with N=59 samples and D=150 features...
Loaded view='a1' group='1' with N=41 samples and D=150 features...
Loaded view='a2' group='0' with N=59 samples and D=50 features...
Loaded view='a2' group='1' with N=41 samples and D=50 features...
Model options:
- Automatic Relevance Determination prior on the factors: True
- Automatic Relevance Determination prior on the weights: True
- Spike-and-slab prior on the factors: False
- Spike-and-slab prior on the weights: True
Likelihoods:
- View 0 (a1): gaussian
- View 1 (a2): gaussian
######################################
## Training the model with seed 1 ##
######################################
Converged!
#######################
## Training finished ##
#######################
Saving model in /tmp/mofa_20220802-101431.hdf5...
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Input In [6], in <cell line: 15>()
13 mdata.obs["group"] = jax.random.choice(key=jax.random.PRNGKey(1), a=jax.numpy.array([0, 1]), shape=(100,))
14 mdata.obs.group = mdata.obs.group.astype("category")
---> 15 mu.tl.mofa(mdata, groups_label="group")
File ~/miniconda3/envs/scvi-env/lib/python3.9/site-packages/muon/_core/tools.py:581, in mofa(data, groups_label, use_raw, use_layer, use_var, use_obs, likelihoods, n_factors, scale_views, scale_groups, center_groups, ard_weights, ard_factors, spikeslab_weights, spikeslab_factors, n_iterations, convergence_mode, gpu_mode, use_float32, smooth_covariate, smooth_warping, smooth_kwargs, save_parameters, save_data, save_metadata, seed, outfile, expectations, save_interrupted, verbose, quiet, copy)
579 else:
580 if groups_label:
--> 581 z = pd.DataFrame(z, index=zs).loc[mdata.obs.index.values].to_numpy()
582 data.obsm["X_mofa"] = z
584 # Weights
File ~/miniconda3/envs/scvi-env/lib/python3.9/site-packages/pandas/core/frame.py:694, in DataFrame.__init__(self, data, index, columns, dtype, copy)
684 mgr = dict_to_mgr(
685 # error: Item "ndarray" of "Union[ndarray, Series, Index]" has no
686 # attribute "name"
(...)
691 typ=manager,
692 )
693 else:
--> 694 mgr = ndarray_to_mgr(
695 data,
696 index,
697 columns,
698 dtype=dtype,
699 copy=copy,
700 typ=manager,
701 )
703 # For data is list-like, or Iterable (will consume into list)
704 elif is_list_like(data):
File ~/miniconda3/envs/scvi-env/lib/python3.9/site-packages/pandas/core/internals/construction.py:351, in ndarray_to_mgr(values, index, columns, dtype, copy, typ)
346 # _prep_ndarray ensures that values.ndim == 2 at this point
347 index, columns = _get_axes(
348 values.shape[0], values.shape[1], index=index, columns=columns
349 )
--> 351 _check_values_indices_shape_match(values, index, columns)
353 if typ == "array":
355 if issubclass(values.dtype.type, str):
File ~/miniconda3/envs/scvi-env/lib/python3.9/site-packages/pandas/core/internals/construction.py:418, in _check_values_indices_shape_match(values, index, columns)
414 if values.shape[1] != len(columns) or values.shape[0] != len(index):
415 # Could let this raise in Block constructor, but we get a more
416 # helpful exception message this way.
417 if values.shape[0] == 0:
--> 418 raise ValueError("Empty data passed with indices specified.")
420 passed = values.shape
421 implied = (len(index), len(columns))
ValueError: Empty data passed with indices specified.
HOWEVER,
mdata.obs.group = mdata.obs.group.astype(str)
can run !
So,it is a type error.
I may be missing something: Is there a reason why scipy
's issparse
function is not used e.g. here:
Thanks for the great work on this toolkit!
I came across a potential bug while testing mu.pp.neighbors
: when I construct a MuData
object, save it and re-load it with mu.read
, if the .uns
slot was empty it gets set as None
when reloaded. This causes mu.pp.neighbors
to throw an error.
Example:
I make a new MuData with nothing in mdata.uns
:
> mdata = mu.MuData({'rna': rna_adata, 'atac': atac_adata})
> mdata.uns
{}
Saving and re-loading:
> mdata.write("/path/to/mdata.h5mu")
> mdata = mu.read("/path/to/mdata.h5mu")
> mdata.uns
Running WNN (having saved KNN graphs for each modality):
> mu.pp.neighbors(mdata)
TypeError Traceback (most recent call last)
/tmp/ipykernel_669/3502185260.py in <module>
----> 1 mu.pp.neighbors(mdata)
~/my-conda-envs/sc2021-multiomics/lib/python3.9/site-packages/muon/_core/preproc.py in neighbors(mdata, n_neighbors, n_bandwidth_neighbors, n_multineighbors, neighbor_keys, metric, low_memory, key_added, weight_key, add_weights_to_modalities, eps, copy, random_state)
598 mdata.obsp[dists_key] = neighbordistances
599 mdata.obsp[conns_key] = connectivities
--> 600 mdata.uns[key_added] = neighbors_dict
601
602 mdata.update_obs()
TypeError: 'NoneType' object does not support item assignment
The issue seems to be solved if I reset the .uns
slot manually
> mdata.uns = {}
> mu.pp.neighbors(mdata)
System
-----
muon 0.1.0
anndata 0.7.6
scanpy 1.8.1
sinfo 0.3.4
-----
PIL 8.3.1
anndata2ri 1.0.6
backcall 0.2.0
beta_ufunc NA
binom_ufunc NA
cffi 1.14.6
cycler 0.10.0
cython_runtime NA
dateutil 2.8.2
decorator 5.0.9
dunamai 1.5.5
get_version 3.5
google NA
h5py 3.3.0
ipykernel 6.0.3
ipython_genutils 0.2.0
jedi 0.18.0
jinja2 3.0.1
joblib 1.0.1
kiwisolver 1.3.1
llvmlite 0.36.0
markupsafe 2.0.1
matplotlib 3.4.2
matplotlib_inline NA
mpl_toolkits NA
muon 0.1.0
natsort 7.1.1
nbinom_ufunc NA
numba 0.53.1
numexpr 2.7.3
numpy 1.21.1
packaging 21.0
pandas 1.3.1
parso 0.8.2
pexpect 4.8.0
pickleshare 0.7.5
pkg_resources NA
prompt_toolkit 3.0.19
ptyprocess 0.7.0
pycparser 2.20
pyexpat NA
pygments 2.9.0
pynndescent 0.5.4
pyparsing 2.4.7
pytz 2021.1
rpy2 3.4.2
scipy 1.7.1
seaborn 0.11.1
six 1.16.0
sklearn 0.24.2
statsmodels 0.12.2
storemagic NA
tables 3.6.1
threadpoolctl 2.2.0
tornado 6.1
tqdm 4.62.0
traitlets 5.0.5
tzlocal NA
umap 0.5.1
wcwidth 0.2.5
zmq 22.2.1
-----
IPython 7.26.0
jupyter_client 6.1.12
jupyter_core 4.7.1
-----
Python 3.9.6 | packaged by conda-forge | (default, Jul 11 2021, 03:39:48) [GCC 9.3.0]
Linux-4.15.0-112-generic-x86_64-with-glibc2.31
26 logical CPU cores, x86_64
-----
Session information updated at 2021-08-24 12:49
I got a warning and an error regarding the dependency of pysam when I tried to run the atac-seq tutorial. I tried installing pysam a couple of times but keep having issues. After a little search on google, I found that pysam cannot be installed on windows OS. Was wondering whether is it true or is there any solution for this.
Best,
Thank you,
Describe the bug
While going through the tutorial on the 10xMultiome data and plotting the PCA with values for cut counts in peaks corresponding to different genes all plots are colored the same.
Expected behaviour
Correct aggregation across specific peak sets.
System
Linux 5.13.12-arch1-1 #1 SMP PREEMPT Wed, 18 Aug 2021 20:49:03 +0000 x86_64 GNU/Linux
alabaster==0.7.12
anndata==0.7.6
argon2-cffi==20.1.0
async-generator==1.10
attrs==21.2.0
Babel==2.9.1
backcall==0.2.0
bleach==3.3.1
certifi==2021.5.30
cffi==1.14.6
charset-normalizer==2.0.4
click==8.0.1
cycler==0.10.0
debugpy==1.4.1
decorator==4.4.2
defusedxml==0.7.1
docutils==0.16
entrypoints==0.3
flit==3.3.0
flit_core==3.3.0
h5py==3.3.0
idna==3.2
imagesize==1.2.0
ipykernel==6.0.3
ipython==7.25.0
ipython-genutils==0.2.0
ipywidgets==7.6.3
jedi==0.18.0
Jinja2==3.0.1
joblib==1.0.1
jsonschema==3.2.0
jupyter==1.0.0
jupyter-client==6.1.12
jupyter-console==6.4.0
jupyter-core==4.7.1
jupyterlab-pygments==0.1.2
jupyterlab-widgets==1.0.0
jupyterthemes==0.20.0
kiwisolver==1.3.1
leidenalg==0.8.7
lesscpy==0.15.0
llvmlite==0.36.0
loompy==3.0.6
MarkupSafe==2.0.1
matplotlib==3.4.2
matplotlib-inline==0.1.2
mistune==0.8.4
muon @ file:///[my local path]/muon
natsort==7.1.1
nbclient==0.5.3
nbconvert==6.1.0
nbformat==5.1.3
nbsphinx==0.8.7
nest-asyncio==1.5.1
networkx==2.5.1
notebook==6.4.0
numba==0.53.1
numexpr==2.7.3
numpy==1.21.1
numpy-groupies==0.9.13
packaging==21.0
pandas==1.3.1
pandocfilters==1.4.3
parso==0.8.2
patsy==0.5.1
pexpect==4.8.0
pickleshare==0.7.5
Pillow==8.3.1
ply==3.11
prometheus-client==0.11.0
prompt-toolkit==3.0.19
protobuf==3.17.3
ptyprocess==0.7.0
pycparser==2.20
Pygments==2.9.0
pynndescent==0.5.4
pyparsing==2.4.7
pyrsistent==0.18.0
pysam==0.16.0.1
python-dateutil==2.8.2
python-igraph==0.9.6
pytz==2021.1
pyzmq==22.1.0
qtconsole==5.1.1
QtPy==1.9.0
readthedocs-sphinx-search==0.1.0
requests==2.26.0
scanpy==1.8.1
scikit-learn==0.24.2
scipy==1.7.0
seaborn==0.11.1
Send2Trash==1.7.1
sinfo==0.3.4
six==1.16.0
sklearn==0.0
snowballstemmer==2.1.0
Sphinx==4.1.2
sphinx-automodapi==0.13
sphinx-rtd-theme==0.5.2
sphinxcontrib-applehelp==1.0.2
sphinxcontrib-devhelp==1.0.2
sphinxcontrib-htmlhelp==2.0.0
sphinxcontrib-jsmath==1.0.1
sphinxcontrib-qthelp==1.0.3
sphinxcontrib-serializinghtml==1.1.5
statsmodels==0.12.2
stdlib-list==0.8.0
tables==3.6.1
terminado==0.10.1
testpath==0.5.0
texttable==1.6.4
threadpoolctl==2.2.0
toml==0.10.2
tornado==6.1
tqdm==4.61.2
traitlets==5.0.5
umap-learn==0.5.1
urllib3==1.26.6
wcwidth==0.2.5
webencodings==0.5.1
widgetsnbextension==3.5.1
xlrd==1.2.0
Additional context
I believe it's related to #21 or it's fix. In fact reverting the b2347af produces correct behavior.
I've checked manually and the subsetting used in b2347af doesn't work, i.e. returns full matrix (top of the picture) with AnnData 0.7.6.
Hi, probably a long shot but I need to re-analyse the Teaseq dataset that you use in your multimodal vignette but I can't find the .tbi index (data/teaseq/GSM5123951_X066-MP0C1W3_leukopak_perm-cells_tea_200M_atac_filtered_fragments.tsv.gz.tbi) for the atac fragment files on geo. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE158013
do you happen to have these files and could share them?
thank you!
Describe the bug
Hi there, I am not sure if this is a bug, but I cannot seem to run this tutorial locally.
See screenshots below:
Not loading the 'files' ATAC unstructured annotation
Correct importing of data, according to tutorial
Everything works fine up until the ATACseq analysis part.
I get a key error related to 'files'. Which is supposed to be an unstructured annotation in Anndata.
atac.obs['NS'] = 1
ac.pl.fragment_histogram(atac, region='chr1:1-2000000')
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
Input In [37], in <cell line: 2>()
1 atac.obs['NS'] = 1
----> 2 ac.pl.fragment_histogram(atac, region='chr1:1-2000000')
File ~/mambaforge/envs/bioinfo/lib/python3.10/site-packages/muon/_atac/plot.py:341, in fragment_histogram(data, region, groupby, barcodes)
338 else:
339 raise TypeError("Expected AnnData or MuData object with 'atac' modality")
--> 341 fragment_path = adata.uns["files"]["fragments"]
342 fragments = tools.fetch_regions_to_df(fragment_path=fragment_path, features=region)
344 fragments["length"] = fragments.End - fragments.Start
File ~/mambaforge/envs/bioinfo/lib/python3.10/site-packages/anndata/compat/_overloaded_dict.py:100, in OverloadedDict.__getitem__(self, key)
98 return self.overloaded[key].get()
99 else:
--> 100 return self.data[key]
KeyError: 'files'
I assume this is related to the fact that I have a different directory structure, or that somehow I have downloaded the wrong files from 10X Genomics.
System
Any help is appreciated.
Thanks,
Giuliano.
Having functional .obsp
/ .varp
attributes would for instance allow to store neighbourhood graphs generated based on multiple modalities. Will likely require implementing AnnData's PairwiseArrays
analogue.
Example:
Samples in AnnData are sample1_group1, sample2_group1, sample1_group2, sample2_group2, sample3_group2, sample3_group1.
Resulting mofa samples_groups and samples_names
modal.data_opts['samples_groups'] = ["group1", "group1", "group2", "group2", "group2", "group1"]
modal.data_opts['samples_names'] = [["sample1_group1", "sample2_group1", "sample3_group1"],
["sample1_group2", "sample2_group2", "sample3_group2"]]
As samples_groups are concatenated later in entry_point.py the correspondence of samples to groups is wrong.
Jupyter notebooks allow objects to have rich representationhttps://ipython.readthedocs.io/en/stable/config/integrating.html by implementing a spacial formatter, e.g. _repr_html_()
for HTML.
This is something that xarray
uses to display rich and interactive object representation. Having this feature will enhance user experience with MuData
objects when working in Jupyter notebooks and similar environments.
To switch between 'text'
and 'html'
representations, global options for muon
might need to be introduced, e.g. mu.set_options(display_style='html')
.
Describe the bug
Triger AttributeError "X not found" when use layer parameter in mu.tl.embedding.
To Reproduce
mu.pl.embedding( md, "X_umap", color=[*some gene*], cmap="Reds", layer="sct_corrected", )
then a attribute error was triggered
AttributeError Traceback (most recent call last)
~/softwares/anaconda3/lib/python3.8/site-packages/muon/_core/plot.py in embedding(data, basis, color, use_raw, layer, >**kwargs)
120 if layer is not None:
121 if layer in data.mod[m].layers:
--> 122 fmod_adata.X = data.mod[m][:, mod_keys].layers[layer].X
123 if use_raw:
124 warnings.warn(f"Layer='{layer}' superseded use_raw={use_raw}")~/softwares/anaconda3/lib/python3.8/site-packages/scipy/sparse/base.py in getattr(self, attr)
685 return self.getnnz()
686 else:
--> 687 raise AttributeError(attr + " not found")
688
689 def transpose(self, axes=None, copy=False):AttributeError: X not found
Additional context
This error appears to be caused by a typo. In fact a simple deletion of the .X
will fix this bug.https://github.com/PMBio/muon/blob/e77bbf3a2acb1c92dae4c13419cd7335602c2735/muon/_core/plot.py#L122
Thanks developers for providing such a great tool.
Dear Muon Team,
I used your program with great success (really great usability!), up until I wanted to integrate the RNA and ATAC data into a joint latent space.
My object looks as following:
muData object with n_obs × n_vars = 9817 × 304944
obs: 'sample'
var: 'feature_types', 'genome'
2 modalities
rna: 9817 x 16660
obs: 'sample', 'int_id', 'seq_id_gex_id', 'seq_id_atac', 'reporter', 'n_counts', 'log_cell_probs', 'cell_confirmed', 'log_counts', 'n_genes', [...]
var: 'gene_ids', 'feature_types', 'genome', 'interval', 'n_cells-0', 'ambient_genes_values-0', 'is_ambient-0', 'is_ambi_key-0',
[...]
uns: 'final_doublets_cat_colors', 'louvain', 'louvain_3_colors', 'louvain_colors', 'neighbors', 'reporter_colors', 'sample_colors', 'batch_colors', 'log1p', 'umap'
obsm: 'X_pca', 'X_umap'
layers: 'counts'
obsp: 'connectivities', 'distances'
atac: 9817 x 288284
obs: 'seurat_id', 'aggr_barcodes', 'sample', 'int_id', 'seq_id_gex_id', 'seq_id_atac', 'reporter', 'size_factors', 'louvain'
var: 'feature_types', 'modality', 'genome'
uns: 'files', 'log1p', 'neighbors', 'louvain', 'umap', 'sample_colors'
obsm: 'X_pca', 'X_umap'
layers: 'counts'
obsp: 'distances', 'connectivities'
When I then try to run
mu.tl.mofa(mdata2, groups_label = 'sample')
the result is always the same with:
ValueError:` shapes (9817,16660) and (9817,16660) not aligned: 16660 (dim 1) != 9817 (dim 0)
And I just cant make sense of it. Thank you very much for your help!
Hey,
I can't seem to find muon on any conda channel. Would suggest to add muon to conda-forge.
Cheers
Hi there,
First of all, every time I go back to working with muon it gets better and better, so thanks for the great work!
I was wondering whether you had thought of adding a muon.atac.pp
function to compute GC content of peaks. This is frequently used for peak-gene association testing (e.g. here), and as far as I am aware there is no easy functionality in python for this (I've always used addGCBias
from chromVAR
in R).
Expected to be able to read a modality in a backed mode but it currently fails:
mu.read('pbmc10k.h5mu/atac', backed=True)
# or
mu.read_h5ad('pbmc10k.h5mu', mod='atac', backed=True)
# => TypeError: __init__() got an unexpected keyword argument 'mode'
It seems like mu.read_h5ad()
passes {mode: hdf5_mode}
to the AnnData constructor, which can't handle it.
dsb
normalisation has been implemented in #9.
The reference implementation now features an updated interface (DSBNormalizeProtein
), which we should consider to adopt. This should improve user experience when applying dsb
normalisation.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.