mosaic's Introduction

Molecular Systems Automated Identification of Cooperativity

MoSAIC is an unsupervised method for correlation analysis that automatically detects collective motion in MD simulation data while simultaneously identifying uncorrelated coordinates as noise. Hence, it can be used as a feature selection scheme for Markov state modeling or simply to obtain a detailed picture of the key coordinates driving a biomolecular process. It is based on the Leiden community detection algorithm, which is used to bring a correlation matrix into block-diagonal form.

The method was published in:

Correlation-Based Feature Selection to Identify Functional Dynamics in Proteins
G. Diez, D. Nagel, and G. Stock,
J. Chem. Theory Comput. 2022 18 (8), 5079-5088,
doi: 10.1021/acs.jctc.2c00337

If you use this software package, please cite the above-mentioned paper.

Features

  • Intuitive usage via module and via CLI
  • Sklearn-style API for fast integration into your Python workflow
  • No magic, only a single parameter which can be optimized via cross-validation
  • Extensive documentation and detailed discussion in publication
  • Step-by-step tutorial to follow

Installation

The package is called mosaic-clustering and is available via PyPI or conda. To install it, simply call:

python3 -m pip install --upgrade mosaic-clustering

or

conda install -c conda-forge mosaic-clustering

or for the latest dev version

# via ssh key
python3 -m pip install git+ssh://git@github.com/moldyn/MoSAIC.git

# or via password-based login
python3 -m pip install git+https://github.com/moldyn/MoSAIC.git

To use the deprecated UMAPSimilarity or the module mosaic umap, install the package with the extra umap:

python3 -m pip install --upgrade mosaic-clustering[umap]

Shell Completion

Click provides shell completion for bash, zsh, and fish; check out the docs for details. For bash, add the following line to your ~/.bashrc:

eval "$(_MOSAIC_COMPLETE=bash_source mosaic)"

Usage

In general, one can call the module either via its entry point $ MoSAIC or via $ python -m mosaic. The latter is preferred because it ensures that the desired Python environment is used. Shell completion, however, works only with the entry point.

CLI - Usage Directly from the Command Line

The module brings a rich CLI using click. Each module and submodule contains a detailed help, which can be accessed by

$ python -m mosaic
Usage: python -m mosaic [OPTIONS] COMMAND [ARGS]...

  MoSAIC motion v0.4.1

  Molecular systems automated identification of collective motion, is
  a correlation based feature selection framework for MD data.
  Copyright (c) 2021-2023, Georg Diez and Daniel Nagel

Options:
  --help  Show this message and exit.

Commands:
  clustering  Clustering similarity matrix of coordinates.
  similarity  Creating similarity matrix of coordinates.
  tui         Open Textual TUI for interactive usage.

For more details on a submodule, specify one of the commands clustering or similarity, or open the terminal user interface (tui).

A simple workflow example for clustering the input file input_file using correlation and Leiden with CPM and the default resolution parameter:

# creating correlation matrix
$ python -m mosaic similarity -i input_file -o output_similarity --metric correlation -v

MoSAIC SIMILARITY
~~~ Initialize similarity class
~~~ Load file input_file
~~~ Fit input
~~~ Store similarity matrix in output_similarity

# clustering with CPM and default resolution parameter
# the latter needs to be fine-tuned to each matrix
$ python -m mosaic clustering -i output_similarity -o output_clustering --plot -v

MoSAIC CLUSTERING
~~~ Initialize clustering class
~~~ Load file output_similarity
~~~ Fit input
~~~ Store output
~~~ Plot matrix

This will generate the similarity matrix stored in output_similarity, the plotted result in output_clustering.matrix.pdf, the raw data of the matrix in output_clustering.matrix and a file containing in each row the indices of a cluster.

Module - Inside a Python Script

import mosaic

# Load file
# X is np.ndarray of shape (n_samples, n_features)

sim = mosaic.Similarity(
    metric='correlation',  # or 'NMI', 'GY', 'JSD'
)
sim.fit(X)


# Cluster matrix
clust = mosaic.Clustering(
    mode='CPM',  # or 'modularity'
)
clust.fit(sim.matrix_)

clusters = clust.clusters_
clustered_X = clust.matrix_
...
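
To illustrate what a block-diagonal form means, the following sketch reorders a small hand-written correlation matrix by a given list of clusters, using plain NumPy. It does not run the Leiden algorithm and is not MoSAIC's implementation; the matrix `corr`, the list `clusters`, and the helper `reorder_by_clusters` are made up for illustration.

```python
import numpy as np


def reorder_by_clusters(matrix, clusters):
    """Permute rows and columns so that members of a cluster are adjacent.

    Hypothetical helper for illustration, not part of MoSAIC.
    """
    order = np.concatenate([np.asarray(c, dtype=int) for c in clusters])
    return matrix[np.ix_(order, order)], order


# Made-up 4x4 correlation matrix: features 0 and 2 move together,
# and so do features 1 and 3.
corr = np.array([
    [1.0, 0.1, 0.9, 0.0],
    [0.1, 1.0, 0.2, 0.8],
    [0.9, 0.2, 1.0, 0.1],
    [0.0, 0.8, 0.1, 1.0],
])
clusters = [[0, 2], [1, 3]]  # assumed result of some clustering step

blocked, order = reorder_by_clusters(corr, clusters)
print(order)    # new feature order
print(blocked)  # high correlations now sit in blocks on the diagonal
```

After the permutation, the two strongly correlated pairs form 2x2 blocks along the diagonal, while the weak cross-correlations end up in the off-diagonal blocks.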

mosaic's People

Contributors

braniii, gegabo, moldyn-nagel

mosaic's Issues

similarity fit overflow

Hi everyone,

First of all congratulations on all your work on this exciting tool.

I have an input numpy array of shape (5852, 1297) and this is in the format (n_samples, n_features).
If I do:

d_array = np.load("data")
sim = mosaic.Similarity(
    metric='correlation')
sim.fit(d_array)

I get:
ValueError: Correlation matrix is not symmetric. This should not occur and is probably caused by an overflow error.

The procedure only works if I reduce the input matrix drastically, to the point where I keep only 200 features.
Funnily enough, I transposed the matrix by mistake, which gave a d_array of shape (1297, 5852), and in that case it worked perfectly, although it was of course conceptually wrong.

I tried to work around the problem by computing the correlation matrix with standard numpy:
R1 = np.corrcoef(d_array.T)
With this, I get the correlation matrix, but if I then feed this to the clustering function I get an error:
AssertionError: False not tri-state boolean.

As suggested by Georg Diez, I checked the format of my input data and it is np.float32.

Could you help me with this problem?

Thank you,
Elena
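
Since the input is np.float32 and the later version of the error message mentions "too low dtype precision", one check that may be worth trying is to upcast to float64 before computing the correlation and then inspect the result. The sketch below uses only NumPy and random stand-in data instead of the real array; the helper `correlation_report` is hypothetical and not part of MoSAIC.

```python
import numpy as np


def correlation_report(X):
    """Compute the Pearson correlation in float64 and run basic sanity checks.

    Hypothetical helper for illustration, not part of MoSAIC.
    """
    X64 = np.asarray(X, dtype=np.float64)  # upcast to reduce round-off error
    corr = np.corrcoef(X64, rowvar=False)  # columns are the features
    report = {
        'shape': corr.shape,
        'has_nan': bool(np.isnan(corr).any()),
        'symmetric': bool(np.allclose(corr, corr.T)),
    }
    return report, corr


# Stand-in data in the (n_samples, n_features) layout described above.
rng = np.random.default_rng(0)
d_array = rng.normal(size=(500, 20)).astype(np.float32)

report, corr = correlation_report(d_array)
print(report)
```

If the report shows NaN entries or an asymmetric matrix for the real data, the problem lies in the input rather than in the clustering step.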

"ValueError: Correlation matrix is not symmetric. This should not occur and is probably caused by an overflow error or too low dtype precision."

I have installed MoSAIC into a python 3.8 environment, and have run the following command:

python -m mosaic similarity -i test.dat -o output_similarity --metric correlation -v

My output looks like this:

MoSAIC SIMILARITY
~~~ Initialize similarity class
~~~ Load file test.dat
~~~ Fit input.
/home/austin/miniconda3/envs/mosaic/lib/python3.8/site-packages/numpy/lib/function_base.py:2854: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/home/austin/miniconda3/envs/mosaic/lib/python3.8/site-packages/numpy/lib/function_base.py:2855: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
Traceback (most recent call last):
  File "/home/austin/miniconda3/envs/mosaic/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/austin/miniconda3/envs/mosaic/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/austin/miniconda3/envs/mosaic/lib/python3.8/site-packages/mosaic/__main__.py", line 363, in <module>
    main()
  File "/home/austin/miniconda3/envs/mosaic/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/austin/miniconda3/envs/mosaic/lib/python3.8/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/austin/miniconda3/envs/mosaic/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/austin/miniconda3/envs/mosaic/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/austin/miniconda3/envs/mosaic/lib/python3.8/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/austin/miniconda3/envs/mosaic/lib/python3.8/site-packages/mosaic/__main__.py", line 153, in similarity
    sim.fit(X)
  File "/home/austin/miniconda3/envs/mosaic/lib/python3.8/functools.py", line 912, in _method
    return method.__get__(obj, cls)(*args, **kwargs)
  File "/home/austin/miniconda3/envs/mosaic/lib/python3.8/site-packages/mosaic/similarity.py", line 208, in _
    corr = _correlation(X)
  File "<@beartype(mosaic._correlation_utils._correlation) at 0x7fdb541b9d30>", line 50, in _correlation
  File "/home/austin/miniconda3/envs/mosaic/lib/python3.8/site-packages/mosaic/_correlation_utils.py", line 109, in _correlation
    raise ValueError(
ValueError: Correlation matrix is not symmetric. This should not occur and is probably caused by an overflow error or too low dtype precision.

I have tried changing the shape of my input test.dat, but the result is always the same. I convert my npy file to dat format using a command like this: np.savetxt('test.dat', npy_array, fmt='%.4f')

I am not sure how I can circumvent this issue. I would like to use MoSAIC to reduce my feature set from 2070 to something more reasonable. There is no tutorial file available for determining similarity; the only example file that I see is MoSAIC/example/toy_matrix_paper, which is meant for testing the MoSAIC clustering function rather than the MoSAIC similarity function.

Let me know if there is anything else I can do or provide input files.

Thanks,
Austin
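
The RuntimeWarning in the log (invalid value encountered in divide at c /= stddev[:, None]) is what NumPy emits for a 0/0 division, which occurs inside np.corrcoef when a feature has zero variance. The resulting NaN entries never compare equal, so a symmetry check would fail. Whether this is the cause here is an assumption; writing the file with fmt='%.4f' could turn a nearly constant column into an exactly constant one. The sketch below finds and drops such columns on stand-in data; the helper `constant_columns` is hypothetical and not part of MoSAIC.

```python
import numpy as np


def constant_columns(X):
    """Return the indices of features (columns) whose values never change.

    Hypothetical helper for illustration, not part of MoSAIC.
    """
    X = np.asarray(X, dtype=np.float64)
    return np.flatnonzero(X.max(axis=0) == X.min(axis=0))


# Stand-in for the loaded test.dat: column 1 varies only on the order
# of 1e-6, so it becomes constant once rounded to four decimals.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
X[:, 1] = 0.5 + 1e-6 * rng.normal(size=100)
X_rounded = np.round(X, 4)  # mimics the precision kept by fmt='%.4f'

bad = constant_columns(X_rounded)
print(bad)

# Without the constant columns the correlation matrix contains no NaN.
X_clean = np.delete(X_rounded, bad, axis=1)
corr = np.corrcoef(X_clean, rowvar=False)
print(np.isnan(corr).any())
```

Constant features carry no correlation information, so removing them before the similarity step does not discard anything the clustering could use.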

[bug] missing dependency `decorit`

Thx @dieJaegerIn for reporting this bug.

Traceback (most recent call last):
  File "/home/user/anaconda3/bin/mosaic", line 5, in <module>
    from mosaic.__main__ import main
  File "/home/user/anaconda3/lib/python3.9/site-packages/mosaic/__init__.py", line 13, in <module>
    from .umap_similarity import UMAPSimilarity
  File "/home/user/anaconda3/lib/python3.9/site-packages/mosaic/umap_similarity.py", line 16, in <module>
    from decorit import deprecated
ModuleNotFoundError: No module named 'decorit' 

When using prettypyplot < 0.8.0, decorit is missing. Either add decorit to the dependencies or require prettypyplot>=0.8.0.
