
pyprotoclust's Introduction

Documentation Status MIT License

Pyprotoclust is an implementation of representative hierarchical clustering using minimax linkage.

The original algorithm is from Hierarchical Clustering With Prototypes via Minimax Linkage by Jacob Bien and Robert Tibshirani.

Pyprotoclust takes a distance matrix as input. It returns a linkage matrix encoding the hierarchical clustering, together with a list labelling the prototype associated with each cluster in the hierarchy. Because the linkage matrix follows the SciPy format, the result can be used directly with the existing tools in the SciPy hierarchical clustering module.
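To make the linkage concrete: in the Bien and Tibshirani paper, the minimax radius of a set of points is the smallest radius of a ball centered at one of its members that covers all of them, and that center is the set's prototype. The minimax linkage between two clusters is the minimax radius of their union. The following is a minimal pure-Python sketch of that definition only, not pyprotoclust's implementation; the helper name minimax_prototype is hypothetical.

```python
def minimax_prototype(dist, members):
    """Return (prototype, radius) for the points indexed by `members`.

    `dist` is a full symmetric distance matrix (list of lists or 2-D array).
    The prototype is the member whose farthest fellow member is closest.
    """
    best, best_radius = None, float("inf")
    for candidate in members:
        # Radius needed for a ball centered at `candidate` to cover the set
        radius = max(dist[candidate][other] for other in members)
        if radius < best_radius:
            best, best_radius = candidate, radius
    return best, best_radius


# Four points on a line at positions 0, 1, 2 and 10
positions = [0, 1, 2, 10]
dist = [[abs(a - b) for b in positions] for a in positions]

print(minimax_prototype(dist, [0, 1, 2]))     # (1, 1)
print(minimax_prototype(dist, [0, 1, 2, 3]))  # (2, 8)
```

In the second call the outlier at position 10 pulls the prototype to index 2 and raises the radius to 8, which is the height at which minimax linkage would merge the outlier with the other three points.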

Installation:

pip install pyprotoclust

Usage:

from pyprotoclust import protoclust
import numpy as np
import scipy as sp
import scipy.cluster.hierarchy
import scipy.spatial.distance

# Generate two-dimensional toy data
n = 60
np.random.seed(4)
params = [{'mean': [-7, 0], 'cov': [[1, 1], [1, 5]]},
          {'mean': [1, -1], 'cov': [[5, 0], [0, 1]]},
          {'mean': [3, 7], 'cov': [[1, 0], [0, 1]]}]
data = np.vstack([np.random.multivariate_normal(p['mean'], p['cov'], n) for p in params])
X = sp.spatial.distance.squareform(sp.spatial.distance.pdist(data))

# Produce a hierarchical clustering using minimax linkage
Z, prototypes = protoclust(X)

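# Generate clusters at a set cut_height using scipy's hierarchy module
cut_height = 7
# T[i] is the flat cluster id assigned to observation i
T = sp.cluster.hierarchy.fcluster(Z, cut_height, criterion='distance')
# L holds the linkage node id at the root of each flat cluster, M the matching cluster ids
L, M = sp.cluster.hierarchy.leaders(Z, T)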

# Get the prototypes associated with the generated clusters
P = data[[prototypes[l] for l in L]]

The previous example produces a linkage matrix Z and prototypes P that can be used to produce dendrograms and other plots of the data.

A dendrogram of the hierarchical clustering example.

A dendrogram of the hierarchical clustering example with a dashed line at the example cut height.

A scatter plot of the hierarchical clustering example.

A scatter plot of the example with circles centered at prototypes drawn with radii equal to the top-level linkage heights of each cluster.
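The radii in the last plot can be read off the linkage matrix. In SciPy's linkage format, a node id smaller than the number of observations n is a single observation, and node id i >= n corresponds to row i - n, whose third column is the merge height. The sketch below looks up that height for each cluster leader; the helper name leader_heights is hypothetical and the small linkage matrix is hand-written for illustration.

```python
def leader_heights(Z, leaders):
    """Return the top-level linkage height of each leader node.

    Z is a SciPy-style linkage matrix with rows
    [left child, right child, height, size]; `leaders` are node ids.
    """
    n = len(Z) + 1  # number of original observations
    heights = []
    for node in leaders:
        node = int(node)
        # A leader that is a single observation has no merge, so height 0
        heights.append(0.0 if node < n else float(Z[node - n][2]))
    return heights


# Hand-written linkage matrix for 4 observations
Z_toy = [[0, 1, 1.0, 2],
         [2, 3, 2.0, 2],
         [4, 5, 8.0, 4]]

print(leader_heights(Z_toy, [4, 5]))     # [1.0, 2.0]
print(leader_heights(Z_toy, [4, 2, 3]))  # [1.0, 0.0, 0.0]
```

With the variables from the usage example, leader_heights(Z, L) should give one radius per row of P, in the same order.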

Citing pyprotoclust

The pyprotoclust package is a contribution to work that has been published in Nature Scientific Data. The original algorithm was published in the Journal of the American Statistical Association. If you use pyprotoclust in your work, please cite the following references:

Goldschmidt, Andy, et al. "Quantifying yeast colony morphologies with feature engineering from time-lapse photography." Scientific Data 9.1 (2022): 1-9. https://doi.org/10.1038/s41597-022-01340-3

@article{goldschmidt2022quantifying,
  doi={https://doi.org/10.1038/s41597-022-01340-3},
  title={Quantifying yeast colony morphologies with feature engineering from time-lapse photography},
  author={Goldschmidt, Andy and Kunert-Graf, James and Scott, Adrian C and Tan, Zhihao and Dudley, Aim{\'e}e M and Kutz, J Nathan},
  journal={Scientific Data},
  volume={9},
  number={1},
  pages={1--9},
  year={2022},
  publisher={Nature Publishing Group}
}

Bien, Jacob, and Robert Tibshirani. "Hierarchical clustering with prototypes via minimax linkage." Journal of the American Statistical Association 106.495 (2011): 1075-1084. https://doi.org/10.1198/jasa.2011.tm10183

@article{bien2011hierarchical,
  doi={https://doi.org/10.1198/jasa.2011.tm10183},
  title={Hierarchical {Clustering} with {Prototypes} via {Minimax} {Linkage}},
  author={Bien, Jacob and Tibshirani, Robert},
  journal={Journal of the American Statistical Association},
  volume={106},
  number={495},
  pages={1075--1084},
  year={2011},
  publisher={Taylor \& Francis}
}


pyprotoclust's Issues

Package does not have Windows wheels

Hi! I am using Windows 10 and I am trying to install the package with:

pip install pyprotoclust

I have even tried creating different conda environments, but the same error always arises:

Building wheels for collected packages: pyprotoclust
  Building wheel for pyprotoclust (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Building wheel for pyprotoclust (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [25 lines of output]
      A setup.py file already exists. Using it.
      Traceback (most recent call last):
        File "C:\Users\gorgo\AppData\Local\Temp\pip-install-54lp9uet\pyprotoclust_362cf845644e40b698d9726acc64ba2a\setup.py", line 2, in <module>
          from setuptools import setup
      ModuleNotFoundError: No module named 'setuptools'
      Traceback (most recent call last):
        File "D:\Programas\anaconda3\lib\site-packages\pip\_vendor\pep517\in_process\_in_process.py", line 363, in <module>
          main()
        File "D:\Programas\anaconda3\lib\site-packages\pip\_vendor\pep517\in_process\_in_process.py", line 345, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "D:\Programas\anaconda3\lib\site-packages\pip\_vendor\pep517\in_process\_in_process.py", line 261, in build_wheel
          return _build_backend().build_wheel(wheel_directory, config_settings,
        File "C:\Users\gorgo\AppData\Local\Temp\pip-build-env-b672qip5\overlay\Lib\site-packages\poetry\core\masonry\api.py", line 68, in build_wheel
          return unicode(WheelBuilder.make_in(poetry, Path(wheel_directory)))
        File "C:\Users\gorgo\AppData\Local\Temp\pip-build-env-b672qip5\overlay\Lib\site-packages\poetry\core\masonry\builders\wheel.py", line 78, in make_in
          wb.build()
        File "C:\Users\gorgo\AppData\Local\Temp\pip-build-env-b672qip5\overlay\Lib\site-packages\poetry\core\masonry\builders\wheel.py", line 110, in build
          self._build(zip_file)
        File "C:\Users\gorgo\AppData\Local\Temp\pip-build-env-b672qip5\overlay\Lib\site-packages\poetry\core\masonry\builders\wheel.py", line 162, in _build
          self._run_build_command(setup)
        File "C:\Users\gorgo\AppData\Local\Temp\pip-build-env-b672qip5\overlay\Lib\site-packages\poetry\core\masonry\builders\wheel.py", line 190, in _run_build_command
          subprocess.check_call(
        File "D:\Programas\anaconda3\lib\subprocess.py", line 373, in check_call
          raise CalledProcessError(retcode, cmd)
      subprocess.CalledProcessError: Command '['D:/Programas/anaconda3/python.exe', 'C:\\Users\\gorgo\\AppData\\Local\\Temp\\pip-install-54lp9uet\\pyprotoclust_362cf845644e40b698d9726acc64ba2a\\setup.py', 'build', '-b', 'C:\\Users\\gorgo\\AppData\\Local\\Temp\\pip-install-54lp9uet\\pyprotoclust_362cf845644e40b698d9726acc64ba2a\\build']' returned non-zero exit status 1.
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for pyprotoclust
Failed to build pyprotoclust
ERROR: Could not build wheels for pyprotoclust, which is required to install pyproject.toml-based projects

I have checked my installation and setuptools is correctly installed. I have also tried with python -m pip install --upgrade pip setuptools wheel, and with python -m pip install --upgrade setuptools but they do not solve the problem either.

Can't install the package

Thanks for sharing this package. I can't install it, either with pip from PyPI or from the source directory. Could you help me with that?

  • pip install .

ERROR: Command errored out with exit status 1: /usr/bin/python3 /usr/local/lib/python3.6/dist-packages/pip/_vendor/pep517/_in_process.py prepare_metadata_for_build_wheel /tmp/tmpa8ydsv1m Check the logs for full command output.

  • with pip install pyprotoclust

ERROR: No matching distribution found for pyprotoclust

Installation problems [osx wheels requested]

Hello! Thanks for building and open-sourcing this :)

I'm trying to install this package with pip install pyprotoclust on my Apple Silicon machine using Python 3.9 within a conda environment, and I'm seeing the following output (even though setuptools is in fact installed). If you have any ideas or suggestions about how I could get this installed, I'd appreciate it :)

Collecting pyprotoclust
  Using cached pyprotoclust-0.1.0.tar.gz (112 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: tqdm<5.0.0,>=4.46.0 in ./opt/miniconda3/envs/env/lib/python3.9/site-packages (from pyprotoclust) (4.63.0)
Building wheels for collected packages: pyprotoclust
  Building wheel for pyprotoclust (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Building wheel for pyprotoclust (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [25 lines of output]
      A setup.py file already exists. Using it.
      Traceback (most recent call last):
        File "/private/var/folders/20/zn579xsd7m3fm3yxbg9kkj640000gp/T/pip-install-hn3vumpf/pyprotoclust_5f2e121823764c21b428d4fbfc9ddfc7/setup.py", line 2, in <module>
          from setuptools import setup
      ModuleNotFoundError: No module named 'setuptools'
      Traceback (most recent call last):
        File "/Users/nicolas/opt/miniconda3/envs/env/lib/python3.9/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 363, in <module>
          main()
        File "/Users/nicolas/opt/miniconda3/envs/env/lib/python3.9/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 345, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "/Users/nicolas/opt/miniconda3/envs/env/lib/python3.9/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 261, in build_wheel
          return _build_backend().build_wheel(wheel_directory, config_settings,
        File "/private/var/folders/20/zn579xsd7m3fm3yxbg9kkj640000gp/T/pip-build-env-4xd7s4i7/overlay/lib/python3.9/site-packages/poetry/core/masonry/api.py", line 68, in build_wheel
          return unicode(WheelBuilder.make_in(poetry, Path(wheel_directory)))
        File "/private/var/folders/20/zn579xsd7m3fm3yxbg9kkj640000gp/T/pip-build-env-4xd7s4i7/overlay/lib/python3.9/site-packages/poetry/core/masonry/builders/wheel.py", line 78, in make_in
          wb.build()
        File "/private/var/folders/20/zn579xsd7m3fm3yxbg9kkj640000gp/T/pip-build-env-4xd7s4i7/overlay/lib/python3.9/site-packages/poetry/core/masonry/builders/wheel.py", line 110, in build
          self._build(zip_file)
        File "/private/var/folders/20/zn579xsd7m3fm3yxbg9kkj640000gp/T/pip-build-env-4xd7s4i7/overlay/lib/python3.9/site-packages/poetry/core/masonry/builders/wheel.py", line 162, in _build
          self._run_build_command(setup)
        File "/private/var/folders/20/zn579xsd7m3fm3yxbg9kkj640000gp/T/pip-build-env-4xd7s4i7/overlay/lib/python3.9/site-packages/poetry/core/masonry/builders/wheel.py", line 190, in _run_build_command
          subprocess.check_call(
        File "/Users/nicolas/opt/miniconda3/envs/env/lib/python3.9/subprocess.py", line 373, in check_call
          raise CalledProcessError(retcode, cmd)
      subprocess.CalledProcessError: Command '['/Users/nicolas/opt/miniconda3/envs/env/bin/python', '/private/var/folders/20/zn579xsd7m3fm3yxbg9kkj640000gp/T/pip-install-hn3vumpf/pyprotoclust_5f2e121823764c21b428d4fbfc9ddfc7/setup.py', 'build', '-b', '/private/var/folders/20/zn579xsd7m3fm3yxbg9kkj640000gp/T/pip-install-hn3vumpf/pyprotoclust_5f2e121823764c21b428d4fbfc9ddfc7/build']' returned non-zero exit status 1.
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for pyprotoclust
Failed to build pyprotoclust
ERROR: Could not build wheels for pyprotoclust, which is required to install pyproject.toml-based projects

Installation Issues

I am using anaconda with a specific environment for this purpose on Windows 11, and Python 3.10.14.

I installed the package with pip install pyprotoclust and everything seemed to work correctly. However, when I run the code from pyprotoclust import protoclust I constantly get the following error:

Traceback (most recent call last):

  Cell In[1], line 1
    from pyprotoclust import protoclust

  File ~\AppData\Local\anaconda3\envs\clustering\lib\site-packages\pyprotoclust\__init__.py:1
    from .protoclust import protoclust

  File ~\AppData\Local\anaconda3\envs\clustering\lib\site-packages\pyprotoclust\protoclust.py:1
    from pyprotoclust.c_protoclust import CyProtoclust

ModuleNotFoundError: No module named 'pyprotoclust.c_protoclust'

After this, I tried with conda install anaconda::cython. In addition, I added the following instructions in pyprotoclust.__init__:

import pyximport
pyximport.install()

As a result, I got an error related to the version of Microsoft Visual C++ (it requires 14.0 or greater). I updated the C++ version, and the error changed to the following one:

ImportError: Building module pyprotoclust.c_protoclust failed: ["distutils.errors.CompileError: command 'C:\\\\Program Files (x86)\\\\Microsoft Visual Studio\\\\2022\\\\BuildTools\\\\VC\\\\Tools\\\\MSVC\\\\14.40.33807\\\\bin\\\\HostX86\\\\x64\\\\cl.exe' failed with exit code 2\n"]

For the time being, I have not been able to solve this issue.
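One way to check whether an installed copy actually contains the compiled module named in the traceback is to look for the extension file without importing the package. This is a diagnostic sketch only: the helper name find_compiled_extension is hypothetical, and it assumes the extension would ship as a file named c_protoclust plus a platform-specific suffix inside the package directory.

```python
import importlib.machinery
import importlib.util
from pathlib import Path


def find_compiled_extension(package, module):
    """Return the path of a compiled extension inside `package`, or None.

    Looking up a top-level package with find_spec does not import it, so
    this works even when importing the package itself fails.
    """
    spec = importlib.util.find_spec(package)
    if spec is None or spec.submodule_search_locations is None:
        return None  # package not installed, or not a package
    for directory in spec.submodule_search_locations:
        # EXTENSION_SUFFIXES lists the suffixes this interpreter can load,
        # such as '.pyd' on Windows or '.so' variants elsewhere
        for suffix in importlib.machinery.EXTENSION_SUFFIXES:
            candidate = Path(directory) / (module + suffix)
            if candidate.exists():
                return candidate
    return None


# Prints a path if the compiled module is present, otherwise None
print(find_compiled_extension("pyprotoclust", "c_protoclust"))
```

A result of None for an installed package would suggest that the installed files do not include a binary built for the current interpreter, which would be consistent with the ModuleNotFoundError above.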
