erdogant / clusteval

Clusteval provides methods for unsupervised cluster validation

Home Page: https://erdogant.github.io/clusteval

License: Other

Python 0.87% Shell 0.01% Jupyter Notebook 99.12%

clustering unsupervised-clustering silhouette-method dbindex density-based-clustering validation machine-learning python

clusteval's Introduction

Hi there! I am sharing my knowledge with the world through my blogs and open-source GitHub projects.

Your ❤️ is important to keep maintaining my packages. It is awesome that there are already millions of downloads, but to keep the libraries alive I often need to make all kinds of fixes. This can eat up my entire weekends and evenings. Yes, I do this for free and in my free time! There are various ways you can help. You can report bugs or issues, or even better, help out by fixing bugs or adding new features! If you don't have the time, or you are still learning, you can also take a Medium Membership using my referral link to keep reading all my hands-on blogs and learn more :-) If you don't need that, there is always an easy way with Coffee :-) Cheers!

A structured list of my repos

All Repos can be found in the Repositories section. If Sphinx pages are available, the link will directly go to the documentation pages.

Statistics	Machine learning	(Time)Series	Visualization	Utils	API
bnlearn	clusteval	findpeaks	d3graph	df2onehot	googletrends
hnet	classeval	temporalrank	d3heatmap	pypickle	slacki
distfit	hgboost	caerus	treeplot	ismember
pca	clustimage		kaplanmeier	irelease
thompson	undouble		flameplot	pypiplot
benfordslaw			worldmap	dicter
			colourmap
			imagesc
			scatterd
			d3blocks

Find here my Pypi download stats

Overview of open issues


clusteval's Issues

Small question on recommended usage

This was an error

AttributeError: 'clusteval' object has no attribute 'results'... DUP of closed issue #5?

I've tried versions from 2.0.0 to beta, all resulting in the same issue as described in Issue 5.

I'm on the latest HDBSCAN version of 0.8.27, and the beta version of Clusteval...

My array is as follows:

umap_rs_embed[:10]

array([[-0.16568227,  2.3830128 ,  0.9952151 ],
       [-0.16470274,  0.91874045,  1.6843276 ],
       [-0.10057875,  1.0044663 ,  4.231984  ],
       [ 7.218489  ,  3.189865  ,  1.6015646 ],
       [ 1.7666751 ,  2.4235313 ,  1.2277056 ],
       [-0.02537769,  1.1624466 ,  4.175513  ],
       [ 1.4869809 , -0.8690608 ,  2.6568232 ],
       [-0.05031788, -0.30832335,  0.93605393],
       [ 1.2532264 ,  1.6826892 ,  0.4620979 ],
       [ 1.3145269 ,  1.3296161 ,  3.9630399 ]], dtype=float32)

And the result of fitting is:

# Import library
from clusteval import clusteval
import hdbscan
# Set the method
ce = clusteval(method='hdbscan')
# Evaluate
results = ce.fit(umap_rs_embed)

AttributeError                            Traceback (most recent call last)
<ipython-input-6-68904b3307b8> in <module>
      5 ce = clusteval(method='hdbscan')
      6 # Evaluate
----> 7 results = ce.fit(umap_rs_embed)

~\Desktop\scripting\apps\trajectory\lib\site-packages\clusteval\clusteval.py in fit(self, X)
    170 
    171         # Compute the dendrogram threshold
--> 172         if (self.cluster!='kmeans') and (self.results['labx'] is not None) and (len(np.unique(self.results['labx']))>1):
    173             # print(self.results['labx'])
    174             max_d, max_d_lower, max_d_upper = _compute_dendrogram_threshold(self.Z, self.results['labx'], verbose=self.verbose)

AttributeError: 'clusteval' object has no attribute 'results'

AttributeError: 'clusteval' object has no attribute 'results'

[clusteval] >Fit using agglomerative with metric: euclidean, and linkage: ward

AttributeError                            Traceback (most recent call last)
in
      6
      7 # Fit to find optimal number of clusters using dbscan
----> 8 results= ce.fit(X)
      9
     10 # Make plot of the cluster evaluation

~\anaconda3\lib\site-packages\clusteval\clusteval.py in fit(self, X)
    167
    168         # Compute the dendrogram threshold
--> 169         if (self.cluster!='kmeans') and (len(np.unique(self.results['labx']))>1):
    170             # print(self.results['labx'])
    171             max_d, max_d_lower, max_d_upper = _compute_dendrogram_threshold(self.Z, self.results['labx'], verbose=self.verbose)

AttributeError: 'clusteval' object has no attribute 'results'
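Both tracebacks fail on reading `self.results` inside `fit`, which means the attribute was never assigned on the code path that ran. One plausible reading (an assumption, not confirmed by the source) is that `results` is only set in some method branches. The sketch below reproduces that failure mode with a hypothetical `ToyEvaluator` class, which is not clusteval's code, and shows the usual remedy of initialising the attribute before branching.

```python
class ToyEvaluator:
    """Hypothetical stand-in used only to illustrate the failure mode."""

    def __init__(self, method):
        self.method = method

    def fit_buggy(self, X):
        if self.method == "kmeans":
            self.results = {"labx": [0] * len(X)}
        # Any other method falls through without assigning self.results,
        # so the next line raises AttributeError for e.g. method="hdbscan".
        return self.results

    def fit_safe(self, X):
        # Initialise first, so the attribute always exists after fit().
        self.results = None
        if self.method == "kmeans":
            self.results = {"labx": [0] * len(X)}
        if self.results is None:
            raise ValueError(f"method {self.method!r} produced no results")
        return self.results
```

With this pattern an unsupported or failed branch surfaces as an explicit error message instead of a missing attribute.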

How to save the plots?

The charts are generated fine, but how do I save them to a local folder?
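A minimal sketch of one way to do this, assuming the charts are drawn with matplotlib: grab the current figure right after plotting and write it to disk with `savefig`. The figure below is a stand-in with made-up values, and `save_current_figure` and the output file name are illustrative only. Whether `ce.plot()` returns a figure handle depends on the installed version, so the sketch relies on `plt.gcf()` instead.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import matplotlib.pyplot as plt


def save_current_figure(path, dpi=150):
    """Save whichever matplotlib figure is currently active."""
    fig = plt.gcf()
    fig.savefig(path, dpi=dpi, bbox_inches="tight")
    return path


# Stand-in for a plotting call such as ce.plot(); the values are illustrative only.
fig, ax = plt.subplots()
ax.plot([2, 3, 4, 5], [0.41, 0.78, 0.62, 0.55], marker="o")
ax.set_xlabel("number of clusters")
ax.set_ylabel("score")

out_path = os.path.join(tempfile.gettempdir(), "cluster_evaluation.png")
save_current_figure(out_path)
plt.close(fig)
```

In a notebook, the save call has to run in the same cell as the plotting call, before the figure is displayed and closed.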

TypeError: plot() got an unexpected keyword argument 'width'

[HDBSCAN] Estimated number of clusters: 10
[HDBSCAN] Silhouette Coefficient: 0.780
[clusteval] >Fin.
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-35-8b8378daba75> in <module>
      8 ce = clusteval(method='hdbscan')
      9 ce.fit(X)
---> 10 ce.plot()
     11 #ce.scatter(X)

/usr/local/lib/python3.7/site-packages/clusteval/clusteval.py in plot(self, figsize)
    151         elif self.method=='hdbscan':
    152             import clusteval.hdbscan as hdbscan
--> 153             hdbscan.plot(self.results, width=figsize[0], height=figsize[1])
    154 
    155     # Plot

TypeError: plot() got an unexpected keyword argument 'width'
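The traceback shows the wrapper in `clusteval.py` passing `width` and `height` to a `plot()` function that does not accept them, which is typical of two modules being out of sync; upgrading the package is the likely remedy, though that is an assumption. The sketch below uses hypothetical functions (`plot_old`, `plot_new`, `call_plot`) to show how a caller can check the callee's signature before choosing which keywords to pass.

```python
import inspect


def plot_old(results, width=15, height=8):
    """Hypothetical callee that takes width and height."""
    return {"figsize": (width, height), "n": len(results)}


def plot_new(results, figsize=(15, 8)):
    """Hypothetical callee that takes a single figsize tuple."""
    return {"figsize": tuple(figsize), "n": len(results)}


def call_plot(plot_func, results, figsize=(15, 8)):
    """Pass the figure size using whichever keywords plot_func accepts."""
    params = inspect.signature(plot_func).parameters
    if "figsize" in params:
        return plot_func(results, figsize=figsize)
    return plot_func(results, width=figsize[0], height=figsize[1])
```

Calling `call_plot(plot_new, data, figsize=(10, 5))` and `call_plot(plot_old, data, figsize=(10, 5))` both succeed, whereas calling `plot_new(data, width=10, height=5)` directly raises the same kind of `TypeError` as in the report.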

is pip not working anymore?

I get the following:
WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x0000024B20CAB400>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed')': /simple/clusteval/
ERROR: Could not find a version that satisfies the requirement clusteval (from versions: none)
ERROR: No matching distribution found for clusteval
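`[Errno 11001] getaddrinfo failed` is the Windows "host not found" error: the hostname lookup failed before pip could reach the package index, and `from versions: none` follows from that rather than from the package being missing. A quick way to confirm this from the same machine is to try the lookup directly. The helper name `can_resolve` is illustrative, and `pypi.org` is assumed to be the index host in use.

```python
import socket


def can_resolve(host, port=443):
    """Return True if the hostname can be resolved to an address."""
    try:
        socket.getaddrinfo(host, port)
        return True
    except socket.gaierror:
        # Same family of error that pip reports as "getaddrinfo failed".
        return False
```

If `can_resolve("pypi.org")` returns `False`, the problem lies with DNS, a proxy, or the network connection rather than with the package.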

"Attempt to get argmax of an empty sequence" error when Clustering dbscan

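Hi, I'm having this error when using clusteval with the cluster param "dbscan" on TF-IDF features. This is my code:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.preprocessing import Normalizer
from sklearn.pipeline import make_pipeline
from clusteval import clusteval

# `grams` is defined elsewhere in the reporter's script
vectorizer = TfidfVectorizer(max_df=0.55,min_df=27)
X = vectorizer.fit_transform(grams)

svd = TruncatedSVD(int(X.shape[1] - 1))
normalizer = Normalizer(copy=False)
lsa = make_pipeline(svd, normalizer)
X = lsa.fit_transform(X)

ce = clusteval(cluster='dbscan')
ce.fit(X)
ce.plot()
ce.scatter(X)
ce.dendrogram()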

This is the complete log error:

Traceback (most recent call last):
  File "/Volumes/HD 2/Repositorios/author_profiling_db_scan/main/init.py", line 82, in
    ce.fit(X.toarray())
  File "/Volumes/HD 2/Repositorios/author_profiling_db_scan/venv/lib/python3.7/site-packages/clusteval/clusteval.py", line 153, in fit
    self.results = dbscan.fit(X, eps=None, epsres=50, min_samples=0.01, metric=self.metric, norm=True, n_jobs=-1, min_clust=self.min_clust, max_clust=self.max_clust, verbose=self.verbose)
  File "/Volumes/HD 2/Repositorios/author_profiling_db_scan/venv/lib/python3.7/site-packages/clusteval/dbscan.py", line 97, in fit
    idx = np.argmax(silscores)
  File "<array_function internals>", line 6, in argmax
  File "/Volumes/HD 2/Repositorios/author_profiling_db_scan/venv/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 1188, in argmax
    return _wrapfunc(a, 'argmax', axis=axis, out=out)
  File "/Volumes/HD 2/Repositorios/author_profiling_db_scan/venv/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 58, in _wrapfunc
    return bound(*args, **kwds)
ValueError: attempt to get argmax of an empty sequence
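The traceback shows `np.argmax(silscores)` being called on an empty sequence. A plausible cause (an assumption, not confirmed by the source) is that none of the `eps` values tried produced a clustering that could be scored, for example because the cluster count never fell between `min_clust` and `max_clust`. The sketch below shows the NumPy behaviour and a simple guard; `best_index` is an illustrative helper, not part of clusteval.

```python
import numpy as np


def best_index(scores):
    """Return the index of the highest score, or None if there are no scores."""
    scores = np.asarray(scores, dtype=float)
    if scores.size == 0:
        # np.argmax raises ValueError on empty input, so handle it explicitly.
        return None
    return int(np.argmax(scores))
```

A `None` return lets the caller report "no valid clustering found" rather than failing inside NumPy.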

Method and metric

I sense potential in this package and I am inclined to use it in the future. Keep up the good work! Consider publishing the software in the SoftwareX journal.

I have a number of questions that I could not answer from the arguably short documentation:

It seems to me that I can either pick DBSCAN or say silhouette score, but not both at the same time. This seems odd to me because DBSCAN is a method whose results could be used with the silhouette score.
Related to that question: How are clusters evaluated if I pick DBSCAN or HDBSCAN, and how are clusters computed if I pick the silhouette score or the Davies-Bouldin index?
How could I choose different distance metrics to plug into e.g. DBSCAN or HDBSCAN?
How do I see which parameters got chosen?
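The questions above come down to combining a clustering method with a separate evaluation score. The dbscan traceback elsewhere on this page shows the package collecting silhouette scores (`silscores`) across runs, which suggests the two are combined internally. The sketch below illustrates the general idea with scikit-learn directly and is not clusteval's implementation: it scans `eps` values for DBSCAN with a chosen distance metric, scores each result with the silhouette, and returns the parameters that won. The function name `scan_dbscan`, the toy data, and the grid are assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def scan_dbscan(X, eps_grid, min_samples=5, metric="euclidean"):
    """Try each eps, score the non-noise points with the silhouette, keep the best."""
    best = None
    for eps in eps_grid:
        labels = DBSCAN(eps=eps, min_samples=min_samples, metric=metric).fit_predict(X)
        mask = labels != -1  # DBSCAN marks noise points with -1
        n_clusters = len(set(labels[mask]))
        # The silhouette needs at least 2 clusters and more points than clusters.
        if n_clusters < 2 or mask.sum() <= n_clusters:
            continue
        score = silhouette_score(X[mask], labels[mask], metric=metric)
        if best is None or score > best["score"]:
            best = {"eps": float(eps), "metric": metric, "n_clusters": n_clusters,
                    "score": float(score), "labels": labels}
    return best


# Toy data: three well-separated blobs.
X, _ = make_blobs(n_samples=150, centers=[[0, 0], [6, 6], [0, 6]],
                  cluster_std=0.5, random_state=0)
best = scan_dbscan(X, eps_grid=np.linspace(0.2, 2.0, 10), metric="manhattan")
print({k: v for k, v in best.items() if k != "labels"})
```

Returning the chosen `eps`, metric, and cluster count alongside the labels is one way to make the selected parameters visible to the caller.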

S_Dbw index

Hi Erdogan,

Would it be possible to integrate the S_Dbw index into the cluster evaluation (https://pypi.org/project/s-dbw/#description, https://github.com/alashkov83/S_Dbw)?

Best regards,
Nataliia
