
clusteval's Introduction

Hi there! I am sharing my knowledge with the world through my blogs and open-source GitHub projects.

Your ❤️ is important to keep maintaining my packages. It is awesome that there are already millions of downloads, but to keep the libraries alive I often need to make all kinds of fixes. This can eat up my entire weekends and evenings. Yes, I do this for free and in my free time! There are various ways you can help. You can report bugs and issues or, even better, help fix bugs or add new features! If you don't have the time, or you are still learning, you can also take a Medium Membership using my referral link to keep reading all my hands-on blogs and learn more :-) If you don't need that, there is always an easy way with Coffee :-) Cheers!

Buy Me a Coffee at ko-fi.com

A structured list of my repos

All Repos can be found in the Repositories section. If Sphinx pages are available, the link will directly go to the documentation pages.

Statistics: bnlearn, hnet, distfit, pca, thompson, benfordslaw
Machine learning: clusteval, classeval, hgboost, clustimage, undouble
(Time)Series: findpeaks, temporalrank, caerus
Visualization: d3graph, d3heatmap, treeplot, kaplanmeier, flameplot, worldmap, colourmap, imagesc, scatterd, d3blocks
Utils: df2onehot, pypickle, ismember, irelease, pypiplot, dicter
API: googletrends, slacki

Find here my Pypi download stats

Overview of open issues

d3blocks bnlearn hnet distfit pca benfordslaw clustimage undouble clusteval classeval hgboost findpeaks d3graph d3heatmap treeplot kaplanmeier ismember dicter

clusteval's People

Contributors

erdogant, matthew-j-payne


clusteval's Issues

AttributeError: 'clusteval' object has no attribute 'results'... DUP of closed issue #5?

I've tried versions from 2.0.0 up to the beta, and all of them result in the same error as described in issue #5.

I'm on the latest HDBSCAN version, 0.8.27, and the beta version of clusteval.

My array is as follows:

umap_rs_embed[:10]

array([[-0.16568227,  2.3830128 ,  0.9952151 ],
       [-0.16470274,  0.91874045,  1.6843276 ],
       [-0.10057875,  1.0044663 ,  4.231984  ],
       [ 7.218489  ,  3.189865  ,  1.6015646 ],
       [ 1.7666751 ,  2.4235313 ,  1.2277056 ],
       [-0.02537769,  1.1624466 ,  4.175513  ],
       [ 1.4869809 , -0.8690608 ,  2.6568232 ],
       [-0.05031788, -0.30832335,  0.93605393],
       [ 1.2532264 ,  1.6826892 ,  0.4620979 ],
       [ 1.3145269 ,  1.3296161 ,  3.9630399 ]], dtype=float32)

And the result of fitting is:

# Import library
from clusteval import clusteval
import hdbscan
# Set the method
ce = clusteval(method='hdbscan')
# Evaluate
results = ce.fit(umap_rs_embed)
AttributeError                            Traceback (most recent call last)
<ipython-input-6-68904b3307b8> in <module>
      5 ce = clusteval(method='hdbscan')
      6 # Evaluate
----> 7 results = ce.fit(umap_rs_embed)

~\Desktop\scripting\apps\trajectory\lib\site-packages\clusteval\clusteval.py in fit(self, X)
    170 
    171         # Compute the dendrogram threshold
--> 172         if (self.cluster!='kmeans') and (self.results['labx'] is not None) and (len(np.unique(self.results['labx']))>1):
    173             # print(self.results['labx'])
    174             max_d, max_d_lower, max_d_upper = _compute_dendrogram_threshold(self.Z, self.results['labx'], verbose=self.verbose)

AttributeError: 'clusteval' object has no attribute 'results'
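
One plausible way this kind of error arises (an assumption, not confirmed against the clusteval source) is that `fit` only assigns `self.results` inside branches for the option values it recognizes, so an unrecognized value falls through and the later read of `self.results` fails. The sketch below reproduces that pattern with a hypothetical `ToyEvaluator` class and a hypothetical `safe_fit` helper; neither is part of clusteval.

```python
class ToyEvaluator:
    """Hypothetical stand-in used only to illustrate the error pattern."""

    def __init__(self, method="silhouette"):
        self.method = method

    def fit(self, X):
        # `results` is only assigned for the methods listed here.
        if self.method == "silhouette":
            self.results = {"labx": [0] * len(X)}
        elif self.method == "dbindex":
            self.results = {"labx": [1] * len(X)}
        # Any other value falls through without setting `self.results`,
        # so the attribute access below raises AttributeError.
        return self.results["labx"]


def safe_fit(evaluator, X):
    """Return the labels, or None when fit() never populated `results`."""
    try:
        return evaluator.fit(X)
    except AttributeError:
        return None


labels_ok = safe_fit(ToyEvaluator("silhouette"), [1.0, 2.0])
labels_missing = safe_fit(ToyEvaluator("hdbscan"), [1.0, 2.0, 3.0])
```

If this is what happens here, checking which keyword the installed version expects for the clustering algorithm (another report on this page passes `cluster='dbscan'` rather than `method=...`) would be a reasonable first step.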

AttributeError: 'clusteval' object has no attribute 'results'

[clusteval] >Fit using agglomerative with metric: euclidean, and linkage: ward

AttributeError                            Traceback (most recent call last)
in
      6
      7 # Fit to find optimal number of clusters using dbscan
----> 8 results= ce.fit(X)
      9
     10 # Make plot of the cluster evaluation

~\anaconda3\lib\site-packages\clusteval\clusteval.py in fit(self, X)
    167
    168         # Compute the dendrogram threshold
--> 169         if (self.cluster!='kmeans') and (len(np.unique(self.results['labx']))>1):
    170             # print(self.results['labx'])
    171             max_d, max_d_lower, max_d_upper = _compute_dendrogram_threshold(self.Z, self.results['labx'], verbose=self.verbose)

AttributeError: 'clusteval' object has no attribute 'results'

TypeError: plot() got an unexpected keyword argument 'width'

[HDBSCAN] Estimated number of clusters: 10
[HDBSCAN] Silhouette Coefficient: 0.780
[clusteval] >Fin.
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-35-8b8378daba75> in <module>
      8 ce = clusteval(method='hdbscan')
      9 ce.fit(X)
---> 10 ce.plot()
     11 #ce.scatter(X)

/usr/local/lib/python3.7/site-packages/clusteval/clusteval.py in plot(self, figsize)
    151         elif self.method=='hdbscan':
    152             import clusteval.hdbscan as hdbscan
--> 153             hdbscan.plot(self.results, width=figsize[0], height=figsize[1])
    154 
    155     # Plot

TypeError: plot() got an unexpected keyword argument 'width'
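
The traceback shows the caller passing `width` and `height` to a `plot()` function that does not accept them, which suggests the two modules are out of sync. As a minimal sketch of how such a mismatch can be detected before calling, the snippet below inspects a function's signature. `accepts_keyword` and `toy_plot` are hypothetical names for illustration, and `toy_plot` is not clusteval's plotting function.

```python
import inspect


def accepts_keyword(func, name):
    """Return True if `func` can be called with `name` as a keyword argument."""
    params = inspect.signature(func).parameters
    if name in params:
        return params[name].kind is not inspect.Parameter.POSITIONAL_ONLY
    # A **kwargs parameter accepts any keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Hypothetical plotting helper that takes `figsize`, not `width`/`height`.
def toy_plot(results, figsize=(15, 8)):
    return figsize


figsize = (10, 6)
if accepts_keyword(toy_plot, "width"):
    out = toy_plot({}, width=figsize[0], height=figsize[1])
else:
    # Fall back to the keyword the function actually declares.
    out = toy_plot({}, figsize=figsize)
```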

is pip not working anymore?

I get the following:
WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x0000024B20CAB400>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed')': /simple/clusteval/
ERROR: Could not find a version that satisfies the requirement clusteval (from versions: none)
ERROR: No matching distribution found for clusteval
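
The `[Errno 11001] getaddrinfo failed` part of the log is a hostname lookup failure on Windows, so pip never reached the package index and "from versions: none" follows from that rather than from a missing release. A minimal sketch for checking name resolution from the same machine, using only the standard library (`can_resolve` is a hypothetical helper name):

```python
import socket


def can_resolve(hostname, port=443):
    """Return True if the hostname resolves to at least one address."""
    try:
        socket.getaddrinfo(hostname, port)
        return True
    except socket.gaierror:
        # Same failure class that pip reports as "getaddrinfo failed".
        return False
```

Calling `can_resolve("pypi.org")` from the affected environment and getting `False` would point to a DNS, proxy, or firewall issue rather than a packaging problem.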

"Attempt to get argmax of an empty sequence" error when Clustering dbscan

Hi, I'm getting this error when using clusteval with the cluster parameter set to "dbscan" on TF-IDF features. This is my code:

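from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer
from clusteval import clusteval

# `grams` is defined elsewhere in the reporter's script (not shown)
vectorizer = TfidfVectorizer(max_df=0.55, min_df=27)
X = vectorizer.fit_transform(grams)

# Reduce dimensionality with LSA, then L2-normalize the rows
svd = TruncatedSVD(int(X.shape[1] - 1))
normalizer = Normalizer(copy=False)
lsa = make_pipeline(svd, normalizer)
X = lsa.fit_transform(X)

ce = clusteval(cluster='dbscan')
ce.fit(X)
ce.plot()
ce.scatter(X)
ce.dendrogram()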

This is the complete error log:

Traceback (most recent call last):
  File "/Volumes/HD 2/Repositorios/author_profiling_db_scan/main/init.py", line 82, in
    ce.fit(X.toarray())
  File "/Volumes/HD 2/Repositorios/author_profiling_db_scan/venv/lib/python3.7/site-packages/clusteval/clusteval.py", line 153, in fit
    self.results = dbscan.fit(X, eps=None, epsres=50, min_samples=0.01, metric=self.metric, norm=True, n_jobs=-1, min_clust=self.min_clust, max_clust=self.max_clust, verbose=self.verbose)
  File "/Volumes/HD 2/Repositorios/author_profiling_db_scan/venv/lib/python3.7/site-packages/clusteval/dbscan.py", line 97, in fit
    idx = np.argmax(silscores)
  File "<array_function internals>", line 6, in argmax
  File "/Volumes/HD 2/Repositorios/author_profiling_db_scan/venv/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 1188, in argmax
    return _wrapfunc(a, 'argmax', axis=axis, out=out)
  File "/Volumes/HD 2/Repositorios/author_profiling_db_scan/venv/lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 58, in _wrapfunc
    return bound(*args, **kwds)
ValueError: attempt to get argmax of an empty sequence
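
The last frames show `np.argmax(silscores)` being called on an empty sequence, which would happen if none of the candidate settings produced a clustering that could be scored. The sketch below shows the NumPy behavior and one defensive pattern. `best_index` is a hypothetical helper, not the fix used in clusteval.

```python
import numpy as np


def best_index(scores):
    """Return the index of the highest score, or None when there are no scores."""
    scores = np.asarray(scores, dtype=float)
    if scores.size == 0:
        # np.argmax would raise ValueError here, so report "no result" instead.
        return None
    return int(np.argmax(scores))


# np.argmax on an empty array raises the ValueError seen in the traceback.
try:
    np.argmax(np.array([]))
    raised = False
except ValueError:
    raised = True
```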

Method and metric

I sense potential in this package and I am inclined to use it in the future. Keep up the good work! Consider publishing the software in the SoftwareX journal.

I have a number of questions that I could not answer from the arguably short documentation:

  • It seems to me that I can pick either DBSCAN or, say, the silhouette score, but not both at the same time. This seems odd to me because DBSCAN is a method whose results could be used with the silhouette score.
  • Related to that question: how are clusters evaluated if I pick DBSCAN or HDBSCAN, and how are clusters computed if I pick the silhouette score or the Davies-Bouldin index?
  • How can I choose different distance metrics to plug into, e.g., DBSCAN or HDBSCAN?
  • How do I see which parameters were chosen?
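
To make the first and third questions concrete, here is a minimal sketch of combining DBSCAN with the silhouette score and a chosen distance metric using scikit-learn directly. It is not clusteval's implementation. `dbscan_silhouette_search`, the `eps` grid, and the synthetic data are assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def dbscan_silhouette_search(X, eps_values, min_samples=5, metric="euclidean"):
    """Try each eps with DBSCAN and keep the run with the best silhouette score."""
    best = None
    for eps in eps_values:
        labels = DBSCAN(eps=eps, min_samples=min_samples, metric=metric).fit_predict(X)
        mask = labels != -1  # drop noise points before scoring
        n_clusters = len(set(labels[mask]))
        # Silhouette needs at least 2 clusters and fewer clusters than samples.
        if n_clusters < 2 or n_clusters >= mask.sum():
            continue
        score = silhouette_score(X[mask], labels[mask], metric=metric)
        if best is None or score > best["score"]:
            best = {"eps": float(eps), "score": float(score),
                    "n_clusters": n_clusters, "labels": labels}
    return best  # None when no eps produced a clustering that can be scored


# Three well-separated synthetic blobs as toy input.
X, _ = make_blobs(n_samples=300, centers=[[0, 0], [10, 10], [-10, 10]],
                  cluster_std=0.5, random_state=0)
best = dbscan_silhouette_search(X, eps_values=np.linspace(0.5, 3.0, 6),
                                metric="manhattan")
```

The returned dictionary records the selected `eps`, the score, and the number of clusters, which is one simple way to expose which parameters were chosen.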
