Clusterix: A visual analytics approach to data clustering

Clusterix is a web-based visual analytics tool that supports clustering tasks while keeping the analyst at the center of the workflow. Clusterix provides the facilities to:

  • load and preview CSV data files;
  • create a 2D projection of the dataset;
  • select any combination of fields to be used for projection/clustering;
  • select and run one or more clustering algorithms (K-Means, Agglomerative Clustering, Mean Shift) with varying parameters;
  • view and interact with the results in a browser environment;
  • save time and use an iterative approach;
  • modify the parameters or input data to correct the clustering output.

Such an iterative, visual analytics approach allows users to quickly determine the best clustering algorithm and parameters for their data, and to correct any errors in the clustering output. Clusterix has been applied to the clustering of heterogeneous data sets.

Usage

First you need to install the requirements:

pip install -r requirements.txt

To run the project:

python manage.py runserver

This command will run Clusterix on http://127.0.0.1:5000 where you will be able to use the interface to upload data files, and select the algorithms/options that you want.

Features

File input (CSV only currently)

  • Data Preview
  • Field selection
  • Text Features (vectorizers, stemming, stopwords, etc.)

Vectorizers

  • Count Vectorizer
  • Tf-Idf Vectorizer
  • Hashing Vectorizer
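
To make the difference between the first two vectorizers concrete, here is a minimal, standard-library-only sketch. It is not Clusterix's code: the tokenizer, the function names and the smoothed IDF formula are assumptions chosen for illustration, and a real vectorizer library may use a different weighting. The hashing variant is not shown.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def count_vectors(docs):
    """Return (vocabulary, rows) where each row holds raw term counts."""
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    rows = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        rows.append([counts[term] for term in vocab])
    return vocab, rows


def tfidf_vectors(docs):
    """Weight each count by a smoothed inverse document frequency."""
    vocab, rows = count_vectors(docs)
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = [sum(1 for row in rows if row[j] > 0) for j in range(len(vocab))]
    # Smoothed IDF; one common variant, assumed here for illustration.
    idf = [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]
    return vocab, [[c * w for c, w in zip(row, idf)] for row in rows]


# Hypothetical documents standing in for a text column of a CSV file.
docs = ["red wine dry", "white wine sweet", "red red grape"]
vocab, counts = count_vectors(docs)
_, weighted = tfidf_vectors(docs)
```

With raw counts, every occurrence weighs the same. With TF-IDF, a term that appears in fewer documents (such as "dry") receives a larger weight than one that appears in several (such as "wine").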

Decomposition

  • PCA
  • SVD
  • MDS
  • t-SNE

Algorithms

  • K-Means
  • Agglomerative Clustering
  • Mean Shift
  • DBSCAN
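
As an illustration of what the first algorithm in this list does, the following is a small pure-Python sketch of K-Means (Lloyd's algorithm). It is not the implementation Clusterix uses; the function name `kmeans`, the fixed starting centroids and the sample points are assumptions made so that the example is deterministic.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm; returns (labels, centroids)."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


# Hypothetical 2D coordinates, such as those produced by a projection step.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
```

The sketch needs Python 3.8 or later for `math.dist`. Changing the number of starting centroids is the equivalent of changing the number of clusters in the interface.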

Plot Features

  • Scatterplot visualizations
  • Full text/column search for the nodes
  • Brushing and zoom for targeted inspection
  • Various clustering metrics (TF-IDF, etc.)

Instructions

Clusterix works iteratively, so there are certain steps that need to be followed:

  • Upload a data file. The necessary preprocessing runs automatically, and the available options are then shown.
  • First, get a projection of the data: use the text and field options to tune your decomposition.
  • The decomposition model and the coordinates are saved, so that you can iterate through clustering models quickly.
  • In case you need to try a new decomposition, create a new projection.
  • Use brushing to get TF-IDF (if applicable) and a zoomed area for browsing.
  • The Search function uses SQLite syntax, so every time you write a query, imagine that it starts like this: SELECT * FROM dataframe WHERE...
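
The search convention in the last step can be tried out with Python's built-in `sqlite3` module. This is only a sketch of the idea, not how Clusterix runs searches internally: the table name `dataframe` comes from the text above, while the columns, the sample rows and the `search` helper are hypothetical.

```python
import sqlite3

# Hypothetical rows standing in for an uploaded CSV file.
rows = [
    ("Merlot", "red", 13.5),
    ("Riesling", "white", 11.0),
    ("Syrah", "red", 14.2),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dataframe (name TEXT, color TEXT, alcohol REAL)")
conn.executemany("INSERT INTO dataframe VALUES (?, ?, ?)", rows)


def search(where_clause):
    """Run a search where the user supplies only the part after WHERE."""
    # Illustration only: do not splice untrusted input into SQL like this.
    query = "SELECT * FROM dataframe WHERE " + where_clause
    return conn.execute(query).fetchall()


# Column comparison combined with AND.
matches = search("color = 'red' AND alcohol > 14")

# Text search with LIKE, which is case-insensitive for ASCII in SQLite.
names = [row[0] for row in search("name LIKE '%r%' ORDER BY name")]
```

In other words, whatever is typed into the search box plays the role of `where_clause`, so any condition that is valid after `WHERE` in SQLite should be usable.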

Screenshots

Wine Data

[Two screenshots of the Wine Data example]

Contributors

eamonnmag, lilykos

clusterix's Issues

ValueError: Length of values does not match length of index

Using the attached file with the following settings: fields (Content), PCA, Euclidean, Count Vectorizer, k-means, 5 clusters.

(2017-10-12 08:31:54) - INFO	127.0.0.1 - - [12/Oct/2017 08:31:54] "POST /get_clustering_results HTTP/1.1" 500 -
Traceback (most recent call last):
  File "/Users/brray/anaconda2/anaconda/envs/py3k/lib/python3.5/site-packages/flask/app.py", line 1997, in __call__
    return self.wsgi_app(environ, start_response)
  File "/Users/brray/anaconda2/anaconda/envs/py3k/lib/python3.5/site-packages/flask/app.py", line 1985, in wsgi_app
    response = self.handle_exception(e)
  File "/Users/brray/anaconda2/anaconda/envs/py3k/lib/python3.5/site-packages/flask/app.py", line 1540, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "/Users/brray/anaconda2/anaconda/envs/py3k/lib/python3.5/site-packages/flask/_compat.py", line 33, in reraise
    raise value
  File "/Users/brray/anaconda2/anaconda/envs/py3k/lib/python3.5/site-packages/flask/app.py", line 1982, in wsgi_app
    response = self.full_dispatch_request()
  File "/Users/brray/anaconda2/anaconda/envs/py3k/lib/python3.5/site-packages/flask/app.py", line 1614, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/Users/brray/anaconda2/anaconda/envs/py3k/lib/python3.5/site-packages/flask/app.py", line 1517, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/Users/brray/anaconda2/anaconda/envs/py3k/lib/python3.5/site-packages/flask/_compat.py", line 33, in reraise
    raise value
  File "/Users/brray/anaconda2/anaconda/envs/py3k/lib/python3.5/site-packages/flask/app.py", line 1612, in full_dispatch_request
    rv = self.dispatch_request()
  File "/Users/brray/anaconda2/anaconda/envs/py3k/lib/python3.5/site-packages/flask/app.py", line 1598, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/Users/brray/Documents/Exxon/Demo/clusterix-master/src/routes/projections.py", line 17, in get_clustering_results
    return jsonify(**{'results': cluster_data(attrs)})
  File "/Users/brray/Documents/Exxon/Demo/clusterix-master/src/config.py", line 43, in wrapper
    return func(*args)
  File "/Users/brray/Documents/Exxon/Demo/clusterix-master/src/cluster/process.py", line 38, in cluster_data
    df['clx_cluster'] = labels
  File "/Users/brray/anaconda2/anaconda/envs/py3k/lib/python3.5/site-packages/pandas/core/frame.py", line 2429, in __setitem__
    self._set_item(key, value)
  File "/Users/brray/anaconda2/anaconda/envs/py3k/lib/python3.5/site-packages/pandas/core/frame.py", line 2495, in _set_item
    value = self._sanitize_column(key, value)
  File "/Users/brray/anaconda2/anaconda/envs/py3k/lib/python3.5/site-packages/pandas/core/frame.py", line 2666, in _sanitize_column
    value = _sanitize_index(value, self.index, copy=False)
  File "/Users/brray/anaconda2/anaconda/envs/py3k/lib/python3.5/site-packages/pandas/core/series.py", line 2879, in _sanitize_index
    raise ValueError('Length of values does not match length of ' 'index')
ValueError: Length of values does not match length of index

[out_emails_labled.csv.zip](https://github.com/Lilykos/clusterix/files/1379398/out_emails_labled.csv.zip)

truncated words in UI

Using Chrome Version 61.0.3163.100 (Official Build) (64-bit) on Mac OS X El Capitan 10.11.16.

