Git Product home page Git Product logo

clustrviz's Introduction

TravisCI Build Status AppVeyor Build Status codecov Coverage Status License: GPL v3 CRAN_Status_Badge Project Status: Active โ€“ The project has reached a stable, usable state and is being actively developed.

clustRviz

clustRviz aims to enable fast computation and easy visualization of Convex Clustering solution paths.

Installation

You can install clustRviz from github with:

# install.packages("devtools")
devtools::install_github("DataSlingers/clustRviz")

Note that RcppEigen (which clustRviz internally) triggers many compiler warnings (which cannot be suppressed per CRAN policies). Many of these warnings can be locally suppressed by adding the line CXX11FLAGS+=-Wno-ignored-attributes to your ~/.R/Makevars file.

clustRviz also depends on the development version of the gganimate package, which is not yet on CRAN, but may be found here.

Quick-Start

There are two main entry points to the clustRviz package, the CARP and CBASS functions, which perform convex clustering and convex biclustering respectively. We demonstrate the use of these two functions on a text minining data set, presidential_speech, which measures how often the 44 U.S. presidents used certain words in their public addresses.

library(clustRviz)
data(presidential_speech)
presidential_speech[1:6, 1:6]
#>                     amount appropri  british     cent commerci commission
#> Abraham Lincoln   3.433987 2.397895 1.791759 2.564949 2.708050   2.079442
#> Andrew Jackson    4.248495 4.663439 2.995732 1.945910 3.828641   3.218876
#> Andrew Johnson    4.025352 3.091042 2.833213 3.332205 2.772589   2.079442
#> Barack Obama      1.386294 0.000000 0.000000 1.386294 0.000000   0.000000
#> Benjamin Harrison 4.060443 4.174387 2.302585 4.304065 3.663562   3.465736
#> Calvin Coolidge   3.713572 4.094345 1.386294 3.555348 2.639057   1.609438

Clustering

We begin by clustering this data set, grouping the rows (presidents) into clusters:

carp_fit <- CARP(presidential_speech)
#> Pre-computing weights and edge sets
#> Computing Convex Clustering [CARP] Path
#> Post-processing
print(carp_fit)
#> CARP Fit Summary
#> ====================
#> 
#> Algorithm: CARP (t = 1.05) 
#> Fit Time: 0.148 secs 
#> Total Time: 0.443 secs 
#> 
#> Number of Observations: 44 
#> Number of Variables:    75 
#> 
#> Pre-processing options:
#>  - Columnwise centering: TRUE 
#>  - Columnwise scaling:   FALSE 
#> 
#> Weights:
#>  - Source: Radial Basis Function Kernel Weights
#>  - Distance Metric: Euclidean
#>  - Scale parameter (phi): 0.01 [Data-Driven]
#>  - Sparsified: 4 Nearest Neighbors [Data-Driven]

The algorithmic regularization technique employed by CARP makes computation of the whole solution path almost immediate.

We can examine the result of CARP graphically. We begin with a standard dendrogram, with three clusters highlighted:

plot(carp_fit, type = "dendrogram", k = 3)

Examing the dendrogram, we see two clear clusters, consisting of pre-WWII and post-WWII presidents and Warren G. Harding as a possible outlier. Harding is generally considered one of the worst US presidents of all time, so this is perhaps not too surprising.

A more interesting visualization is the dynamic path visualization, whereby we can watch the clusters fuse as the regularization level is increased:

plot(carp_fit, type = "dynamic_path")

clustRviz also provides interactive dendrograms using the heatmaply package:

plot(carp_fit, type = "js")

BiClustering

The use of CBASS for convex biclustering is similar, and we demonstrate it here with a cluster heatmap, with the regularization set to give 3 observation clusters:

cbass_fit <- CBASS(presidential_speech)
#> Pre-computing column weights and edge sets
#> Pre-computing row weights and edge sets
#> Computing Convex Bi-Clustering [CBASS] Path
#> Post-processing rows
#> Post-processing columns
plot(cbass_fit, k.row = 3)

Setting type = "js" gives the traditional cluster heatmap:

plot(cbass_fit, type = "js", k.row = 3)

By default, if a regularization level is specified, all plotting functions in clustRviz will plot the clustered data. If the regularization level is not specified, the raw data will be plotted instead:

plot(cbass_fit, type = "js")

More details about the use and mathematical formulation of CARP and CBASS may be found in the package documentation.

clustrviz's People

Contributors

traviscibot avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.