scca's Introduction

SCCA: Spectral Clustering Correspondence Analysis in R

Introduction

The SCCA package implements in R the approach to correspondence analysis (CA) proposed in "Correspondence analysis, spectral clustering and graph embedding: applications to ecology and economic complexity" (van Dam et al., 2021).

Installation

The package can be installed directly from GitHub with the code below. Make sure the devtools package is installed first.

#install.packages("devtools")
library(devtools)
install_github("UtrechtUniversity/scca", build_vignettes = TRUE)

Documentation of exported functions and data set

After loading the package, a list of all exported functions and data sets can be retrieved with ?SCCA, and the documentation of an individual function with ?<function name>, e.g. ?scca_compute.

The methodology and the use of the functions and the data are explained in the included vignette. After installing the SCCA package, run browseVignettes('SCCA') in the R(Studio) console.

License

The software code is licensed under MIT. The next section (References) provides links to the sources of the included data sets. See there for the licenses of those data sets.

References

Software

van Dam, Alje, Dekker, Mark, Morales-Castilla, Ignacio, Rodríguez, Miguel Á., Wichmann, David and Baudena, Mara (2021); Correspondence analysis, spectral clustering and graph embedding: applications to ecology and economic complexity; Scientific Reports; DOI: 10.1038/s41598-021-87971-9

Included data set

Faurby, Søren et al.; 2019; PHYLACINE 1.2: The Phylogenetic Atlas of Mammal Macroecology

The team

The team members are:

  • Mathematical foundations of the code

    • Alje van Dam, Copernicus Institute of Sustainable Development and Centre for Complex Systems Studies, Utrecht University, the Netherlands
    • Mark Dekker, Department of Information and Computing Sciences and Centre for Complex Systems Studies, Utrecht University, the Netherlands
  • Programming and packaging

    • Kees van Eijden, Research Engineering/ITS, Utrecht University, the Netherlands
  • With contributions of

    • Ignacio Morales Castilla, Global Change Ecology and Evolution Group, Department of Life Sciences, University of Alcalá, Spain
    • Jonathan de Bruin, Research Engineering/ITS, Utrecht University, the Netherlands
    • Raoul Schram, Research Engineering/ITS, Utrecht University, the Netherlands
    • Mara Baudena, National Research Council of Italy, Institute of Atmospheric Science and Climate (CNR-ISAC), Turin, Italy; Copernicus Institute of Sustainable Development and Centre for Complex Systems Studies, Utrecht University, the Netherlands

How to cite SCCA

To cite the SCCA repository and R package, use citation("SCCA") to retrieve the BibTeX entry. Otherwise use the following format:

van Eijden, Kees et al; 2021; SCCA: Spectral Clustering Correspondence Analysis in R; Utrecht University; DOI: 10.5281/zenodo.4665670. Also available at Utrecht University.

Please also cite the paper van Dam et al, 2021 when using the SCCA repository.

scca's People

Contributors

aljevandam, baudenam, j535d165, kveijden

Forkers

baudenam kequach

scca's Issues

Flat file export of tree

We would like to export a flat file of the tree.

CSV files are in SurfDrive.

See python code (last function in the class).

stability

drop <- sample(ncol(carnivora), ncol(carnivora) %/% 10)
stability <- scca_stability_test(m = carnivora, drop_vars = drop)
Error in clustering_overlap(cl.x, cl.y, plot = plot) :
Clusterings x and y not from the same dataset and category.

Typo in vignette

@KvEijden

Hi Kees, I'm leaving an issue here regarding the vignette. The part that shows how to load the package needs to be corrected. It currently reads:

library(sccar)

and should show instead:

library(SCCA)

Let me know if you want me to correct it.

Error decomp = svds

scca <- scca_compute(carnivora, decomp = 'svds')
Error in fun(A, k, nu, nv, opts, mattype = "dgCMatrix") :
nrow(A) and ncol(A) should be at least 3
In addition: Warning messages:
1: In fun(A, k, nu, nv, opts, mattype = "dgCMatrix") :
all singular values are requested, svd() is used instead
2: In fun(A, k, nu, nv, opts, mattype = "dgCMatrix") :
all singular values are requested, svd() is used instead
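
One possible guard against this error, sketched under the assumption that it is triggered when a matrix passed to svds has fewer than 3 rows or columns, or when all singular values are requested. The helper safe_svd is hypothetical and not part of SCCA; it falls back to base R svd() in those cases and otherwise calls RSpectra::svds if that package is installed.

```r
# Hypothetical helper (not part of SCCA): truncated SVD with a fallback to
# base R for matrices that are too small for svds().
safe_svd <- function(A, k) {
  A <- as.matrix(A)
  k <- min(k, nrow(A), ncol(A))
  too_small <- min(dim(A)) < 3 || k >= min(dim(A))
  if (too_small || !requireNamespace("RSpectra", quietly = TRUE)) {
    # Base R computes all singular values; keep only the leading k.
    s <- svd(A, nu = k, nv = k)
    return(list(d = s$d[seq_len(k)], u = s$u, v = s$v))
  }
  s <- RSpectra::svds(A, k = k)
  list(d = s$d, u = s$u, v = s$v)
}

small <- matrix(c(1, 0, 1, 1), nrow = 2)   # 2 x 2: too small for svds()
res <- safe_svd(small, k = 2)
```

Whether a fallback or an explicit error is the better behaviour for scca_compute is a design choice for the package authors.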

Disconnected columns and rows

Sometimes in the clustering process there will be sub matrices with disconnected columns and/or disconnected rows. How to handle these cases?

If disconnections occur, the function will print a warning to the console.
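
A minimal sketch of such a check, assuming that "disconnected" means a row or column with no non-zero entry inside the sub-matrix. The helper find_disconnected is hypothetical and not part of SCCA; it does not detect sub-matrices that split into several connected components.

```r
# Hypothetical helper (not part of SCCA): report rows and columns of a
# sub-matrix that contain no non-zero entries, and warn if there are any.
find_disconnected <- function(m) {
  m <- as.matrix(m)
  rows <- which(rowSums(m != 0) == 0)
  cols <- which(colSums(m != 0) == 0)
  if (length(rows) > 0 || length(cols) > 0) {
    warning(sprintf("%d disconnected row(s) and %d disconnected column(s)",
                    length(rows), length(cols)), call. = FALSE)
  }
  list(rows = rows, cols = cols)
}

# Made-up sub-matrix: row r3 and column c2 are empty.
sub <- matrix(c(1, 0, 0,
                0, 0, 1,
                0, 0, 0), nrow = 3, byrow = TRUE,
              dimnames = list(c("r1", "r2", "r3"), c("c1", "c2", "c3")))
disc <- suppressWarnings(find_disconnected(sub))
```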

iterations

  1. Sometimes kmeans gives a warning: 'did not converge in 10 iterations'. How should the functions in the package handle this situation?
  2. Should parameter iter.max of kmeans be a parameter of scca_compute?
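
One way both points could be handled, as a sketch only. The wrapper kmeans_checked is hypothetical and not part of SCCA: it passes iter.max through to stats::kmeans and records whether a warning was raised, so the caller can decide to rerun with more iterations or report the problem.

```r
set.seed(1)
# Made-up data: two well-separated groups of 50 points each.
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 4), ncol = 2))

# Hypothetical wrapper (not part of SCCA): expose iter.max and record
# whether kmeans() raised a warning during the run.
kmeans_checked <- function(x, centers, iter.max = 10, ...) {
  warned <- FALSE
  fit <- withCallingHandlers(
    kmeans(x, centers = centers, iter.max = iter.max, ...),
    warning = function(w) {
      warned <<- TRUE
      invokeRestart("muffleWarning")
    }
  )
  list(fit = fit, warned = warned)
}

res <- kmeans_checked(x, centers = 2, iter.max = 50)
```

The number of iterations actually used is available as res$fit$iter.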

Coordinates of carnivora

Should we include the coordinates of carnivora? Is this part of the analysis?

Proposal:

  • If required, we ship two datasets (carnivora and carnivora_coords) to replace the list structure (complicated for non-R programmers).
  • If not required, we remove the coordinates and carnivora returns the matrix directly.

@KvEijden can you give feedback on this?

Raw datasets not available

The raw datasets are not available in the package. Therefore, it is not possible to reproduce the packaging. Can we access the datasets from the web?

normalization eigenvectors when computed using 'trick'

In scca_compute, when the eigenvectors are computed using the 'trick' (i.e. first decomposing the 'small matrix' to obtain the matrix of vectors U and then obtaining the eigenvectors by pre-multiplying with D^-1 A, so that V = D^-1 A U), the resulting eigenvectors are correct but not properly normalized. They should be normalized by setting v = v / sqrt(v^T D v), where D is the appropriate diagonal matrix with either row or column sums on the diagonal.
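
The effect can be checked numerically on a toy example. This is a sketch with a made-up matrix A and base R eigen(), not the code used in the package; it assumes the 'small matrix' is the column-side matrix and that D is the diagonal matrix of row sums for the resulting row eigenvectors. Each vector is divided by sqrt(v^T D v) so that v^T D v = 1 afterwards.

```r
# Small illustrative incidence matrix (made up; 5 rows, 3 columns).
A <- matrix(c(1, 1, 0,
              1, 0, 0,
              0, 1, 1,
              0, 1, 1,
              1, 0, 1), nrow = 5, byrow = TRUE)
d_r <- rowSums(A)   # diagonal of D for the rows
d_c <- colSums(A)   # diagonal of D for the columns

# Decompose the small (3 x 3) symmetric matrix instead of the 5 x 5 one.
S_small <- diag(1 / sqrt(d_c)) %*% t(A) %*% diag(1 / d_r) %*% A %*%
  diag(1 / sqrt(d_c))
e <- eigen(S_small, symmetric = TRUE)
U <- diag(1 / sqrt(d_c)) %*% e$vectors   # satisfies U^T D_c U = I

# The 'trick': pre-multiply by D_r^{-1} A to get the row eigenvectors.
V <- diag(1 / d_r) %*% A %*% U

# Before normalization, v^T D_r v equals the eigenvalue rather than 1.
norms_before <- colSums(V * (d_r * V))

# Normalize each column: v <- v / sqrt(v^T D_r v).
V_norm <- sweep(V, 2, sqrt(norms_before), "/")
norms_after <- colSums(V_norm * (d_r * V_norm))
```

In this construction v^T D v comes out equal to the corresponding eigenvalue, so dividing by the square root of the eigenvalue would be an equivalent shortcut.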

Technicalities to be dealt with considering eigenvalue computation and normalization (mostly for own reference, to be discussed)

  1. Normalization of eigenvectors: The normalization of the eigenvectors is currently done within the create_y function. It would be more efficient to normalize them directly when they are calculated, so that the eigenvectors are also normalized in other outputs. Perhaps even return the 'standardized' vectors (i.e. multiply the normalized eigenvectors by the square root of their eigenvalue).

  2. Computation of eigenvectors: This also concerns the question of symmetry of the Laplacian mentioned in the function. Looking at the rARPACK documentation, it might be computationally more efficient to compute the eigenvectors from a symmetric matrix instead of the Laplacian, and then post-process them to obtain the eigenvectors of L, as follows:

D^{-1/2}SD^{-1/2} = U \Lambda U^T, where U^T U = I
so that
L = D^{-1} S = D^{-1/2} U \Lambda U^T D^{1/2}
so the right eigenvectors are given by
V = D^{-1/2} U, normalized by setting v = v / sqrt(v^T D v) so that V^T D V = I (as required). Normalization is unnecessary if rARPACK returns normalized eigenvectors (to be checked).

  3. Calculating the embedding space: the function compute_y then only needs to create a matrix whose columns consist of eigenvectors 1 to k (dismissing the zeroth, trivial eigenvector), and possibly weigh the columns by the square root of their eigenvalues (depending on whether we scale them in 1 or not).
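
The symmetric route and the embedding step can be sketched as follows. This uses base R eigen() on a small made-up similarity matrix S instead of rARPACK, and the weighting by the square root of the eigenvalues is shown as one of the two options mentioned above; none of the variable names are taken from the package.

```r
# Small symmetric similarity matrix S (made up for illustration).
S <- matrix(c(2, 1, 0, 0,
              1, 2, 1, 0,
              0, 1, 2, 1,
              0, 0, 1, 2), nrow = 4, byrow = TRUE)
d <- rowSums(S)                  # diagonal of D
D_inv_sqrt <- diag(1 / sqrt(d))

# Decompose the symmetric matrix D^{-1/2} S D^{-1/2} = U Lambda U^T.
e <- eigen(D_inv_sqrt %*% S %*% D_inv_sqrt, symmetric = TRUE)
U <- e$vectors
lambda <- e$values

# Right eigenvectors of L = D^{-1} S.
V <- D_inv_sqrt %*% U
L <- diag(1 / d) %*% S

# Embedding from eigenvectors 1..k: drop the trivial one (first column,
# eigenvalue 1) and weigh the rest by the square root of their eigenvalue.
k <- 2
idx <- 1 + seq_len(k)
Y <- sweep(V[, idx, drop = FALSE], 2, sqrt(lambda[idx]), "*")
```

Because eigen() returns orthonormal vectors for a symmetric input, V^T D V = I holds here without a separate normalization step.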

Error when plotting overlap

Error: Can't convert to .
Run rlang::last_error() to see where the error occurred.
15. stop(fallback)
14. signal_abort(cnd)
13. abort(message, class = c(class, "vctrs_error"), ...)
12. stop_vctrs(message, class = c(class, "vctrs_error_incompatible"),
    x = x, y = y, details = details, ...)
11. stop_incompatible(x, y, x_arg = x_arg, y_arg = y_arg, details = details,
    ..., message = message, class = c(class, "vctrs_error_incompatible_type"))
10. stop_incompatible_type(x = x, y = to, ..., x_arg = x_arg, y_arg = to_arg,
    action = "convert", details = details, message = message,
    class = class)
9. stop_incompatible_cast(x, to, x_arg = x_arg, to_arg = to_arg,
    vctrs:::from_dispatch = match_from_dispatch(...))
8. vec_default_cast(x = x, to = to, x_arg = x_arg, to_arg = to_arg,
    vctrs:::from_dispatch = vctrs:::from_dispatch, vctrs:::df_fallback = vctrs:::df_fallback,
    vctrs:::s3_fallback = vctrs:::s3_fallback)
7. (function ()
    vec_default_cast(x = x, to = to, x_arg = x_arg, to_arg = to_arg,
    vctrs:::from_dispatch = vctrs:::from_dispatch, vctrs:::df_fallback = vctrs:::df_fallback,
    vctrs:::s3_fallback = vctrs:::s3_fallback))()
6. vec_cast(fill, val)
5. pivot_wider_spec(data, spec, !!id_cols, names_repair = names_repair,
    values_fill = values_fill, values_fn = values_fn)
4. tidyr::pivot_wider(data = overlap_xy, names_from = .data$cluster.y,
    names_prefix = "y_", values_from = .data$edge, values_fill = list(edge = 0)) at scca_overlap_test.R#187
3. plot_overlap(cl.xy) at scca_overlap_test.R#157
2. clustering_overlap(cl.x, cl.y, plot = plot) at scca_overlap_test.R#116
1. scca_overlap_test(x = scca, y = scca1, plot = TRUE)

List of required narrative

By Alje:

  • Non-technical project description (5-10 lines) (README, Tutorial)
  • Description of the software/package (+-5 lines) (readme, DESCRIPTION, ...)
  • Single line description of the software (DESCRIPTION)
  • Description, format and references of the Carnivores dataset. See iris.
  • Information about the researchers/faculties/CCSS

License

Under what license conditions will we publish our package?

Validity

Warning messages:
1: In max(distance[c1, c1]) : no non-missing arguments to max; returning -Inf
2: In min(distance[c1, c2]) : no non-missing arguments to min; returning Inf

in: validity <- scca_validity_test(scca = scca_species, dist = d_species)

Problems with exports dataset because there are no rownames

Hi, I succeeded in installing the package. I thought I'd just list here what I ran into:

  • I can load the data 'carnivora' and 'exports'

  • The exports dataset seems to contain only the labels (products in entry 1 and countries in entry 2)

  • Upon running scca_compute on exports, I get back the labels.

  • Upon running scca_compute on carnivora, I get the error message 'M must contain row and column labels'.

Let me know if this helps. I might be looking at an older version?

Add dataset Carnivores to package

Put dataset in folder data/.
The format should be .RData.
Do not include the CSV. But include a snippet to convert the data (for example in this issue).
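
A conversion snippet along these lines could be attached to the issue. It is a sketch under assumptions: the file name carnivora.csv and the layout (first column holds the row labels, remaining columns hold presence values) are hypothetical, and a tiny stand-in CSV is written to a temporary directory so the snippet runs on its own.

```r
# Stand-in for the raw CSV (hypothetical file name and layout: the first
# column holds the row labels, the remaining columns are 0/1 values).
csv_file <- file.path(tempdir(), "carnivora.csv")
write.csv(data.frame(species = c("sp_a", "sp_b"),
                     site_1 = c(1, 0),
                     site_2 = c(1, 1)),
          csv_file, row.names = FALSE)

# Read the CSV and turn it into a matrix with row and column labels.
raw <- read.csv(csv_file, row.names = 1, check.names = FALSE)
carnivora <- as.matrix(raw)

# Save in .RData format; in the package this would be data/carnivora.RData.
rdata_file <- file.path(tempdir(), "carnivora.RData")
save(carnivora, file = rdata_file)
```

Reading with row.names = 1 keeps the labels as dimnames of the matrix rather than as a data column.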

k-means nstart parameter

It looks like we could reduce the variability of k-means by setting the nstart parameter to x (which reruns k-means x times and picks the best solution).

Sensitivity tests can then still be done to see how variable the obtained clustering is.
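
The effect of nstart can be illustrated with stats::kmeans on made-up data. This is a sketch only; the data, the number of clusters and the value nstart = 25 are arbitrary choices for the example, not settings taken from the package.

```r
set.seed(42)
# Made-up data: three groups of 30 points each in two dimensions.
x <- rbind(matrix(rnorm(60, mean = 0, sd = 0.5), ncol = 2),
           matrix(rnorm(60, mean = 3, sd = 0.5), ncol = 2),
           matrix(rnorm(60, mean = 6, sd = 0.5), ncol = 2))

# Ten independent single-start runs: the objective can differ between runs.
single <- replicate(10, kmeans(x, centers = 3, nstart = 1)$tot.withinss)

# One call with 25 random starts; kmeans() keeps the best of them.
multi <- kmeans(x, centers = 3, nstart = 25)

range(single)        # spread of the single-start solutions
multi$tot.withinss   # objective of the best of the 25 starts
```

Comparing the spread of the single-start results with the multi-start result gives a quick impression of how much the random initialization matters for a given data set.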
