BigClust - Stochastic Nonsmooth Optimization based Incremental Clustering Software (version 0.1)

BigClust is a clustering algorithm based on nonsmooth optimization for solving the minimum sum-of-squares clustering (MSSC) problem in very large-scale and big data sets. BigClust consists of two algorithms: an incremental algorithm solves the clustering problem globally, and at each iteration of that algorithm the stochastic limited memory bundle algorithm (SLMBA) solves both the clustering problem and the auxiliary clustering problem from different starting points. Because of the incremental approach, BigClust solves not only the k-partition problem but also all intermediate l-partition problems, l = 1,…,k-1.
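As an illustration only, the objective functions involved can be sketched in a few lines of Python. This sketch assumes the usual nonsmooth formulation of MSSC, in which the cluster function is the average squared distance from each data point to its closest center, and the auxiliary function keeps the previous centers fixed while a single candidate center y varies. The function names are hypothetical and are not taken from objfun.f95, and the 1/m scaling is an assumption.

```python
def sq_dist(x, a):
    """Squared Euclidean distance between two points given as lists."""
    return sum((xi - ai) ** 2 for xi, ai in zip(x, a))


def cluster_function(centers, data):
    """Assumed MSSC objective: f_k(x) = (1/m) * sum_i min_j ||x_j - a_i||^2."""
    return sum(min(sq_dist(c, a) for c in centers) for a in data) / len(data)


def auxiliary_function(y, prev_centers, data):
    """Assumed auxiliary objective: previous centers are fixed and
    only the candidate center y is free."""
    total = 0.0
    for a in data:
        d_prev = min(sq_dist(c, a) for c in prev_centers)
        total += min(d_prev, sq_dist(y, a))
    return total / len(data)


def subgradient(centers, data):
    """One subgradient of cluster_function with respect to the centers.

    Each data point contributes 2 * (x_j - a_i) / m to its closest
    center x_j; ties are broken by taking the first closest center.
    """
    m = len(data)
    grad = [[0.0] * len(c) for c in centers]
    for a in data:
        j = min(range(len(centers)), key=lambda idx: sq_dist(centers[idx], a))
        for t in range(len(a)):
            grad[j][t] += 2.0 * (centers[j][t] - a[t]) / m
    return grad


if __name__ == "__main__":
    data = [[0.0, 0.0], [2.0, 0.0], [10.0, 0.0], [12.0, 0.0]]
    print(cluster_function([[1.0, 0.0], [11.0, 0.0]], data))
    print(auxiliary_function([11.0, 0.0], [[6.0, 0.0]], data))
    print(subgradient([[1.0, 0.0], [11.0, 0.0]], data))
```

The min over centers is what makes the problem nonsmooth: the function is not differentiable at points that are equally close to two centers, which is why a subgradient is used in place of a gradient.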

Files included

  • bigclust.f95

    • Main program for the clustering software.
  • initbigclust.f95

    • Initialization of clustering parameters and SLMBA. Includes modules:
      • initclust - Initialization of parameters for clustering.
      • initslmb - Initialization of SLMBA.
  • clusteringmod.f95

    • Subroutines for clustering software.
  • slmb.f95

    • SLMBA - Stochastic limited memory bundle algorithm.
  • objfun.f95

    • Computation of the cluster function and subgradient values.
  • subpro.f95

    • Subprograms for SLMBA.
  • parameters.f95

    • Parameters. Includes modules:
      • r_precision - Precision for reals,
      • param - Parameters,
      • exe_time - Execution time.
  • Makefile

    • makefile: requires a Fortran compiler (gfortran) to be installed.

Installation and usage

To use the code:

  1. Modify initbigclust.f95 as needed. At the least, select the data set and give the number of data points, the number of features, and the maximum number of clusters "nclust".
  2. Run the Makefile by typing "make". The Makefile uses gfortran by default.
  3. Finally, just type "./bigclust".

The algorithm returns a txt-file with the clustering function values, the Dunn and Davies-Bouldin validity indices, and the elapsed CPU times for each number of clusters up to nclust. In addition, it returns a separate txt-file with the final cluster centers for nclust clusters and the solutions to all intermediate l-clustering problems with l = 1,...,nclust-1.
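For readers unfamiliar with the two validity indices, the following Python sketch shows one common way to compute them. Several variants of both indices exist, and the definitions used inside BigClust may differ; the function names are hypothetical. The sketch assumes every cluster is non-empty and that no two centers coincide.

```python
import math


def dist(p, q):
    """Euclidean distance between two points given as lists."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


def assign(centers, data):
    """Group the data points by their closest center."""
    clusters = [[] for _ in centers]
    for a in data:
        j = min(range(len(centers)), key=lambda idx: dist(centers[idx], a))
        clusters[j].append(a)
    return clusters


def davies_bouldin(centers, clusters):
    """Davies-Bouldin index (smaller is better).

    The scatter S_i is the mean distance of the points in cluster i to
    its center; the index averages, over all clusters, the worst ratio
    (S_i + S_j) / d(c_i, c_j).
    """
    k = len(centers)
    scatter = [sum(dist(a, c) for a in pts) / len(pts)
               for c, pts in zip(centers, clusters)]
    total = 0.0
    for i in range(k):
        total += max((scatter[i] + scatter[j]) / dist(centers[i], centers[j])
                     for j in range(k) if j != i)
    return total / k


def dunn(clusters):
    """Classical Dunn index (larger is better): smallest distance between
    points of different clusters divided by the largest cluster diameter.
    Quadratic in the number of points, so only suitable for small data."""
    min_sep = min(dist(p, q)
                  for i, ci in enumerate(clusters)
                  for cj in clusters[i + 1:]
                  for p in ci for q in cj)
    max_diam = max(dist(p, q) for c in clusters for p in c for q in c)
    return min_sep / max_diam


if __name__ == "__main__":
    data = [[0.0, 0.0], [2.0, 0.0], [10.0, 0.0], [12.0, 0.0]]
    centers = [[1.0, 0.0], [11.0, 0.0]]
    clusters = assign(centers, data)
    print(davies_bouldin(centers, clusters), dunn(clusters))
```

Both indices need at least two clusters, so they are undefined for the l = 1 solution.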

Acknowledgements

The work was financially supported by the Research Council of Finland projects (Project No. #345804 and #345805) led by Antti Airola and Tapio Pahikkala.
