This is an implementation of the KMeans++ algorithm.
The original authors of this implementation (https://github.com/TU-Berlin-DIMA/IMPRO-3.SS14) are:
- Yuwen Chen
- Mingliang Qi
- Mingyuan Wu
Author of the rebuild for Flink 0.8.1:
- Jonathan Hasenburg
The algorithm can be run in one of two variants: "double" or "bagOfWords".
The following five parameters are required, in this order:
- The first parameter selects the variant to run: "double" or "bagOfWords".
- The second parameter is the dataPath, the path to the input data.
- The third parameter is the outputPath, the path where the result is written.
- The fourth parameter is k, the number of clusters.
- The fifth parameter is numIterations, the number of iterations to run.
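The parameter order above can be illustrated with a minimal Java sketch of how the five arguments might be read and checked. This is not the project's actual entry point: the class name KMeansppArgs, the parse method, the validation rules (k and numIterations must be at least 1) and the example output path tmp/output are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the original project.
class KMeansppArgs {
    // The two variant names listed in this README.
    static final List<String> VARIANTS = Arrays.asList("double", "bagOfWords");

    final String variant;
    final String dataPath;
    final String outputPath;
    final int k;
    final int numIterations;

    KMeansppArgs(String variant, String dataPath, String outputPath, int k, int numIterations) {
        this.variant = variant;
        this.dataPath = dataPath;
        this.outputPath = outputPath;
        this.k = k;
        this.numIterations = numIterations;
    }

    // Reads the five positional parameters in the order described above.
    static KMeansppArgs parse(String[] args) {
        if (args.length < 5) {
            throw new IllegalArgumentException(
                "Usage: <double|bagOfWords> <dataPath> <outputPath> <k> <numIterations>");
        }
        if (!VARIANTS.contains(args[0])) {
            throw new IllegalArgumentException("Unknown variant: " + args[0]);
        }
        // Integer.parseInt throws NumberFormatException (an IllegalArgumentException)
        // if the value is not a valid integer.
        int k = Integer.parseInt(args[3]);
        int numIterations = Integer.parseInt(args[4]);
        // Assumed validation: both values must be positive.
        if (k < 1 || numIterations < 1) {
            throw new IllegalArgumentException("k and numIterations must be at least 1");
        }
        return new KMeansppArgs(args[0], args[1], args[2], k, numIterations);
    }

    public static void main(String[] args) {
        // Example values only; tmp/output is a made-up output path.
        KMeansppArgs parsed = parse(new String[] {"double", "tmp/double", "tmp/output", "3", "10"});
        System.out.println(parsed.variant + " k=" + parsed.k + " iterations=" + parsed.numIterations);
    }
}
```

Failing early on an unknown variant or a non-numeric k gives a clearer error than letting the job start and fail later.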
The included unit tests use generated datasets. You can generate sample datasets yourself by executing algorithm.util.GenerateDouble or algorithm.util.GenerateBagOfWords. The generated sample datasets can be found in tmp/double and tmp/bagOfWords, respectively.