- KDD Biotrain Dataset
- Worms Dataset
- KDD Biotrain Dataset
- Worms Dataset
- KDD Biotrain Dataset
- Worms Dataset
- K-medoids clustering is performed only on the artificial dataset because of its expensive runtime
- Artificial Dataset (dataset visualization)
- All sampling methods
- The directory structure is as follows:
- There is one directory for each of the 4 clustering algorithms (bisecting, kcenter, kmeans, kmedoid)
│   Averaged_Data_collection.xlsx
│   README.md
│
├───bisecting
│       kdd_coresets.py
│       kdd_leverage.py
│       kdd_uniform.py
│       kdd_volume.py
│       worms_coresets.py
│       worms_leverage.py
│       worms_uniform.py
│       worms_volume.py
│
├───images
│   │   artificial.png
│   │   kmedoids.png
│   │
│   ├───kdd
│   │       bisectingcoresets.png
│   │       bisectingvol_lev.png
│   │       kcenterall.png
│   │       kmeanscoresets.png
│   │       kmeansvol_lev.png
│   │
│   └───worms
│           bisectingall.png
│           kcenterall.png
│           kmeansall.png
│
├───kcenter
│       kdd_coresets.py
│       kdd_leverage.py
│       kdd_uniform.py
│       kdd_volume.py
│       worms_coresets.py
│       worms_leverage.py
│       worms_uniform.py
│       worms_volume.py
│
├───kdd
│       bio_train.dat
│       kdd_reduced.pickle
│       kdd_reduced_1k.pickle
│       kdd_reduced_20k.pickle
│       kdd_reduced_30k.pickle
│       kdd_reduced_40k.pickle
│
├───kmeans
│       kdd_coresets.py
│       kdd_leverage.py
│       kdd_uniform.py
│       kdd_volume.py
│       worms_coresets.py
│       worms_leverage.py
│       worms_uniform.py
│       worms_volume.py
│
├───kmedoid
│       artificial_all.py
│
└───worms
        README.txt
        worms_2d.png
        worms_2d.txt
        worms_64d.txt
        worms_reduced.pickle
        worms_reduced_20k.pickle
        worms_reduced_30k.pickle
        worms_reduced_40k.pickle
- Lightweight coresets outperform all other sampling techniques in every combination of dataset and clustering algorithm.
- For the KDD dataset, Leverage sampling performs better than Volume sampling in every combination of the KDD dataset with a clustering algorithm.
- For the Worms dataset, Volume sampling performs better than Leverage sampling in every combination of the Worms dataset with a clustering algorithm.
- Although lightweight coresets were designed for k-means, they also perform well with the k-center algorithm, beating the rest of the sampling techniques.
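For readers unfamiliar with the best-performing method, here is a minimal pure-Python sketch of the lightweight-coreset sampling rule as it is commonly described for k-means: each point mixes a uniform term with a term proportional to its squared distance to the data mean, and sampled points are reweighted by the inverse of their probability. This is an illustration under those assumptions, not the code in the repository's `*_coresets.py` scripts; the function names `lightweight_coreset` and `kmeans_cost` are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]


def lightweight_coreset(
    points: List[Point], m: int, seed: int = 0
) -> Tuple[List[Point], List[float]]:
    """Hypothetical helper: sample m weighted points (lightweight-coreset rule).

    q(x) = 1 / (2n) + d(x, mu)^2 / (2 * sum_x' d(x', mu)^2), mu = data mean.
    A sampled point receives weight 1 / (m * q(x)).
    """
    n = len(points)
    dim = len(points[0])
    # Mean of the full dataset.
    mu = [sum(p[j] for p in points) / n for j in range(dim)]
    # Squared Euclidean distance of every point to the mean.
    sq = [sum((p[j] - mu[j]) ** 2 for j in range(dim)) for p in points]
    total = sum(sq)
    if total == 0.0:
        # All points coincide with the mean: only the uniform term remains.
        q = [1.0 / n] * n
    else:
        q = [0.5 / n + 0.5 * s / total for s in sq]
    rng = random.Random(seed)
    # Draw m indices i.i.d. (with replacement) with probabilities q.
    idx = rng.choices(range(n), weights=q, k=m)
    sample = [points[i] for i in idx]
    weights = [1.0 / (m * q[i]) for i in idx]
    return sample, weights


def kmeans_cost(
    points: List[Point],
    centers: List[Point],
    weights: Optional[List[float]] = None,
) -> float:
    """Hypothetical helper: (weighted) sum of squared distances to nearest center."""
    if weights is None:
        weights = [1.0] * len(points)
    return sum(
        w * min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
        for p, w in zip(points, weights)
    )


if __name__ == "__main__":
    rng = random.Random(1)
    data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(1000)]
    sample, w = lightweight_coreset(data, m=100, seed=2)
    centers = [(0.0, 0.0), (1.0, 1.0)]
    # The weighted coreset cost is an unbiased estimate of the full-data cost.
    print(kmeans_cost(data, centers), kmeans_cost(sample, centers, w))
```

The uniform half of q(x) keeps every probability at least 1/(2n), so no weight exceeds 2n/m; this bounds the variance that a purely distance-based rule could suffer on points lying close to the mean.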