brianwu-s / dimentionality_reduction
Unstructured high-dimensional data such as video, audio, text, and images has become a hot topic in data-mining research. However, high-dimensional data often incurs substantial computational cost and low training efficiency. Moreover, high dimensionality makes the data space sparse, so models are more likely to overfit. As a consequence, dimensionality reduction is commonly applied as a preprocessing step. In this report, we try nine different dimensionality reduction methods: selection by variance, Random Forest, PCA, kernel PCA, LDA, AE, VAE, t-SNE, and UMAP. We then compare the performance of the approaches across hyperparameter settings. Experiments on the AwA2 dataset show that LDA attains the most efficient result, with 0.93 accuracy at only 49 dimensions, while PCA with a sigmoid kernel reaches the best accuracy of 0.935 but only reduces the dimension to 1024.
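As a rough illustration of the comparison described above, the sketch below runs two of the listed reducers (LDA and sigmoid-kernel PCA) in front of a simple classifier. It is not the report's actual pipeline: it uses scikit-learn on synthetic data standing in for AwA2 features, and the component counts and classifier are placeholder choices. Note that LDA can project to at most `n_classes - 1` dimensions, which is why the report's AwA2 result (50 classes) lands at 49 dimensions.

```python
# Hedged sketch: LDA vs. sigmoid-kernel PCA as dimensionality reducers
# before a classifier. Synthetic data stands in for the AwA2 features
# used in the report; all hyperparameters here are illustrative.
from sklearn.datasets import make_classification
from sklearn.decomposition import KernelPCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=600, n_features=100,
                           n_informative=30, n_classes=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# LDA: at most n_classes - 1 components (4 here; 49 for AwA2's 50 classes).
lda = make_pipeline(LinearDiscriminantAnalysis(n_components=4),
                    LogisticRegression(max_iter=1000))
lda.fit(X_tr, y_tr)
print("LDA accuracy:", lda.score(X_te, y_te))

# Kernel PCA with a sigmoid kernel; n_components is a free hyperparameter.
kpca = make_pipeline(KernelPCA(n_components=32, kernel="sigmoid"),
                     LogisticRegression(max_iter=1000))
kpca.fit(X_tr, y_tr)
print("Kernel PCA accuracy:", kpca.score(X_te, y_te))
```

Comparing methods this way (reducer plus a fixed downstream classifier) is what lets accuracy be traded off against the reduced dimension, as in the LDA-vs-kernel-PCA result above.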
License: MIT License