This project is a Hadoop/MapReduce implementation of the k-nearest neighbors (KNN) similarity algorithm. Our main contribution is parallelizing KNN's training method so that it can run under MapReduce.
- smalltest.txt - A small database to test with. Can be run through the entire process.
- chunkit.py - Python script to chunk up a database file to be consumed by MapReduce.
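To illustrate the chunking step, here is a minimal sketch of splitting a database file into pieces by line count. It is not the actual chunkit.py: the function name `chunk_file`, the `lines_per_chunk` parameter, and the `chunk_NNNNN.txt` naming are all assumptions made for this example, and the real script may split on record boundaries instead.

```python
from itertools import islice
from pathlib import Path


def chunk_file(src, out_dir, lines_per_chunk=1000):
    """Split src into files of at most lines_per_chunk lines each.

    Hypothetical helper for illustration; returns the chunk paths in order.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    with open(src, encoding="utf-8") as f:
        while True:
            # Read the next block of lines without loading the whole file.
            block = list(islice(f, lines_per_chunk))
            if not block:
                break
            path = out / f"chunk_{len(paths):05d}.txt"
            path.write_text("".join(block), encoding="utf-8")
            paths.append(path)
    return paths
```

The resulting directory could then be passed as dirContainingChunks to the MapReduce neighborhood generator.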
-k N sets the number of similarities (nearest neighbors) to generate per song. -r N sets the minimum number of ratings a similarity must have to be considered valid.
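The sketch below shows one way k and r could interact when building a song's neighborhood: candidates with fewer than r ratings are dropped, then the k most similar remain. The function `top_k_neighbors` and the `(song_id, similarity, num_ratings)` tuple layout are assumptions for illustration, not the project's actual data structures.

```python
import heapq


def top_k_neighbors(candidates, k, r):
    """Return the k most similar candidates that have at least r ratings.

    candidates: iterable of (song_id, similarity, num_ratings) tuples
    (a hypothetical layout chosen for this example).
    """
    # Discard similarities backed by too few ratings to be valid.
    valid = [c for c in candidates if c[2] >= r]
    # Keep the k highest-similarity entries, best first.
    return heapq.nlargest(k, valid, key=lambda c: c[1])
```

For example, with k = 2 and r = 3, a candidate with similarity 0.95 but only one rating would be excluded in favor of lower-scoring candidates that meet the rating threshold.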
Sequential neighborhood generator:
KDD-Music-Recommender.jar -k N -r N database
MapReduce neighborhood generator:
hadoop jar KDD-Music-Recommender.jar -p [-k N] dirContainingChunks output
Query the neighborhood file:
KDD-Music-Recommender.jar -q -t D -n neighborhoodFile -u activeUserFile database