eugenelin89 / recommender_content_based
A Recommender System Prototype using Content Based Filtering
Recommendation System with Content-Based Filtering

Please see the project page at: http://eugenelin89.github.io/recommender_content_based/

SETUP

The system is built with LensKit, an open-source toolkit for building recommenders. It requires the following:

- Java SE Development Kit 7 (http://www.oracle.com/technetwork/java/javase/downloads/index.html)
- LensKit (http://lenskit.grouplens.org)
- Apache Maven (https://maven.apache.org)
- A Java IDE such as Eclipse (http://www.eclipse.org/) or IntelliJ IDEA (http://www.jetbrains.com/idea/)

BACKGROUND

This recommendation system prototype uses content-based filtering. For detailed background, please refer to: http://recommender-systems.org/content-based-filtering/

The algorithm implemented in this prototype is composed of the following steps:

1. Compute item-tag vectors (the model)

The model builder (TFIDFModelBuilder, in its get() method) computes the unit-normalized TF-IDF vector for each movie in the data set. The resulting model maps each item ID to that item's TF-IDF vector, normalized to unit length.

2. Build a user profile for each query user

The makeUserVector(long) method of TFIDFItemScorer takes a user ID and produces a vector representing that user's profile. In the original implementation, the profile was the sum of the item-tag vectors of all items the user had rated positively (>= 3.5 stars). This approach was later replaced with a weighted user profile (the older implementation is left commented out for reference). The weighted profile is a weighted sum of the item vectors, with each item's weight based on the user's rating.

3. Generate item scores for each user

The heart of the recommendation process in many LensKit recommenders is the score method of the item scorer, in this case TFIDFItemScorer. This method scores each item using cosine similarity: the score for an item is the cosine between that item's tag vector and the user's profile vector.
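The three steps above can be sketched in plain Java without LensKit. This is a minimal illustration, not the project's TFIDFModelBuilder or TFIDFItemScorer code. The class name ContentBasedSketch, its method names, and the sample tags are hypothetical. The sketch also assumes Java 8 or later, a raw-count term frequency with IDF = log(N / df), and a weight equal to the user's rating minus that user's mean rating; the README only says the weights are based on the rating, so the actual weighting may differ.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only; not the LensKit-based implementation. */
final class ContentBasedSketch {

    /** Step 1: build a unit-normalized TF-IDF vector for each item from its tags. */
    static Map<Long, Map<String, Double>> buildModel(Map<Long, List<String>> itemTags) {
        Map<String, Integer> docFreq = new HashMap<>();   // number of items carrying each tag
        Map<Long, Map<String, Double>> model = new HashMap<>();
        for (Map.Entry<Long, List<String>> e : itemTags.entrySet()) {
            Map<String, Double> counts = new HashMap<>();
            for (String tag : e.getValue()) {
                counts.merge(tag, 1.0, Double::sum);      // term frequency = raw count
            }
            for (String tag : counts.keySet()) {
                docFreq.merge(tag, 1, Integer::sum);
            }
            model.put(e.getKey(), counts);
        }
        int n = itemTags.size();
        for (Map<String, Double> vec : model.values()) {
            // Assumed IDF variant: log(N / df).
            vec.replaceAll((tag, count) -> count * Math.log((double) n / docFreq.get(tag)));
            double len = length(vec);
            if (len > 0) {
                vec.replaceAll((tag, w) -> w / len);      // normalize to unit length
            }
        }
        return model;
    }

    /** Step 2: weighted profile = sum of item vectors scaled by a rating-based weight. */
    static Map<String, Double> userProfile(Map<Long, Double> ratings,
                                           Map<Long, Map<String, Double>> model) {
        double mean = 0.0;
        for (double r : ratings.values()) {
            mean += r;
        }
        if (!ratings.isEmpty()) {
            mean /= ratings.size();
        }
        Map<String, Double> profile = new HashMap<>();
        for (Map.Entry<Long, Double> e : ratings.entrySet()) {
            Map<String, Double> vec = model.get(e.getKey());
            if (vec == null) {
                continue;                                  // rated item has no tag vector
            }
            double weight = e.getValue() - mean;           // assumed weighting scheme
            for (Map.Entry<String, Double> t : vec.entrySet()) {
                profile.merge(t.getKey(), weight * t.getValue(), Double::sum);
            }
        }
        return profile;
    }

    /** Step 3: score = cosine between an item vector and the user profile. */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        double denom = length(a) * length(b);
        return denom == 0 ? 0.0 : dot / denom;
    }

    /** Euclidean length of a sparse vector. */
    static double length(Map<String, Double> v) {
        double sumSq = 0.0;
        for (double x : v.values()) {
            sumSq += x * x;
        }
        return Math.sqrt(sumSq);
    }

    public static void main(String[] args) {
        Map<Long, List<String>> tags = new HashMap<>();    // made-up sample data
        tags.put(1L, Arrays.asList("space", "robot"));
        tags.put(2L, Arrays.asList("space", "war"));
        tags.put(3L, Arrays.asList("romance", "comedy"));
        Map<Long, Map<String, Double>> model = buildModel(tags);

        Map<Long, Double> ratings = new HashMap<>();
        ratings.put(1L, 5.0);
        ratings.put(3L, 2.0);
        Map<String, Double> profile = userProfile(ratings, model);

        for (Map.Entry<Long, Map<String, Double>> e : model.entrySet()) {
            System.out.printf("%d: %.4f%n", e.getKey(), cosine(e.getValue(), profile));
        }
    }
}
```

With this made-up data, item 1 (rated above the user's mean) scores highest, item 2 (unrated, but sharing the "space" tag with item 1) gets a small positive score, and item 3 (rated below the mean) scores negative.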
TEST DRIVE

A set of test data is provided for movie ratings, but it can easily be adapted for other domains.

data/movie-tags.csv
    Attributes associated with each movie. This is the basis of the model generated in step 1.

data/movie-titles.csv
    Maps movie IDs to movie titles.

data/ratings.csv
    Users and their movie ratings. Each line of the CSV file is ordered as: User ID, Movie ID, Rating

data/users.csv
    Maps user IDs to user names.

The test data is injected into the system in the configureRecommender() method of CBFMain.java.

Run the recommender with a command similar to the following, where the arguments are the user IDs:

    runecbf 4045 144 3855 1637 2919

With the above command, the following output is produced:

    recommendations for user 4045:
      807: 0.1932
      63: 0.1438
      187: 0.0947
      11: 0.0900
      641: 0.0471
    recommendations for user 144:
      11: 0.1394
      585: 0.1229
      671: 0.1130
      672: 0.0878
      141: 0.0436
    recommendations for user 3855:
      1892: 0.2243
      1894: 0.1465
      604: 0.1258
      462: 0.1050
      10020: 0.0898
    recommendations for user 1637:
      393: 0.1976
      24: 0.1900
      2164: 0.1522
      601: 0.1334
      5503: 0.0992
    recommendations for user 2919:
      180: 0.1454
      11: 0.1238
      1891: 0.1172
      424: 0.1074
      2501: 0.0973
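To make the ratings.csv layout concrete, here is a small Java sketch of parsing one line in the order given above (User ID, Movie ID, Rating). It assumes comma-separated fields with no header row and no quoting, which the README does not confirm. The class name RatingLineSketch and the sample line are hypothetical; in particular, the rating value in the sample is made up and not taken from the data set. In the project itself, the data is loaded through the configuration in CBFMain.java rather than by code like this.

```java
/** Illustrative sketch only: one parsed line of a ratings file. */
final class RatingLineSketch {
    final long userId;
    final long movieId;
    final double rating;

    RatingLineSketch(long userId, long movieId, double rating) {
        this.userId = userId;
        this.movieId = movieId;
        this.rating = rating;
    }

    /** Parses "userId,movieId,rating"; assumes plain comma-separated fields. */
    static RatingLineSketch parse(String line) {
        String[] parts = line.split(",");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected 3 fields, got " + parts.length);
        }
        return new RatingLineSketch(
                Long.parseLong(parts[0].trim()),
                Long.parseLong(parts[1].trim()),
                Double.parseDouble(parts[2].trim()));
    }

    public static void main(String[] args) {
        // Hypothetical sample line; the rating value is invented for illustration.
        RatingLineSketch r = parse("4045,807,4.5");
        System.out.println("user " + r.userId + " rated movie " + r.movieId + ": " + r.rating);
    }
}
```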