Git Product home page Git Product logo

joshasgard / article-recommendation-engine Goto Github PK

View Code? Open in Web Editor NEW
2.0 1.0 3.0 4.63 MB

Built article recommendation engines for IBM Watson's platform users. Used collaborative filtering, content-based, ranking and knowledge-based techniques.

HTML 75.46% Jupyter Notebook 23.91% Python 0.62%
recommendation-engine coldstartchallenge nlp collaborative-filtering content-based-recommendation knowledge-based-recommendation

article-recommendation-engine's Introduction

Article Recommendation Engine for IBM Watson Studio Platform

Article recommendation engines built for IBM Watson's platform users with collaborative filtering, content-based, ranking and knowledge-based techniques!

Tools๐Ÿ“ขโœ๏ธ

  • Python 3.7 and above, Jupyter Notebook
  • Pandas & Numpy for data manipulations
  • Matplotlib for charts
  • pickle for imports
  • RE and NLTK for natural language processing (tokenization)
  • Sklearn for word vectorization and term frequency-inverse document frequency transformation

File Descriptions ๐Ÿ“‚

  • data folder - every information on every article available to users on the platform is contained in articles_community.csv while tracked 'user-item-interactions.csv` holds all data of how over 5000 users interacted with 714 articles on the platform.
  • Recommendations_with_IBM.ipynb - raw Jupyer Notebook containing all Python codes showcasing the building process and end results.
  • 'project_tests.py` - python script for testing functions in notebook
  • top_10.p, top_20.p and top_5.p - pickled data of from notebook of top 10, 20 and 5 articles on the platform
  • user-item-matrix.p - pickled user to article matrix.

Summary ๐ŸŽฏ

As I mentioned earlier, I built out different article recommendation systems for the IBM Watson Studio Platform for users on the platform to receive more personalised article recommendations and to mitigate the cold start problem for new users. The Jupyter Notebook containing this work has been divided into sections to improve the readability and structure.

Exploratory Data Analysis

I explored the data provided by IBM for the project to answer the following questions:

  • What is the distribution of how many articles a user interacts with in the dataset?
  • The number of unique articles that have an interaction with a user.
  • The number of unique articles in the dataset (whether they have any interactions or not).
  • The number of unique users in the dataset. (excluding null values)
  • The number of user-article interactions in the dataset.
  • The most viewed article_id

Rank-based Recommendations

One way to create recommendations is to rank items (articles) based on existing user rating, but the data provided does not have such information. Hence, we can only assume that the most popular articles are the ones with the most user interactions, i.e. how often an article was interacted with. This means we created functions to:

  • return the names of top articles with most interactions
  • return the IDs of these top article

User-User Based Collaborative Filtering

In order to build better recommendations for the users of IBM's platform, we can explore the similarity of users on the platform based on the range of articles they have interacted with, then we can make recommendations based off of the differences in the most similar users. This is a step towards making more personalised recommendations for the users on the platform. Our measure of similarity between the users was found by simply taking vector dot products between users. Here are the major steps I followed to achieve this:

  • Re-formatted user-items interaction data to show unique users and articles
  • Created a boolean matrix between each user and each article (user-item pairs) to show interaction. 1 for interation, 0 otherwise
  • Found similar users based on similarity of their interactions
  • Returned article names based on user IDs
  • Made recommendations by defining a User-user collaborative Filtering function.

Content-based Recommendations with NLP

Since we have sufficient content information about the articles in our articles_community.csv file, there are different ways we can use the article content to find similarities between the articles so we can make recommendations to interested users.

  • Using article titles to find similarity
  • Using article description to find similarity
  • Using entire article body to find similarity
  • Using any combination of the 3 above.

Here, I have opted for a combination of the article titles and description to determine which articles are most similar. I implemented this as follows:

  • Wrote a Tokenization function to tokenize the title of each article
  • Created df_merged to combine all the articles available on the platform, whether or not any user interacted with the article
  • Found similar articles function using the output vector of tfidf class to determine which articles are the most similar to the article
  • Wrote a function that performs content recommendations

Matrix Factorization with ML for COLD-START

Lastly, I completed a machine learning approach to building recommendations. Using the user-item interactions, I built out a matrix decomposition with SVD. Using this decomposition, I evaluated how accurate our predictions might be for articles that new users might interact. Finally, I discussed which methods we might use for recommending articles on the platform to new users and established users on the IBM Watson Studio Platform.

Licence ๐Ÿ“ƒ

The MIT License (MIT)

Copyright (c) 2021 Idowu Odesanmi. All rights reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE..

Credits ๐Ÿš€

article-recommendation-engine's People

Contributors

joshasgard avatar

Stargazers

 avatar  avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.