article-recommendation-engine

Building an Article Recommendation Engine using Word2vec and Stanford's GloVe

Introduction

In this project, I built a simple yet effective article recommendation engine using Word2vec, a popular word embedding technique, and pre-trained word vectors from Stanford's GloVe project, which were trained on a Wikipedia corpus. With these tools, the engine recommends articles based on the semantic similarity of their content.

Methodology

  1. Obtain pre-trained word vectors: I use the pre-trained word vectors from Stanford's GloVe project, which were trained on a Wikipedia corpus. These vectors represent each word as a point in a high-dimensional space in which semantically related words lie close together.

  2. Prepare the dataset: For this project, I use a collection of BBC articles. After obtaining the dataset, I process the text of each article and organize the articles into a convenient table format (a list of lists) for further analysis.

  3. Compute document centroids: For each article, I calculate its centroid in the word vector space. The centroid is the sum of the word vectors of all words in the article divided by the total number of words, so it represents the overall semantic meaning of the article as a single vector.

  4. Measure similarity: To find related articles, I compute the distance between the centroids of different documents. Articles whose centroids lie close to each other are considered semantically similar.

  5. Build the web server: I develop a web server that displays a list of BBC articles. The server is accessible at http://localhost:5000 for local testing, or at the IP address of an Amazon server when deployed.

  6. Integrate the recommendation engine: The web server uses the recommendation engine to suggest similar articles based on the user's selection. When a user clicks on an article, the server displays a list of recommended articles whose centroids are closest to that of the selected article.
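
Steps 1 to 4 can be sketched roughly as follows. This is a minimal illustration, not the code in doc2vec.py: the function names (`load_glove`, `centroid`, `distance`, `recommend`), the toy vectors, and the toy articles are all hypothetical. It assumes GloVe's plain-text format (a word followed by its space-separated components on each line), Euclidean distance between centroids, and that words without a vector are skipped when averaging. It uses only the standard library to stay self-contained; a real implementation would more likely use NumPy.

```python
import math


def load_glove(lines):
    """Parse GloVe-style text lines ("word v1 v2 ...") into a dict of vectors."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def centroid(words, vectors):
    """Average the vectors of the words that have one (assumption: unknown
    words are ignored). Returns None if no word is known."""
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def distance(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recommend(title, articles, vectors, n=5):
    """articles is a list of [title, text] rows. Returns the titles of the
    n articles whose centroids are closest to the centroid of `title`."""
    cents = {t: centroid(text.lower().split(), vectors) for t, text in articles}
    target = cents[title]
    if target is None:
        return []
    scored = [
        (distance(target, c), t)
        for t, c in cents.items()
        if t != title and c is not None
    ]
    return [t for _, t in sorted(scored)[:n]]


# Toy two-dimensional "embeddings" standing in for the real GloVe file.
toy_glove = [
    "football 1.0 0.0",
    "goal 0.9 0.1",
    "market 0.0 1.0",
    "shares 0.1 0.9",
]
vectors = load_glove(toy_glove)

# Toy table of articles in the list-of-lists layout described in step 2.
articles = [
    ["sport-1", "football goal"],
    ["sport-2", "goal goal football"],
    ["business-1", "market shares"],
]

print(recommend("sport-1", articles, vectors, n=2))  # ['sport-2', 'business-1']
```

The other sports article ranks first because its centroid is almost identical to the selected one, while the business article sits far away in the vector space.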

Delivery

The home page shows the list of BBC articles. Clicking on one of those articles takes you to an article page that shows the text of the article as well as a list of five recommended articles.
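
The two pages described here could be served by something like the sketch below. It is only an illustration of the page flow, not the project's server: it is written as a bare WSGI callable using the standard library so that it has no dependencies, and the `ARTICLES` and `RECOMMENDATIONS` tables, the `/article/` URL scheme, and the `link` helper are all hypothetical. In the real project the recommendations would come from the centroid-based engine rather than a fixed table.

```python
from html import escape
from urllib.parse import quote

# Hypothetical in-memory data standing in for the BBC articles and for the
# output of the recommendation engine.
ARTICLES = {
    "sport/001": "Text of a sport article.",
    "business/001": "Text of a business article.",
}
RECOMMENDATIONS = {
    "sport/001": ["business/001"],
    "business/001": ["sport/001"],
}


def link(key):
    """Render one list item linking to an article page."""
    return '<li><a href="/article/%s">%s</a></li>' % (quote(key), escape(key))


def app(environ, start_response):
    """WSGI application: "/" lists all articles, "/article/<key>" shows one
    article followed by up to five recommended articles."""
    path = environ.get("PATH_INFO", "/")
    prefix = "/article/"
    if path == "/":
        status = "200 OK"
        body = "<ul>%s</ul>" % "".join(link(k) for k in sorted(ARTICLES))
    elif path.startswith(prefix) and path[len(prefix):] in ARTICLES:
        key = path[len(prefix):]
        recs = "".join(link(k) for k in RECOMMENDATIONS.get(key, [])[:5])
        status = "200 OK"
        body = "<p>%s</p><h3>Recommended</h3><ul>%s</ul>" % (
            escape(ARTICLES[key]),
            recs,
        )
    else:
        status = "404 Not Found"
        body = "Not found"
    data = body.encode("utf-8")
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(data))),
    ]
    start_response(status, headers)
    return [data]

# To try it locally on port 5000 (blocks until interrupted):
#   from wsgiref.simple_server import make_server
#   make_server("localhost", 5000, app).serve_forever()
```

Because `app` is a plain WSGI callable, it can be run by any standalone WSGI container when deployed.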

The main functions of the algorithm are implemented in doc2vec.py.

A walkthrough of the functions is in walkthrough.ipynb.

References:

Standalone WSGI Containers

Efficient Estimation of Word Representations in Vector Space

Word similarity and relationships

GloVe: Global Vectors for Word Representation

Contributors

hxu47
