
mal_reccos's Introduction

Anime Recommendation end to end deployment

NCF

What is NCF?

  • Collaborative filtering, a widely used method in recommendation systems, in its matrix-factorization form decomposes a matrix of users and their ratings of (or interactions with) items into the product of two smaller matrices. This yields embeddings of items and users in a shared latent feature space. Users and items can then be understood in terms of these latent features, and similarity can be extrapolated to items and users that have not yet interacted. (Image: Chupakhin, Andrei & Kolosov, Alexey & Smeliansky, Ruslan & Antonenko, Vitaly & Ishelev, G. (2020). New approach to MPI program execution time prediction.)
  • Item-user interactions in this latent space can be more complex than the dot product used to combine the embeddings can capture. Neural collaborative filtering (NCF) adds a feedforward neural network on top of the embeddings to represent a more complex, nonlinear function.
  • The following neural collaborative filtering model was implemented, following the guidance of the original paper. (Image: arXiv:1708.05031)
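
The difference between the two scoring functions described above can be sketched in a few lines of plain Python. This is only an illustration, not the model from the notebook: the embedding values, the layer weights, and the names `mf_score` and `ncf_score` are made up for the example, and a real model would learn all of these values during training.

```python
import math

def dot(a, b):
    """Dot product of two equal-length lists of floats."""
    return sum(x * y for x, y in zip(a, b))

def mf_score(user_vec, item_vec):
    """Matrix-factorization score: a plain dot product of the two embeddings."""
    return dot(user_vec, item_vec)

def ncf_score(user_vec, item_vec, hidden_w, hidden_b, out_w, out_b):
    """NCF-style score: concatenate the embeddings, apply one ReLU hidden
    layer, then squash the output to (0, 1) with a sigmoid."""
    x = user_vec + item_vec  # list concatenation of the two embeddings
    hidden = [max(0.0, dot(w, x) + b) for w, b in zip(hidden_w, hidden_b)]
    logit = dot(out_w, hidden) + out_b
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical 2-dimensional embeddings and layer weights.
user = [0.5, -1.0]
item = [1.0, 0.5]
hidden_w = [[0.2, -0.1, 0.4, 0.3], [-0.5, 0.1, 0.2, 0.0]]
hidden_b = [0.0, 0.1]
out_w = [1.0, -1.0]
out_b = 0.0

mf = mf_score(user, item)
ncf = ncf_score(user, item, hidden_w, hidden_b, out_w, out_b)
print(mf, ncf)
```

With these toy values the two embeddings are orthogonal, so the dot product carries no signal, while the small network can still produce a non-trivial interaction score from the same inputs.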

Code for the neural network implementation can be found here: KAGGLE NOTEBOOK


Serverless Orchestration

image

To create a machine-learning-powered anime recommendation website, this infrastructure was implemented using the Serverless Framework, which allows AWS Lambda services to be deployed easily in an infrastructure-as-code style.

To hold the sklearn, NumPy, and pandas libraries, a Docker image was created and pushed to ECR. The Lambda was configured to run on this container so that incoming anime queries could be encoded into the embedding space of the NCF model. From there, pandas and NumPy functions were used to calculate the top n most similar anime in the dataset.
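
The top-n lookup described above can be sketched as follows. This is a simplified, standard-library-only version; the project itself used pandas and NumPy, and the `embeddings` dictionary, its keys, and the `top_n_similar` helper are hypothetical stand-ins for the real embedding table.

```python
import heapq

def top_n_similar(query, embeddings, n=3):
    """Return the n items whose embeddings have the largest dot product
    with the embedding of the query item."""
    q = embeddings[query]
    scores = {
        title: sum(a * b for a, b in zip(q, vec))
        for title, vec in embeddings.items()
        if title != query  # never recommend the query item itself
    }
    return heapq.nlargest(n, scores, key=scores.get)

# Toy item embeddings (made-up values).
embeddings = {
    "A": [1.0, 0.0],
    "B": [0.9, 0.1],
    "C": [0.0, 1.0],
    "D": [-1.0, 0.0],
}
result = top_n_similar("A", embeddings, n=2)
print(result)
```

For a full dataset, the same ranking would normally be computed with one vectorized matrix-vector product rather than a Python loop.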

Because loading the dataset with pandas was expensive, caching was used so that subsequent requests, once the Lambda had warmed up, were much faster.
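
The source does not say which caching mechanism was used. One common pattern, sketched below as an assumption, keeps the loaded data in module-level state, which a warm Lambda container reuses between invocations. The names `load_dataset`, `get_dataset`, and `_CACHE` are hypothetical, and the loader returns toy data in place of the real pandas load.

```python
# Module-level state is kept for as long as the Lambda container stays warm.
_CACHE = {}

def load_dataset():
    """Placeholder for the expensive load (for example, reading the anime
    table with pandas). It counts its calls so the caching is observable."""
    load_dataset.calls += 1
    return {"A": [1.0, 0.0], "B": [0.9, 0.1]}

load_dataset.calls = 0

def get_dataset():
    """Load the dataset once per container and reuse it afterwards."""
    if "dataset" not in _CACHE:
        _CACHE["dataset"] = load_dataset()
    return _CACHE["dataset"]

def handler(event, context):
    """Lambda entry point: only the first (cold) invocation pays the load cost."""
    dataset = get_dataset()
    return {"statusCode": 200, "known": event.get("title") in dataset}
```

Calling `handler` repeatedly in the same process triggers `load_dataset` only once, which mirrors how requests after the first one avoid the expensive load.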


LLM

As an extra exploration step, HuggingFace's sentence-transformers library was used to encode anime synopses as numerical vectors produced by a language model. This allowed a similarity score to be computed pairwise between all of the anime in the dataset, based on the language model's representation of the text. So instead of the user interactions used by the NCF model, the textual similarity of the synopses served as an alternative recommendation style.


Similarity score

The dot product was used as the NCF similarity score, and cosine similarity was used for LLM similarity. Cosine similarity is simply the dot product of two vectors divided by the product of their magnitudes: cos(θ) = (A · B) / (‖A‖ ‖B‖). (Image: Almatrooshi, Fatima & Akour, Iman & Alhammadi, Sumayya & Shaalan, Khaled & Salloum, Said. (2020). A Recommendation System for Diabetes Detection and Treatment. 10.1109/CCCI49893.2020.9256676.)

  • This is a desirable metric for text embeddings when the magnitude of a vector (for example, how often a word occurs) should not influence the score.
  • The dot product was used for the NCF approach because it is faster to compute. Because the user-item data fed to the NCF model was binary (1 = interaction, 0 = no interaction), the resulting embeddings were treated as already normalized, so the extra normalization step was skipped.
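
The relationship between the two scores can be checked with a short sketch. It uses only the standard library and made-up vectors: the cosine similarity of two vectors equals the dot product of their unit-length versions, which is why the dot product alone suffices when embeddings are already normalized. The helper names are illustrative.

```python
import math

def dot(a, b):
    """Dot product of two equal-length lists of floats."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Euclidean length (magnitude) of a vector."""
    return math.sqrt(dot(a, a))

def cosine_similarity(a, b):
    """Dot product divided by the product of the two magnitudes."""
    return dot(a, b) / (norm(a) * norm(b))

def normalize(a):
    """Scale a vector to unit length."""
    length = norm(a)
    return [x / length for x in a]

a = [3.0, 4.0]
b = [4.0, 3.0]

cos_ab = cosine_similarity(a, b)
dot_unit = dot(normalize(a), normalize(b))           # same value as cos_ab
cos_scaled = cosine_similarity([30.0, 40.0], b)      # magnitude does not matter
print(cos_ab, dot_unit, cos_scaled)
```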

mal_reccos's People

Contributors

ubitquitin
