perfume_recommender's Introduction

Perfume Recommender System

Overview

A recommendation system, also known as a recommender system, is a class of algorithms and techniques used in information filtering and decision-making processes. Its main purpose is to give users personalized suggestions for items they might be interested in. These items can be products, services, movies, music, articles, or anything else that can be recommended based on user preferences and behavior. This project builds a perfume recommendation system based on scent descriptions.

Objectives

Create a recommendation app based on perfume scents.

Note: The online app can be viewed at https://huggingface.co/spaces/rdemarqui/perfume_recommender

Technologies Used

  • python 3.9.16
  • pandas 1.5.3
  • numpy 1.22.4
  • sklearn 1.2.2
  • scipy 1.10.0
  • gradio 3.39.0

About the Data

The data was scraped from the site fragrantica.com [1]. The dataset contains 2,570 unique brands, 36,969 perfumes, and 2,145 different scents.

Methodology

Fragrance notes are the individual scent layers of ingredients that make up a fragrance. They are the building blocks of a fragrance and contribute to its overall scent profile. Fragrance notes are typically categorized into three main types: top notes, heart notes (also known as middle or mid notes), and base notes. Each note plays a specific role in the fragrance’s development and longevity [2].

For this work, we merged each perfume's scents into a single description, because not every sample in the dataset divides its scents by note type. This is a simplification: the division into notes influences how a perfume performs and, consequently, the real similarity between perfumes.

The first task, text cleanup, was relatively simple because the fields on the site were standardized. We only had to remove some special characters, convert the text to lowercase, and merge the note subcategories (top, middle, and base) into a single category. After this step, the perfume scents were vectorized using the bag-of-words method from sklearn. We chose this method for its simplicity and its good results in this specific case.

Cosine similarity was used to measure how similar two perfumes are. Due to memory limitations for storing the resulting matrix (36969 x 36969), we used the sparse output option. The application uses a precalculated base of similarities; for this reason, we saved two matrices: one with the perfume indices (vect_index) and another with the similarity values (vect_values) resulting from the cosine calculation. Both were saved in pickle format.
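One way to produce such a pair of matrices is sketched below. It assumes that "sparse output" refers to the `dense_output=False` argument of sklearn's `cosine_similarity`, and that each row of `vect_index` and `vect_values` holds the top-k most similar perfumes. The source does not confirm either detail, and the `top_k` helper, the toy data, and the file names are hypothetical.

```python
import os
import pickle
import tempfile

import numpy as np
from scipy.sparse import csr_matrix
from sklearn.metrics.pairwise import cosine_similarity

# Toy bag-of-words counts (rows: perfumes, columns: scent tokens); hypothetical data.
bow = csr_matrix(np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
], dtype=float))

# With sparse input, dense_output=False keeps the similarity matrix sparse.
sim = cosine_similarity(bow, dense_output=False)


def top_k(sim, k):
    """Return (indices, values) of the k most similar items for every row."""
    n = sim.shape[0]
    vect_index = np.zeros((n, k), dtype=np.int32)
    vect_values = np.zeros((n, k), dtype=np.float32)
    for i in range(n):
        row = sim[i].toarray().ravel()  # densify one row at a time
        row[i] = -1.0  # exclude the perfume itself from its own neighbors
        order = np.argsort(-row, kind="stable")[:k]
        vect_index[i] = order
        vect_values[i] = row[order]
    return vect_index, vect_values


vect_index, vect_values = top_k(sim, k=2)

# Save both arrays in pickle format, then load one back to check the round trip.
with tempfile.TemporaryDirectory() as tmp:
    index_path = os.path.join(tmp, "vect_index.pkl")
    values_path = os.path.join(tmp, "vect_values.pkl")
    with open(index_path, "wb") as f:
        pickle.dump(vect_index, f)
    with open(values_path, "wb") as f:
        pickle.dump(vect_values, f)
    with open(index_path, "rb") as f:
        loaded_index = pickle.load(f)
```

Keeping only k neighbors per perfume makes the stored arrays grow linearly with the number of perfumes, instead of quadratically as the full matrix would.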

Finally, we used Gradio to build a demo app, capable of performing the recommendation task based on brand and perfume name choices.
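A lookup of this kind could look like the sketch below. It assumes the precalculated arrays hold one row of neighbor indices and scores per perfume. The catalog, the `recommend` and `build_demo` functions, and the interface layout are hypothetical and are not taken from the deployed app.

```python
import numpy as np

# Hypothetical catalog and precalculated neighbors (row i belongs to catalog[i]).
catalog = [
    ("Brand A", "Perfume One"),
    ("Brand A", "Perfume Two"),
    ("Brand B", "Perfume Three"),
]
vect_index = np.array([[1, 2], [0, 2], [1, 0]])
vect_values = np.array([[0.9, 0.4], [0.9, 0.5], [0.5, 0.4]])


def recommend(brand, name, top_n=2):
    """Return (brand, name, score) tuples for the most similar perfumes."""
    try:
        row = catalog.index((brand, name))
    except ValueError:
        return []  # unknown brand/perfume combination
    neighbors = zip(vect_index[row][:top_n], vect_values[row][:top_n])
    return [(catalog[j][0], catalog[j][1], float(score)) for j, score in neighbors]


def build_demo():
    """Wrap recommend() in a small Gradio interface (not called in this sketch)."""
    import gradio as gr  # imported lazily so the lookup works without Gradio

    brands = sorted({brand for brand, _ in catalog})
    names = sorted({name for _, name in catalog})

    def show(brand, name):
        rows = recommend(brand, name)
        return "\n".join(f"{b} - {n} ({s:.2f})" for b, n, s in rows)

    return gr.Interface(
        fn=show,
        inputs=[
            gr.Dropdown(choices=brands, label="Brand"),
            gr.Dropdown(choices=names, label="Perfume"),
        ],
        outputs="text",
    )


print(recommend("Brand A", "Perfume One"))
```

Calling `build_demo().launch()` would start the interface locally. Because the similarities are precalculated, each request only needs an index lookup.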

Results and Conclusions

Even with limited resources and some simplifying assumptions, it was possible to create a reasonably good recommender system that delivers acceptable results. Below is a screenshot of the developed app.

This work came from a personal need: a perfume that I like (Hugo Boss Soul) was discontinued. Unfortunately, I couldn't find the top-ranked perfume in the generated list (Shirley May - Compass), but the second-ranked one is easy to find (Carolina Herrera - 212 Men White). I'll probably give it a chance...

Proposed future improvements: As mentioned above, the division of scents into notes is crucial to a perfume's character. Unfortunately, the dataset did not have this division for every perfume, so a richer dataset could likely yield better results. As for text embedding, this solution used the simplest form of vectorization (bag of words); other approaches could also be tested, such as TF-IDF, Word2vec, FastText, Doc2vec, and even transformers.
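As an illustration of the first alternative, the sketch below swaps bag of words for TF-IDF using sklearn's `TfidfVectorizer` and compares the resulting cosine similarities. The descriptions are hypothetical, and this comparison was not part of the original work.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical descriptions: "lemon" appears in every perfume,
# while "vanilla", "amber" and "rose" each appear only once.
descriptions = [
    "lemon musk vanilla",
    "lemon musk amber",
    "lemon rose",
]

# Same pipeline, two vectorizers.
bow_sim = cosine_similarity(CountVectorizer().fit_transform(descriptions))
tfidf_sim = cosine_similarity(TfidfVectorizer().fit_transform(descriptions))

# Perfumes 0 and 2 share only the very common note "lemon".
print("bag of words:", round(float(bow_sim[0, 2]), 3))
print("tf-idf:      ", round(float(tfidf_sim[0, 2]), 3))
```

In this toy example, the pair that shares only the ubiquitous note scores lower under TF-IDF than under bag of words, because TF-IDF gives less weight to tokens that appear in many descriptions. Whether that improves recommendations on the real dataset would need to be tested.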

References

perfume_recommender's People

Contributors

rdemarqui

perfume_recommender's Issues

Ratings?

Hey, did you scrape ratings? There is no such column in the database :(

Fragrantica Scraping

Hi, how did you manage to scrape data from Fragrantica with all those limits? I tried different approaches, but without good results. I'd like to scrape the whole database, so around +90 perfumes
