Multimodal Product Data Classification


To complete our training as Data Scientists, we developed this project as a team of 4 people.
For this [contest organized by ENS](https://challengedata.ens.fr/participants/challenges/35/), we worked on the classification of e-commerce articles by developing and aggregating several models.
The data provided for each article included both text (title and description) and a picture.

Demo

Visit our Streamlit demo here
Features:

  • Predict the classification of a random article (or even an article loaded from Amazon/Rakuten, or manually inputted)
  • Calculate the probabilities using your own combination of all 3 models
  • Explore the dataset with a dynamic EDA
  • ...
  • ...
    Page Preview:



Dataset

99 000 articles (85 000 in train + 14 000 in test) and 27 categories
Each article includes:
  • text data (2 fields: description and title)
  • one picture

EDA

Figure: 15 most frequent words from the description field for category/class #1560
Figure: 15 random images for category/class #10

Model 1: Random Forest

Feature engineering

  • tf-idf for category/class 1281
  • frequency of regular expressions for each category
  • % of pixels in green for each category
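
The three kinds of features listed above can be sketched per article roughly as follows. This is a minimal illustration in plain Python and not the project's actual code: the function names, the tokenization, the plain `tf * log(N/df)` weighting, and the definition of a "green" pixel (green channel strictly larger than red and blue) are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df).

    Hypothetical helper: a library vectorizer would usually add smoothing
    and normalization on top of this basic formula.
    """
    tokenized = [re.findall(r"\w+", doc.lower()) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


def regex_frequency(text, pattern):
    """Count non-overlapping matches of `pattern` in `text`."""
    return len(re.findall(pattern, text))


def green_pixel_share(pixels):
    """Share of (r, g, b) pixels whose green channel dominates red and blue."""
    if not pixels:
        return 0.0
    green = sum(1 for r, g, b in pixels if g > r and g > b)
    return green / len(pixels)
```

For instance, `regex_frequency(title, r"\d+")` would count the numbers in a title, and `green_pixel_share` would be fed the flattened pixel list of an article's picture. Averaging these values per category gives the kind of per-category view listed above.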

Best Model selected




Result: Accuracy 0.77

Model 2: Convolutional neural network (on the pictures)


Result: Accuracy 0.58

Model 3: Dense neural network (on the text)


Result: Accuracy 0.82

Final Model: Weighted Voting Classifier combining all 3 models

Result: Accuracy 0.84
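
A weighted vote of this kind can be sketched as a weighted average of each model's class probabilities (soft voting). The snippet below is only an illustration under that assumption: the README does not state whether the vote was soft or hard, the function name is hypothetical, and the weights and probabilities are made-up values over 3 classes instead of the real 27.

```python
def weighted_soft_vote(probas, weights):
    """Combine per-model class probabilities with a weighted average.

    probas  -- one probability list per model, all of the same length
    weights -- one non-negative weight per model (illustrative values only)
    Returns (index of the winning class, combined probabilities).
    """
    if len(probas) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    n_classes = len(probas[0])
    combined = [
        sum(w * p[k] for p, w in zip(probas, weights)) / total
        for k in range(n_classes)
    ]
    best = max(range(n_classes), key=combined.__getitem__)
    return best, combined


# Made-up outputs of the three models for a single article.
random_forest = [0.6, 0.3, 0.1]
cnn_on_pictures = [0.2, 0.5, 0.3]
dense_on_text = [0.3, 0.6, 0.1]

best, combined = weighted_soft_vote(
    [random_forest, cnn_on_pictures, dense_on_text],
    weights=[1, 1, 2],  # hypothetical: the text model counts double
)
```

Giving a larger weight to the stronger model lets it dominate when the models disagree, while the weaker models can still tip close calls.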
