Where To Sell Products

Author: Luis Eduardo Ferro Diez

This repository contains all my work for my MSc in Computer Science project.

Where To Sell Products (wtsp) is a project in which I try to answer exactly that question. The idea is to characterize geographic areas by their relationship with a selected set of products. The relationship is derived from text: I gathered geotagged tweets from Twitter and product reviews from Amazon. From the tweets I generated spatial clusters and aggregated all the tweets in each cluster into a single cluster-corpus. With the reviews I trained a convolutional neural network to classify product categories given several review texts. Finally, each cluster-corpus is submitted to the classifier, which emits a relevance score for some categories. The result is displayed on a map of an area of interest, e.g., a city, in which the clusters are shown with their relationship scores for certain product categories.
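
The clustering and aggregation step can be illustrated with a minimal Python sketch. It is not the project's implementation: the text above does not name the clustering algorithm, so fixed-size grid binning is used here as a simple stand-in, and the record shape, the function names, and the cell_deg and min_tweets parameters are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record shape: (latitude, longitude, text).
Tweet = Tuple[float, float, str]
Cell = Tuple[int, int]


def grid_cell(lat: float, lon: float, cell_deg: float) -> Cell:
    """Map a coordinate to the integer index of its grid cell."""
    return (int(lat // cell_deg), int(lon // cell_deg))


def build_cluster_corpora(
    tweets: Iterable[Tweet], cell_deg: float = 0.01, min_tweets: int = 2
) -> Dict[Cell, str]:
    """Group tweets by grid cell and join each group's texts into one corpus."""
    groups: Dict[Cell, List[str]] = defaultdict(list)
    for lat, lon, text in tweets:
        groups[grid_cell(lat, lon, cell_deg)].append(text)
    # Sparse cells are dropped; here they stand in for clustering noise.
    return {
        cell: "\n".join(texts)
        for cell, texts in groups.items()
        if len(texts) >= min_tweets
    }


if __name__ == "__main__":
    sample = [
        (34.0522, -118.2437, "new headphones sound great"),
        (34.0525, -118.2439, "looking for a laptop bag"),
        (40.7128, -74.0060, "coffee downtown"),
    ]
    for cell, corpus in build_cluster_corpora(sample).items():
        print(cell, repr(corpus))
```

Each resulting corpus is the unit that would then be passed to the product classifier. A density-based clustering method could replace the grid without changing the aggregation logic.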

Demo

Los Angeles (2013-07)

Vancouver (2013-07)

New York (2013-07)

Repository Structure

  • Data Preparation (dataprep): It contains Apache Spark data engineering pipelines to prepare the raw sources for model training.
  • Notebooks: It contains several notebooks with the plain experiments while developing the project, as well as a notebook showcasing the CLI usage.
  • Runtime environments (env): It contains conda and docker environment configuration recipes and files.

System Requirements

  • Java 1.8.x
  • Apache Spark >= 2.3
  • Conda >= 4.8.x
  • Python 3.7.x
  • CUDA 10.1
  • TensorFlow 2.1.0
  • Keras 2.3.1

Workflow

1. Gather the data

2. Execute the data preparation pipelines

Both the Twitter data and the product review data need to be pre-processed. For this there are two Spark projects under dataprep.

Tweets Transformer

This job takes the Twitter data, filters out the tweets that are not geotagged, and writes the result as Parquet files.
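
The filtering rule can be shown with a small sketch in plain Python. The actual job is a Spark pipeline, and this is not its code. The sketch assumes tweets arrive as JSON lines in the Twitter API v1.1 shape, where an exact location is a GeoJSON Point in the coordinates field. The function names and the output record layout are hypothetical, and the Parquet write step is left out.

```python
import json
from typing import Iterable, Iterator, Optional, Tuple


def extract_point(tweet: dict) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) if the tweet carries an exact point, else None."""
    coords = tweet.get("coordinates")
    if not isinstance(coords, dict) or coords.get("type") != "Point":
        return None
    pair = coords.get("coordinates")
    if not isinstance(pair, (list, tuple)) or len(pair) != 2:
        return None
    lon, lat = pair  # GeoJSON order is [longitude, latitude]
    return (float(lat), float(lon))


def keep_geotagged(lines: Iterable[str]) -> Iterator[dict]:
    """Parse JSON lines and yield a slim record for each geotagged tweet."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines
        if not isinstance(tweet, dict):
            continue
        point = extract_point(tweet)
        if point is None:
            continue  # not geotagged: filtered out
        yield {
            "id": tweet.get("id"),
            "text": tweet.get("text", ""),
            "lat": point[0],
            "lon": point[1],
        }
```

Whether the real job also accepts tweets that only carry a place bounding box is not stated here, so the sketch keeps exact points only.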

Amazon Product Reviews Transformer

This job takes the raw Amazon product reviews and product metadata and converts them into 'documents', where each document has categories and either a review text or a product description.
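
As a rough illustration of the 'document' shape, the sketch below joins reviews to product metadata in plain Python. It is only a sketch under assumptions, not the Spark job itself: the field names asin, reviewText, categories, and description are assumed input names, categories are treated as a flat list of strings, and build_documents with its output keys is hypothetical.

```python
from typing import Dict, Iterable, List


def build_documents(
    reviews: Iterable[dict], metadata: Iterable[dict]
) -> List[dict]:
    """Build documents that pair a product's categories with one text."""
    documents: List[dict] = []
    categories_by_product: Dict[str, List[str]] = {}

    # Pass 1: product metadata gives the categories and description documents.
    for product in metadata:
        product_id = product.get("asin")  # assumed product key
        categories = product.get("categories") or []
        if not product_id or not categories:
            continue  # a product without categories cannot label a document
        categories_by_product[product_id] = list(categories)
        description = (product.get("description") or "").strip()
        if description:
            documents.append({"categories": list(categories), "text": description})

    # Pass 2: each review inherits the categories of the product it reviews.
    for review in reviews:
        categories = categories_by_product.get(review.get("asin"))
        text = (review.get("reviewText") or "").strip()
        if categories and text:
            documents.append({"categories": list(categories), "text": text})

    return documents
```

Reviews whose product has no known categories are dropped in this sketch, since they could not be used as labeled training examples.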

3. Train the product classifier

Install the CLI and follow its instructions to create the embeddings and train the classifier with the transformed product documents.

4. Predict the geographic area categories

Use the CLI to detect and classify the geographic areas.

Detailed process

A more detailed description of the process is available in the Jupyter notebooks in this repository.

Docker

I have created a Docker image with the environment and CLI pre-installed and configured to run experiments starting from pre-processed data: https://hub.docker.com/r/ohtar10/wtsp

To download the image:

docker image pull ohtar10/wtsp:0.1.1

License

GNU GENERAL PUBLIC LICENSE Version 3, 29 June 2007

See the LICENSE file in the root of this project for license details.
