Git Product home page Git Product logo

kaggle-competition-favorita's Introduction

Kaggle-Competition-Favorita

This is the 5th place solution for Kaggle competition Favorita Grocery Sales Forecasting.

The Problem

This competition is a time series problem where we are required to predict the sales of different items in different stores for 16 days in the future, given the sales history and promotion info of these items. Additional information about the items and the stores are also provided. Dataset and detailed description can be found on the competition page: https://www.kaggle.com/c/favorita-grocery-sales-forecasting

Model Overview

I build 3 models: a Gradient Boosting, a CNN+DNN and a seq2seq RNN model. Final model was a weighted average of these models (where each model is stabilized by training multiple times with different random seeds then take the average). Each model separately can stay in top 1% in the final ranking.

LGBM: It is an upgraded model from the public kernels. More features, data and periods were fed to the model.

CNN+DNN: This is a traditional NN model, where the CNN part is a dilated causal convolution inspired by WaveNet, and the DNN part is 2 FC layers connected to raw sales sequences. Then the inputs are concatenated together with categorical embeddings and future promotions, and directly output to 16 future days of predictions.

RNN: This is a seq2seq model with a similar architecture of @Arthur Suilin's solution for the web traffic prediction. Encoder and decoder are both GRUs. The hidden states of the encoder are passed to the decoder through an FC layer connector. This is useful to improve the accuracy significantly.

How to Run the Model

Three models are in separate .py files as their filename tell.

Before running the models, download the data from the competition website, and add records of 0 with any existing store-item combo on every Dec 25th in the training data. Then use the function load_data() in Utils.py to load and transform the raw data files, and use save_unstack() to save them to feather files. In the model codes, change the input of load_unstack() to the filename you saved. Then the models can be runned. Please read the codes of these functions for more details.

Note: if you are not using a GPU, change CudnnGRU to GRU in seq2seq.py

kaggle-competition-favorita's People

Contributors

lenzdu avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.