
dien_tf2

Deep Interest Evolution Network for Click-Through Rate Prediction

https://arxiv.org/abs/1809.03672

prepare data

method 1

You can download the data from the Amazon website and process it with the script:

sh prepare_data.sh

method 2 (recommended)

Because downloading and processing the data is time-consuming, we have already processed it and uploaded it for you. You can extract the archives and use them directly:

tar -jxvf data.tar.gz
mv data/* .
tar -jxvf data1.tar.gz
mv data1/* .
tar -jxvf data2.tar.gz
mv data2/* .
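The extraction steps above can also be scripted in Python. This is a minimal sketch under a few assumptions:

- The three archives sit in the working directory.
- Each archive unpacks into a top-level directory with the same base name (`data/`, `data1/`, `data2/`), as the `mv` commands imply.
- The helper name `extract_and_flatten` is hypothetical.

The sketch opens each archive with mode `"r:*"`, which lets `tarfile` detect the compression itself. The `-j` flag in the `tar` commands suggests bzip2 despite the `.gz` suffix.

```python
import shutil
import tarfile
from pathlib import Path

# Archive name -> top-level directory it is assumed to unpack into
# (mirrors the tar/mv commands above).
ARCHIVES = {"data.tar.gz": "data", "data1.tar.gz": "data1", "data2.tar.gz": "data2"}


def extract_and_flatten(workdir="."):
    """Extract each archive, then move its contents up into workdir."""
    root = Path(workdir)
    # Use the safer "data" extraction filter when this Python supports it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    for archive, subdir in ARCHIVES.items():
        # "r:*" auto-detects bzip2/gzip, so the .gz suffix does not matter.
        with tarfile.open(root / archive, "r:*") as tar:
            tar.extractall(root, **kwargs)
        # Equivalent of `mv dataN/* .`
        for item in (root / subdir).iterdir():
            shutil.move(str(item), str(root / item.name))
```

For example, `extract_and_flatten(".")` run from the repository root would have the same effect as the six shell commands. Like `mv`, it leaves the emptied `data*/` directories behind.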

Once the following files are present in the working directory, you can move on to training:

  • cat_voc.pkl
  • mid_voc.pkl
  • uid_voc.pkl
  • local_train_splitByUser
  • local_test_splitByUser
  • reviews-info
  • item-info
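A quick way to confirm that data preparation succeeded is to check for these files before training. The sketch below does that and also shows how one line of `local_train_splitByUser` might be parsed. The helpers `missing_files` and `parse_sample` are hypothetical.

The record layout is an assumption, because this README does not document it. The sketch treats each line as six tab-separated fields: label, uid, mid, cat, item history and category history. History entries are assumed to be joined by the `\x02` control character. Check this against your copy of the data.

```python
from pathlib import Path

# File names listed in this README; all are expected in the working directory.
REQUIRED_FILES = [
    "cat_voc.pkl", "mid_voc.pkl", "uid_voc.pkl",
    "local_train_splitByUser", "local_test_splitByUser",
    "reviews-info", "item-info",
]

# ASSUMPTION: separator between history items (not documented in the README).
HISTORY_SEP = "\x02"


def missing_files(workdir="."):
    """Return the required files that are not present in workdir."""
    root = Path(workdir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


def parse_sample(line):
    """Parse one line of local_{train,test}_splitByUser (assumed layout)."""
    label, uid, mid, cat, mid_hist, cat_hist = line.rstrip("\n").split("\t")[:6]
    return {
        "label": int(float(label)),
        "uid": uid,
        "mid": mid,
        "cat": cat,
        "mid_history": mid_hist.split(HISTORY_SEP) if mid_hist else [],
        "cat_history": cat_hist.split(HISTORY_SEP) if cat_hist else [],
    }
```

If `missing_files()` returns a non-empty list, re-run the extraction (or `sh prepare_data.sh`) before calling `train.py`.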

train model

python train.py train [model name] 

The following models are supported:

Note: we use TensorFlow 1.4.

Run the Benchmark

To run the benchmark, just run

./prepare_dataset.sh
./run.sh

prepare_dataset.sh only needs to be run once.

To run training only, use

./train.sh

and to run inference only, use

./infer.sh

Currently AI Matrix supports running on only one accelerator device, either a GPU or another AI accelerator. If you intend to test an NVIDIA GPU, select the GPU device with the following command before running the benchmark suite:

export CUDA_VISIBLE_DEVICES="DEVICE_NUMBER_TO_TEST"
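If you drive the benchmark from Python rather than from a shell, you can make the same device selection through `os.environ`. This is a minimal sketch, and the helper name `select_gpu` is hypothetical. The variable must be set before TensorFlow, or any other CUDA library, initialises CUDA in the process. Setting it afterwards does not change which devices are visible.

```python
import os


def select_gpu(device_number):
    """Expose a single GPU to this process via CUDA_VISIBLE_DEVICES.

    Call this before importing/initialising TensorFlow, because CUDA
    reads the variable when it is first initialised in the process.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = str(device_number)
    return os.environ["CUDA_VISIBLE_DEVICES"]
```

For example, call `select_gpu(0)` and only then `import tensorflow`. Inside the process the selected GPU is renumbered, so it appears as `/gpu:0` whatever its physical index is.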

Run the benchmark with synthetic input

Real industrial models usually have very large embedding tables that cannot fit into GPU memory. They are therefore kept in CPU RAM, and the CPU handles the corresponding embedding lookups and gradient updates. In most cases, however, the PCIe link between the CPU and the GPU becomes the bottleneck, so industrial solutions need innovative ways to balance this trade-off and achieve better performance. To mimic the real case of a huge embedding table, we feed synthetic random data to the model as input. This way, the embedding-related operations can be placed on the CPU side.
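To make the PCIe argument concrete, the sketch below does two things. It draws a synthetic batch of random ids in place of real features. It then estimates how many bytes of looked-up embeddings such a batch produces, since roughly that much data would cross PCIe when the table lives in CPU RAM.

This is an illustration, not the benchmark's synthetic-input code. The vocabulary sizes, `EMBEDDING_DIM`, `MAX_HISTORY` and the helper names are all hypothetical values chosen for the example.

```python
import random

# Hypothetical sizes for illustration only; not taken from the benchmark.
VOCAB_SIZES = {"uid": 1_000_000, "mid": 5_000_000, "cat": 10_000}
EMBEDDING_DIM = 18   # assumed embedding width
MAX_HISTORY = 100    # assumed maximum behaviour-sequence length


def synthetic_batch(batch_size, seed=None):
    """Draw random ids in place of real user/item/category features."""
    rng = random.Random(seed)
    batch = []
    for _ in range(batch_size):
        hist_len = rng.randint(1, MAX_HISTORY)
        batch.append({
            "uid": rng.randrange(VOCAB_SIZES["uid"]),
            "mid": rng.randrange(VOCAB_SIZES["mid"]),
            "cat": rng.randrange(VOCAB_SIZES["cat"]),
            "mid_history": [rng.randrange(VOCAB_SIZES["mid"]) for _ in range(hist_len)],
            "cat_history": [rng.randrange(VOCAB_SIZES["cat"]) for _ in range(hist_len)],
            "label": rng.randint(0, 1),
        })
    return batch


def embedding_bytes(batch, dim=EMBEDDING_DIM, bytes_per_float=4):
    """Bytes of float32 embeddings looked up for one batch (no padding).

    With the table in CPU RAM, about this much data goes to the GPU in the
    forward pass, and gradients of the same size come back.
    """
    # 3 single-id features (uid, mid, cat) plus every history entry.
    lookups = sum(3 + len(s["mid_history"]) + len(s["cat_history"]) for s in batch)
    return lookups * dim * bytes_per_float
```

For example, `embedding_bytes(synthetic_batch(32, seed=0))` gives the per-batch transfer for `--batch_size=32` under these assumed sizes. Real models usually pad histories to a fixed length, which increases this figure.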

# embedding on GPU
python script/train.py --mode=synthetic --batch_size=32 --model=DIEN --embedding_device=gpu

# embedding on CPU
python script/train.py --mode=synthetic --batch_size=32 --model=DIEN --embedding_device=cpu
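The `--embedding_device` flag decides where the embedding table and its lookups are placed. The sketch below is a hypothetical parser that mirrors the flags in the two commands above and maps the flag to a TensorFlow device string. It is not the parser in `script/train.py`; `parse_args`, `embedding_device_string` and the defaults are assumptions.

```python
import argparse

# Maps the --embedding_device values used above to TensorFlow device strings.
DEVICE_STRINGS = {"gpu": "/gpu:0", "cpu": "/cpu:0"}


def parse_args(argv=None):
    """Hypothetical parser mirroring the command-line flags shown above."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--mode", default="synthetic")
    parser.add_argument("--batch_size", type=int, default=32)
    parser.add_argument("--model", default="DIEN")
    parser.add_argument("--embedding_device", choices=sorted(DEVICE_STRINGS), default="gpu")
    return parser.parse_args(argv)


def embedding_device_string(args):
    """Return the device the embedding table and lookups would be pinned to."""
    return DEVICE_STRINGS[args.embedding_device]
```

In model code, the embedding variables and the `tf.nn.embedding_lookup` calls would then be created inside `with tf.device(embedding_device_string(args)):`. The rest of the network stays on the GPU.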
