
fuxictr's Introduction


Click-through rate (CTR) prediction is a critical task for industrial applications such as online advertising, recommender systems, and sponsored search. FuxiCTR provides an open-source library for CTR prediction, with key features in configurability, tunability, and reproducibility. We hope this project can promote reproducible research and benefit both researchers and practitioners in this field.

Key Features

  • Configurable: Both data preprocessing and models are modularized and configurable.

  • Tunable: Models can be automatically tuned through easy configurations.

  • Reproducible: All the benchmarks can be easily reproduced.

  • Extensible: New models can be added easily, with support for both PyTorch and TensorFlow frameworks.

Model Zoo

No Publication Model Paper Benchmark Version
📂 Feature Interaction Models
1 WWW'07 LR Predicting Clicks: Estimating the Click-Through Rate for New Ads 🚩Microsoft ↗️ torch
2 ICDM'10 FM Factorization Machines ↗️ torch
3 CIKM'13 DSSM Learning Deep Structured Semantic Models for Web Search using Clickthrough Data 🚩Microsoft ↗️ torch
4 CIKM'15 CCPM A Convolutional Click Prediction Model ↗️ torch
5 RecSys'16 FFM Field-aware Factorization Machines for CTR Prediction 🚩Criteo ↗️ torch
6 RecSys'16 DNN Deep Neural Networks for YouTube Recommendations 🚩Google ↗️ torch, tf
7 DLRS'16 Wide&Deep Wide & Deep Learning for Recommender Systems 🚩Google ↗️ torch, tf
8 ICDM'16 PNN Product-based Neural Networks for User Response Prediction ↗️ torch
9 KDD'16 DeepCrossing Deep Crossing: Web-Scale Modeling without Manually Crafted Combinatorial Features 🚩Microsoft ↗️ torch
10 NIPS'16 HOFM Higher-Order Factorization Machines ↗️ torch
11 IJCAI'17 DeepFM DeepFM: A Factorization-Machine based Neural Network for CTR Prediction 🚩Huawei ↗️ torch, tf
12 SIGIR'17 NFM Neural Factorization Machines for Sparse Predictive Analytics ↗️ torch
13 IJCAI'17 AFM Attentional Factorization Machines: Learning the Weight of Feature Interactions via Attention Networks ↗️ torch
14 ADKDD'17 DCN Deep & Cross Network for Ad Click Predictions 🚩Google ↗️ torch, tf
15 WWW'18 FwFM Field-weighted Factorization Machines for Click-Through Rate Prediction in Display Advertising 🚩Oath, TouchPal, LinkedIn, Alibaba ↗️ torch
16 KDD'18 xDeepFM xDeepFM: Combining Explicit and Implicit Feature Interactions for Recommender Systems 🚩Microsoft ↗️ torch
17 CIKM'19 FiGNN FiGNN: Modeling Feature Interactions via Graph Neural Networks for CTR Prediction ↗️ torch
18 CIKM'19 AutoInt/AutoInt+ AutoInt: Automatic Feature Interaction Learning via Self-Attentive Neural Networks ↗️ torch
19 RecSys'19 FiBiNET FiBiNET: Combining Feature Importance and Bilinear feature Interaction for Click-Through Rate Prediction 🚩Sina Weibo ↗️ torch
20 WWW'19 FGCNN Feature Generation by Convolutional Neural Network for Click-Through Rate Prediction 🚩Huawei ↗️ torch
21 AAAI'19 HFM/HFM+ Holographic Factorization Machines for Recommendation ↗️ torch
22 Arxiv'19 DLRM Deep Learning Recommendation Model for Personalization and Recommendation Systems 🚩Facebook ↗️ torch
23 NeuralNetworks'20 ONN Operation-aware Neural Networks for User Response Prediction ↗️ torch, tf
24 AAAI'20 AFN/AFN+ Adaptive Factorization Network: Learning Adaptive-Order Feature Interactions ↗️ torch
25 AAAI'20 LorentzFM Learning Feature Interactions with Lorentzian Factorization 🚩eBay ↗️ torch
26 WSDM'20 InterHAt Interpretable Click-through Rate Prediction through Hierarchical Attention 🚩NEC Labs, Google ↗️ torch
27 DLP-KDD'20 FLEN FLEN: Leveraging Field for Scalable CTR Prediction 🚩Tencent ↗️ torch
28 CIKM'20 DeepIM Deep Interaction Machine: A Simple but Effective Model for High-order Feature Interactions 🚩Alibaba, RealAI ↗️ torch
29 WWW'21 FmFM FM^2: Field-matrixed Factorization Machines for Recommender Systems 🚩Yahoo ↗️ torch
30 WWW'21 DCN-V2 DCN V2: Improved Deep & Cross Network and Practical Lessons for Web-scale Learning to Rank Systems 🚩Google ↗️ torch
31 CIKM'21 DESTINE Disentangled Self-Attentive Neural Networks for Click-Through Rate Prediction 🚩Alibaba ↗️ torch
32 CIKM'21 EDCN Enhancing Explicit and Implicit Feature Interactions via Information Sharing for Parallel Deep CTR Models 🚩Huawei ↗️ torch
33 DLP-KDD'21 MaskNet MaskNet: Introducing Feature-Wise Multiplication to CTR Ranking Models by Instance-Guided Mask 🚩Sina Weibo ↗️ torch
34 SIGIR'21 SAM Looking at CTR Prediction Again: Is Attention All You Need? 🚩BOSS Zhipin ↗️ torch
35 KDD'21 AOANet Architecture and Operation Adaptive Network for Online Recommendations 🚩Didi Chuxing ↗️ torch
36 AAAI'23 FinalMLP FinalMLP: An Enhanced Two-Stream MLP Model for CTR Prediction 🚩Huawei ↗️ torch
37 SIGIR'23 FinalNet FINAL: Factorized Interaction Layer for CTR Prediction 🚩Huawei ↗️ torch
38 CIKM'23 GDCN Towards Deeper, Lighter and Interpretable Cross Network for CTR Prediction 🚩Microsoft torch
📂 Behavior Sequence Modeling
39 KDD'18 DIN Deep Interest Network for Click-Through Rate Prediction 🚩Alibaba ↗️ torch
40 AAAI'19 DIEN Deep Interest Evolution Network for Click-Through Rate Prediction 🚩Alibaba ↗️ torch
41 DLP-KDD'19 BST Behavior Sequence Transformer for E-commerce Recommendation in Alibaba 🚩Alibaba ↗️ torch
42 CIKM'20 DMIN Deep Multi-Interest Network for Click-through Rate Prediction 🚩Alibaba torch
43 AAAI'20 DMR Deep Match to Rank Model for Personalized Click-Through Rate Prediction 🚩Alibaba torch
44 Arxiv'21 ETA End-to-End User Behavior Retrieval in Click-Through Rate Prediction Model 🚩Alibaba torch
45 CIKM'22 SDIM Sampling Is All You Need on Modeling Long-Term User Behaviors for CTR Prediction 🚩Meituan torch
📂 Dynamic Weight Network
46 NeurIPS'22 APG APG: Adaptive Parameter Generation Network for Click-Through Rate Prediction 🚩Alibaba torch
47 Arxiv'23 PPNet PEPNet: Parameter and Embedding Personalized Network for Infusing with Personalized Prior Information 🚩KuaiShou torch
📂 Multi-Task Modeling
48 MachineLearn'97 SharedBottom Multitask Learning torch
49 KDD'18 MMoE Modeling Task Relationships in Multi-task Learning with Multi-Gate Mixture-of-Experts 🚩Google torch
50 RecSys'20 PLE Progressive Layered Extraction (PLE): A Novel Multi-Task Learning (MTL) Model for Personalized Recommendations 🚩Tencent torch
📂 Multi-Domain Modeling
51 Arxiv'23 PEPNet PEPNet: Parameter and Embedding Personalized Network for Infusing with Personalized Prior Information 🚩KuaiShou torch

Benchmarking

We have benchmarked FuxiCTR models on a set of open datasets. The full benchmark settings and results are available in the BARS project (https://github.com/reczoo/BARS).

Dependencies

FuxiCTR has the following dependencies:

  • python 3.9+
  • pytorch 1.10+ (required only for Torch models)
  • tensorflow 2.1+ (required only for TF models)

Please install other required packages via pip install -r requirements.txt.

Quick Start

  1. Run the demo examples

    Examples are provided in the demo directory to show the basic usage of FuxiCTR. Users can run these examples to get started quickly and to understand the workflow.

    cd demo
    python example1_build_dataset_to_h5.py
    python example2_DeepFM_with_h5_input.py
    
  2. Run a model on tiny data

    Users can easily run any model in the model zoo following the commands below, which demonstrate running DCN. In addition, users can modify the dataset config and model config files to run on their own datasets or with new hyper-parameters. More details can be found in the README.

    cd model_zoo/DCN/DCN_torch
    python run_expid.py --expid DCN_test --gpu 0
    
    # Change `MODEL` according to the target model name
    cd model_zoo/MODEL_PATH
    python run_expid.py --expid MODEL_test --gpu 0
    
  3. Run a model on benchmark datasets (e.g., Criteo)

    Users can follow the benchmark section to get benchmark datasets and running steps for reproducing the existing results. Please see an example here: https://github.com/reczoo/BARS/tree/main/ranking/ctr/DCNv2/DCNv2_criteo_x1

  4. Implement a new model

    The FuxiCTR library is designed to be modular, so that every component can be overridden by users according to their needs. In many cases, only the model class needs to be implemented for a new customized model. If the data preprocessing or data loader is not directly applicable, one can also override them through the core APIs. A concrete example is our new model FinalMLP, recently published at AAAI 2023; a minimal sketch of a custom model class is shown below.
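    As an illustration only, the sketch below outlines what such a model class might look like. It assumes the class and layer names used in our PyTorch code base (BaseModel, FeatureEmbedding, MLP_Block); please check them against the current API before use, as this is not a drop-in implementation.

    import torch
    from fuxictr.pytorch.models import BaseModel
    from fuxictr.pytorch.layers import FeatureEmbedding, MLP_Block

    class MyModel(BaseModel):
        def __init__(self, feature_map, model_id="MyModel", learning_rate=1e-3,
                     embedding_dim=16, hidden_units=[64, 64], **kwargs):
            super(MyModel, self).__init__(feature_map, model_id=model_id, **kwargs)
            # Embed every input field, then score the flattened embeddings with an MLP
            self.embedding_layer = FeatureEmbedding(feature_map, embedding_dim)
            self.mlp = MLP_Block(input_dim=feature_map.num_fields * embedding_dim,
                                 output_dim=1,
                                 hidden_units=hidden_units)
            self.compile(kwargs["optimizer"], kwargs["loss"], learning_rate)
            self.reset_parameters()
            self.model_to_device()

        def forward(self, inputs):
            X = self.get_inputs(inputs)
            feature_emb = self.embedding_layer(X)                 # batch x fields x dim
            y_pred = self.mlp(feature_emb.flatten(start_dim=1))   # batch x 1
            return {"y_pred": self.output_activation(y_pred)}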

  5. Tune hyper-parameters of a model

    FuxiCTR currently supports fast grid search over a model's hyper-parameters using multiple GPUs. The following example runs a grid search of 8 experiments on 4 GPUs.

    cd experiment
    python run_param_tuner.py --config config/DCN_tiny_h5_tuner_config.yaml --gpu 0 1 2 3 0 1 2 3
    

🔥 Citation

If you find our code or benchmarks helpful in your research, please cite our papers.

Discussion

You are welcome to join our WeChat group for questions and discussion. We also have open positions for internships and full-time jobs. If you are interested in research and practice in recommender systems, please reach out via our WeChat group.

Scan QR code

fuxictr's People

Contributors

liangcaisu, lsjsj92, lu-minous, rsj123, sdilbaz, xpai, zhujiem


fuxictr's Issues

MultiHeadAttention bug when num_heads is greater than 1

The head-splitting code is buggy: it reshapes the tensors without transposing, which scrambles the per-head layout. The buggy code affects AutoInt, DESTINE, and InterHAt:

query = query.view(batch_size * self.num_heads, -1, self.attention_dim)
key = key.view(batch_size * self.num_heads, -1, self.attention_dim)
value = value.view(batch_size * self.num_heads, -1, self.attention_dim)

The buggy code comes from the reference implementation at https://zhuanlan.zhihu.com/p/47812375

The correct code should be:

query = query.view(batch_size, seq_len, self.num_heads, self.attention_dim).transpose(1, 2)
key = key.view(batch_size, seq_len, self.num_heads, self.attention_dim).transpose(1, 2)
value = value.view(batch_size, seq_len, self.num_heads, self.attention_dim).transpose(1, 2)
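
For illustration, here is a self-contained sketch (with generic shapes, independent of FuxiCTR's actual variables) showing that the reshape without a transpose produces a different per-head layout:

import torch

batch_size, seq_len, num_heads, head_dim = 2, 5, 4, 8
x = torch.randn(batch_size, seq_len, num_heads * head_dim)

# Correct: split into heads, then move the head axis in front of the sequence axis,
# so each (batch, head) slice contains one head over all positions.
correct = x.view(batch_size, seq_len, num_heads, head_dim).transpose(1, 2)   # b x h x s x d

# Buggy: reshaping straight to (b*h, s, d) interleaves positions and heads.
buggy = x.view(batch_size * num_heads, -1, head_dim)                          # b*h x s x d

print(correct.reshape(batch_size * num_heads, seq_len, head_dim).equal(buggy))  # False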

torch.jit.trace not working

torch.jit.trace(fibinet, batch_data)
File "/anaconda3/envs/fuxictr3.8/lib/python3.8/site-packages/torch/jit/_trace.py", line 794, in trace
return trace_module(
File "/anaconda3/envs/fuxictr3.8/lib/python3.8/site-packages/torch/jit/_trace.py", line 1056, in trace_module
module._c._create_method_from_trace(
File "/anaconda3/envs/fuxictr3.8/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/anaconda3/envs/fuxictr3.8/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1488, in _slow_forward
result = self.forward(*input, **kwargs)
TypeError: forward() takes 2 positional arguments but 3 were given

Streaming data loading: evaluation is not performed at the end of an epoch

len(data_generator) does not match the actual number of batches produced by npz_block_dataloader, so the end-of-epoch evaluation is triggered at the wrong step:

2024-04-14 11:01:33,212 P42965 INFO Evaluation @epoch 2 - batch 1060:
2024-04-14 16:15:11,489 P42965 INFO Evaluation @epoch 3 - batch 2120:
2024-04-14 21:28:40,049 P42965 INFO Evaluation @epoch 4 - batch 3180:
2024-04-15 02:43:36,228 P42965 INFO Evaluation @epoch 5 - batch 4240:

Question: unable to use torch.parallel for multi-GPU training

I recently implemented my own model on top of FuxiCTR. Since GPU memory is tight, I would like to use the torch.parallel module to train on two GPUs. However, no matter how I configure it, the model always runs on a single GPU. How can I solve this?
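
For reference, here is a minimal, generic PyTorch sketch of wrapping a module with nn.DataParallel. This is standard PyTorch usage rather than a FuxiCTR-specific API, and whether it interacts cleanly with FuxiCTR's training loop still needs to be verified:

import torch
import torch.nn as nn

# Stand-in module; in practice this would be the FuxiCTR model instance.
model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 1))

if torch.cuda.device_count() > 1:
    # Replicate the module on GPUs 0 and 1 and split each input batch across them.
    model = nn.DataParallel(model, device_ids=[0, 1])

device = "cuda:0" if torch.cuda.is_available() else "cpu"
model = model.to(device)  # parameters must live on device_ids[0]

x = torch.randn(32, 64, device=device)
y = model(x)              # the forward pass is transparently parallelized
print(y.shape)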

Training on the original Criteo dataset.

Hey, I have a question about training DeepFM on the original Criteo dataset. Is this possible with the code provided in the repository? The dataset presented in the demo (train_sample.csv, test_sample.csv, etc.) has 19 columns. Are these columns from the Criteo dataset? How can I use the original Criteo dataset, given that its data is numerical and the categorical columns are hashed? And how should I handle the missing labels in the test set?

Failed to run FINAL

Traceback (most recent call last):
File "run_expid.py", line 69, in
model.fit(train_gen, validation_data=valid_gen, **params)
File "/root/miniconda3/lib/python3.8/site-packages/fuxictr/pytorch/models/ctr_model.py", line 154, in fit
self.train_epoch(data_generator)
File "/root/miniconda3/lib/python3.8/site-packages/fuxictr/pytorch/models/ctr_model.py", line 210, in train_epoch
loss = self.train_step(batch_data)
File "/root/miniconda3/lib/python3.8/site-packages/fuxictr/pytorch/models/ctr_model.py", line 193, in train_step
loss = self.get_total_loss(batch_data)
File "/root/miniconda3/lib/python3.8/site-packages/fuxictr/pytorch/models/ctr_model.py", line 90, in get_total_loss
total_loss = self.add_loss(inputs) + self.add_regularization()
File "/root/FuxiCTR/model_zoo/FINAL/model/FINAL.py", line 107, in add_loss
return_dict = self.forward(inputs)
File "/root/FuxiCTR/model_zoo/FINAL/model/FINAL.py", line 87, in forward
y1 = self.forward1(feature_emb)
File "/root/FuxiCTR/model_zoo/FINAL/model/FINAL.py", line 96, in forward1
X = self.field_gate(X)
File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/FuxiCTR/model_zoo/FINAL/model/FINAL.py", line 132, in forward
if gate_residual == "concat":
NameError: name 'gate_residual' is not defined

in file FINAL.py, line 130:

    def forward(self, feature_emb):
        gates = self.linear(feature_emb.transpose(1, 2)).transpose(1, 2)
        if gate_residual == "concat":
            out = torch.cat([feature_emb, feature_emb * gates], dim=1) # b x 2f x d
        else:
            out = feature_emb + feature_emb * gates
        return out

The line if gate_residual == "concat": should be if self.gate_residual == "concat": instead.
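
With that fix applied, the forward method would read as follows (a sketch based on the quoted snippet; the rest of the class is unchanged):

    def forward(self, feature_emb):
        gates = self.linear(feature_emb.transpose(1, 2)).transpose(1, 2)
        if self.gate_residual == "concat":  # use the attribute set in __init__, not a bare name
            out = torch.cat([feature_emb, feature_emb * gates], dim=1)  # b x 2f x d
        else:
            out = feature_emb + feature_emb * gates
        return out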

Implementation of the model DIN

I am not sure whether the implementation of DIN here is consistent with the originally released official code of DIN or DIEN.
Given query_field (e.g., query Goods ID, query Cate ID) and history_field (e.g., Goods ID, Cate ID), the original implementation first concatenates the embeddings of history_field (i.e., [Goods ID embedding, Cate ID embedding]) and then applies the so-called local activation unit (LAU) with the concatenated query_field (i.e., [query Goods ID embedding, query Cate ID embedding]). In contrast, the implementation here seems to first perform the LAU and then do the concatenation.
Specifically, the code in DIN.py:

# 1. perform the LAU for each history_field
for idx, (din_query_field, din_history_field) in enumerate(
        zip(self.din_query_field, self.din_history_field)):
    item_emb = feature_emb_dict[din_query_field]
    history_sequence_emb = feature_emb_dict[din_history_field]
    pooled_history_emb = self.attention_layers[idx](item_emb, history_sequence_emb)
    feature_emb_dict[din_history_field] = pooled_history_emb
# 2. do the concatenation here
feature_emb = self.embedding_layer.dict2tensor(feature_emb_dict)

If my understanding is correct, I wonder whether this will affect the final performance compared with the original implementation?

ACC metric can't work

y_pred is a one-dimensional tensor, so it has no axis=1. In base_model.py, y_pred has already been transformed into one dimension.
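
As a sketch of a possible fix (an assumption, not the repository's actual implementation): since the predictions are one-dimensional scores, accuracy can be computed by thresholding instead of taking argmax over a non-existent second axis.

import numpy as np
from sklearn.metrics import accuracy_score

y_true = np.array([0, 1, 1, 0])
y_pred = np.array([0.2, 0.8, 0.6, 0.4])   # 1-D scores, as produced by base_model.py

# np.argmax(y_pred, axis=1) would fail on a 1-D array; threshold the scores instead.
acc = accuracy_score(y_true, (y_pred > 0.5).astype(int))
print(acc)  # 1.0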

[Suggestion] Update the logic of preprocessing for efficiency

I suggest updating the preprocessing logic for efficiency.

In many cases, a user's behavior sequence is the same across all of that user's training samples. Likewise, the features of a user or an item are often identical across all training samples.

However, the current version of FuxiCTR receives the training dataset as a single DataFrame, so these features (e.g., a user's behavior sequence, or the features of a user or an item) are stored redundantly in that DataFrame, which consumes too much memory (especially on large-scale datasets). Also, the fit/transform of feature_preprocessor is performed on these redundant behavior sequences and features, which takes too long (again, especially on large-scale datasets).

So, to operate more efficiently, I hope these redundancies can be removed. To this end, I suggest changing the preprocessing logic to additionally receive a user_df and an item_df for each dataset, and to fit/transform only the unique features (i.e., those in user_df and item_df), as sketched below.
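
A small pandas illustration of the proposed flow (hypothetical, not an existing FuxiCTR API): side features are encoded once on the deduplicated user/item tables and joined back to the interaction log only when needed.

import pandas as pd

interactions = pd.DataFrame({"user_id": [1, 1, 2], "item_id": [10, 11, 10], "label": [1, 0, 1]})
user_df = pd.DataFrame({"user_id": [1, 2], "user_city": ["A", "B"]})
item_df = pd.DataFrame({"item_id": [10, 11], "item_cate": ["x", "y"]})

# Fit/transform side features once per unique user and item ...
user_df["user_city"] = user_df["user_city"].astype("category").cat.codes
item_df["item_cate"] = item_df["item_cate"].astype("category").cat.codes

# ... and join them back to the interaction log only when building training data.
train_df = interactions.merge(user_df, on="user_id").merge(item_df, on="item_id")
print(train_df)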

Call for model implementations

Many hyperlinks are no longer valid and respond with 404. Please update them.

expid repeat

When I run "python run_expid.py --config /..../ --expid 191 --gpu 0", I get two result logs: ..._tune_191_ahdfy68.log and ..._tune_051_os191usd.log. The second log's file name also contains the string '191'.

Normalizer and NaN values.

For StandardScaler, it looks like NaN values are supported; see the Normalizer class:

null_index = np.isnan(X)

However, during preprocessing, _fill_na() will fill na_value for non-string columns.
So:

  • for dtype=str, the X values will be string
  • for dtype=float/int, the X values will be na_value

In the first case, np.isnan will throw an error because the elements of X are strings.
In the second case, there is no point in normalizing numbers if a na_value has been filled in.

Is this behavior expected or not?
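
A minimal illustration of the first case (this is generic NumPy behavior, independent of FuxiCTR):

import numpy as np

x = np.array(["1.5", "", "2.0"], dtype=object)  # string values, as produced for a dtype=str column
try:
    np.isnan(x)
except TypeError as e:
    print("np.isnan fails on string input:", e)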

Sequence feature in demo "DeepFM_with_sequence_feature.py".

Should field "sequence" share embeddings with the field "adgroup_id"? I found that the method "encoder.fit()" assigns the encoder such as tokenizer for each field. Since the given tiny datasets record the user historical behavior (ad sequence), then in my understanding that the id that appeared in the field "sequence" may also appear in the field "adgroup_id". As a result, it seems that the field "sequence" should share the same encoder (i.e., tokenizer) with the field "adgroup_id", but the demo "DeepFM_with_sequence_feature.py" gives separate encoders for these two fields.

Question about training with epoch set to 100

Hello, when trying the DeepFM and other models you provide, I found that setting epoch to 100 yields good results on the Criteo dataset, whereas recommendation models usually converge within a single epoch. How can running so many epochs still produce good results? Sorry to bother you.

Where is the paper for FINAL model?

Where is the paper for the FINAL model (FINAL: Factorized Interaction Layer for CTR Prediction)? I can't find it on the internet. Based on the source code, the FINAL model uses multiplicative feature interactions, but I want to read the paper to gain more insight into the model. I would appreciate it if you could provide it.

About a simple new model I wrote

I designed a very simple model, really very simple (similar to MaskedNet), and it runs very fast. Since I have graduated and no longer have GPU resources, I have done almost no hyper-parameter tuning, yet its accuracy is second only to FinalMLP. Would you be interested in taking it over?

Emb_LayerNorm bug in MaskNet

The paper uses concat(LN(e1), LN(e2), ..., LN(ef)), but the code uses nn.LayerNorm([feature_map.num_fields, embedding_dim]), which makes the normalization happen over the last two dimensions.
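
A small plain-PyTorch illustration of the difference (with stand-in shapes rather than FuxiCTR's variables):

import torch
import torch.nn as nn

batch, num_fields, dim = 4, 3, 8
emb = torch.randn(batch, num_fields, dim)

# Per-field LayerNorm, as in the paper: each field embedding is normalized
# over its own embedding dimension only, then the fields are concatenated.
paper_style = nn.LayerNorm(dim)(emb)                  # statistics over the last dim (d)

# Current code: statistics are computed jointly over (num_fields, dim).
code_style = nn.LayerNorm([num_fields, dim])(emb)     # statistics over the last two dims (f, d)

print(torch.allclose(paper_style, code_style))        # generally False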

HFM may not be available due to compatibility issues

HFM adopts HolographicInteractionLayer in fuxictr/pytorch/layers/interaction.py.

However, HolographicInteractionLayer may not be available in PyTorch 1.10 because the torch.rfft/torch.irfft APIs have changed.

Here is a solution for reference:

import torch

try:
    from torch import irfft
    from torch import rfft
except ImportError:
    # torch.rfft/torch.irfft no longer exist in recent PyTorch; emulate them with torch.fft
    from torch.fft import irfft2
    from torch.fft import rfft2
    def rfft(x, d):
        t = rfft2(x, dim=(-d))
        return torch.stack((t.real, t.imag), -1)
    def irfft(x, d, signal_sizes):
        return irfft2(torch.complex(x[:, :, 0], x[:, :, 1]), s=signal_sizes, dim=(-d))

sequence feature

When I try to use sequence features, the program raises an array out-of-bounds error. Could you please give an example of using sequence features?

How to save model for tf serving?

I want to save a model such as DCN_tf for serving. If I add model.save("path/to/model") at the end of run_expid.py, the following error occurs:
cannot be saved either because the input shape is not available or because the forward pass of the model is not defined. To define a forward pass, please override Model.call(). To specify an input shape, either call build(input_shape) directly, or call the model on actual data using Model(), Model.fit(), or Model.predict(). If you have a custom training step, please make sure to invoke the forward pass in train step through Model.__call__, i.e. model(inputs), as opposed to model.call().

I added model(input) before model.save(), as in the following code, and it works. However, when I curl the service, I must specify 'clk', which is already specified as the label.

for i in train_gen:
    model(i)
    break
model.save("./model/SavedModels/1")

Thank you!

Problem with BN of MLP_Block in Pytorch version

Hi, I found that MLP_Block does not work properly with batch_norm=True when the input tensor has three or more dimensions.

Here is a toy example.

a = torch.randn(16, 128, 64)
mlp_block = MLP_Block(input_dim=64, hidden_units=[1024,1024,1024],batch_norm=True)
mlp_block(a)

The output is:

RuntimeError: running_mean should contain 128 elements not 1024
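
The error is generic PyTorch behavior: nn.BatchNorm1d(1024) expects the channel dimension (1024) in position 1, but the MLP produces a tensor of shape (16, 128, 1024). As a sketch of one possible workaround (an illustration, not a FuxiCTR fix), the leading dimensions can be flattened before the MLP and restored afterwards:

import torch
import torch.nn as nn

# Stand-in for MLP_Block(input_dim=64, hidden_units=[1024], batch_norm=True)
mlp = nn.Sequential(nn.Linear(64, 1024), nn.BatchNorm1d(1024), nn.ReLU())

a = torch.randn(16, 128, 64)
out = mlp(a.reshape(-1, 64))      # (16*128, 1024): BatchNorm1d now sees channels in dim 1
out = out.reshape(16, 128, -1)    # restore the original leading dimensions
print(out.shape)                  # torch.Size([16, 128, 1024])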

FNN model problem

We cannot find a function called self.reduce_learning_rate() in the base model. Has it been changed to self.lr_decay()?
