woshiyyya / erpp-rmtpp

A PyTorch implementation of ERPP and RMTPP on an ATM maintenance dataset.

erpp-rmtpp's Introduction

A PyTorch implementation of ERPP and RMTPP

This is a course project for SJTU, CS488 Temporal Point Process Modeling.

The ERPP reference paper is "Modeling the Intensity Function of Point Process via Recurrent Neural Networks" (AAAI 2017).

The RMTPP reference paper is "Recurrent Marked Temporal Point Processes: Embedding Event History to Vector" (KDD 2016).

Dataset

In maintenance support services, when a device fails, the equipment owner raises a maintenance service ticket and a technician is assigned to repair the failure. The dataset studied here consists of event logs covering error reports and failure tickets, originally collected from 1,554 ATMs. Each error record in the event log includes device identity, timestamp, message content, priority, code, and action.

The dataset has been split into a training set and a test set:

data/train_day.csv
data/test_day.csv

Each CSV file contains three columns. The first column is the ID of the ATM, the second is the time at which an event occurred, and the last is the event type.

id,time,event
g1548,16344.394270833332,0
g1548,16367.035381944444,4
g1548,16367.036377314815,4
g1548,16367.037650462962,4
g1548,16442.100289351853,2
g1548,16490.032743055555,1
g1548,16490.032743055555,1
g1548,16514.03287037037,4
g1548,16514.033252314814,3
g1548,16514.041932870372,3
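
As a minimal sketch of how this format can be consumed, the snippet below groups rows by ATM id and derives the gaps between consecutive events using only the Python standard library. The sample rows are copied from the excerpt above; the helper names `load_sequences` and `inter_event_gaps` are hypothetical and are not taken from the repository. The time unit is presumably days, judging by the file names, but that is an assumption.

```python
import csv
import io
from collections import defaultdict

# First rows of the excerpt above, used as an in-memory stand-in for a file.
SAMPLE = """id,time,event
g1548,16344.394270833332,0
g1548,16367.035381944444,4
g1548,16367.036377314815,4
g1548,16367.037650462962,4
g1548,16442.100289351853,2
"""


def load_sequences(handle):
    """Group rows by ATM id into time-ordered lists of (time, event) pairs."""
    sequences = defaultdict(list)
    for row in csv.DictReader(handle):
        sequences[row["id"]].append((float(row["time"]), int(row["event"])))
    for events in sequences.values():
        # Stable sort, so events with identical timestamps keep file order.
        events.sort(key=lambda pair: pair[0])
    return dict(sequences)


def inter_event_gaps(events):
    """Time elapsed between consecutive events of one sequence."""
    times = [time for time, _ in events]
    return [later - earlier for earlier, later in zip(times, times[1:])]


sequences = load_sequences(io.StringIO(SAMPLE))
gaps = inter_event_gaps(sequences["g1548"])
```

To read the real files, pass an open file object such as `open("data/train_day.csv", newline="")` instead of the `io.StringIO` wrapper.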

The task is to predict the time of the next event and to classify its type. Time prediction is evaluated with mean absolute error (MAE); multi-class classification is evaluated with precision, recall, and F1-score.
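
For reference, the two headline metrics can be sketched in a few lines of plain Python. This assumes MAE means mean absolute error and that F1 is the harmonic mean of precision and recall; the function names are hypothetical and the repository's own `util.py` may compute things differently.

```python
def mean_absolute_error(pred_times, gold_times):
    """Average of |prediction - truth| over all examples."""
    if len(pred_times) != len(gold_times) or not gold_times:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - g) for p, g in zip(pred_times, gold_times))
    return total / len(gold_times)


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

As a consistency check, `f1_score(0.77, 0.90)` rounds to 0.83, which agrees with the ERPP figures quoted below.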

Requirements

pytorch = 0.4.1
numpy = 1.14.2
tqdm = 4.28.1
pandas = 0.23.4

Run ERPP (Event Recurrent Point Process)

With default setting:

python main.py --model=erpp

You may get the following result: MAE=4.9, Precision=0.77, Recall=0.90, F1=0.83

Run RMTPP (Recurrent Marked Temporal Point Process)

With default setting:

python main.py --model=rmtpp

You may get the following result: MAE=4.8, Precision=0.76, Recall=0.89, F1=0.825

Other Parameters

# --model takes "erpp" or "rmtpp"; --alpha is the weight on the time loss;
# --importance_weight turns on importance weighting of the loss.
python main.py --name=EXPERIMENT_NAME \
               --model=erpp \
               --seq_len=10 \
               --emb_dim=10 \
               --hid_dim=32 \
               --mlp_dim=16 \
               --alpha=0.05 \
               --dropout=0.1 \
               --batch_size=1024 \
               --lr=1e-3 \
               --epochs=30 \
               --importance_weight \
               --verbose_step

erpp-rmtpp's Issues

Time prediction performance evaluation index MAE

Hi, thank you very much for your code. I ran it successfully and reproduced the results you reported.
However, I don't understand why MAE is used as the performance metric for time prediction. I found that pred_times and gold_times differ considerably, so the predictions are not actually good, yet because of the way MAE itself is calculated, the results look good.

CUDA

Hi,

Thanks for the implementation. Can I run the training without CUDA? I keep getting an assertion error saying that I do not have CUDA (I'd rather not install the required drivers).

Thanks,
Arik.

My torch is the same version as yours, so what's wrong with it? Is it the Python version?

Traceback (most recent call last):
  File "main.py", line 62, in <module>
    model.cuda()
  File "/opt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 258, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/opt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 185, in _apply
    module._apply(fn)
  File "/opt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 112, in _apply
    self.flatten_parameters()
  File "/opt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 105, in flatten_parameters
    self.batch_first, bool(self.bidirectional))
RuntimeError: CuDNN error: CUDNN_STATUS_SUCCESS

What's the difference between a temporal point process and a linear layer?

Hello, in your code you use a linear layer directly to predict the event time? I don't understand: does this fit the description in the paper? Does it mean that we don't actually need RMTPP_loss, and that logistic regression would also be feasible?

The calculation of precision and recall

In the function clf_metric in util.py, it seems that the calculation of precision and recall is reversed?

    if gold_count[i] != 0:
        prec += match_count / gold_count[i]
        pcnt += 1
    if pred_count[i] != 0:
        recall += match_count / pred_count[i]
        rcnt += 1

Precision = true positive / predicted condition positive
Recall = true positive / condition positive
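
A minimal sketch of macro-averaged precision and recall that follows the two definitions above, with precision divided by the predicted count and recall by the gold count. The function name and signature are hypothetical, and this is not the repository's `clf_metric`. As in the quoted snippet, classes with a zero count are left out of the corresponding average.

```python
from collections import Counter


def macro_precision_recall(pred_labels, gold_labels):
    """Macro-averaged precision and recall over the classes that occur."""
    if len(pred_labels) != len(gold_labels) or not gold_labels:
        raise ValueError("inputs must be non-empty and of equal length")
    pred_count = Counter(pred_labels)   # predicted condition positive
    gold_count = Counter(gold_labels)   # condition positive
    match_count = Counter(              # true positives per class
        p for p, g in zip(pred_labels, gold_labels) if p == g
    )
    # Counter keys only exist for classes seen at least once, so the
    # denominators below are never zero.
    precisions = [match_count[c] / pred_count[c] for c in pred_count]
    recalls = [match_count[c] / gold_count[c] for c in gold_count]
    return sum(precisions) / len(precisions), sum(recalls) / len(recalls)


precision, recall = macro_precision_recall([0, 0, 1, 1], [0, 1, 1, 1])
```

In this toy example class 0 is predicted twice but correct once, and class 1 appears three times in the gold labels but is matched twice, so the two averages differ and a swapped denominator would be visible.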

mean relative error(MAE)

In your markdown file you wrote "mean relative error (MAE)", but as far as I remember, MAE stands for Mean Absolute Error.

Prediction per timestamp

Thanks a lot for the great code! It's nicely written.

If I understand it correctly, given a sequence of events and timings, your code only predicts the last event and its timing, as opposed to https://github.com/musically-ut/tf_rmtpp/blob/ea4ab25b12422d3b0657082c90bc4beb957c0e83/src/tf_rmtpp/rmtpp_core.py#L575, which I believe predicts every event and timing and computes the corresponding losses.

It would be great if you could clarify this!
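
To make the distinction in this question concrete, here is a small sketch of the two supervision schemes on a sliding window. It is illustrative only: both function names are hypothetical, and how the repository actually builds its windows is not confirmed here.

```python
def last_step_examples(events, seq_len):
    """One (history, target) pair per window; only the final event is a target."""
    return [
        (events[start:start + seq_len - 1], events[start + seq_len - 1])
        for start in range(len(events) - seq_len + 1)
    ]


def per_step_examples(events, seq_len):
    """One (inputs, targets) pair per window; every input position
    is trained to predict the event that follows it."""
    return [
        (events[start:start + seq_len - 1], events[start + 1:start + seq_len])
        for start in range(len(events) - seq_len + 1)
    ]


events = [0, 1, 2, 3, 4]          # stand-ins for (time, event) records
last_only = last_step_examples(events, seq_len=3)
every_step = per_step_examples(events, seq_len=3)
```

With last-step supervision each window contributes a single loss term, whereas per-step supervision contributes `seq_len - 1` terms from the same window.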

Computing conditional intensity function for RMTPP

Thanks for sharing the code; it is well written and easy to follow. Can you provide a pointer on how to estimate the conditional intensity function per eq. 7 in this paper, https://www.kdd.org/kdd2016/papers/files/rpp1081-duA.pdf, from your code base? Any pointers would be much appreciated.
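
As a rough pointer, the RMTPP paper parameterizes the conditional intensity as an exponential of a history term plus a term linear in the elapsed time, lambda*(t) = exp(v . h_j + w (t - t_j) + b). The sketch below evaluates that intensity and the matching log-density with the standard library only. It assumes `past_influence` already holds the scalar v . h_j + b (in a PyTorch model this would typically come from a linear layer on the hidden state) and that `w` is a scalar parameter; both names are hypothetical and nothing here is taken from the repository's code.

```python
import math


def rmtpp_log_intensity(past_influence, w, delta):
    """log lambda*(t) with delta = t - t_j >= 0."""
    return past_influence + w * delta


def rmtpp_intensity(past_influence, w, delta):
    """Conditional intensity lambda*(t) = exp(past_influence + w * delta)."""
    return math.exp(rmtpp_log_intensity(past_influence, w, delta))


def rmtpp_log_density(past_influence, w, delta):
    """log f*(t) = log lambda*(t) - integral of lambda* from t_j to t."""
    if w == 0:
        # Constant intensity: the integral is simply rate * elapsed time.
        compensator = math.exp(past_influence) * delta
    else:
        compensator = (
            math.exp(past_influence + w * delta) - math.exp(past_influence)
        ) / w
    return rmtpp_log_intensity(past_influence, w, delta) - compensator
```

The negative of `rmtpp_log_density`, evaluated at the observed gap, is the time part of a negative log-likelihood loss under this parameterization.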