
ca-tcc's Introduction

Self-supervised Contrastive Representation Learning for Semi-supervised Time-Series Classification (CA-TCC) [Paper] [Cite]

This work is an extension of TS-TCC, so if you need details about the unsupervised pretraining and/or the datasets and their preprocessing, please check it first.

Training modes:

CA-TCC has two new training modes on top of TS-TCC:

  • "gen_pseudo_labels": which generates pseudo labels from fine-tuned TS-TCC model. This mode assumes that you ran "ft_1per" mode first.
  • "SupCon": which performs supervised contrasting on pseudo-labeled data.

Note that "SupCon" is case-sensitive.

To fine-tune or linearly evaluate the "SupCon"-pretrained model, include "SupCon" in the training mode. For example, "ft_1per" fine-tunes the TS-TCC pretrained model with 1% of the labeled data, while "ft_SupCon_1per" fine-tunes the CA-TCC pretrained model with 1% of the labeled data. The same applies to "tl" and "train_linear".

To generate the 1% split, you just need to split the data 1%/99% and keep the 1% portion. You can also find a script that does a similar job here; note that it creates 5-fold splits, so set it to a single fold if that is all you need. A sketch of such a split is shown below.
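For illustration, here is a minimal sketch of generating the 1% split in Python. It assumes the TS-TCC-style train.pt dictionary with "samples" and "labels" keys; the dataset path and the output file name are hypothetical, so verify both against this repository's data loader.

import torch
from sklearn.model_selection import train_test_split

# Assumed TS-TCC-style format: train.pt is a dict of tensors
# (verify the keys against this repository's data loader).
data = torch.load("data/HAR/train.pt")
X, y = data["samples"].numpy(), data["labels"].numpy()

# Stratified 1%/99% split; keep the 1% as the labeled subset.
_, X_1per, _, y_1per = train_test_split(
    X, y, test_size=0.01, stratify=y, random_state=42
)

# Hypothetical output file name; match whatever the "ft_1per" mode expects.
torch.save({"samples": torch.from_numpy(X_1per),
            "labels": torch.from_numpy(y_1per)},
           "data/HAR/train_1per.pt")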

Baselines:

The code for the self- and semi-supervised learning baselines used in the paper is HERE.

The code for the self-supervised learning baselines used in the paper can be found in my other work.

Training procedure

To run everything smoothly, we include the ca_tcc_pipeline.sh file; you can simply run it.
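If you prefer to drive the stages yourself, the pipeline boils down to invoking main.py once per training mode, in order. Below is a minimal sketch under stated assumptions: the flag names --training_mode and --selected_dataset are carried over from TS-TCC and the stage order mirrors the description above, so check main.py and ca_tcc_pipeline.sh for the exact arguments.

import subprocess

# Assumed stage order (see ca_tcc_pipeline.sh); flag names are assumptions.
modes = [
    "self_supervised",    # TS-TCC pretraining on unlabeled data
    "ft_1per",            # fine-tune on 1% labeled data
    "gen_pseudo_labels",  # label the unlabeled data with the fine-tuned model
    "SupCon",             # supervised contrastive training on pseudo labels
    "ft_SupCon_1per",     # fine-tune the CA-TCC (SupCon) pretrained model
]
for mode in modes:
    subprocess.run(
        ["python", "main.py", "--training_mode", mode,
         "--selected_dataset", "HAR"],
        check=True,
    )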

Citation

If you found this work useful, please consider citing it.

@inproceedings{tstcc,
  title     = {Time-Series Representation Learning via Temporal and Contextual Contrasting},
  author    = {Eldele, Emadeldeen and Ragab, Mohamed and Chen, Zhenghua and Wu, Min and Kwoh, Chee Keong and Li, Xiaoli and Guan, Cuntai},
  booktitle = {Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, {IJCAI-21}},
  pages     = {2352--2359},
  year      = {2021},
}

@article{catcc,
  title   = {Self-Supervised Contrastive Representation Learning for Semi-Supervised Time-Series Classification},
  author  = {Eldele, Emadeldeen and Ragab, Mohamed and Chen, Zhenghua and Wu, Min and Kwoh, Chee-Keong and Li, Xiaoli and Guan, Cuntai},
  journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year    = {2023},
  volume  = {45},
  number  = {12},
  pages   = {15604--15618},
  doi     = {10.1109/TPAMI.2023.3308189}
}

Contact

For any issues or questions regarding the paper or reproducing the results, please contact me at: emad0002{at}e.ntu.edu.sg


ca-tcc's Issues

Problem with self_supervised mode training

When I run main.py in the self_supervised training mode, the following error occurs:

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (3,) + inhomogeneous part.

import numpy as np
import torch

def permutation(x, max_segments=5, seg_mode="random"):
    orig_steps = np.arange(x.shape[2])

    num_segs = np.random.randint(1, max_segments, size=(x.shape[0]))

    ret = np.zeros_like(x)
    for i, pat in enumerate(x):
        if num_segs[i] > 1:
            if seg_mode == "random":
                split_points = np.random.choice(x.shape[2] - 2, num_segs[i] - 1, replace=False)
                split_points.sort()
                splits = np.split(orig_steps, split_points)
            else:
                splits = np.array_split(orig_steps, num_segs[i])
            warp = np.concatenate(np.random.permutation(splits)).ravel()  # <-- the error comes from this line
            ret[i] = pat[0, warp]
        else:
            ret[i] = pat
    return torch.from_numpy(ret)

How can I solve it? Thanks.
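A likely cause, as an editorial note rather than a confirmed answer: NumPy 1.24 and later refuse to build an object array implicitly from ragged sequences, and when seg_mode == "random" the segments in splits have unequal lengths, so np.random.permutation(splits) fails with exactly this ValueError. A minimal sketch of a workaround is to permute the segment list by index instead:

# Shuffle the segments without constructing a ragged ndarray
# (drop-in replacement for the failing np.random.permutation(splits) line).
order = np.random.permutation(len(splits))
warp = np.concatenate([splits[j] for j in order]).ravel()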

Regarding the dataset

Dear author, hello. If you have time, I have three questions:

  1. Which training modes do I need to run, and in what order, to run CA-TCC? Should I follow the order in your .sh file: "self_supervised", "train_linear_1per", "ft_1per", "gen_pseudo_labels", "SupCon", "train_linear_SupCon_1per"? And which run's result is the final result of the model?
  2. I want to run the CA-TCC model on my own dataset. It is also time-series data, with about 5000 samples in total; half of them are labeled and the other half are unlabeled. In the initial self-supervised stage, should I use all 5000 samples, then fine-tune with the 2500 labeled samples, assign pseudo labels to the remaining 2500 unlabeled samples, and use those 2500 pseudo-labeled samples for the final training (or should I use all 5000 samples for the final training)? Or should I use only my 2500 unlabeled samples in the initial self-supervised stage?
  3. Normally, unlabeled data and labeled data would be two different sets of data. Is that how you treated them in the CA-TCC paper?

Format of the dataset

Thank you very much for providing the code, but without details of the data storage format I am unable to reproduce the training. Could you please describe how the data files are stored, or upload an example file? Contact email: [email protected]
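For readers hitting the same question, an editorial note: TS-TCC-style repositories generally expect each dataset folder to contain train.pt, val.pt, and test.pt, each a dictionary of tensors. The keys and shapes below are assumptions inferred from TS-TCC, so verify them against this repository's data loader before relying on them.

import torch

# Assumed format: "samples" of shape (N, channels, seq_len), "labels" of shape (N,).
samples = torch.randn(100, 1, 178)    # e.g. 100 univariate series of length 178
labels = torch.randint(0, 5, (100,))  # e.g. 5 classes

torch.save({"samples": samples, "labels": labels}, "train.pt")

data = torch.load("train.pt")
print(data["samples"].shape, data["labels"].shape)  # torch.Size([100, 1, 178]) torch.Size([100])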
