mims-harvard / tfc-pretraining

Self-supervised contrastive learning for time series via time-frequency consistency

Home Page: https://zitniklab.hms.harvard.edu/projects/TF-C/

License: MIT License

Python 99.17% Shell 0.83%
consistency-models contrastive-learning deep-learning pre-trained-model representation-learning self-supervised-learning time-series

tfc-pretraining's Introduction

Self-Supervised Contrastive Pre-Training For Time Series via Time-Frequency Consistency

TF-C Paper: NeurIPS 2022, Preprint

Overview

This repository contains eight processed datasets and the code for the developed TF-C pre-training model (along with baselines) for the manuscript Self-Supervised Contrastive Pre-Training For Time Series via Time-Frequency Consistency. We propose TF-C, a novel pre-training approach for learning generalizable features that can be transferred across different time-series datasets. We evaluate TF-C on eight time series datasets with different sensor measurements and semantic meanings in four real-world application scenarios. The following illustration provides an overview of the idea behind our TF-C approach and its broad applicability. The idea is shown in (a): given a time series sample, its time-based and frequency-based embeddings are pulled close to each other in a latent time-frequency space. The application scenarios are shown in (b): leveraging TF-C, we can generalize a pre-trained model to diverse scenarios such as gesture recognition, fault detection, and seizure analysis.

Key idea of TF-C

Our model captures a generalizable property of time series, Time-Frequency Consistency (TF-C), from a large pre-training time series dataset. TF-C means that the time-based and frequency-based representations learned from the same time series sample are close to each other in a joint time-frequency space, and farther apart if the representations come from different time series samples. By modeling TF-C, a characteristic unique to time series, the developed model captures the underlying patterns common across time series and enables knowledge transfer across different time series datasets. These datasets differ widely in temporal dynamics, semantic meaning, sampling regularity, and system factors (e.g., different devices or subjects). Moreover, the developed model enables self-supervised pre-training (no labels are required in the pre-training dataset) by adopting a contrastive learning framework. Our TF-C approach is shown in the following figure.

Overview of the TF-C approach. Our model has four components: a time encoder, a frequency encoder, and two cross-space projectors that map time-based and frequency-based representations, respectively, into the same time-frequency space. Together, the four components embed the input time series into the latent time-frequency space such that the time-based and frequency-based embeddings are close together. The TF-C property is realized by promoting the alignment of time- and frequency-based representations in this latent space, providing a vehicle for transferring the pre-trained model to a target dataset not seen before.
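To make the objective concrete, below is a minimal, simplified sketch of how the TF-C pre-training loss can be composed from NT-Xent-style contrastive terms. It follows the loss combination used in the released code (weighted within-domain terms plus a consistency term), but a plain NT-Xent stands in for the NTXentLoss_poly used in the repository, and the encoder/projector details live in code/TFC.

```
import torch
import torch.nn.functional as F

def nt_xent(a, b, temperature=0.2):
    """NT-Xent loss for a batch of positive pairs (a_i, b_i).

    a, b: [N, D] projected embeddings; row i of `a` and row i of `b` form a
    positive pair, and all other rows in the batch act as negatives.
    """
    a = F.normalize(a, dim=1)
    b = F.normalize(b, dim=1)
    logits = a @ b.t() / temperature                    # [N, N] similarity matrix
    targets = torch.arange(a.size(0), device=a.device)  # positives on the diagonal
    # symmetric InfoNCE: match a -> b and b -> a
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

def tfc_pretrain_loss(z_t, z_t_aug, z_f, z_f_aug, lam=0.2):
    """Combine time-, frequency-, and cross-space (consistency) contrastive terms.

    z_t / z_t_aug: time-based projections of a sample and its augmented view
    z_f / z_f_aug: frequency-based projections of the same sample
    lam:           weight on the within-domain terms (0.2 in the released pre-training code)
    """
    loss_t = nt_xent(z_t, z_t_aug)   # time-domain contrastive loss
    loss_f = nt_xent(z_f, z_f_aug)   # frequency-domain contrastive loss
    loss_c = nt_xent(z_t, z_f)       # time-frequency consistency term
    return lam * (loss_t + loss_f) + loss_c
```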

Datasets

We prepared eight datasets for the four scenarios used to compare our method against the baselines. The scenarios cover electrodiagnostic testing, human daily activity recognition, mechanical fault detection, and physical status monitoring.

Raw data

(1). SleepEEG contains 153 whole-night sleep electroencephalography (EEG) recordings, monitored by a sleep cassette and collected from 82 healthy subjects. The single-lead EEG signal is sampled at 100 Hz. We segment the EEG signals into non-overlapping windows of 200 observations, and each segment forms a sample. Every sample is associated with one of five sleep stages: Wake (W), Non-Rapid Eye Movement (N1, N2, N3), and Rapid Eye Movement (REM). After segmentation, we have 371,055 EEG samples. The raw dataset is distributed under the Open Data Commons Attribution License v1.0.
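For illustration only (the released files are already segmented), non-overlapping segmentation with a window of 200 observations amounts to a reshape; the recording below is synthetic.

```
import numpy as np

def segment_non_overlapping(recording, window=200):
    """Split a 1-D recording into non-overlapping windows of `window` points.

    A trailing remainder shorter than one window is discarded.
    """
    n_segments = len(recording) // window
    return recording[:n_segments * window].reshape(n_segments, window)

# e.g. one hour of a synthetic 1-lead EEG signal sampled at 100 Hz
eeg = np.random.randn(360_000)
samples = segment_non_overlapping(eeg)   # shape: (1800, 200)
```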

(2). Epilepsy contains single-channel EEG measurements from 500 subjects. For each subject, brain activity was recorded for 23.6 seconds. The dataset was then divided and shuffled (to mitigate the sample-subject association) into 11,500 samples of 1 second each, sampled at 178 Hz. The raw dataset has 5 classification labels corresponding to the status of the subject or the location of measurement: eyes open, eyes closed, EEG measured in a healthy brain region, EEG measured where a tumor was located, and the subject experiencing a seizure episode. To emphasize the distinction between positive and negative samples with respect to epilepsy, we merge the first four classes into one, so each time series sample has a binary label describing whether the associated subject is experiencing a seizure or not. There are 11,500 EEG samples in total. To evaluate the performance of the pre-trained model on a small fine-tuning dataset, we choose a tiny fine-tuning set (60 samples; 30 per class) and assess the model with a validation set (20 samples; 10 per class). The model with the best validation performance is used to make predictions on the test set (the remaining 11,420 samples). The raw dataset is distributed under the Creative Commons License (CC-BY) 4.0.

(3), (4). FD-A and FD-B are subsets of the FD dataset, which was gathered from an electromechanical drive system that monitors the condition of rolling bearings and detects damage in them. There are four subsets of data collected under different working conditions, whose parameters include rotational speed, load torque, and radial force. Each rolling bearing can be undamaged, inner damaged, or outer damaged, which leads to three classes in total. We denote the subsets corresponding to condition A and condition B as Fault Detection Condition A (FD-A) and Fault Detection Condition B (FD-B), respectively. Each original recording has a single channel sampled at 64 kHz and lasts 4 seconds. To deal with the long duration, we follow the procedure described by Eldele et al.: we use a sliding window of 5,120 observations with a shift of either 1,024 or 4,096 to keep the final number of samples relatively balanced between classes. The raw dataset is distributed under the Creative Commons Attribution-NonCommercial 4.0 International License.

(5). HAR contains recordings of 30 healthy volunteers performing six daily activities: walking, walking upstairs, walking downstairs, sitting, standing, and lying. The prediction labels are the six activities. Wearable sensors on a smartphone measure triaxial linear acceleration and triaxial angular velocity at 50 Hz. After preprocessing and separating gravitational acceleration from body acceleration, there are nine channels in total. To align the semantic domain with the channels of the dataset used during fine-tuning (Gesture), we only use the three channels of body linear acceleration. The raw dataset is distributed AS-IS, and no responsibility, implied or explicit, can be attributed to the authors or their institutions for its use or misuse. Any commercial use is prohibited.

(6). Gesture contains accelerometer measurements of eight simple gestures that differ based on the path of hand movement. The eight gestures are: swiping left, swiping right, swiping up, swiping down, waving in a counterclockwise circle, waving in a clockwise circle, waving in a square, and waving a right arrow. The classification labels are these eight gesture types. The original paper reports 4,480 gesture measurements, but through the UCR Database we were only able to recover 440 measurements. The dataset is balanced, with 55 samples per class, and is of a suitable size for our fine-tuning experiments. The sampling frequency is not explicitly reported in the original paper but is presumably 100 Hz. The dataset uses three channels corresponding to the three coordinate directions of linear acceleration. The raw dataset is publicly available.

(7). ECG is a subset of the 2017 PhysioNet Challenge, which focuses on ECG recording classification. The single-lead ECG recordings cover four underlying conditions of cardiac arrhythmia. More specifically, the classes correspond to normal sinus rhythm, atrial fibrillation (AF), an alternative rhythm, or other (too noisy to be classified). The recordings are sampled at 300 Hz. The dataset is imbalanced, with far fewer samples from the atrial fibrillation and noisy classes. To preprocess the dataset, we use the code from the CLOCS paper, which applies a fixed-length window of 1,500 observations to divide the long recordings into short samples of 5 seconds that are still physiologically meaningful. The raw dataset is distributed under the Open Data Commons Attribution License v1.0.

(8). Electromyography (EMG) measures muscle responses as electrical activity in response to neural stimulation, and it can be used to diagnose certain muscular dystrophies and neuropathies. EMG consists of single-channel EMG recordings from the tibialis anterior muscle of three volunteers who are healthy, suffering from neuropathy, and suffering from myopathy, respectively. The recordings are sampled at 4 kHz. Each patient (i.e., their disorder) is a separate classification category. The recordings are split into time series samples using a fixed-length window of 1,500 observations. The raw dataset is distributed under the Open Data Commons Attribution License v1.0.

The following table summarizes the statistics of all these eight datasets:

| # | Scenario | Dataset | # Samples | # Channels | # Classes | Length | Freq (Hz) |
|---|----------|---------|-----------|------------|-----------|--------|-----------|
| 1 | Pre-training | SleepEEG | 371,055 | 1 | 5 | 200 | 100 |
|   | Fine-tuning | Epilepsy | 60/20/11,420 | 1 | 2 | 178 | 174 |
| 2 | Pre-training | FD-A | 8,184 | 1 | 3 | 5,120 | 64K |
|   | Fine-tuning | FD-B | 60/21/13,559 | 1 | 3 | 5,120 | 64K |
| 3 | Pre-training | HAR | 10,299 | 9 | 6 | 128 | 50 |
|   | Fine-tuning | Gesture | 320/120/120 | 3 | 8 | 315 | 100 |
| 4 | Pre-training | ECG | 43,673 | 1 | 4 | 1,500 | 300 |
|   | Fine-tuning | EMG | 122/41/41 | 1 | 3 | 1,500 | 4,000 |

Processed data

We explain the data preprocessing and highlight some steps here for clarity; more details can be found in the appendix of our paper. In summary, our data processing consists of two stages. First, we segment time series recordings if they are too long. For fine-tuning (target) datasets, we split the dataset into train, validation, and test portions. We took care to assign all samples belonging to a single recording to one partition only whenever possible, to avoid leaking data from the test set into the training set; for pre-processed datasets like Epilepsy this is not possible. The train:validation ratio is about 3:1, and we used a balanced number of samples per class whenever possible. All remaining samples not included in the train and validation partitions go into the test partition to better estimate the models' performance metrics. After the first stage, we produce three .pt (PyTorch format) files corresponding to the three partitions of each dataset. Each file contains a dictionary with the keys samples and labels, whose values are torch tensors storing the data. For samples, the tensor dimensions correspond to the number of samples, the number of channels, and the length of each time series sample. This is the standard format that can be directly read by the TS-TCC model as well as our TF-C implementation.
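A quick sanity check of a downloaded partition file, assuming the dictionary keys are exactly samples and labels as described above:

```
import torch

data = torch.load("train.pt")            # likewise val.pt and test.pt
x, y = data["samples"], data["labels"]

# x: [num_samples, num_channels, series_length], e.g. [371055, 1, 200] for SleepEEG
# y: [num_samples] integer class labels
print(x.shape, x.dtype)
print(y.shape, torch.unique(y))
```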

The second stage consists of converting, for each dataset, the three .pt files into the accepted input format of each baseline model and placing them in the correct directories relative to the script that handles the pre-training and fine-tuning process. We have prepared simple scripts for these straightforward tasks but did not automate them. To reduce clutter in the repo, we have chosen to omit them from the baseline folders. Also note that in the second experiment (one-to-many pre-training), the fine-tuning datasets are further clipped to have the same length as the SleepEEG dataset.

Step one. The processed datasets can be manually downloaded from the following links.

Then you have to place the files inside the corresponding folder under data/dataset_name (such as data/SleepEEG):

**The well-processed datasets will be released (in FigShare) after acceptance.**

Alternatively, you can use the download_datasets.sh script to automatically download and decompress all datasets into the respective directories. This immediately finishes the first step of preprocessing.

Step two. Now we explain the second step in detail. To begin with, TS-TCC and TS-SD (along with our TF-C model), as implemented under the TS-TCC codebase, can directly take in the datasets downloaded in the previous step. All that remains is to create the corresponding subdirectories at TS-TCC/data/dataset_name and place the datasets inside. This is handled by the shell script data_processing/TS-TCC.sh, which creates the folders and soft links that alias the downloaded files.

TS2Vec uses exactly the same {train,test}_{input,output}.npy files as Mixing-up, so we process our downloaded datasets once and use them for both models. The only difference in data format is that the label tensors are two-dimensional, so we insert an axis into each such tensor. This is handled in data_processing/Mixing-up.py, after which data_processing/TS2vec.sh can be run to create aliases to the processed files.
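A minimal sketch of that conversion (the authoritative logic is in data_processing/Mixing-up.py; the output file names follow the {train,test}_{input,output}.npy convention mentioned above):

```
import numpy as np
import torch

for split in ("train", "test"):
    d = torch.load(f"{split}.pt")
    x = d["samples"].numpy()                    # [N, C, L]
    y = d["labels"].numpy()
    np.save(f"{split}_input.npy", x)
    np.save(f"{split}_output.npy", y[:, None])  # insert an axis: labels become [N, 1]
```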

Next, for CLOCS, we need to build a more complicated nested dictionary holding the time series and labels. A time series sample is now stored as a two-dimensional tensor, with the channel dimension eliminated, because CLOCS assumes channel information is discarded during data preprocessing. Again, the final datasets should be placed in the correct location, which follows the format CLOCS/data/dataset_name. However, due to aliasing issues, the names used may not match how we named the datasets in the paper. Please use the Python script data_processing/CLOCS.py to perform the above steps automatically.

Finally, for SimCLR, there is no data folder; files are placed directly under SimCLR/dataset_name. For the data itself, the tensor storing the time series has its second and third dimensions, corresponding to channels and observations, swapped relative to our starting files. Also, the labels cannot be numeric; they have to be in one-hot format. These steps are handled in the data_processing/SimCLR.py script for convenience.
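A rough sketch of the two transformations described above, assuming the standard .pt format from step one; the exact output file names expected by SimCLR are defined in data_processing/SimCLR.py, so the ones below are placeholders:

```
import numpy as np
import torch
import torch.nn.functional as F

d = torch.load("train.pt")
x = d["samples"]                          # [N, C, L]
y = d["labels"].long()

x_simclr = x.permute(0, 2, 1).numpy()     # [N, L, C]: observations before channels
y_simclr = F.one_hot(y).numpy()           # [N, num_classes] one-hot labels

np.save("train_x.npy", x_simclr)          # placeholder output names
np.save("train_y.npy", y_simclr)
```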

We also provide a shortcut script that performs all of the above steps: run process_all.sh from the root directory of the git repository. Make sure you are in the environment specified by baseline_requirements.yml before running the scripts.

Experimental setups

We evaluated our model in two different settings against eight baselines. The baselines include six state-of-the-art models that can be used for transfer learning in time series and two non-pre-training models (a non-DL method, KNN in this case, and a randomly initialized model). The two settings are:

Setting 1: One-to-one pre-training. We pre-train a model on one pre-training dataset and use it for fine-tuning on one target dataset only. We test the proposed model in four independent scenarios: neurological stage detection, mechanical device diagnosis, activity recognition, and physical status monitoring. For example, in Scenario 1 (neurological stage detection), pre-training is done on SleepEEG and fine-tuning on Epilepsy. While both datasets describe single-channel EEG, the signals are from different channels/positions on the scalp, monitor different physiology (sleep vs. epilepsy), and are collected from different patients. This setting simulates a wide range of scenarios where transfer learning is useful in practice: there is a domain gap and the fine-tuning dataset is small.

Setting 2: One-to-many pre-training. We pre-train a model on one dataset and then fine-tune it on multiple target datasets independently, without re-running pre-training for each target. We chose SleepEEG for pre-training because of its large size and complex temporal dynamics, and fine-tune on Epilepsy, FD-B, and EMG from the other three scenarios. The domain gaps between the pre-training dataset and the three fine-tuning datasets are larger this time, so this setting tests the generality of our model for transfer learning.

Requirements

TF-C has been tested using Python >=3.5.

For the baselines, we have not managed to unify the environments due to the large divergence among the original baseline implementations, so you need to build three different environments to cover all six DL baselines. For TS2Vec, use ts2vec_requirements.yml. For SimCLR, because Tang et al. used the TensorFlow framework, please use simclr_requirements.yml. For the other four baselines, use baseline_requirements.yml. To install the dependencies from one of these files via Conda, run the following command:

conda env create -f XXX_requirements.yml

Running the code

Reproduce our TF-C. Please download the processed datasets into the corresponding folders under code/data/ (e.g., code/data/SleepEEG); make sure each folder name matches the dataset name. There are three key parameters: training_mode has two options, pre_train and fine_tune_test; pretrain_dataset has four options, SleepEEG, FD_A, HAR, and ECG; target_dataset has four options, Epilepsy, FD_B, Gesture, and EMG. The hyper-parameters of the models can be found in the configuration files in the folder config_files. For example, to pre-train a model on SleepEEG and fine-tune it on Epilepsy, run:

python main.py --training_mode pre_train --pretrain_dataset SleepEEG --target_dataset Epilepsy

python main.py --training_mode fine_tune_test --pretrain_dataset SleepEEG --target_dataset Epilepsy

Reproduce baselines. You are advised to run the models from the corresponding folders under code/baselines/ using the command-line patterns described in the original authors' README.md files whenever possible. Note that for Mixing-up and SimCLR, pre-training and fine-tuning are done by directly running train_model.py and finetune_model.py without passing in arguments. Similarly, for CLOCS, one must manually modify the hyperparameters of the training procedure inside the main file (run_experiments.py in this case). Please reach out to the original authors of these baselines if you have questions about setting the hyperparameters of their models. Finally, for each baseline, the performance of transfer learning on a given pair of datasets can vary depending on the hyperparameter choices. We experimented with them manually and chose the combinations that gave the best performance while keeping model complexity comparable across baselines. We include tables describing the specific hyperparameter combinations we used for different datasets, where necessary, in the corresponding baseline folders so that our results can be reproduced. Please note that some baselines are designed for representation learning (rather than pre-training) of time series; we use these baselines in the same setup as our model to make results comparable.

Citation

If you find TF-C useful for your research, please consider citing this paper:

```
@inproceedings{zhang2022self,
title = {Self-Supervised Contrastive Pre-Training For Time Series via Time-Frequency Consistency},
author = {Zhang, Xiang and Zhao, Ziyuan and Tsiligkaridis, Theodoros and Zitnik, Marinka},
booktitle = {Proceedings of Neural Information Processing Systems, NeurIPS},
year      = {2022}
}
```

Updated Jan. 2023

We updated the implementation of the proposed TF-C model in the following aspects.

  1. Fixed bugs, cleaned the code, and added comments for better understanding. The newly uploaded TF-C code is at the path TFC-pretraining/code/TFC. All the files needed to run it are provided in that folder.
  2. For the contrastive encoders (in both the time and frequency domains), we replaced the 3-layer CNN blocks with a 2-layer Transformer. We noticed that the performance does not improve (there is even a slight decrease), but training stability is better.
  3. For the downstream classifier, we added a KNN classifier in parallel with the original 2-layer MLP classifier. In preliminary experiments, we noticed that the performance of the MLP varies across setups and hyper-parameter settings, so in this version we provide two classifiers: a 2-layer MLP and KNN (K=5). However, the reasons behind the performance variance are still unknown and need further study.
  4. For better reproducibility, we provide an example of a pre-trained model. The model weights can be found at TFC-pretraining/code/experiments_logs/SleepEEG_2_Epilepsy/run1/pre_train_seed_42_2layertransformer/saved_models/ckp_last.pt. The model path is identical to the one used in the code, so you can clone/download this whole repo and directly run the TFC-pretraining/code/TFC/main.py file.
    • This model is pre-trained on the SleepEEG-to-Epilepsy scenario (in this update, all debugging was based on this setup). Specifically, set training_mode to pre_train and pretrain_dataset to SleepEEG. In SleepEEG_Configs.py, all hyper-parameters are unchanged: lr=0.0005 and 200 epochs for pre-training (20 epochs for fine-tuning). We set the random seed to 42.
    • For fine-tuning on Epilepsy (lr=0.0005, epochs=20, batch size=60), the fine-tuning set is still 60 samples (30 positive + 30 negative), with 20 validation and 11,420 test samples. Note that we have re-split the Epilepsy dataset (i.e., regenerated the 60-sample fine-tuning set) to test the stability of the model. The re-splitting code is available at TFC-pretraining/code/TFC/Data_split.py, and the split dataset is uploaded to this repo at TFC-pretraining/datasets/Epilepsy/ (it is also synchronized to Figshare).
    • In this setting, with TF-C pre-training, the best test F1 after fine-tuning is ~0.88 (achieved by the MLP, which beats KNN), compared to only ~0.60 without TF-C pre-training. Please note that, for quick debugging, the model is pre-trained on a subset of SleepEEG (1,280 samples, less than 1% of the whole dataset), so we believe there is large room to further boost performance with more pre-training samples.
  5. We'd like to share more ideas that may improve the TF-C framework in follow-up work.
    • The 2-layer Transformer could be modified for the specific task (e.g., adding more layers for complex time series); polishing the backbone can be helpful. Note that we did not tune the hyper-parameters after switching to the Transformer, so a better setting (e.g., number of layers, Transformer dimension, MLP hidden dimension) might help.
    • Use different architectures for the time-based and frequency-based encoders. As signal properties in the frequency domain are very different from those in the time domain, a dedicated encoder architecture could better capture the information.
    • Explore more augmentations in the frequency domain. Currently we add or remove frequency components, so designing more perturbations (such as bandpass filtering) is a promising direction.
    • In the frequency domain, we only leveraged the magnitude information; however, the phase is also very important, so fully exploiting the information in the frequency domain is an important future topic.
    • Better projection. Currently, we project the time- and frequency-based embeddings into a shared time-frequency space with a 2-layer MLP projector, which is relatively simple. More powerful projection methods are welcome.
    • More ideas may be added.

Miscellaneous

Please send any questions you might have about the code and/or the algorithm to [email protected].

License

TF-C codebase is released under the MIT license.

tfc-pretraining's People

Contributors

xiangzhang1015, zhaoziyuan2000


tfc-pretraining's Issues

DataTransform_FD function implementation is inconsistent with the paper?

In the DataTransform_FD function:
li = np.random.randint(0, 2, size=[sample.shape[0]]) # there are two augmentations in Frequency domain
li_onehot = one_hot_encoding(li)
aug_1[1 - li_onehot[:, 0]] = 0 # the rows are not selected are set as zero.
aug_2[1 - li_onehot[:, 1]] = 0

The above four lines of code only set the values of rows 0 and 1 in aug_1 and aug_2 to zero, which is different from removing frequency components and adding frequency components.
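For reference, per-sample masking would typically use a boolean index rather than a 0/1 integer array; the snippet below is a reader-side sketch with dummy shapes, not a confirmed fix from the authors:

```
import numpy as np

N, C, L = 16, 1, 178                 # dummy batch with the repo's tensor layout
sample = np.random.randn(N, C, L)
aug_1, aug_2 = sample.copy(), sample.copy()

li = np.random.randint(0, 2, size=N) # which frequency augmentation each sample keeps
aug_1[li != 0] = 0                   # zero out samples not assigned to augmentation 1
aug_2[li != 1] = 0                   # zero out samples not assigned to augmentation 2
```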

"data_pre_processing" in simclr

Thank you for sharing. But I can't figure out what "data_pre_processing" is when I try to run the SimCLR code.
Could you please provide a solution?

Availability of pre-trained weights

Thanks for sharing your work. Could you please clarify whether the pre-trained weights are available? If so, could you share the link? If not, are you planning to release the weights?
Thanks in advance.

The challenge of reproducing the baselines of CLOCS

When I run run_experiments.py for training, I encounter a FileNotFoundError: [Errno 2] No such file or directory: "TFC-pretraining-main/code/baselines/CLOCS/results/CMSC/sleepEDF/leads_['I']/embedding_320/seed4/pretrained_weight". I'm unsure where the issue arises. Could you please provide the pre-trained weights or assist in troubleshooting?

Possible fatal errors in DataTransform function

I wonder whether you might have made some mistakes in your code. If not, hopefully you can correct me. Here, I use DataTransform_TD as an example and paste the code below:

li = np.random.randint(0, 4, size=[sample.shape[0]]) # there are two augmentations in Frequency domain
li_onehot = one_hot_encoding(li)
aug_1[1-li_onehot[:, 0]] = 0 # the rows are not selected are set as zero.
aug_2[1 - li_onehot[:, 1]] = 0
aug_3[1 - li_onehot[:, 2]] = 0

First, 1-li_onehot[:, num] is an array of float dtype, which leads to an error because an array used as an index has to be of integer (or boolean) type.

Second, I understand that you want each individual sample to randomly choose one augmentation from the bank. However, the array 1-li_onehot[:, 0] looks like [1 0 1 0 1 ... 0], consisting only of 0s and 1s. If you use it as an index to set rows (corresponding to samples) to 0, only the first two samples can ever be affected. In that case, with N samples, the remaining N-2 samples (most of them) are not touched at all.

Third, the first line of the code should be np.random.randint(0, 3, size=[sample.shape[0]]), because you only provide 3 augmentations.

Notably, the same problems exist in the DataTransform_FD as well.

some details questions

I have a question. When computing the one-hot encoding around line 275 of the fine-tuning code, should I write
onehot_label = F.one_hot(labels, config.num_classes_target)
instead of
onehot_label = F.one_hot(labels)?
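For context, the difference between the two calls (independent of this repository's code): without num_classes, F.one_hot infers the width from the largest label present in the batch, which can vary from batch to batch, so passing the number of target classes explicitly is the safer choice.

```
import torch
import torch.nn.functional as F

labels = torch.tensor([0, 1, 1, 0])

print(F.one_hot(labels).shape)                 # torch.Size([4, 2]): width inferred from max label
print(F.one_hot(labels, num_classes=5).shape)  # torch.Size([4, 5]): width fixed explicitly
```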

the permutation augmentation method

Q1: In augmentation --> permutation --> ret[i] = pat[0, warp], I think this should be changed to pat[:, warp] to support multiple channels.

Q2: In the latest version of your code, the time-domain augmentation only uses jitter, which is a big change from the previous code. I would like to know the reason.

Time-Frequency Consistency Loss is not utilized

I noticed that the Time-Frequency Consistency Loss is not being used in your code. Could you please confirm whether this is intentional? If it is intentional, could you explain the reason and its potential impact on the model's performance?

backbone

Thank you for your time series work.
For the Transformer backbone, my SleepEEG-to-Epilepsy reproduction result is close to the paper, but I can't get good results on the others.
I referred to the paper and the SimCLR baseline code and tried to build and train a ResNet backbone, but the result is bad.
Could you please give the detailed structure of the ResNet backbone?
Thank you!

It's difficult to replicate the results of the paper

The code reports the accuracy for each epoch and also selects the highest accuracy. I wonder whether the results reported in the paper are the ones selected for the highest accuracy. In addition, apart from SleepEEG-to-Epilepsy, which can achieve results close to those in the paper, the results for other datasets are quite poor. Could anyone replicate the results for the other datasets?

Where is the 3-layer 1D ResNet?


In the paper, you mention using ResNet, but there is a Transformer in the code.

The built-in PyTorch implementation is used, whose input is expected to be (seq_len, N, D), but you pass a (N, 1, seq_len) tensor.

Is this the reason why the Transformer is not as good as ResNet?

KNN baseline not reproducible and incorrect information about the HAR dataset

  • I have been unable to reproduce the results of the KNN (K=2) baseline on the provided datasets, except for the Epilepsy dataset where I was able to get the same score as reported in the paper. However, the performance on other datasets is significantly different. For example, in the "EMG" dataset, I am getting an accuracy score of 0.122 while the paper reports a score of 0.439.

  • The information provided about the HAR dataset is incorrect. The paper mentions that the dataset contains 9 channels, but upon inspection I have found that it only contains 3 channels. This discrepancy needs to be addressed.

  • Could you please provide more information about how the KNN baseline was run for multivariate time series data, and also correct the information about the number of channels in the HAR dataset? My goal was simply to do a quick development run, so any assistance in reproducing the results would be greatly appreciated. Thank you.
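For anyone attempting a similar quick run, one plausible way to apply KNN to multivariate series is to flatten channels and time into a single feature vector per sample; this is an assumption about the setup, not a description of how the paper's KNN numbers were produced:

```
import torch
from sklearn.neighbors import KNeighborsClassifier

train = torch.load("train.pt")
test = torch.load("test.pt")

# flatten [N, C, L] -> [N, C*L] so every sample becomes one feature vector
x_tr = train["samples"].reshape(train["samples"].shape[0], -1).numpy()
x_te = test["samples"].reshape(test["samples"].shape[0], -1).numpy()

knn = KNeighborsClassifier(n_neighbors=2)
knn.fit(x_tr, train["labels"].numpy())
print("test accuracy:", knn.score(x_te, test["labels"].numpy()))
```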


NO well-processed datasets and requirements.yml

Hello! Thank you for this brilliant work; I am very interested in it. But there are two problems:

  1. You mentioned "**The well-processed datasets will be released (in FigShare) after acceptance.**" Could you upload these?
  2. The same goes for the requirements .yml files; I can't find them anywhere.

Thanks for your reply ❤️

Code missing

Hello, I have been following your work for a long time. Congratulations on your article being accepted at NeurIPS 2022. You said in a previous version of the article that you would release the code after acceptance; we look forward to the executable code being uploaded.

Fine-tuning SleepEEG -> Epilepsy

In the paper, you say that for the 'SleepEEG -> Epilepsy' transfer you only fine-tune on 60 samples of the target data (Epilepsy). For how many epochs? Is this really correct, given that the classifier has 16,578 parameters? Also, for how many epochs did you pre-train the TF-C feature extractor on the SleepEEG data? One final question: I diffed this repo with the one at https://anonymous.4open.science/r/TFC-pretraining-6B07/README.md and they are the same (at least the code that implements the model, data augmentation, and loading). Do you have a more recent version of the code? I can't fine-tune on the 60 samples. Thanks!

Accuracy metrics are generally not computed at the right time - attempted corrected code provided (trainer.py)

As I go through the codebase, I noticed that both the fine-tuning and testing accuracy metrics are not computed correctly.

There are a couple of main issues:

  1. In trainer.py, the "best performance" metrics are reported for the highest test set accuracy across all fine-tuning epochs. In the absence of a separate hold-out test set, this amounts to cherry-picking.
  2. In both fine-tuning and testing, the accuracy metrics are computed per mini-batch and then averaged at the end of the epoch. This affects accuracy, AUPRC, and AUC; they should be computed across all data, not computed per mini-batch and then aggregated. On an imbalanced dataset, the per-batch AUPRC and AUC always end up within about ±0.05 of the random baseline of 0.5; after switching to computing them once over the entire epoch, I got 0.17-0.18.
  3. The binarized metrics (accuracy, recall, precision) are computed with a fixed cutoff of 0.5. This can be misleading because the classification model is not calibrated (that is a separate issue with the classifier's output, which is just a float).

This is my working script, in which I commented out the 'best performance' logic and switched to global evaluation.

import os
import sys
sys.path.append("..")
import pdb
from loss import *
from sklearn.metrics import roc_auc_score, classification_report, confusion_matrix, \
    average_precision_score, accuracy_score, precision_score,f1_score,recall_score
from sklearn.neighbors import KNeighborsClassifier
from model import * 
import wandb
import pdb
import os
from cosine_annealing_warmup import CosineAnnealingWarmupRestarts
# Set the seed value all over the place to make this reproducible.
seed_val = 42
np.random.seed(seed_val)
torch.manual_seed(seed_val)
os.environ["TORCH_USE_CUDA_DSA"] = "1"
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
torch.autograd.set_detect_anomaly(True)
def one_hot_encoding(X):
    X = [int(x) for x in X]
    n_values = np.max(X) + 1
    b = np.eye(n_values)[X]
    return b

def Trainer(model,  model_optimizer, classifier, classifier_optimizer, train_dl, valid_dl, test_dl, device,
            logger, config, experiment_log_dir, training_mode):
    # Start training
    logger.debug("Training started ....")

    criterion = nn.CrossEntropyLoss()
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(model_optimizer, 'min')
    #scheduler = CosineAnnealingWarmupRestarts(model_optimizer, config.num_epoch // 5, warmup_steps=config.num_epoch // 10)
    if training_mode == 'pre_train':
        print('Pretraining on source dataset')
        for epoch in range(1, config.num_epoch + 1):
            # Train and validate
            """Train. In fine-tuning, this part is also trained???"""
            train_loss = model_pretrain(model, model_optimizer, criterion, train_dl, config, device, training_mode, logger)
            logger.debug(f'\nPre-training Epoch : {epoch}', f'Train Loss : {train_loss:.4f}')
            scheduler.step(train_loss)
            torch.cuda.empty_cache()

        os.makedirs(os.path.join(experiment_log_dir, "saved_models"), exist_ok=True)
        chkpoint = {'model_state_dict': model.state_dict()}
        torch.save(chkpoint, os.path.join(experiment_log_dir, "saved_models", f'ckp_last.pt'))
        print('Pretrained model is stored at folder:{}'.format(experiment_log_dir+'saved_models'+'ckp_last.pt'))

    """Fine-tuning and Test"""
    if training_mode != 'pre_train':
        """fine-tune"""
        print('Fine-tune on Fine-tuning set')
        performance_list = []
        total_f1 = []
        KNN_f1 = []
        global emb_finetune, label_finetune, emb_test, label_test

        for epoch in range(1, config.num_epoch + 1):
            logger.debug(f'\nEpoch : {epoch}')

            valid_loss, emb_finetune, label_finetune, F1 = model_finetune(model, model_optimizer, valid_dl, config,
                                  device, training_mode, classifier=classifier, classifier_optimizer=classifier_optimizer)
            scheduler.step(valid_loss)


            # save best fine-tuning model""
            global arch
            arch = 'sleepedf2eplipsy'
            if len(total_f1) == 0 or F1 > max(total_f1):
                print('update fine-tuned model')
                os.makedirs('experiments_logs/finetunemodel/', exist_ok=True)
                torch.save(model.state_dict(), 'experiments_logs/finetunemodel/' + arch + '_model.pt')
                torch.save(classifier.state_dict(), 'experiments_logs/finetunemodel/' + arch + '_classifier.pt')
            total_f1.append(F1)
            model_test(model, test_dl, config, device, training_mode,
                                                              classifier=classifier)

            # evaluate on the test set
            """Testing set"""
            # logger.debug('Test on Target datasts test set')
            # model.load_state_dict(torch.load('experiments_logs/finetunemodel/' + arch + '_model.pt'))
            # classifier.load_state_dict(torch.load('experiments_logs/finetunemodel/' + arch + '_classifier.pt'))
            # test_loss, test_acc, test_auc, test_prc, emb_test, label_test, performance = model_test(model, test_dl, config, device, training_mode,
            #                                                  classifier=classifier, classifier_optimizer=classifier_optimizer)
            # performance_list.append(performance)

            """Use KNN as another classifier; it's an alternation of the MLP classifier in function model_test. 
            Experiments show KNN and MLP may work differently in different settings, so here we provide both. """
            # # train classifier: KNN
            # neigh = KNeighborsClassifier(n_neighbors=5)
            # neigh.fit(emb_finetune, label_finetune)
            # knn_acc_train = neigh.score(emb_finetune, label_finetune)
            # # print('KNN finetune acc:', knn_acc_train)
            # representation_test = emb_test.detach().cpu().numpy()

            # knn_result = neigh.predict(representation_test)
            # knn_result_score = neigh.predict_proba(representation_test)
            # one_hot_label_test = one_hot_encoding(label_test)
            # # print(classification_report(label_test, knn_result, digits=4))
            # # print(confusion_matrix(label_test, knn_result))
            # knn_acc = accuracy_score(label_test, knn_result)
            # precision = precision_score(label_test, knn_result, average='macro', )
            # recall = recall_score(label_test, knn_result, average='macro', )
            # F1 = f1_score(label_test, knn_result, average='macro')
            # auc = roc_auc_score(one_hot_label_test, knn_result_score, average="macro", multi_class="ovr")
            # prc = average_precision_score(one_hot_label_test, knn_result_score, average="macro")
            # # print('KNN Testing: Acc=%.4f| Precision = %.4f | Recall = %.4f | F1 = %.4f | AUROC= %.4f | AUPRC=%.4f'%
            # #       (knn_acc, precision, recall, F1, auc, prc))
            # KNN_f1.append(F1)
        # logger.debug("\n################## Best testing performance! #########################")
        # performance_array = np.array(performance_list)
        # best_performance = performance_array[np.argmax(performance_array[:,0], axis=0)]
        # print('Best Testing Performance: Acc=%.4f| Precision = %.4f | Recall = %.4f | F1 = %.4f | AUROC= %.4f '
        #       '| AUPRC=%.4f' % (best_performance[0], best_performance[1], best_performance[2], best_performance[3],
        #                         best_performance[4], best_performance[5]))
        # print('Best KNN F1', max(KNN_f1))

    logger.debug("\n################## Training is Done! #########################")

def model_pretrain(model, model_optimizer, criterion, train_loader, config, device, training_mode,logger):
    total_loss = []
    model.train()
    global loss, loss_t, loss_f, l_TF, loss_c, data_test, data_f_test

    # optimizer
    # model_optimizer.zero_grad() ## TODO: Not a good practice to call zero_grad() outside the loop

    for batch_idx, (data, labels, aug1, data_f, aug1_f) in enumerate(train_loader):
        data, labels = data.float().to(device), labels.long().to(device) # data: [128, 1, 178], labels: [128]
        aug1 = aug1.float().to(device)  # aug1 = aug2 : [128, 1, 178]
        data_f, aug1_f = data_f.float().to(device), aug1_f.float().to(device)  # aug1 = aug2 : [128, 1, 178]
        model_optimizer.zero_grad()
        """Produce embeddings"""
        h_t, z_t, h_f, z_f = model(data, data_f)
        h_t_aug, z_t_aug, h_f_aug, z_f_aug = model(aug1, aug1_f)

        """Compute Pre-train loss"""
        """NTXentLoss: normalized temperature-scaled cross entropy loss. From SimCLR"""
        nt_xent_criterion = NTXentLoss_poly(device, config.batch_size, config.Context_Cont.temperature,
                                       config.Context_Cont.use_cosine_similarity) # device, 128, 0.2, True

        loss_t = nt_xent_criterion(h_t, h_t_aug)
        loss_f = nt_xent_criterion(h_f, h_f_aug)
        l_TF = nt_xent_criterion(z_t, z_f) # this is the initial version of TF loss

        l_1, l_2, l_3 = nt_xent_criterion(z_t, z_f_aug), nt_xent_criterion(z_t_aug, z_f), nt_xent_criterion(z_t_aug, z_f_aug)
        loss_c = (1 + l_TF - l_1) + (1 + l_TF - l_2) + (1 + l_TF - l_3)

        lam = 0.2
        loss = lam*(loss_t + loss_f) + l_TF
        logger.debug(f'At batch {batch_idx}, total loss={loss}')

        total_loss.append(loss.item())
        loss.backward()
        model_optimizer.step()

    logger.debug('Pretraining: overall loss:{}, l_t: {}, l_f:{}, l_c:{}'.format(loss, loss_t, loss_f, l_TF))

    ave_loss = torch.tensor(total_loss).mean()

    return ave_loss


def model_finetune(model, model_optimizer, val_dl, config, device, training_mode, classifier=None, classifier_optimizer=None):
    global labels, pred_numpy, fea_concat_flat
    model.train()
    classifier.train()

    total_loss = []
    total_acc = []
    total_auc = []  # it should be outside of the loop
    total_prc = []
    supervised_loss = []

    criterion = nn.BCEWithLogitsLoss(pos_weight=torch.Tensor([6]).cuda())  # pos_weight should be a 1-element tensor like [6], not torch.Tensor(6)
    outs = np.array([])
    trgs = np.array([])
    feas = np.array([])
    labels = np.array([])

    for data, labels, aug1, data_f, aug1_f in val_dl:
        # print('Fine-tuning: {} of target samples'.format(labels.shape[0]))
        data, labels = data.float().to(device), labels.long().to(device)
        data_f = data_f.float().to(device)
        aug1 = aug1.float().to(device)
        aug1_f = aug1_f.float().to(device)

        """if random initialization:"""
        model_optimizer.zero_grad()  # The gradients are zero, but the parameters are still randomly initialized.
        classifier_optimizer.zero_grad()  # the classifier is newly added and randomly initialized

        """Produce embeddings"""
        h_t, z_t, h_f, z_f = model(data, data_f)
        h_t_aug, z_t_aug, h_f_aug, z_f_aug = model(aug1, aug1_f)
        nt_xent_criterion = NTXentLoss_poly(device, config.target_batch_size, config.Context_Cont.temperature,
                                            config.Context_Cont.use_cosine_similarity)
        loss_t = nt_xent_criterion(h_t, h_t_aug)
        loss_f = nt_xent_criterion(h_f, h_f_aug)
        l_TF = nt_xent_criterion(z_t, z_f)
        #pdb.set_trace()
        l_1, l_2, l_3 = nt_xent_criterion(z_t, z_f_aug), nt_xent_criterion(z_t_aug, z_f), \
                        nt_xent_criterion(z_t_aug, z_f_aug)
        loss_c = (1 + l_TF - l_1) + (1 + l_TF - l_2) + (1 + l_TF - l_3) #


        """Add supervised classifier: 1) it's unique to finetuning. 2) this classifier will also be used in test."""
        fea_concat = torch.cat((z_t, z_f), dim=1)
        predictions = classifier(fea_concat)
        fea_concat_flat = fea_concat.reshape(fea_concat.shape[0], -1)
        loss_p = F.binary_cross_entropy_with_logits(predictions, 
                                                    labels.unsqueeze(1).float(),
                                                    pos_weight=torch.Tensor([6]).cuda())

        lam = 0.1
        loss = loss_p + l_TF + lam*(loss_t + loss_f)
        preds_binarized = (torch.sigmoid(predictions) > 0.5).detach()
        predictions_prob = torch.sigmoid(predictions)
        acc_bs = labels.eq(preds_binarized).float().mean()
        # onehot_label = F.one_hot(labels)
        pred_numpy = preds_binarized.cpu().numpy()

        try:
            auc_bs = roc_auc_score(labels.detach().cpu().numpy(), pred_numpy, average="macro", multi_class="ovr" )
        except:
            auc_bs = 0.0  # np.float was removed in recent NumPy versions

        total_acc.append(acc_bs)
        total_auc.append(auc_bs)
        total_loss.append(loss.item())
        supervised_loss.append(loss_p.item())
        loss_p.backward()
        model_optimizer.step()
        classifier_optimizer.step()

        if training_mode != "pre_train":
            #pred = predictions.max(1, keepdim=True)[1]  # get the index of the max log-probability
            outs = np.append(outs, predictions_prob.detach().squeeze().cpu().numpy())
            trgs = np.append(trgs, labels.data.cpu().numpy())
            feas = np.append(feas, fea_concat_flat.data.cpu().numpy())
            
    feas = feas.reshape([len(trgs), -1])  # produce the learned embeddings

    labels_numpy = labels.detach().cpu().numpy()
    pred_numpy = np.argmax(pred_numpy, axis=1)
    precision = precision_score(labels_numpy, pred_numpy, average='macro', )
    recall = recall_score(labels_numpy, pred_numpy, average='macro', )
    F1 = f1_score(labels_numpy, pred_numpy, average='macro', )
    ave_loss = torch.tensor(total_loss).mean()
    avg_supervised_loss = torch.tensor(supervised_loss).mean()

    aucpr = average_precision_score(trgs, outs)

    print(f'Total ones in labels: {np.sum(labels_numpy)}', f'Total ones in predictions: {np.sum(pred_numpy)}')
    print(f'Fine tune loss is {ave_loss}', 
          f'Fine tune supervised loss is {avg_supervised_loss}', 
            f'Fine tune AUPRC {aucpr}')
    # print(' Finetune: loss = %.4f | supervised_loss = %.4f | Acc=%.4f | Precision = %.4f | Recall = %.4f | F1 = %.4f| AUROC=%.4f | AUPRC = %.4f'
    #       % (ave_loss, avg_supervised_loss, ave_acc*100, precision * 100, recall * 100, F1 * 100, ave_auc * 100, ave_prc *100))

    return avg_supervised_loss, feas, trgs, F1


def model_test(model,  test_dl, config,  device, training_mode, classifier=None):
    model.eval()
    classifier.eval()

    total_loss = []


    criterion = nn.BCEWithLogitsLoss() # the loss for downstream classifier
    outs = np.array([])
    trgs = np.array([])
    emb_test_all = []

    with torch.no_grad():
        for data, labels, _,data_f, _ in test_dl:
            data, labels = data.float().to(device), labels.long().to(device)
            data_f = data_f.float().to(device)

            """Add supervised classifier: 1) it's unique to finetuning. 2) this classifier will also be used in test"""
            h_t, z_t, h_f, z_f = model(data, data_f)
            fea_concat = torch.cat((z_t, z_f), dim=1)
            predictions_test = classifier(fea_concat)
            fea_concat_flat = fea_concat.reshape(fea_concat.shape[0], -1)
            emb_test_all.append(fea_concat_flat)

            loss = criterion(predictions_test, 
                             labels.unsqueeze(1).float())
            pred_numpy = torch.sigmoid(predictions_test).detach().squeeze().cpu().numpy()
            labels_numpy = labels.detach().cpu().numpy()


            total_loss.append(loss.item())
            outs = np.append(outs, pred_numpy)
            trgs = np.append(trgs, labels_numpy)
    prc_bs = average_precision_score(trgs, outs)
    print(f'Test AUPRC {prc_bs}', f'Test loss {torch.mean(torch.Tensor(total_loss))}')
    emb_test_all = torch.concat(tuple(emb_test_all))
    return emb_test_all, trgs, prc_bs, outs

The problem with pre-training

Thank you for your work! I have questions regarding the one-to-many pre-training evaluation. I can only run SleepEEG-to-Epilepsy, but its accuracy is just 0.9227; for the other fine-tuning datasets, an error occurs during pre-training (screenshot attached). This issue also occurs in the other one-to-one scenarios except FD_A-to-FD_B, which raises a different error (screenshot attached). For ECG-to-EMG, there is another bug in fine-tune mode (screenshot attached): the performance is the same in every epoch, and it is only half of the performance reported in your paper. Thank you!

[BUG] Error preprocessing files

Hello.

DESCRIBE THE BUG

I'm trying to reproduce the results of the paper. I downloaded the datasets using the download_datasets.sh script and preprocessed them using the process_datasets.sh script. However, I encountered two errors during the preprocessing phase.

  1. Below is the error output that occurs for both the CLOCS.py and Mixing-up.py Python scripts, called from process_datasets.sh:
Traceback (most recent call last):
  File "/workspaces/hiaac-m4/TFC-pretraining/data_processing/Mixing-up.py", line 8, in <module>
    train_dict = torch.load(os.path.join('datasets', dataset_name, 'train.pt'))
  File "/home/vscode/.local/lib/python3.10/site-packages/torch/serialization.py", line 988, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "/home/vscode/.local/lib/python3.10/site-packages/torch/serialization.py", line 437, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/home/vscode/.local/lib/python3.10/site-packages/torch/serialization.py", line 418, in __init__
    super().__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'datasets/ecg/train.pt'
  2. Below is the error output that occurs when executing SimCLR.py from the process_datasets.sh script:
Traceback (most recent call last):
  File "/workspaces/hiaac-m4/TFC-pretraining/data_processing/SimCLR.py", line 79, in <module>
    scatter_numpy(train_y, 1, np.expand_dims(train_dict['labels'].numpy().astype(int), axis=1), 1)
  File "/workspaces/hiaac-m4/TFC-pretraining/data_processing/SimCLR.py", line 41, in scatter_numpy
    idx = [[*np.indices(idx_xsection_shape).reshape(index.ndim - 1, -1),
  File "/workspaces/hiaac-m4/TFC-pretraining/data_processing/SimCLR.py", line 42, in <listcomp>
    index[make_slice(index, dim, i)].reshape(1, -1)[0]] for i in range(index.shape[dim])]
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

NOTE: The first error happens on case-sensitive file systems, as the datasets are downloaded with upper-case letters but the scripts process them using lower-case names.

SYSTEM SPECIFICATION

  • OS: Ubuntu 22.04
  • Python 3.10

bugs of DataTransform_TD

I have run the latest code, but it seems the problem has not been solved yet: in DataTransform_TD, only the first two samples ever get set to zero.

li_onehot = one_hot_encoding(li) # this line generates an array like [0, 1, 0, 1, ..., 0]
aug_1[1-li_onehot[:, 0]] = 0 # only the first two rows of aug_1 are set to zero

Is this the latest version?

Hi, our team is interested in your work, but there are some errors that prevent me from running the example. Are you sure it's the latest version?

Many-to-one experiments

Nice work! Have you tried many-to-one experiments, i.e., pre-training on multiple datasets and evaluating on one specific dataset? Would datasets from different domains improve the performance of the pre-trained model?

The MLP classification model output is not logit and vanilla cross entropy loss is used.

The downstream classifier, defined below, ends with a plain linear layer, so its output is an unbounded raw score (a logit) rather than a probability. I switched to nn.BCEWithLogitsLoss().

```
class target_classifier(nn.Module):
    def __init__(self, configs):
        super(target_classifier, self).__init__()
        self.logits = nn.Linear(2*128, 64)
        self.logits_simple = nn.Linear(64, configs.num_classes_target)

    def forward(self, emb):
        emb_flat = emb.reshape(emb.shape[0], -1)
        emb = torch.sigmoid(self.logits(emb_flat))
        pred = self.logits_simple(emb)
        return pred
```
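A self-contained sketch of the pairing the issue suggests: keep the classifier output as raw logits and hand them to a loss that applies the sigmoid/softmax internally. This illustrates the general pattern, not the repository's adopted fix.

```
import torch
import torch.nn as nn

batch, num_classes = 8, 2
labels = torch.randint(0, num_classes, (batch,))

# multi-class case: one raw logit per class, paired with CrossEntropyLoss
logits = torch.randn(batch, num_classes)          # stand-in for classifier(fea_concat)
loss_multiclass = nn.CrossEntropyLoss()(logits, labels)

# binary case: a single logit per sample, paired with BCEWithLogitsLoss
binary_logit = torch.randn(batch)
loss_binary = nn.BCEWithLogitsLoss()(binary_logit, labels.float())
```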

where are the requirement yml files?

As the title states, our team is wondering where the mentioned requirements .yml files are.

Requirements
TF-C has been tested using Python >=3.5.

For the baselines, we have not managed to unify the environments due to the large divergence in original baseline implementations. So you need to build three different environments to cover all six DL baselines. For ts2vec, use ts2vec_requirements.yml. For SimCLR, because Tang et al. used TensorFlow framework, please use simclr_requirements.yml. For the other four baselines, use baseline_requirements.yml. To use these files to install dependencies for this project via Conda
