emadeldeen24 / ts-tcc Goto Github PK

[IJCAI-21] "Time-Series Representation Learning via Temporal and Contextual Contrasting"

License: MIT License

Python 76.71% MATLAB 5.62% Jupyter Notebook 17.68%

unsupervised-learning deep-learning contrastive-learning temporal contextual time-series eeg fault-diagnosis epilepsy epileptic-seizures

ts-tcc's People

Forkers

shism2 qianrenjian verystrongjoe alpha-coderx tonylibing sean0719 mohamedr002 yaqinzhou mh-lee zhbfy xxg-lab ikkim00 learning310 nchappa lwang89 onsb fengkoushangdezzx lysarthas hdyen lixiaoyu0575 hailunhou bryant6 ahj6377 zhunanyang maginadai statmixedml cahuja1992 xinliangzhou antonio1369 jferstad heibaipei vishalbelsare nhunguet yoonsanghyu mukhoplus xjw-wade clarenceke wenhuiwang93 tsn0 mvrp17 blasotta yfy324 salamahassona kurokumasan arvintashakori sohamc1fm constantin-crailsheim tfahg yqchen8 bryanwong17 echoyi yagami009 anushiya1 aminmbare taichihin lzx-buaa levi-ackman liuweishuo eegkit david-ttao jichuan14 scxsunchenxi rzyang0204 sheral123 laki4ever dineth-mudalige hyang0129 tribe-health oliver15170101 dirus007 a12dongithub zzzidku ustcwumin siaer ykzhang-eeis liesgame rongtongxueya chaof96 djdongx luzhoushili frank-wang-oss mohankrishna12 gabteni fd-guo nkzhangheng wqts caio1pereira zealotty puddingss chtinnes wudibawanglonggege zwbjtu123

ts-tcc's Issues

Produce macro-averaged F1-score (MF1) results

Hi!

In the paper (Results section), you mentioned that the performance is evaluated using two metrics: accuracy and the macro-averaged F1-score (MF1). But when I run main.py, I get only accuracy metric results.
    total_acc.append(labels.eq(predictions.detach().argmax(dim=1)).float().mean())
Can you let me know how MF1 is calculated as well?

Thanks in advance.
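
For reference, the macro-averaged F1-score is the unweighted mean of the per-class F1 scores. The sketch below is a plain-Python illustration of that definition, not the repository's evaluation code; the function name macro_f1 and the toy labels are made up for the example.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); a class that never appears scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels, purely illustrative.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
mf1 = macro_f1(y_true, y_pred)
```

If scikit-learn is available, sklearn.metrics.f1_score(y_true, y_pred, average="macro") should give the same number once the labels and argmax predictions of all batches have been collected.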

Augmentations

Why is the augmentation applied to the whole dataset only once, with that single augmented copy then reused throughout self-supervised training?
Shouldn't the dataset be re-augmented at each epoch?
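
A minimal sketch of what per-epoch augmentation could look like: the random transform is applied inside __getitem__ rather than once in the constructor, so every access draws a fresh view. The class name OnTheFlyAugmented and this simplified jitter are assumptions for illustration, not the repository's dataloader.

```python
import numpy as np

rng = np.random.default_rng(0)


def jitter(x, sigma=0.1):
    # Add Gaussian noise to every time step.
    return x + rng.normal(0.0, sigma, size=x.shape)


class OnTheFlyAugmented:
    """Hypothetical dataset wrapper: draws fresh augmentations on every access."""

    def __init__(self, samples):
        self.samples = samples  # shape (N, channels, length)

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        x = self.samples[idx]
        # Two independently augmented views, recomputed each time.
        return x, jitter(x), jitter(x)


data = np.zeros((4, 1, 8))
ds = OnTheFlyAugmented(data)
_, view_epoch1, _ = ds[0]
_, view_epoch2, _ = ds[0]
```

With this layout the same index yields different views in different epochs, at the cost of recomputing the augmentation on each access.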

About training a new dataset

Thank you for your work. If I want to use a custom dataset, how should I configure the network parameters? My dataset has shape (1189, 1, 10000), i.e. (number of samples, channels, sequence length). Looking forward to your reply.

data_preprocessing/uci_har/preprocess_har.py may be wrong

dat_dict = dict()
dat_dict["samples"] = torch.from_numpy(train_data) #7352 In my opinion, it should be X_train instead of train_data
dat_dict["labels"] = torch.from_numpy(y_train) #5881
torch.save(dat_dict, os.path.join(output_dir, "train.pt"))

Please check it. Thanks

Optimizer

Hi, great job! I have a question about the implementation. Why are two optimizers used instead of one, given that both have exactly the same settings?

No "supervised" training mode in main.py

Hi,
I find that there is no "supervised" branch in main.py. I guess that in this training mode the model is randomly initialized and trained with the supervised loss only. Is that right?

Thank you

question regarding the implementation of your temporal contrasting loss

Thank you for your interesting work! I have a question regarding your implementation of the loss $\mathcal{L}_{TC}^{s}$. I would like to check whether my understanding is right; please correct me if this is not what you intended.

This is the paper

and this is the code

        for i in np.arange(0, self.timestep):
            total = torch.mm(encode_samples[i], torch.transpose(pred[i], 0, 1))
            nce += torch.sum(torch.diag(self.lsoftmax(total)))
        nce /= -1. * batch * self.timestep

At first sight, it was hard to see how the code matches the equation in the paper. It seemed to me that only the numerator of the equation is implemented in the code. However, after some thought, I realized that the total matrix contains elements from both the numerator and the denominator. Applying the log-softmax function then normalizes this matrix. By summing only the diagonal terms (the numerator) and using the negated sum as the loss, you are essentially making the diagonal terms smaller while making the off-diagonal terms (the denominator) bigger. This is how I understood it. Could you please let me know if this statement is correct?

thank you!
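
One way to check this reading is to compare the diagonal of a row-wise log-softmax with the explicit numerator/denominator form. The NumPy sketch below uses random stand-ins for encode_samples[i] and pred[i] and assumes that self.lsoftmax normalizes over each row; it illustrates the identity and is not the repository's code.

```python
import numpy as np


def log_softmax_rows(m):
    # Numerically stable log-softmax over each row.
    shifted = m - m.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


rng = np.random.default_rng(0)
batch, dim = 4, 3
encode = rng.normal(size=(batch, dim))  # stand-in for encode_samples[i]
pred = rng.normal(size=(batch, dim))    # stand-in for pred[i]

total = encode @ pred.T                 # total[a, b] = <encode_a, pred_b>
diag = np.diag(log_softmax_rows(total))

# Explicit form: log( exp(matching score) / sum over the whole row ).
explicit = np.array([
    np.log(np.exp(total[a, a]) / np.exp(total[a]).sum())
    for a in range(batch)
])

loss = -diag.mean()
```

Each diagonal entry already contains the denominator, because log-softmax subtracts the log of the row sum. Under this assumption, minimizing the negated mean pushes each diagonal log-probability toward 0, that is, the matching pair toward probability 1 relative to the other entries in its row.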

A question about your new work

Hello Emadeldeen,
Recently I noticed your new paper, "Self-supervised Contrastive Representation Learning for Semi-supervised Time-Series Classification", on arXiv.
Could you also share the code for TS-TCC and the other methods (e.g., SimCLR, CPC)?

Nan question in SupConLoss

In 'SupConLoss::forward' in 'loss.py', the logits are computed by subtracting each row's maximum from anchor_dot_contrast. If the values in a row are very uneven, with the maximum much larger than the rest, the logits consist of 0 and large negative numbers (such as -208.3). Because the code uses torch.float32 precision, torch.exp(logits) for such a row contains a single 1 while the remaining entries are zero (or approximately zero). Even worse, after multiplying by logits_mask, every entry of exp_logits in that row can be 0. In that case log_prob = logits - torch.log(exp_logits.sum(1, keepdim=True)) becomes infinite, and the loss is computed as NaN. This is an unacceptable outcome.
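
The failure mode described above can be reproduced on a single row with NumPy in float32. The masks and the eps guard below are illustrative assumptions (a simplified version of the loss, not the repository's code or an official fix); the point is only to show where the infinity and the NaN arise.

```python
import numpy as np

# One row of shifted logits (row maximum already subtracted). Index 0 is the
# anchor's similarity with itself, which logits_mask removes.
logits = np.array([0.0, -208.3, -150.0], dtype=np.float32)
logits_mask = np.array([0.0, 1.0, 1.0], dtype=np.float32)
positive_mask = np.array([0.0, 1.0, 0.0], dtype=np.float32)

with np.errstate(divide="ignore", invalid="ignore", under="ignore"):
    exp_logits = np.exp(logits) * logits_mask      # all zeros after underflow and masking
    log_prob = logits - np.log(exp_logits.sum())   # x - (-inf) = +inf
    naive = (positive_mask * log_prob).sum()       # 0 * inf = nan

    # Hypothetical guard: keep the argument of the log strictly positive.
    eps = np.float32(1e-12)
    safe_log_prob = logits - np.log(exp_logits.sum() + eps)
    guarded = (positive_mask * safe_log_prob).sum()
```

Here naive is NaN while guarded stays finite. Whether a small eps is the right remedy for a given dataset, as opposed to normalizing features or adjusting the temperature, is a separate question that this sketch does not settle.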

Evaluation Process

We used the same dataset and code as shared and did not change any configs, but in the linear evaluation experiment our results are not close to those reported in the paper. Did you use a different evaluation procedure to obtain the reported results, such as k-fold cross-validation or a different number of pre-training epochs?

there might be code error for augmentation?

In the strong augmentation, is pat[0,warp] correct?
I think it should be pat[:,warp].
Could you check the code below?

for i, pat in enumerate(x):
    if num_segs[i] > 1:
        if seg_mode == "random":
            split_points = np.random.choice(x.shape[2] - 2, num_segs[i] - 1, replace=False)
            split_points.sort()
            splits = np.split(orig_steps, split_points)
        else:
            splits = np.array_split(orig_steps, num_segs[i])
        warp = np.concatenate(np.random.permutation(splits)).ravel()
        ret[i] = pat[0,warp]
    else:
        ret[i] = pat
return torch.from_numpy(ret)
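
The difference between the two indexing forms only shows up for multi-channel samples. The NumPy sketch below uses a toy array with 2 channels and 6 time steps (made-up values) and mimics the assignment into ret[i] with a same-shaped buffer.

```python
import numpy as np

# One multi-channel sample: 2 channels, 6 time steps.
pat = np.array([[0, 1, 2, 3, 4, 5],
                [10, 11, 12, 13, 14, 15]])
warp = np.array([4, 5, 0, 1, 2, 3])  # a permuted order of the time steps

# pat[0, warp] has shape (6,); assigning it into a (2, 6) slot broadcasts
# the permuted first channel into every channel.
first_channel_only = np.zeros_like(pat)
first_channel_only[:] = pat[0, warp]

# pat[:, warp] has shape (2, 6); each channel keeps its own values.
all_channels = pat[:, warp]
```

For single-channel data the two forms give the same result, so the difference would only matter for datasets with more than one channel.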

Add license

Could you please add a license to your repo?

Augmentations and # of training epochs

While recently implementing TS-TCC I ran into the same issue as one raised here by vamsi231297 a few years ago that self-supervised TS-TCC only augments once at the beginning of training.

To my knowledge all other methods randomly augment each data sample at each epoch, and not just once at the start. See for example Mocov2 (https://github.com/facebookresearch/moco/blob/main/main_moco.py#L351). This is the usual way to prevent overfitting & increase generalization.

I implemented augmentation at each epoch rather than just once at the start, and found that it increased the performance of TS-TCC on the HAR dataset. Linear-probe accuracy after the self-supervised phase went from 0.9160 to 0.9294, and 0.9294 is higher than the values and error bars reported in Table 2 of the paper.

That said, I am also finding that training for more epochs with the rest of the parameters unchanged gives better performance than shown in Table 2 of the paper. For example, 1000 epochs of supervised training reaches an accuracy of 0.9408. Changing the learning rate can also give better accuracies.

So I would not take my test at face value as an improvement of TS-TCC over supervised as it seems likely that my results and those of Table 2 are highly dependent on hyperparameters and seeds.

Can not repeat FD dataset preprocess

        Mat_0(k,a,:)=vib_0(i:i+wind_size);

What is the value of wind_size?

I guess wind_size == sample_len?

However, after that, I see this error:

Error using reshape
Number of elements must not change. Use [] as one of the size inputs to automatically calculate the appropriate size for that dimension.

Problem with Augmentation

I am currently trying to get the repository running. As soon as I try to run the self-supervised part, I run into the following error.
I downloaded the datasets from the Dataverse. I understand the error message and can follow it, but I am wondering why it does not work for me out of the box. Is there anything I am missing here?

> python main.py --training_mode self_supervised --selected_dataset HAR
=============================================
Dataset: HAR
Method:  TS-TCC
Mode:    self_supervised
=============================================
Traceback (most recent call last):
  File "/home/TS-TCC/main.py", line 85, in <module>
    train_dl, valid_dl, test_dl = data_generator(data_path, configs, training_mode)
  File "/home/TS-TCC/dataloader/dataloader.py", line 51, in data_generator
    train_dataset = Load_Dataset(train_dataset, configs, training_mode)
  File "/home/TS-TCC/dataloader/dataloader.py", line 33, in __init__
    self.aug1, self.aug2 = DataTransform(self.x_data, config)
  File "/home/TS-TCC/dataloader/augmentations.py", line 8, in DataTransform
    strong_aug = jitter(permutation(sample, max_segments=config.augmentation.max_seg), config.augmentation.jitter_ratio)
  File "/home/TS-TCC/dataloader/augmentations.py", line 42, in permutation
    warp = np.concatenate(np.random.permutation(splits)).ravel()
  File "numpy/random/mtrand.pyx", line 4720, in numpy.random.mtrand.RandomState.permutation
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (3,) + inhomogeneous part.
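
This error is consistent with recent NumPy releases (1.24 and later) refusing to build an array from sequences of different lengths, which is what np.random.permutation attempts when it is handed a list of unequal segments. A possible workaround, sketched below under that assumption, is to permute the segment indices instead of the segments themselves; it is an illustration, not a patch taken from the repository.

```python
import numpy as np

rng = np.random.default_rng(0)
orig_steps = np.arange(10)

# Unequal segment lengths (4, 3, 3), as np.array_split or np.split can produce.
splits = np.array_split(orig_steps, 3)

# Shuffle segment indices rather than the list of arrays itself, so NumPy
# never has to pack the segments into one rectangular array.
order = rng.permutation(len(splits))
warp = np.concatenate([splits[j] for j in order])
```

The resulting warp is still a permutation of all time steps with the segments in shuffled order, so it can be used for indexing in the same way as before.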

data augment

import numpy as np

def scaling(x, sigma=1.1):
    # https://arxiv.org/pdf/1706.00527.pdf
    # One random factor per (sample, time step), shared across channels.
    factor = np.random.normal(loc=2., scale=sigma, size=(x.shape[0], x.shape[2]))
    ai = []
    for i in range(x.shape[1]):
        xi = x[:, i, :]
        ai.append(np.multiply(xi, factor[:, :])[:, np.newaxis, :])
    return np.concatenate((ai), axis=1)

In the paper, the weak augmentation is described as a jitter-and-scale strategy: "Specifically, we add random variations to the signal and scale up its magnitude." But why is there no jitter in the weak augmentation code?

Obtaining labels on a completely unsupervised dataset

Hi @emadeldeen24, I found this project interesting, but I still need to understand completely how I can obtain labels on the fully unsupervised dataset. Once I've completed the training in self-supervised mode, how can I get the predictions for each time series provided in the input?
I've seen that you use Fine-tuning or the linear classifier, but they require some data labels in input. How can I use a completely unsupervised version?

how to handle overfitting problem?

I have tried TS-TCC to build a classifier for an EEG dataset.
However, the training and validation accuracies are greater than 90%, but the test accuracy is only 44%.

Could you give me some advice or suggestions?

Thanks!

time series forecasting

Hi, great job! Can this model be used for time series forecasting? If so, how well does it perform?

Problem with self_supervised mode training

When I run main.py in self_supervised training mode, the following error occurs.

Input tensor shape: torch.Size([128, 8, 100]). Additional info: {'h': 3}.
Shape mismatch, can't divide axis of length 100 in chunks of 3

The error comes from lines 57-58 of the source code, shown below:

43 class Attention(nn.Module):
44     def __init__(self, dim, heads=8, dropout=0.):
45         super().__init__()
46         self.heads = heads
47         self.scale = dim ** -0.5
48
49         self.to_qkv = nn.Linear(dim, dim * 3, bias=False)
50         self.to_out = nn.Sequential(
51             nn.Linear(dim, dim),
52             nn.Dropout(dropout)
53         )
54
55     def forward(self, x, mask=None):
56         b, n, _, h = *x.shape, self.heads
57         qkv = self.to_qkv(x).chunk(3, dim=-1)
58         q, k, v = map(lambda t: rearrange(t, 'b n (h d) -> b h n d', h=h), qkv)

How can I solve it? Thanks
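
The message suggests that the pattern 'b n (h d) -> b h n d' is being asked to split an axis of length 100 into h = 3 equal parts, which only works when the hidden dimension is divisible by the number of heads. The small check below is a hypothetical helper (valid_head_counts is not part of the repository) that lists head counts compatible with a given dimension.

```python
def valid_head_counts(dim, max_heads=16):
    """Head counts h for which an axis of size dim splits evenly into h groups."""
    return [h for h in range(1, max_heads + 1) if dim % h == 0]


dim, heads = 100, 3              # values taken from the error message
can_split = dim % heads == 0     # False: 100 is not a multiple of 3
options = valid_head_counts(dim)
```

Under this reading, either the number of heads or the hidden dimension in the configuration would need to change so that one divides the other, for example 4 or 5 heads with a dimension of 100.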

the process of self-supervised experiment

Hello. I have read your paper and code, and I am confused about the fine-tuning procedure for the self-supervised model. My understanding is as follows. First, you pre-train the model using the train.pt data. Then, in fine-tuning, you copy the pre-trained parameters into the model. This is where I am confused: why do you run supervised training on train.pt again? Why are the parameters of the pre-trained model not frozen? In your abstract, you say you propose an unsupervised method; where does the unsupervised part come in? In my understanding, the fine-tuning stage should freeze the pre-trained parameters and use a small amount of labeled data to train only the classifier on top of the pre-trained model.

Looking forward to your answer, thank you.

Unable to reproduce pre-processing for HAR, Sleep-EDF, Epilepsy datasets

Hi, thanks for releasing your code! I am trying to reproduce the results from the paper, as well as extend to other datasets, but run into some problems due to insufficient information to recreate the datasets.

Code for pre-processing the HAR dataset is missing. Furthermore, the HAR dataset linked in the README.md comes with 561 dimensions, whereas the config file indicates 9 input channels.
Code for pre-processing the Sleep-EDF dataset does not result in files which match the description in the README.md nor the dataloader code.
Code for pre-processing the Epilepsy dataset only results in train.pt and test.pt, missing a val.pt file as indicated in the README.md and dataloader code.
Also, is it possible to include a requirements.txt file, because it seems that some libraries being used are not backward compatible. The latest version of mne does not have scaling_time as an argument to raw.to_data_frame.

TS-TCC/data_preprocessing/sleep-edf/preprocess_sleep_edf.py

Line 90 in 5c5efa7

raw_ch_df = raw.to_data_frame(scaling_time=100.0)[select_ch]

Thank you!

Loss during self-supervised training stage

During the self-supervised training stage, the training loss does not decrease much; it hovers around 10. In the fine-tuning stage, however, we get good accuracy. Does the self-supervised training loss carry any significance, and should we pay attention to its values at all?

No val.pt files in the processed datasets on Dataverse?

Hi Emadeldeen. Thanks for sharing your work! Just one quick question. I found there are no val.pt files for HAR and Epilepsy datasets. Is there any particular reason for this?

I badly need a pretrained model for the Epilepsy dataset. Could anyone help?

Contextual Contrasting Loss Function

Hi there,

I was taking a closer look at the implementation of the contextual contrasting loss, and I am having trouble understanding how the positive samples are treated differently from the negative ones, and specifically how this corresponds to Eq. 5 of your paper. Could it be that labels should have some elements equal to one rather than all being zeros?

TS-TCC/models/loss.py

Lines 61 to 62 in ebfdbab

 labels = torch.zeros(2 * self.batch_size).to(self.device).long() 

 loss = self.criterion(logits, labels)

Thanks!
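
One possible reading, common in NT-Xent implementations, is that the labels are class indices rather than a 0/1 mask: if each row of logits places the positive similarity in column 0 and the negatives in the remaining columns, a label of 0 tells the cross-entropy criterion that column 0 is the correct class. The NumPy sketch below assumes that layout and that self.criterion is a cross-entropy loss; the numbers are made up.

```python
import numpy as np


def cross_entropy(logits, labels):
    # Mean of -log softmax(logits)[label], as a cross-entropy criterion computes.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


# Assumed layout: column 0 holds each anchor's positive similarity,
# the remaining columns hold its negatives.
positives = np.array([[2.0], [1.5]])
negatives = np.array([[0.1, -0.3], [0.4, 0.0]])
logits = np.concatenate([positives, negatives], axis=1)

labels = np.zeros(len(logits), dtype=int)  # index 0 = "the positive column"
loss = cross_entropy(logits, labels)

# Same quantity written as -log( exp(pos) / sum over positive and negatives ).
explicit = -np.log(np.exp(positives[:, 0]) / np.exp(logits).sum(axis=1)).mean()
```

The two values agree, so under this assumption all-zero labels already single out the positive pair in every row.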

When I choose random_init training, the result (77.9) is much better than the score you reported in Table 2 (57.89±5.13). I don't know which random seeds you used in your paper, but for the 5 seeds I tested, the results are all higher than what you report.
When I use only 5% of the training data for supervised training, the accuracy and macro F1 (>0.89) are much higher than the scores reported in Fig. 2 (MF1<0.55). Below are the results for training with 5% of the training dataset (with a batch size of 8).
All the reported results are hard to reproduce.

I hope you can share your experiment settings and training logs to clear up these doubts.

Loss cannot decrease

The losses (loss, temp_cont_loss1, temp_cont_loss2) do not decrease during training on our dataset. How can we solve this? Thank you

	labels = torch.zeros(2 * self.batch_size).to(self.device).long()
	loss = self.criterion(logits, labels)
