
incremental_learning.pytorch's Introduction

Incremental Learners for Continual Learning

Repository storing some of my public work done during my PhD thesis (2019-).

You will find in there both known implementations (iCaRL, etc.) and the code for my own papers. You can find the list of the latter on my Google Scholar.

My work on continual segmentation can be found here and on continual data loaders here.

Structures

Every model must inherit inclearn.models.base.IncrementalLearner.
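For orientation, here is a minimal sketch of what such a subclass could look like. The hook names (_before_task, _train_task, _after_task, _eval_task) are assumptions based on common incremental-learning codebases, not a verbatim copy of the base class:

from inclearn.models.base import IncrementalLearner


class MyLearner(IncrementalLearner):
    """Hypothetical skeleton of a new incremental learner."""

    def __init__(self, args):
        super().__init__()
        self._device = args["device"][0]
        self._n_classes = 0

    def _before_task(self, train_loader, val_loader):
        # E.g. grow the classifier to accommodate the new classes.
        ...

    def _train_task(self, train_loader, val_loader):
        # Standard training loop over the current task's data.
        ...

    def _after_task(self, inc_dataset):
        # E.g. build exemplars and snapshot the old model.
        ...

    def _eval_task(self, data_loader):
        # Return predictions and targets for evaluation.
        ...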

PODNet: Pooled Outputs Distillation for Small-Tasks Incremental Learning

Paper ECCV Youtube

(Figures: PODNet model diagram and accuracy plot.)

If you use this paper/code in your research, please consider citing us:

@inproceedings{douillard2020podnet,
    title={PODNet: Pooled Outputs Distillation for Small-Tasks Incremental Learning},
    author={Douillard, Arthur and Cord, Matthieu and Ollion, Charles and Robert, Thomas and Valle, Eduardo},
    booktitle={Proceedings of the IEEE European Conference on Computer Vision (ECCV)},
    year={2020}
}

To run experiments on CIFAR100 with three different class orders, with the challenging setting of 50 steps:

python3 -minclearn --options options/podnet/podnet_cnn_cifar100.yaml options/data/cifar100_3orders.yaml \
    --initial-increment 50 --increment 1 --fixed-memory \
    --device <GPU_ID> --label podnet_cnn_cifar100_50steps \
    --data-path <PATH/TO/DATA>

Likewise, for ImageNet100:

python3 -minclearn --options options/podnet/podnet_cnn_imagenet100.yaml options/data/imagenet100_1order.yaml \
    --initial-increment 50 --increment 1 --fixed-memory \
    --device <GPU_ID> --label podnet_cnn_imagenet100_50steps \
    --data-path <PATH/TO/DATA>

And likewise for ImageNet1000:

python3 -minclearn --options options/podnet/podnet_cnn_imagenet100.yaml options/data/imagenet1000_1order.yaml \
    --initial-increment 500 --increment 50 --fixed-memory --memory-size 20000 \
    --device <GPU_ID> --label podnet_cnn_imagenet1000_10steps \
    --data-path <PATH/TO/DATA>

Furthermore, several option files are available to reproduce the ablations showcased in the paper. Please see the directory ./options/podnet/ablations/.

Insight From the Future for Continual Learning

Paper CVPR Workshop

(Figure: Ghost model illustration.)

If you use this paper/code in your research, please consider citing us:

@inproceedings{douillard2020ghost,
    title={Insight From the Future for Continual Learning},
    author={Arthur Douillard and Eduardo Valle and Charles Ollion and Thomas Robert and Matthieu Cord},
    booktitle={arXiv preprint library},
    year={2020}
}

The code is still quite messy; I'll clean it up later. Forgive me.

incremental_learning.pytorch's People

Contributors

arthurdouillard, billpsomas


incremental_learning.pytorch's Issues

fixed memory in PODNet NME config

Hi, I came across a problem when trying to reproduce PODNet NME with PyTorch 1.7, so I launched another run with PyTorch 1.2 as you suggested. The command is here. I then got similar results: 60.41 +/- 0.86. However, I found this in the log file:

2021-07-12:11:02:42 [podnet.py]: Now 40 examplars per class.

which was printed before the first incremental task, and I believe the ablation studies followed the fixed-memory protocol. In the command above, the fixed-memory option is missing, and
self._fixed_memory = args.get("fixed_memory", True) might fail to work as expected, because if fixed_memory is missing from the command, the parser automatically sets fixed_memory=False. After manually setting fixed_memory=True, PODNet NME still only got 56.91 +/- 0.6. I wonder if there is something wrong with my config, or something else I can do to improve the results.
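For reference, the pitfall described above can be reproduced in isolation: since argparse sets store_true flags to False when they are omitted, the key is always present in the args dict and the .get() default never applies (standalone sketch, not the repo's actual parser):

import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--fixed-memory", action="store_true")

args = vars(parser.parse_args([]))  # flag omitted from the command line

# The key exists (argparse set it to False), so the default of True
# in .get() is never used:
fixed_memory = args.get("fixed_memory", True)
print(fixed_memory)  # False, not True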

Thanks :)

Class order in iCIFAR100 Dataset

I see that you have defined a specific class order in the iCIFAR100 dataset class. Is that an order you found empirically to give the best result, or is it due to some other reason?

Question on the basis settings

Hello, I'm a beginner at continual learning. First of all, thank you for allowing me to use your wonderful platform.

I have a question while looking at your code.

I was going to proceed with the LwF study. In your code, the variable "distillation_config" exists in the LwM class.

class LwM(IncrementalLearner):

    def __init__(self, args):
        self._device = args["device"][0]
        self._multiple_devices = args["device"]

        self._opt_name = args["optimizer"]
        self._lr = args["lr"]
        self._lr_decay = args["lr_decay"]
        self._weight_decay = args["weight_decay"]
        self._n_epochs = args["epochs"]
        self._scheduling = args["scheduling"]

        self._distillation_config = args["distillation_config"]
        self._attention_config = args.get("attention_config", {})

        logger.info("Initializing LwM")

        self._network = network.BasicNet(
            args["convnet"],
            convnet_kwargs=args.get("convnet_config", {}),
            classifier_kwargs=args.get("classifier_config", {
                "type": "fc",
                "use_bias": True
            }),
            device=self._device,
            gradcam_hook=True
        )

        self._n_classes = 0
        self._old_model = None

May I know what the role of this variable is?
Also, is it valid to train only the LwF model in your code?
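For what it's worth, such a distillation_config usually carries the weighting of the knowledge-distillation loss against the frozen old model. A hedged sketch of how it might be consumed (the factor key appears in other issues here; the temperature key and the exact formulation are illustrative assumptions, not this repo's code):

import torch.nn.functional as F


def distillation_loss(new_logits, old_logits, config):
    # Soften both distributions with a temperature, then penalize
    # divergence from the old model's predictions (Hinton-style KD).
    t = config.get("temperature", 2.0)
    factor = config.get("factor", 1.0)
    kd = F.kl_div(
        F.log_softmax(new_logits / t, dim=1),
        F.softmax(old_logits / t, dim=1),
        reduction="batchmean",
    ) * (t * t)
    return factor * kd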

Trying to reproduce BiC

Hello Arthur,

Congratulations on your contributions to IL.

I am trying to reproduce the BiC results running the following command:
python -minclearn --options options/bic/bic_cifar100.yaml options/data/cifar100_3orders.yaml --increment 10 --initial-increment 50 --fixed-memory --temperature 2 --data-path data/ --device 0

The average incremental accuracy I am getting is ~51%, which is ~5% lower than the one reported in the paper. Is there anything wrong with the command?

Thank you in advance :)

about the relu function before pooling

Hi, thanks for your great work. In your paper, you say that "We remove the ReLU activation at the last block of each ResNet", but I have read the code of my_resnet.py and found that you remove the last ReLU activation in every block of the ResNet.

About UCIR on CIFAR100

Thank you so much for your great work!

Recently, I attempted to reproduce the UCIR model on CIFAR100. I found the first run was perfectly fine; however, the performance dropped dramatically in the second and third runs. After averaging over the three runs, the results are poor.

Here is the log:

2021-08-17:08:02:07 [train.py]: Results done on 3 seeds: avg: 43.24 +/- 7.77, last: 33.6 +/- 9.01, forgetting: 30.87 +/- 6.26
2021-08-17:08:02:07 [train.py]: Individual results avg: [49.33, 45.9, 34.49]
2021-08-17:08:02:07 [train.py]: Individual results last: [40.5, 36.9, 23.4]
2021-08-17:08:02:07 [train.py]: Individual results forget: [29.35, 25.51, 37.75]

Thanks for your time and consideration, and best wishes.

Question about implementing classification loss

I found that previous methods such as LwF.MC, iCaRL, and EEIL all use binary cross-entropy loss to calculate both the distillation term and the classification term, while in your code the binary cross-entropy loss is replaced by cross-entropy for the classification term. I am wondering if there is a difference between binary cross-entropy and cross-entropy? From my own implementation, the performance is better if I only apply binary cross-entropy, but the problem is actually a multi-class classification task, so I am confused about the results. Thanks.
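To make the comparison concrete, here is a minimal sketch of both losses on the same logits (illustrative, not the repo's exact implementation):

import torch
import torch.nn.functional as F

logits = torch.randn(8, 100)           # batch of 8, 100 classes
targets = torch.randint(0, 100, (8,))

# Cross-entropy: classes compete through a softmax (multi-class view).
ce = F.cross_entropy(logits, targets)

# Binary cross-entropy as in iCaRL/LwF.MC: every class is an independent
# sigmoid (multi-label view), with one-hot targets.
onehot = F.one_hot(targets, num_classes=100).float()
bce = F.binary_cross_entropy_with_logits(logits, onehot)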

The __getitem__ does not adapt to E2E

Hello,

Thanks for your continual updates. However, in this version, the __getitem__ of the data loader does not seem to suit E2E, which needs 'real_idxes' and 'idxes'. This information is not provided in this version, while it was available in the old version.

E2E performance

Thanks for your great reproduction. Could you provide a configuration of E2E that achieves good performance?

Besides, is there much difference between a single FC and multiple FCs?

Thank you again!

Status of some of the models

Hi @arthurdouillard ,
thanks for the great, readable code!
I had a question: are some of the implemented models, like 'medic', complete in the sense of reproducing their papers' results?
I'm asking because the readme doesn't mention their status.
Thanks again,
Gunshi

Would you share the Python version?

Something goes wrong when running the code:

"?it/spython3: symbol lookup error: /home/anaconda3/envs/lib/python3.6/site-packages/torch/lib/libtorch_python.so: undefined symbol: PySlice_Unpack"

Could you share the Python version? Thanks for your time~

Some questions about the paper detail

Hi, thanks for your great job. I notice that the paper says that "all alternative losses were tuned on the validation set to get the best performance". Could you give more details about that? What proportion of the original training set is used as the validation set? Do you retrain the model from scratch on the full original training set after tuning the parameters on the validation set?

Thanks in advance.

icarl performance

Hi, I ran your code for iCaRL and got an average of 60.59. Each curve's dot (10 classes) is also slightly lower than the corresponding point in Fig. 2 of the iCaRL paper ([0.889, 0.778, 0.693, 0.625, 0.575, 0.564, 0.523, 0.499, 0.477, 0.436]). My config is as follows:
"config": {
"model": "icarl",
"convnet": "rebuffi",
"dropout": 0.0,
"herding": null,
"memory_size": 2000,
"temperature": 1,
"fixed_memory": true,
"dataset": "cifar100",
"increment": 10,
"batch_size": 128,
"workers": 0,
"threads": 1,
"validation": 0.0,
"random_classes": false,
"max_task": null,
"onehot": false,
"initial_increment": 10,
"sampler": null,
"data_path": "datasets/CIFAR100",
"lr": 2.0,
"weight_decay": 5e-05,
"scheduling": [
49,
63
],
"lr_decay": 0.2,
"optimizer": "sgd",
"epochs": 70,
"label": "icarl_cifar100_9steps",
"autolabel": false,
"seed": 1,
"seed_range": null,
"options": [
"options/icarl/icarl_cifar100.yaml",
"options/data/cifar100_3orders.yaml" [87, 0, ...]
],
"save_model": "never",
"dump_predictions": false,
"logging": "info",
"resume": null,
"resume_first": false,
"recompute_meta": false,
"no_benchmark": false,
"detect_anomaly": false,
How can I modify this to get an average result of about 64%? Thanks!

About UCIR hyperparameter

Thanks again for fixing PODNet NME, the new config file produced impressive results 🎉 .

But when I tried to reproduce UCIR, I found another problem. It seems that

self._lambda = args.get("base_lambda", 5)
self._nb_negatives = args.get("nb_negatives", 2)
self._margin = args.get("ranking_margin", 0.2)

(model.ucir.py, line 66) try to read these hyperparameters from args, whereas the values actually live in self._use_ranking and self._use_less_forget, which are both dict instances.
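The failure mode is easy to reproduce standalone: .get() on the wrong dict silently falls back to its default. The key names below are guesses taken from the YAML configs quoted elsewhere in these issues:

args = {
    "less_forget": {"lambda": 10},
    "ranking_loss": {"nb_negatives": 2, "margin": 0.5},
}

base_lambda = args.get("base_lambda", 5)              # always 5: the key lives in a sub-dict
nested_lambda = args["less_forget"].get("lambda", 5)  # 10, as configured
print(base_lambda, nested_lambda)  # 5 10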

And based on the suggestion here, I wonder: since UCIR uses NME for eval by default, should I discard fine-tuning, as PODNet NME does?

I really appreciate your PODNet and other reproductions, this repo helps me a lot, a huge thank you :)

About UCIR configuration

Thank you so much for this excellent work!

I am a little confused about the epoch number for UCIR in the latest version (ucir_cifar100.yaml, line 42). It seems that you changed it from 160 to 1. However, in the original implementation of UCIR, that number is 160. Moreover, I tried to reproduce UCIR using the current configuration, and the results are awful (15.17% average acc).

I am wondering if this is a typo, or whether you have some higher-level consideration? Thank you so much!

Accuracy to report

Hi @arthurdouillard ,

Thanks for the detailed display of all metrics, such as old-class and new-class accuracy - they are very helpful.

I have a question regarding the accuracy to report:

Do I report:

  1. The accuracy on all data seen so far, calculated through just one call to accuracy with the outputs and targets from all data seen so far?

or

  2. The average accuracy over individual tasks, calculated by averaging each task's (respective classes') accuracy over the stream seen so far?

(A minimal sketch of both is given below.)

thank you !
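For concreteness, a minimal sketch of both definitions (illustrative, not this repo's metric code):

import torch

preds = torch.tensor([0, 1, 2, 2, 3])
targets = torch.tensor([0, 1, 2, 3, 3])
task_of_class = torch.tensor([0, 0, 1, 1])  # class id -> task id

# (1) One accuracy over all data seen so far.
overall = (preds == targets).float().mean()

# (2) Average of per-task accuracies.
sample_task = task_of_class[targets]
per_task = [
    (preds[sample_task == t] == targets[sample_task == t]).float().mean()
    for t in sample_task.unique()
]
average = torch.stack(per_task).mean()
print(overall.item(), average.item())  # 0.8 vs ~0.833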

Configuration to reproduce LwM

hello!
thank you for your great work! Recently I wanted to use your code to reproduce LwM, but I couldn't find the option file. Could you please share the configuration for LwM? Many thanks!

best wishes!

How can I change the dataset?

I need to change the dataset. I pasted this command into the Anaconda prompt:

(screenshot of the command)

After doing this I got these errors. How can I solve them?

(screenshot of the errors)

Is there an easier way to change the dataset?

icarl results?

Nice work.

In PODNet, the experimental protocol first trains on half of the classes (50 for CIFAR100), also known as CIFAR100-B50.

I also achieve results similar to the 53.78 or 58.08 in your paper (Table 1).

However, when I tried CIFAR100-B0, which splits the whole dataset evenly across tasks (the original iCaRL experimental protocol), I cannot achieve the reported results (e.g., 0.64 for 10 tasks, 10 classes per task).

What are your results? Could you please give some insights?

Thanks.

Can anyone get the result of LwM?

I run:

python -minclearn --model lwm --increment 20 -memory 0

And I changed self._attention_config = args.get("attention_config", {"factor": 1}) and self._distillation_config["factor"] = 1 in lwm.py.

However, the result is very low. Could anyone give me some advice?

Weird results of icarl on Cifar-100

Hi, @arthurdouillard
Thanks for your great work! I am trying to use your code to reproduce the iCaRL method, but the results are not in line with those in your paper.
I ran the following script:

python3 -minclearn --options options/icarl/icarl_cifar100.yaml options/data/cifar100_3orders.yaml \
    --initial-increment 50 --increment 1 --fixed-memory \
    --device 0 --label icarl_cnn_cifar100_50steps \
    --data-path data

I obtained 44.96, 42.55, and 27.76 with three seeds, so avg = 38.43 +/- 9.32. Am I missing something needed to reproduce the results? Thanks.

Posting the log here for reference.

 2021-10-19:23:12:46 [train.py]: Eval on 0->100.

 2021-10-19:23:12:49 [train.py]: icarl_cnn_cifar100_50steps

 2021-10-19:23:12:49 [train.py]: Avg inc acc: 0.2775882352941177.

 2021-10-19:23:12:49 [train.py]: Current acc: {'total': 0.186, '00-09': 0.202, '10-19': 0.184, '20-29': 0.222, '30-39': 0.178, '40-49': 0.219, '50-59': 0.139, '60-69': 0.21, '70-79': 0.162, '80-89': 0.163, '90-99': 0.18}.

 2021-10-19:23:12:49 [train.py]: Avg inc acc top5: 0.5672156862745097.

 2021-10-19:23:12:49 [train.py]: Current acc top5: {'total': 0.437}.

 2021-10-19:23:12:49 [train.py]: Forgetting: 0.47154545454545455.

 2021-10-19:23:12:49 [train.py]: Cord metric: 0.26.

 2021-10-19:23:12:49 [train.py]: Old accuracy: 0.18, mean: 0.26.

 2021-10-19:23:12:49 [train.py]: New accuracy: 0.51, mean: 0.66.

 2021-10-19:23:12:49 [train.py]: Average Incremental Accuracy: 0.2775882352941177.

 2021-10-19:23:12:49 [train.py]: Training finished in 4317s.

 2021-10-19:23:12:49 [train.py]: Label was: icarl_cnn_cifar100_50steps

 2021-10-19:23:12:49 [train.py]: Results done on 3 seeds: avg: 38.43 +/- 9.32, last: 28.0 +/- 8.14, forgetting: 41.62 +/- 5.17

 2021-10-19:23:12:49 [train.py]: Individual results avg: [44.96, 42.55, 27.76]

 2021-10-19:23:12:49 [train.py]: Individual results last: [32.9, 32.5, 18.6]

 2021-10-19:23:12:49 [train.py]: Individual results forget: [40.79, 36.92, 47.15]

Without pretraining on first 50 classes

Hi @arthurdouillard ,

Thanks for the great codebase!

I would like to evaluate all algorithms in the setting where pretraining on the first 50 classes is not allowed.
I'd like to know if just setting --initial-increment and --increment to appropriate values (e.g. 5 and 5) would do.

Would it need any change in hyperparameters, in any of the algorithms?

Thank you very much!

Python and PyTorch version

Hi, I was wondering if you could provide the Python and PyTorch versions that this code was tested with?

PODNet - NME accuracy for ImageNet

Hi Arthur @arthurdouillard ,

I see that your PODNet paper reports both PODNet-CNN and PODNet-NME accuracies on CIFAR-100.

But for ImageNet, it reports only the PODNet-CNN accuracy. Could you please point out why? If they are available, could you please provide the NME numbers?

Thank you!

Config file for podnet_nme_cifar100

I ran your code using podnet_nme_cifar100 and podnet_cnn_cifar100.

In the case of podnet_cnn_cifar100, I got nearly the same results as those reported in your paper.
However, I could not reproduce your results for podnet_nme_cifar100 (mine: 56.78/0.41, yours: 61.40/0.68).
Q1) Could you check your podnet_nme_cifar100.yaml?

I would also like to ask how many GPUs you used for the ImageNet experiments.
In the config file for the ImageNet experiments, you use a batch size of 64.
Q2) However, in your main paper, the batch size is set to 128 for all datasets. Could you let me know the batch size and the number of GPUs used for ImageNet?

E2E code is not using exemplars

Where is the train loader being appended with old-class exemplars in the E2E model? For instance, before doing this, shouldn't we be updating the train loader by adding the old-class exemplars as well?
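For reference, a hedged sketch of what such an update could look like; the variable names are hypothetical stand-ins, not this repo's actual API:

import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy stand-ins: in practice these come from the incremental dataset
# and from the exemplar memory.
new_x, new_y = torch.randn(100, 3, 32, 32), torch.randint(50, 60, (100,))
mem_x, mem_y = torch.randn(20, 3, 32, 32), torch.randint(0, 50, (20,))

# Concatenate old-class exemplars with the new task's data before
# building the train loader.
dataset = TensorDataset(torch.cat([new_x, mem_x]), torch.cat([new_y, mem_y]))
train_loader = DataLoader(dataset, batch_size=128, shuffle=True)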

Could you share the config file for your UCIR on ImageNet100?

This is what I am using right now:

dataset: imagenet100

model: ucir
convnet: rebuffi
convnet_config:
  last_relu: false

batch_size: 64
memory_size: 2000
fixed_memory: True

classifier_config:
  scaling: 1
  gamma: 1
  type: cosine
  proxy_per_class: 1
  distance: neg_stable_cosine_distance

less_forget:
  scheduled_factor: true
  lambda: 10

postprocessor_config:
  initial_value: 1.0
  type: learned_scaling

ranking_loss:
  factor: 1.0
  nb_negatives: 2
  margin: 0.5

finetuning_config:
  tuning: classifier
  lr: 0.01
  epochs: 20

lr: 0.1
weight_decay: 0.0001
scheduling:
  type: step
  epochs: [30, 60]
  gamma: 0.1
lr_decay: 0.1
optimizer: sgd
epochs: 90

weight_generation:
  type: imprinted

But I am getting a difference of around 8% on the base task (accuracy after training on the base task, i.e. joint training on the first 20 classes) while doing increments of 20-20-20-20-20. Am I doing something incorrect? I also noticed a 3% increase in base-task performance compared to the UCIR paper's implementation. I first thought that it might be due to the number of proxies per class used in PODNet, but even when I reduced it to 1 in your config file, it showed a similar 3% increase. Any ideas why it might be performing better on the joint training of the base task?

Where can I change the targets value to torch.long?

I keep getting this at lib/losses/base.py line 77: "tensors used as indices must be long, byte or bool tensors".

I guess I need to change the target tensor dtype from torch.int32 to torch.int64, but I can't find the original place where the variable targets is set...
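For anyone hitting the same error: the cast itself is a one-liner, wherever targets ends up being created (standalone illustration):

import torch

targets = torch.tensor([0, 2, 1], dtype=torch.int32)
onehot = torch.zeros(3, 5)

# int32 tensors cannot be used as indices; cast to int64 (long) first.
targets = targets.long()
onehot[torch.arange(3), targets] = 1.0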

When to set the use_multi_fc and bias flags?

Can you explain the reasoning behind the use of multi_fc (E2E) versus a single fc (iCaRL and LwF)? Also, when does one set the bias in the classifier (True for E2E and iCaRL, but False for LwF)?
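To illustrate the distinction being asked about (a hedged sketch, not the repo's actual classes): a single FC keeps one classifier over all classes seen so far, while multi-FC keeps one head per task:

import torch.nn as nn

# Single FC: one classifier over all classes seen so far, grown or
# re-created whenever new classes arrive.
single_fc = nn.Linear(512, 60, bias=True)  # e.g. 60 classes so far

# Multi FC (E2E-style): one head per task; outputs are concatenated
# at evaluation time.
multi_fc = nn.ModuleList(
    [nn.Linear(512, 10, bias=True) for _ in range(6)]  # 6 tasks of 10 classes
)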

Finetune in PODNet

The code conducts a finetuning stage during the training phase. This does not seem to be mentioned in the paper. Were the results reported in the paper obtained with finetuning?

Trying to incorporate podnet in ucir code

In the classifier.py file, while using the cosine classifier, the values of both self.scaling and self.gamma remain 1 throughout the run. Could you briefly explain the use of both?

If we normalize both the features and the weights, their dot product will lie in the [-1, 1] range, so we need to scale it. Which variable in your code is doing that exactly?
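For context, a minimal sketch of a scaled cosine classifier (illustrative; in this repo the scale may be a learned parameter via postprocessor_config rather than a fixed scalar):

import torch
import torch.nn.functional as F

features = torch.randn(8, 512)
weights = torch.randn(100, 512)  # one prototype per class

# Cosine similarities lie in [-1, 1]; without scaling the softmax is
# too flat to train well, so a scale factor sharpens the logits.
scale = 10.0
logits = scale * F.linear(
    F.normalize(features, dim=1), F.normalize(weights, dim=1)
)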

my run command is this -

python3 -minclearn --options options/podnet/podnet_cnn_cifar100.yaml options/data/cifar100_3orders.yaml \
    --initial-increment 50 --increment 1 --fixed-memory \
    --device <GPU_ID> --label podnet_cnn_cifar100_50steps \
    --data-path <PATH/TO/DATA>

result about Imagenet1000

Hello, thank you very much for open-sourcing this code. I want to ask: the result I got with your code on ImageNet1000 is two points lower than the one you report in your paper. Could you please help me check it?

result on cifar100 (5 steps) using ucir_resnet

I think the ResNet used in UCIR's code is more standard, so I ran the experiment on cifar100 (5 steps) using your ucir_resnet.py, and the result is much worse than before. Do I need to modify some other configurations to get a better result?

This is what I modified in your ucir_resnet.py:

def forward(self, x, **kwargs):
    x = self.conv1(x)
    x = self.bn1(x)
    x = self.relu(x)

    fea1 = self.layer1(x)
    fea2 = self.layer2(fea1)
    fea3 = self.layer3(fea2)

    # Intermediate feature maps, returned for attention-based losses.
    attentions = [fea1, fea2, fea3]

    # Pooled features both without and with a final ReLU before pooling.
    raw_features = self.end_features(fea3)
    features = self.end_features(F.relu(fea3, inplace=False))

    return {"raw_features": raw_features, "features": features, "attention": attentions}

def end_features(self, x):
    # Global average pooling followed by flattening.
    x = self.avgpool(x)
    x = x.view(x.size(0), -1)

    return x
