google-research / l2p

Learning to Prompt (L2P) for Continual Learning @ CVPR22 and DualPrompt: Complementary Prompting for Rehearsal-free Continual Learning @ ECCV22

Home Page: https://arxiv.org/pdf/2112.08654.pdf

License: Apache License 2.0

Python 100.00%

deep-learning continual-learning jax

l2p's Issues

Comparison with few/zero-shot performance and Task Specific prompts.

Hi, thank you for your great work. I was wondering if you have done any of the following experiments.

Have you evaluated the few-shot and zero-shot performance of the base ViT model on the CIFAR100 dataset?
Have you tuned task-specific prompts for the base ViT model used for classification? I feel this is a crucial number to compare with L2P.

Thanks!

code for dualprompt

Hi, thanks for your great work! Just wondering whether the code for DualPrompt has been released here?

About optionally diversifying prompt-selection

Thanks for the great idea and the result!

As the title says, I'd like to know how to use the optional diversified prompt selection. I don't see where the arguments for this method are used, nor do I see an implementation of it in ./models/prompt.py.

I would also like to ask how to normalize the frequency of each prompt into a penalty factor; I don't see a specific description in the paper.
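To make the question concrete, here is a minimal sketch of one way a selection-frequency table could be turned into a penalty factor. It is an assumption for illustration, not the repository's implementation: the names `normalized_frequency` and `select_prompts` are hypothetical, the frequency is normalized by simply dividing each count by the total, and the penalty multiplies the cosine distance between the query and each key.

```python
import math


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def normalized_frequency(counts):
    """Assumed normalization: each count divided by the total count."""
    total = sum(counts)
    if total == 0:
        # No history yet (e.g. the first task): fall back to uniform weights.
        return [1.0 / len(counts)] * len(counts)
    return [c / total for c in counts]


def select_prompts(query, keys, counts, top_n, diversify=True):
    """Return indices of the top_n keys with the smallest (penalized) distance."""
    if diversify:
        penalty = normalized_frequency(counts)
    else:
        penalty = [1.0] * len(keys)
    scores = [cosine_distance(query, k) * p for k, p in zip(keys, penalty)]
    return sorted(range(len(keys)), key=lambda i: scores[i])[:top_n]


query = [1.0, 0.0]
keys = [[1.0, 0.1], [0.8, 0.6], [0.0, 1.0]]
counts = [98, 1, 1]  # prompt 0 was selected far more often in earlier tasks

plain = select_prompts(query, keys, counts, top_n=1, diversify=False)
diverse = select_prompts(query, keys, counts, top_n=1, diversify=True)
```

With a multiplicative penalty like this, a prompt that has never been selected gets a score of zero and always wins, so a real implementation would probably need some smoothing; that detail is exactly what the question above asks about.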

Reproduce issue

Dear author.

Thank you for your great work.

I'm having a little problem with reproducing L2P.

First, please update the environment setup in README.md.

The link in the README for adjusting the jax version does not provide CUDA builds.
I think it has changed to the following link:

https://storage.googleapis.com/jax-releases/jax_cuda_releases.html

The other thing is that even if batch_size is set to 1, I can't run the code to completion on my 4 A5000 GPUs because of an out-of-memory issue.
Can you also provide small models such as ViT-Tiny or ViT-Small?

I look forward to hearing from you.

Thank you.

Questions about the pre-trained ViT

Dear authors,

Thanks for your great job in building CIL learners with pre-trained models. I have a simple question regarding the pre-trained ViT. I noticed there are several versions of pre-trained ViT on the market. Seeing the fact that the current repo suggests downloading the pre-trained model from https://storage.googleapis.com/vit_models/imagenet21k/ViT-B_16.npz, does it mean the model is pre-trained with IN21K?

Thx in advance.

Reproducing experiments

Hello,
thanks for sharing the code, since it seems to be a pretty simple yet general idea which deserves further investigation. We are trying to reproduce the experiments in order to design possible extensions.
However, in the paper we can see results on cifar100, 5-datasets, CORe50 but in the code it seems that among them, only cifar is supported (and we can't find any code for running the baseline experiments consistently).

Is the released version of the code a non-final version? If so, we kindly ask you to release the full code related to the published paper, in particular libml/input_pipeline.py and all the relevant config files.

Thank you for your time and consideration

Questions about domain-incremental setting, positional embedding and location of prompt

Dear author,

Thank you for your great work.

There are some questions while reproducing the official code.

From what I understand, the key of the L2P is to freeze a well-pretrained backbone (ViT) and train only small-sized prompts to achieve amazing performance.

However, if you look at the config for the domain-incremental setting using CORe50, the freeze part is an empty list.
When I reproduced it without any config modification in my environment, I got results (77.91%) similar to the paper.
Given these results, I suspect the numbers in the paper come from full tuning without freezing the backbone.

1. Why didn't you freeze the backbone in the domain-incremental setting?
2. Was this written in the paper? I also read the supplementary material and didn't see anything about it.

A trivial question:
Only 99% of the samples of the entire CORe50 dataset are used because the subsample_rate is -1 in this part (test, train).
3. Is this the intended implementation?

And about positional embedding,
Before the release of the code version integrated with DualPrompt, the positional embedding was also added to prompts in L2P.
However, in the version of code that is integrated with DualPrompt, the positional embedding is no longer added to prompts (only added to image tokens) in L2P.
I think the positional embedding will have a great impact on performance.
4. Which is correct?

Additionally, when using L2P in the code integrated with DualPrompt,
the encoder input is [Prompts, CLS, Image tokens].
But in the code before the integration with DualPrompt, it is [CLS, Prompts, Image tokens].
5. Which one is correct?

Please let me know if there is anything I missed.

Best,
Jaeho Lee.

Only can run in CPU

Both in Google Colab with a TPU and on my own Ubuntu machine with CUDA,

your code only runs on the CPU, not on the GPU or TPU!

Question about the paper's comparison.

I've read the paper and have some small questions about its comparison to other methods such as EWC, ER and DER++. I could not find (or may have missed) the details of how these methods were run on the pretrained ViT model.
Which part of the ViT is trained or fine-tuned in the Upper-bound and in those methods, and which parts use the pretrained weights? I guess only the classifier is trained, but I need some confirmation.

RESOURCE_EXHAUSTED: Out of memory while trying to allocate # bytes.

Hi, I am trying to run the model on a CIFAR100 dataset. I am getting the following error. I have 4 Tesla V100 GPUs.

2022-08-05 10:00:26.833817: W external/org_tensorflow/tensorflow/core/common_runtime/bfc_allocator.cc:479] Allocator (GPU_0_bfc) ran out of memory trying to allocate 9.00MiB (rounded to 9437184)requested by op 
2022-08-05 10:00:26.835182: W external/org_tensorflow/tensorflow/core/common_runtime/bfc_allocator.cc:491] *********************************************************************************x**************x***
2022-08-05 10:00:26.835281: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2130] Execution of replica 0 failed: RESOURCE_EXHAUSTED: Out of memory while trying to allocate 9437184 bytes.
BufferAssignment OOM Debugging.

The complete running logs can be found here. Please help me with solving the issue.

===============

For your information, I was getting a RuntimeError: Visible devices cannot be modified after being initialized error. Hence, I added the following code snippet in main.py from https://www.tensorflow.org/guide/gpu, and it solved the issue.

"""Main file for running the example."""

import os
os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"] = "false"

imports ...

FLAGS = flags.FLAGS
...

def main(argv):
  del argv

  # Hide any GPUs from TensorFlow. Otherwise TF might reserve memory and make
  # it unavailable to JAX.
  # tf.config.experimental.set_visible_devices([], "GPU")

  gpus = tf.config.list_physical_devices('GPU')
  if gpus:
    # Restrict TensorFlow to only use the first GPU
    try:
      tf.config.experimental.set_visible_devices(gpus[0], 'GPU')
      logical_gpus = tf.config.list_logical_devices('GPU')
      print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPU")
    except RuntimeError as e:
      # Visible devices must be set before GPUs have been initialized
      print(e)

  # if gpus:
  #   # Create 2 virtual GPUs with 1GB memory each
  #   try:
  #     tf.config.set_logical_device_configuration(
  #         gpus[0],
  #         [tf.config.LogicalDeviceConfiguration(memory_limit=1024),
  #         tf.config.LogicalDeviceConfiguration(memory_limit=1024)])
  #     logical_gpus = tf.config.list_logical_devices('GPU')
  #     print(len(gpus), "Physical GPU,", len(logical_gpus), "Logical GPUs")
  #   except RuntimeError as e:
  #     # Virtual devices must be set before GPUs have been initialized
  #     print(e)


  if FLAGS.exp_id:
     ...
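The memory-related part of the workaround above can be sketched on its own. This is a minimal sketch under stated assumptions: `configure_jax_memory` is a hypothetical helper, not part of the repository. `XLA_PYTHON_CLIENT_PREALLOCATE` and `XLA_PYTHON_CLIENT_MEM_FRACTION` are environment variables documented by JAX, and they only take effect if they are set before JAX initializes its GPU backend.

```python
import os


def configure_jax_memory(preallocate=False, mem_fraction=None):
    """Set JAX/XLA GPU memory options; call this before importing jax."""
    # "false" makes JAX allocate GPU memory on demand instead of
    # reserving most of it at startup.
    os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"] = "true" if preallocate else "false"
    if mem_fraction is not None:
        # Fraction of total GPU memory JAX may preallocate when
        # preallocation is enabled.
        os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"] = f"{mem_fraction:.2f}"


configure_jax_memory(preallocate=False, mem_fraction=0.5)

# Only after this point would the training script import jax, and hide the
# GPUs from TensorFlow as the commented-out line in the snippet above does.
```

Disabling preallocation can avoid some out-of-memory failures when TensorFlow and JAX share a GPU, but it does not reduce the memory the model itself needs.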

Invalid pretrained model link

Hi, thanks for sharing the code. It seems that the pretrained ViT checkpoint link is invalid. Could you share it again?

Split ImageNet-R?

Could you share the split of training and test set of the proposed Split ImageNet-R benchmark?

constraint_coefficient of surrogate loss for pulling selected keys closer

The constraint_coefficient of surrogate loss is 0.5 in the paper.

But in the code, it is set to -0.1.

Which is correct? Which is better?
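One thing worth checking when comparing the two values is the sign convention. The sketch below is an illustration under assumptions, not the repository's code: it shows that a positive coefficient on a summed cosine distance and a negative coefficient of the same magnitude on a summed cosine similarity differ only by a constant, so both pull the selected keys toward the query. Whether this explains the -0.1 in the code is not established here; the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def loss_with_distance(task_loss, query, selected_keys, coeff):
    """Task loss plus coeff times the summed cosine distance to the selected keys."""
    pull = sum(1.0 - cosine_similarity(query, k) for k in selected_keys)
    return task_loss + coeff * pull


def loss_with_similarity(task_loss, query, selected_keys, coeff):
    """Task loss plus coeff times the summed cosine similarity to the selected keys."""
    pull = sum(cosine_similarity(query, k) for k in selected_keys)
    return task_loss + coeff * pull


query = [0.2, 0.9, -0.1]
selected_keys = [[0.1, 1.0, 0.0], [0.5, 0.5, 0.5]]
task_loss = 1.25
coeff = 0.5

by_distance = loss_with_distance(task_loss, query, selected_keys, coeff)
by_similarity = loss_with_similarity(task_loss, query, selected_keys, -coeff)

# The two forms differ only by the constant coeff * number_of_selected_keys,
# so they have identical gradients with respect to the keys.
constant_offset = coeff * len(selected_keys)
```

The magnitude (0.5 versus 0.1) is a separate question that this identity does not answer.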

Thank you.

Inference

Hi, thanks for the interesting work.

I have one question regarding the choice of prompt during the testing.

It seems that both DualPrompt and L2P use batch mode during testing, and each batch will choose the same prompt using majority voting. However, assuming that every test sample per batch is from the same task is questionable. Do you have any thoughts about this? Looking forward to hearing from you!

Bests,
Yuansheng
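The concern can be illustrated with a small sketch that contrasts per-sample selection with batch-wise majority voting. This is a toy example under assumptions: the function name `batchwise_selection` and the tie-breaking rule are hypothetical, and it is not taken from the repository.

```python
from collections import Counter


def batchwise_selection(batch_top_k, top_n):
    """Give every sample in the batch the top_n most-voted prompt indices."""
    votes = Counter(idx for sample in batch_top_k for idx in sample)
    # Sort by vote count (descending), breaking ties by the smaller index.
    ranked = sorted(votes, key=lambda idx: (-votes[idx], idx))
    chosen = ranked[:top_n]
    return [list(chosen) for _ in batch_top_k]


# Per-sample top-2 prompt indices; the third sample looks like it comes
# from a different task than the first two.
per_sample = [[0, 1], [0, 2], [3, 4]]
voted = batchwise_selection(per_sample, top_n=2)
```

In this toy batch the third sample would have picked prompts 3 and 4 on its own, but majority voting assigns it prompts 0 and 1, which is the situation the question describes for batches that mix tasks.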

Question about the t-SNE visualization of prompts (Figure 4)

Hi,

thanks for your great work! I have a question about Figure 4 in the paper.

In the paper you said that "For a prompt with shape L×D, we treat it as L prompts with dimension D.", and in Figure 4, each point represents a prompt vector of D=768.

But both general and task-specific prompts are inserted into multiple layers of the model. I'm wondering which layer's prompts are used for the visualization in Figure 4. (From the number of points, it seems only prompts from one particular layer are visualized.) Is there any special reason for selecting the prompts of that layer?

Looking forward to your reply and thanks in advance!

ImportError: cannot import name 'resnet_v1' from 'models' (unknown location)

In the File "/media/iiau/LiGong4T2/zwb/code/l2p/train_continual.py", line 42, I can see:

from models import resnet_v1

but in the "./models" folder, I can't find resnet_v1, so I get an error like this:

ImportError: cannot import name 'resnet_v1' from 'models' (unknown location)

How to solve this problem? Thanks!

about providing the class orders.

I was wondering if you could provide the seed and the corresponding class orders used in your ImageNet-R experiment?

DualPrompt: The Results without Prompts

Hi~ Thanks for your excellent work!
I am trying to reproduce the results on Split-CIFAR100 (DualPrompt). I am curious about how the prompts work, so I disabled the prompt mechanism by setting the use-g-prompt, use-e-prompt, and prompt-pool args to False and ran the code. I expected the results to be as low as those reported in the paper (like the ablation study in Table 4, with almost 40% performance degradation on ImageNet-R). However, I got surprising results:
Acc@1: 70.2400 Acc@5: 91.1500 Loss: 1.1950 Forgetting: 9.8889 Backward: -9.8889
These results are competitive compared with some latest works without prompts. So I wonder if I missed some important things or any potential problems exist in my experiment.

Appreciate it if you can solve my puzzles!

Question regarding the average and last accuracy.

Dear authors,

Thank you for open-sourcing your work. I am slightly confused about the metrics you guys used for the evaluation. Here is my understanding from your readme.

1. accuracy_n: accuracy evaluation on only the n-th task after training for the n-th task.
2. forgetting: Average forgetting up until the current task.
3. avg_acc: Average evaluation accuracy after training for the n-th task.

For (3), after training for the n-th task, we get:
acc_per_task = [a1, a2,...an];
avg_acc = average(acc_per_task);

Is it right?
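For reference, the definitions that are commonly used for these metrics in the continual learning literature can be sketched as follows. This is an assumption about what the README means, not a statement of what the repository computes; `acc_matrix[i][j]` is a hypothetical layout holding the accuracy on task j measured after training on task i.

```python
def average_accuracy(acc_matrix, n):
    """Mean accuracy over tasks 0..n, measured after training on task n."""
    row = acc_matrix[n]
    return sum(row) / len(row)


def average_forgetting(acc_matrix, n):
    """Mean drop from the best earlier accuracy to the current accuracy."""
    if n == 0:
        return 0.0  # nothing can be forgotten after the first task
    drops = []
    for j in range(n):
        # Best accuracy on task j at any point before training on task n.
        best_earlier = max(acc_matrix[i][j] for i in range(j, n))
        drops.append(best_earlier - acc_matrix[n][j])
    return sum(drops) / len(drops)


# Row i holds accuracies on tasks 0..i after training on task i.
acc_matrix = [
    [90.0],
    [80.0, 85.0],
    [70.0, 75.0, 88.0],
]

final_avg_acc = average_accuracy(acc_matrix, 2)
final_forgetting = average_forgetting(acc_matrix, 2)
accuracy_n = acc_matrix[2][2]  # accuracy on only the newest task
```

Under this reading, avg_acc is the mean of the last row, which matches the interpretation in the question.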

Using different ViT and ResNet based models in L2P

Hi, thank you for the great work! I was trying to get some results on the CIFAR100 dataset with a resnet18 model and a ViT-Small model. As mentioned in the readme, I was looking at the config file cifar100_l2p.py to find the appropriate changes to make.

For the ViT-S model, I tried to change the config.model_name = "ViT-S_16" as mentioned in the vit.py file and then used the command python main.py --my_config configs/cifar100_l2p.py --workdir=./l2p --my_config.init_checkpoint=./ViT-S_16.npz where the file ViT-S_16.npz is downloaded from here. When I do this, I get a shape-mismatch error. Can you please point me to the place where I can download the ViT-S_16 model checkpoint?

For the experiments with resnet18, I see that the file resnet_v1.py has a model resnet18_cifar. I changed the config.model_name = "resnet18_cifar" and ran the command python main.py --my_config configs/cifar100_l2p.py --workdir=./l2p and got the error

  File "main.py", line 64, in <module>
    app.run(main)
  File "/mnt/efs/people/ptky/miniconda3/envs/l2p/lib/python3.7/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/mnt/efs/people/ptky/miniconda3/envs/l2p/lib/python3.7/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "main.py", line 58, in main
    train_continual.train_and_evaluate(FLAGS.my_config, FLAGS.workdir)
  File "/mnt/efs/people/ptky/project/repos/l2p/train_continual.py", line 975, in train_and_evaluate
    config.model_name)
  File "/mnt/efs/people/ptky/project/repos/l2p/models/vit.py", line 698, in create_original_vit
    raise ValueError(f'Model {name} does not exist.')
ValueError: Model resnet18_cifar does not exist.

I would be really grateful, if you can please point me to the correct checkpoints to download, the changes that are required to be made in the config file, and the corresponding command.

Thanks,
Prateek

Reproduce issue

Dear author.
Thank you for your great work.

I want to experiment with a replay buffer, but options such as num_reviews_epochs are not provided.
Can you tell me the settings for the experiment using the replay buffer? (which produces the same results as the paper)

I look forward to hearing from you.

Thank you.

question on the G(eneral)-Prompt learning

Thanks for your impressive work. There is still one thing that I can't figure out. According to the setting of continual learning, when learning on the current task, the data from previous tasks is unavailable. My question is, how do you make sure that the G(eneral)-Prompt learned on the current task also works for previous tasks? How do you deal with the forgetting problem of the G(eneral)-Prompt itself?

Questions about the reproducibility of the code and the results of the paper

I sincerely question the reproducibility of the code and of the results in the paper, based on the issues in this repo:

1. I have seen people using the same V100 GPUs as the authors who were still unable to run the code and hit OOM errors. No reply was forthcoming. #1 #20
2. My own reproduction of the code does not achieve the results shown in the left panel of Figure 3 in the paper, and I do not understand why catastrophic forgetting does not occur given the statistics I observe. Even without using the optional diversified prompt selection, I can't get these statistics, the same as in #18 #24. This issue comes with detailed statistics logs. Looking at the histogram records, we can see that only four prompts were selected and that all tasks share these prompts. I think this will inevitably cause catastrophic forgetting.
3. The use of a pre-trained ViT may have caused an information leak. #11
4. The given requirement.txt does not directly install the required runtime environment, and even if it does, the code will only run on the CPU. #1

And, for myself:

This code is really hard to run on my RTX 3090 GPU, and even after a lot of effort and without any error reporting, the program is stuck at training step 5 of the first task.
I have not seen anyone in the issues who has successfully reproduced the results.

I very sincerely hope that the author will answer the above questions.

Loss become NaN. Results mismatch between different convolution algorithms.

When running on GPUs, loss becomes NaN.
You can find the training log here.

Transfer prompt parameter during training process.

Regarding transferring previous learned prompt params to the new prompt

Hi @KingSpencer @zizhaozhang , I have a doubt regarding prompt pool initialization in every new task, inside train_continual.py > train_and_evaluate_per_task(). Why do you transfer previously learned prompt keys to the new prompt parameters in prompt pool and prompt key (lines 550-584)?

Based on my understanding from the paper, the prompt pool and prompt key are shared across tasks and the method learns to select the relevant ones using query function-key matching. So if a subsequent incremental task is initializing prompts from the previous step, why are the prompts being shifted? Are new prompts being added at every step or am I missing something?

Any help will be appreciated.

ReadMe issues

I am trying to reproduce the results of the tables included in your nice paper. And I don't think that the code you have provided allows me to reproduce the results of Table 3.
Table 3 uses CORe50, and I was curious to know when the code that produces the results of Table 3 will be available.

Sorry to take your time with this, and hope you are doing well.

How to train and test my data? And How to create my config file? Thanks

It's nice to have such great work, but I'm at a loss as to how to train on my own data, and I'd like to ask for your help.

Method issue

Hi,

I am trying to do some experiments using l2p and there's something a little strange about l2p.

According to the contents of the paper and my understanding, the prompts in the prompt pool should be selected evenly for each task.
However, when I look at the prompt index and the Tensorboard histogram, it seems that only a few prompts are learned.
Only the 3rd, 6th, 7th, and 9th prompts are used when I reproduce the results with the official code.

Did I misunderstand? Or was the code implemented incorrectly?

I look forward to hearing from you.

Best.
Jaeho Lee.

Possible information leakage from pretrained model

Dear author,

Thank you for your excellent work!

I am a little curious about the pretrained model, it is trained on the entire ImageNet-21k dataset, and is fixed during training. But will this lead to information leakage?

Take the class incremental setting as an example, I think all 100 classes of CIFAR100 can be found in ImageNet-21k so it is possible that the model has already learned all the features necessary for CIFAR100. But in practice, the model is expected to learn new features. We can not assume the classes in new tasks have already been observed by the backbone, right?

Have you tried removing the CIFAR100 classes from ImageNet and pretraining a model, or evaluating the model on some datasets disjoint from ImageNet?

Thank you very much!

This code can not run in requirement.txt, please check again!

This code cannot run with the environment installed from requirement.txt.

jaxlib, CUDA, and cuDNN are fine on my Ubuntu machine.

please check again!

Evaluation metrics on CORe50

Is the test accuracy obtained on the remaining 3 sessions (#3, #7 and #10) after training on the other 8 sessions sequentially? And is the average of 10 runs reported?

Bug in classifier?

Hi, thanks for your great work. In configs, I see use_cls_token = True and vit_classifier = "prompt".
While in vit.py line 447,

    elif self.classifier == 'prompt':
      x = x[:, 0:total_prompt_len]

Maybe a token is missed since cls_token is the first token?
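The concern depends on where the class token sits in the sequence. The sketch below uses plain Python lists as stand-ins for token positions to show what a slice of the form `[0:total_prompt_len]` picks up under the two orderings mentioned elsewhere in these issues. It is a toy illustration; which ordering the repository actually uses at that line is not established here, and `pooled_slice` is a hypothetical name.

```python
def pooled_slice(tokens, total_prompt_len):
    """Mimic x[:, 0:total_prompt_len] along the token axis."""
    return tokens[0:total_prompt_len]


prompts = ["p0", "p1", "p2"]
patches = ["x0", "x1"]

# Ordering A: [CLS, Prompts, Image tokens]
cls_first = ["cls"] + prompts + patches
# Ordering B: [Prompts, CLS, Image tokens]
prompts_first = prompts + ["cls"] + patches

slice_a = pooled_slice(cls_first, len(prompts))
slice_b = pooled_slice(prompts_first, len(prompts))
```

With ordering A the slice contains the class token and misses the last prompt, which is the off-by-one the question suspects; with ordering B it contains exactly the prompt tokens.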

Question regarding on FT-seq-Frozen

In 'Learning to prompt for continual learning' paper, I understand 'FT-seq-Frozen' in Table 1 as a naive prompt tuning at the input token feature.

To implement the FT-seq-Frozen setting in CIFAR100, I set prompt pool_size as 1.
The result shows Acc@1 81.49 with Forgetting 6.3667.

Is there anything I missed?
How did you set the hyperparameters for FT-seq-Frozen?
Specifically, did you set the argument 'train_mask = False' for FT-seq-Frozen?

reproducing 5-datasets with dualprompt

I applied DualPrompt to 5-datasets.
The other settings are identical to the L2P case:

num_epochs=5
length=10
pull_constraint_coeff=0.5

The result was acc@1 75.5033 | forgetting 14.9538,
which is much worse than the result in the paper, acc@1 88.08 | forgetting 2.21.

What are the settings for 5-datasets with DualPrompt?

How to reproduce the results with replay?

Hi, thanks for the great work!

With the script you provided, I successfully reproduced the L2P result (of Split CIFAR-100 dataset) without replay.

I am now trying to reproduce the results with replays, but even using replay buffer storing 50 samples/class, I got much lower results (acc 80.10/forgetting 9.13) compared to the reported ones (acc 86.31/forgetting 5.83).

It doesn't make sense that using replay leads to higher forgetting. I guess I might be missing something.

Since there are no examples of how to use replay in the given configuration file (i.e., cifar100_l2p.py), I added some lines to handle replay, as shown below.

  # add for replay
  config.continual.replay = ml_collections.ConfigDict()
  config.continual.replay.num_samples_per_class = 50
  config.continual.replay_no_mask = True
  config.continual.replay_reverse_mask = False
  config.continual.replay.include_new_task = True
  config.continual.review_trick = False
  config.continual.num_review_steps = -1
  config.continual.num_review_epochs = 5

I guess review_trick is for fine-tuning the model with balanced dataset.

Strangely, when I set review_trick=True, I got a much lower result (especially, very low learning accuracy).

And when I set review_trick=False, the model keeps being updated with replay samples, but it still shows much lower accuracy (acc 80.10/forgetting 9.13).

Do you have any advice on what I am missing or where to modify in your code?

Or can you share the correct configuration files to reproduce the result of Table 1 using replay buffer?

Thank you.

Does your model use a frozen ViT to extract the query feature, and then continue training with the same ViT?

Confusion about the ImageNet-R dataset

Hi, Thanks for your nice work!

I am currently trying to follow your Split ImageNet-R benchmark but encounter some problems.
First, I am not able to download the ImageNet-R dataset. I get errors like this:
2022-10-25 16:43:39.764732: E tensorflow/core/platform/cloud/curl_http_request.cc:614] The transmission of request 0x668a800 (URI: https://www.googleapis.com/storage/v1/b/tfds-data/o/dataset_info%2Fimagenet_r%2F0.2.0?fields=size%2Cgeneration%2Cupdated) has been stuck at 0 of 0 bytes for 61 seconds and will be aborted. CURL timing information: lookup time: 0.004201 (No error), connect time: 0 (No error), pre-transfer time: 0 (No error), start-transfer time: 0 (No error)
It seems like a networking issue, so I accessed the URI (https://www.googleapis.com/storage/v1/b/tfds-data/o/dataset_info%2Fimagenet_r%2F0.2.0?fields=size%2Cgeneration%2Cupdated) through my web browser, but I got the following response:

{ "error": { "code": 404, "message": "No such object: tfds-data/dataset_info/imagenet_r/0.2.0", "errors": [ { "message": "No such object: tfds-data/dataset_info/imagenet_r/0.2.0", "domain": "global", "reason": "notFound" } ] } }

It seems that the dataset is not accessible at this time.

Second, I tried to understand the code you provided in libml/input_pipeline.py, in particular 2 things: (1) how the training and test sets are split from the original dataset, and (2) what the training order of the classes is in the continual learning tasks.

For (1), I found that the code uses a TFDS split, in lines 414-424 of input_pipeline.py.
I am not familiar with TFDS. Is this a stratified sampling (based on classes) or just a split over the whole data?

For (2), I found that the code permutes the class order here if config.continual.rand_seed is set. However, in configs/imr_dualprompt.py, config.continual.rand_seed is set to -1. Does this mean that DualPrompt runs its experiments in natural-number order (i.e., 0, 1, 2, 3, ...)?

I hope you could kindly help me :)

Looking forward to your early reply and thanks again!
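The reading of question (2) above can be made concrete with a small sketch. It encodes an assumption, namely that a negative seed means the classes are left in their natural order and a non-negative seed triggers a seeded permutation. The names `class_order` and `split_into_tasks` are hypothetical and are not taken from libml/input_pipeline.py, and the permutation produced here will not match the one produced by the repository.

```python
import random


def class_order(num_classes, rand_seed):
    """Return the class order: natural if rand_seed < 0, else a seeded shuffle."""
    order = list(range(num_classes))
    if rand_seed >= 0:
        # A dedicated Random instance keeps the order reproducible
        # without touching the global random state.
        random.Random(rand_seed).shuffle(order)
    return order


def split_into_tasks(order, num_tasks):
    """Split the class order into equally sized, consecutive tasks."""
    per_task = len(order) // num_tasks
    return [order[t * per_task:(t + 1) * per_task] for t in range(num_tasks)]


natural = class_order(10, rand_seed=-1)
shuffled = class_order(10, rand_seed=0)
tasks = split_into_tasks(natural, num_tasks=5)
```

If the assumption holds, a seed of -1 would give tasks made of consecutive class indices, which is why knowing the seed matters for comparing results across papers.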
