
LAMDA-PILOT's Introduction

Hi there, I'm Hai-Long 👋

  • 😊 I'm a first-year master's student in the LAMDA Group, Nanjing University (NJU).
  • 🎓 My research interests include Machine Learning and Data Mining. Currently, I focus on Pre-trained Model-based Class-Incremental Learning and Continual Learning with Multimodal Large Language Models.
  • 🔭 Find more information about me on my personal page.
  • 📫 Contact me via [email protected]; collaboration and communication are welcome!

LAMDA-PILOT's People

Contributors

eltociear, iamwangyabin, renovamen, sun-hailong, zhoudw-zdw


LAMDA-PILOT's Issues

Question on the results of L2P on ImageNet-R

I ran the L2P method on ImageNet-R using the same pretrained model as the original L2P, i.e., ViT_B_16 (https://storage.googleapis.com/vit_models/imagenet21k/ViT-B_16.npz). I used the default config except for the pretrained model. The results on the last task are as follows:

2024-01-11 21:34:33,538 [trainer.py] => All params: 171970192
2024-01-11 21:34:33,540 [trainer.py] => Trainable params: 199880
2024-01-11 21:34:33,540 [l2p.py] => Learning on 180-200
2024-01-11 21:41:09,156 [l2p.py] => Task 9, Epoch 10/10 => Loss 0.115, Train_accy 91.08, Test_accy 72.87
2024-01-11 21:41:55,918 [trainer.py] => No NME accuracy.
2024-01-11 21:41:55,918 [trainer.py] => CNN: {'total': 72.87, '00-19': 78.68, '20-39': 73.56, '40-59': 71.26, '60-79': 71.35, '80-99': 66.67, '100-119': 68.69, '120-139': 72.25, '140-159': 79.0, '160-179': 67.79, '180-199': 75.52, 'old': 72.59, 'new': 75.52}
2024-01-11 21:41:55,918 [trainer.py] => CNN top1 curve: [90.44, 84.38, 80.3, 79.0, 77.01, 75.53, 74.84, 74.34, 73.95, 72.87]
2024-01-11 21:41:55,918 [trainer.py] => CNN top5 curve: [97.94, 96.11, 94.58, 92.54, 91.6, 90.87, 90.42, 90.02, 89.04, 88.82]

2024-01-11 21:41:55,919 [trainer.py] => Average Accuracy (CNN): 78.266 

2024-01-11 21:41:55,921 [trainer.py] => Accuracy Matrix (CNN):
[[90.44 87.35 86.32 86.03 84.85 83.24 83.24 80.29 79.71 78.68]
 [ 0.   81.31 78.57 78.72 77.66 74.47 74.01 72.34 72.49 68.69]
 [ 0.    0.   75.22 76.76 76.08 77.36 73.6  72.83 73.03 72.25]
 [ 0.    0.    0.   73.13 73.31 76.42 76.9  80.05 79.66 79.  ]
 [ 0.    0.    0.    0.   70.81 70.46 74.7  73.4  70.55 67.79]
 [ 0.    0.    0.    0.    0.   68.84 70.28 73.49 73.56 75.52]
 [ 0.    0.    0.    0.    0.    0.   68.44 70.46 73.15 73.56]
 [ 0.    0.    0.    0.    0.    0.    0.   68.44 71.   71.26]
 [ 0.    0.    0.    0.    0.    0.    0.    0.   68.64 71.35]
 [ 0.    0.    0.    0.    0.    0.    0.    0.    0.   66.67]]
2024-01-11 21:41:55,921 [trainer.py] => Forgetting (CNN): 4.16111111111111

Using the metric from the original L2P and DualPrompt papers, the Average Accuracy (A_t), i.e., the mean of the last column of the Accuracy Matrix (CNN) (see page 20 of the DualPrompt paper), should be 61.57% for L2P and 68.13% for DualPrompt. However, with PILOT, the result for L2P is 72.48%.

I am wondering why the performance under PILOT is so much better. Did I calculate the metric incorrectly, or am I missing something?
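For anyone checking the arithmetic, a minimal sketch of the A_t computation referred to above (numpy assumed; the values are the last column of the Accuracy Matrix printed earlier):

import numpy as np

# Each task's accuracy after training on the final task: the last column of
# the "Accuracy Matrix (CNN)" above, copied in by hand.
last_column = np.array([78.68, 68.69, 72.25, 79.00, 67.79,
                        75.52, 73.56, 71.26, 71.35, 66.67])
print(f"A_t = {last_column.mean():.2f}")  # -> 72.48, the PILOT figure quoted above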

Notably, the result increases to 73+% when using the IN21k-pretrained, IN1k-fine-tuned ViT_B_16 (vit_base_patch16_224.augreg2_in21k_ft_in1k in timm).
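For reference, a minimal sketch of loading that backbone through timm (assuming a timm release recent enough to include this model tag):

import timm

# IN21k-pretrained, IN1k-fine-tuned ViT-B/16, as named in the comment above.
backbone = timm.create_model(
    "vit_base_patch16_224.augreg2_in21k_ft_in1k",
    pretrained=True,
    num_classes=0,  # feature extractor only; drop the classification head
)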

Please ignore the error caused by the 'sorted' function; I was using an old version of PILOT, and I found this bug has been fixed in the new version.

Thank you for your reply.

How to Use the Toolbox with ImageNet1k Pre-training

Thank you for the excellent toolbox on pre-trained model-based continual learning. I've gone through the README.md and inspected the provided files. Could you guide me on how to use a model pre-trained on ImageNet1k? Running the model pre-trained on ImageNet21k seems straightforward and yields impressive results.
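For anyone attempting this, a hedged sketch (assuming, as the config later on this page suggests, that PILOT reads the timm model name from the "backbone_type" key in the exps/*.json files; the model tag and file names here are assumptions, and prompt-based methods with custom backbones may not accept arbitrary timm names):

import json

# Point an existing config at an ImageNet-1k-only timm checkpoint by
# rewriting its "backbone_type" key (hypothetical paths and model tag).
with open("exps/simplecil_inr.json") as f:
    cfg = json.load(f)
cfg["backbone_type"] = "vit_base_patch16_224.augreg_in1k"
with open("exps/simplecil_inr_in1k.json", "w") as f:
    json.dump(cfg, f, indent=2)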

Pretrained Model Loading Error

Hi, I experienced the same error when running the experiments on L2P, DualPrompt, and CODA-Prompt. How can I fix it?

Traceback (most recent call last):
  File "/data/hdc/jinglong/LAMDA-PILOT/main.py", line 25, in <module>
    main()
  File "/data/hdc/jinglong/LAMDA-PILOT/main.py", line 11, in main
    train(args)
  File "/data/hdc/jinglong/LAMDA-PILOT/trainer.py", line 18, in train
    _train(args)
  File "/data/hdc/jinglong/LAMDA-PILOT/trainer.py", line 62, in _train
    model = factory.get_model(args["model_name"], args)
  File "/data/hdc/jinglong/LAMDA-PILOT/utils/factory.py", line 34, in get_model
    return Learner(args)
  File "/data/hdc/jinglong/LAMDA-PILOT/models/l2p.py", line 20, in __init__
    self._network = PromptVitNet(args, True)
  File "/data/hdc/jinglong/LAMDA-PILOT/utils/inc_net.py", line 517, in __init__
    self.backbone = get_backbone(args, pretrained)
  File "/data/hdc/jinglong/LAMDA-PILOT/utils/inc_net.py", line 100, in get_backbone
    model = timm.create_model(
  File "/data/hdc/jinglong/anaconda3/envs/torch2/lib/python3.9/site-packages/timm/models/_factory.py", line 114, in create_model
    model = create_fn(
  File "/data/hdc/jinglong/LAMDA-PILOT/backbone/vision_transformer_l2p.py", line 810, in vit_base_patch16_224_l2p
    model = _create_vision_transformer('vit_base_patch16_224', pretrained=pretrained, **model_kwargs)
  File "/data/hdc/jinglong/LAMDA-PILOT/backbone/vision_transformer_l2p.py", line 723, in _create_vision_transformer
    pretrained_custom_load='npz' in pretrained_cfg['url'],
TypeError: 'PretrainedCfg' object is not subscriptable
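A hedged workaround (not an official fix): newer timm releases replaced the dict-style pretrained config with a PretrainedCfg dataclass, so pretrained_cfg['url'] fails. A small helper that tolerates both forms could look like the sketch below; alternatively, installing the timm version pinned in the repo's requirements should avoid patching altogether.

# Hypothetical helper; handles both the old dict config and the new dataclass.
def cfg_url(pretrained_cfg) -> str:
    if isinstance(pretrained_cfg, dict):  # older timm: plain dict
        return pretrained_cfg.get('url') or ''
    return getattr(pretrained_cfg, 'url', '') or ''  # newer timm: PretrainedCfg

# Inside _create_vision_transformer, the failing line would then become:
# pretrained_custom_load='npz' in cfg_url(pretrained_cfg),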

Code Support of "Learning without Forgetting for Vision-Language Models" (PROOF)

Thanks to the authors for releasing this fantastic toolbox! This great work saved my life! I'm just wondering whether there will be support for your recent work "Learning without Forgetting for Vision-Language Models" (PROOF)? It would be great if you could release support for that wonderful work as well.

Many thanks in advance!

Error when using multiple GPUs

When I run the l2p.json experiment with 4 GPUs (i.e., cuda:5, cuda:6, cuda:7, and cuda:8 on my machine), the program throws this exception:

AttributeError: 'DataParallel' object has no attribute 'backbone'

How can I fix it? I have never run with multiple GPUs before. Looking forward to your response.
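For context, a minimal sketch of the usual fix for this class of error (an assumption about the call site, not PILOT's actual code): nn.DataParallel stores the wrapped model under .module, so attribute access has to be unwrapped first.

import torch.nn as nn

def unwrap(net: nn.Module) -> nn.Module:
    # DataParallel hides the original model's attributes behind .module
    return net.module if isinstance(net, nn.DataParallel) else net

# Hypothetical call site inside the learner:
# backbone = unwrap(self._network).backbone   # instead of self._network.backbone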

Dual-Prompt's implementation seems to have some issues

In backbone/vision_transformer_dual_prompt.py, line 250, it seems that the "g_prompt" argument isn't concatenated with x and doesn't participate in forward propagation. Maybe I made a mistake, but as stated in the original paper, g_prompt needs to be updated, and e_prompt has a similar problem. I'm looking forward to your response. Thank you very much.
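One possible explanation (an assumption based on the DualPrompt paper's prefix-tuning variant, not confirmed for this repo): G-Prompt is injected into self-attention as key/value prefixes rather than concatenated with x, so it still participates in the forward pass even though it never appears in the token sequence. A minimal sketch with illustrative shapes:

import torch

def prefix_tuned_attention(q, k, v, g_prompt):
    # q, k, v: (batch, num_heads, seq_len, head_dim)
    # g_prompt: (2, batch, num_heads, prompt_len, head_dim) -> key and value parts
    pk, pv = g_prompt[0], g_prompt[1]
    k = torch.cat([pk, k], dim=2)  # prepend prompt keys to K
    v = torch.cat([pv, v], dim=2)  # prepend prompt values to V
    attn = (q @ k.transpose(-2, -1)) * q.shape[-1] ** -0.5
    return attn.softmax(dim=-1) @ v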

Running dualprompt

Hi,
first of all, thanks for your great work!

I was running into errors when trying to run the code with python main.py --config=./exps/dualprompt.json:

  • File "MA-Pilot-MM/backbone/vit_dualprompt.py", line 856, in _create_vision_transformer pretrained_custom_load='npz' in pretrained_cfg['url'], TypeError: 'PretrainedCfg' object is not subscriptable -> it seems that the call should be pretrained_cfg.url, which is also a problem for the coda_prompt.json
  • from backbone.prompt import EPrompt - No module named 'backbone' -> __init__.py is missing in backbone/
  • File "MA-Pilot-MM/backbone/vit_dualprompt.py", line 852, in _create_vision_transformer model = build_model_with_cfg( File "/home/users1/ostertms/.local/lib/python3.12/site-packages/timm/models/_builder.py", line 398, in build_model_with_cfg model = model_cls(**kwargs) TypeError: VisionTransformer.__init__() got an unexpected keyword argument 'pretrained_custom_load'
  • File "/home/users1/ostertms/.local/lib/python3.12/site-packages/torchvision/datasets/folder.py", line 41, in find_classes classes = sorted(entry.name for entry in os.scandir(directory) if entry.is_dir() FileNotFoundError: [Errno 2] No such file or directory: './data/imagenet-r/train/'

Maybe you have stricter requirements on the package/Python versions?

My set-up: I cloned the repo and then installed the requirements as described in the README.md in a conda environment with python=3.12.

Best,
Magnus

Pretrained ResNet

Hello,
I really enjoy working with this great framework, thank you so much.
I'm interested in utilizing the ResNet architecture as well. Would you be able to provide the pre-trained ResNet implementation?
Thank you in advance.
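In the meantime, a minimal sketch of a possible stopgap (an assumption, not an official PILOT feature): timm ships ImageNet-pretrained ResNets that can serve as a headless feature extractor for quick experiments.

import timm

# ImageNet-1k pretrained ResNet-18 without a classification head.
resnet = timm.create_model("resnet18", pretrained=True, num_classes=0)
print(resnet.num_features)  # 512-dimensional features for resnet18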

memo method: TypeError: argument of type 'PretrainedCfg' is not iterable

Hello,

Thank you for such an amazing framework.
I can run the Foster and DER methods without any error, but I encounter an issue when I try to run the 'memo' method.
I would be so happy if you could help me with this.
Thank you so much.

`File "C:\PycharmProjects\LAMDA-PILOT\main.py", line 25, in
main()

File "C:\PycharmProjects\LAMDA-PILOT\main.py", line 11, in main
train(args)

File "C:\PycharmProjects\LAMDA-PILOT\trainer.py", line 19, in train
_train(args)

File "C:\PycharmProjects\LAMDA-PILOT\trainer.py", line 63, in _train
model = factory.get_model(args["model_name"], args)

File "C:\PycharmProjects\LAMDA-PILOT\utils\factory.py", line 36, in get_model
return Learner(args)

File "C:\PycharmProjects\LAMDA-PILOT\models\memo.py", line 23, in init
self._network = AdaptiveNet(args, True)

File "C:\PycharmProjects\LAMDA-PILOT\utils\inc_net.py", line 829, in init
self.TaskAgnosticExtractor , _ = get_backbone(args, pretrained) #Generalized blocks

File "C:\PycharmProjects\LAMDA-PILOT\utils\inc_net.py", line 25, in get_backbone
_basenet, _adaptive_net = timm.create_model("vit_base_patch16_224_memo", pretrained=True, num_classes=0)

File "C:\Anaconda3\envs\yildirimceren\lib\site-packages\timm\models_factory.py", line 117, in create_model
model = create_fn(

File "C:\PycharmProjects\LAMDA-PILOT\backbone\vision_transformer_memo.py", line 896, in vit_base_patch16_224_memo
base_model = _create_vision_transformer_base('vit_base_patch16_224', pretrained=pretrained, **model_kwargs)

File "C:\PycharmProjects\LAMDA-PILOT\backbone\vision_transformer_memo.py", line 867, in _create_vision_transformer_base
pretrained_custom_load='npz' in pretrained_cfg,

TypeError: argument of type 'PretrainedCfg' is not iterable
`
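This looks like the same timm API change discussed in the loading-error issue above, hit in a different spot: 'npz' in pretrained_cfg tries to iterate the PretrainedCfg dataclass. A hedged sketch of a defensive rewrite (hypothetical helper, not the repo's code):

def is_npz_checkpoint(pretrained_cfg) -> bool:
    # Works with both the old dict config and the newer PretrainedCfg dataclass.
    if isinstance(pretrained_cfg, dict):
        return 'npz' in (pretrained_cfg.get('url') or '')
    return 'npz' in (getattr(pretrained_cfg, 'url', '') or '')

# In _create_vision_transformer_base:
# pretrained_custom_load=is_npz_checkpoint(pretrained_cfg),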

When I try to use non-continual learning to obtain the "Upper-Bound" result

"prefix": "reproduce",
"dataset": "imagenetr",
"memory_size": 0,
"memory_per_class": 0,
"fixed_memory": false,
"shuffle": true,
"init_cls": 200,
"increment": 0,
"model_name": "finetune",
"backbone_type": "vit_base_patch16_224",
"device": ["1"],
"seed": [1993],
"init_epoch": 20,
"init_lr": 1e-3,
"init_milestones": [60, 120, 170],
"init_lr_decay": 0.1,
"init_weight_decay": 0.0005,
"epochs": 20,
"lrate": 1e-3,
"milestones": [40, 70],
"lrate_decay": 0.1,
"batch_size": 128,
"weight_decay": 2e-4

As you can see, I set init_cls=200 and increment=0, which matches the intuition for a non-continual (joint training) setup. But a bug occurred:

Traceback (most recent call last):
  File "/media/user/data1/ldz_cil/clip4prompt/main.py", line 33, in <module>
    main()
  File "/media/user/data1/ldz_cil/clip4prompt/main.py", line 19, in main
    train(args)
  File "/media/user/data1/ldz_cil/clip4prompt/trainer.py", line 19, in train
    _train(args)
  File "/media/user/data1/ldz_cil/clip4prompt/trainer.py", line 77, in _train
    cnn_accy, nme_accy = model.eval_task()
  File "/media/user/data1/ldz_cil/clip4prompt/models/base.py", line 118, in eval_task
    cnn_accy = self._evaluate(y_pred, y_true)
  File "/media/user/data1/ldz_cil/clip4prompt/models/base.py", line 106, in _evaluate
    grouped = accuracy(y_pred.T[0], y_true, self._known_classes, self.args["init_cls"], self.args["increment"])
  File "/media/user/data1/ldz_cil/clip4prompt/utils/toolkit.py", line 121, in accuracy
    for class_id in range(init_cls, np.max(y_true), increment):
ValueError: range() arg 3 must not be zero

Is this setting correct? Or is it rather that the authors did not provide an interface for obtaining the Upper-Bound result?
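For reference, a hedged sketch of a guard that would tolerate the joint-training setting (a hypothetical rewrite, not the actual utils/toolkit.py accuracy function): when increment is 0 there is only the initial group, so the per-increment loop can simply be skipped.

import numpy as np

def grouped_accuracy(y_pred, y_true, init_cls, increment):
    # Overall accuracy, mirroring the 'total' entry in the printed dict.
    grouped = {"total": np.around((y_pred == y_true).mean() * 100, 2)}
    if increment > 0:  # with increment == 0 there are no later groups to score
        for class_id in range(init_cls, int(np.max(y_true)) + 1, increment):
            idx = (y_true >= class_id) & (y_true < class_id + increment)
            label = f"{class_id:02d}-{class_id + increment - 1:02d}"
            grouped[label] = np.around((y_pred[idx] == y_true[idx]).mean() * 100, 2)
    return grouped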

Final average accuracy not printed in logs

Hi @sun-hailong,

At the end of a run, the metrics (in this case for exps/simplecil_inr.json) are printed out like this:

2024-01-18 10:00:30,251 [trainer.py] => No NME accuracy.
2024-01-18 10:00:30,251 [trainer.py] => CNN: {'total': 61.28, '00-19': 60.96, '20-39': 61.39, '40-59': 62.38, '60-79': 63.4, '80-99': 56.06, '100-119': 60.45, '120-139': 63.44, '140-159': 62.41, '160-179': 61.3, '180-199': 60.1, 'old': 61.41, 'new': 60.1}
2024-01-18 10:00:30,251 [trainer.py] => CNN top1 curve: [78.96, 72.14, 70.11, 68.29, 66.0, 64.42, 64.05, 63.08, 62.28, 61.28]
2024-01-18 10:00:30,251 [trainer.py] => CNN top5 curve: [95.36, 90.01, 86.4, 84.07, 82.35, 80.8, 79.9, 78.94, 77.75, 76.68]

Average Accuracy (CNN): 67.061
2024-01-18 10:00:30,251 [trainer.py] => Average Accuracy (CNN): 67.061 

Accuracy Matrix (CNN):
[[78.96 75.33 71.41 68.94 67.2  65.02 64.59 62.84 61.39 60.96]
 [ 0.   68.67 66.46 65.66 65.35 64.56 63.77 62.5  62.34 61.39]
 [ 0.    0.   72.44 69.64 66.83 65.68 65.02 64.69 63.37 62.38]
 [ 0.    0.    0.   69.   67.78 65.85 65.5  64.97 64.1  63.4 ]
 [ 0.    0.    0.    0.   62.01 60.78 59.34 57.7  56.88 56.06]
 [ 0.    0.    0.    0.    0.   63.94 63.18 61.67 61.06 60.45]
 [ 0.    0.    0.    0.    0.    0.   66.26 65.55 64.15 63.44]
 [ 0.    0.    0.    0.    0.    0.    0.   63.92 63.37 62.41]
 [ 0.    0.    0.    0.    0.    0.    0.    0.   63.18 61.3 ]
 [ 0.    0.    0.    0.    0.    0.    0.    0.    0.   60.1 ]]
2024-01-18 10:00:30,251 [trainer.py] => Forgetting (CNN): 6.2877777777777775

The issue is that none of the printed/logged metrics include the "Final Average Accuracy", which is the most commonly used CIL metric.

  • The printed "Average Accuracy" is the average of the CNN top-1 curve, but this is defined quite differently from "Final Average Accuracy".
  • The "Final Average Accuracy" is easy to compute from the information you do print: it's just the average of your task-split values in the CNN dict, or equivalently the average of the final column of "Accuracy Matrix (CNN)" (see the sketch after this list).
  • In this example, the Final Average Accuracy calculated from the accuracy matrix is 61.189%.
    • Often it's close to the last value in the top-1 curve, but not always, depending on the dataset.
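A minimal sketch making the two definitions concrete (values copied by hand from the log above):

import numpy as np

# Last column of the "Accuracy Matrix (CNN)": per-task accuracy after the
# final incremental stage.
final_column = np.array([60.96, 61.39, 62.38, 63.40, 56.06,
                         60.45, 63.44, 62.41, 61.30, 60.10])
top1_curve = [78.96, 72.14, 70.11, 68.29, 66.00,
              64.42, 64.05, 63.08, 62.28, 61.28]

print(f"Final Average Accuracy: {final_column.mean():.3f}")   # -> 61.189
print(f"Mean of top-1 curve:    {np.mean(top1_curve):.3f}")   # -> 67.061 (what PILOT prints)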

It would be appreciated if you could change what gets printed out so that comparisons with values reported in the literature are easier to make.

As an aside, "Accuracy Matrix (CNN)" is only printed to the terminal, and not logged to the log file. It would be good to have it in the log file.
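A minimal sketch of that requested change (hypothetical log-file name and placeholder matrix, not the actual trainer code): format the matrix with numpy and route it through the logging module so it reaches the log file as well as the terminal.

import logging
import numpy as np

logging.basicConfig(filename="train.log", level=logging.INFO)  # hypothetical log file
acc_matrix = np.zeros((10, 10))  # placeholder for the real accuracy matrix
logging.info("Accuracy Matrix (CNN):\n%s",
             np.array2string(acc_matrix, precision=2, suppress_small=True))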
