Comments (5)
You can simply add --resume ./checkpoint/nextqa/checkpoint_best.pth
to your running command to resume the training or run inference.
from flipped-vqa.
I did try the --resume option before posting, but it throws an error about missing keys in the state_dict when misc.load_model loads the checkpoint:
Steps to replicate
torchrun --rdzv_endpoint 127.0.0.1:1234 --nproc_per_node 1 train.py --model 7B --max_seq_len 128 --batch_size 8 --epochs 5 --warmup_epochs 2 --bias 3.5 --tau 100. --max_feats 10 --dataset nextqa --blr 9e-2 --weight_decay 0.14 --output_dir ./checkpoint/nextqa --accum_iter 2 --vaq --qav --resume checkpoint/nextqa/checkpoint_best.pth
Actual Behavior
Traceback (most recent call last):
  File "train.py", line 163, in <module>
    main(args)
  File "train.py", line 121, in main
    misc.load_model(args=args, model_without_ddp=model_without_ddp, optimizer=optimizer, loss_scaler=loss_scaler)
  File "/home/admin-guest/Documents/multimodal-ml/iqui/Flipped-VQA/util/misc.py", line 302, in load_model
    model_without_ddp.load_state_dict(checkpoint['model'])
  File "/home/admin-guest/anaconda3/envs/flippedvqa_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1482, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Transformer:
Missing key(s) in state_dict: "tok_embeddings.weight", "layers.0.attention.wq.weight", "layers.0.attention.wk.weight", "layers.0.attention.wv.weight", "layers.0.attention.wo.weight", "layers.0.feed_forward.w1.weight", "layers.0.feed_forward.w2.weight", "layers.0.feed_forward.w3.weight", "layers.0.attention_norm.weight", "layers.0.ffn_norm.weight", "layers.1.attention.wq.weight", "layers.1.attention.wk.weight", "layers.1.attention.wv.weight", "layers.1.attention.wo.weight", "layers.1.feed_forward.w1.weight", "layers.1.feed_forward.w2.weight"...
The saved checkpoint contains only the trainable parameters, so the frozen weights are reported as missing keys. You need to pass strict=False when loading it: model_without_ddp.load_state_dict(checkpoint['model'], strict=False).
Please pull our code again.
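To illustrate why strict=False is needed here, the following is a minimal sketch on a toy module, not the Flipped-VQA model. TinyModel, backbone, and adapter are hypothetical names: backbone stands in for the frozen LLaMA weights and adapter for the trainable parameters that end up in the checkpoint.

```python
import torch
from torch import nn


class TinyModel(nn.Module):
    """Hypothetical stand-in: a frozen backbone plus a small trainable part."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(4, 4)  # plays the role of frozen weights
        self.adapter = nn.Linear(4, 2)   # plays the role of trainable weights


model = TinyModel()
for param in model.backbone.parameters():
    param.requires_grad = False

# Keep only trainable parameters, mimicking a checkpoint that skips frozen ones.
trainable = {name for name, p in model.named_parameters() if p.requires_grad}
partial_state = {k: v for k, v in model.state_dict().items() if k in trainable}

fresh = TinyModel()

# Strict loading (the default) rejects the partial state dict.
try:
    fresh.load_state_dict(partial_state)
    strict_failed = False
except RuntimeError:
    strict_failed = True

# Non-strict loading succeeds and reports which keys were not covered.
result = fresh.load_state_dict(partial_state, strict=False)
print("missing:", sorted(result.missing_keys))
print("unexpected:", result.unexpected_keys)
```

With strict=False it is worth checking that the reported missing keys are only the frozen weights you expect to come from the base model, since real mismatches are no longer raised as errors.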
I tried it, but it still doesn't work. If I load the fine-tuned weight file directly without changing the number of epochs, the process terminates immediately and no inference is performed.
This happens because the epoch stored in the checkpoint is the same as the total number of epochs, so there is nothing left to run.
You can either load the checkpoint without its epoch or increase the total number of epochs (--epochs).
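The effect can be sketched in a few lines. This assumes the resumed run restarts at the checkpoint epoch plus one and loops up to the value of --epochs, which is an assumption about train.py and misc.load_model rather than something shown in this thread. remaining_epochs and strip_epoch are hypothetical helpers, and the epoch numbers are illustrative.

```python
from typing import Any, Dict, List


def remaining_epochs(checkpoint_epoch: int, total_epochs: int) -> List[int]:
    """Epochs a resumed run would execute (assumed restart at checkpoint_epoch + 1)."""
    start_epoch = checkpoint_epoch + 1
    return list(range(start_epoch, total_epochs))


def strip_epoch(checkpoint: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a checkpoint dict without its 'epoch' entry."""
    return {key: value for key, value in checkpoint.items() if key != "epoch"}


# Checkpoint from the last of 5 epochs: nothing is left, so the run exits at once.
print(remaining_epochs(checkpoint_epoch=4, total_epochs=5))   # []

# Raising the total number of epochs gives the loop something to do.
print(remaining_epochs(checkpoint_epoch=4, total_epochs=10))  # [5, 6, 7, 8, 9]

# Alternatively, drop the stored epoch so the run starts from its default epoch.
print(sorted(strip_epoch({"model": {}, "epoch": 4})))          # ['model']
```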