Comments (18)
But when I print model.embeddings.token_type_embeddings, it is Embedding(16, 768).
Which model are you loading?
> Which model are you loading?

The pre-trained model chinese_L-12_H-768_A-12.
My code:

```python
import torch
from transformers import BertConfig, BertModel  # or the equivalent import in your install

bert_config = BertConfig.from_json_file('bert_config.json')
model = BertModel(bert_config)
model.load_state_dict(torch.load('pytorch_model.bin'))
```
The error:

```
RuntimeError: Error(s) in loading state_dict for BertModel:
	size mismatch for embeddings.token_type_embeddings.weight: copying a param of torch.Size([16, 768]) from checkpoint, where the shape is torch.Size([2, 768]) in current model.
```
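One quick way to see which side is off is to inspect the shapes stored in pytorch_model.bin directly. A minimal sketch with plain torch (the key name is taken from the error message above):

```python
import torch

# Load the converted checkpoint on CPU and look at the stored token type
# embedding weight; the key name comes from the size-mismatch message.
state_dict = torch.load('pytorch_model.bin', map_location='cpu')
print(state_dict['embeddings.token_type_embeddings.weight'].shape)
# Prints torch.Size([16, 768]) here, while a model built from the chinese
# config ("type_vocab_size": 2) expects torch.Size([2, 768]).
```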
I'm testing the chinese model. Do you use the config.json of the chinese_L-12_H-768_A-12? Can you send the content of your config.json?
> Do you use the config.json of the chinese_L-12_H-768_A-12? Can you send the content of your config.json?
In the config.json of the chinese_L-12_H-768_A-12, type_vocab_size is 2. But even when I change config.type_vocab_size to 16, it still errors.
> Can you send the content of your config.json?
```json
{
  "attention_probs_dropout_prob": 0.1,
  "directionality": "bidi",
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "max_position_embeddings": 512,
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pooler_fc_size": 768,
  "pooler_num_attention_heads": 12,
  "pooler_num_fc_layers": 3,
  "pooler_size_per_head": 128,
  "pooler_type": "first_token_transform",
  "type_vocab_size": 2,
  "vocab_size": 21128
}
```
I changed my code:

```python
bert_config = BertConfig.from_json_file('bert_config.json')
bert_config.type_vocab_size = 16
model = BertModel(bert_config)
model.load_state_dict(torch.load('pytorch_model.bin'))
```
It still errors.
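One way to confirm that the override actually reached the model is to print the rebuilt embedding (a minimal check reusing the classes from the snippets above):

```python
bert_config = BertConfig.from_json_file('bert_config.json')
bert_config.type_vocab_size = 16
model = BertModel(bert_config)

# If this prints Embedding(16, 768), the model side now matches a checkpoint
# saved with type_vocab_size=16, and any remaining mismatch would have to come
# from the checkpoint itself (e.g. a stale or partially converted file).
print(model.embeddings.token_type_embeddings)
```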
I see you have "type_vocab_size": 2 in your config file, how is that?
Yes, but I changed it in my code.
Is your pytorch_model.bin the correctly converted Chinese model (and not an English one)?
I think it's good.
Ok, I have the models. I think type_vocab_size should be 2 for Chinese as well. I am wondering why it is 16 in your pytorch_model.bin.
I have no idea. Did the conversion of my model go wrong?
I am testing that right now. I haven't played with the multi-lingual models yet.
> I am testing that right now. I haven't played with the multi-lingual models yet.

I am also using it for the first time. I am looking forward to your test results.
> I am testing that right now. I haven't played with the multi-lingual models yet.

This is what I got when converting the model:
```
Traceback (most recent call last):
  File "convert_tf_checkpoint_to_pytorch.py", line 95, in <module>
    convert()
  File "convert_tf_checkpoint_to_pytorch.py", line 85, in convert
    assert pointer.shape == array.shape
AssertionError: (torch.Size([16, 768]), (2, 768))
```
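That assertion compares a parameter of the freshly built PyTorch model against the array read from the TensorFlow checkpoint. A reconstructed sketch of the loop around line 85 (names follow the traceback; the loop body is paraphrased, not copied from the script):

```python
# Reconstructed sketch, not the actual script source. For each variable read
# from the TF checkpoint, the converter locates the matching PyTorch parameter
# and copies the weights across.
for name, array in tf_variables:             # numpy arrays from the TF checkpoint
    pointer = find_parameter(model, name)    # hypothetical lookup in the new model
    # Here (16, 768) is the PyTorch side and (2, 768) the TF side: the PyTorch
    # model was built with the default type_vocab_size=16, so the json config
    # with "type_vocab_size": 2 never took effect.
    assert pointer.shape == array.shape
    pointer.data = torch.from_numpy(array)
```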
Are you supplying a config file with "type_vocab_size": 2 to the conversion script?
> Are you supplying a config file with "type_vocab_size": 2 to the conversion script?
I used the bert_config.json of the chinese_L-12_H-768_A-12 when converting.
Ok, I think I found the issue: your BertConfig is not built from the configuration file for some reason and thus uses the default value of type_vocab_size in BertConfig, which is 16. This error happens on my system when I use config = BertConfig('bert_config.json') instead of config = BertConfig.from_json_file('bert_config.json'). I will make sure these two ways of initializing the configuration (from parameters or from a json file) cannot be mixed up.
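For reference, a toy sketch of the failure mode described above (ToyBertConfig is a hypothetical stand-in, not the library class): when the constructor's first positional parameter is a plain value, passing a file name there silently leaves every other field at its default.

```python
import json

class ToyBertConfig:
    """Hypothetical stand-in mimicking the pre-fix constructor behavior."""

    def __init__(self, vocab_size=None, type_vocab_size=16):
        # The file name lands in vocab_size; nothing is read from disk.
        self.vocab_size = vocab_size
        self.type_vocab_size = type_vocab_size

    @classmethod
    def from_json_file(cls, path):
        # This path actually parses the file, so "type_vocab_size": 2 is honored.
        with open(path) as f:
            data = json.load(f)
        config = cls()
        config.__dict__.update(data)
        return config

wrong = ToyBertConfig('bert_config.json')                 # defaults kept: 16
right = ToyBertConfig.from_json_file('bert_config.json')  # parsed from file: 2
print(wrong.type_vocab_size, right.type_vocab_size)       # 16 2
```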
```
RuntimeError: Error(s) in loading state_dict for BertModel:
	size mismatch for embeddings.token_type_embeddings.weight: copying a param of torch.Size([16, 768]) from checkpoint, where the shape is torch.Size([2, 768]) in current model.
```

I have the same problem as you. Did you solve it?