japanese-alpaca-lora's Issues
Switching to llama-13b raises "RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:"
Thank you for releasing Japanese-Alpaca-LoRA.
I tried it right away on Colab (Pro, GPU with 40 GB VRAM). llama-7b worked as-is, but when I switched to llama-13b as shown below, a RuntimeError occurred.
Looking at the resource monitor, there still seems to be plenty of memory headroom.
If there are other places that need to be changed when switching the llama model size, I would appreciate any pointers.
Environment
Modified code
# May not work unless you use an A100 on a Colab Pro (or higher) plan
# BASE_MODEL = "decapoda-research/llama-7b-hf"
BASE_MODEL = "decapoda-research/llama-13b-hf"
# BASE_MODEL = "decapoda-research/llama-30b-hf"
# BASE_MODEL = "decapoda-research/llama-65b-hf"
tokenizer = LlamaTokenizer.from_pretrained(BASE_MODEL, device_map={'': 0})
# LORA_WEIGHTS = "kunishou/Japanese-Alpaca-LoRA-7b-v0"
LORA_WEIGHTS = "kunishou/Japanese-Alpaca-LoRA-13b-v0"
# LORA_WEIGHTS = "kunishou/Japanese-Alpaca-LoRA-30b-v0"
# LORA_WEIGHTS = "kunishou/Japanese-Alpaca-LoRA-65b-v0"
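When switching model sizes, `BASE_MODEL` and `LORA_WEIGHTS` must always change together, since each adapter is shaped for one specific base model. A minimal sketch (the helper name `model_ids` is hypothetical; the repository IDs are the ones from the snippet above) that keeps the pair in sync:

```python
# Hypothetical helper: keep each base checkpoint and its matching LoRA
# adapter together, so switching model size is a one-line change.
MODEL_PAIRS = {
    "7b":  ("decapoda-research/llama-7b-hf",  "kunishou/Japanese-Alpaca-LoRA-7b-v0"),
    "13b": ("decapoda-research/llama-13b-hf", "kunishou/Japanese-Alpaca-LoRA-13b-v0"),
    "30b": ("decapoda-research/llama-30b-hf", "kunishou/Japanese-Alpaca-LoRA-30b-v0"),
    "65b": ("decapoda-research/llama-65b-hf", "kunishou/Japanese-Alpaca-LoRA-65b-v0"),
}

def model_ids(size: str):
    """Return the (BASE_MODEL, LORA_WEIGHTS) pair for a given model size."""
    try:
        return MODEL_PAIRS[size]
    except KeyError:
        raise ValueError(f"unknown size {size!r}; choose from {sorted(MODEL_PAIRS)}")

BASE_MODEL, LORA_WEIGHTS = model_ids("13b")
```

This removes the comment-out/comment-in step entirely, so a stale `LORA_WEIGHTS` can never be paired with a freshly switched `BASE_MODEL`.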
Error message
RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
size mismatch for base_model.model.model.layers.0.self_attn.q_proj.lora_A.weight: copying a param with
shape torch.Size([8, 4096]) from checkpoint, the shape in current model is torch.Size([8, 5120]).
size mismatch for base_model.model.model.layers.0.self_attn.q_proj.lora_B.weight: copying a param with
shape torch.Size([4096, 8]) from checkpoint, the shape in current model is torch.Size([5120, 8]).
size mismatch for base_model.model.model.layers.0.self_attn.v_proj.lora_A.weight: copying a param with
shape torch.Size([8, 4096]) from checkpoint, the shape in current model is torch.Size([8, 5120]).
Full error output
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.0
CUDA SETUP: Detected CUDA version 118
CUDA SETUP: Loading binary /usr/local/lib/python3.9/dist-packages/bitsandbytes/libbitsandbytes_cuda118.so...
/usr/local/lib/python3.9/dist-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: /usr/lib64-nvidia did not contain libcudart.so as expected! Searching further paths...
warn(msg)
/usr/local/lib/python3.9/dist-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/sys/fs/cgroup/memory.events /var/colab/cgroup/jupyter-children/memory.events')}
warn(msg)
/usr/local/lib/python3.9/dist-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('--listen_host=172.28.0.12 --target_host=172.28.0.12 --tunnel_background_save_url=https'), PosixPath('//colab.research.google.com/tun/m/cc48301118ce562b961b3c22d803539adc1e0c19/gpu-a100-s-396jeoh5eio6u --tunnel_background_save_delay=10s --tunnel_periodic_background_save_frequency=30m0s --enable_output_coalescing=true --output_coalescing_required=true')}
warn(msg)
/usr/local/lib/python3.9/dist-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/env/python')}
warn(msg)
/usr/local/lib/python3.9/dist-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('module'), PosixPath('//ipykernel.pylab.backend_inline')}
warn(msg)
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization.
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'.
The class this function is called from is 'LlamaTokenizer'.
Loading checkpoint shards: 100%
41/41 [02:49<00:00, 3.92s/it]
Traceback (most recent call last):
  in <module>:53

  /usr/local/lib/python3.9/dist-packages/peft/peft_model.py:161 in from_pretrained
      158         filename, map_location=torch.device("cuda" if torch.cuda.is_available() else
      159     )
      160     # load the weights into the model
    ❱ 161     model = set_peft_model_state_dict(model, adapters_weights)
      162     if getattr(model, "hf_device_map", None) is not None:
      163         device_map = kwargs.get("device_map", "auto")
      164         max_memory = kwargs.get("max_memory", None)

  /usr/local/lib/python3.9/dist-packages/peft/utils/save_and_load.py:74 in set_peft_model_state_dict
      71         peft_model_state_dict (`dict`): The state dict of the Peft model.
      72     """
      73
    ❱ 74      model.load_state_dict(peft_model_state_dict, strict=False)
      75      if model.peft_config.peft_type != PeftType.LORA:
      76          model.prompt_encoder.embedding.load_state_dict(
      77              {"weight": peft_model_state_dict["prompt_embeddings"]}, strict=True

  /usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py:1671 in load_state_dict
      1668                         ', '.join('"{}"'.format(k) for k in missing_keys)))
      1669
      1670         if len(error_msgs) > 0:
    ❱ 1671             raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
      1672                 self.__class__.__name__, "\n\t".join(error_msgs)))
      1673         return _IncompatibleKeys(missing_keys, unexpected_keys)
      1674
RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
size mismatch for base_model.model.model.layers.0.self_attn.q_proj.lora_A.weight: copying a param with
shape torch.Size([8, 4096]) from checkpoint, the shape in current model is torch.Size([8, 5120]).
size mismatch for base_model.model.model.layers.0.self_attn.q_proj.lora_B.weight: copying a param with
shape torch.Size([4096, 8]) from checkpoint, the shape in current model is torch.Size([5120, 8]).
size mismatch for base_model.model.model.layers.0.self_attn.v_proj.lora_A.weight: copying a param with
shape torch.Size([8, 4096]) from checkpoint, the shape in current model is torch.Size([8, 5120]).
size mismatch for base_model.model.model.layers.0.self_attn.v_proj.lora_B.weight: copying a param with
shape torch.Size([4096, 8]) from checkpoint, the shape in current model is torch.Size([5120, 8]).
size mismatch for base_model.model.model.layers.1.self_attn.q_proj.lora_A.weight: copying a param with
shape torch.Size([8, 4096]) from checkpoint, the shape in current model is torch.Size([8, 5120]).
size mismatch for base_model.model.model.layers.1.self_attn.q_proj.lora_B.weight: copying a param with
shape torch.Size([4096, 8]) from checkpoint, the shape in current model is torch.Size([5120, 8]).
size mismatch for base_model.model.model.layers.1.self_attn.v_proj.lora_A.weight: copying a param with
shape torch.Size([8, 4096]) from checkpoint, the shape in current model is torch.Size([8, 5120]).
size mismatch for base_model.model.model.layers.1.self_attn.v_proj.lora_B.weight: copying a param with
shape torch.Size([4096, 8]) from checkpoint, the shape in current model is torch.Size([5120, 8]).
(identical size-mismatch errors repeat for q_proj/v_proj lora_A and lora_B in every layer through layers.31)
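The repeated 4096-vs-5120 mismatch is the hidden size of LLaMA-7B (4096) versus LLaMA-13B (5120): the adapter checkpoint being loaded has 7B-shaped LoRA matrices, while the instantiated base model is 13B. A cheap way to catch this before PEFT raises deep inside `load_state_dict` is a pre-flight shape check; the sketch below uses a hypothetical helper and plain `(rows, cols)` tuples (with torch you would pass `tensor.shape`):

```python
# Hypothetical pre-flight check: verify that every LoRA A-matrix in an
# adapter state dict was trained against the expected hidden size.
# LoRA A-matrices have shape (rank, hidden_size).
HIDDEN_SIZE = {"7b": 4096, "13b": 5120, "30b": 6656, "65b": 8192}

def check_adapter_hidden_size(shapes, expected):
    """shapes maps parameter name -> shape tuple; raises on a mismatch."""
    for name, shape in shapes.items():
        if name.endswith("lora_A.weight") and shape[1] != expected:
            raise ValueError(
                f"{name} has hidden size {shape[1]}, expected {expected}: "
                "the adapter was trained against a different base model size"
            )
```

If this check fails even though the 13B adapter repo was requested, a stale cached download of the 7B adapter is a likely culprit.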
Multi-GPU training for the 65B model
Could you provide a tutorial on how to train a 65B lora-alpaca model using multiple GPUs?
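The repository does not document a multi-GPU recipe; one common approach (a sketch under that assumption, not a maintainer-endorsed procedure) is to shard the 8-bit base model across all visible GPUs with `device_map="auto"` and cap per-GPU usage via `max_memory`. Only the config-building helper below is concrete; the model-loading call is illustrative:

```python
# Hypothetical sketch: build a max_memory map that spreads a large model
# across all visible GPUs, leaving headroom for activations, plus CPU offload.
def build_max_memory(n_gpus, per_gpu_gib=38, cpu_gib=120):
    memory = {i: f"{per_gpu_gib}GiB" for i in range(n_gpus)}
    memory["cpu"] = f"{cpu_gib}GiB"
    return memory

# Illustrative usage with transformers/peft (values are assumptions):
# model = LlamaForCausalLM.from_pretrained(
#     "decapoda-research/llama-65b-hf",
#     load_in_8bit=True,
#     device_map="auto",
#     max_memory=build_max_memory(torch.cuda.device_count()),
# )
```

Note that `device_map="auto"` gives naive layer-wise pipeline parallelism (one GPU active at a time), which is enough for LoRA fine-tuning but is not data-parallel training.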