
chinese-llama-2's Introduction

Hi there 👋

  • 🔭 I’m a senior researcher at Tencent AI Lab
  • 🌱 I got my Ph.D. degree in 2018
  • 👯 I have practiced in a broad range of NLP fields
  • 🤔 I’m interested in MT and DL
  • 😄 I have some internship positions available
  • ⚡ I like to participate in academic competitions
  • 💬 My homepage is http://longyuewang.com
  • 📫 Contact me at [email protected]


chinese-llama-2's People

Contributors

longyuewangdcu · minghao-wu · seeledu


chinese-llama-2's Issues

Mac

Hello,
Does it support Mac (M1/M2) or Linux?

Please advise.
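
Not from the maintainers, but for context: inference should in principle run anywhere PyTorch runs; a minimal device-selection sketch (assuming a recent PyTorch build with MPS support on Apple Silicon) looks like this:

```python
import torch

# Hypothetical device selection: Apple Silicon (M1/M2) exposes the "mps"
# backend in recent PyTorch builds; Linux boxes typically use CUDA.
if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"
print(f"running on: {device}")
```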

The model downloaded from https://huggingface.co/seeledu/Chinese-Llama-2-7B cannot be used

Loading checkpoint shards: 0%| | 0/2 [01:33<?, ?it/s]
Traceback (most recent call last):
  File "/home/chenjunhao/chinese-llama-2/test/inference.py", line 137, in
    model = AutoModelForCausalLM.from_pretrained(model_name_or_path, torch_dtype=torch.bfloat16, device_map="auto")
  File "/home/chenjunhao/chinese-llama-2/transformers/src/transformers/models/auto/auto_factory.py", line 471, in from_pretrained
    return model_class.from_pretrained(
  File "/home/chenjunhao/chinese-llama-2/transformers/src/transformers/modeling_utils.py", line 2643, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/home/chenjunhao/chinese-llama-2/transformers/src/transformers/modeling_utils.py", line 2966, in _load_pretrained_model
    new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
  File "/home/chenjunhao/chinese-llama-2/transformers/src/transformers/modeling_utils.py", line 671, in _load_state_dict_into_meta_model
    set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
  File "/home/chenjunhao/.local/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 149, in set_module_tensor_to_device
    new_value = value.to(device)
NotImplementedError: Cannot copy out of meta tensor; no data!
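
For reference, one common way to work around this error is to skip the accelerate meta-tensor dispatch entirely; a hedged sketch (assuming the machine has enough memory to hold the full model, and that seeledu/Chinese-Llama-2-7B is the intended checkpoint):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name_or_path = "seeledu/Chinese-Llama-2-7B"  # or a local path

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
# low_cpu_mem_usage=False and no device_map loads real tensors directly,
# avoiding the meta-device path that raised "Cannot copy out of meta tensor".
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=False,
)
model = model.to("cuda")
```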

Re-downloaded model files, still got the "no data" error

          > OK, if you have any other questions, you can open another issue to discuss.

Sorry, but I re-downloaded the model files and still got the same error:

[2023-07-24 06:38:46,649] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]python-BaseException
Loading checkpoint shards:   0%|          | 0/2 [00:04<?, ?it/s]
Traceback (most recent call last):
  File "/data/kexin/anaconda3/envs/cllama2/lib/python3.8/site-packages/accelerate/utils/modeling.py", line 149, in set_module_tensor_to_device
    new_value = value.to(device)
NotImplementedError: Cannot copy out of meta tensor; no data!

Process finished with exit code 143

Originally posted by @XiongKexin in #2 (comment)

Error in llama.cpp convert.py

Traceback (most recent call last):
  File "convert.py", line 1264, in
    main()
  File "convert.py", line 1244, in main
    model_plus = load_some_model(args.model)
  File "convert.py", line 1165, in load_some_model
    models_plus.append(lazy_load_file(path))
  File "convert.py", line 955, in lazy_load_file
    return lazy_load_torch_file(fp, path)
  File "convert.py", line 826, in lazy_load_torch_file
    model = unpickler.load()
  File "convert.py", line 815, in find_class
    return self.CLASSES[(module, name)]
KeyError: ('torch._utils', '_rebuild_meta_tensor_no_storage')
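
The KeyError suggests the shards were pickled with meta tensors that llama.cpp's convert.py cannot rebuild. A hedged workaround sketch (hypothetical paths; untested against this exact checkpoint) is to load the model in transformers and re-serialize it with plain tensors before converting:

```python
import torch
from transformers import AutoModelForCausalLM

# Load with materialized (non-meta) tensors...
model = AutoModelForCausalLM.from_pretrained(
    "path/to/Chinese-Llama-2-7B",   # hypothetical local path
    torch_dtype=torch.float16,
    low_cpu_mem_usage=False,
)
# ...and write shards that contain real tensor storages, which the
# unpickler in convert.py knows how to load.
model.save_pretrained("path/to/resaved", safe_serialization=False)
```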

Does fine-tuning support the 13B and 70B models?

Hello, I'm very happy about and grateful for your Chinese support for Llama 2. I see that you support the 7B model; can the same code be used to fine-tune the 13B and 70B models?

How to make its answers richer

Compared with the English Llama 2's replies, the Chinese replies are relatively short and not as rich. How can I make its answers richer? Thanks.
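
Not an official answer, but reply length is often bounded by the generation settings rather than the model itself; a hedged sketch with looser sampling parameters (hypothetical values, plugged into the generate call used in the repo's inference script):

```python
from transformers import GenerationConfig

# Hypothetical settings that tend to yield longer, more detailed answers;
# tune them against your own prompts.
gen_config = GenerationConfig(
    max_new_tokens=512,     # give the model room for a full answer
    do_sample=True,
    temperature=0.8,
    top_p=0.9,
    repetition_penalty=1.1,
)

# `model` and `input_ids` as prepared in test/inference.py.
generated_ids = model.generate(inputs=input_ids, generation_config=gen_config)
```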

can't run llama-2-7b-hf

Hi there. I'm running the fine-tuning code and get the error message below.

Traceback (most recent call last):
  File "/root/miniconda3/envs/llama-v2/lib/python3.10/site-packages/transformers/configuration_utils.py", line 672, in _get_config_dict
    resolved_config_file = cached_file(
  File "/root/miniconda3/envs/llama-v2/lib/python3.10/site-packages/transformers/utils/hub.py", line 417, in cached_file
    resolved_file = hf_hub_download(
  File "/root/miniconda3/envs/llama-v2/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 110, in _inner_fn
    validate_repo_id(arg_value)
  File "/root/miniconda3/envs/llama-v2/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 158, in validate_repo_id
    raise HFValidationError(
huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/root/autodl-tmp/Chinese-Llama-2/model/llama2-7B-HF'. Use repo_type argument if needed.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/autodl-tmp/Chinese-Llama-2/train/run_clm_lora.py", line 786, in
    main()
  File "/root/autodl-tmp/Chinese-Llama-2/train/run_clm_lora.py", line 454, in main
    config = AutoConfig.from_pretrained(model_args.model_name_or_path, **config_kwargs)
  File "/root/miniconda3/envs/llama-v2/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 983, in from_pretrained
    config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/root/miniconda3/envs/llama-v2/lib/python3.10/site-packages/transformers/configuration_utils.py", line 617, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/root/miniconda3/envs/llama-v2/lib/python3.10/site-packages/transformers/configuration_utils.py", line 693, in _get_config_dict
    raise EnvironmentError(
OSError: Can't load the configuration of '/root/autodl-tmp/Chinese-Llama-2/model/llama2-7B-HF'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure '/root/autodl-tmp/Chinese-Llama-2/model/llama2-7B-HF' is the correct path to a directory containing a config.json file
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 11867) of binary: /root/miniconda3/envs/llama-v2/bin/python
Traceback (most recent call last):
  File "/root/miniconda3/envs/llama-v2/bin/torchrun", line 8, in
    sys.exit(main())
  File "/root/miniconda3/envs/llama-v2/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/root/miniconda3/envs/llama-v2/lib/python3.10/site-packages/torch/distributed/run.py", line 762, in main
    run(args)
  File "/root/miniconda3/envs/llama-v2/lib/python3.10/site-packages/torch/distributed/run.py", line 753, in run
    elastic_launch(
  File "/root/miniconda3/envs/llama-v2/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/root/miniconda3/envs/llama-v2/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

/root/autodl-tmp/Chinese-Llama-2/train/run_clm_lora.py FAILED

Failures:
[1]:
time : 2023-07-31_09:50:12
host : autodl-container-95b911bb00-66f99c55
rank : 1 (local_rank: 1)
exitcode : 1 (pid: 11868)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[2]:
time : 2023-07-31_09:50:12
host : autodl-container-95b911bb00-66f99c55
rank : 2 (local_rank: 2)
exitcode : 1 (pid: 11869)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[3]:
time : 2023-07-31_09:50:12
host : autodl-container-95b911bb00-66f99c55
rank : 3 (local_rank: 3)
exitcode : 1 (pid: 11870)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[4]:
time : 2023-07-31_09:50:12
host : autodl-container-95b911bb00-66f99c55
rank : 4 (local_rank: 4)
exitcode : 1 (pid: 11871)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[5]:
time : 2023-07-31_09:50:12
host : autodl-container-95b911bb00-66f99c55
rank : 5 (local_rank: 5)
exitcode : 1 (pid: 11872)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[6]:
time : 2023-07-31_09:50:12
host : autodl-container-95b911bb00-66f99c55
rank : 6 (local_rank: 6)
exitcode : 1 (pid: 11873)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[7]:
time : 2023-07-31_09:50:12
host : autodl-container-95b911bb00-66f99c55
rank : 7 (local_rank: 7)
exitcode : 1 (pid: 11874)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Root Cause (first observed failure):
[0]:
time : 2023-07-31_09:50:12
host : autodl-container-95b911bb00-66f99c55
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 11867)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
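
The root-cause OSError above is a path problem, not a training bug: transformers falls back to treating the string as a Hub repo id when the local directory is missing. A quick hypothetical sanity check before launching torchrun:

```python
import os

model_dir = "/root/autodl-tmp/Chinese-Llama-2/model/llama2-7B-HF"  # path from the log

# Both checks must print True; otherwise AutoConfig.from_pretrained will
# try (and fail) to interpret the path as a Hugging Face repo id.
print(os.path.isdir(model_dir))
print(os.path.isfile(os.path.join(model_dir, "config.json")))
```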

Fine tuning

I'm running the bash script to fine-tune the model and get the following error message:

[W socket.cpp:601] [c10d] The client socket has failed to connect to [localhost]:29500 (errno: 99 - Cannot assign requested address).

Could you please check that?
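
One hypothetical diagnostic (not from the repo): c10d retries the rendezvous, so this warning is often harmless, but if training actually hangs it's worth checking whether the default master port is usable:

```python
import socket

# Try to bind the default torchrun rendezvous endpoint (localhost:29500).
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    try:
        s.bind(("127.0.0.1", 29500))
        print("port 29500 is free")
    except OSError as exc:
        print(f"port 29500 unavailable: {exc}")
```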

model inference error

{'': 0}
Using pad_token, but it is not set yet.
Setting pad_token_id to eos_token_id:2 for open-end generation.
Traceback (most recent call last):
  File "/home/chenjunhao/chinese-llama-2/test/inference_lora.py", line 160, in
    generated_ids = model.generate(inputs=input_ids, attention_mask=attn_mask, generation_config=gen_config)
  File "/home/chenjunhao/.local/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/chenjunhao/chinese-llama-2/transformers/src/transformers/generation/utils.py", line 1462, in generate
    return self.sample(
  File "/home/chenjunhao/chinese-llama-2/transformers/src/transformers/generation/utils.py", line 2514, in sample
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either inf, nan or element < 0

Every element in probs is 0.
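
A hypothetical debugging step (using the same `model` and `input_ids` as inference_lora.py): inspect the raw logits before sampling; if they are already NaN/inf, the problem is in the weights or the dtype rather than in generate:

```python
import torch

with torch.no_grad():
    # Logits of the last position, i.e. what sampling sees first.
    logits = model(input_ids).logits[:, -1, :]

print("any nan:", torch.isnan(logits).any().item())
print("any inf:", torch.isinf(logits).any().item())
# If bfloat16 produces NaNs on your GPU, reloading the model in float32
# is a common (unverified) workaround.
```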

Question about the tokenizer

Hello! It is very nice that you adapted Llama 2 for the Chinese language and got great results.
I am new to LLMs, and I wonder how you got the tokenizer for Llama 2? If I remember correctly, Llama 2 does not officially support Chinese, and the official model only has a couple hundred Chinese characters in its tokenizer.
Any explanation will be greatly appreciated, thanks!
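
Not an authoritative answer, but you can inspect a tokenizer's Chinese coverage directly; a small sketch (assuming the seeledu/Chinese-Llama-2-7B repo id) showing how it segments Chinese text:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("seeledu/Chinese-Llama-2-7B")

print(len(tok))                   # vocabulary size
print(tok.tokenize("你好，世界"))   # how Chinese text gets segmented
```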

NotImplementedError: Cannot copy out of meta tensor; no data!

When I run test/inference.py, I get the error "NotImplementedError: Cannot copy out of meta tensor; no data!". I don't know how to fix it. Is this due to a wrong transformers version (4.29.0)?

[2023-07-24 05:20:38,101] [INFO] [real_accelerator.py:133:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]python-BaseException
Loading checkpoint shards:   0%|          | 0/2 [00:04<?, ?it/s]
Traceback (most recent call last):
  File "/data/kexin/anaconda3/envs/cllama2/lib/python3.8/site-packages/accelerate/utils/modeling.py" line 149, in set_module_tensor_to_device
    new_value = value.to(device)
NotImplementedError: Cannot copy out of meta tensor; no data!

Process finished with exit code 143
