
tutorial's Introduction

InternLM

👋 join us on Discord and WeChat

Introduction

The InternLM2 series is released with the following features:

  • 200K Context window: Nearly perfect at finding needles in the haystack with 200K-long context, with leading performance on long-context tasks like LongBench and L-Eval. Try it with LMDeploy for 200K-context inference.

  • Outstanding comprehensive performance: Significantly better than the previous generation in all dimensions, especially in reasoning, math, code, chat experience, instruction following, and creative writing, with leading performance among open-source models of similar sizes. In some evaluations, InternLM2-Chat-20B may match or even surpass ChatGPT (GPT-3.5).

  • Code interpreter & Data analysis: With a code interpreter, InternLM2-Chat-20B achieves performance comparable to GPT-4 on GSM8K and MATH. InternLM2-Chat also provides data analysis capability.

  • Stronger tool use: With stronger tool-related capabilities in instruction following, tool selection, and reflection, InternLM2 supports more kinds of agents and multi-step tool calling for complex tasks. See examples.

News

[2024.03.26] We release InternLM2 technical report. See arXiv for details.

[2024.01.31] We release InternLM2-1.8B, along with the associated chat model. They provide a cheaper deployment option while maintaining leading performance.

[2024.01.23] We release InternLM2-Math-7B and InternLM2-Math-20B with pretraining and SFT checkpoints. Despite their small sizes, they surpass ChatGPT. See InternLM-Math for details and download.

[2024.01.17] We release InternLM2-7B and InternLM2-20B and their corresponding chat models with stronger capabilities in all dimensions. See model zoo below for download or model cards for more details.

[2023.12.13] InternLM-7B-Chat and InternLM-20B-Chat checkpoints are updated. With an improved finetuning strategy, the new chat models can generate higher quality responses with greater stylistic diversity.

[2023.09.20] InternLM-20B is released with base and chat versions.

Model Zoo

| Model | Transformers (HF) | ModelScope (HF) | OpenXLab (HF) | OpenXLab (Origin) | Release Date |
| --- | --- | --- | --- | --- | --- |
| InternLM2-1.8B | 🤗internlm2-1.8b | internlm2-1.8b | Open in OpenXLab | Open in OpenXLab | 2024-01-31 |
| InternLM2-Chat-1.8B-SFT | 🤗internlm2-chat-1.8b-sft | internlm2-chat-1.8b-sft | Open in OpenXLab | Open in OpenXLab | 2024-01-31 |
| InternLM2-Chat-1.8B | 🤗internlm2-chat-1.8b | internlm2-chat-1.8b | Open in OpenXLab | Open in OpenXLab | 2024-02-19 |
| InternLM2-Base-7B | 🤗internlm2-base-7b | internlm2-base-7b | Open in OpenXLab | Open in OpenXLab | 2024-01-17 |
| InternLM2-7B | 🤗internlm2-7b | internlm2-7b | Open in OpenXLab | Open in OpenXLab | 2024-01-17 |
| InternLM2-Chat-7B-SFT | 🤗internlm2-chat-7b-sft | internlm2-chat-7b-sft | Open in OpenXLab | Open in OpenXLab | 2024-01-17 |
| InternLM2-Chat-7B | 🤗internlm2-chat-7b | internlm2-chat-7b | Open in OpenXLab | Open in OpenXLab | 2024-01-17 |
| InternLM2-Base-20B | 🤗internlm2-base-20b | internlm2-base-20b | Open in OpenXLab | Open in OpenXLab | 2024-01-17 |
| InternLM2-20B | 🤗internlm2-20b | internlm2-20b | Open in OpenXLab | Open in OpenXLab | 2024-01-17 |
| InternLM2-Chat-20B-SFT | 🤗internlm2-chat-20b-sft | internlm2-chat-20b-sft | Open in OpenXLab | Open in OpenXLab | 2024-01-17 |
| InternLM2-Chat-20B | 🤗internlm2-chat-20b | internlm2-chat-20b | Open in OpenXLab | Open in OpenXLab | 2024-01-17 |

Notes:

The InternLM2 release includes two model sizes: 7B and 20B. The 7B models are efficient for research and applications, while the 20B models are more powerful and can support more complex scenarios. The relationship between these models is as follows.

  1. InternLM2-Base: Foundation models with high quality and high adaptation flexibility, which serve as a good starting point for downstream deep adaptations.
  2. InternLM2: Further pretrained on general-domain data and a domain-enhanced corpus, achieving state-of-the-art results in evaluations while retaining strong general language capability. InternLM2 models are recommended for most applications.
  3. InternLM2-Chat-SFT: An intermediate version of InternLM2-Chat that has only undergone supervised fine-tuning (SFT), based on the InternLM2-Base model. We release it to benefit research on alignment.
  4. InternLM2-Chat: Further aligned on top of InternLM2-Chat-SFT through online RLHF. InternLM2-Chat exhibits better instruction following, chat experience, and function calling, and is recommended for downstream applications.

Limitations: Although we have made efforts to ensure the safety of the model during the training process and to encourage the model to generate text that complies with ethical and legal requirements, the model may still produce unexpected outputs due to its size and probabilistic generation paradigm. For example, the generated responses may contain biases, discrimination, or other harmful content. Please do not propagate such content. We are not responsible for any consequences resulting from the dissemination of harmful information.

Supplements: HF refers to the format used by HuggingFace in transformers, whereas Origin denotes the format adopted by the InternLM team in InternEvo.

Performance

Objective Evaluation

| Dataset | Baichuan2-7B-Chat | Mistral-7B-Instruct-v0.2 | Qwen-7B-Chat | InternLM2-Chat-7B | ChatGLM3-6B | Baichuan2-13B-Chat | Mixtral-8x7B-Instruct-v0.1 | Qwen-14B-Chat | InternLM2-Chat-20B |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MMLU | 50.1 | 59.2 | 57.1 | 63.7 | 58.0 | 56.6 | 70.3 | 66.7 | 66.5 |
| CMMLU | 53.4 | 42.0 | 57.9 | 63.0 | 57.8 | 54.8 | 50.6 | 68.1 | 65.1 |
| AGIEval | 35.3 | 34.5 | 39.7 | 47.2 | 44.2 | 40.0 | 41.7 | 46.5 | 50.3 |
| C-Eval | 53.9 | 42.4 | 59.8 | 60.8 | 59.1 | 56.3 | 54.0 | 71.5 | 63.0 |
| TriviaQA | 37.6 | 35.0 | 46.1 | 50.8 | 38.1 | 40.3 | 57.7 | 54.5 | 53.9 |
| NaturalQuestions | 12.8 | 8.1 | 18.6 | 24.1 | 14.0 | 12.7 | 22.5 | 22.9 | 25.9 |
| C3 | 78.5 | 66.9 | 84.4 | 91.5 | 79.3 | 84.4 | 82.1 | 91.5 | 93.5 |
| CMRC | 8.1 | 5.6 | 14.6 | 63.8 | 43.2 | 27.8 | 5.3 | 13.0 | 50.4 |
| WinoGrande | 49.9 | 50.8 | 54.2 | 65.8 | 61.7 | 50.9 | 60.9 | 55.7 | 74.8 |
| BBH | 35.9 | 46.5 | 45.5 | 61.2 | 56.0 | 42.5 | 57.3 | 55.8 | 68.3 |
| GSM-8K | 32.4 | 48.3 | 44.1 | 70.7 | 53.8 | 56.0 | 71.7 | 57.7 | 79.6 |
| MATH | 5.7 | 8.6 | 12.0 | 23.0 | 20.4 | 4.3 | 22.5 | 27.6 | 31.9 |
| HumanEval | 17.7 | 35.4 | 36.0 | 59.8 | 52.4 | 19.5 | 37.8 | 40.9 | 67.1 |
| MBPP | 37.7 | 25.7 | 33.9 | 51.4 | 55.6 | 40.9 | 40.9 | 30.0 | 65.8 |
  • Performance of MBPP is reported with MBPP(Sanitized)

Alignment Evaluation

  • We evaluated our model on AlpacaEval 2.0, where InternLM2-Chat-20B surpasses Claude 2, GPT-4 (0613), and Gemini Pro.

| Model Name | Win Rate | Length |
| --- | --- | --- |
| GPT-4 Turbo | 50.00% | 2049 |
| GPT-4 | 23.58% | 1365 |
| GPT-4 0314 | 22.07% | 1371 |
| Mistral Medium | 21.86% | 1500 |
| XwinLM 70b V0.1 | 21.81% | 1775 |
| InternLM2 Chat 20B | 21.75% | 2373 |
| Mixtral 8x7B v0.1 | 18.26% | 1465 |
| Claude 2 | 17.19% | 1069 |
| Gemini Pro | 16.85% | 1315 |
| GPT-4 0613 | 15.76% | 1140 |
| Claude 2.1 | 15.73% | 1096 |

  • Based on leaderboard results released on 2024-01-17.

Requirements

  • Python >= 3.8
  • PyTorch >= 1.12.0 (2.0.0 and above are recommended)
  • Transformers >= 4.38
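
The exact install depends on your CUDA and driver setup, but as a rough sketch (the environment name internlm and the pinned versions below are only examples), an environment satisfying these requirements can be created with:

conda create -n internlm python=3.10 -y
conda activate internlm
pip install "torch>=2.0.0" "transformers>=4.38"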

Usages

We briefly show the usage with Transformers, ModelScope, and web demos. The chat models adopt the chatml format to support both chat and agent applications. For best results, make sure the installed transformers library meets the following requirement before running inference with Transformers or ModelScope:

transformers >= 4.38
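
If you prefer to assemble prompts yourself rather than call the bundled chat helper, the tokenizer's chat template can build the chatml-style prompt for you. A minimal sketch (assuming the model repository ships a chat template, which recent revisions of the chat models do):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("internlm/internlm2-chat-7b", trust_remote_code=True)
messages = [{"role": "user", "content": "hello"}]
# apply_chat_template wraps each turn in the model's chatml-style special tokens and
# appends the assistant prefix so that generation starts at the right position.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)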

Import from Transformers

To load the InternLM2-7B-Chat model using Transformers, use the following code:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("internlm/internlm2-chat-7b", trust_remote_code=True)
# Set `torch_dtype=torch.float16` to load model in float16, otherwise it will be loaded as float32 and might cause OOM Error.
model = AutoModelForCausalLM.from_pretrained("internlm/internlm2-chat-7b", device_map="auto", trust_remote_code=True, torch_dtype=torch.float16)
# (Optional) If on low resource devices, you can load model in 4-bit or 8-bit to further save GPU memory via bitsandbytes.
  # InternLM 7B in 4bit will cost nearly 8GB GPU memory.
  # pip install -U bitsandbytes
  # 8-bit: model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto", trust_remote_code=True, load_in_8bit=True)
  # 4-bit: model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto", trust_remote_code=True, load_in_4bit=True)
model = model.eval()
response, history = model.chat(tokenizer, "hello", history=[])
print(response)
# Output: Hello? How can I help you today?
response, history = model.chat(tokenizer, "please provide three suggestions about time management", history=history)
print(response)
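
If you need token-by-token output, the remote code of the chat models also exposes a streaming interface. A short sketch that continues from the snippet above (assuming the stream_chat helper shipped with the model's remote code):

length = 0
# stream_chat yields the partial response as it grows; print only the newly generated suffix.
for response, history in model.stream_chat(tokenizer, "hello", history=[]):
    print(response[length:], end="", flush=True)
    length = len(response)
print()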

Import from ModelScope

To load the InternLM2-7B-Chat model using ModelScope, use the following code:

import torch
from modelscope import snapshot_download, AutoTokenizer, AutoModelForCausalLM
model_dir = snapshot_download('Shanghai_AI_Laboratory/internlm2-chat-7b')
tokenizer = AutoTokenizer.from_pretrained(model_dir, device_map="auto", trust_remote_code=True)
# Set `torch_dtype=torch.float16` to load model in float16, otherwise it will be loaded as float32 and might cause OOM Error.
model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto", trust_remote_code=True, torch_dtype=torch.float16)
# (Optional) If on low resource devices, you can load model in 4-bit or 8-bit to further save GPU memory via bitsandbytes.
  # InternLM 7B in 4bit will cost nearly 8GB GPU memory.
  # pip install -U bitsandbytes
  # 8-bit: model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto", trust_remote_code=True, load_in_8bit=True)
  # 4-bit: model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto", trust_remote_code=True, load_in_4bit=True)
model = model.eval()
response, history = model.chat(tokenizer, "hello", history=[])
print(response)
response, history = model.chat(tokenizer, "please provide three suggestions about time management", history=history)
print(response)

Dialogue

You can interact with the InternLM Chat 7B model through a frontend interface by running the following code:

pip install streamlit
pip install "transformers>=4.38"
streamlit run ./chat/web_demo.py

Deployment

We use LMDeploy for fast deployment of InternLM.

With only four lines of code, you can run internlm2-chat-7b inference after installing LMDeploy (pip install "lmdeploy>=0.2.1").

from lmdeploy import pipeline
pipe = pipeline("internlm/internlm2-chat-7b")
response = pipe(["Hi, pls intro yourself", "Shanghai is"])
print(response)

Please refer to the guidance for more usages about model deployment. For additional deployment tutorials, feel free to explore here.
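
If you would rather expose the model as a service than call the pipeline in-process, LMDeploy also ships an OpenAI-compatible server. A sketch (the port below is arbitrary; assuming the api_server subcommand available in recent LMDeploy releases):

pip install "lmdeploy>=0.2.1"
lmdeploy serve api_server internlm/internlm2-chat-7b --server-port 23333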

200K-long-context Inference

With LMDeploy's Dynamic NTK feature enabled, you can run long-context inference.

from lmdeploy import pipeline, GenerationConfig, TurbomindEngineConfig

backend_config = TurbomindEngineConfig(rope_scaling_factor=2.0, session_len=200000)
pipe = pipeline('internlm/internlm2-chat-7b', backend_config=backend_config)
prompt = 'Use a long prompt to replace this sentence'
response = pipe(prompt)
print(response)

Agent

InternLM2-Chat models have excellent tool-utilization capabilities and can handle function calls in a zero-shot manner. See more examples in the agent section.

Fine-tuning

Please refer to finetune docs for fine-tuning with InternLM.

Note: We have migrated the entire training functionality of this project to InternEvo for a simpler user experience; InternEvo provides efficient pre-training and fine-tuning infrastructure for training InternLM.
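
As a rough end-to-end sketch of the XTuner workflow (the config name and paths below are illustrative; see the finetune docs for the exact recipe):

# Copy a predefined QLoRA config, edit it as needed, then fine-tune on a single GPU.
xtuner list-cfg
xtuner copy-cfg internlm2_chat_7b_qlora_alpaca_e3 .
xtuner train ./internlm2_chat_7b_qlora_alpaca_e3_copy.py
# Convert the saved .pth adapter to the HuggingFace format and merge it into the base model.
xtuner convert pth_to_hf ./internlm2_chat_7b_qlora_alpaca_e3_copy.py ./work_dirs/internlm2_chat_7b_qlora_alpaca_e3_copy/epoch_3.pth ./hf_adapter
xtuner convert merge internlm/internlm2-chat-7b ./hf_adapter ./merged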

Evaluation

We utilize OpenCompass for model evaluation. For InternLM2, we primarily focus on standard objective evaluation, long-context evaluation (needle in a haystack), data contamination assessment, agent evaluation, and subjective evaluation.

Objective Evaluation

To evaluate the InternLM model, please follow the guidelines in the OpenCompass tutorial. Typically, we use ppl for multiple-choice questions on the Base model and gen for all questions on the Chat model.
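
For reference, a C-Eval run can be launched from the OpenCompass repository roughly as follows (a sketch based on the course demo command; adjust the model path, batch size, and GPU count to your hardware):

python run.py --datasets ceval_gen \
    --hf-path internlm/internlm2-chat-7b \
    --tokenizer-kwargs padding_side='left' truncation='left' trust_remote_code=True \
    --model-kwargs trust_remote_code=True device_map='auto' \
    --max-seq-len 2048 --max-out-len 16 --batch-size 4 --num-gpus 1 --debug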

Long-Context Evaluation (Needle in a Haystack)

For the Needle in a Haystack evaluation, refer to the tutorial provided in the documentation. Feel free to try it out.

Data Contamination Assessment

To learn more about data contamination assessment, please check the contamination eval.

Agent Evaluation

  • To evaluate tool utilization, please refer to T-Eval.
  • For code interpreter evaluation, use the Math Agent Evaluation provided in the repository.

Subjective Evaluation

  • Please follow the tutorial for subjective evaluation.

Contribution

We appreciate all the contributors for their efforts to improve and enhance InternLM. Community users are highly encouraged to participate in the project. Please refer to the contribution guidelines for instructions on how to contribute to the project.

License

The code is licensed under Apache-2.0, while model weights are fully open for academic research and also allow free commercial usage. To apply for a commercial license, please fill in the application form (English)/申请表(中文). For other questions or collaborations, please contact [email protected].

Citation

@misc{cai2024internlm2,
      title={InternLM2 Technical Report},
      author={Zheng Cai and Maosong Cao and Haojiong Chen and Kai Chen and Keyu Chen and Xin Chen and Xun Chen and Zehui Chen and Zhi Chen and Pei Chu and Xiaoyi Dong and Haodong Duan and Qi Fan and Zhaoye Fei and Yang Gao and Jiaye Ge and Chenya Gu and Yuzhe Gu and Tao Gui and Aijia Guo and Qipeng Guo and Conghui He and Yingfan Hu and Ting Huang and Tao Jiang and Penglong Jiao and Zhenjiang Jin and Zhikai Lei and Jiaxing Li and Jingwen Li and Linyang Li and Shuaibin Li and Wei Li and Yining Li and Hongwei Liu and Jiangning Liu and Jiawei Hong and Kaiwen Liu and Kuikun Liu and Xiaoran Liu and Chengqi Lv and Haijun Lv and Kai Lv and Li Ma and Runyuan Ma and Zerun Ma and Wenchang Ning and Linke Ouyang and Jiantao Qiu and Yuan Qu and Fukai Shang and Yunfan Shao and Demin Song and Zifan Song and Zhihao Sui and Peng Sun and Yu Sun and Huanze Tang and Bin Wang and Guoteng Wang and Jiaqi Wang and Jiayu Wang and Rui Wang and Yudong Wang and Ziyi Wang and Xingjian Wei and Qizhen Weng and Fan Wu and Yingtong Xiong and Chao Xu and Ruiliang Xu and Hang Yan and Yirong Yan and Xiaogui Yang and Haochen Ye and Huaiyuan Ying and Jia Yu and Jing Yu and Yuhang Zang and Chuyu Zhang and Li Zhang and Pan Zhang and Peng Zhang and Ruijie Zhang and Shuo Zhang and Songyang Zhang and Wenjian Zhang and Wenwei Zhang and Xingcheng Zhang and Xinyue Zhang and Hui Zhao and Qian Zhao and Xiaomeng Zhao and Fengzhe Zhou and Zaida Zhou and Jingming Zhuo and Yicheng Zou and Xipeng Qiu and Yu Qiao and Dahua Lin},
      year={2024},
      eprint={2403.17297},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

tutorial's People

Contributors

ajupyter, axyzdong, bittersweet1999, crazysteeaam, fanqino1, fly2tomato, grey2818, hongru0306, hscspring, islinxu, jianfeng777, jiegenius, jimmyma99, kmno4-zx, lindsey-chang, logan-zou, maxchiron, pommespeter, rangeking, saaraas-1300, saigering, sanbuphy, seifer08ms, shengshenlan, tonysy, vansin, woodx9, xiaomile, zhanghui-china, zhjunqin


tutorial's Issues

Questions about the file structure of the fine-tuned model and how to use it

(screenshot)
Above are the LoRA files I obtained after QLoRA and the merged file structure; below is the original qwen-1.8b-chat file structure. The two are not exactly the same.
(screenshot)
Since Qwen's chat models on ModelScope come with special chat-handling code (for example, the chat model can keep the conversation record in history), if our model structure differs and some files are missing, can we still call the fine-tuned model with the same code?

Questions about fine-tuning

I want to fine-tune on plain (non-dialogue) corpora. How should I prepare the data?

In the development machine environment, downloads via huggingface_hub keep timing out. Is it because of the firewall?

import os 
from huggingface_hub import hf_hub_download  # Load model directly

hf_hub_download(repo_id="internlm/internlm-20b", filename="config.json")
Traceback (most recent call last):
  File "/root/.conda/envs/internlm/lib/python3.10/site-packages/urllib3/connectionpool.py", line 712, in urlopen
    self._prepare_proxy(conn)
  File "/root/.conda/envs/internlm/lib/python3.10/site-packages/urllib3/connectionpool.py", line 1012, in _prepare_proxy
    conn.connect()
  File "/root/.conda/envs/internlm/lib/python3.10/site-packages/urllib3/connection.py", line 374, in connect
    self._tunnel()
  File "/root/.conda/envs/internlm/lib/python3.10/http/client.py", line 921, in _tunnel
    (version, code, message) = response._read_status()
  File "/root/.conda/envs/internlm/lib/python3.10/http/client.py", line 279, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/root/.conda/envs/internlm/lib/python3.10/socket.py", line 705, in readinto
    return self._sock.recv_into(b)
TimeoutError: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/.conda/envs/internlm/lib/python3.10/site-packages/requests/adapters.py", line 486, in send
    resp = conn.urlopen(
  File "/root/.conda/envs/internlm/lib/python3.10/site-packages/urllib3/connectionpool.py", line 799, in urlopen
    retries = retries.increment(
  File "/root/.conda/envs/internlm/lib/python3.10/site-packages/urllib3/util/retry.py", line 592, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /internlm/internlm-20b/resolve/main/config.json (Caused by ProxyError('Cannot connect to proxy.', TimeoutError('timed out')))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/code/InternLM/homework/course_2/huggingface_download.py", line 5, in <module>
    hf_hub_download(repo_id="internlm/internlm-20b", filename="config.json", cache_dir=saved_dir)
  File "/root/.conda/envs/internlm/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "/root/.conda/envs/internlm/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1238, in hf_hub_download
    metadata = get_hf_file_metadata(
  File "/root/.conda/envs/internlm/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "/root/.conda/envs/internlm/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1631, in get_hf_file_metadata
    r = _request_wrapper(
  File "/root/.conda/envs/internlm/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 385, in _request_wrapper
    response = _request_wrapper(
  File "/root/.conda/envs/internlm/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 408, in _request_wrapper
    response = get_session().request(method=method, url=url, **params)
  File "/root/.conda/envs/internlm/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/root/.conda/envs/internlm/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "/root/.conda/envs/internlm/lib/python3.10/site-packages/huggingface_hub/utils/_http.py", line 67, in send
    return super().send(request, *args, **kwargs)
  File "/root/.conda/envs/internlm/lib/python3.10/site-packages/requests/adapters.py", line 513, in send
    raise ProxyError(e, request=request)
requests.exceptions.ProxyError: (MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /internlm/internlm-20b/resolve/main/config.json (Caused by ProxyError('Cannot connect to proxy.', TimeoutError('timed out')))"), '(Request ID: af7451db-5de3-48cc-903f-b8ed77108e81)')

TypeError

is_event_data = issubclass(parameter_types.get(name, int), EventData)

TypeError: issubclass() arg 1 must be a class
(screenshot)

LMDeploy quantization issue

(screenshot)
In a guide-dog project built on InternLM, I ran into the problem above when quantizing the model with W4A16. I suspect the ptb dataset failed to download because of network issues. After downloading the ptb dataset manually, I don't know how to point the tool to the local copy. Any suggestions would be appreciated, thanks!

Lesson 1 notes (Class 11)

Discussed in #62

Originally posted by Shengshenlan January 3, 2024

https://zhuanlan.zhihu.com/p/676289592

The self.md document under xtuner is missing a step

Before running the web demo in section 2.6, you first need to cd into the /root/personal_assistant/code/InternLM directory;
otherwise the demo will raise an error during inference.
That is, change

streamlit run /root/personal_assistant/code/InternLM/web_demo.py --server.address 127.0.0.1 --server.port 6006

to

cd /root/personal_assistant/code/InternLM 
streamlit run /root/personal_assistant/code/InternLM/web_demo.py --server.address 127.0.0.1 --server.port 6006

HF Unavailable.

Since HF is unavailable, I made a mirror of the minilm model that was used in Lesson 3.
here

There is a small problem with the code in section 2.2

mkdir -p /root/model/Shanghai_AI_Laboratory
cp -r /root/share/temp/model_repos/internlm-chat-7b /root/model/Shanghai_AI_Laboratory

This should be corrected to

mkdir -p /root/model/Shanghai_AI_Laboratory
cp -r /root/share/temp/model_repos/internlm-chat-7b /root/model/Shanghai_AI_Laboratory/internlm-chat-7b

Internlm2-chat-7b catastrophically forgets after single-turn dialogue fine-tuning; asking for help

(screenshot)

max_length = 2048
pack_to_max_length = True

# Scheduler & Optimizer

batch_size = 1 # per_device
accumulative_counts = 16
dataloader_num_workers = 0
max_epochs = 3
optim_type = AdamW
lr = 2e-4
betas = (0.9, 0.999)
weight_decay = 0
max_norm = 1 # grad clip
warmup_ratio = 0.03

# Save

save_steps = 500
save_total_limit = 2 # Maximum checkpoints to keep (-1 means unlimited)

# Evaluate the generation performance during the training

evaluation_freq = 500
SYSTEM = ''
evaluation_inputs = [
'请给我介绍五个上海的景点', 'Please tell me five scenic spots in Shanghai'
]
tokenizer = dict(
    type=AutoTokenizer.from_pretrained,
    pretrained_model_name_or_path=pretrained_model_name_or_path,
    trust_remote_code=True,
    padding_side='right')

model = dict(
    type=SupervisedFinetune,
    llm=dict(
        type=AutoModelForCausalLM.from_pretrained,
        pretrained_model_name_or_path=pretrained_model_name_or_path,
        trust_remote_code=True,
        torch_dtype=torch.float16,
        quantization_config=dict(
            type=BitsAndBytesConfig,
            load_in_4bit=True,
            load_in_8bit=False,
            llm_int8_threshold=6.0,
            llm_int8_has_fp16_weight=False,
            bnb_4bit_compute_dtype=torch.float16,
            bnb_4bit_use_double_quant=True,
            bnb_4bit_quant_type='nf4')),
    lora=dict(
        type=LoraConfig,
        r=64,
        lora_alpha=16,
        lora_dropout=0.1,
        bias='none',
        task_type='CAUSAL_LM'))

XTuner errors out when training on a custom dataset

The environment was set up exactly as in the tutorial.
My custom dataset looks like this:

{"conversation": [{"system": "你是一位旅游路线规划方向上,关键词提取的高手,你能够精确的获取到句子中的关键词,从而能够保证在网络上能够搜索到非常准确且良好的内容以供回复", "input": "我想要计划一个澳大利亚深度旅游,有哪些景点值得推荐?", "output": "澳大利亚, 深度旅游, 景点推荐"}]}
{"conversation": [{"system": "你是一位旅游路线规划方向上,关键词提取的高手,你能够精确的获取到句子中的关键词,从而能够保证在网络上能够搜索到非常准确且良好的内容以供回复", "input": "我准备去日本旅行,有什么好的行程规划建议?", "output": "日本, 旅行, 行程规划建议"}]}
{"conversation": [{"system": "你是一位旅游路线规划方向上,关键词提取的高手,你能够精确的获取到句子中的关键词,从而能够保证在网络上能够搜索到非常准确且良好的内容以供回复", "input": "我打算明年去巴黎旅游,需要提前订票吗?", "output": "巴黎, 旅游, 订票"}]}

There are 200+ entries in total, which I simply copied over the MedQA data. Fine-tuning ran successfully; probably because there is so little data, it finished very quickly, in one or two minutes. But it errored out when converting to the HuggingFace format.

(xtuner0.1.9) root@intern-studio:~# export MKL_SERVICE_FORCE_INTEL=1
(xtuner0.1.9) root@intern-studio:~# xtuner convert pth_to_hf /root/ft-medqa/internlm_chat_7b_qlora_medqa2019_e3.py /root/ft-medqa/work_dirs/internlm_chat_7b_qlora_medqa2019_e3/epoch_3.pth /root/hf
[2024-01-10 14:41:22,702] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Error: mkl-service + Intel(R) MKL: MKL_THREADING_LAYER=INTEL is incompatible with libgomp-a34b3233.so.1 library.
Try to import numpy first or set the threading layer accordingly. Set MKL_SERVICE_FORCE_INTEL to force it.
[2024-01-10 14:41:33,272] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
quantization_config convert to <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>
Traceback (most recent call last):
File "/root/xtuner019/xtuner/xtuner/tools/model_converters/pth_to_hf.py", line 105, in
main()
File "/root/xtuner019/xtuner/xtuner/tools/model_converters/pth_to_hf.py", line 82, in main
model = BUILDER.build(cfg.model)
File "/root/.local/lib/python3.10/site-packages/mmengine/registry/registry.py", line 570, in build
return self.build_func(cfg, *args, **kwargs, registry=self)
File "/root/.local/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 121, in build_from_cfg
obj = obj_cls(**args) # type: ignore
File "/root/xtuner019/xtuner/xtuner/model/sft.py", line 24, in init
self.llm = self._build_from_cfg_or_module(llm)
File "/root/xtuner019/xtuner/xtuner/model/sft.py", line 76, in _build_from_cfg_or_module
return BUILDER.build(cfg_or_mod)
File "/root/.local/lib/python3.10/site-packages/mmengine/registry/registry.py", line 570, in build
return self.build_func(cfg, *args, **kwargs, registry=self)
File "/root/.local/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 121, in build_from_cfg
obj = obj_cls(**args) # type: ignore
File "/root/.local/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 487, in from_pretrained
resolved_config_file = cached_file(
File "/root/.local/lib/python3.10/site-packages/transformers/utils/hub.py", line 429, in cached_file
resolved_file = hf_hub_download(
File "/root/.local/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 110, in _inner_fn
validate_repo_id(arg_value)
File "/root/.local/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 164, in validate_repo_id
raise HFValidationError(
huggingface_hub.utils._validators.HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: './internlm-chat-7b'.

I don't know how to solve this now. Could you give me some advice?

How to merge with multiple GPUs?

[Works] When fine-tuning llama3-70B, dual-GPU fine-tuning is possible; two A800-80G cards are enough.
[Fails] But when merging, multi-GPU merging does not work, only single-GPU, so GPU memory runs out. Both NPROC_PER_NODE=2 xtuner convert merge and CUDA_VISIBLE_DEVICES=0,1 xtuner convert merge raise errors.
[Error screenshot]
(screenshot)
[GPU memory screenshot]
(screenshot)

OpenCompass evaluation results show '--' with nothing displayed

I ran the demo command from the tutorial:
python run.py --datasets ceval_gen --hf-path /share/temp/model_repos/internlm-chat-7b/ --tokenizer-path /share/temp/model_repos/internlm-chat-7b/ --tokenizer-kwargs padding_side='left' truncation='left' trust_remote_code=True --model-kwargs trust_remote_code=True device_map='auto' --max-seq-len 2048 --max-out-len 16 --batch-size 4 --num-gpus 1 --debug
I tried batch sizes of 1, 2, and 4; the debug run keeps producing errors.
The error screenshot is below.
(screenshot)

SSH connection issue

(screenshot)

Hello, I followed the InternStudio tutorial to connect remotely, but the last step reports that the port is disabled. How should I resolve this?

Lesson 2 homework (Class 6)

WeChat name: 锡林大街
Basic assignments:

Use the InternLM-Chat-7B model to generate a 300-word short story (screenshots required).
(screenshots)

Get familiar with the Hugging Face download feature: use the huggingface_hub Python package to download the config.json file of InternLM-20B locally (screenshot of the download process required).
(screenshot)

Fine-tuning error: AttributeError: 'InternLM2ForCausalLM' object has no attribute '_set_gradient_checkpointing'. Did you mean: 'is_gradient_checkpointing'?

1. This error is raised when fine-tuning starts; asking for help:
AttributeError: 'InternLM2ForCausalLM' object has no attribute '_set_gradient_checkpointing'. Did you mean: 'is_gradient_checkpointing'?

(screenshot)

2. My environment is as follows:

accelerate==0.27.2
addict==2.4.0
aiohttp==3.9.3
aiosignal==1.3.1
aliyun-python-sdk-core==2.14.0
aliyun-python-sdk-kms==2.16.2
altair==5.2.0
annotated-types==0.6.0
anyio==4.2.0
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
arrow==1.3.0
arxiv==2.1.0
asttokens==2.4.1
async-lru==2.0.4
async-timeout==4.0.3
attrs==23.2.0
Babel==2.14.0
beautifulsoup4==4.12.3
bitsandbytes==0.42.0
bleach==6.1.0
blinker==1.7.0
cachetools==5.3.2
certifi==2024.2.2
cffi==1.16.0
charset-normalizer==3.3.2
click==8.1.7
colorama==0.4.6
comm==0.2.1
contourpy==1.2.0
crcmod==1.7
cryptography==42.0.3
cycler==0.12.1
datasets==2.14.7
debugpy==1.8.1
decorator==5.1.1
deepspeed==0.13.2
defusedxml==0.7.1
dill==0.3.7
distro==1.9.0
einops==0.7.0
exceptiongroup==1.2.0
executing==2.0.1
fastjsonschema==2.19.1
feedparser==6.0.10
filelock==3.13.1
fonttools==4.49.0
fqdn==1.5.1
frozenlist==1.4.1
fsspec==2023.6.0
func-timeout==4.3.5
gast==0.5.4
gitdb==4.0.11
GitPython==3.1.42
google-search-results==2.4.2
griffe==0.40.1
h11==0.14.0
hjson==3.1.0
httpcore==1.0.3
httpx==0.26.0
huggingface-hub==0.17.3
idna==3.6
importlib-metadata==6.11.0
ipykernel==6.29.2
ipython==8.21.0
ipywidgets==8.1.2
isoduration==20.11.0
jedi==0.19.1
Jinja2==3.1.3
jmespath==0.10.0
json5==0.9.14
jsonpointer==2.4
jsonschema==4.21.1
jsonschema-specifications==2023.12.1
jupyter==1.0.0
jupyter-console==6.6.3
jupyter-events==0.9.0
jupyter-lsp==2.2.2
jupyter_client==8.6.0
jupyter_core==5.7.1
jupyter_server==2.12.5
jupyter_server_terminals==0.5.2
jupyterlab==4.1.1
jupyterlab_pygments==0.3.0
jupyterlab_server==2.25.3
jupyterlab_widgets==3.0.10
kiwisolver==1.4.5
lagent==0.2.1
lxml==5.1.0
markdown-it-py==3.0.0
MarkupSafe==2.1.5
matplotlib==3.8.3
matplotlib-inline==0.1.6
mdurl==0.1.2
mistune==3.0.2
mmengine==0.10.3
modelscope==1.12.0
mpi4py_mpich==3.1.5
mpmath==1.3.0
multidict==6.0.5
multiprocess==0.70.15
nbclient==0.9.0
nbconvert==7.16.0
nbformat==5.9.2
nest-asyncio==1.6.0
networkx==3.2.1
ninja==1.11.1.1
notebook==7.1.0
notebook_shim==0.2.4
numpy==1.26.4
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.19.3
nvidia-nvjitlink-cu12==12.3.101
nvidia-nvtx-cu12==12.1.105
opencv-python==4.9.0.80
oss2==2.18.4
overrides==7.7.0
packaging==23.2
pandas==2.2.0
pandocfilters==1.5.1
parso==0.8.3
peft==0.8.2
pexpect==4.9.0
phx-class-registry==4.1.0
Pillow==9.5.0
platformdirs==4.2.0
prometheus_client==0.20.0
prompt-toolkit==3.0.43
protobuf==4.25.3
psutil==5.9.8
ptyprocess==0.7.0
pure-eval==0.2.2
py-cpuinfo==9.0.0
pyarrow==15.0.0
pyarrow-hotfix==0.6
pycparser==2.21
pycryptodome==3.20.0
pydantic==2.6.1
pydantic_core==2.16.2
pydeck==0.8.1b0
Pygments==2.17.2
Pympler==1.0.1
pynvml==11.5.0
pyparsing==3.1.1
python-dateutil==2.8.2
python-json-logger==2.0.7
python-pptx==0.6.23
pytz==2024.1
pytz-deprecation-shim==0.1.0.post0
PyYAML==6.0.1
pyzmq==25.1.2
qtconsole==5.5.1
QtPy==2.4.1
referencing==0.33.0
regex==2023.12.25
requests==2.31.0
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
rich==13.7.0
rpds-py==0.18.0
safetensors==0.4.2
scipy==1.12.0
Send2Trash==1.8.2
sentencepiece==0.1.99
sgmllib3k==1.0.0
simplejson==3.19.2
six==1.16.0
smmap==5.0.1
sniffio==1.3.0
sortedcontainers==2.4.0
soupsieve==2.5
stack-data==0.6.3
streamlit==1.24.0
sympy==1.12
tenacity==8.2.3
termcolor==2.4.0
terminado==0.18.0
tiktoken==0.6.0
timeout-decorator==0.5.0
tinycss2==1.2.1
tokenizers==0.14.1
toml==0.10.2
tomli==2.0.1
toolz==0.12.1
torch==2.2.0
tornado==6.4
tqdm==4.66.2
traitlets==5.14.1
transformers==4.34.0
transformers-stream-generator==0.0.4
triton==2.2.0
types-python-dateutil==2.8.19.20240106
typing_extensions==4.9.0
tzdata==2024.1
tzlocal==4.3.1
uri-template==1.3.0
urllib3==2.2.0
validators==0.22.0
watchdog==4.0.0
wcwidth==0.2.13
webcolors==1.13
webencodings==0.5.1
websocket-client==1.7.0
widgetsnbextension==4.0.10
XlsxWriter==3.1.9
-e git+https://gitee.com/Internlm/xtuner@9f686f08c8e60e568e811aaad8daf9c08462d42d#egg=xtuner
xxhash==3.4.1
yapf==0.40.2
yarl==1.9.4
zipp==3.17.0

xtuner train errors out on my own dataset

(xt) root@autodl-container-acc940bcfe-be30ce08:/autodl-tmp/ft# cd config
(xt) root@autodl-container-acc940bcfe-be30ce08:/autodl-tmp/ft/config# xtuner train internlm2_chat_7b_qlora_alpaca_e3_copy.py
[2024-05-30 18:04:29,376] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
05/30 18:04:29 - mmengine - WARNING - WARNING: command error: '[Errno 2] No such file or directory: '/root/autodl-tmp/ft/data/cutlass/CHANGELOG.md''!
05/30 18:04:29 - mmengine - WARNING -
Arguments received: ['xtuner', 'train', 'internlm2_chat_7b_qlora_alpaca_e3_copy.py']. xtuner commands use the following syntax:

    xtuner MODE MODE_ARGS ARGS

    Where   MODE (required) is one of ('list-cfg', 'copy-cfg', 'log-dataset', 'check-custom-dataset', 'train', 'test', 'chat', 'convert', 'preprocess', 'mmbench', 'eval_refcoco')
            MODE_ARG (optional) is the argument for specific mode
            ARGS (optional) are the arguments for specific command

Some usages for xtuner commands: (See more by using -h for specific command!)

    1. List all predefined configs:
        xtuner list-cfg
    2. Copy a predefined config to a given path:
        xtuner copy-cfg $CONFIG $SAVE_FILE
    3-1. Fine-tune LLMs by a single GPU:
        xtuner train $CONFIG
    3-2. Fine-tune LLMs by multiple GPUs:
        NPROC_PER_NODE=$NGPUS NNODES=$NNODES NODE_RANK=$NODE_RANK PORT=$PORT ADDR=$ADDR xtuner dist_train $CONFIG $GPUS
    4-1. Convert the pth model to HuggingFace's model:
        xtuner convert pth_to_hf $CONFIG $PATH_TO_PTH_MODEL $SAVE_PATH_TO_HF_MODEL
    4-2. Merge the HuggingFace's adapter to the pretrained base model:
        xtuner convert merge $LLM $ADAPTER $SAVE_PATH
        xtuner convert merge $CLIP $ADAPTER $SAVE_PATH --is-clip
    4-3. Split HuggingFace's LLM to the smallest sharded one:
        xtuner convert split $LLM $SAVE_PATH
    5-1. Chat with LLMs with HuggingFace's model and adapter:
        xtuner chat $LLM --adapter $ADAPTER --prompt-template $PROMPT_TEMPLATE --system-template $SYSTEM_TEMPLATE
    5-2. Chat with VLMs with HuggingFace's model and LLaVA:
        xtuner chat $LLM --llava $LLAVA --visual-encoder $VISUAL_ENCODER --image $IMAGE --prompt-template $PROMPT_TEMPLATE --system-template $SYSTEM_TEMPLATE
    6-1. Preprocess arxiv dataset:
        xtuner preprocess arxiv $SRC_FILE $DST_FILE --start-date $START_DATE --categories $CATEGORIES
    6-2. Preprocess refcoco dataset:
        xtuner preprocess refcoco --ann-path $RefCOCO_ANN_PATH --image-path $COCO_IMAGE_PATH --save-path $SAVE_PATH
    7-1. Log processed dataset:
        xtuner log-dataset $CONFIG
    7-2. Verify the correctness of the config file for the custom dataset:
        xtuner check-custom-dataset $CONFIG
    8. MMBench evaluation:
        xtuner mmbench $LLM --llava $LLAVA --visual-encoder $VISUAL_ENCODER --prompt-template $PROMPT_TEMPLATE --data-path $MMBENCH_DATA_PATH
    9. Refcoco evaluation:
        xtuner eval_refcoco $LLM --llava $LLAVA --visual-encoder $VISUAL_ENCODER --prompt-template $PROMPT_TEMPLATE --data-path $REFCOCO_DATA_PATH
    10. List all dataset formats which are supported in XTuner

Run special commands:

    xtuner help
    xtuner version

GitHub: https://github.com/InternLM/xtuner

Config file:
# Copyright (c) OpenMMLab. All rights reserved.
import torch
from datasets import load_dataset
from mmengine.dataset import DefaultSampler
from mmengine.hooks import (CheckpointHook, DistSamplerSeedHook, IterTimerHook,
LoggerHook, ParamSchedulerHook)
from mmengine.optim import AmpOptimWrapper, CosineAnnealingLR, LinearLR
from peft import LoraConfig
from torch.optim import AdamW
from transformers import (AutoModelForCausalLM, AutoTokenizer,
BitsAndBytesConfig)

from xtuner.dataset import process_hf_dataset
from xtuner.dataset.collate_fns import default_collate_fn
from xtuner.dataset.map_fns import openai_map_fn, template_map_fn_factory
from xtuner.engine.hooks import (DatasetInfoHook, EvaluateChatHook,
VarlenAttnArgsToMessageHubHook)
from xtuner.engine.runner import TrainLoop
from xtuner.model import SupervisedFinetune
from xtuner.parallel.sequence import SequenceParallelSampler
from xtuner.utils import PROMPT_TEMPLATE, SYSTEM_TEMPLATE

#######################################################################

# PART 1 Settings

#######################################################################

# Model

pretrained_model_name_or_path = './autodl-tmp/RAG-langchain/models/internlm2-chat-7b'
use_varlen_attn = False

# Data

alpaca_en_path = './autodl-tmp/ft/data/train_fold_1.json'
prompt_template = PROMPT_TEMPLATE.internlm2_chat
max_length = 1024
pack_to_max_length = True

# parallel

sequence_parallel_size = 1

# Scheduler & Optimizer

batch_size = 1 # per_device
accumulative_counts = 16
accumulative_counts *= sequence_parallel_size
dataloader_num_workers = 0
max_epochs = 2
optim_type = AdamW
lr = 2e-4
betas = (0.9, 0.999)
weight_decay = 0
max_norm = 1 # grad clip
warmup_ratio = 0.03

# Save

save_steps = 300
save_total_limit = 3 # Maximum checkpoints to keep (-1 means unlimited)

# Evaluate the generation performance during the training

evaluation_freq = 300
SYSTEM = ''
evaluation_inputs = ['判断以下新闻情绪,积极为1,消极为0', '判断该新闻正负面,正面为1,负面为0', '请判断以下新闻是积极还是消极,积极为1,消极为0,你的答案只有1或0']

#######################################################################

# PART 2 Model & Tokenizer

#######################################################################
tokenizer = dict(
type=AutoTokenizer.from_pretrained,
pretrained_model_name_or_path=pretrained_model_name_or_path,
trust_remote_code=True,
padding_side='right')

model = dict(
type=SupervisedFinetune,
use_varlen_attn=use_varlen_attn,
llm=dict(
type=AutoModelForCausalLM.from_pretrained,
pretrained_model_name_or_path=pretrained_model_name_or_path,
trust_remote_code=True,
torch_dtype=torch.float16,
quantization_config=dict(
type=BitsAndBytesConfig,
load_in_4bit=True,
load_in_8bit=False,
llm_int8_threshold=6.0,
llm_int8_has_fp16_weight=False,
bnb_4bit_compute_dtype=torch.float16,
bnb_4bit_use_double_quant=True,
bnb_4bit_quant_type='nf4')),
lora=dict(
type=LoraConfig,
r=64,
lora_alpha=16,
lora_dropout=0.1,
bias='none',
task_type='CAUSAL_LM'))

#######################################################################

# PART 3 Dataset & Dataloader

#######################################################################
alpaca_en = dict(
type=process_hf_dataset,
dataset=dict(type=load_dataset, path='json', data_files=dict(train=alpaca_en_path)),
tokenizer=tokenizer,
max_length=max_length,
dataset_map_fn=openai_map_fn,
template_map_fn=dict(
type=template_map_fn_factory, template=prompt_template),
remove_unused_columns=True,
shuffle_before_pack=True,
pack_to_max_length=pack_to_max_length,
use_varlen_attn=use_varlen_attn)

sampler = SequenceParallelSampler if sequence_parallel_size > 1 else DefaultSampler
train_dataloader = dict(
batch_size=batch_size,
num_workers=dataloader_num_workers,
dataset=alpaca_en,
sampler=dict(type=sampler, shuffle=True),
collate_fn=dict(type=default_collate_fn, use_varlen_attn=use_varlen_attn))

#######################################################################

# PART 4 Scheduler & Optimizer

#######################################################################

# optimizer

optim_wrapper = dict(
type=AmpOptimWrapper,
optimizer=dict(
type=optim_type, lr=lr, betas=betas, weight_decay=weight_decay),
clip_grad=dict(max_norm=max_norm, error_if_nonfinite=False),
accumulative_counts=accumulative_counts,
loss_scale='dynamic',
dtype='float16')

# learning policy

# More information: https://github.com/open-mmlab/mmengine/blob/main/docs/en/tutorials/param_scheduler.md # noqa: E501

param_scheduler = [
dict(
type=LinearLR,
start_factor=1e-5,
by_epoch=True,
begin=0,
end=warmup_ratio * max_epochs,
convert_to_iter_based=True),
dict(
type=CosineAnnealingLR,
eta_min=0.0,
by_epoch=True,
begin=warmup_ratio * max_epochs,
end=max_epochs,
convert_to_iter_based=True)
]

# train, val, test setting

train_cfg = dict(type=TrainLoop, max_epochs=max_epochs)

#######################################################################

# PART 5 Runtime

#######################################################################

# Log the dialogue periodically during the training process, optional

custom_hooks = [
dict(type=DatasetInfoHook, tokenizer=tokenizer),
dict(
type=EvaluateChatHook,
tokenizer=tokenizer,
every_n_iters=evaluation_freq,
evaluation_inputs=evaluation_inputs,
system=SYSTEM,
prompt_template=prompt_template)
]

if use_varlen_attn:
custom_hooks += [dict(type=VarlenAttnArgsToMessageHubHook)]

# configure default hooks

default_hooks = dict(
# record the time of every iteration.
timer=dict(type=IterTimerHook),
# print log every 10 iterations.
logger=dict(type=LoggerHook, log_metric_by_epoch=False, interval=10),
# enable the parameter scheduler.
param_scheduler=dict(type=ParamSchedulerHook),
# save checkpoint per save_steps.
checkpoint=dict(
type=CheckpointHook,
by_epoch=False,
interval=save_steps,
max_keep_ckpts=save_total_limit),
# set sampler seed in distributed evrionment.
sampler_seed=dict(type=DistSamplerSeedHook),
)

# configure environment

env_cfg = dict(
# whether to enable cudnn benchmark
cudnn_benchmark=False,
# set multi process parameters
mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),
# set distributed parameters
dist_cfg=dict(backend='nccl'),
)

# set visualizer

visualizer = None

# set log level

log_level = 'INFO'

# load from which checkpoint

load_from = None

# whether to resume training from the loaded checkpoint

resume = False

# Defaults to use random seed and disable deterministic

randomness = dict(seed=None, deterministic=False)

# set log processor

log_processor = dict(by_epoch=False)

Unable to use lmdeploy chat

I configured the entire environment following the tutorial, but it errors out as soon as I run it.
The model I converted offline later also fails with this same error.

(lmdeploy) root@intern-studio:~# lmdeploy chat turbomind /share/temp/model_repos/internlm-chat-7b/ --model-name internlm-chat-7b
model_source: hf_model
WARNING: Can not find tokenizer.json. It may take long time to initialize the tokenizer.
WARNING: Can not find tokenizer.json. It may take long time to initialize the tokenizer.
model_config:
{
"model_name": "internlm-chat-7b",
"tensor_para_size": 1,
"head_num": 32,
"kv_head_num": 32,
"vocab_size": 103168,
"num_layer": 32,
"inter_size": 11008,
"norm_eps": 1e-06,
"attn_bias": 1,
"start_id": 1,
"end_id": 2,
"session_len": 2056,
"weight_type": "fp16",
"rotary_embedding": 128,
"rope_theta": 10000.0,
"size_per_head": 128,
"group_size": 0,
"max_batch_size": 64,
"max_context_token_num": 1,
"step_length": 1,
"cache_max_entry_count": 0.5,
"cache_block_seq_len": 128,
"cache_chunk_size": 1,
"use_context_fmha": 1,
"quant_policy": 0,
"max_position_embeddings": 2048,
"rope_scaling_factor": 0.0,
"use_logn_attn": 0
}
get 323 model params
[WARNING] gemm_config.in is not found; using default GEMM algo
session 1

double enter to end input >>> hello

<|System|>:You are an AI assistant whose name is InternLM (书生·浦语).

  • InternLM (书生·浦语) is a conversational language model that is developed by Shanghai AI Laboratory (上海人工智能实验室). It is designed to be helpful, honest, and harmless.
  • InternLM (书生·浦语) can understand and communicate fluently in the language chosen by the user such as English and 中文.

<|User|>:hello
<|Bot|>: [AMP ERROR][CudaFrontend.cpp:94][1705496068:532304]failed to call cuCtxGetDevice(&device), error code: CUDA_ERROR_INVALID_CONTEXT

===============================================

Back trace dump:
/usr/local/harp/lib/libvirtdev-frontend.so.0(LogStream::PrintBacktrace()+0x52) [0x7fc0d92cc302]
/lib/x86_64-linux-gnu/libcuda.so.1(CudaFeApiStateData::GetCurrentDevicePciBusId(Frontend*, int const*)+0x241) [0x7fc0d94fb471]
/lib/x86_64-linux-gnu/libcuda.so.1(python: CudaFrontend.cpp:94: static const string& CudaFeApiStateData::GetCurrentDevicePciBusId(Frontend*, const CUdevice*): Assertion `0' failed.
Aborted (core dumped)

tests.test_query_gradi: the script name is misspelled

Cause of the error: gradio was misspelled as gradi.

Details:
python3 -m tests.test_query_gradi --work_dir <path to the new vector database>
should be changed to
python3 -m tests.test_query_gradio --work_dir <path to the new vector database>

importlib.metadata.PackageNotFoundError: No package metadata was found for xtuner

I have installed it repeatedly following the tutorial and keep hitting this problem:
(xtuner0.1.9) root@intern-studio:~/xtuner019/xtuner# xtuner
Traceback (most recent call last):
File "/root/.local/bin/xtuner", line 33, in
sys.exit(load_entry_point('xtuner', 'console_scripts', 'xtuner')())
File "/root/.local/bin/xtuner", line 22, in importlib_load_entry_point
for entry_point in distribution(dist_name).entry_points
File "/share/conda_envs/internlm-base/lib/python3.10/importlib/metadata/init.py", line 969, in
distribution return Distribution.from_name(distribution_name)
File "/share/conda_envs/internlm-base/lib/python3.10/importlib/metadata/init.py", line 548, in
from_name raise PackageNotFoundError(name)
importlib.metadata.PackageNotFoundError: No package metadata was found for xtuner

Training used to work. After I changed the parameters and data, I got the error below, so I tried installing directly in the base environment, and then the error above appeared.
nohup: ignoring input
[2024-01-11 19:09:22,787] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-11 19:10:00,577] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
01/11 19:10:22 - mmengine - INFO -

System environment:
sys.platform: linux
Python: 3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0]
CUDA available: True
numpy_random_seed: 716082487
GPU 0: NVIDIA A100-SXM4-80GB
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.7, V11.7.99
GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
PyTorch: 2.0.1
PyTorch compiling details: PyTorch built with:

  • GCC 9.3

  • C++ Version: 201703

  • Intel(R) oneAPI Math Kernel Library Version 2023.1-Product Build 20230303 for Intel(R) 64 architecture applications

  • Intel(R) MKL-DNN v2.7.3 (Git Hash 6dbeffbae1f23cbbeae17adb7b5b13f1f37c080e)

  • OpenMP 201511 (a.k.a. OpenMP 4.5)

  • LAPACK is enabled (usually provided by MKL)

  • NNPACK is enabled

  • CPU capability usage: AVX2

  • CUDA Runtime 11.7

  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37

  • CuDNN 8.5

  • Magma 2.6.1

  • Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.7, CUDNN_VERSION=8.5.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.0.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,

    TorchVision: 0.15.2
    OpenCV: 4.9.0
    MMEngine: 0.10.2

Runtime environment:
launcher: none
randomness: {'seed': None, 'deterministic': False}
cudnn_benchmark: False
mp_cfg: {'mp_start_method': 'fork', 'opencv_num_threads': 0}
dist_cfg: {'backend': 'nccl'}
seed: None
deterministic: False
Distributed launcher: none
Distributed training: False
GPU number: 1

01/11 19:10:22 - mmengine - INFO - Config:
SYSTEM = ''
accumulative_counts = 16
batch_size = 1
betas = (
0.9,
0.999,
)
custom_hooks = [
dict(
tokenizer=dict(
padding_side='right',
pretrained_model_name_or_path=
'/root/model/Shanghai_AI_Laboratory/internlm-chat-7b',
trust_remote_code=True,
type='transformers.AutoTokenizer.from_pretrained'),
type='xtuner.engine.DatasetInfoHook'),
dict(
evaluation_inputs=[
'请给我介绍五个上海的景点',
'Please tell me five scenic spots in Shanghai',
],
every_n_iters=500,
prompt_template='xtuner.utils.PROMPT_TEMPLATE.internlm_chat',
system='',
tokenizer=dict(
padding_side='right',
pretrained_model_name_or_path=
'/root/model/Shanghai_AI_Laboratory/internlm-chat-7b',
trust_remote_code=True,
type='transformers.AutoTokenizer.from_pretrained'),
type='xtuner.engine.EvaluateChatHook'),
]
data_path = '/root/code/xturn/grade-school-math/grade_school_math/data/new'
dataloader_num_workers = 0
default_hooks = dict(
checkpoint=dict(interval=1, type='mmengine.hooks.CheckpointHook'),
logger=dict(interval=10, type='mmengine.hooks.LoggerHook'),
param_scheduler=dict(type='mmengine.hooks.ParamSchedulerHook'),
sampler_seed=dict(type='mmengine.hooks.DistSamplerSeedHook'),
timer=dict(type='mmengine.hooks.IterTimerHook'))
env_cfg = dict(
cudnn_benchmark=False,
dist_cfg=dict(backend='nccl'),
mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0))
evaluation_freq = 500
evaluation_inputs = [
'请给我介绍五个上海的景点',
'Please tell me five scenic spots in Shanghai',
]
launcher = 'none'
load_from = None
log_level = 'INFO'
lr = 0.0002
max_epochs = 3
max_length = 2048
max_norm = 1
model = dict(
llm=dict(
pretrained_model_name_or_path=
'/root/model/Shanghai_AI_Laboratory/internlm-chat-7b',
quantization_config=dict(
bnb_4bit_compute_dtype='torch.float16',
bnb_4bit_quant_type='nf4',
bnb_4bit_use_double_quant=True,
llm_int8_has_fp16_weight=False,
llm_int8_threshold=6.0,
load_in_4bit=True,
load_in_8bit=False,
type='transformers.BitsAndBytesConfig'),
torch_dtype='torch.float16',
trust_remote_code=True,
type='transformers.AutoModelForCausalLM.from_pretrained'),
lora=dict(
bias='none',
lora_alpha=16,
lora_dropout=0.1,
r=64,
task_type='CAUSAL_LM',
type='peft.LoraConfig'),
type='xtuner.model.SupervisedFinetune')
optim_type = 'bitsandbytes.optim.PagedAdamW32bit'
optim_wrapper = dict(
optimizer=dict(
betas=(
0.9,
0.999,
),
lr=0.0002,
type='bitsandbytes.optim.PagedAdamW32bit',
weight_decay=0),
type='DeepSpeedOptimWrapper')
pack_to_max_length = True
param_scheduler = dict(
T_max=3,
by_epoch=True,
convert_to_iter_based=True,
eta_min=0.0,
type='mmengine.optim.CosineAnnealingLR')
pretrained_model_name_or_path = '/root/model/Shanghai_AI_Laboratory/internlm-chat-7b'
prompt_template = 'xtuner.utils.PROMPT_TEMPLATE.internlm_chat'
randomness = dict(deterministic=False, seed=None)
resume = False
runner_type = 'FlexibleRunner'
strategy = dict(
config=dict(
bf16=dict(enabled=True),
fp16=dict(enabled=False, initial_scale_power=16),
gradient_accumulation_steps='auto',
gradient_clipping='auto',
train_micro_batch_size_per_gpu='auto',
zero_allow_untested_optimizer=True,
zero_force_ds_cpu_optimizer=False,
zero_optimization=dict(overlap_comm=True, stage=2)),
exclude_frozen_parameters=True,
gradient_accumulation_steps=16,
gradient_clipping=1,
train_micro_batch_size_per_gpu=1,
type='DeepSpeedStrategy')
tokenizer = dict(
padding_side='right',
pretrained_model_name_or_path=
'/root/model/Shanghai_AI_Laboratory/internlm-chat-7b',
trust_remote_code=True,
type='transformers.AutoTokenizer.from_pretrained')
train_cfg = dict(by_epoch=True, max_epochs=3, val_interval=1)
train_dataloader = dict(
batch_size=1,
collate_fn=dict(type='xtuner.dataset.collate_fns.default_collate_fn'),
dataset=dict(
dataset=dict(
path=
'/root/code/xturn/grade-school-math/grade_school_math/data/new',
type='datasets.load_dataset'),
dataset_map_fn='xtuner.dataset.map_fns.oasst1_map_fn',
max_length=2048,
pack_to_max_length=True,
remove_unused_columns=True,
shuffle_before_pack=True,
template_map_fn=dict(
template='xtuner.utils.PROMPT_TEMPLATE.internlm_chat',
type='xtuner.dataset.map_fns.template_map_fn_factory'),
tokenizer=dict(
padding_side='right',
pretrained_model_name_or_path=
'/root/model/Shanghai_AI_Laboratory/internlm-chat-7b',
trust_remote_code=True,
type='transformers.AutoTokenizer.from_pretrained'),
type='xtuner.dataset.process_hf_dataset'),
num_workers=0,
sampler=dict(shuffle=True, type='mmengine.dataset.DefaultSampler'))
train_dataset = dict(
dataset=dict(
path='/root/code/xturn/grade-school-math/grade_school_math/data/new',
type='datasets.load_dataset'),
dataset_map_fn='xtuner.dataset.map_fns.oasst1_map_fn',
max_length=2048,
pack_to_max_length=True,
remove_unused_columns=True,
shuffle_before_pack=True,
template_map_fn=dict(
template='xtuner.utils.PROMPT_TEMPLATE.internlm_chat',
type='xtuner.dataset.map_fns.template_map_fn_factory'),
tokenizer=dict(
padding_side='right',
pretrained_model_name_or_path=
'/root/model/Shanghai_AI_Laboratory/internlm-chat-7b',
trust_remote_code=True,
type='transformers.AutoTokenizer.from_pretrained'),
type='xtuner.dataset.process_hf_dataset')
visualizer = None
weight_decay = 0
work_dir = './work_dirs/internlm_chat_7b_qlora_oasst1_e3_copy'

01/11 19:10:23 - mmengine - WARNING - Failed to search registry with scope "mmengine" in the "builder" registry tree. As a workaround, the current "builder" registry in "xtuner" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmengine" is a correct scope, or whether the registry is initialized.
01/11 19:10:24 - mmengine - INFO - Hooks will be executed in the following order:
before_run:
(VERY_HIGH ) RuntimeInfoHook
(BELOW_NORMAL) LoggerHook

before_train:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
(NORMAL ) DatasetInfoHook
(NORMAL ) EvaluateChatHook
(VERY_LOW ) CheckpointHook

before_train_epoch:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
(NORMAL ) DistSamplerSeedHook

before_train_iter:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook

after_train_iter:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
(NORMAL ) EvaluateChatHook
(BELOW_NORMAL) LoggerHook
(LOW ) ParamSchedulerHook
(VERY_LOW ) CheckpointHook

after_train_epoch:
(NORMAL ) IterTimerHook
(LOW ) ParamSchedulerHook
(VERY_LOW ) CheckpointHook

before_val:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) DatasetInfoHook

before_val_epoch:
(NORMAL ) IterTimerHook

before_val_iter:
(NORMAL ) IterTimerHook

after_val_iter:
(NORMAL ) IterTimerHook
(BELOW_NORMAL) LoggerHook

after_val_epoch:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
(BELOW_NORMAL) LoggerHook
(LOW ) ParamSchedulerHook
(VERY_LOW ) CheckpointHook

after_val:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) EvaluateChatHook

after_train:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) EvaluateChatHook
(VERY_LOW ) CheckpointHook

before_test:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) DatasetInfoHook

before_test_epoch:
(NORMAL ) IterTimerHook

before_test_iter:
(NORMAL ) IterTimerHook

after_test_iter:
(NORMAL ) IterTimerHook
(BELOW_NORMAL) LoggerHook

after_test_epoch:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
(BELOW_NORMAL) LoggerHook

after_test:
(VERY_HIGH ) RuntimeInfoHook

after_run:
(BELOW_NORMAL) LoggerHook

01/11 19:10:26 - mmengine - WARNING - Dataset Dataset has no metainfo. dataset_meta in visualizer will be None.
quantization_config convert to <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>

Loading checkpoint shards: 0%| | 0/8 [00:00<?, ?it/s]
Loading checkpoint shards: 12%|█▎ | 1/8 [00:03<00:23, 3.34s/it]
Loading checkpoint shards: 25%|██▌ | 2/8 [00:06<00:19, 3.30s/it]
Loading checkpoint shards: 38%|███▊ | 3/8 [00:09<00:16, 3.25s/it]
Loading checkpoint shards: 50%|█████ | 4/8 [00:12<00:12, 3.17s/it]
Loading checkpoint shards: 62%|██████▎ | 5/8 [00:16<00:09, 3.20s/it]
Loading checkpoint shards: 75%|███████▌ | 6/8 [00:20<00:07, 3.65s/it]
Loading checkpoint shards: 88%|████████▊ | 7/8 [00:23<00:03, 3.43s/it]
Loading checkpoint shards: 100%|██████████| 8/8 [00:24<00:00, 2.55s/it]
Loading checkpoint shards: 100%|██████████| 8/8 [00:24<00:00, 3.04s/it]
01/11 19:10:53 - mmengine - INFO - dispatch internlm attn forward
01/11 19:10:53 - mmengine - WARNING - Due to the implementation of the PyTorch version of flash attention, even when the output_attentions flag is set to True, it is not possible to return the attn_weights.
[2024-01-11 19:11:48,486] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.12.6, git-hash=unknown, git-branch=unknown
[2024-01-11 19:11:48,486] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-01-11 19:11:48,486] [INFO] [comm.py:652:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[2024-01-11 19:11:48,653] [INFO] [comm.py:702:mpi_discovery] Discovered MPI settings of world_rank=0, local_rank=0, world_size=1, master_addr=192.168.237.56, master_port=29500
[2024-01-11 19:11:48,653] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2024-01-11 19:11:49,014] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
[2024-01-11 19:11:49,019] [INFO] [logging.py:96:log_dist] [Rank 0] Using client Optimizer as basic optimizer
[2024-01-11 19:11:49,019] [INFO] [logging.py:96:log_dist] [Rank 0] Removing param_group that has no 'params' in the basic Optimizer
[2024-01-11 19:11:49,089] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Basic Optimizer = PagedAdamW32bit
[2024-01-11 19:11:49,089] [INFO] [utils.py:56:is_zero_supported_optimizer] Checking ZeRO support for optimizer=PagedAdamW32bit type=<class 'bitsandbytes.optim.adamw.PagedAdamW32bit'>
[2024-01-11 19:11:49,089] [WARNING] [engine.py:1166:do_optimizer_sanity_check] **** You are using ZeRO with an untested optimizer, proceed with caution *****
[2024-01-11 19:11:49,089] [INFO] [logging.py:96:log_dist] [Rank 0] Creating torch.bfloat16 ZeRO stage 2 optimizer
[2024-01-11 19:11:49,090] [INFO] [stage_1_and_2.py:148:init] Reduce bucket size 500,000,000
[2024-01-11 19:11:49,090] [INFO] [stage_1_and_2.py:149:init] Allgather bucket size 500,000,000
[2024-01-11 19:11:49,090] [INFO] [stage_1_and_2.py:150:init] CPU Offload: False
[2024-01-11 19:11:49,090] [INFO] [stage_1_and_2.py:151:init] Round robin gradient partitioning: False
[2024-01-11 19:11:51,497] [INFO] [utils.py:791:see_memory_usage] Before initializing optimizer states
[2024-01-11 19:11:51,498] [INFO] [utils.py:792:see_memory_usage] MA 5.63 GB Max_MA 5.93 GB CA 6.31 GB Max_CA 6 GB
[2024-01-11 19:11:51,498] [INFO] [utils.py:799:see_memory_usage] CPU Virtual Memory: used = 105.09 GB, percent = 5.2%
[2024-01-11 19:11:51,748] [INFO] [utils.py:791:see_memory_usage] After initializing optimizer states
[2024-01-11 19:11:51,748] [INFO] [utils.py:792:see_memory_usage] MA 5.63 GB Max_MA 6.23 GB CA 6.91 GB Max_CA 7 GB
[2024-01-11 19:11:51,749] [INFO] [utils.py:799:see_memory_usage] CPU Virtual Memory: used = 105.32 GB, percent = 5.2%
[2024-01-11 19:11:51,749] [INFO] [stage_1_and_2.py:516:init] optimizer state initialized
[2024-01-11 19:11:51,870] [INFO] [utils.py:791:see_memory_usage] After initializing ZeRO optimizer
[2024-01-11 19:11:51,871] [INFO] [utils.py:792:see_memory_usage] MA 5.63 GB Max_MA 5.63 GB CA 6.91 GB Max_CA 7 GB
[2024-01-11 19:11:51,871] [INFO] [utils.py:799:see_memory_usage] CPU Virtual Memory: used = 105.39 GB, percent = 5.2%
[2024-01-11 19:11:51,882] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Final Optimizer = PagedAdamW32bit
[2024-01-11 19:11:51,882] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed using client LR scheduler
[2024-01-11 19:11:51,882] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed LR Scheduler = None
[2024-01-11 19:11:51,882] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0002], mom=[(0.9, 0.999)]
[2024-01-11 19:11:51,886] [INFO] [config.py:984:print] DeepSpeedEngine configuration:
[2024-01-11 19:11:51,886] [INFO] [config.py:988:print] activation_checkpointing_config {
"partition_activations": false,
"contiguous_memory_optimization": false,
"cpu_checkpointing": false,
"number_checkpoints": null,
"synchronize_checkpoint_boundary": false,
"profile": false
}
[2024-01-11 19:11:51,886] [INFO] [config.py:988:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2024-01-11 19:11:51,886] [INFO] [config.py:988:print] amp_enabled .................. False
[2024-01-11 19:11:51,886] [INFO] [config.py:988:print] amp_params ................... False
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] autotuning_config ............ {
"enabled": false,
"start_step": null,
"end_step": null,
"metric_path": null,
"arg_mappings": null,
"metric": "throughput",
"model_info": null,
"results_dir": "autotuning_results",
"exps_dir": "autotuning_exps",
"overwrite": true,
"fast": true,
"start_profile_step": 3,
"end_profile_step": 5,
"tuner_type": "gridsearch",
"tuner_early_stopping": 5,
"tuner_num_trials": 50,
"model_info_path": null,
"mp_size": 1,
"max_train_batch_size": null,
"min_train_batch_size": 1,
"max_train_micro_batch_size_per_gpu": 1.024000e+03,
"min_train_micro_batch_size_per_gpu": 1,
"num_tuning_micro_batch_sizes": 3
}
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] bfloat16_enabled ............. True
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] checkpoint_parallel_write_pipeline False
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] checkpoint_tag_validation_enabled True
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] checkpoint_tag_validation_fail False
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] comms_config ................. <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x7f0b4ba0bee0>
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] communication_data_type ...... None
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] curriculum_enabled_legacy .... False
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] curriculum_params_legacy ..... False
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] data_efficiency_enabled ...... False
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] dataloader_drop_last ......... False
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] disable_allgather ............ False
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] dump_state ................... False
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] dynamic_loss_scale_args ...... None
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] eigenvalue_enabled ........... False
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] eigenvalue_gas_boundary_resolution 1
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] eigenvalue_layer_name ........ bert.encoder.layer
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] eigenvalue_layer_num ......... 0
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] eigenvalue_max_iter .......... 100
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] eigenvalue_stability ......... 1e-06
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] eigenvalue_tol ............... 0.01
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] eigenvalue_verbose ........... False
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] elasticity_enabled ........... False
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] flops_profiler_config ........ {
"enabled": false,
"recompute_fwd_factor": 0.0,
"profile_step": 1,
"module_depth": -1,
"top_modules": 1,
"detailed": true,
"output_file": null
}
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] fp16_auto_cast ............... None
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] fp16_enabled ................. False
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] fp16_master_weights_and_gradients False
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] global_rank .................. 0
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] grad_accum_dtype ............. None
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] gradient_accumulation_steps .. 16
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] gradient_clipping ............ 1
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] gradient_predivide_factor .... 1.0
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] graph_harvesting ............. False
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] initial_dynamic_scale ........ 1
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] load_universal_checkpoint .... False
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] loss_scale ................... 1.0
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] memory_breakdown ............. False
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] mics_hierarchial_params_gather False
[2024-01-11 19:11:51,888] [INFO] [config.py:988:print] mics_shard_size .............. -1
[2024-01-11 19:11:51,888] [INFO] [config.py:988:print] monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False
[2024-01-11 19:11:51,888] [INFO] [config.py:988:print] nebula_config ................ {
"enabled": false,
"persistent_storage_path": null,
"persistent_time_interval": 100,
"num_of_version_in_retention": 2,
"enable_nebula_load": true,
"load_path": null
}
[2024-01-11 19:11:51,888] [INFO] [config.py:988:print] optimizer_legacy_fusion ...... False
[2024-01-11 19:11:51,888] [INFO] [config.py:988:print] optimizer_name ............... None
[2024-01-11 19:11:51,888] [INFO] [config.py:988:print] optimizer_params ............. None
[2024-01-11 19:11:51,888] [INFO] [config.py:988:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0, 'pipe_partitioned': True, 'grad_partitioned': True}
[2024-01-11 19:11:51,888] [INFO] [config.py:988:print] pld_enabled .................. False
[2024-01-11 19:11:51,888] [INFO] [config.py:988:print] pld_params ................... False
[2024-01-11 19:11:51,888] [INFO] [config.py:988:print] prescale_gradients ........... False
[2024-01-11 19:11:51,888] [INFO] [config.py:988:print] scheduler_name ............... None
[2024-01-11 19:11:51,888] [INFO] [config.py:988:print] scheduler_params ............. None
[2024-01-11 19:11:51,888] [INFO] [config.py:988:print] seq_parallel_communication_data_type torch.float32
[2024-01-11 19:11:51,888] [INFO] [config.py:988:print] sparse_attention ............. None
[2024-01-11 19:11:51,888] [INFO] [config.py:988:print] sparse_gradients_enabled ..... False
[2024-01-11 19:11:51,888] [INFO] [config.py:988:print] steps_per_print .............. 10000000000000
[2024-01-11 19:11:51,888] [INFO] [config.py:988:print] train_batch_size ............. 16
[2024-01-11 19:11:51,888] [INFO] [config.py:988:print] train_micro_batch_size_per_gpu 1
[2024-01-11 19:11:51,888] [INFO] [config.py:988:print] use_data_before_expert_parallel
False
[2024-01-11 19:11:51,888] [INFO] [config.py:988:print] use_node_local_storage ....... False
[2024-01-11 19:11:51,888] [INFO] [config.py:988:print] wall_clock_breakdown ......... False
[2024-01-11 19:11:51,888] [INFO] [config.py:988:print] weight_quantization_config ... None
[2024-01-11 19:11:51,888] [INFO] [config.py:988:print] world_size ................... 1
[2024-01-11 19:11:51,888] [INFO] [config.py:988:print] zero_allow_untested_optimizer True
[2024-01-11 19:11:51,888] [INFO] [config.py:988:print] zero_config .................. stage=2 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 use_multi_rank_bucket_allreduce=True allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=True load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=100,000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False zero_hpz_partition_size=1 zero_quantized_weights=False zero_quantized_nontrainable_weights=False zero_quantized_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=True pipeline_loading_checkpoint=False override_module_apply=True
[2024-01-11 19:11:51,888] [INFO] [config.py:988:print] zero_enabled ................. True
[2024-01-11 19:11:51,888] [INFO] [config.py:988:print] zero_force_ds_cpu_optimizer .. False
[2024-01-11 19:11:51,888] [INFO] [config.py:988:print] zero_optimization_stage ...... 2
[2024-01-11 19:11:51,888] [INFO] [config.py:974:print_user_config] json = {
"gradient_accumulation_steps": 16,
"train_micro_batch_size_per_gpu": 1,
"gradient_clipping": 1,
"zero_allow_untested_optimizer": true,
"zero_force_ds_cpu_optimizer": false,
"zero_optimization": {
"stage": 2,
"overlap_comm": true
},
"fp16": {
"enabled": false,
"initial_scale_power": 16
},
"bf16": {
"enabled": true
},
"steps_per_print": 1.000000e+13
}
Traceback (most recent call last):
File "/root/xtuner019/xtuner/xtuner/tools/train.py", line 260, in
main()
File "/root/xtuner019/xtuner/xtuner/tools/train.py", line 256, in main
runner.train()
File "/root/.local/lib/python3.10/site-packages/mmengine/runner/_flexible_runner.py", line 1182, in train
self.strategy.prepare(
File "/root/.local/lib/python3.10/site-packages/mmengine/_strategy/deepspeed.py", line 389, in prepare
self.param_schedulers = self.build_param_scheduler(
File "/root/.local/lib/python3.10/site-packages/mmengine/_strategy/base.py", line 658, in build_param_scheduler
param_schedulers = self._build_param_scheduler(
File "/root/.local/lib/python3.10/site-packages/mmengine/_strategy/base.py", line 563, in _build_param_scheduler
PARAM_SCHEDULERS.build(
File "/root/.local/lib/python3.10/site-packages/mmengine/registry/registry.py", line 570, in build
return self.build_func(cfg, *args, **kwargs, registry=self)
File "/root/.local/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 294, in build_scheduler_from_cfg
return scheduler_cls.build_iter_from_epoch( # type: ignore
File "/root/.local/lib/python3.10/site-packages/mmengine/optim/scheduler/param_scheduler.py", line 663, in build_iter_from_epoch
assert epoch_length is not None and epoch_length > 0,
AssertionError: epoch_length must be a positive integer, but got 0.
Could someone please take a look at this?
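For context (my own guess from the error message, not something confirmed by the log above): mmengine's build_iter_from_epoch derives epoch_length from the length of the training dataloader, so this assertion usually fires when the dataset ends up empty, e.g. the data path is wrong or every sample is filtered out during preprocessing. A minimal sketch of the failing condition, with illustrative names only:

# Minimal sketch, assuming epoch_length is derived from the dataloader length
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.empty(0, 8))   # e.g. every sample was filtered out
loader = DataLoader(dataset, batch_size=1)

epoch_length = len(loader)                   # 0 for an empty dataset
assert epoch_length > 0, (
    f"epoch_length must be a positive integer, but got {epoch_length}; "
    "check that the training data path is correct and samples survive preprocessing."
)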

Out of GPU memory when quantizing Llama 3 70B on a 3090

On a 3090 with 24 GB of VRAM, quantizing Llama 3 70B with lmdeploy lite awq runs out of GPU memory at layer 79. Following the suggestion, I set PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True.
nvidia-smi shows that only the first card is being used. Is it possible to use multiple cards?
[GPU out-of-memory screenshot]
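For rough context (a back-of-the-envelope estimate I am adding, not something stated in the tutorial): the fp16/bf16 weights of a 70B-parameter model alone take roughly 130 GiB, so any step that keeps the full-precision model resident on a single 24 GB 3090 will run out of memory regardless of the allocator setting:

# Quick sanity check on the memory needed just for the weights (assuming fp16/bf16 load)
params = 70e9                # 70B parameters
bytes_per_param = 2          # 2 bytes per parameter in fp16/bf16
print(f"~{params * bytes_per_param / 1024**3:.0f} GiB for the weights alone")  # ≈ 130 GiB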

openxlab: pip install reports the Debian dependency swig as missing, even though packages.txt in the root directory already lists swig

During the pip install step on openxlab, the dependency swig is reported as missing,
even though packages.txt already declares the Debian dependency swig.
My project: https://github.com/jabberwockyang/MedicalReviewAgent/

My requirements.txt:

accelerate>=0.26.1
aiohttp
auto-gptq
bcembedding
beautifulsoup4
datrie==0.8.2
duckduckgo_search
cachetools==5.3.3
einops
faiss-gpu
hanziconv==0.3.2
gradio==4.25.0
langchain>=0.1.12
langchain-community==0.0.38
loguru
lxml_html_clean
openai>=1.0.0
openpyxl
pandas
pdfplumber==0.10.4
pydantic>=1.10.13
PyPDF2==3.0.1
pymupdf
python-docx
pytoml
readability-lxml
redis
requests
ruamel.yaml==0.18.6
ruamel.yaml.clib==0.2.8
scikit-learn
sentence_transformers==2.2.2
StrEnum==0.4.15
# See https://github.com/deanmalmgren/textract/issues/461
# textract @ git+https://github.com/tpoisonooo/textract@master
textract
tiktoken
torch>=2.0.0
transformers>=4.37.0
transformers_stream_generator
unstructured
xgboost==2.0.3
# onnxruntime-gpu==1.17.1
onnxruntime-gpu
shapely==2.0.3
pyclipper==1.3.0.post5
xpinyin==0.7.6
opencv-python==4.9.0.80

My packages.txt:

libgl1-mesa-glx
swig
libpulse-dev

The error portion of the build log:


  Building wheel for pocketsphinx (setup.py): started
  Building wheel for pocketsphinx (setup.py): finished with status 'error'
  error: subprocess-exited-with-error
  
  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> [4 lines of output]
      running bdist_wheel
      running build_ext
      swig -python -modern -Ideps/sphinxbase/include -Ideps/sphinxbase/include/sphinxbase -Ideps/sphinxbase/include/android -Ideps/sphinxbase/swig -outdir sphinxbase -o swig/sphinxbase/ad_wrap.c swig/sphinxbase/ad.i
      error: command 'swig' failed: No such file or directory
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for pocketsphinx
  Running setup.py clean for pocketsphinx
  Building wheel for python-pptx (setup.py): started
  Building wheel for python-pptx (setup.py): finished with status 'done'
  Created wheel for python-pptx: filename=python_pptx-0.6.5-py3-none-any.whl size=237916 sha256=6b403598c7ab61172040c7a30d733d11a730e49dd58e8ee353889672d9354647
  Stored in directory: /home/xlab-app-center/.cache/pip/wheels/9e/be/30/b674cc595ee134f617509231a1bfb635a3f6927b9846735e37
Successfully built datrie hanziconv textract docx2txt EbookLib python-pptx
Failed to build pocketsphinx
ERROR: Could not build wheels for pocketsphinx, which is required to install pyproject.toml-based projects
ERROR: executor failed running [/bin/sh -c /bin/bash -c 'if [[ -f requirements.txt ]]; then pip --timeout=100 install --no-warn-script-location  -r requirements.txt; else echo no requirements.txt ; fi']: runc did not terminate sucessfully
============================== build end ==============================
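One quick way to confirm whether swig is actually available when pip builds pocketsphinx (an illustrative check I am adding here; whether packages.txt gets installed before the pip step runs is an openxlab platform detail I am assuming, not confirming):

# Check whether the swig executable is visible in the build environment
import shutil
import subprocess

path = shutil.which("swig")
print("swig found at:", path)
if path:
    # `swig -version` prints its version banner
    print(subprocess.run(["swig", "-version"], capture_output=True, text=True).stdout)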

Lesson 3 Homework (Class 6)

Discussed in #274

Originally posted by maxchiron January 11, 2024
Lesson 3 Homework (Class 6)

Basic assignment:

Reproduce the course knowledge-base assistant setup (screenshots)
LangChain environment setup
[screenshot]
Data collection
[screenshot]

Build the retrieval QA chain
[screenshot]
Final result
[screenshot]

Weather agent issue: no output

I ran the provided example and got no output, so I started debugging. I traced it to the information below; could someone explain why the actionReturn is a NoAction?

The earlier output looks normal:
[screenshot]

Then the action step fails:
[screenshot]

Debug location:
[screenshot]

It looks like the adapter is the problem.

For example, with the following prompt:
[{'role': 'system', 'content': "你是一个可以调用外部工具的助手,可以使用的工具包括:\n{'GoogleSearch': '一个可以从谷歌搜索结果的API。\n当你需要对于一个特定问题找到简短明了的回答时,可以使用它。\n输入应该是一个搜索查询。\n'}\n如果使用工具请遵循以下格式回复:\n\nThought:思考你当前步骤需要解决什么问题,是否需要使用工具\nAction:工具名称,你的工具必须从 [['GoogleSearch']] 选择\nAction Input:工具输入参数\n\n工具返回按照以下格式回复:\n\nResponse:调用工具后的结果\n\n如果你已经知道了答案,或者你不需要工具,请遵循以下格式回复\n\nThought:给出最终答案的思考过程\nFinal Answer:最终答案\n\n开始!"}, {'role': 'user', 'content': '深圳明天的天气?'}]

The model responds as follows:
"Thought:我需要查询天气预报API来回答这个问题。\nAction:forecast_weather\nAction Input:{'location': '深圳', 'days': 1}}"

Tracing into lagent/actions/action_executor.py, I found that forecast_weather is not a valid action; the program only accepts GoogleSearch, so unless the model emits a GoogleSearch action it can never get a result. By the 4th round of chat in react.py, if the action is NoAction the result is empty, and if no FinishAction arrives within 4 rounds the agent returns a default_response and cannot answer the question.

I think I have debugged this about as far as I can; it probably needs the instructor to take a look at the ModelScope adapter.
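To illustrate the behavior described above, here is a minimal sketch of how an executor that only knows GoogleSearch turns the model's forecast_weather call into a no-op (the names and logic are my simplification for illustration, not the actual lagent source):

# Simplified sketch: an unregistered action name yields no result (the "NoAction" case)
registered_actions = {
    "GoogleSearch": lambda query: f"(search results for {query!r})",
}

def execute(action_name, action_input):
    # Only actions registered when the agent was built are accepted.
    if action_name not in registered_actions:
        return None  # corresponds to NoAction: the tool the model asked for does not exist
    return registered_actions[action_name](action_input)

# The model replied with "forecast_weather", which was never registered, so every
# round returns nothing and the agent eventually falls back to a default response.
print(execute("forecast_weather", {"location": "深圳", "days": 1}))  # -> None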

Chat stops working after 4-bit quantization

InternLM model

Converting directly to a TurboMind model and deploying it works fine:

[screenshots]

First, compute the min/max statistics:
lmdeploy lite calibrate \
  --model /home/zhanghui/models/internlm/internlm-chat-7b/ \
  --calib_dataset "c4" \
  --calib_samples 128 \
  --calib_seqlen 2048 \
  --work_dir ./quant_output
[screenshots]

Then obtain the quantization parameters from the min/max statistics:
lmdeploy lite kv_qparams \
  --work_dir ./quant_output \
  --turbomind_dir workspace/triton_models/weights/ \
  --kv_sym False \
  --num_tp 1
[screenshot]

Then run the 4-bit quantization:
lmdeploy lite auto_awq \
  --model /home/zhanghui/models/internlm/internlm-chat-7b/ \
  --w_bits 4 \
  --w_group_size 128 \
  --work_dir ./quant_output
[screenshots]
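For readers unfamiliar with the parameters above: --w_bits 4 and --w_group_size 128 mean the weights are quantized to 4 bits in groups of 128 values, each group getting its own scale. A rough sketch of that idea (my illustration only, not lmdeploy's actual AWQ implementation, which also searches for activation-aware scales):

import torch

def quantize_groupwise(w, n_bits=4, group_size=128):
    # Split each weight row into groups and quantize every group with its own scale.
    groups = w.reshape(-1, group_size)
    max_abs = groups.abs().max(dim=1, keepdim=True).values
    scale = max_abs / (2 ** (n_bits - 1) - 1)        # symmetric 4-bit range: [-7, 7]
    q = torch.clamp(torch.round(groups / scale), -7, 7)
    return (q * scale).reshape(w.shape)              # dequantized approximation

w = torch.randn(4, 1024)
print("max abs error:", (w - quantize_groupwise(w)).abs().max().item())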

Convert the quantized model to TurboMind format:
lmdeploy convert internlm-chat-7b ./quant_output \
  --model-format awq \
  --group-size 128 \
  --dst_path ./workspace_quant
[screenshots]

lmdeploy chat turbomind ./workspace_quant
[screenshot]

Chat no longer works!

lmdeploy chat turbomind ./workspace_quant --model-format awq

[screenshot]

Still the same.

For the full procedure, see https://zhuanlan.zhihu.com/p/678960135

Request: pre-install glibc-2.29 in the openXLab Docker image

When deploying an application to the openXLab GPU platform, I found that the default Docker image lacks glibc 2.29.
Since the tokenizers library used by newer versions of transformers requires GLIBC 2.29, please pre-install it.

The build script is as follows:

wget -c https://ftp.gnu.org/gnu/glibc/glibc-2.29.tar.gz
tar -zxvf glibc-2.29.tar.gz
mkdir glibc-2.29/build
cd glibc-2.29/build
../configure --prefix=/opt/glibc
make
make install

Request: pre-install sqlite3 >= 3.35.0 in the openXLab Docker image

When loading the Chroma vector database, the following error appears:

File "/usr/local/share/python/.pyenv/versions/3.9.16/lib/python3.9/site-packages/chromadb/init.py", line 79, in
raise RuntimeError(
RuntimeError: Your system has an unsupported version of sqlite3. Chroma requires sqlite3 >= 3.35.0.

The Docker image needs sqlite3 >= 3.35.0 pre-installed.
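As a possible user-level workaround until the image is updated (this is the substitution described in Chroma's troubleshooting guide, assuming pip install pysqlite3-binary is allowed on the platform), the standard-library sqlite3 module can be swapped out before chromadb is imported:

# Workaround sketch: replace the stdlib sqlite3 with pysqlite3-binary's newer build.
# Requires: pip install pysqlite3-binary
import sys
import pysqlite3  # ships a recent SQLite, independent of the system library

sys.modules["sqlite3"] = sys.modules.pop("pysqlite3")

import chromadb  # now sees sqlite3 >= 3.35.0
print(chromadb.__version__)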

camp2: lmdeploy llava returns an empty string for high-resolution input images

Relevant tutorial section: https://github.com/InternLM/Tutorial/blob/camp2/lmdeploy/README.md#61-%E4%BD%BF%E7%94%A8lmdeploy%E8%BF%90%E8%A1%8C%E8%A7%86%E8%A7%89%E5%A4%9A%E6%A8%A1%E6%80%81%E5%A4%A7%E6%A8%A1%E5%9E%8Bllava

When a 1920×1080 image is provided, no text is returned.
[screenshot]
Printing the response shows that text is empty.
[screenshot]

The workaround is to manually reduce the resolution:

import gradio as gr
from lmdeploy import pipeline


# pipe = pipeline('liuhaotian/llava-v1.6-vicuna-7b')  # use this line when not running on the course dev machine
pipe = pipeline('/share/new_models/liuhaotian/llava-v1.6-vicuna-7b')

def model(image, text):
    if image is None:
        return [(text, "请上传一张图片。")]
    else:
        width, height = image.size
        print(f"width = {width}, height = {height}")

        # Resize so the longer side is at most 256 pixels
        if max(width, height) > 256:
            ratio = max(width, height) / 256
            n_width = int(width / ratio)
            n_height = int(height / ratio)
            print(f"new width = {n_width}, new height = {n_height}")
            image = image.resize((n_width, n_height))

        response = pipe((text, image)).text
        print(f"response: {response}")
        return [(text, response)]

demo = gr.Interface(fn=model, inputs=[gr.Image(type="pil"), gr.Textbox()], outputs=gr.Chatbot())
demo.launch()   

After this change, text is returned normally.
[screenshots]

Is lmdeploy[all]==0.3.0 only installable on Linux?

Hi, can lmdeploy[all]==0.3.0 only be installed on Linux? On Windows I get the error below, and every triton build I could find online is Linux-only.

(lmdeploy) PS D:\pythonworkspace\MyPET> pip install lmdeploy[all]==0.3.0
Looking in indexes: https://mirrors.aliyun.com/pypi/simple
Collecting lmdeploy==0.3.0 (from lmdeploy[all]==0.3.0)
Downloading https://mirrors.aliyun.com/pypi/packages/2e/7c/b616d7485c4b81b2fe0a6a734548697d25293d5468210aafbf371d24a790/lmdeploy-0.3.0-cp310-cp310-win_amd64.whl (56.3 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 56.3/56.3 MB 5.3 MB/s eta 0:00:00
Requirement already satisfied: fastapi in d:\program\anaconda\envs\lmdeploy\lib\site-packages (from lmdeploy==0.3.0->lmdeploy[all]==0.3.0) (0.110.1)
Requirement already satisfied: fire in d:\program\anaconda\envs\lmdeploy\lib\site-packages (from lmdeploy==0.3.0->lmdeploy[all]==0.3.0) (0.6.0)
Requirement already satisfied: mmengine-lite in d:\program\anaconda\envs\lmdeploy\lib\site-packages (from lmdeploy==0.3.0->lmdeploy[all]==0.3.0) (0.10.3)
Requirement already satisfied: numpy in d:\program\anaconda\envs\lmdeploy\lib\site-packages (from lmdeploy==0.3.0->lmdeploy[all]==0.3.0) (1.26.4)
Collecting peft<=0.9.0 (from lmdeploy==0.3.0->lmdeploy[all]==0.3.0)
Downloading https://mirrors.aliyun.com/pypi/packages/08/87/3e7eb34ac06d3f4ac72e2302e9e69bef12247a8a627c59a4d8a498135727/peft-0.9.0-py3-none-any.whl (190 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 190.9/190.9 kB 3.9 MB/s eta 0:00:00
Requirement already satisfied: pillow in d:\program\anaconda\envs\lmdeploy\lib\site-packages (from lmdeploy==0.3.0->lmdeploy[all]==0.3.0) (10.3.0)
Requirement already satisfied: protobuf in d:\program\anaconda\envs\lmdeploy\lib\site-packages (from lmdeploy==0.3.0->lmdeploy[all]==0.3.0) (4.25.3)
Requirement already satisfied: pydantic>2.0.0 in d:\program\anaconda\envs\lmdeploy\lib\site-packages (from lmdeploy==0.3.0->lmdeploy[all]==0.3.0) (2.6.4)
Requirement already satisfied: pynvml in d:\program\anaconda\envs\lmdeploy\lib\site-packages (from lmdeploy==0.3.0->lmdeploy[all]==0.3.0) (11.5.0)
Requirement already satisfied: safetensors in d:\program\anaconda\envs\lmdeploy\lib\site-packages (from lmdeploy==0.3.0->lmdeploy[all]==0.3.0) (0.4.2)
Requirement already satisfied: sentencepiece in d:\program\anaconda\envs\lmdeploy\lib\site-packages (from lmdeploy==0.3.0->lmdeploy[all]==0.3.0) (0.2.0)
Requirement already satisfied: shortuuid in d:\program\anaconda\envs\lmdeploy\lib\site-packages (from lmdeploy==0.3.0->lmdeploy[all]==0.3.0) (1.0.13)
Requirement already satisfied: tiktoken in d:\program\anaconda\envs\lmdeploy\lib\site-packages (from lmdeploy==0.3.0->lmdeploy[all]==0.3.0) (0.6.0)
Collecting torch<=2.1.2,>=2.0.0 (from lmdeploy==0.3.0->lmdeploy[all]==0.3.0)
Downloading https://mirrors.aliyun.com/pypi/packages/16/bf/2ba0f0f7c07b9a14c027e181e44c58824e13f7352607ed32db18321599a2/torch-2.1.2-cp310-cp310-win_amd64.whl (192.3 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 192.3/192.3 MB 7.3 MB/s eta 0:00:00
Collecting transformers<=4.38.2,>=4.33.0 (from lmdeploy==0.3.0->lmdeploy[all]==0.3.0)
Downloading https://mirrors.aliyun.com/pypi/packages/b6/4d/fbe6d89fde59d8107f0a02816c4ac4542a8f9a85559fdf33c68282affcc1/transformers-4.38.2-py3-none-any.whl (8.5 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 8.5/8.5 MB 7.7 MB/s eta 0:00:00
INFO: pip is looking at multiple versions of lmdeploy to determine which version is compatible with other requirements. This could take a while.
ERROR: Could not find a version that satisfies the requirement triton<=2.2.0,>=2.1.0 (from lmdeploy) (from versions: none)
ERROR: No matching distribution found for triton<=2.2.0,>=2.1.0

Question about the lmdeploy tutorial: how do KV Cache quantization and W4A16 quantization stack?

The quantization part of the lmdeploy tutorial introduces KV Cache quantization and W4A16 quantization separately, and both end up producing a TurboMind-format model.
But how can the two be combined, e.g. applying W4A16 quantization on top of the KV Cache quantization result?
Neither lmdeploy lite calibrate nor lmdeploy lite auto_awq accepts a TurboMind-format model as input, so how should the two be stacked?

Also, if I want to share the quantized model with others, how can a TurboMind-format model be converted back to Hugging Face format?
