
tutorial's Introduction

InternLM

👋 join us on Discord and WeChat

Introduction

The InternLM2 series is released with the following features:

  • 200K Context window: Nearly perfect at finding needles in the haystack with 200K-long context, with leading performance on long-context tasks like LongBench and L-Eval. Try it with LMDeploy for 200K-context inference.

  • Outstanding comprehensive performance: Significantly better than the previous generation in all dimensions, especially in reasoning, math, code, chat experience, instruction following, and creative writing, with leading performance among open-source models of similar sizes. In some evaluations, InternLM2-Chat-20B may match or even surpass ChatGPT (GPT-3.5).

  • Code interpreter & Data analysis: With a code interpreter, InternLM2-Chat-20B achieves performance comparable to GPT-4 on GSM8K and MATH. InternLM2-Chat also provides data analysis capability.

  • Stronger tool use: With stronger tool-related capabilities in instruction following, tool selection, and reflection, InternLM2 supports more kinds of agents and multi-step tool calling for complex tasks. See examples.

News

[2024.03.26] We release InternLM2 technical report. See arXiv for details.

[2024.01.31] We release InternLM2-1.8B, along with the associated chat model. They provide a cheaper deployment option while maintaining leading performance.

[2024.01.23] We release InternLM2-Math-7B and InternLM2-Math-20B with pretraining and SFT checkpoints. Despite their small sizes, they surpass ChatGPT. See InternLM-Math for details and download.

[2024.01.17] We release InternLM2-7B and InternLM2-20B and their corresponding chat models with stronger capabilities in all dimensions. See model zoo below for download or model cards for more details.

[2023.12.13] InternLM-7B-Chat and InternLM-20B-Chat checkpoints are updated. With an improved finetuning strategy, the new chat models can generate higher quality responses with greater stylistic diversity.

[2023.09.20] InternLM-20B is released with base and chat versions.

Model Zoo

| Model | Transformers (HF) | ModelScope (HF) | OpenXLab (HF) | OpenXLab (Origin) | Release Date |
| --- | --- | --- | --- | --- | --- |
| InternLM2-1.8B | 🤗internlm2-1.8b | internlm2-1.8b | Open in OpenXLab | Open in OpenXLab | 2024-01-31 |
| InternLM2-Chat-1.8B-SFT | 🤗internlm2-chat-1.8b-sft | internlm2-chat-1.8b-sft | Open in OpenXLab | Open in OpenXLab | 2024-01-31 |
| InternLM2-Chat-1.8B | 🤗internlm2-chat-1.8b | internlm2-chat-1.8b | Open in OpenXLab | Open in OpenXLab | 2024-02-19 |
| InternLM2-Base-7B | 🤗internlm2-base-7b | internlm2-base-7b | Open in OpenXLab | Open in OpenXLab | 2024-01-17 |
| InternLM2-7B | 🤗internlm2-7b | internlm2-7b | Open in OpenXLab | Open in OpenXLab | 2024-01-17 |
| InternLM2-Chat-7B-SFT | 🤗internlm2-chat-7b-sft | internlm2-chat-7b-sft | Open in OpenXLab | Open in OpenXLab | 2024-01-17 |
| InternLM2-Chat-7B | 🤗internlm2-chat-7b | internlm2-chat-7b | Open in OpenXLab | Open in OpenXLab | 2024-01-17 |
| InternLM2-Base-20B | 🤗internlm2-base-20b | internlm2-base-20b | Open in OpenXLab | Open in OpenXLab | 2024-01-17 |
| InternLM2-20B | 🤗internlm2-20b | internlm2-20b | Open in OpenXLab | Open in OpenXLab | 2024-01-17 |
| InternLM2-Chat-20B-SFT | 🤗internlm2-chat-20b-sft | internlm2-chat-20b-sft | Open in OpenXLab | Open in OpenXLab | 2024-01-17 |
| InternLM2-Chat-20B | 🤗internlm2-chat-20b | internlm2-chat-20b | Open in OpenXLab | Open in OpenXLab | 2024-01-17 |

Notes:

The InternLM2 release includes two model sizes: 7B and 20B. The 7B models are efficient for research and applications, while the 20B models are more powerful and can support more complex scenarios. The relationship between these models is as follows.

  1. InternLM2-Base: Foundation models with high quality and high adaptation flexibility, which serve as a good starting point for downstream deep adaptations.
  2. InternLM2: Further pretrained on general-domain data and a domain-enhanced corpus, achieving state-of-the-art results in evaluations while retaining strong general language capability. InternLM2 models are recommended for most applications.
  3. InternLM2-Chat-SFT: An intermediate version of InternLM2-Chat that has only undergone supervised fine-tuning (SFT), based on the InternLM2-Base model. We release it to benefit research on alignment.
  4. InternLM2-Chat: Further aligned on top of InternLM2-Chat-SFT through online RLHF. InternLM2-Chat exhibits better instruction following, chat experience, and function calling, and is recommended for downstream applications.

Limitations: Although we have made efforts to ensure the safety of the model during the training process and to encourage the model to generate text that complies with ethical and legal requirements, the model may still produce unexpected outputs due to its size and probabilistic generation paradigm. For example, the generated responses may contain biases, discrimination, or other harmful content. Please do not propagate such content. We are not responsible for any consequences resulting from the dissemination of harmful information.

Supplements: HF refers to the format used by HuggingFace in transformers, whereas Origin denotes the format adopted by the InternLM team in InternEvo.

Performance

Objective Evaluation

| Dataset | Baichuan2-7B-Chat | Mistral-7B-Instruct-v0.2 | Qwen-7B-Chat | InternLM2-Chat-7B | ChatGLM3-6B | Baichuan2-13B-Chat | Mixtral-8x7B-Instruct-v0.1 | Qwen-14B-Chat | InternLM2-Chat-20B |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MMLU | 50.1 | 59.2 | 57.1 | 63.7 | 58.0 | 56.6 | 70.3 | 66.7 | 66.5 |
| CMMLU | 53.4 | 42.0 | 57.9 | 63.0 | 57.8 | 54.8 | 50.6 | 68.1 | 65.1 |
| AGIEval | 35.3 | 34.5 | 39.7 | 47.2 | 44.2 | 40.0 | 41.7 | 46.5 | 50.3 |
| C-Eval | 53.9 | 42.4 | 59.8 | 60.8 | 59.1 | 56.3 | 54.0 | 71.5 | 63.0 |
| TriviaQA | 37.6 | 35.0 | 46.1 | 50.8 | 38.1 | 40.3 | 57.7 | 54.5 | 53.9 |
| NaturalQuestions | 12.8 | 8.1 | 18.6 | 24.1 | 14.0 | 12.7 | 22.5 | 22.9 | 25.9 |
| C3 | 78.5 | 66.9 | 84.4 | 91.5 | 79.3 | 84.4 | 82.1 | 91.5 | 93.5 |
| CMRC | 8.1 | 5.6 | 14.6 | 63.8 | 43.2 | 27.8 | 5.3 | 13.0 | 50.4 |
| WinoGrande | 49.9 | 50.8 | 54.2 | 65.8 | 61.7 | 50.9 | 60.9 | 55.7 | 74.8 |
| BBH | 35.9 | 46.5 | 45.5 | 61.2 | 56.0 | 42.5 | 57.3 | 55.8 | 68.3 |
| GSM-8K | 32.4 | 48.3 | 44.1 | 70.7 | 53.8 | 56.0 | 71.7 | 57.7 | 79.6 |
| MATH | 5.7 | 8.6 | 12.0 | 23.0 | 20.4 | 4.3 | 22.5 | 27.6 | 31.9 |
| HumanEval | 17.7 | 35.4 | 36.0 | 59.8 | 52.4 | 19.5 | 37.8 | 40.9 | 67.1 |
| MBPP | 37.7 | 25.7 | 33.9 | 51.4 | 55.6 | 40.9 | 40.9 | 30.0 | 65.8 |
  • Performance of MBPP is reported with MBPP(Sanitized)

Alignment Evaluation

  • We evaluated our model on AlpacaEval 2.0, where InternLM2-Chat-20B surpasses Claude 2, GPT-4 (0613), and Gemini Pro.

| Model Name | Win Rate | Length |
| --- | --- | --- |
| GPT-4 Turbo | 50.00% | 2049 |
| GPT-4 | 23.58% | 1365 |
| GPT-4 0314 | 22.07% | 1371 |
| Mistral Medium | 21.86% | 1500 |
| XwinLM 70b V0.1 | 21.81% | 1775 |
| InternLM2 Chat 20B | 21.75% | 2373 |
| Mixtral 8x7B v0.1 | 18.26% | 1465 |
| Claude 2 | 17.19% | 1069 |
| Gemini Pro | 16.85% | 1315 |
| GPT-4 0613 | 15.76% | 1140 |
| Claude 2.1 | 15.73% | 1096 |

  • Based on leaderboard results released on 2024-01-17.

Requirements

  • Python >= 3.8
  • PyTorch >= 1.12.0 (2.0.0 and above are recommended)
  • Transformers >= 4.38
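
The exact install depends on your CUDA and driver setup, but as a rough sketch (the environment name internlm and the pinned versions below are only examples), an environment satisfying these requirements can be created with:

conda create -n internlm python=3.10 -y
conda activate internlm
pip install "torch>=2.0.0" "transformers>=4.38"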

Usages

We briefly show the usage with Transformers, ModelScope, and web demos. The chat models adopt the chatml format to support both chat and agent applications. For best results, make sure the installed transformers library meets the following requirement before running inference with Transformers or ModelScope:

transformers >= 4.38
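
If you prefer to assemble prompts yourself rather than call the bundled chat helper, the tokenizer's chat template can build the chatml-style prompt for you. A minimal sketch (assuming the model repository ships a chat template, which recent revisions of the chat models do):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("internlm/internlm2-chat-7b", trust_remote_code=True)
messages = [{"role": "user", "content": "hello"}]
# apply_chat_template wraps each turn in the model's chatml-style special tokens and
# appends the assistant prefix so that generation starts at the right position.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)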

Import from Transformers

To load the InternLM2-7B-Chat model using Transformers, use the following code:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("internlm/internlm2-chat-7b", trust_remote_code=True)
# Set `torch_dtype=torch.float16` to load model in float16, otherwise it will be loaded as float32 and might cause OOM Error.
model = AutoModelForCausalLM.from_pretrained("internlm/internlm2-chat-7b", device_map="auto", trust_remote_code=True, torch_dtype=torch.float16)
# (Optional) If on low resource devices, you can load model in 4-bit or 8-bit to further save GPU memory via bitsandbytes.
  # InternLM 7B in 4bit will cost nearly 8GB GPU memory.
  # pip install -U bitsandbytes
  # 8-bit: model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto", trust_remote_code=True, load_in_8bit=True)
  # 4-bit: model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto", trust_remote_code=True, load_in_4bit=True)
model = model.eval()
response, history = model.chat(tokenizer, "hello", history=[])
print(response)
# Output: Hello? How can I help you today?
response, history = model.chat(tokenizer, "please provide three suggestions about time management", history=history)
print(response)
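
If you need token-by-token output, the remote code of the chat models also exposes a streaming interface. A short sketch that continues from the snippet above (assuming the stream_chat helper shipped with the model's remote code):

length = 0
# stream_chat yields the partial response as it grows; print only the newly generated suffix.
for response, history in model.stream_chat(tokenizer, "hello", history=[]):
    print(response[length:], end="", flush=True)
    length = len(response)
print()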

Import from ModelScope

To load the InternLM2-7B-Chat model using ModelScope, use the following code:

import torch
from modelscope import snapshot_download, AutoTokenizer, AutoModelForCausalLM
model_dir = snapshot_download('Shanghai_AI_Laboratory/internlm2-chat-7b')
tokenizer = AutoTokenizer.from_pretrained(model_dir, device_map="auto", trust_remote_code=True)
# Set `torch_dtype=torch.float16` to load model in float16, otherwise it will be loaded as float32 and might cause OOM Error.
model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto", trust_remote_code=True, torch_dtype=torch.float16)
# (Optional) If on low resource devices, you can load model in 4-bit or 8-bit to further save GPU memory via bitsandbytes.
  # InternLM 7B in 4bit will cost nearly 8GB GPU memory.
  # pip install -U bitsandbytes
  # 8-bit: model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto", trust_remote_code=True, load_in_8bit=True)
  # 4-bit: model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto", trust_remote_code=True, load_in_4bit=True)
model = model.eval()
response, history = model.chat(tokenizer, "hello", history=[])
print(response)
response, history = model.chat(tokenizer, "please provide three suggestions about time management", history=history)
print(response)

Dialogue

You can interact with the InternLM Chat 7B model through a frontend interface by running the following code:

pip install streamlit
pip install "transformers>=4.38"
streamlit run ./chat/web_demo.py

Deployment

We use LMDeploy for fast deployment of InternLM.

With only four lines of code, you can run internlm2-chat-7b inference after installing LMDeploy (pip install "lmdeploy>=0.2.1").

from lmdeploy import pipeline
pipe = pipeline("internlm/internlm2-chat-7b")
response = pipe(["Hi, pls intro yourself", "Shanghai is"])
print(response)

Please refer to the guidance for more usages about model deployment. For additional deployment tutorials, feel free to explore here.
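
If you would rather expose the model as a service than call the pipeline in-process, LMDeploy also ships an OpenAI-compatible server. A sketch (the port below is arbitrary; assuming the api_server subcommand available in recent LMDeploy releases):

pip install "lmdeploy>=0.2.1"
lmdeploy serve api_server internlm/internlm2-chat-7b --server-port 23333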

200K-long-context Inference

With LMDeploy's Dynamic NTK feature enabled, you can run long-context inference.

from lmdeploy import pipeline, GenerationConfig, TurbomindEngineConfig

backend_config = TurbomindEngineConfig(rope_scaling_factor=2.0, session_len=200000)
pipe = pipeline('internlm/internlm2-chat-7b', backend_config=backend_config)
prompt = 'Use a long prompt to replace this sentence'
response = pipe(prompt)
print(response)

Agent

InternLM2-Chat models have excellent tool-utilization capabilities and can handle function calls in a zero-shot manner. See more examples in the agent section.

Fine-tuning

Please refer to finetune docs for fine-tuning with InternLM.

Note: We have migrated the entire training functionality of this project to InternEvo for a simpler user experience; InternEvo provides efficient pre-training and fine-tuning infrastructure for training InternLM.
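
As a rough end-to-end sketch of the XTuner workflow (the config name and paths below are illustrative; see the finetune docs for the exact recipe):

# Copy a predefined QLoRA config, edit it as needed, then fine-tune on a single GPU.
xtuner list-cfg
xtuner copy-cfg internlm2_chat_7b_qlora_alpaca_e3 .
xtuner train ./internlm2_chat_7b_qlora_alpaca_e3_copy.py
# Convert the saved .pth adapter to the HuggingFace format and merge it into the base model.
xtuner convert pth_to_hf ./internlm2_chat_7b_qlora_alpaca_e3_copy.py ./work_dirs/internlm2_chat_7b_qlora_alpaca_e3_copy/epoch_3.pth ./hf_adapter
xtuner convert merge internlm/internlm2-chat-7b ./hf_adapter ./merged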

Evaluation

We utilize OpenCompass for model evaluation. For InternLM2, we primarily focus on standard objective evaluation, long-context evaluation (needle in a haystack), data contamination assessment, agent evaluation, and subjective evaluation.

Objective Evaluation

To evaluate the InternLM model, please follow the guidelines in the OpenCompass tutorial. Typically, we use ppl for multiple-choice questions on the Base model and gen for all questions on the Chat model.
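
For reference, a C-Eval run can be launched from the OpenCompass repository roughly as follows (a sketch based on the course demo command; adjust the model path, batch size, and GPU count to your hardware):

python run.py --datasets ceval_gen \
    --hf-path internlm/internlm2-chat-7b \
    --tokenizer-kwargs padding_side='left' truncation='left' trust_remote_code=True \
    --model-kwargs trust_remote_code=True device_map='auto' \
    --max-seq-len 2048 --max-out-len 16 --batch-size 4 --num-gpus 1 --debug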

Long-Context Evaluation (Needle in a Haystack)

For the Needle in a Haystack evaluation, refer to the tutorial provided in the documentation. Feel free to try it out.

Data Contamination Assessment

To learn more about data contamination assessment, please check the contamination eval.

Agent Evaluation

  • To evaluate tool utilization, please refer to T-Eval.
  • For code interpreter evaluation, use the Math Agent Evaluation provided in the repository.

Subjective Evaluation

  • Please follow the tutorial for subjective evaluation.

Contribution

We appreciate all the contributors for their efforts to improve and enhance InternLM. Community users are highly encouraged to participate in the project. Please refer to the contribution guidelines for instructions on how to contribute to the project.

License

The code is licensed under Apache-2.0, while model weights are fully open for academic research and also allow free commercial usage. To apply for a commercial license, please fill in the application form (English)/申请表(中文). For other questions or collaborations, please contact [email protected].

Citation

@misc{cai2024internlm2,
      title={InternLM2 Technical Report},
      author={Zheng Cai and Maosong Cao and Haojiong Chen and Kai Chen and Keyu Chen and Xin Chen and Xun Chen and Zehui Chen and Zhi Chen and Pei Chu and Xiaoyi Dong and Haodong Duan and Qi Fan and Zhaoye Fei and Yang Gao and Jiaye Ge and Chenya Gu and Yuzhe Gu and Tao Gui and Aijia Guo and Qipeng Guo and Conghui He and Yingfan Hu and Ting Huang and Tao Jiang and Penglong Jiao and Zhenjiang Jin and Zhikai Lei and Jiaxing Li and Jingwen Li and Linyang Li and Shuaibin Li and Wei Li and Yining Li and Hongwei Liu and Jiangning Liu and Jiawei Hong and Kaiwen Liu and Kuikun Liu and Xiaoran Liu and Chengqi Lv and Haijun Lv and Kai Lv and Li Ma and Runyuan Ma and Zerun Ma and Wenchang Ning and Linke Ouyang and Jiantao Qiu and Yuan Qu and Fukai Shang and Yunfan Shao and Demin Song and Zifan Song and Zhihao Sui and Peng Sun and Yu Sun and Huanze Tang and Bin Wang and Guoteng Wang and Jiaqi Wang and Jiayu Wang and Rui Wang and Yudong Wang and Ziyi Wang and Xingjian Wei and Qizhen Weng and Fan Wu and Yingtong Xiong and Chao Xu and Ruiliang Xu and Hang Yan and Yirong Yan and Xiaogui Yang and Haochen Ye and Huaiyuan Ying and Jia Yu and Jing Yu and Yuhang Zang and Chuyu Zhang and Li Zhang and Pan Zhang and Peng Zhang and Ruijie Zhang and Shuo Zhang and Songyang Zhang and Wenjian Zhang and Wenwei Zhang and Xingcheng Zhang and Xinyue Zhang and Hui Zhao and Qian Zhao and Xiaomeng Zhao and Fengzhe Zhou and Zaida Zhou and Jingming Zhuo and Yicheng Zou and Xipeng Qiu and Yu Qiao and Dahua Lin},
      year={2024},
      eprint={2403.17297},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

tutorial's People

Contributors

ajupyter, axyzdong, bittersweet1999, crazysteeaam, fanqino1, fly2tomato, grey2818, hongru0306, hscspring, islinxu, jianfeng777, jiegenius, jimmyma99, kmno4-zx, lindsey-chang, logan-zou, maxchiron, pommespeter, rangeking, saaraas-1300, saigering, sanbuphy, seifer08ms, shengshenlan, tonysy, vansin, woodx9, xiaomile, zhanghui-china, zhjunqin


tutorial's Issues

Questions about the file structure of the fine-tuned model and how to use it

(screenshot)
Above are the LoRA files I obtained after QLoRA and the merged file structure; below is the original qwen-1.8b-chat file structure. The two are not exactly the same.
(screenshot)
Since Qwen's chat models on ModelScope come with special chat-handling code (for example, the chat model can keep the conversation record in history), if our model structure differs and some files are missing, can we still call the fine-tuned model with the same code?

Questions about fine-tuning

I want to fine-tune on plain (non-dialogue) corpora. How should I prepare the data?

In the development machine environment, downloads via huggingface_hub keep timing out. Is it because of the firewall?

import os 
from huggingface_hub import hf_hub_download  # Load model directly

hf_hub_download(repo_id="internlm/internlm-20b", filename="config.json")
Traceback (most recent call last):
  File "/root/.conda/envs/internlm/lib/python3.10/site-packages/urllib3/connectionpool.py", line 712, in urlopen
    self._prepare_proxy(conn)
  File "/root/.conda/envs/internlm/lib/python3.10/site-packages/urllib3/connectionpool.py", line 1012, in _prepare_proxy
    conn.connect()
  File "/root/.conda/envs/internlm/lib/python3.10/site-packages/urllib3/connection.py", line 374, in connect
    self._tunnel()
  File "/root/.conda/envs/internlm/lib/python3.10/http/client.py", line 921, in _tunnel
    (version, code, message) = response._read_status()
  File "/root/.conda/envs/internlm/lib/python3.10/http/client.py", line 279, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/root/.conda/envs/internlm/lib/python3.10/socket.py", line 705, in readinto
    return self._sock.recv_into(b)
TimeoutError: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/.conda/envs/internlm/lib/python3.10/site-packages/requests/adapters.py", line 486, in send
    resp = conn.urlopen(
  File "/root/.conda/envs/internlm/lib/python3.10/site-packages/urllib3/connectionpool.py", line 799, in urlopen
    retries = retries.increment(
  File "/root/.conda/envs/internlm/lib/python3.10/site-packages/urllib3/util/retry.py", line 592, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /internlm/internlm-20b/resolve/main/config.json (Caused by ProxyError('Cannot connect to proxy.', TimeoutError('timed out')))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/code/InternLM/homework/course_2/huggingface_download.py", line 5, in <module>
    hf_hub_download(repo_id="internlm/internlm-20b", filename="config.json", cache_dir=saved_dir)
  File "/root/.conda/envs/internlm/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "/root/.conda/envs/internlm/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1238, in hf_hub_download
    metadata = get_hf_file_metadata(
  File "/root/.conda/envs/internlm/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "/root/.conda/envs/internlm/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1631, in get_hf_file_metadata
    r = _request_wrapper(
  File "/root/.conda/envs/internlm/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 385, in _request_wrapper
    response = _request_wrapper(
  File "/root/.conda/envs/internlm/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 408, in _request_wrapper
    response = get_session().request(method=method, url=url, **params)
  File "/root/.conda/envs/internlm/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/root/.conda/envs/internlm/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "/root/.conda/envs/internlm/lib/python3.10/site-packages/huggingface_hub/utils/_http.py", line 67, in send
    return super().send(request, *args, **kwargs)
  File "/root/.conda/envs/internlm/lib/python3.10/site-packages/requests/adapters.py", line 513, in send
    raise ProxyError(e, request=request)
requests.exceptions.ProxyError: (MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /internlm/internlm-20b/resolve/main/config.json (Caused by ProxyError('Cannot connect to proxy.', TimeoutError('timed out')))"), '(Request ID: af7451db-5de3-48cc-903f-b8ed77108e81)')

TypeError

is_event_data = issubclass(parameter_types.get(name, int), EventData)

TypeError: issubclass() arg 1 must be a class
(screenshot)

LMDeploy quantization issue

(screenshot)
In a guide-dog project built on InternLM, I ran into the problem above when quantizing the model with W4A16. I suspect the ptb dataset failed to download because of network issues. After downloading the ptb dataset manually, I don't know how to point the tool to the local copy. Any suggestions would be appreciated, thanks!

Lesson 1 notes (Class 11)

Discussed in #62

Originally posted by Shengshenlan January 3, 2024

https://zhuanlan.zhihu.com/p/676289592

The self.md document under xtuner is missing a step

Before running the web demo in section 2.6, you first need to cd into the /root/personal_assistant/code/InternLM directory;
otherwise the demo will raise an error during inference.
That is, change

streamlit run /root/personal_assistant/code/InternLM/web_demo.py --server.address 127.0.0.1 --server.port 6006

to

cd /root/personal_assistant/code/InternLM 
streamlit run /root/personal_assistant/code/InternLM/web_demo.py --server.address 127.0.0.1 --server.port 6006

HF Unavailable.

Since HF is unavailable, I made a mirror of the minilm model that was used in Lesson 3.
here

There is a small problem with the code in section 2.2

mkdir -p /root/model/Shanghai_AI_Laboratory
cp -r /root/share/temp/model_repos/internlm-chat-7b /root/model/Shanghai_AI_Laboratory

This should be corrected to

mkdir -p /root/model/Shanghai_AI_Laboratory
cp -r /root/share/temp/model_repos/internlm-chat-7b /root/model/Shanghai_AI_Laboratory/internlm-chat-7b

Internlm2-chat-7b catastrophically forgets after single-turn dialogue fine-tuning; asking for help

(screenshot)

max_length = 2048
pack_to_max_length = True

# Scheduler & Optimizer

batch_size = 1 # per_device
accumulative_counts = 16
dataloader_num_workers = 0
max_epochs = 3
optim_type = AdamW
lr = 2e-4
betas = (0.9, 0.999)
weight_decay = 0
max_norm = 1 # grad clip
warmup_ratio = 0.03

# Save

save_steps = 500
save_total_limit = 2 # Maximum checkpoints to keep (-1 means unlimited)

# Evaluate the generation performance during the training

evaluation_freq = 500
SYSTEM = ''
evaluation_inputs = [
'请给我介绍五个上海的景点', 'Please tell me five scenic spots in Shanghai'
]
tokenizer = dict(
    type=AutoTokenizer.from_pretrained,
    pretrained_model_name_or_path=pretrained_model_name_or_path,
    trust_remote_code=True,
    padding_side='right')

model = dict(
    type=SupervisedFinetune,
    llm=dict(
        type=AutoModelForCausalLM.from_pretrained,
        pretrained_model_name_or_path=pretrained_model_name_or_path,
        trust_remote_code=True,
        torch_dtype=torch.float16,
        quantization_config=dict(
            type=BitsAndBytesConfig,
            load_in_4bit=True,
            load_in_8bit=False,
            llm_int8_threshold=6.0,
            llm_int8_has_fp16_weight=False,
            bnb_4bit_compute_dtype=torch.float16,
            bnb_4bit_use_double_quant=True,
            bnb_4bit_quant_type='nf4')),
    lora=dict(
        type=LoraConfig,
        r=64,
        lora_alpha=16,
        lora_dropout=0.1,
        bias='none',
        task_type='CAUSAL_LM'))

XTuner errors out when training on a custom dataset

The environment was set up exactly as in the tutorial.
My custom dataset looks like this:

{"conversation": [{"system": "你是一位旅游路线规划方向上,关键词提取的高手,你能够精确的获取到句子中的关键词,从而能够保证在网络上能够搜索到非常准确且良好的内容以供回复", "input": "我想要计划一个澳大利亚深度旅游,有哪些景点值得推荐?", "output": "澳大利亚, 深度旅游, 景点推荐"}]}
{"conversation": [{"system": "你是一位旅游路线规划方向上,关键词提取的高手,你能够精确的获取到句子中的关键词,从而能够保证在网络上能够搜索到非常准确且良好的内容以供回复", "input": "我准备去日本旅行,有什么好的行程规划建议?", "output": "日本, 旅行, 行程规划建议"}]}
{"conversation": [{"system": "你是一位旅游路线规划方向上,关键词提取的高手,你能够精确的获取到句子中的关键词,从而能够保证在网络上能够搜索到非常准确且良好的内容以供回复", "input": "我打算明年去巴黎旅游,需要提前订票吗?", "output": "巴黎, 旅游, 订票"}]}

There are 200+ entries in total, which I simply copied over the MedQA data. Fine-tuning ran successfully; probably because there is so little data, it finished very quickly, in one or two minutes. But it errored out when converting to the HuggingFace format.

(xtuner0.1.9) root@intern-studio:~# export MKL_SERVICE_FORCE_INTEL=1
(xtuner0.1.9) root@intern-studio:~# xtuner convert pth_to_hf /root/ft-medqa/internlm_chat_7b_qlora_medqa2019_e3.py /root/ft-medqa/work_dirs/internlm_chat_7b_qlora_medqa2019_e3/epoch_3.pth /root/hf
[2024-01-10 14:41:22,702] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Error: mkl-service + Intel(R) MKL: MKL_THREADING_LAYER=INTEL is incompatible with libgomp-a34b3233.so.1 library.
Try to import numpy first or set the threading layer accordingly. Set MKL_SERVICE_FORCE_INTEL to force it.
[2024-01-10 14:41:33,272] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
quantization_config convert to <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>
Traceback (most recent call last):
File "/root/xtuner019/xtuner/xtuner/tools/model_converters/pth_to_hf.py", line 105, in
main()
File "/root/xtuner019/xtuner/xtuner/tools/model_converters/pth_to_hf.py", line 82, in main
model = BUILDER.build(cfg.model)
File "/root/.local/lib/python3.10/site-packages/mmengine/registry/registry.py", line 570, in build
return self.build_func(cfg, *args, **kwargs, registry=self)
File "/root/.local/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 121, in build_from_cfg
obj = obj_cls(**args) # type: ignore
File "/root/xtuner019/xtuner/xtuner/model/sft.py", line 24, in init
self.llm = self._build_from_cfg_or_module(llm)
File "/root/xtuner019/xtuner/xtuner/model/sft.py", line 76, in _build_from_cfg_or_module
return BUILDER.build(cfg_or_mod)
File "/root/.local/lib/python3.10/site-packages/mmengine/registry/registry.py", line 570, in build
return self.build_func(cfg, *args, **kwargs, registry=self)
File "/root/.local/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 121, in build_from_cfg
obj = obj_cls(**args) # type: ignore
File "/root/.local/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 487, in from_pretrained
resolved_config_file = cached_file(
File "/root/.local/lib/python3.10/site-packages/transformers/utils/hub.py", line 429, in cached_file
resolved_file = hf_hub_download(
File "/root/.local/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 110, in _inner_fn
validate_repo_id(arg_value)
File "/root/.local/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 164, in validate_repo_id
raise HFValidationError(
huggingface_hub.utils._validators.HFValidationError: Repo id must use alphanumeric chars or '-', '_', '.', '--' and '..' are forbidden, '-' and '.' cannot start or end the name, max length is 96: './internlm-chat-7b'.

I don't know how to solve this now. Could you give me some advice?

How to merge with multiple GPUs?

[Works] When fine-tuning llama3-70B, dual-GPU fine-tuning is possible; two A800-80G cards are enough.
[Fails] But when merging, multi-GPU merging does not work, only single-GPU, so GPU memory runs out. Both NPROC_PER_NODE=2 xtuner convert merge and CUDA_VISIBLE_DEVICES=0,1 xtuner convert merge raise errors.
[Error screenshot]
(screenshot)
[GPU memory screenshot]
(screenshot)

OpenCompass evaluation results show '--' with nothing displayed

I ran the demo command from the tutorial:
python run.py --datasets ceval_gen --hf-path /share/temp/model_repos/internlm-chat-7b/ --tokenizer-path /share/temp/model_repos/internlm-chat-7b/ --tokenizer-kwargs padding_side='left' truncation='left' trust_remote_code=True --model-kwargs trust_remote_code=True device_map='auto' --max-seq-len 2048 --max-out-len 16 --batch-size 4 --num-gpus 1 --debug
I tried batch sizes of 1, 2, and 4; the debug run keeps producing errors.
The error screenshot is below.
(screenshot)

SSH connection issue

(screenshot)

Hello, I followed the InternStudio tutorial to connect remotely, but the last step reports that the port is disabled. How should I resolve this?

Lesson 2 homework (Class 6)

WeChat name: 锡林大街
Basic assignments:

Use the InternLM-Chat-7B model to generate a 300-word short story (screenshots required).
(screenshots)

Get familiar with the Hugging Face download feature: use the huggingface_hub Python package to download the config.json file of InternLM-20B locally (screenshot of the download process required).
(screenshot)

Fine-tuning error: AttributeError: 'InternLM2ForCausalLM' object has no attribute '_set_gradient_checkpointing'. Did you mean: 'is_gradient_checkpointing'?

1. This error is raised when fine-tuning starts; asking for help:
AttributeError: 'InternLM2ForCausalLM' object has no attribute '_set_gradient_checkpointing'. Did you mean: 'is_gradient_checkpointing'?

(screenshot)

2. My environment is as follows:

accelerate==0.27.2
addict==2.4.0
aiohttp==3.9.3
aiosignal==1.3.1
aliyun-python-sdk-core==2.14.0
aliyun-python-sdk-kms==2.16.2
altair==5.2.0
annotated-types==0.6.0
anyio==4.2.0
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
arrow==1.3.0
arxiv==2.1.0
asttokens==2.4.1
async-lru==2.0.4
async-timeout==4.0.3
attrs==23.2.0
Babel==2.14.0
beautifulsoup4==4.12.3
bitsandbytes==0.42.0
bleach==6.1.0
blinker==1.7.0
cachetools==5.3.2
certifi==2024.2.2
cffi==1.16.0
charset-normalizer==3.3.2
click==8.1.7
colorama==0.4.6
comm==0.2.1
contourpy==1.2.0
crcmod==1.7
cryptography==42.0.3
cycler==0.12.1
datasets==2.14.7
debugpy==1.8.1
decorator==5.1.1
deepspeed==0.13.2
defusedxml==0.7.1
dill==0.3.7
distro==1.9.0
einops==0.7.0
exceptiongroup==1.2.0
executing==2.0.1
fastjsonschema==2.19.1
feedparser==6.0.10
filelock==3.13.1
fonttools==4.49.0
fqdn==1.5.1
frozenlist==1.4.1
fsspec==2023.6.0
func-timeout==4.3.5
gast==0.5.4
gitdb==4.0.11
GitPython==3.1.42
google-search-results==2.4.2
griffe==0.40.1
h11==0.14.0
hjson==3.1.0
httpcore==1.0.3
httpx==0.26.0
huggingface-hub==0.17.3
idna==3.6
importlib-metadata==6.11.0
ipykernel==6.29.2
ipython==8.21.0
ipywidgets==8.1.2
isoduration==20.11.0
jedi==0.19.1
Jinja2==3.1.3
jmespath==0.10.0
json5==0.9.14
jsonpointer==2.4
jsonschema==4.21.1
jsonschema-specifications==2023.12.1
jupyter==1.0.0
jupyter-console==6.6.3
jupyter-events==0.9.0
jupyter-lsp==2.2.2
jupyter_client==8.6.0
jupyter_core==5.7.1
jupyter_server==2.12.5
jupyter_server_terminals==0.5.2
jupyterlab==4.1.1
jupyterlab_pygments==0.3.0
jupyterlab_server==2.25.3
jupyterlab_widgets==3.0.10
kiwisolver==1.4.5
lagent==0.2.1
lxml==5.1.0
markdown-it-py==3.0.0
MarkupSafe==2.1.5
matplotlib==3.8.3
matplotlib-inline==0.1.6
mdurl==0.1.2
mistune==3.0.2
mmengine==0.10.3
modelscope==1.12.0
mpi4py_mpich==3.1.5
mpmath==1.3.0
multidict==6.0.5
multiprocess==0.70.15
nbclient==0.9.0
nbconvert==7.16.0
nbformat==5.9.2
nest-asyncio==1.6.0
networkx==3.2.1
ninja==1.11.1.1
notebook==7.1.0
notebook_shim==0.2.4
numpy==1.26.4
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.19.3
nvidia-nvjitlink-cu12==12.3.101
nvidia-nvtx-cu12==12.1.105
opencv-python==4.9.0.80
oss2==2.18.4
overrides==7.7.0
packaging==23.2
pandas==2.2.0
pandocfilters==1.5.1
parso==0.8.3
peft==0.8.2
pexpect==4.9.0
phx-class-registry==4.1.0
Pillow==9.5.0
platformdirs==4.2.0
prometheus_client==0.20.0
prompt-toolkit==3.0.43
protobuf==4.25.3
psutil==5.9.8
ptyprocess==0.7.0
pure-eval==0.2.2
py-cpuinfo==9.0.0
pyarrow==15.0.0
pyarrow-hotfix==0.6
pycparser==2.21
pycryptodome==3.20.0
pydantic==2.6.1
pydantic_core==2.16.2
pydeck==0.8.1b0
Pygments==2.17.2
Pympler==1.0.1
pynvml==11.5.0
pyparsing==3.1.1
python-dateutil==2.8.2
python-json-logger==2.0.7
python-pptx==0.6.23
pytz==2024.1
pytz-deprecation-shim==0.1.0.post0
PyYAML==6.0.1
pyzmq==25.1.2
qtconsole==5.5.1
QtPy==2.4.1
referencing==0.33.0
regex==2023.12.25
requests==2.31.0
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
rich==13.7.0
rpds-py==0.18.0
safetensors==0.4.2
scipy==1.12.0
Send2Trash==1.8.2
sentencepiece==0.1.99
sgmllib3k==1.0.0
simplejson==3.19.2
six==1.16.0
smmap==5.0.1
sniffio==1.3.0
sortedcontainers==2.4.0
soupsieve==2.5
stack-data==0.6.3
streamlit==1.24.0
sympy==1.12
tenacity==8.2.3
termcolor==2.4.0
terminado==0.18.0
tiktoken==0.6.0
timeout-decorator==0.5.0
tinycss2==1.2.1
tokenizers==0.14.1
toml==0.10.2
tomli==2.0.1
toolz==0.12.1
torch==2.2.0
tornado==6.4
tqdm==4.66.2
traitlets==5.14.1
transformers==4.34.0
transformers-stream-generator==0.0.4
triton==2.2.0
types-python-dateutil==2.8.19.20240106
typing_extensions==4.9.0
tzdata==2024.1
tzlocal==4.3.1
uri-template==1.3.0
urllib3==2.2.0
validators==0.22.0
watchdog==4.0.0
wcwidth==0.2.13
webcolors==1.13
webencodings==0.5.1
websocket-client==1.7.0
widgetsnbextension==4.0.10
XlsxWriter==3.1.9
-e git+https://gitee.com/Internlm/xtuner@9f686f08c8e60e568e811aaad8daf9c08462d42d#egg=xtuner
xxhash==3.4.1
yapf==0.40.2
yarl==1.9.4
zipp==3.17.0

xtuner train errors out on my own dataset

(xt) root@autodl-container-acc940bcfe-be30ce08:/autodl-tmp/ft# cd config
(xt) root@autodl-container-acc940bcfe-be30ce08:/autodl-tmp/ft/config# xtuner train internlm2_chat_7b_qlora_alpaca_e3_copy.py
[2024-05-30 18:04:29,376] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
05/30 18:04:29 - mmengine - WARNING - WARNING: command error: '[Errno 2] No such file or directory: '/root/autodl-tmp/ft/data/cutlass/CHANGELOG.md''!
05/30 18:04:29 - mmengine - WARNING -
Arguments received: ['xtuner', 'train', 'internlm2_chat_7b_qlora_alpaca_e3_copy.py']. xtuner commands use the following syntax:

    xtuner MODE MODE_ARGS ARGS

    Where   MODE (required) is one of ('list-cfg', 'copy-cfg', 'log-dataset', 'check-custom-dataset', 'train', 'test', 'chat', 'convert', 'preprocess', 'mmbench', 'eval_refcoco')
            MODE_ARG (optional) is the argument for specific mode
            ARGS (optional) are the arguments for specific command

Some usages for xtuner commands: (See more by using -h for specific command!)

    1. List all predefined configs:
        xtuner list-cfg
    2. Copy a predefined config to a given path:
        xtuner copy-cfg $CONFIG $SAVE_FILE
    3-1. Fine-tune LLMs by a single GPU:
        xtuner train $CONFIG
    3-2. Fine-tune LLMs by multiple GPUs:
        NPROC_PER_NODE=$NGPUS NNODES=$NNODES NODE_RANK=$NODE_RANK PORT=$PORT ADDR=$ADDR xtuner dist_train $CONFIG $GPUS
    4-1. Convert the pth model to HuggingFace's model:
        xtuner convert pth_to_hf $CONFIG $PATH_TO_PTH_MODEL $SAVE_PATH_TO_HF_MODEL
    4-2. Merge the HuggingFace's adapter to the pretrained base model:
        xtuner convert merge $LLM $ADAPTER $SAVE_PATH
        xtuner convert merge $CLIP $ADAPTER $SAVE_PATH --is-clip
    4-3. Split HuggingFace's LLM to the smallest sharded one:
        xtuner convert split $LLM $SAVE_PATH
    5-1. Chat with LLMs with HuggingFace's model and adapter:
        xtuner chat $LLM --adapter $ADAPTER --prompt-template $PROMPT_TEMPLATE --system-template $SYSTEM_TEMPLATE
    5-2. Chat with VLMs with HuggingFace's model and LLaVA:
        xtuner chat $LLM --llava $LLAVA --visual-encoder $VISUAL_ENCODER --image $IMAGE --prompt-template $PROMPT_TEMPLATE --system-template $SYSTEM_TEMPLATE
    6-1. Preprocess arxiv dataset:
        xtuner preprocess arxiv $SRC_FILE $DST_FILE --start-date $START_DATE --categories $CATEGORIES
    6-2. Preprocess refcoco dataset:
        xtuner preprocess refcoco --ann-path $RefCOCO_ANN_PATH --image-path $COCO_IMAGE_PATH --save-path $SAVE_PATH
    7-1. Log processed dataset:
        xtuner log-dataset $CONFIG
    7-2. Verify the correctness of the config file for the custom dataset:
        xtuner check-custom-dataset $CONFIG
    8. MMBench evaluation:
        xtuner mmbench $LLM --llava $LLAVA --visual-encoder $VISUAL_ENCODER --prompt-template $PROMPT_TEMPLATE --data-path $MMBENCH_DATA_PATH
    9. Refcoco evaluation:
        xtuner eval_refcoco $LLM --llava $LLAVA --visual-encoder $VISUAL_ENCODER --prompt-template $PROMPT_TEMPLATE --data-path $REFCOCO_DATA_PATH
    10. List all dataset formats which are supported in XTuner

Run special commands:

    xtuner help
    xtuner version

GitHub: https://github.com/InternLM/xtuner

Config file:
# Copyright (c) OpenMMLab. All rights reserved.
import torch
from datasets import load_dataset
from mmengine.dataset import DefaultSampler
from mmengine.hooks import (CheckpointHook, DistSamplerSeedHook, IterTimerHook,
LoggerHook, ParamSchedulerHook)
from mmengine.optim import AmpOptimWrapper, CosineAnnealingLR, LinearLR
from peft import LoraConfig
from torch.optim import AdamW
from transformers import (AutoModelForCausalLM, AutoTokenizer,
BitsAndBytesConfig)

from xtuner.dataset import process_hf_dataset
from xtuner.dataset.collate_fns import default_collate_fn
from xtuner.dataset.map_fns import openai_map_fn, template_map_fn_factory
from xtuner.engine.hooks import (DatasetInfoHook, EvaluateChatHook,
VarlenAttnArgsToMessageHubHook)
from xtuner.engine.runner import TrainLoop
from xtuner.model import SupervisedFinetune
from xtuner.parallel.sequence import SequenceParallelSampler
from xtuner.utils import PROMPT_TEMPLATE, SYSTEM_TEMPLATE

#######################################################################

# PART 1 Settings

#######################################################################

# Model

pretrained_model_name_or_path = './autodl-tmp/RAG-langchain/models/internlm2-chat-7b'
use_varlen_attn = False

# Data

alpaca_en_path = './autodl-tmp/ft/data/train_fold_1.json'
prompt_template = PROMPT_TEMPLATE.internlm2_chat
max_length = 1024
pack_to_max_length = True

# parallel

sequence_parallel_size = 1

# Scheduler & Optimizer

batch_size = 1 # per_device
accumulative_counts = 16
accumulative_counts *= sequence_parallel_size
dataloader_num_workers = 0
max_epochs = 2
optim_type = AdamW
lr = 2e-4
betas = (0.9, 0.999)
weight_decay = 0
max_norm = 1 # grad clip
warmup_ratio = 0.03

# Save

save_steps = 300
save_total_limit = 3 # Maximum checkpoints to keep (-1 means unlimited)

# Evaluate the generation performance during the training

evaluation_freq = 300
SYSTEM = ''
evaluation_inputs = ['判断以下新闻情绪,积极为1,消极为0', '判断该新闻正负面,正面为1,负面为0', '请判断以下新闻是积极还是消极,积极为1,消极为0,你的答案只有1或0']

#######################################################################

# PART 2 Model & Tokenizer

#######################################################################
tokenizer = dict(
type=AutoTokenizer.from_pretrained,
pretrained_model_name_or_path=pretrained_model_name_or_path,
trust_remote_code=True,
padding_side='right')

model = dict(
type=SupervisedFinetune,
use_varlen_attn=use_varlen_attn,
llm=dict(
type=AutoModelForCausalLM.from_pretrained,
pretrained_model_name_or_path=pretrained_model_name_or_path,
trust_remote_code=True,
torch_dtype=torch.float16,
quantization_config=dict(
type=BitsAndBytesConfig,
load_in_4bit=True,
load_in_8bit=False,
llm_int8_threshold=6.0,
llm_int8_has_fp16_weight=False,
bnb_4bit_compute_dtype=torch.float16,
bnb_4bit_use_double_quant=True,
bnb_4bit_quant_type='nf4')),
lora=dict(
type=LoraConfig,
r=64,
lora_alpha=16,
lora_dropout=0.1,
bias='none',
task_type='CAUSAL_LM'))

#######################################################################

# PART 3 Dataset & Dataloader

#######################################################################
alpaca_en = dict(
type=process_hf_dataset,
dataset=dict(type=load_dataset, path='json', data_files=dict(train=alpaca_en_path)),
tokenizer=tokenizer,
max_length=max_length,
dataset_map_fn=openai_map_fn,
template_map_fn=dict(
type=template_map_fn_factory, template=prompt_template),
remove_unused_columns=True,
shuffle_before_pack=True,
pack_to_max_length=pack_to_max_length,
use_varlen_attn=use_varlen_attn)

sampler = SequenceParallelSampler if sequence_parallel_size > 1 else DefaultSampler
train_dataloader = dict(
batch_size=batch_size,
num_workers=dataloader_num_workers,
dataset=alpaca_en,
sampler=dict(type=sampler, shuffle=True),
collate_fn=dict(type=default_collate_fn, use_varlen_attn=use_varlen_attn))

#######################################################################

# PART 4 Scheduler & Optimizer

#######################################################################

# optimizer

optim_wrapper = dict(
type=AmpOptimWrapper,
optimizer=dict(
type=optim_type, lr=lr, betas=betas, weight_decay=weight_decay),
clip_grad=dict(max_norm=max_norm, error_if_nonfinite=False),
accumulative_counts=accumulative_counts,
loss_scale='dynamic',
dtype='float16')

# learning policy

# More information: https://github.com/open-mmlab/mmengine/blob/main/docs/en/tutorials/param_scheduler.md # noqa: E501

param_scheduler = [
dict(
type=LinearLR,
start_factor=1e-5,
by_epoch=True,
begin=0,
end=warmup_ratio * max_epochs,
convert_to_iter_based=True),
dict(
type=CosineAnnealingLR,
eta_min=0.0,
by_epoch=True,
begin=warmup_ratio * max_epochs,
end=max_epochs,
convert_to_iter_based=True)
]

# train, val, test setting

train_cfg = dict(type=TrainLoop, max_epochs=max_epochs)

#######################################################################

# PART 5 Runtime

#######################################################################

# Log the dialogue periodically during the training process, optional

custom_hooks = [
dict(type=DatasetInfoHook, tokenizer=tokenizer),
dict(
type=EvaluateChatHook,
tokenizer=tokenizer,
every_n_iters=evaluation_freq,
evaluation_inputs=evaluation_inputs,
system=SYSTEM,
prompt_template=prompt_template)
]

if use_varlen_attn:
custom_hooks += [dict(type=VarlenAttnArgsToMessageHubHook)]

# configure default hooks

default_hooks = dict(
# record the time of every iteration.
timer=dict(type=IterTimerHook),
# print log every 10 iterations.
logger=dict(type=LoggerHook, log_metric_by_epoch=False, interval=10),
# enable the parameter scheduler.
param_scheduler=dict(type=ParamSchedulerHook),
# save checkpoint per save_steps.
checkpoint=dict(
type=CheckpointHook,
by_epoch=False,
interval=save_steps,
max_keep_ckpts=save_total_limit),
# set sampler seed in distributed evrionment.
sampler_seed=dict(type=DistSamplerSeedHook),
)

# configure environment

env_cfg = dict(
# whether to enable cudnn benchmark
cudnn_benchmark=False,
# set multi process parameters
mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),
# set distributed parameters
dist_cfg=dict(backend='nccl'),
)

# set visualizer

visualizer = None

# set log level

log_level = 'INFO'

# load from which checkpoint

load_from = None

# whether to resume training from the loaded checkpoint

resume = False

# Defaults to use random seed and disable deterministic

randomness = dict(seed=None, deterministic=False)

# set log processor

log_processor = dict(by_epoch=False)

Unable to use lmdeploy chat

I configured the entire environment following the tutorial, but it errors out as soon as I run it.
The model I converted offline later also fails with this same error.

(lmdeploy) root@intern-studio:~# lmdeploy chat turbomind /share/temp/model_repos/internlm-chat-7b/ --model-name internlm-chat-7b
model_source: hf_model
WARNING: Can not find tokenizer.json. It may take long time to initialize the tokenizer.
WARNING: Can not find tokenizer.json. It may take long time to initialize the tokenizer.
model_config:
{
"model_name": "internlm-chat-7b",
"tensor_para_size": 1,
"head_num": 32,
"kv_head_num": 32,
"vocab_size": 103168,
"num_layer": 32,
"inter_size": 11008,
"norm_eps": 1e-06,
"attn_bias": 1,
"start_id": 1,
"end_id": 2,
"session_len": 2056,
"weight_type": "fp16",
"rotary_embedding": 128,
"rope_theta": 10000.0,
"size_per_head": 128,
"group_size": 0,
"max_batch_size": 64,
"max_context_token_num": 1,
"step_length": 1,
"cache_max_entry_count": 0.5,
"cache_block_seq_len": 128,
"cache_chunk_size": 1,
"use_context_fmha": 1,
"quant_policy": 0,
"max_position_embeddings": 2048,
"rope_scaling_factor": 0.0,
"use_logn_attn": 0
}
get 323 model params
[WARNING] gemm_config.in is not found; using default GEMM algo
session 1

double enter to end input >>> hello

<|System|>:You are an AI assistant whose name is InternLM (书生·浦语).

  • InternLM (书生·浦语) is a conversational language model that is developed by Shanghai AI Laboratory (上海人工智能实验室). It is designed to be helpful, honest, and harmless.
  • InternLM (书生·浦语) can understand and communicate fluently in the language chosen by the user such as English and 中文.

<|User|>:hello
<|Bot|>: [AMP ERROR][CudaFrontend.cpp:94][1705496068:532304]failed to call cuCtxGetDevice(&device), error code: CUDA_ERROR_INVALID_CONTEXT

===============================================

Back trace dump:
/usr/local/harp/lib/libvirtdev-frontend.so.0(LogStream::PrintBacktrace()+0x52) [0x7fc0d92cc302]
/lib/x86_64-linux-gnu/libcuda.so.1(CudaFeApiStateData::GetCurrentDevicePciBusId(Frontend*, int const*)+0x241) [0x7fc0d94fb471]
/lib/x86_64-linux-gnu/libcuda.so.1(python: CudaFrontend.cpp:94: static const string& CudaFeApiStateData::GetCurrentDevicePciBusId(Frontend*, const CUdevice*): Assertion `0' failed.
Aborted (core dumped)

tests.test_query_gradi: the script name is misspelled

Cause of the error: gradio was misspelled as gradi.

Details:
python3 -m tests.test_query_gradi --work_dir <path to the new vector database>
should be changed to
python3 -m tests.test_query_gradio --work_dir <path to the new vector database>

importlib.metadata.PackageNotFoundError: No package metadata was found for xtuner

I have installed it repeatedly following the tutorial and keep hitting this problem:
(xtuner0.1.9) root@intern-studio:~/xtuner019/xtuner# xtuner
Traceback (most recent call last):
File "/root/.local/bin/xtuner", line 33, in
sys.exit(load_entry_point('xtuner', 'console_scripts', 'xtuner')())
File "/root/.local/bin/xtuner", line 22, in importlib_load_entry_point
for entry_point in distribution(dist_name).entry_points
File "/share/conda_envs/internlm-base/lib/python3.10/importlib/metadata/init.py", line 969, in
distribution return Distribution.from_name(distribution_name)
File "/share/conda_envs/internlm-base/lib/python3.10/importlib/metadata/init.py", line 548, in
from_name raise PackageNotFoundError(name)
importlib.metadata.PackageNotFoundError: No package metadata was found for xtuner

Training used to work. After I changed the parameters and data, I got the error below, so I tried installing directly in the base environment, and then the error above appeared.
nohup: ignoring input
[2024-01-11 19:09:22,787] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-01-11 19:10:00,577] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
01/11 19:10:22 - mmengine - INFO -

System environment:
sys.platform: linux
Python: 3.10.13 (main, Sep 11 2023, 13:44:35) [GCC 11.2.0]
CUDA available: True
numpy_random_seed: 716082487
GPU 0: NVIDIA A100-SXM4-80GB
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.7, V11.7.99
GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
PyTorch: 2.0.1
PyTorch compiling details: PyTorch built with:

  • GCC 9.3

  • C++ Version: 201703

  • Intel(R) oneAPI Math Kernel Library Version 2023.1-Product Build 20230303 for Intel(R) 64 architecture applications

  • Intel(R) MKL-DNN v2.7.3 (Git Hash 6dbeffbae1f23cbbeae17adb7b5b13f1f37c080e)

  • OpenMP 201511 (a.k.a. OpenMP 4.5)

  • LAPACK is enabled (usually provided by MKL)

  • NNPACK is enabled

  • CPU capability usage: AVX2

  • CUDA Runtime 11.7

  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37

  • CuDNN 8.5

  • Magma 2.6.1

  • Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.7, CUDNN_VERSION=8.5.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.0.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,

    TorchVision: 0.15.2
    OpenCV: 4.9.0
    MMEngine: 0.10.2

Runtime environment:
launcher: none
randomness: {'seed': None, 'deterministic': False}
cudnn_benchmark: False
mp_cfg: {'mp_start_method': 'fork', 'opencv_num_threads': 0}
dist_cfg: {'backend': 'nccl'}
seed: None
deterministic: False
Distributed launcher: none
Distributed training: False
GPU number: 1

01/11 19:10:22 - mmengine - INFO - Config:
SYSTEM = ''
accumulative_counts = 16
batch_size = 1
betas = (
0.9,
0.999,
)
custom_hooks = [
dict(
tokenizer=dict(
padding_side='right',
pretrained_model_name_or_path=
'/root/model/Shanghai_AI_Laboratory/internlm-chat-7b',
trust_remote_code=True,
type='transformers.AutoTokenizer.from_pretrained'),
type='xtuner.engine.DatasetInfoHook'),
dict(
evaluation_inputs=[
'请给我介绍五个上海的景点',
'Please tell me five scenic spots in Shanghai',
],
every_n_iters=500,
prompt_template='xtuner.utils.PROMPT_TEMPLATE.internlm_chat',
system='',
tokenizer=dict(
padding_side='right',
pretrained_model_name_or_path=
'/root/model/Shanghai_AI_Laboratory/internlm-chat-7b',
trust_remote_code=True,
type='transformers.AutoTokenizer.from_pretrained'),
type='xtuner.engine.EvaluateChatHook'),
]
data_path = '/root/code/xturn/grade-school-math/grade_school_math/data/new'
dataloader_num_workers = 0
default_hooks = dict(
checkpoint=dict(interval=1, type='mmengine.hooks.CheckpointHook'),
logger=dict(interval=10, type='mmengine.hooks.LoggerHook'),
param_scheduler=dict(type='mmengine.hooks.ParamSchedulerHook'),
sampler_seed=dict(type='mmengine.hooks.DistSamplerSeedHook'),
timer=dict(type='mmengine.hooks.IterTimerHook'))
env_cfg = dict(
cudnn_benchmark=False,
dist_cfg=dict(backend='nccl'),
mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0))
evaluation_freq = 500
evaluation_inputs = [
'请给我介绍五个上海的景点',
'Please tell me five scenic spots in Shanghai',
]
launcher = 'none'
load_from = None
log_level = 'INFO'
lr = 0.0002
max_epochs = 3
max_length = 2048
max_norm = 1
model = dict(
llm=dict(
pretrained_model_name_or_path=
'/root/model/Shanghai_AI_Laboratory/internlm-chat-7b',
quantization_config=dict(
bnb_4bit_compute_dtype='torch.float16',
bnb_4bit_quant_type='nf4',
bnb_4bit_use_double_quant=True,
llm_int8_has_fp16_weight=False,
llm_int8_threshold=6.0,
load_in_4bit=True,
load_in_8bit=False,
type='transformers.BitsAndBytesConfig'),
torch_dtype='torch.float16',
trust_remote_code=True,
type='transformers.AutoModelForCausalLM.from_pretrained'),
lora=dict(
bias='none',
lora_alpha=16,
lora_dropout=0.1,
r=64,
task_type='CAUSAL_LM',
type='peft.LoraConfig'),
type='xtuner.model.SupervisedFinetune')
optim_type = 'bitsandbytes.optim.PagedAdamW32bit'
optim_wrapper = dict(
optimizer=dict(
betas=(
0.9,
0.999,
),
lr=0.0002,
type='bitsandbytes.optim.PagedAdamW32bit',
weight_decay=0),
type='DeepSpeedOptimWrapper')
pack_to_max_length = True
param_scheduler = dict(
T_max=3,
by_epoch=True,
convert_to_iter_based=True,
eta_min=0.0,
type='mmengine.optim.CosineAnnealingLR')
pretrained_model_name_or_path = '/root/model/Shanghai_AI_Laboratory/internlm-chat-7b'
prompt_template = 'xtuner.utils.PROMPT_TEMPLATE.internlm_chat'
randomness = dict(deterministic=False, seed=None)
resume = False
runner_type = 'FlexibleRunner'
strategy = dict(
config=dict(
bf16=dict(enabled=True),
fp16=dict(enabled=False, initial_scale_power=16),
gradient_accumulation_steps='auto',
gradient_clipping='auto',
train_micro_batch_size_per_gpu='auto',
zero_allow_untested_optimizer=True,
zero_force_ds_cpu_optimizer=False,
zero_optimization=dict(overlap_comm=True, stage=2)),
exclude_frozen_parameters=True,
gradient_accumulation_steps=16,
gradient_clipping=1,
train_micro_batch_size_per_gpu=1,
type='DeepSpeedStrategy')
tokenizer = dict(
padding_side='right',
pretrained_model_name_or_path=
'/root/model/Shanghai_AI_Laboratory/internlm-chat-7b',
trust_remote_code=True,
type='transformers.AutoTokenizer.from_pretrained')
train_cfg = dict(by_epoch=True, max_epochs=3, val_interval=1)
train_dataloader = dict(
batch_size=1,
collate_fn=dict(type='xtuner.dataset.collate_fns.default_collate_fn'),
dataset=dict(
dataset=dict(
path=
'/root/code/xturn/grade-school-math/grade_school_math/data/new',
type='datasets.load_dataset'),
dataset_map_fn='xtuner.dataset.map_fns.oasst1_map_fn',
max_length=2048,
pack_to_max_length=True,
remove_unused_columns=True,
shuffle_before_pack=True,
template_map_fn=dict(
template='xtuner.utils.PROMPT_TEMPLATE.internlm_chat',
type='xtuner.dataset.map_fns.template_map_fn_factory'),
tokenizer=dict(
padding_side='right',
pretrained_model_name_or_path=
'/root/model/Shanghai_AI_Laboratory/internlm-chat-7b',
trust_remote_code=True,
type='transformers.AutoTokenizer.from_pretrained'),
type='xtuner.dataset.process_hf_dataset'),
num_workers=0,
sampler=dict(shuffle=True, type='mmengine.dataset.DefaultSampler'))
train_dataset = dict(
dataset=dict(
path='/root/code/xturn/grade-school-math/grade_school_math/data/new',
type='datasets.load_dataset'),
dataset_map_fn='xtuner.dataset.map_fns.oasst1_map_fn',
max_length=2048,
pack_to_max_length=True,
remove_unused_columns=True,
shuffle_before_pack=True,
template_map_fn=dict(
template='xtuner.utils.PROMPT_TEMPLATE.internlm_chat',
type='xtuner.dataset.map_fns.template_map_fn_factory'),
tokenizer=dict(
padding_side='right',
pretrained_model_name_or_path=
'/root/model/Shanghai_AI_Laboratory/internlm-chat-7b',
trust_remote_code=True,
type='transformers.AutoTokenizer.from_pretrained'),
type='xtuner.dataset.process_hf_dataset')
visualizer = None
weight_decay = 0
work_dir = './work_dirs/internlm_chat_7b_qlora_oasst1_e3_copy'

01/11 19:10:23 - mmengine - WARNING - Failed to search registry with scope "mmengine" in the "builder" registry tree. As a workaround, the current "builder" registry in "xtuner" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmengine" is a correct scope, or whether the registry is initialized.
01/11 19:10:24 - mmengine - INFO - Hooks will be executed in the following order:
before_run:
(VERY_HIGH ) RuntimeInfoHook
(BELOW_NORMAL) LoggerHook

before_train:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
(NORMAL ) DatasetInfoHook
(NORMAL ) EvaluateChatHook
(VERY_LOW ) CheckpointHook

before_train_epoch:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
(NORMAL ) DistSamplerSeedHook

before_train_iter:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook

after_train_iter:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
(NORMAL ) EvaluateChatHook
(BELOW_NORMAL) LoggerHook
(LOW ) ParamSchedulerHook
(VERY_LOW ) CheckpointHook

after_train_epoch:
(NORMAL ) IterTimerHook
(LOW ) ParamSchedulerHook
(VERY_LOW ) CheckpointHook

before_val:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) DatasetInfoHook

before_val_epoch:
(NORMAL ) IterTimerHook

before_val_iter:
(NORMAL ) IterTimerHook

after_val_iter:
(NORMAL ) IterTimerHook
(BELOW_NORMAL) LoggerHook

after_val_epoch:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
(BELOW_NORMAL) LoggerHook
(LOW ) ParamSchedulerHook
(VERY_LOW ) CheckpointHook

after_val:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) EvaluateChatHook

after_train:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) EvaluateChatHook
(VERY_LOW ) CheckpointHook

before_test:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) DatasetInfoHook

before_test_epoch:
(NORMAL ) IterTimerHook

before_test_iter:
(NORMAL ) IterTimerHook

after_test_iter:
(NORMAL ) IterTimerHook
(BELOW_NORMAL) LoggerHook

after_test_epoch:
(VERY_HIGH ) RuntimeInfoHook
(NORMAL ) IterTimerHook
(BELOW_NORMAL) LoggerHook

after_test:
(VERY_HIGH ) RuntimeInfoHook

after_run:
(BELOW_NORMAL) LoggerHook

01/11 19:10:26 - mmengine - WARNING - Dataset Dataset has no metainfo. dataset_meta in visualizer will be None.
quantization_config convert to <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>

Loading checkpoint shards: 0%| | 0/8 [00:00<?, ?it/s]
Loading checkpoint shards: 12%|█▎ | 1/8 [00:03<00:23, 3.34s/it]
Loading checkpoint shards: 25%|██▌ | 2/8 [00:06<00:19, 3.30s/it]
Loading checkpoint shards: 38%|███▊ | 3/8 [00:09<00:16, 3.25s/it]
Loading checkpoint shards: 50%|█████ | 4/8 [00:12<00:12, 3.17s/it]
Loading checkpoint shards: 62%|██████▎ | 5/8 [00:16<00:09, 3.20s/it]
Loading checkpoint shards: 75%|███████▌ | 6/8 [00:20<00:07, 3.65s/it]
Loading checkpoint shards: 88%|████████▊ | 7/8 [00:23<00:03, 3.43s/it]
Loading checkpoint shards: 100%|██████████| 8/8 [00:24<00:00, 2.55s/it]
Loading checkpoint shards: 100%|██████████| 8/8 [00:24<00:00, 3.04s/it]
01/11 19:10:53 - mmengine - INFO - dispatch internlm attn forward
01/11 19:10:53 - mmengine - WARNING - Due to the implementation of the PyTorch version of flash attention, even when the output_attentions flag is set to True, it is not possible to return the attn_weights.
[2024-01-11 19:11:48,486] [INFO] [logging.py:96:log_dist] [Rank -1] DeepSpeed info: version=0.12.6, git-hash=unknown, git-branch=unknown
[2024-01-11 19:11:48,486] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-01-11 19:11:48,486] [INFO] [comm.py:652:init_distributed] Not using the DeepSpeed or dist launchers, attempting to detect MPI environment...
[2024-01-11 19:11:48,653] [INFO] [comm.py:702:mpi_discovery] Discovered MPI settings of world_rank=0, local_rank=0, world_size=1, master_addr=192.168.237.56, master_port=29500
[2024-01-11 19:11:48,653] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2024-01-11 19:11:49,014] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
[2024-01-11 19:11:49,019] [INFO] [logging.py:96:log_dist] [Rank 0] Using client Optimizer as basic optimizer
[2024-01-11 19:11:49,019] [INFO] [logging.py:96:log_dist] [Rank 0] Removing param_group that has no 'params' in the basic Optimizer
[2024-01-11 19:11:49,089] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Basic Optimizer = PagedAdamW32bit
[2024-01-11 19:11:49,089] [INFO] [utils.py:56:is_zero_supported_optimizer] Checking ZeRO support for optimizer=PagedAdamW32bit type=<class 'bitsandbytes.optim.adamw.PagedAdamW32bit'>
[2024-01-11 19:11:49,089] [WARNING] [engine.py:1166:do_optimizer_sanity_check] **** You are using ZeRO with an untested optimizer, proceed with caution *****
[2024-01-11 19:11:49,089] [INFO] [logging.py:96:log_dist] [Rank 0] Creating torch.bfloat16 ZeRO stage 2 optimizer
[2024-01-11 19:11:49,090] [INFO] [stage_1_and_2.py:148:init] Reduce bucket size 500,000,000
[2024-01-11 19:11:49,090] [INFO] [stage_1_and_2.py:149:init] Allgather bucket size 500,000,000
[2024-01-11 19:11:49,090] [INFO] [stage_1_and_2.py:150:init] CPU Offload: False
[2024-01-11 19:11:49,090] [INFO] [stage_1_and_2.py:151:init] Round robin gradient partitioning: False
[2024-01-11 19:11:51,497] [INFO] [utils.py:791:see_memory_usage] Before initializing optimizer states
[2024-01-11 19:11:51,498] [INFO] [utils.py:792:see_memory_usage] MA 5.63 GB Max_MA 5.93 GB CA 6.31 GB Max_CA 6 GB
[2024-01-11 19:11:51,498] [INFO] [utils.py:799:see_memory_usage] CPU Virtual Memory: used = 105.09 GB, percent = 5.2%
[2024-01-11 19:11:51,748] [INFO] [utils.py:791:see_memory_usage] After initializing optimizer states
[2024-01-11 19:11:51,748] [INFO] [utils.py:792:see_memory_usage] MA 5.63 GB Max_MA 6.23 GB CA 6.91 GB Max_CA 7 GB
[2024-01-11 19:11:51,749] [INFO] [utils.py:799:see_memory_usage] CPU Virtual Memory: used = 105.32 GB, percent = 5.2%
[2024-01-11 19:11:51,749] [INFO] [stage_1_and_2.py:516:init] optimizer state initialized
[2024-01-11 19:11:51,870] [INFO] [utils.py:791:see_memory_usage] After initializing ZeRO optimizer
[2024-01-11 19:11:51,871] [INFO] [utils.py:792:see_memory_usage] MA 5.63 GB Max_MA 5.63 GB CA 6.91 GB Max_CA 7 GB
[2024-01-11 19:11:51,871] [INFO] [utils.py:799:see_memory_usage] CPU Virtual Memory: used = 105.39 GB, percent = 5.2%
[2024-01-11 19:11:51,882] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Final Optimizer = PagedAdamW32bit
[2024-01-11 19:11:51,882] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed using client LR scheduler
[2024-01-11 19:11:51,882] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed LR Scheduler = None
[2024-01-11 19:11:51,882] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0002], mom=[(0.9, 0.999)]
[2024-01-11 19:11:51,886] [INFO] [config.py:984:print] DeepSpeedEngine configuration:
[2024-01-11 19:11:51,886] [INFO] [config.py:988:print] activation_checkpointing_config {
"partition_activations": false,
"contiguous_memory_optimization": false,
"cpu_checkpointing": false,
"number_checkpoints": null,
"synchronize_checkpoint_boundary": false,
"profile": false
}
[2024-01-11 19:11:51,886] [INFO] [config.py:988:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2024-01-11 19:11:51,886] [INFO] [config.py:988:print] amp_enabled .................. False
[2024-01-11 19:11:51,886] [INFO] [config.py:988:print] amp_params ................... False
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] autotuning_config ............ {
"enabled": false,
"start_step": null,
"end_step": null,
"metric_path": null,
"arg_mappings": null,
"metric": "throughput",
"model_info": null,
"results_dir": "autotuning_results",
"exps_dir": "autotuning_exps",
"overwrite": true,
"fast": true,
"start_profile_step": 3,
"end_profile_step": 5,
"tuner_type": "gridsearch",
"tuner_early_stopping": 5,
"tuner_num_trials": 50,
"model_info_path": null,
"mp_size": 1,
"max_train_batch_size": null,
"min_train_batch_size": 1,
"max_train_micro_batch_size_per_gpu": 1.024000e+03,
"min_train_micro_batch_size_per_gpu": 1,
"num_tuning_micro_batch_sizes": 3
}
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] bfloat16_enabled ............. True
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] checkpoint_parallel_write_pipeline False
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] checkpoint_tag_validation_enabled True
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] checkpoint_tag_validation_fail False
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] comms_config ................. <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x7f0b4ba0bee0>
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] communication_data_type ...... None
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] curriculum_enabled_legacy .... False
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] curriculum_params_legacy ..... False
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] data_efficiency_enabled ...... False
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] dataloader_drop_last ......... False
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] disable_allgather ............ False
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] dump_state ................... False
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] dynamic_loss_scale_args ...... None
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] eigenvalue_enabled ........... False
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] eigenvalue_gas_boundary_resolution 1
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] eigenvalue_layer_name ........ bert.encoder.layer
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] eigenvalue_layer_num ......... 0
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] eigenvalue_max_iter .......... 100
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] eigenvalue_stability ......... 1e-06
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] eigenvalue_tol ............... 0.01
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] eigenvalue_verbose ........... False
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] elasticity_enabled ........... False
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] flops_profiler_config ........ {
"enabled": false,
"recompute_fwd_factor": 0.0,
"profile_step": 1,
"module_depth": -1,
"top_modules": 1,
"detailed": true,
"output_file": null
}
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] fp16_auto_cast ............... None
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] fp16_enabled ................. False
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] fp16_master_weights_and_gradients False
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] global_rank .................. 0
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] grad_accum_dtype ............. None
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] gradient_accumulation_steps .. 16
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] gradient_clipping ............ 1
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] gradient_predivide_factor .... 1.0
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] graph_harvesting ............. False
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] initial_dynamic_scale ........ 1
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] load_universal_checkpoint .... False
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] loss_scale ................... 1.0
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] memory_breakdown ............. False
[2024-01-11 19:11:51,887] [INFO] [config.py:988:print] mics_hierarchial_params_gather False
[2024-01-11 19:11:51,888] [INFO] [config.py:988:print] mics_shard_size .............. -1
[2024-01-11 19:11:51,888] [INFO] [config.py:988:print] monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False
[2024-01-11 19:11:51,888] [INFO] [config.py:988:print] nebula_config ................ {
"enabled": false,
"persistent_storage_path": null,
"persistent_time_interval": 100,
"num_of_version_in_retention": 2,
"enable_nebula_load": true,
"load_path": null
}
[2024-01-11 19:11:51,888] [INFO] [config.py:988:print] optimizer_legacy_fusion ...... False
[2024-01-11 19:11:51,888] [INFO] [config.py:988:print] optimizer_name ............... None
[2024-01-11 19:11:51,888] [INFO] [config.py:988:print] optimizer_params ............. None
[2024-01-11 19:11:51,888] [INFO] [config.py:988:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0, 'pipe_partitioned': True, 'grad_partitioned': True}
[2024-01-11 19:11:51,888] [INFO] [config.py:988:print] pld_enabled .................. False
[2024-01-11 19:11:51,888] [INFO] [config.py:988:print] pld_params ................... False
[2024-01-11 19:11:51,888] [INFO] [config.py:988:print] prescale_gradients ........... False
[2024-01-11 19:11:51,888] [INFO] [config.py:988:print] scheduler_name ............... None
[2024-01-11 19:11:51,888] [INFO] [config.py:988:print] scheduler_params ............. None
[2024-01-11 19:11:51,888] [INFO] [config.py:988:print] seq_parallel_communication_data_type torch.float32
[2024-01-11 19:11:51,888] [INFO] [config.py:988:print] sparse_attention ............. None
[2024-01-11 19:11:51,888] [INFO] [config.py:988:print] sparse_gradients_enabled ..... False
[2024-01-11 19:11:51,888] [INFO] [config.py:988:print] steps_per_print .............. 10000000000000
[2024-01-11 19:11:51,888] [INFO] [config.py:988:print] train_batch_size ............. 16
[2024-01-11 19:11:51,888] [INFO] [config.py:988:print] train_micro_batch_size_per_gpu 1
[2024-01-11 19:11:51,888] [INFO] [config.py:988:print] use_data_before_expert_parallel
False
[2024-01-11 19:11:51,888] [INFO] [config.py:988:print] use_node_local_storage ....... False
[2024-01-11 19:11:51,888] [INFO] [config.py:988:print] wall_clock_breakdown ......... False
[2024-01-11 19:11:51,888] [INFO] [config.py:988:print] weight_quantization_config ... None
[2024-01-11 19:11:51,888] [INFO] [config.py:988:print] world_size ................... 1
[2024-01-11 19:11:51,888] [INFO] [config.py:988:print] zero_allow_untested_optimizer True
[2024-01-11 19:11:51,888] [INFO] [config.py:988:print] zero_config .................. stage=2 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 use_multi_rank_bucket_allreduce=True allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=True load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=100,000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False zero_hpz_partition_size=1 zero_quantized_weights=False zero_quantized_nontrainable_weights=False zero_quantized_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=True pipeline_loading_checkpoint=False override_module_apply=True
[2024-01-11 19:11:51,888] [INFO] [config.py:988:print] zero_enabled ................. True
[2024-01-11 19:11:51,888] [INFO] [config.py:988:print] zero_force_ds_cpu_optimizer .. False
[2024-01-11 19:11:51,888] [INFO] [config.py:988:print] zero_optimization_stage ...... 2
[2024-01-11 19:11:51,888] [INFO] [config.py:974:print_user_config] json = {
"gradient_accumulation_steps": 16,
"train_micro_batch_size_per_gpu": 1,
"gradient_clipping": 1,
"zero_allow_untested_optimizer": true,
"zero_force_ds_cpu_optimizer": false,
"zero_optimization": {
"stage": 2,
"overlap_comm": true
},
"fp16": {
"enabled": false,
"initial_scale_power": 16
},
"bf16": {
"enabled": true
},
"steps_per_print": 1.000000e+13
}
Traceback (most recent call last):
File "/root/xtuner019/xtuner/xtuner/tools/train.py", line 260, in
main()
File "/root/xtuner019/xtuner/xtuner/tools/train.py", line 256, in main
runner.train()
File "/root/.local/lib/python3.10/site-packages/mmengine/runner/_flexible_runner.py", line 1182, in train
self.strategy.prepare(
File "/root/.local/lib/python3.10/site-packages/mmengine/_strategy/deepspeed.py", line 389, in prepare
self.param_schedulers = self.build_param_scheduler(
File "/root/.local/lib/python3.10/site-packages/mmengine/_strategy/base.py", line 658, in build_param_scheduler
param_schedulers = self._build_param_scheduler(
File "/root/.local/lib/python3.10/site-packages/mmengine/_strategy/base.py", line 563, in _build_param_scheduler
PARAM_SCHEDULERS.build(
File "/root/.local/lib/python3.10/site-packages/mmengine/registry/registry.py", line 570, in build
return self.build_func(cfg, *args, **kwargs, registry=self)
File "/root/.local/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 294, in build_scheduler_from_cfg
return scheduler_cls.build_iter_from_epoch( # type: ignore
File "/root/.local/lib/python3.10/site-packages/mmengine/optim/scheduler/param_scheduler.py", line 663, in build_iter_from_epoch
assert epoch_length is not None and epoch_length > 0,
AssertionError: epoch_length must be a positive integer, but got 0.
Could someone please take a look at this?
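For context (my own guess from the error message, not something confirmed by the log above): mmengine's build_iter_from_epoch derives epoch_length from the length of the training dataloader, so this assertion usually fires when the dataset ends up empty, e.g. the data path is wrong or every sample is filtered out during preprocessing. A minimal sketch of the failing condition, with illustrative names only:

# Minimal sketch, assuming epoch_length is derived from the dataloader length
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.empty(0, 8))   # e.g. every sample was filtered out
loader = DataLoader(dataset, batch_size=1)

epoch_length = len(loader)                   # 0 for an empty dataset
assert epoch_length > 0, (
    f"epoch_length must be a positive integer, but got {epoch_length}; "
    "check that the training data path is correct and samples survive preprocessing."
)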

Out of GPU memory when quantizing Llama 3 70B on a 3090

On a 3090 with 24 GB of VRAM, quantizing Llama 3 70B with lmdeploy lite awq runs out of GPU memory at layer 79. Following the suggestion, I set PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True.
nvidia-smi shows that only the first card is being used. Is it possible to use multiple cards?
[GPU out-of-memory screenshot]
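For rough context (a back-of-the-envelope estimate I am adding, not something stated in the tutorial): the fp16/bf16 weights of a 70B-parameter model alone take roughly 130 GiB, so any step that keeps the full-precision model resident on a single 24 GB 3090 will run out of memory regardless of the allocator setting:

# Quick sanity check on the memory needed just for the weights (assuming fp16/bf16 load)
params = 70e9                # 70B parameters
bytes_per_param = 2          # 2 bytes per parameter in fp16/bf16
print(f"~{params * bytes_per_param / 1024**3:.0f} GiB for the weights alone")  # ≈ 130 GiB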

openxlab: pip install reports the Debian dependency swig as missing, even though packages.txt in the root directory already lists swig

During the pip install step on openxlab, the dependency swig is reported as missing,
even though packages.txt already declares the Debian dependency swig.
My project: https://github.com/jabberwockyang/MedicalReviewAgent/

My requirements.txt:

accelerate>=0.26.1
aiohttp
auto-gptq
bcembedding
beautifulsoup4
datrie==0.8.2
duckduckgo_search
cachetools==5.3.3
einops
faiss-gpu
hanziconv==0.3.2
gradio==4.25.0
langchain>=0.1.12
langchain-community==0.0.38
loguru
lxml_html_clean
openai>=1.0.0
openpyxl
pandas
pdfplumber==0.10.4
pydantic>=1.10.13
PyPDF2==3.0.1
pymupdf
python-docx
pytoml
readability-lxml
redis
requests
ruamel.yaml==0.18.6
ruamel.yaml.clib==0.2.8
scikit-learn
sentence_transformers==2.2.2
StrEnum==0.4.15
# See https://github.com/deanmalmgren/textract/issues/461
# textract @ git+https://github.com/tpoisonooo/textract@master
textract
tiktoken
torch>=2.0.0
transformers>=4.37.0
transformers_stream_generator
unstructured
xgboost==2.0.3
# onnxruntime-gpu==1.17.1
onnxruntime-gpu
shapely==2.0.3
pyclipper==1.3.0.post5
xpinyin==0.7.6
opencv-python==4.9.0.80

My packages.txt:

libgl1-mesa-glx
swig
libpulse-dev

The error portion of the build log:


  Building wheel for pocketsphinx (setup.py): started
  Building wheel for pocketsphinx (setup.py): finished with status 'error'
  error: subprocess-exited-with-error
  
  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> [4 lines of output]
      running bdist_wheel
      running build_ext
      swig -python -modern -Ideps/sphinxbase/include -Ideps/sphinxbase/include/sphinxbase -Ideps/sphinxbase/include/android -Ideps/sphinxbase/swig -outdir sphinxbase -o swig/sphinxbase/ad_wrap.c swig/sphinxbase/ad.i
      error: command 'swig' failed: No such file or directory
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for pocketsphinx
  Running setup.py clean for pocketsphinx
  Building wheel for python-pptx (setup.py): started
  Building wheel for python-pptx (setup.py): finished with status 'done'
  Created wheel for python-pptx: filename=python_pptx-0.6.5-py3-none-any.whl size=237916 sha256=6b403598c7ab61172040c7a30d733d11a730e49dd58e8ee353889672d9354647
  Stored in directory: /home/xlab-app-center/.cache/pip/wheels/9e/be/30/b674cc595ee134f617509231a1bfb635a3f6927b9846735e37
Successfully built datrie hanziconv textract docx2txt EbookLib python-pptx
Failed to build pocketsphinx
ERROR: Could not build wheels for pocketsphinx, which is required to install pyproject.toml-based projects
ERROR: executor failed running [/bin/sh -c /bin/bash -c 'if [[ -f requirements.txt ]]; then pip --timeout=100 install --no-warn-script-location  -r requirements.txt; else echo no requirements.txt ; fi']: runc did not terminate sucessfully
============================== build end ==============================
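One quick way to confirm whether swig is actually available when pip builds pocketsphinx (an illustrative check I am adding here; whether packages.txt gets installed before the pip step runs is an openxlab platform detail I am assuming, not confirming):

# Check whether the swig executable is visible in the build environment
import shutil
import subprocess

path = shutil.which("swig")
print("swig found at:", path)
if path:
    # `swig -version` prints its version banner
    print(subprocess.run(["swig", "-version"], capture_output=True, text=True).stdout)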

Lesson 3 Homework (Class 6)

Discussed in #274

Originally posted by maxchiron January 11, 2024
Lesson 3 Homework (Class 6)

Basic assignment:

Reproduce the course knowledge-base assistant setup (screenshots)
LangChain environment setup
[screenshot]
Data collection
[screenshot]

Build the retrieval QA chain
[screenshot]
Final result
[screenshot]

Weather agent issue: no output

I ran the provided example and got no output, so I started debugging. I traced it to the information below; could someone explain why the actionReturn is a NoAction?

The earlier output looks normal:
[screenshot]

Then the action step fails:
[screenshot]

Debug location:
[screenshot]

It looks like the adapter is the problem.

For example, with the following prompt:
[{'role': 'system', 'content': "你是一个可以调用外部工具的助手,可以使用的工具包括:\n{'GoogleSearch': '一个可以从谷歌搜索结果的API。\n当你需要对于一个特定问题找到简短明了的回答时,可以使用它。\n输入应该是一个搜索查询。\n'}\n如果使用工具请遵循以下格式回复:\n\nThought:思考你当前步骤需要解决什么问题,是否需要使用工具\nAction:工具名称,你的工具必须从 [['GoogleSearch']] 选择\nAction Input:工具输入参数\n\n工具返回按照以下格式回复:\n\nResponse:调用工具后的结果\n\n如果你已经知道了答案,或者你不需要工具,请遵循以下格式回复\n\nThought:给出最终答案的思考过程\nFinal Answer:最终答案\n\n开始!"}, {'role': 'user', 'content': '深圳明天的天气?'}]

The model responds as follows:
"Thought:我需要查询天气预报API来回答这个问题。\nAction:forecast_weather\nAction Input:{'location': '深圳', 'days': 1}}"

Tracing into lagent/actions/action_executor.py, I found that forecast_weather is not a valid action; the program only accepts GoogleSearch, so unless the model emits a GoogleSearch action it can never get a result. By the 4th round of chat in react.py, if the action is NoAction the result is empty, and if no FinishAction arrives within 4 rounds the agent returns a default_response and cannot answer the question.

I think I have debugged this about as far as I can; it probably needs the instructor to take a look at the ModelScope adapter.
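To illustrate the behavior described above, here is a minimal sketch of how an executor that only knows GoogleSearch turns the model's forecast_weather call into a no-op (the names and logic are my simplification for illustration, not the actual lagent source):

# Simplified sketch: an unregistered action name yields no result (the "NoAction" case)
registered_actions = {
    "GoogleSearch": lambda query: f"(search results for {query!r})",
}

def execute(action_name, action_input):
    # Only actions registered when the agent was built are accepted.
    if action_name not in registered_actions:
        return None  # corresponds to NoAction: the tool the model asked for does not exist
    return registered_actions[action_name](action_input)

# The model replied with "forecast_weather", which was never registered, so every
# round returns nothing and the agent eventually falls back to a default response.
print(execute("forecast_weather", {"location": "深圳", "days": 1}))  # -> None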

Chat stops working after 4-bit quantization

InternLM model

Converting directly to a TurboMind model and deploying it works fine:

[screenshots]

First, compute the min/max statistics:
lmdeploy lite calibrate \
  --model /home/zhanghui/models/internlm/internlm-chat-7b/ \
  --calib_dataset "c4" \
  --calib_samples 128 \
  --calib_seqlen 2048 \
  --work_dir ./quant_output
[screenshots]

Then obtain the quantization parameters from the min/max statistics:
lmdeploy lite kv_qparams \
  --work_dir ./quant_output \
  --turbomind_dir workspace/triton_models/weights/ \
  --kv_sym False \
  --num_tp 1
[screenshot]

Then run the 4-bit quantization:
lmdeploy lite auto_awq \
  --model /home/zhanghui/models/internlm/internlm-chat-7b/ \
  --w_bits 4 \
  --w_group_size 128 \
  --work_dir ./quant_output
[screenshots]
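For readers unfamiliar with the parameters above: --w_bits 4 and --w_group_size 128 mean the weights are quantized to 4 bits in groups of 128 values, each group getting its own scale. A rough sketch of that idea (my illustration only, not lmdeploy's actual AWQ implementation, which also searches for activation-aware scales):

import torch

def quantize_groupwise(w, n_bits=4, group_size=128):
    # Split each weight row into groups and quantize every group with its own scale.
    groups = w.reshape(-1, group_size)
    max_abs = groups.abs().max(dim=1, keepdim=True).values
    scale = max_abs / (2 ** (n_bits - 1) - 1)        # symmetric 4-bit range: [-7, 7]
    q = torch.clamp(torch.round(groups / scale), -7, 7)
    return (q * scale).reshape(w.shape)              # dequantized approximation

w = torch.randn(4, 1024)
print("max abs error:", (w - quantize_groupwise(w)).abs().max().item())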

Convert the quantized model to TurboMind format:
lmdeploy convert internlm-chat-7b ./quant_output \
  --model-format awq \
  --group-size 128 \
  --dst_path ./workspace_quant
[screenshots]

lmdeploy chat turbomind ./workspace_quant
[screenshot]

Chat no longer works!

lmdeploy chat turbomind ./workspace_quant --model-format awq

[screenshot]

Still the same.

For the full procedure, see https://zhuanlan.zhihu.com/p/678960135

Request: pre-install glibc-2.29 in the openXLab Docker image

When deploying an application to the openXLab GPU platform, I found that the default Docker image lacks glibc 2.29.
Since the tokenizers library used by newer versions of transformers requires GLIBC 2.29, please pre-install it.

The build script is as follows:

wget -c https://ftp.gnu.org/gnu/glibc/glibc-2.29.tar.gz
tar -zxvf glibc-2.29.tar.gz
mkdir glibc-2.29/build
cd glibc-2.29/build
../configure --prefix=/opt/glibc
make
make install

Request: pre-install sqlite3 >= 3.35.0 in the openXLab Docker image

When loading the Chroma vector database, the following error appears:

File "/usr/local/share/python/.pyenv/versions/3.9.16/lib/python3.9/site-packages/chromadb/init.py", line 79, in
raise RuntimeError(
RuntimeError: Your system has an unsupported version of sqlite3. Chroma requires sqlite3 >= 3.35.0.

The Docker image needs sqlite3 >= 3.35.0 pre-installed.
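As a possible user-level workaround until the image is updated (this is the substitution described in Chroma's troubleshooting guide, assuming pip install pysqlite3-binary is allowed on the platform), the standard-library sqlite3 module can be swapped out before chromadb is imported:

# Workaround sketch: replace the stdlib sqlite3 with pysqlite3-binary's newer build.
# Requires: pip install pysqlite3-binary
import sys
import pysqlite3  # ships a recent SQLite, independent of the system library

sys.modules["sqlite3"] = sys.modules.pop("pysqlite3")

import chromadb  # now sees sqlite3 >= 3.35.0
print(chromadb.__version__)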

camp2: lmdeploy llava returns an empty string for high-resolution input images

Relevant tutorial section: https://github.com/InternLM/Tutorial/blob/camp2/lmdeploy/README.md#61-%E4%BD%BF%E7%94%A8lmdeploy%E8%BF%90%E8%A1%8C%E8%A7%86%E8%A7%89%E5%A4%9A%E6%A8%A1%E6%80%81%E5%A4%A7%E6%A8%A1%E5%9E%8Bllava

When a 1920×1080 image is provided, no text is returned.
[screenshot]
Printing the response shows that text is empty.
[screenshot]

The workaround is to manually reduce the resolution:

import gradio as gr
from lmdeploy import pipeline


# pipe = pipeline('liuhaotian/llava-v1.6-vicuna-7b')  # use this line when not running on the course dev machine
pipe = pipeline('/share/new_models/liuhaotian/llava-v1.6-vicuna-7b')

def model(image, text):
    if image is None:
        return [(text, "请上传一张图片。")]
    else:
        width, height = image.size
        print(f"width = {width}, height = {height}")

        # Resize so the longer side is at most 256 pixels
        if max(width, height) > 256:
            ratio = max(width, height) / 256
            n_width = int(width / ratio)
            n_height = int(height / ratio)
            print(f"new width = {n_width}, new height = {n_height}")
            image = image.resize((n_width, n_height))

        response = pipe((text, image)).text
        print(f"response: {response}")
        return [(text, response)]

demo = gr.Interface(fn=model, inputs=[gr.Image(type="pil"), gr.Textbox()], outputs=gr.Chatbot())
demo.launch()   

After this change, text is returned normally.
[screenshots]

Is lmdeploy[all]==0.3.0 only installable on Linux?

Hi, can lmdeploy[all]==0.3.0 only be installed on Linux? On Windows I get the error below, and every triton build I could find online is Linux-only.

(lmdeploy) PS D:\pythonworkspace\MyPET> pip install lmdeploy[all]==0.3.0
Looking in indexes: https://mirrors.aliyun.com/pypi/simple
Collecting lmdeploy==0.3.0 (from lmdeploy[all]==0.3.0)
Downloading https://mirrors.aliyun.com/pypi/packages/2e/7c/b616d7485c4b81b2fe0a6a734548697d25293d5468210aafbf371d24a790/lmdeploy-0.3.0-cp310-cp310-win_amd64.whl (56.3 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 56.3/56.3 MB 5.3 MB/s eta 0:00:00
Requirement already satisfied: fastapi in d:\program\anaconda\envs\lmdeploy\lib\site-packages (from lmdeploy==0.3.0->lmdeploy[all]==0.3.0) (0.110.1)
Requirement already satisfied: fire in d:\program\anaconda\envs\lmdeploy\lib\site-packages (from lmdeploy==0.3.0->lmdeploy[all]==0.3.0) (0.6.0)
Requirement already satisfied: mmengine-lite in d:\program\anaconda\envs\lmdeploy\lib\site-packages (from lmdeploy==0.3.0->lmdeploy[all]==0.3.0) (0.10.3)
Requirement already satisfied: numpy in d:\program\anaconda\envs\lmdeploy\lib\site-packages (from lmdeploy==0.3.0->lmdeploy[all]==0.3.0) (1.26.4)
Collecting peft<=0.9.0 (from lmdeploy==0.3.0->lmdeploy[all]==0.3.0)
Downloading https://mirrors.aliyun.com/pypi/packages/08/87/3e7eb34ac06d3f4ac72e2302e9e69bef12247a8a627c59a4d8a498135727/peft-0.9.0-py3-none-any.whl (190 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 190.9/190.9 kB 3.9 MB/s eta 0:00:00
Requirement already satisfied: pillow in d:\program\anaconda\envs\lmdeploy\lib\site-packages (from lmdeploy==0.3.0->lmdeploy[all]==0.3.0) (10.3.0)
Requirement already satisfied: protobuf in d:\program\anaconda\envs\lmdeploy\lib\site-packages (from lmdeploy==0.3.0->lmdeploy[all]==0.3.0) (4.25.3)
Requirement already satisfied: pydantic>2.0.0 in d:\program\anaconda\envs\lmdeploy\lib\site-packages (from lmdeploy==0.3.0->lmdeploy[all]==0.3.0) (2.6.4)
Requirement already satisfied: pynvml in d:\program\anaconda\envs\lmdeploy\lib\site-packages (from lmdeploy==0.3.0->lmdeploy[all]==0.3.0) (11.5.0)
Requirement already satisfied: safetensors in d:\program\anaconda\envs\lmdeploy\lib\site-packages (from lmdeploy==0.3.0->lmdeploy[all]==0.3.0) (0.4.2)
Requirement already satisfied: sentencepiece in d:\program\anaconda\envs\lmdeploy\lib\site-packages (from lmdeploy==0.3.0->lmdeploy[all]==0.3.0) (0.2.0)
Requirement already satisfied: shortuuid in d:\program\anaconda\envs\lmdeploy\lib\site-packages (from lmdeploy==0.3.0->lmdeploy[all]==0.3.0) (1.0.13)
Requirement already satisfied: tiktoken in d:\program\anaconda\envs\lmdeploy\lib\site-packages (from lmdeploy==0.3.0->lmdeploy[all]==0.3.0) (0.6.0)
Collecting torch<=2.1.2,>=2.0.0 (from lmdeploy==0.3.0->lmdeploy[all]==0.3.0)
Downloading https://mirrors.aliyun.com/pypi/packages/16/bf/2ba0f0f7c07b9a14c027e181e44c58824e13f7352607ed32db18321599a2/torch-2.1.2-cp310-cp310-win_amd64.whl (192.3 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 192.3/192.3 MB 7.3 MB/s eta 0:00:00
Collecting transformers<=4.38.2,>=4.33.0 (from lmdeploy==0.3.0->lmdeploy[all]==0.3.0)
Downloading https://mirrors.aliyun.com/pypi/packages/b6/4d/fbe6d89fde59d8107f0a02816c4ac4542a8f9a85559fdf33c68282affcc1/transformers-4.38.2-py3-none-any.whl (8.5 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 8.5/8.5 MB 7.7 MB/s eta 0:00:00
INFO: pip is looking at multiple versions of lmdeploy to determine which version is compatible with other requirements. This could take a while.
ERROR: Could not find a version that satisfies the requirement triton<=2.2.0,>=2.1.0 (from lmdeploy) (from versions: none)
ERROR: No matching distribution found for triton<=2.2.0,>=2.1.0

Question about the lmdeploy tutorial: how do KV Cache quantization and W4A16 quantization stack?

The quantization part of the lmdeploy tutorial introduces KV Cache quantization and W4A16 quantization separately, and both end up producing a TurboMind-format model.
But how can the two be combined, e.g. applying W4A16 quantization on top of the KV Cache quantization result?
Neither lmdeploy lite calibrate nor lmdeploy lite auto_awq accepts a TurboMind-format model as input, so how should the two be stacked?

Also, if I want to share the quantized model with others, how can a TurboMind-format model be converted back to Hugging Face format?
