thudm / codegeex2 Goto Github PK

View Code? Open in Web Editor NEW

7.1K 61.0 500.0 1.79 MB

CodeGeeX2: A More Powerful Multilingual Code Generation Model

Home Page: https://codegeex.cn

License: Apache License 2.0

Python 91.59% Shell 8.41%

code code-generation pretrained-models tool

codegeex2's Introduction

🏠 主页｜🛠 插件 VS Code, Jetbrains｜🤗 模型下载｜📄 论文｜👋 加入微信开发者交流群

Read this in English
日本語で読む
Lire en Français

CodeGeeX2: 更强大的多语言代码生成模型

CodeGeeX2 是多语言代码生成模型 CodeGeeX (KDD’23) 的第二代模型。不同于一代 CodeGeeX（完全在国产华为昇腾芯片平台训练），CodeGeeX2 是基于 ChatGLM2 架构加入代码预训练实现，得益于 ChatGLM2 的更优性能，CodeGeeX2 在多项指标上取得性能提升（+107% > CodeGeeX；仅60亿参数即超过150亿参数的 StarCoder-15B 近10%），更多特性包括：

更强大的代码能力：基于 ChatGLM2-6B 基座语言模型，CodeGeeX2-6B 进一步经过了 600B 代码数据预训练，相比一代模型，在代码能力上全面提升，HumanEval-X 评测集的六种编程语言均大幅提升 (Python +57%, C++ +71%, Java +54%, JavaScript +83%, Go +56%, Rust +321%)，在Python上达到 35.9% 的 Pass@1 一次通过率，超越规模更大的 StarCoder-15B。
更优秀的模型特性：继承 ChatGLM2-6B 模型特性，CodeGeeX2-6B 更好支持中英文输入，支持最大 8192 序列长度，推理速度较一代 CodeGeeX-13B 大幅提升，量化后仅需6GB显存即可运行，支持轻量级本地化部署。
更全面的AI编程助手：CodeGeeX插件（VS Code, Jetbrains）后端升级，支持超过100种编程语言，新增上下文补全、跨文件补全等实用功能。结合 Ask CodeGeeX 交互式AI编程助手，支持中英文对话解决各种编程问题，包括且不限于代码解释、代码翻译、代码纠错、文档生成等，帮助程序员更高效开发。
更开放的协议：CodeGeeX2-6B 权重对学术研究完全开放，填写登记表申请商业使用。

使用教程

AI编程助手

我们开发了支持 VS Code、 IntelliJ IDEA、PyCharm、GoLand、WebStorm、Android Studio 等IDE的 CodeGeeX 插件。在插件中，可以更直接地体验到 CodeGeeX2 模型在代码生成与补全、添加注释、代码翻译及技术问答方面的能力为开发效率带来的提升。欢迎在IDE中下载 CodeGeeX 插件获得更加全面的AI编程体验，详情见CodeGeeX主页。

快速开始

使用`transformers`快速调用CodeGeeX2-6B：

from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("THUDM/codegeex2-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/codegeex2-6b", trust_remote_code=True, device='cuda')
model = model.eval()

# remember adding a language tag for better performance
prompt = "# language: Python\n# write a bubble sort function\n"
inputs = tokenizer.encode(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_length=256, top_k=1)
response = tokenizer.decode(outputs[0])

>>> print(response)
# language: Python
# write a bubble sort function


def bubble_sort(list):
    for i in range(len(list) - 1):
        for j in range(len(list) - 1):
            if list[j] > list[j + 1]:
                list[j], list[j + 1] = list[j + 1], list[j]
    return list


print(bubble_sort([5, 2, 1, 8, 4]))

启动 Gradio DEMO：

python ./demo/run_demo.py

usage: run_demo.py [-h] [--model-path MODEL_PATH] [--example-path EXAMPLE_PATH] [--quantize QUANTIZE]
                   [--chatglm-cpp] [--fastllm] [--n-gpus N_GPUS] [--gpu GPU] [--cpu] [--auth] [--username yourname]
                   [--password yourpassword]
                   [--port PORT] [--listen ADDRESS]

# 若要启用身份验证，请先启用--auth，然后定义--username与--password，如：
python run_demo.py --auth --username user --password password  # 若要监听所有地址请指定 --listen 0.0.0.0

支持使用 ChatGLM.cpp 量化推理加速：

python ./demo/run_demo.py --quantize 4 --chatglm-cpp

启动FAST API:

python ./demo/fastapicpu.py
usage: fastapicpu.py [-h] [--model-path MODEL_PATH] [--listen ADDRESS] [--port PORT] [--workders NUM] [--cpu] [--half] [--quantize QUANTIZE] [--chatglm-cpp]
# --cpu启用cpu --half启用.half()

支持使用 ChatGLM.cpp 量化推理加速，同样添加 --quantize 4 --chatglm-cpp 参数即可。

API使用示例

curl -X POST "http://127.0.0.1:7860" \
    -H 'Content-Type: application/json' \
    -d '{"lang": "Python", "prompt": "# Write a quick sort function"}'

❗️请注意：

CodeGeeX2-6B 是一个基座代码生成模型，不具备聊天能力。请前往插件中体验更全面的 Ask CodeGeeX 聊天功能。
在使用 CodeGeeX2-6B 的补全功能时，输入prompt需要遵循特定的格式以获得最好的效果。比如需要在开头加入编程语言标签（# language: Python，请查看完整语言列表），以注释的形式写prompt等。参考run_demo.py中的处理。
如果显卡不支持bfloat16格式，将会输出错误的内容，需要将模型转换成float16格式：
```
model = AutoModel.from_pretrained("THUDM/codegeex2-6b", trust_remote_code=True).half().cuda()
```

如果需要使用多显卡加载模型,可以将以下代码：

tokenizer = AutoTokenizer.from_pretrained("THUDM/codegeex2-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/codegeex2-6b", trust_remote_code=True, device='cuda')
model = model.eval()

替换为

def get_model():
    tokenizer = AutoTokenizer.from_pretrained("THUDM/codegeex2-6b", trust_remote_code=True)
    from gpus import load_model_on_gpus
    # gpus文件在demo文件夹中
    model = load_model_on_gpus("THUDM/codegeex2-6b", num_gpus=2)
    model = model.eval()
    return tokenizer, model

tokenizer, model = get_model()

代码能力评测

CodeGeeX2 作为一个多语言代码生成基座模型，代码能力较上一代大幅提升，以下是在 HumanEval，HumanEval-X, DS1000 基准上的评测结果（评价指标 Pass@k 定义与论文中一致）：

HumanEval (Pass@1,10,100)

Model	Pass@1	Pass@10	Pass@100
CodeGen-16B-multi	19.2	34.6	55.2
CodeGeeX-13B	22.9	39.6	60.9
Codex-12B	28.8	46.8	72.3
CodeT5Plus-16B-mono	30.9	51.6	76.7
Code-Cushman-001	33.5	54.3	77.4
LLaMA-65B	23.7	-	79.3
LLaMA2-70B	29.9	-	-
CodeGen2.5-7B-mono	33.4	58.4	82.7
StarCoder-15B	33.2	61.0	84.7
CodeGeeX2-6B	35.9	62.6	88.3

Pass@1 使用 n=20, t=0.2, top_p=0.95；Pass@10,Pass@100 使用 n=200, t=0.8, top_p=0.95。

HumanEval-X (Pass@1)

Model	Python	C++	Java	JavaScript	Go	Rust	Overall
CodeGen-16B-multi	19.2	18.1	15.0	18.4	13.0	1.8	14.2
CodeGeeX-13B	22.9	17.1	20.0	17.6	14.4	4.3	16.0
Replit-code-v1-3B	22.0	20.1	20.1	20.1	12.2	8.6	17.2
CodeGen2.5-7B-multi	30.6	24.3	29.0	27.5	18.9	20.1	25.1
StarCoder-15B	35.5	28.2	31.5	33.2	21.3	17.8	27.9
CodeGeeX2-6B	35.9	29.3	30.8	32.2	22.5	18.1	28.1

Pass@1 使用 n=20, t=0.2, top_p=0.95。

以上结果可使用脚本scripts/run_humanevalx.sh复现。环境配置和说明参见评测环境。

DS1000 (Pass@1)

Model	Matplotlib	Numpy	Pandas	Pytorch	SciPy	Scikit-learn	TensorFlow	Overall
# Samples	155	220	291	68	106	115	45	1000
CodeGen-16B-Mono	31.7	10.9	3.4	7.0	9.0	10.8	15.2	11.7
code-cushman-001	40.7	21.8	7.9	12.4	11.3	18.0	12.2	18.1
Codex-001	41.8	26.6	9.4	9.7	15.0	18.5	17.2	20.2
CodeGeeX2-6B	40.5	25.5	14.5	17.3	19.3	24.0	23.0	23.1
StarCoder-15B	51.7	29.7	11.4	21.4	20.2	29.5	24.5	26.0
Codex-002	57.0	43.1	26.5	41.8	31.8	44.8	39.3	39.2

Pass@1 使用 n=40, t=0.2, top_p=0.5。

以上结果可使用DS1000评测代码复现。

量化推理性能

CodeGeeX2 与上一代相比，对部署更加友好。得益于使用 Multi-Query Attention 和 Flash Attention，推理速度更快，且量化后仅需6GB显存即可运行：

量化

Model	FP16/BF16	INT8	INT4
CodeGeeX-13B	26.9 GB	14.7 GB	-
CodeGeeX2-6B	13.1 GB	8.2 GB	5.5 GB

基于 PyTorch 2.0 测试，利用torch.nn.functional.scaled_dot_product_attention实现高效的 Attention 计算。

推理

Model	推理速度 (字符/秒)
CodeGeeX-13B	32
CodeGeeX2-6B	94

batch_size=1, max_length=2048，均使用加速框架，测试硬件为GeForce RTX-3090。

协议

本仓库的代码依照 Apache-2.0 协议开源，模型的权重的使用则需要遵循 Model License。CodeGeeX2-6B 权重对学术研究完全开放，填写登记表申请商业使用。

引用

如果觉得我们的工作有帮助，欢迎引用以下论文：

@inproceedings{zheng2023codegeex,
      title={CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Evaluations on HumanEval-X},
      author={Qinkai Zheng and Xiao Xia and Xu Zou and Yuxiao Dong and Shan Wang and Yufei Xue and Zihan Wang and Lei Shen and Andi Wang and Yang Li and Teng Su and Zhilin Yang and Jie Tang},
      booktitle={KDD},
      year={2023}
}

codegeex2's People

Contributors

Stargazers

Watchers

Forkers

geekcheng cifangyiquan audiebant linjie830914 linsnowx newlxj magicwang1111 zzzzzzzztx xinxiangbobby 44510 expert68 zengxishenggmail zppcgithub simson2010 pfxjacky shendanjun ai-awe zirenlegend mjpcasey jiutian12 gaoxiaoen einzbernvl zxc524580210 petercao mikejin5c jangocheng kioco jerryyin777 yexseven cnsdytzy eltociear laurenhaniza raisyatapriska azurazahraya badrinarania winning1120xx ferdaamalia dmn-tsk sasha010203 vital121 joonedaia1 joonemaman1 lyonru29 jaygith dxin-code hhy5277 jiancheng-ai pterameta songzhaoliang arechen asdhashd111 yibit lu-lucifer linguo123 ai-chef chayedan77 ahlfors guorunfa irtb kyleclk nomore22 yytwgithub redlegenddev dst1213 devenlu liangofthechen dxin123456 whnet rinaloving huggingtech codingonion itsharex guoliang88 nupri09 vishnu-c-r yab hitech777 hehehe159 piaolingxue bawa74090 pandong83 lpy1 weicaijiang l5276261 rizi960 donjuanplatinum neverstoplearn ghisarenjani anjarsanirena gmpdtd95 knightcn1983 xjbclz liukunda iambigfaner joncv weihangtan hipoooop ddawx123 jacky68147527 leesinebee

codegeex2's Issues

idea插件使用BUG

在一个项目时正常
在多个项目使用时回出现选中的代码出现在别的项目的codegeex对话框中的问题

场景
1.开启项目A，打开codegeek，选中代码，选中代码会出现在项目A对应的codegeex对话框中
2.开启项目B，打开codegeek，选中代码，选中代码会出现在项目B对应的codegeex对话框中
3.回到项目A，选中代码，选中代码，选中代码会出现在项目B对应的codegeex对话框中
4.回到项目B，选中代码，选中代码，选中代码会出现在项目B对应的codegeex对话框中
之后无论在哪里选中代码选中的代码都会出现在后打开的项目中的对话框中，包括ask translation 等功能

Is there a problem with test_string under `evaluation/evaluation.py`

for example, the java unittest complete code is like below under evaluation/evaluation.py

        elif language_type == "java":
            test_string = prompt + code + "\n" + test

but it seems that codegeex2 generate code (ref to run_demo.py) has including the prompt. So the snippet should be

        elif language_type == "java":
            test_string = code + "\n" + test

文档上写mac 可以使用int4量化后的模型，但是实际并不行

量化依赖cuda，但是mac 上没有这个。
难受-_-

做指令微调是否有一些最佳实践？

想基于CodeGeeX2做一些内部代码风格的微调，是否能给一些建议，诸如：数据格式、数据内容、微调方式和参数等。

[Help]2023.08.03 请问CodeGeeX2什么时候出微调的功能？

提问

Q1：现在(2023.08.03)是否支持微调？
Q2：如果不支持，大概计划什么时候出微调功能？

请问如何使用上下文补全能力？

文档中介绍CodeGeex2支持上下文补全，即FIM。示例中测试Humaneva是CasualLM，请问要测试FIM需要如何构造Input_ids、attention_mask？能否给出示例代码？

请问关于sql生成的prompt格式应该是怎么样的？

如题，想要测试sql生成的能力，推荐以什么样形式的prompt把表结构（表名称，字段，字段类型）输入给模型？
例如有一张财产表（property）,需要找到钱最多的人，如下输入会输出一些无关信息

# -- language: SQL\n# table name: property_info. columns: id, name, bank_number, deposit, gender. According to the table infomation above, find the people with most money

输出
# Write your MySQL query statement below\nselect name, bank_number, deposit from property_info where gender = 'male' order by deposit desc limit 1;

想咨询一下训练时的prompt格式时怎样的？

CodeGeeX2 plugin

CodeGeeX2 VSCode 插件还没开放吗？搜了下好像只有CodeGeeX

未来会考虑支持neovim插件吗？

demo报错Connection timed out

python ./demo/run_demo.py后，运行2到3个请求后台就报错了。
Running on local URL: http://0.0.0.0:8032 *** Failed to connect to ec2.gradio.app:22: [Errno 110] Connection timed out
run_demo.py修改如下：
demo.queue().launch(share=True, inbrowser=True, server_name="0.0.0.0", server_port=8032)

12G显存执行失败直接python ./demo/run_demo.py 什么都没有改

Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:13<00:00, 1.98s/it]
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ X:\post\InvokeAI-installer-v2.3.5.post2\pythonProject\code1\demo\run_demo.py:11 in │
│ │
│ 8 from transformers import AutoTokenizer, AutoModel │
│ 9 │
│ 10 tokenizer = AutoTokenizer.from_pretrained("THUDM/codegeex2-6b", trust_remote_code=True) │
│ ❱ 11 model = AutoModel.from_pretrained("THUDM/codegeex2-6b", trust_remote_code=True).to('cuda │
│ 12 model = model.eval() │
│ 13 │
│ 14 examples = [] │
│ │
│ C:\Python311\Lib\site-packages\transformers\modeling_utils.py:1902 in to │
│ │
│ 1899 │ │ │ │ " model has already been set to the correct devices and casted to the co │
│ 1900 │ │ │ ) │
│ 1901 │ │ else: │
│ ❱ 1902 │ │ │ return super().to(*args, **kwargs) │
│ 1903 │ │
│ 1904 │ def half(self, *args): │
│ 1905 │ │ # Checks if the model has been loaded in 8-bit │
│ │
│ C:\Python311\Lib\site-packages\torch\nn\modules\module.py:1152 in to │
│ │
│ 1149 │ │ │ │ │ │ │ non_blocking, memory_format=convert_to_format) │
│ 1150 │ │ │ return t.to(device, dtype if t.is_floating_point() or t.is_complex() else No │
│ 1151 │ │ │
│ ❱ 1152 │ │ return self._apply(convert) │
│ 1153 │ │
│ 1154 │ def register_full_backward_pre_hook( │
│ 1155 │ │ self, │
│ │
│ C:\Python311\Lib\site-packages\torch\nn\modules\module.py:802 in _apply │
│ │
│ 799 │ def _apply(self, fn, recurse=True): │
│ 800 │ │ if recurse: │
│ 801 │ │ │ for module in self.children(): │
│ ❱ 802 │ │ │ │ module._apply(fn) │
│ 803 │ │ │
│ 804 │ │ def compute_should_use_set_data(tensor, tensor_applied): │
│ 805 │ │ │ if torch._has_compatible_shallow_copy_type(tensor, tensor_applied): │
│ │
│ C:\Python311\Lib\site-packages\torch\nn\modules\module.py:802 in _apply │
│ │
│ 799 │ def _apply(self, fn, recurse=True): │
│ 800 │ │ if recurse: │
│ 801 │ │ │ for module in self.children(): │
│ ❱ 802 │ │ │ │ module._apply(fn) │
│ 803 │ │ │
│ 804 │ │ def compute_should_use_set_data(tensor, tensor_applied): │
│ 805 │ │ │ if torch._has_compatible_shallow_copy_type(tensor, tensor_applied): │
│ │
│ C:\Python311\Lib\site-packages\torch\nn\modules\module.py:825 in _apply │
│ │
│ 822 │ │ │ # track autograd history of param_applied, so we have to use │
│ 823 │ │ │ # with torch.no_grad(): │
│ 824 │ │ │ with torch.no_grad(): │
│ ❱ 825 │ │ │ │ param_applied = fn(param) │
│ 826 │ │ │ should_use_set_data = compute_should_use_set_data(param, param_applied) │
│ 827 │ │ │ if should_use_set_data: │
│ 828 │ │ │ │ param.data = param_applied │
│ │
│ C:\Python311\Lib\site-packages\torch\nn\modules\module.py:1150 in convert │
│ │
│ 1147 │ │ │ if convert_to_format is not None and t.dim() in (4, 5): │
│ 1148 │ │ │ │ return t.to(device, dtype if t.is_floating_point() or t.is_complex() els │
│ 1149 │ │ │ │ │ │ │ non_blocking, memory_format=convert_to_format) │
│ ❱ 1150 │ │ │ return t.to(device, dtype if t.is_floating_point() or t.is_complex() else No │
│ 1151 │ │ │
│ 1152 │ │ return self._apply(convert) │
│ 1153 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
OutOfMemoryError: CUDA out of memory. Tried to allocate 508.00 MiB. GPU 0 has a total capacty of 12.00 GiB of which 0 bytes is free. Of the allocated memory 11.16 GiB is allocated by PyTorch, and 1.31 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting
max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

run_demo.py程序在Windows环境下报错，example_inputs.jsonl无法正常加载。

运行环境：Windows11 x64，Python 3.11，Torch 2.0 cuda 118
执行：python run_demo.py后，模型加载正常，读取example_inputs.jsonl报编码错误：

python my_web_demo.py
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 7/7 [00:07<00:00, 1.10s/it]
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ D:\AITest\CodeGeeX2\my_web_demo.py:17 in │
│ │
│ 14 │
│ 15 examples = [] │
│ 16 with open(os.path.join(os.path.split(os.path.realpath(file))[0], "CodeGeeX2-example_ │
│ ❱ 17 │ for line in f: │
│ 18 │ │ examples.append(list(json.loads(line).values())) │
│ 19 │
│ 20 LANGUAGE_TAG = { │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
UnicodeDecodeError: 'gbk' codec can't decode byte 0xae in position 81: illegal multibyte sequence

查阅相关资料后，在读取example_inputs.jsonl文件语句，添加：encoding='utf-8'，如下：
........"example_inputs.jsonl"), "r", encoding='utf-8') as f:
程序读取并执行正常，中英文提问示例也显示正常。

但这个时候，点击英文示例，提交后，模型推理输出正常，没问题。
点击中文的示例，提交后，模型推理后出现混乱；而依据示例，在输入框，重复示例内容直接手工中文输入，则正常。
怀疑是通过点击示例提交并传输给模型的内容，存在编码转换问题，估计Unix、Linux平台没事，Windows平台有这问题，请大侠百忙中给分析一下，给些好的解决建议，谢谢！

模型支持代码纠错吗

codegeex2模型支持代码纠错吗，是什么样结构的prompt呢

不知道为什么无法生成代码

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.
Setting pad_token_id to eos_token_id:2 for open-end generation.

如何使用cpu运行

我像chatglm2-6b 那样使用 .float() 加载模型，但是失败了。请问 CodeGeeX2 是否可以 cpu 运行，如何操作？

RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver
from http://www.nvidia.com/Download/index.aspx

1 有int4量化后的模型资源吗

vs code插件无法生成代码和输入任何内容，一直转圈

本来正常用着，突然就无法生成了，一直转圈，重启vscode也不行，是什么原因呢？

代码还是模型抽风了

自问自答，输出不断

Android Studio Giraffe | 2022.3.1 不兼容

java.lang.NullPointerException: Cannot invoke "com.intellij.ui.jcef.JBCefBrowser.getCefBrowser()" because "ai.codegeex.plugin.toolWindow.AskCodegeexContent.jbCefBrowser" is null
at ai.codegeex.plugin.toolWindow.AskCodegeexContent.setSelectionChanged(AskCodegeexContent.java:744)
at ai.codegeex.plugin.editor.CodegeexEditorListener$CodegeexSelectionListener.selectionChanged(CodegeexEditorListener.java:134)
at com.intellij.openapi.editor.impl.SelectionModelImpl.broadcastSelectionEvent(SelectionModelImpl.java:78)
at com.intellij.openapi.editor.impl.SelectionModelImpl.fireSelectionChanged(SelectionModelImpl.java:72)
at com.intellij.openapi.editor.impl.CaretImpl.lambda$removeSelection$7(CaretImpl.java:1204)
at com.intellij.openapi.editor.impl.CaretModelImpl.doWithCaretMerging(CaretModelImpl.java:419)
at com.intellij.openapi.editor.impl.CaretImpl.removeSelection(CaretImpl.java:1194)
at com.intellij.openapi.editor.SelectionModel.removeSelection(SelectionModel.java:202)
at com.intellij.openapi.editor.SelectionModel.removeSelection(SelectionModel.java:193)
at com.intellij.openapi.editor.impl.EditorImpl.processMouseReleased(EditorImpl.java:2412)
at com.intellij.openapi.editor.impl.EditorImpl$MyMouseAdapter.lambda$runMouseReleasedCommand$1(EditorImpl.java:4002)
at com.intellij.openapi.command.impl.CoreCommandProcessor.executeCommand(CoreCommandProcessor.java:219)
at com.intellij.openapi.command.impl.CoreCommandProcessor.executeCommand(CoreCommandProcessor.java:174)
at com.intellij.openapi.editor.impl.EditorImpl$MyMouseAdapter.runMouseReleasedCommand(EditorImpl.java:4004)
at com.intellij.openapi.editor.impl.EditorImpl$MyMouseAdapter.mouseReleased(EditorImpl.java:3887)
at java.desktop/java.awt.Component.processMouseEvent(Unknown Source)
at java.desktop/javax.swing.JComponent.processMouseEvent(Unknown Source)
at java.desktop/java.awt.Component.processEvent(Unknown Source)
at java.desktop/java.awt.Container.processEvent(Unknown Source)
at java.desktop/java.awt.Component.dispatchEventImpl(Unknown Source)
at java.desktop/java.awt.Container.dispatchEventImpl(Unknown Source)
at java.desktop/java.awt.Component.dispatchEvent(Unknown Source)
at java.desktop/java.awt.LightweightDispatcher.retargetMouseEvent(Unknown Source)
at java.desktop/java.awt.LightweightDispatcher.processMouseEvent(Unknown Source)
at java.desktop/java.awt.LightweightDispatcher.dispatchEvent(Unknown Source)
at java.desktop/java.awt.Container.dispatchEventImpl(Unknown Source)
at java.desktop/java.awt.Window.dispatchEventImpl(Unknown Source)
at java.desktop/java.awt.Component.dispatchEvent(Unknown Source)
at java.desktop/java.awt.EventQueue.dispatchEventImpl(Unknown Source)
at java.desktop/java.awt.EventQueue$3.run(Unknown Source)
at java.desktop/java.awt.EventQueue$3.run(Unknown Source)
at java.base/java.security.AccessController.doPrivileged(Unknown Source)
at java.base/java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(Unknown Source)
at java.base/java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(Unknown Source)
at java.desktop/java.awt.EventQueue$4.run(Unknown Source)
at java.desktop/java.awt.EventQueue$4.run(Unknown Source)
at java.base/java.security.AccessController.doPrivileged(Unknown Source)
at java.base/java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(Unknown Source)
at java.desktop/java.awt.EventQueue.dispatchEvent(Unknown Source)
at com.intellij.ide.IdeEventQueue.defaultDispatchEvent(IdeEventQueue.java:909)
at com.intellij.ide.IdeEventQueue.dispatchMouseEvent(IdeEventQueue.java:831)
at com.intellij.ide.IdeEventQueue._dispatchEvent(IdeEventQueue.java:753)
at com.intellij.ide.IdeEventQueue.lambda$dispatchEvent$5(IdeEventQueue.java:437)
at com.intellij.openapi.progress.impl.CoreProgressManager.computePrioritized(CoreProgressManager.java:787)
at com.intellij.ide.IdeEventQueue.lambda$dispatchEvent$6(IdeEventQueue.java:436)
at com.intellij.openapi.application.TransactionGuardImpl.performActivity(TransactionGuardImpl.java:113)
at com.intellij.ide.IdeEventQueue.performActivity(IdeEventQueue.java:615)
at com.intellij.ide.IdeEventQueue.lambda$dispatchEvent$7(IdeEventQueue.java:434)
at com.intellij.openapi.application.impl.ApplicationImpl.runIntendedWriteActionOnCurrentThread(ApplicationImpl.java:838)
at com.intellij.ide.IdeEventQueue.dispatchEvent(IdeEventQueue.java:480)
at java.desktop/java.awt.EventDispatchThread.pumpOneEventForFilters(Unknown Source)
at java.desktop/java.awt.EventDispatchThread.pumpEventsForFilter(Unknown Source)
at java.desktop/java.awt.EventDispatchThread.pumpEventsForHierarchy(Unknown Source)
at java.desktop/java.awt.EventDispatchThread.pumpEvents(Unknown Source)
at java.desktop/java.awt.EventDispatchThread.pumpEvents(Unknown Source)
at java.desktop/java.awt.EventDispatchThread.run(Unknown Source)

插件可以调用本地的codegeex2-6b模型吗？

试了下idea里的插件，必须要登陆才能用？可以直接访问本地的模型吗？可能是插件还没开发这个功能，未来会做吗？

是否可以将几个主流开发语言分别做一个更小的模型

CodeGeeX2-6b 虽然体积小了一些，但同时支持多种语言感觉还是有点浪费，是否可以按某个开发语言单独训练一个。
这样可以更小一些，对配置的要求可能会更低一些吧。
作为笔记本使用者，目前还没有换电脑的想法，希望可以体验到更小更快的代码生成模型

请问已经公布离线部署与finetune的教程了么？

请问有做过sft版本的对比吗

目前模型是基于 ChatGLM2-6B 基座语言模型继续代码预训练的，那么目前模型对指令的理解如何呢，是不是做instruction tunning会更好呢。

一直显示内部错误

我用的是mac vscode插件，远程ssh登陆linux系统，在login后一直显示“内部错误”。有大神知道这是什么原因吗？

请问能支持eclipse吗

请问训练代码开源吗？想训练写业务代码该如何实现。

RuntimeError: Unknown platform: darwin

envs:

Mac studio M2pro
Python 3.11.4

code

from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("THUDM/codegeex2-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/codegeex2-6b", trust_remote_code=True)
model = model.eval()

# remember adding a language tag for better performance
prompt = "# language: Python\n# write a bubble sort function\n"
inputs = tokenizer.encode(prompt, return_tensors="pt")
outputs = model.generate(inputs, max_length=256, top_k=1)
response = tokenizer.decode(outputs[0])
print(response)

error

  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "ChatGLM2-6B-main/venv/lib/python3.11/site-packages/cpm_kernels/__init__.py", line 1, in <module>
    from . import library
  File "ChatGLM2-6B-main/venv/lib/python3.11/site-packages/cpm_kernels/library/__init__.py", line 1, in <module>
    from . import nvrtc
  File "ChatGLM2-6B-main/venv/lib/python3.11/site-packages/cpm_kernels/library/nvrtc.py", line 5, in <module>
    nvrtc = Lib("nvrtc")
            ^^^^^^^^^^^^
  File "ChatGLM2-6B-main/venv/lib/python3.11/site-packages/cpm_kernels/library/base.py", line 59, in __init__
    raise RuntimeError("Unknown platform: %s" % sys.platform)
RuntimeError: Unknown platform: darwin

需要加强C#语言支持

CodeGeeX2没有描述对C#语言的提升程度，只描述了对C++、Python、Java、JavaScript、Go、Rust有许多加强，希望能同时加入对C#语言的能力描述。

int4模型fastapi demo生成的代码不完整。

GPU A10
启动方式 python3 demo/fastapicpu.py

## cURL
curl -X "POST" "http://<host>:<port>" \
     -H 'Content-Type: application/json' \
     -d $'{
  "lang": "C",
  "prompt": "// Write a quick sort function"
}'

一种可能的返回值

{
  "response": [
    "// language: C\n// Write a quick sort function\n\n#include <stdio.h>\n#include <stdlib.h>\n#include <string.h>\n#include <math.h>\n#include <time.h>\n#include <ctype.h>\n#include <limits.h>\n#include <float.h>\n\n#define MAX_NUM 100000\n\nint cmp(const void",
    [
      [
        "// language: C\n// Write a quick sort function",
        "// language: C\n// Write a quick sort function\n\n#include <stdio.h>\n#include <stdlib.h>\n#include <string.h>\n#include <math.h>\n#include <time.h>\n#include <ctype.h>\n#include <limits.h>\n#include <float.h>\n\n#define MAX_NUM 100000\n\nint cmp(const void"
      ]
    ]
  ],
  "lang": "C",
  "status": 200,
  "time": "2023-08-08 18:19:37"
}

另外一种可能的返回值

{
  "response": [
    "// language: C\n// Write a quick sort function",
    [
      [
        "// language: C\n// Write a quick sort function",
        "// language: C\n// Write a quick sort function"
      ]
    ]
  ],
  "lang": "C",
  "status": 200,
  "time": "2023-08-08 18:29:13"
}

模型文件在哪里下载，国内网盘

huggingface下载速度太慢了

请问是否可以微调自己业务代码

生成代码体验很不错，请问是否可以结合自己的业务训练自己的定制化模型呢

Bugfix error: 'NoneType' object has no attribute 'status_code'

curl 'https://wudao.aminer.cn/os/api/api/v2/multilingual_code/bugfix' \
  -H 'Accept: application/json, text/plain, */*' \
  -H 'Accept-Language: zh-CN,zh;q=0.9,en-US;q=0.8,en;q=0.7' \
  -H 'Authorization: null' \
  -H 'Cache-Control: no-cache' \
  -H 'Connection: keep-alive' \
  -H 'Content-Type: application/json;charset=UTF-8' \
  -H 'Origin: https://codegeex.cn' \
  -H 'Pragma: no-cache' \
  -H 'Referer: https://codegeex.cn/' \
  -H 'Sec-Fetch-Dest: empty' \
  -H 'Sec-Fetch-Mode: cors' \
  -H 'Sec-Fetch-Site: cross-site' \
  -H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36' \
  -H 'sec-ch-ua: "Not/A)Brand";v="99", "Google Chrome";v="115", "Chromium";v="115"' \
  -H 'sec-ch-ua-mobile: ?0' \
  -H 'sec-ch-ua-platform: "macOS"' \
  --data-raw '{"prompt":"awefwafwaef","lang":"python","n":1,"stop":[]}' \
  --compressed

性能拉跨

推理速度超级拉跨

两张A6000推理一个python语言的‘写一个hello world’，用了200秒。

推理质量拉跨

同样的问题停不下来，不但写了一个hello world，还写了一个类，又写了一个类，不停的写没有要求的东西。

IDEA 2023.2 不兼容

java.lang.NullPointerException: Cannot invoke "com.intellij.openapi.editor.Editor.getDocument()" because "editor" is null
at ai.codegeex.plugin.lang.agent.AgentCodegeexService.loginInteractive(AgentCodegeexService.java:287)
at ai.codegeex.plugin.github.CodegeexService.lambda$showLoginNotification$0(CodegeexService.java:49)
at com.intellij.notification.NotificationAction.lambda$createSimpleExpiring$2(NotificationAction.java:62)
at com.intellij.notification.NotificationAction$Simple.actionPerformed(NotificationAction.java:96)
at com.intellij.notification.NotificationAction.actionPerformed(NotificationAction.java:33)
at com.intellij.openapi.actionSystem.ex.ActionUtil.doPerformActionOrShowPopup(ActionUtil.java:339)
at com.intellij.openapi.actionSystem.ex.ActionUtil.lambda$performActionDumbAwareWithCallbacks$4(ActionUtil.java:313)
at com.intellij.openapi.actionSystem.ex.ActionUtil.performDumbAwareWithCallbacks(ActionUtil.java:362)
at com.intellij.openapi.actionSystem.ex.ActionUtil.performActionDumbAwareWithCallbacks(ActionUtil.java:313)
at com.intellij.openapi.fileEditor.impl.IdeUiServiceImpl.performActionDumbAwareWithCallbacks(IdeUiServiceImpl.java:114)
at com.intellij.notification.Notification.fire(Notification.java:283)
at com.intellij.notification.impl.NotificationsManagerImpl.lambda$createAction$17(NotificationsManagerImpl.java:919)
at com.intellij.ui.components.labels.LinkLabel.doClick(LinkLabel.java:175)
at com.intellij.ui.components.labels.LinkLabel.doClick(LinkLabel.java:389)
at com.intellij.ui.components.labels.LinkLabel$MyMouseHandler.mouseReleased(LinkLabel.java:362)
at java.desktop/java.awt.Component.processMouseEvent(Component.java:6657)
at java.desktop/javax.swing.JComponent.processMouseEvent(JComponent.java:3385)
at java.desktop/java.awt.Component.processEvent(Component.java:6422)
at java.desktop/java.awt.Container.processEvent(Container.java:2266)
at java.desktop/java.awt.Component.dispatchEventImpl(Component.java:5027)
at java.desktop/java.awt.Container.dispatchEventImpl(Container.java:2324)
at java.desktop/java.awt.Component.dispatchEvent(Component.java:4855)
at java.desktop/java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4954)
at java.desktop/java.awt.LightweightDispatcher.processMouseEvent(Container.java:4581)
at java.desktop/java.awt.LightweightDispatcher.dispatchEvent(Container.java:4522)
at java.desktop/java.awt.Container.dispatchEventImpl(Container.java:2310)
at java.desktop/java.awt.Window.dispatchEventImpl(Window.java:2808)
at java.desktop/java.awt.Component.dispatchEvent(Component.java:4855)
at java.desktop/java.awt.EventQueue.dispatchEventImpl(EventQueue.java:791)
at java.desktop/java.awt.EventQueue$3.run(EventQueue.java:740)
at java.desktop/java.awt.EventQueue$3.run(EventQueue.java:734)
at java.base/java.security.AccessController.doPrivileged(AccessController.java:399)
at java.base/java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:86)
at java.base/java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:97)
at java.desktop/java.awt.EventQueue$4.run(EventQueue.java:764)
at java.desktop/java.awt.EventQueue$4.run(EventQueue.java:762)
at java.base/java.security.AccessController.doPrivileged(AccessController.java:399)
at java.base/java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:86)
at java.desktop/java.awt.EventQueue.dispatchEvent(EventQueue.java:761)
at com.intellij.ide.IdeEventQueue.defaultDispatchEvent(IdeEventQueue.kt:685)
at com.intellij.ide.IdeEventQueue.dispatchMouseEvent(IdeEventQueue.kt:633)
at com.intellij.ide.IdeEventQueue._dispatchEvent(IdeEventQueue.kt:588)
at com.intellij.ide.IdeEventQueue.access$_dispatchEvent(IdeEventQueue.kt:67)
at com.intellij.ide.IdeEventQueue$dispatchEvent$processEventRunnable$1$1$1.compute(IdeEventQueue.kt:369)
at com.intellij.ide.IdeEventQueue$dispatchEvent$processEventRunnable$1$1$1.compute(IdeEventQueue.kt:368)
at com.intellij.openapi.progress.impl.CoreProgressManager.computePrioritized(CoreProgressManager.java:787)
at com.intellij.ide.IdeEventQueue$dispatchEvent$processEventRunnable$1$1.invoke(IdeEventQueue.kt:368)
at com.intellij.ide.IdeEventQueue$dispatchEvent$processEventRunnable$1$1.invoke(IdeEventQueue.kt:363)
at com.intellij.ide.IdeEventQueueKt.performActivity$lambda$1(IdeEventQueue.kt:992)
at com.intellij.openapi.application.TransactionGuardImpl.performActivity(TransactionGuardImpl.java:113)
at com.intellij.ide.IdeEventQueueKt.performActivity(IdeEventQueue.kt:992)
at com.intellij.ide.IdeEventQueue.dispatchEvent$lambda$7(IdeEventQueue.kt:363)
at com.intellij.openapi.application.impl.ApplicationImpl.runIntendedWriteActionOnCurrentThread(ApplicationImpl.java:861)
at com.intellij.ide.IdeEventQueue.dispatchEvent(IdeEventQueue.kt:405)
at java.desktop/java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:207)
at java.desktop/java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:128)
at java.desktop/java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:117)
at java.desktop/java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:113)
at java.desktop/java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:105)
at java.desktop/java.awt.EventDispatchThread.run(EventDispatchThread.java:92)

关于提问次数限制的问题

今天用着用着说有提问次数上限，需要邀请新用户后解除限制，我邀请了新用户后还是不能使用，是功能不完善么？

使用Java语言编写冒泡排序，出现问题

我尝试使用Python、Java、Javascript、Ruby、Go、PHP语言编写冒泡排序。我按照README使用每个语言各自注释风格编写prompt。只有Java语言出现问题，其他语言正常。代码如下：

from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("THUDM/codegeex2-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/codegeex2-6b", trust_remote_code=True, device='cuda')
model = model.eval()

# remember adding a language tag for better performance
prompt = "// language: Java\n//write a bubble sort function\n"
inputs = tokenizer.encode(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_length=256, top_k=1)
response = tokenizer.decode(outputs[0])

print(response)

结果为没有任何输出

[2023-07-29 03:42:17,286] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████| 7/7 [00:28<00:00,  4.03s/it]
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
// language: Java
//write a bubble sort function



此处省略几十行空格

不知道有人是否遇到相同的问题

提问次数已达到当日上限。邀请新用户成功安装注册CodeGeeX，即可无限使用问答功能。点击查看您的邀请码及邀请方式。

RuntimeError: CUDA error: device-side assert triggered

Hi Team，

I meet the error during run rum_demo.py.

OS Environment:
Centos
python version:
Python 3.10.10 (main, Mar 21 2023, 18:45:11) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

StartUp method:
nohup python run_demo.py

Error info in nohup.out
Traceback (most recent call last):
File "/home/llm/miniconda3/envs/CodeGeeX2_env/lib/python3.10/site-packages/gradio/routes.py", line 442, in run_predict
output = await app.get_blocks().process_api(
File "/home/llm/miniconda3/envs/CodeGeeX2_env/lib/python3.10/site-packages/gradio/blocks.py", line 1392, in process_api
result = await self.call_function(
File "/home/llm/miniconda3/envs/CodeGeeX2_env/lib/python3.10/site-packages/gradio/blocks.py", line 1097, in call_function
prediction = await anyio.to_thread.run_sync(
File "/home/llm/miniconda3/envs/CodeGeeX2_env/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/home/llm/miniconda3/envs/CodeGeeX2_env/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "/home/llm/miniconda3/envs/CodeGeeX2_env/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 807, in run
result = context.run(func, *args)
File "/home/llm/miniconda3/envs/CodeGeeX2_env/lib/python3.10/site-packages/gradio/utils.py", line 703, in wrapper
response = f(*args, **kwargs)
File "/home/llm/app/CodeGeeX2-6B/source/CodeGeeX2/run_demo_CodeGeeX2.py", line 117, in predict
set_random_seed(seed)
File "/home/llm/app/CodeGeeX2-6B/source/CodeGeeX2/run_demo_CodeGeeX2.py", line 104, in set_random_seed
torch.manual_seed(seed)
File "/home/llm/miniconda3/envs/CodeGeeX2_env/lib/python3.10/site-packages/torch/random.py", line 40, in manual_seed
torch.cuda.manual_seed_all(seed)
File "/home/llm/miniconda3/envs/CodeGeeX2_env/lib/python3.10/site-packages/torch/cuda/random.py", line 113, in manual_seed_all
_lazy_call(cb, seed_all=True)
File "/home/llm/miniconda3/envs/CodeGeeX2_env/lib/python3.10/site-packages/torch/cuda/init.py", line 183, in _lazy_call
callable()
File "/home/llm/miniconda3/envs/CodeGeeX2_env/lib/python3.10/site-packages/torch/cuda/random.py", line 111, in cb
default_generator.manual_seed(seed)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

I try the method as following:
add CUDA_LAUNCH_BLOCKING=1 in run_demo.py

I am not sure the reason and the method is correct or not??
If correct, could I pull the PR?

BestRegards
Yazhou

在pycharm中搜索不到

在pycharm中搜索不到，请问是否需要限定对应的pycharm版本？谢谢！

请问如何生成其他语言的代码？

我运行huggingface上的示例代码，我在prompt中选择了其他语言，比如：SQL、Java，但是生成的还是Python代码

求batch方式推理的代码

报错：Login your CodeGeeX account to get started

jetbrains的插件，为啥登录了还是提示要登录？

Why is CodeGeeX2-6b much slower than ChatGLM2-6b

As the title says, on the same graphics card (3090), CodeGeeX2-6b is much slower than ChatGLM-6b. According to the official demo, I would like to know if there are any tricks.

trained ds

Hi!
May you elaborate and the datasets used to train the model?

Thanks for your important work!

vscode上ssh远程Linux时，无法登陆

[Unexpected token '<', "<html> <h"... is not valid JSON]启动run_demo.py运行时出现此报错

Unexpected token '<', "<html> <h"... is not valid JSON

环境	配置
错误信息	Unexpected token '<', " <h"... is not valid JSON
推理设备	CPU
操作系统	linux
使用脚本	run_demo.py

[Bug] Mac M2Max 推理异常/Inference Error

Environment

- OS: macos Ventura 13.2.1
- Python: 3.11
- Transformers: 4.30.2
- PyTorch: 2.0.1
- CUDA Support: False

Current Behavior

使用 Mac M2Max 进行推理异常：

内存最高吃到94G；
要求 Java 语言，推理结果 Python；
简单prompt推理时长超长（几十秒到3分钟）；
复杂prompt经常不会出结果（10分钟）；
错误的、重复的推理结果。

代码 demo

model_path = "/xxxxxxxxx"
model_id = 'ZhipuAI/codegeex2-6b'

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModel.from_pretrained(model_path, torch_dtype=torch.bfloat16, trust_remote_code=True).half().to("mps")
model = model.eval()
# remember adding a language tag for better performance
prompt = "// language: java\n// 使用Mybatis-plus 的分页查询用户\n"
# prompt = "language: Python\n# write a bubble sort function\n"
# prompt = "language: Java\n# write a bubble sort function\n"
# prompt = "language: Java\n# 使用Mybatis-plus 的分页查询用户\n"
#prompt = "# language: Java\n# 使用Mybatis-plus 写一个关于【商城Service】的业务代码，商城的 Service 命名为 StoreService.工具：1. 字符串处理使用hutool的StrUtils; 2. 抛异常使用hutool的Assert; 3. 业务异常使用 BizException; 实体类有字段 ：String id;String name;Double price;String type;业务-【新增商品】，业务规则：1. 必填名称；2. 价格必须大于100；3. 如果商品类型为 '001'，价格必须大于200;4. 商品名称不能重复。\n"
# prompt = "# language：Java\n# 写一个冒泡排序函数"
inputs = tokenizer.encode(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_length=888)
response = tokenizer.decode(outputs[0])
print(response)

具体案例如下：

Case 1：官方 demo 的 prompt = “# language: Python\n# write a bubble sort function\n”

可推理出结果，时间大约10秒。

Case 2：官方 demo 改为 java prompt = “# language: Java\n# write a bubble sort function\n”

写出了Python, 时间大约10秒。

language: Java
# write a bubble sort function
def bubble_sort(arr):
    for i in range(len(arr) - 1):
        for j in range(len(arr) - 1):
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
    return arr
print(bubble_sort([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))

Case 3：prompt = “# language：Java\n# 写一个冒泡排序函数”

推理出结果了，但是要求是 Java，写成了 Python，而且中文有大量冗余。

# language：Java
# 写一个冒泡排序函数
# 冒泡排序：
# 1.比较相邻的元素。如果第一个比第二个大，就交换他们两个。
# 2.对每一对相邻元素作同样的工作，从开始第一对到结尾的最后一对。这步做完后，最后的元素会是最大的数。
# 3.针对所有的元素重复以上的步骤，除了最后一个。
# 4.持续每次对越来越少的元素重复上面的步骤，直到没有任何一对数字需要比较。
# 冒泡排序的原理：
# 1.比较相邻的元素。如果第一个比第二个大，就交换他们两个。
# 2.对每一对相邻元素作同样的工作，从开始第一对到结尾的最后一对。这步做完后，最后的元素会是最大的数。
# 3.针对所有的元素重复以上的步骤，除了最后一个。
# 4.持续每次对越来越少的元素重复上面的步骤，直到没有任何一对数字需要比较。
# 冒泡排序的代码实现：
def bubble_sort(alist):
    for i in range(len(alist) - 1, 0, -1):
        for j in range(i):
            if alist[j] > alist[j + 1]:
                alist[j], alist[j + 1] = alist[j + 1], alist[j]
    return alist
alist = [54, 26, 93, 17, 77, 31, 44, 55, 20]
print(bubble_sort(alist))

Case4：prompt = "# language：Java\n# 冒泡排序“

内存吃到90G，3分钟不出结果

Case5：prompt = "使用Mybatis-plus 的分页查询用户"

idea 插件：表现正常
本地执行：内存吃到90G, 70%卡死，30%可出结果并且结果正常。

Case6：带有大量上下文，有业务场景的长 prompt

prompt = "# language: Java\n# 使用Mybatis-plus 写一个关于【商城Service】的业务代码，商城的 Service 命名为 StoreService.工具：1. 字符串处理使用hutool的StrUtils; 2. 抛异常使用hutool的Assert; 3. 业务异常使用 BizException; 实体类有字段：String id;String name;Double price;String type;业务-【新增商品】，业务规则：1. 必填名称；2. 价格必须大于100；3. 如果商品类型为 '001'，价格必须大于200;4. 商品名称不能重复。\n"

idea 的插件：表现良好，
网友（cuda 3090）：基本秒出结果，执行结果正常。
本地执行时：内存吃到94G，最长执行了10分钟不出结果，只出过一次结果，显示如下：

Loading checkpoint shards: 100%|██████████| 7/7 [00:05<00:00,  1.35it/s]
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
/Users/nacol/Projects/llm/CodeGeeX2/venv/lib/python3.11/site-packages/transformers/generation/utils.py:2419: UserWarning: MPS: no support for int64 min/max ops, casting it to int32 (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/mps/operations/ReduceOps.mm:1271.)
  if unfinished_sequences.max() == 0:
# language: Java
# 使用Mybatis-plus 写一个关于【商城Service】的业务代码，商城的 Service 命名为 StoreService.工具：1. 字符串处理使用hutool的StrUtils; 2. 抛异常使用hutool的Assert; 3. 业务异常使用 BizException; 实体类有字段 ：String id;String name;Double price;String type;业务-【新增商品】，业务规则：1. 必填名称；2. 价格必须大于100；3. 如果商品类型为 '001'，价格必须大于200;4. 商品名称不能重复。
import cn.hutool.core.util.StrUtil;
import cn.hutool.json.JSONObject;
import cn.hutool.json.JSONUtil;
import cn.hutool.log.Log;
import cn.hutool.log.LogFactory;
import cn.hutool.log.LogFactory;
import cn.hutool.log.Log;
import cn.hutool.log.LogFactory;
import cn.hutool.log.Log;
import cn.hutool.log.LogFactory;
import cn.hutool.log.Log;
import cn.hutool.log.LogFactory;
import cn.hutool.log.Log;
import cn.hutool.log.LogFactory;
import cn.hutool.log.Log;
import cn.hutool.log.LogFactory;
import cn.hutool.log.Log;
import cn.hutool.log.LogFactory;
import cn.hutool.log.Log;
import cn.hutool.log.LogFactory;
import cn.hutool.log.Log;
import cn.hutool.log.LogFactory;
import cn.hutool.log.Log;
import cn.hutool.log.LogFactory;
import cn.hutool.log.Log;
import cn.hutool.log.LogFactory;
import cn.hutool.log.Log;
import cn.hutool.log.LogFactory;
import cn.hutool.log.Log;
import cn.hutool.log.LogFactory;
import cn.hutool.log.Log;
import cn.hutool.log.LogFactory;
import cn.hutool.log.Log;
import cn.hutool.log.LogFactory;
import cn.hutool.log.Log;
import cn.hutool.log.LogFactory;
import cn.hutool.log.Log;
import cn.hutool.log.LogFactory;
import cn.hutool.log.Log;
import cn.hutool.log.LogFactory;
import cn.hutool.log.Log;
import cn.hutool.log.LogFactory;
import cn.hutool.log.Log;
import cn.hutool.log.LogFactory;
import cn.hutool.log.Log;
import cn.hutool.log.LogFactory;
import cn.hutool.log.Log;
import cn.hutool.log.LogFactory;
import cn.hutool.log.Log;
import cn.hutool.log.LogFactory;
import cn.hutool.log.Log;
import cn.hutool.log.LogFactory;
import cn.hutool.log.Log;
import cn.hutool.log.LogFactory;
import cn.hutool.

后续

2023.08.03 16:00

按照下面两位朋友的建议

【Stanislas0】:prompt需要使用相应语言的注释符号，Python用"# [prompt]"，Java则应该用“// [prompt]”。也可以加一些关键字来引导模型生成函数或类，比如Java用“// [prompt]\npublic class”
【vaxilicaihouxian】:你要不试试在你的prompt最后再加一个\n ？我发现只有一个\n结尾的时候经常出现重复生成同样代码的问题

已将prompt修改如下：

# prompt = "// language: java\n// 使用Mybatis-plus 的分页查询用户\n\n"
# prompt = "# language: Python\n# write a bubble sort function\n\n"
prompt = "// language: Java\n// write a bubble sort function\n\n"
# prompt = "// Language: Java\n// 使用Mybatis-plus 的分页查询用户\n\npublic class\n\n"
# prompt = "// language: Java\n// 使用Mybatis-plus 写一个关于【商城Service】的业务代码，商城的 Service 命名为 StoreService.工具：1. 字符串处理使用hutool的StrUtils; 2. 抛异常使用hutool的Assert; 3. 业务异常使用 BizException; 实体类有字段 ：String id;String name;Double price;String type;业务-【新增商品】，业务规则：1. 必填名称；2. 价格必须大于100；3. 如果商品类型为 '001'，价格必须大于200;4. 商品名称不能重复。\n\n"
# prompt = "// language：Java\n/ 写一个冒泡排序函数\n\n"

解决了：要求 java 输出 python 的问题；

未解决：推理出慢、推理卡死、高内存、错误的推理结果等问题

prompt = "// language: Java\n// write a bubble sort function\n\npublic class"

该prompt跑了5次，每次3分钟以上，内存持续上涨到40G手动停止，未的出推理结果。