
chatgptbook's Introduction

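I am Liu Cong (刘聪NLP)

  • 🔭 Chief algorithm architect at a startup in Nanjing
  • 🌱 Research interests: QA systems (vector retrieval, reading comprehension), text generation (question generation, dialogue generation, summarization), pretrained language models, and more
  • 💬 WeChat: logCong
  • 📫 Questions: Zhihu @刘聪NLP
  • 😄 WeChat official account: 『 NLP工作站 』
  • 👯 I hope to exchange ideas with everyone. Feel free to follow me on Zhihu or add me on WeChat!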

chatgptbook's People

Contributors

liucongg

chatgptbook's Issues

Running the LLMPreProj project with the Docker image from the README produces the error below. Could you please help me figure out why?

[2023-10-08 14:58:00,215] [INFO] [launch.py:162:main] dist_world_size=1
[2023-10-08 14:58:00,215] [INFO] [launch.py:164:main] Setting CUDA_VISIBLE_DEVICES=0
[2023-10-08 14:58:01,975] [INFO] [comm.py:652:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl

initializing model parallel with size 1
Traceback (most recent call last):
File "pretrain_model.py", line 694, in <module>
main()
File "pretrain_model.py", line 644, in main
initialize_distributed(args)
File "pretrain_model.py", line 113, in initialize_distributed
set_deepspeed_activation_checkpointing(args)
File "pretrain_model.py", line 73, in set_deepspeed_activation_checkpointing
deepspeed.checkpointing.configure(mpu, deepspeed_config=args.deepspeed_config, num_checkpoints=args.num_layers)
File "/opt/conda/lib/python3.8/site-packages/deepspeed/runtime/activation_checkpointing/checkpointing.py", line 887, in configure
_configure_using_config_file(deepspeed_config, mpu=mpu)
File "/opt/conda/lib/python3.8/site-packages/deepspeed/runtime/activation_checkpointing/checkpointing.py", line 804, in _configure_using_config_file
config = DeepSpeedConfig(config, mpu=mpu).activation_checkpointing_config
File "/opt/conda/lib/python3.8/site-packages/deepspeed/runtime/config.py", line 728, in __init__
config_decoded = base64.urlsafe_b64decode(config.strip()).decode('utf-8')
File "/opt/conda/lib/python3.8/base64.py", line 133, in urlsafe_b64decode
return b64decode(s)
File "/opt/conda/lib/python3.8/base64.py", line 87, in b64decode
return binascii.a2b_base64(s)
binascii.Error: Incorrect padding
[2023-10-08 14:58:03,235] [INFO] [launch.py:318:sigkill_handler] Killing subprocess 1179
[2023-10-08 14:58:03,236] [ERROR] [launch.py:324:sigkill_handler] ['/opt/conda/bin/python', '-u', 'pretrain_model.py', '--local_rank=0'] exits with return code = 1
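
The traceback shows `DeepSpeedConfig` falling through to `base64.urlsafe_b64decode` on the value of `args.deepspeed_config`. One plausible cause, stated here as an assumption rather than a confirmed diagnosis, is that the value is a string that does not point to an existing file, so it is treated as base64-encoded JSON. The sketch below is a simplified stand-in for that kind of fallback logic, not DeepSpeed's actual code; `resolve_config`, `check_config_path`, and the path `no_such_config.json` are hypothetical.

```python
import base64
import binascii
import json
import os


def resolve_config(config):
    """Simplified stand-in for a loader that accepts a dict, a file path, or base64 JSON."""
    if isinstance(config, dict):
        return config
    if os.path.exists(config):
        with open(config, "r", encoding="utf-8") as f:
            return json.load(f)
    # Fallback seen in the traceback: the string is assumed to be base64 text.
    decoded = base64.urlsafe_b64decode(config.strip()).decode("utf-8")
    return json.loads(decoded)


def check_config_path(path):
    """Return None if the path is a readable file, else a plain explanation."""
    if not os.path.isfile(path):
        return f"DeepSpeed config not found: {os.path.abspath(path)}"
    return None


if __name__ == "__main__":
    missing = "no_such_config.json"  # hypothetical path that does not exist
    print(check_config_path(missing))
    try:
        resolve_config(missing)
    except binascii.Error as err:
        # A plain file name is not valid base64, so decoding fails here.
        print("base64 fallback failed:", err)
```

If this is what is happening, checking the path passed as the DeepSpeed config inside the container (for example with a helper like `check_config_path`) before launching should surface a clearer message than the base64 error.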

Question about how RewardModel computes the differing part of two responses

When RewardModel computes the difference between two responses, end_ind is calculated as end_ind = max(one_ind, two_ind). Why not take the last position at which one_input_ids and two_input_ids differ, i.e. check_divergence[-1]?
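
To make the two candidates concrete, here is a toy comparison in plain Python. It assumes, without confirmation from the repository, that `one_ind` and `two_ind` are the positions of the first padding token in each sequence and that `check_divergence` lists the positions where the two sequences differ. `PAD_ID`, the token ids, and the helper names are hypothetical.

```python
PAD_ID = 0  # hypothetical padding token id


def first_pad_index(ids, pad_id=PAD_ID):
    """Index of the first padding token, or len(ids) if there is none."""
    for i, tok in enumerate(ids):
        if tok == pad_id:
            return i
    return len(ids)


def divergence_positions(one_ids, two_ids):
    """Positions at which the two token sequences differ."""
    return [i for i, (a, b) in enumerate(zip(one_ids, two_ids)) if a != b]


# Two responses that share a prefix, differ in the middle,
# and end with the same final token (99) before padding.
one_input_ids = [11, 12, 13, 14, 99, 0, 0]
two_input_ids = [11, 12, 15, 16, 99, 0, 0]

one_ind = first_pad_index(one_input_ids)
two_ind = first_pad_index(two_input_ids)
end_ind = max(one_ind, two_ind)
check_divergence = divergence_positions(one_input_ids, two_input_ids)

print("end_ind:", end_ind)
print("check_divergence:", check_divergence)
print("slice by end_ind:", list(range(check_divergence[0], end_ind)))
print("slice by check_divergence[-1]:", list(range(check_divergence[0], check_divergence[-1])))
```

In this toy case the slice bounded by `end_ind` covers every position up to the end of the longer response, including the shared final token, while a slice bounded by `check_divergence[-1]` stops earlier and, as an exclusive bound, also omits the last differing position. Whether that is the reason behind the choice in the book's code is for the author to confirm.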

ChatGLM pretraining question

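Hello, I recently read your book 《ChatGPT原理与实战》 and learned a great deal from it. I would like to ask you a question. I want to build a domain-specific large language model with ChatGLM as the base model. Because of the nature of my data, I need a domain data injection step before supervised fine-tuning. Can ChatGLM be given this second round of pretraining using the example in Chapter 6 of the book?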

I started a GPU runtime environment on 九天毕晟 and ran the train script. It fails at the end with the error below. What is the cause?

Traceback (most recent call last):
File "/root/aaa/ChatGPTBook/PromptProj/train.py", line 261, in <module>
main()
File "/root/aaa/ChatGPTBook/PromptProj/train.py", line 257, in main
train(model, device, train_data, test_data, args, tokenizer)
File "/root/aaa/ChatGPTBook/PromptProj/train.py", line 140, in train
model_to_save.save_pretrained(output_dir)
File "/opt/conda/envs/pytorch1.8/lib/python3.9/site-packages/transformers/modeling_utils.py", line 2376, in save_pretrained
safe_save_file(shard, os.path.join(save_directory, shard_file), metadata={"format": "pt"})
File "/opt/conda/envs/pytorch1.8/lib/python3.9/site-packages/safetensors/torch.py", line 281, in save_file
serialize_file(_flatten(tensors), filename, metadata=metadata)
File "/opt/conda/envs/pytorch1.8/lib/python3.9/site-packages/safetensors/torch.py", line 460, in _flatten
shared_pointers = _find_shared_tensors(tensors)
File "/opt/conda/envs/pytorch1.8/lib/python3.9/site-packages/safetensors/torch.py", line 72, in _find_shared_tensors
if v.device != torch.device("meta") and storage_ptr(v) != 0 and storage_size(v) != 0:
RuntimeError: Expected one of cpu, cuda, xpu, mkldnn, opengl, opencl, ideep, hip, msnpu, xla, vulkan device type at start of device string: meta
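
The final `RuntimeError` says that this PyTorch build does not recognise the `meta` device string that `safetensors` checks while saving. One possible workaround, offered as an assumption and not as the author's fix, is to skip the safetensors path by passing `safe_serialization=False` to `save_pretrained` (a parameter available in recent `transformers` releases); upgrading PyTorch is the other obvious route. `save_with_torch_format` is a hypothetical helper name, and the stub class exists only so that the sketch runs without `transformers` installed.

```python
def save_with_torch_format(model_to_save, output_dir):
    """Save with the classic PyTorch .bin format instead of safetensors."""
    # safe_serialization=False asks transformers not to use safetensors,
    # which avoids the code path that calls torch.device("meta").
    model_to_save.save_pretrained(output_dir, safe_serialization=False)


class _StubModel:
    """Stand-in for a transformers model; records how save_pretrained was called."""

    def __init__(self):
        self.calls = []

    def save_pretrained(self, output_dir, **kwargs):
        self.calls.append((output_dir, kwargs))


stub = _StubModel()
save_with_torch_format(stub, "output_dir/checkpoint")  # hypothetical directory
print(stub.calls)
```

In train.py this would correspond to changing the `model_to_save.save_pretrained(output_dir)` call shown in the traceback, assuming the installed `transformers` version accepts the parameter.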

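Page 18, line 7 of the book: should read "decoder-type models"

Original text: Now that ChatGPT performs so well across tasks, encoder-type models have become the hottest models, and more practitioners will devote themselves to designing and optimizing such models.
In my view, this passage is saying that decoder-type models such as GPT can also handle semantic analysis tasks, which draws more practitioners into researching them. It would therefore be more appropriate to change this to "decoder-type models have become the hottest models".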

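Data issues with the code for 《ChatGPT原理与实战》

Dear Mr. Liu,
      Hello. For work reasons I have recently been reading your latest book, 《ChatGPT原理与实战》. In it you systematically lay out the models and algorithms behind ChatGPT and compare the characteristics, strengths, and weaknesses of the different models, which gave me a comprehensive understanding of ChatGPT and of the related pretrained models and large language models. I admire your broad knowledge and deep practical experience, and I am also impressed by your rigor and attention to detail.
      However, while practicing with the materials you provide, I found that some of the data seems to be missing, and I hope to obtain it from you.
      First, the link to the data for Section 3.5 of Chapter 3, "UniLM Model in Practice on Compliment Chit-Chat Data", cannot be accessed. I suspect this is caused by sensitive words in the data. Compressing the data before uploading it might fix the dead link.
| data | Baidu Cloud | w8bd |
      Second, while debugging the code for Section 5.4 of Chapter 5, "Prompt-Based Text Sentiment Analysis in Practice" (PromptProj), I found that when execution reaches line 50 of data_set.py, the file data/train.json is missing, which causes a runtime error.
(error message)
      I would therefore like to request these two datasets. I would be grateful if you could find time in your busy schedule to read the questions in this email and reply.
Thank you!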
