
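The first open-source, commercially usable dialogue model to support bilingual Chinese-English speech-text multimodal conversation. Convenient speech input greatly improves the experience of using large models that otherwise take text as input, while avoiding the cumbersome pipeline of ASR-based solutions and the errors they may introduce.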

Home Page: https://github.com/LinkSoul-AI/LLaSM

License: Apache License 2.0

Python 100.00%

llasm's Introduction

LLaSM: Large Language and Speech Model

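LLaSM is an open-source, commercially usable Chinese-English bilingual speech-language assistant, released together with LLaSM-Audio-Instructions, a Chinese-English speech SFT dataset. It is the first open-source, commercially usable dialogue model to support Chinese-English speech-text multimodal conversation.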

Model Framework

Framework

Basic Demo

Base Demo

Online Demo

Talk is cheap, Show you the Demo.

Paper

Downloads

Installation

# clone the repository
git clone https://github.com/LinkSoul-AI/LLaSM
cd LLaSM

# install package
conda create -n llasm python=3.10 -y
conda activate llasm
pip install --upgrade pip
pip install -e .

Quick Test

export LLASM_DEVICE="cuda:0"
python infer.py \
    --input_audio_file PATH/TO/YOUR/AUDIO \
    --llasm_model PATH/TO/LLaSM/MODEL \
    --llasm_audio_tower PATH/TO/WHISPER/MODEL \
    --llm_type "Chinese_llama2"  # or "baichuan"
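
The same invocation can be driven from Python, which makes it easier to validate arguments before a long model load. The sketch below is illustrative only: `build_infer_command` and `run_infer` are hypothetical helper names, not part of the repository, and the set of accepted `--llm_type` values is assumed from the two options shown in the command above.

```python
import os
import subprocess
import sys

# Assumption: these are the only two values, as shown in the quick-test command.
ALLOWED_LLM_TYPES = ("Chinese_llama2", "baichuan")


def build_infer_command(audio_file, llasm_model, audio_tower, llm_type):
    """Return the argv list for infer.py, rejecting an unknown llm_type early."""
    if llm_type not in ALLOWED_LLM_TYPES:
        raise ValueError(
            f"llm_type must be one of {ALLOWED_LLM_TYPES}, got {llm_type!r}"
        )
    return [
        sys.executable, "infer.py",
        "--input_audio_file", audio_file,
        "--llasm_model", llasm_model,
        "--llasm_audio_tower", audio_tower,
        "--llm_type", llm_type,
    ]


def run_infer(cmd, device="cuda:0"):
    """Run the command with LLASM_DEVICE set; call this from the repository root."""
    env = dict(os.environ, LLASM_DEVICE=device)
    return subprocess.run(cmd, env=env, check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces, and `check=True` turns a failing `infer.py` run into a Python exception.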

TODO

  • How to train
  • int4 quantization
  • Docker deployment

Related Projects

License

Apache-2.0 license

Citation

If you find our work and this repository useful, please consider giving us a star ⭐ to encourage us 🍺:

@misc{shu2023llasm,
      title={LLaSM: Large Language and Speech Model}, 
      author={Yu Shu and Siwei Dong and Guangyao Chen and Wenhao Huang and Ruihua Zhang and Daochen Shi and Qiqi Xiang and Yemin Shi},
      year={2023},
      eprint={2308.15930},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

WeChat Group

llasm's People

Contributors

eltociear, s1w3, shiyemin, tabbbsy


llasm's Issues

All models are in place, but running infer.py raises an error

Command: python infer.py --input_audio_file d:/tmp/output.wav --llasm_model F:/models/LLaSM-Cllama2 --llasm_audio_tower F:/models/whisper-medium --llm_type "Chinese_llama2"
Error: RuntimeError: mat1 and mat2 shapes cannot be multiplied (64x1024 and 1280x4096)

What is causing this? Any help would be appreciated, thanks!
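
One plausible reading of the shapes in this error, inferred from the message rather than confirmed by the repository: the features entering the projection have width 1024, while the projection weight expects width 1280. In Hugging Face Whisper configs, `d_model` is 1024 for whisper-medium and 1280 for whisper-large-v2, so the checkpoint may have been trained against a large Whisper encoder. The sketch below checks a local Whisper directory before loading anything heavy. The helper names are hypothetical, and `expected_dim=1280` is taken from the `1280x4096` operand in the error.

```python
import json
from pathlib import Path


def whisper_hidden_size(whisper_dir):
    """Read d_model (the encoder hidden size) from a local Whisper checkpoint."""
    config_path = Path(whisper_dir) / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    return config["d_model"]


def check_audio_tower(whisper_dir, expected_dim=1280):
    """Raise early if the encoder width differs from what the projector expects.

    expected_dim is an assumption read off the error message, not a value
    documented by the LLaSM repository.
    """
    actual = whisper_hidden_size(whisper_dir)
    if actual != expected_dim:
        raise ValueError(
            f"{whisper_dir} has d_model={actual}, but the projector expects "
            f"{expected_dim}; try a Whisper checkpoint with matching width."
        )
    return actual
```

If the check fails for `F:/models/whisper-medium`, pointing `--llasm_audio_tower` at a whisper-large-v2 directory would be the first thing to try.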

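About the pretrained LLaMA model

Is the model at the Baidu Netdisk download link the pretrained model produced by this work's method, or the original open-source LLaMA? Thanks.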

Whisper is already downloaded locally, but the loader cannot find it

'(MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /openai/whisper-large-v2/resolve/main/config.json (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7fd9399957b0>, 'Connection to huggingface.co timed out. (connect timeout=10)'))"), '(Request ID: 7b68228e-e6a6-4ecd-bc7e-3affc92e3ac3)')' thrown while requesting HEAD https://huggingface.co/openai/whisper-large-v2/resolve/main/config.json
Traceback (most recent call last):
  File "/root/anaconda3/envs/llasm/lib/python3.10/site-packages/transformers/utils/hub.py", line 417, in cached_file
    resolved_file = hf_hub_download(
  File "/root/anaconda3/envs/llasm/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "/root/anaconda3/envs/llasm/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1291, in hf_hub_download
    raise LocalEntryNotFoundError(
huggingface_hub.utils._errors.LocalEntryNotFoundError: Connection error, and we cannot find the requested files in the disk cache. Please try again or make sure your Internet connection is on.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/work/liukai/LLaSM/infer.py", line 122, in <module>
    main(args)
  File "/work/liukai/LLaSM/infer.py", line 46, in main
    model = LlaaaLlamaForCausalLM.from_pretrained(
  File "/root/anaconda3/envs/llasm/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2700, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
  File "/work/liukai/LLaSM/llasm.py", line 160, in __init__
    self.model = LlaaaLlamaModel(config)
  File "/work/liukai/LLaSM/llasm.py", line 41, in __init__
    self.audio_tower = [load_whisper(config.mm_audio_tower)]
  File "/work/liukai/LLaSM/llasm.py", line 28, in load_whisper
    model = WhisperModel.from_pretrained(audio_tower_name)
  File "/root/anaconda3/envs/llasm/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2325, in from_pretrained
    config, model_kwargs = cls.config_class.from_pretrained(
  File "/root/anaconda3/envs/llasm/lib/python3.10/site-packages/transformers/configuration_utils.py", line 590, in from_pretrained
    config_dict, kwargs = cls.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/root/anaconda3/envs/llasm/lib/python3.10/site-packages/transformers/configuration_utils.py", line 617, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/root/anaconda3/envs/llasm/lib/python3.10/site-packages/transformers/configuration_utils.py", line 672, in _get_config_dict
    resolved_config_file = cached_file(
  File "/root/anaconda3/envs/llasm/lib/python3.10/site-packages/transformers/utils/hub.py", line 452, in cached_file
    raise EnvironmentError(
OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like openai/whisper-large-v2 is not the path to a directory containing a file named config.json.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.
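
The traceback shows `load_whisper(config.mm_audio_tower)` receiving the hub ID `openai/whisper-large-v2`, which suggests the audio tower name is read from the LLaSM checkpoint's own config at construction time rather than from a local path. A possible workaround is to rewrite that key so it points at the local Whisper directory. This is a sketch under assumptions: it assumes the checkpoint directory contains a `config.json` with an `mm_audio_tower` key (implied by the traceback, not verified against the repository), and `point_audio_tower_to_local` is a hypothetical helper name.

```python
import json
from pathlib import Path


def point_audio_tower_to_local(llasm_dir, whisper_dir):
    """Set mm_audio_tower in the LLaSM config.json to a local Whisper directory.

    Returns the previous value so it can be restored later.
    """
    whisper_dir = Path(whisper_dir)
    # from_pretrained treats a path as local only if it holds a config.json.
    if not (whisper_dir / "config.json").is_file():
        raise FileNotFoundError(f"no config.json found in {whisper_dir}")

    config_path = Path(llasm_dir) / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    previous = config.get("mm_audio_tower")
    config["mm_audio_tower"] = str(whisper_dir)
    config_path.write_text(
        json.dumps(config, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return previous
```

It is worth backing up the original `config.json` before running this. Setting the environment variable `HF_HUB_OFFLINE=1`, described in the offline-mode documentation linked in the error message, should additionally stop the library from attempting network requests.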
