
audiogpt's Introduction

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head


We provide our implementation and pretrained models as open source in this repository.

Get Started

Please refer to run.md

Capabilities

Here we list the capabilities of AudioGPT at this time. More supported models and tasks are coming soon. For prompt examples, refer to asset.

Currently, not every model has a public repository.

Speech

| Task | Supported Foundation Models | Status |
| --- | --- | --- |
| Text-to-Speech | FastSpeech, SyntaSpeech, VITS | Yes (WIP) |
| Style Transfer | GenerSpeech | Yes |
| Speech Recognition | whisper, Conformer | Yes |
| Speech Enhancement | ConvTasNet | Yes (WIP) |
| Speech Separation | TF-GridNet | Yes (WIP) |
| Speech Translation | Multi-decoder | WIP |
| Mono-to-Binaural | NeuralWarp | Yes |

Sing

| Task | Supported Foundation Models | Status |
| --- | --- | --- |
| Text-to-Sing | DiffSinger, VISinger | Yes (WIP) |

Audio

| Task | Supported Foundation Models | Status |
| --- | --- | --- |
| Text-to-Audio | Make-An-Audio | Yes |
| Audio Inpainting | Make-An-Audio | Yes |
| Image-to-Audio | Make-An-Audio | Yes |
| Sound Detection | Audio-transformer | Yes |
| Target Sound Detection | TSDNet | Yes |
| Sound Extraction | LASSNet | Yes |

Talking Head

| Task | Supported Foundation Models | Status |
| --- | --- | --- |
| Talking Head Synthesis | GeneFace | Yes (WIP) |

Acknowledgement

We appreciate the following open-source projects:

ESPNet, NATSpeech, Visual ChatGPT, Hugging Face, LangChain, Stable Diffusion

audiogpt's People

Contributors

a-quarter-mile, ftshijt, lmzjms, miletaa, moonintheriver, peppapiggeee, rayeren, rongjiehuang, simpleoier, yangdongchao, yerfor


audiogpt's Issues

Can it run on CPU?

python audio-chatgpt.py
Initializing AudioGPT
Initializing Make-An-Audio to cpu
LatentDiffusion_audio: Running in eps-prediction mode
DiffusionWrapper has 160.22 M params.
making attention of type 'vanilla' with 256 in_channels
making attention of type 'vanilla' with 256 in_channels
making attention of type 'vanilla' with 512 in_channels
making attention of type 'vanilla' with 512 in_channels
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 106, 106) = 44944 dimensions.
making attention of type 'vanilla' with 512 in_channels
making attention of type 'vanilla' with 512 in_channels
making attention of type 'vanilla' with 512 in_channels
making attention of type 'vanilla' with 512 in_channels
making attention of type 'vanilla' with 256 in_channels
making attention of type 'vanilla' with 256 in_channels
making attention of type 'vanilla' with 256 in_channels
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.weight']

- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
TextEncoder comes with 111.32 M params.
Traceback (most recent call last):
File "audio-chatgpt.py", line 1378, in
bot = ConversationBot()
File "audio-chatgpt.py", line 1057, in init
self.t2a = T2A(device="cpu")
File "audio-chatgpt.py", line 144, in init
self.sampler = self._initialize_model('text_to_audio/Make_An_Audio/configs/text_to_audio/txt2audio_args.yaml', 'text_to_audio/Make_An_Audio/useful_ckpts/ta40multi_epoch=000085.ckpt', device=device)
File "audio-chatgpt.py", line 150, in _initialize_model
model.load_state_dict(torch.load(ckpt, map_location='cpu')["state_dict"], strict=False)
File "/root/anaconda3/envs/audiogpt/lib/python3.8/site-packages/torch/serialization.py", line 713, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/root/anaconda3/envs/audiogpt/lib/python3.8/site-packages/torch/serialization.py", line 920, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '<'.
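
The `invalid load key, '<'` error means the first byte of the "checkpoint" is `<`, which is how an HTML page starts: the download most likely saved a Hugging Face error or blob page instead of the binary file. A minimal diagnostic sketch, assuming the checkpoint path from the traceback above:

```python
# Minimal diagnostic sketch, assuming the checkpoint path from the traceback.
# A torch/pickle checkpoint never starts with '<'; an HTML error page does.
ckpt = "text_to_audio/Make_An_Audio/useful_ckpts/ta40multi_epoch=000085.ckpt"

with open(ckpt, "rb") as f:
    head = f.read(64)

if head.lstrip().startswith(b"<"):
    print("This file is HTML, not a checkpoint; re-download it from the")
    print("'resolve/main' URL (not the 'blob/main' page URL).")
else:
    print("Header looks binary:", head[:16])
```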

talking head status

Amazing work! I would like to know the status of the talking head powered by audio, as described in your paper. Is it currently possible to test? Thanks again for the cool work!

ImportError: cannot import name dataclass_transform

Fresh install following run.md.

python audio-chatgpt.py
Traceback (most recent call last):
  File "audio-chatgpt.py", line 9, in <module>
    import gradio as gr
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/gradio/__init__.py", line 3, in <module>
    import gradio.components as components
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/gradio/components.py", line 32, in <module>
    from fastapi import UploadFile
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/fastapi/__init__.py", line 7, in <module>
    from .applications import FastAPI as FastAPI
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/fastapi/applications.py", line 15, in <module>
    from fastapi import routing
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/fastapi/routing.py", line 22, in <module>
    from fastapi import params
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/fastapi/params.py", line 4, in <module>
    from pydantic.fields import FieldInfo, Undefined
  File "pydantic/__init__.py", line 2, in init pydantic.__init__
  File "pydantic/dataclasses.py", line 39, in init pydantic.dataclasses
    # +=========+=========================================+
ImportError: cannot import name dataclass_transform
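
For context: pydantic 1.10+ imports `dataclass_transform` from `typing_extensions`, which only ships it from version 4.1.0 onwards, so this usually indicates a stale `typing_extensions` in the environment. A quick check (a sketch, not part of the repo):

```python
# Environment check: pydantic 1.10+ imports dataclass_transform from
# typing_extensions, which only provides it from version 4.1.0 onwards.
from importlib.metadata import version

print("pydantic:", version("pydantic"))
print("typing_extensions:", version("typing_extensions"))

# This import fails on an old typing_extensions; upgrading it with
# `pip install -U typing_extensions` (or pinning pydantic lower) resolves it.
from typing_extensions import dataclass_transform  # noqa: F401
print("dataclass_transform available")
```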

Cannot find https://huggingface.co/AIGC-Audio/AudioGPT/resolve/main/audio_to_text

wget https://huggingface.co/AIGC-Audio/AudioGPT/blob/main/audio_to_text/audiocaps_cntrstv_cnn14rnn_trm/config.yaml https://huggingface.co/AIGC-Audio/AudioGPT/resolve/main/audio_to_text/audiocaps_cntrstv_cnn14rnn_trm/swa.pth
--2024-06-24 14:59:41-- https://huggingface.co/AIGC-Audio/AudioGPT/blob/main/audio_to_text/audiocaps_cntrstv_cnn14rnn_trm/config.yaml
Resolving huggingface.co (huggingface.co)... 168.143.162.58, 2a03:2880:f11c:8183:face:b00c:0:25de
Connecting to huggingface.co (huggingface.co)|168.143.162.58|:443... connected.
HTTP request sent, awaiting response... 401 Unauthorized
Authorization failed.
--2024-06-24 14:59:42-- https://huggingface.co/AIGC-Audio/AudioGPT/resolve/main/audio_to_text/audiocaps_cntrstv_cnn14rnn_trm/swa.pth
Connecting to huggingface.co (huggingface.co)|168.143.162.58|:443... connected.
HTTP request sent, awaiting response... 401 Unauthorized
Authorization failed.
(base) [root@VM-231-58-tencentos audiocaps_cntrstv_cnn14rnn_trm]# wget https://huggingface.co/AIGC-Audio/AudioGPT/blob/main/audio_to_text/audiocaps_cntrstv_cnn14rnn_trm/config.yaml
--2024-06-24 14:59:51-- https://huggingface.co/AIGC-Audio/AudioGPT/blob/main/audio_to_text/audiocaps_cntrstv_cnn14rnn_trm/config.yaml
Resolving huggingface.co (huggingface.co)... 168.143.162.58, 2a03:2880:f11c:8183:face:b00c:0:25de
Connecting to huggingface.co (huggingface.co)|168.143.162.58|:443... connected.
HTTP request sent, awaiting response... 401 Unauthorized
Authorization failed.
(base) [root@VM-231-58-tencentos audiocaps_cntrstv_cnn14rnn_trm]# wget https://huggingface.co/AIGC-Audio/AudioGPT/resolve/main/audio_to_text/audiocaps_cntrstv_cnn14rnn_trm/swa.pth
--2024-06-24 15:00:01-- https://huggingface.co/AIGC-Audio/AudioGPT/resolve/main/audio_to_text/audiocaps_cntrstv_cnn14rnn_trm/swa.pth
Resolving huggingface.co (huggingface.co)... 168.143.162.58, 2a03:2880:f11c:8183:face:b00c:0:25de
Connecting to huggingface.co (huggingface.co)|168.143.162.58|:443... connected.
HTTP request sent, awaiting response... 401 Unauthorized
Authorization failed.
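
Note that `blob/main` URLs return an HTML page rather than the file, and a 401 on `resolve/main` suggests the repo is gated or the files have moved. A sketch of fetching through `huggingface_hub` instead of raw wget, with the repo id and file paths taken from the logs above (availability not guaranteed):

```python
# Sketch: fetch the files through huggingface_hub instead of wget.
# Repo id and file paths come from the logs above; if the repo is gated
# or the files were moved, this raises an explicit HTTP error instead of
# silently saving an HTML page.
from huggingface_hub import hf_hub_download

for fname in (
    "audio_to_text/audiocaps_cntrstv_cnn14rnn_trm/config.yaml",
    "audio_to_text/audiocaps_cntrstv_cnn14rnn_trm/swa.pth",
):
    path = hf_hub_download(repo_id="AIGC-Audio/AudioGPT", filename=fname)
    print("saved to", path)
```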

No module named 'ldm'

File "audio-chatgpt.py", line 24, in
from ldm.util import instantiate_from_config
ModuleNotFoundError: No module named 'ldm'

The modules utils.hparams and audio_infer.utils are also missing.
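
`ldm` is not a pip package here; it appears to be vendored inside the Make-An-Audio subtree, so the script has to run with those subdirectories on the import path. A sketch, with directory names assumed from the repository layout:

```python
# Sketch: put the vendored sub-packages on sys.path before importing them.
# The directory names below are assumptions based on the repository layout.
import os
import sys

repo_root = os.path.dirname(os.path.abspath(__file__))
for sub in ("text_to_audio/Make_An_Audio", "NeuralSeq", "audio_detection"):
    sys.path.insert(0, os.path.join(repo_root, sub))

from ldm.util import instantiate_from_config  # should now resolve
```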

Bark audio model and talking head additions

  • Would be amazing if you could:
  1. Turn the "talking head" images into animated GIFs lip-synced to the WAV audio generated by TTS using Bark (Bark is currently the best and most realistic, emotion-driven audio model that is free to use, even better than the best commercial closed-source model, Eleven Labs).
  2. Then generate an MP4 from the combination of the animated GIF and WAV audio on the fly, replacing the starting-point animated GIF on the screen.

This can be done by integrating code from one of the following choices:

Bark oobabooga TTS extension:
https://github.com/wsippel/bark_tts

RuntimeError: CUDA error: invalid device ordinal

RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17    Driver Version: 525.105.17    CUDA Version: 12.0   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:07:00.0 Off |                    0 |
| N/A   36C    P0    24W / 250W |      4MiB / 16384MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
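
`invalid device ordinal` means the code asked for a GPU index that does not exist on this machine; the nvidia-smi output above shows a single GPU, so only index 0 is valid. A quick sketch to confirm what PyTorch can see:

```python
# Sketch: list the CUDA devices PyTorch can actually see before picking one.
import torch

print("CUDA available:", torch.cuda.is_available())
print("device count:", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print(f"cuda:{i} ->", torch.cuda.get_device_name(i))

# On a single-GPU machine only "cuda:0" exists; any higher index (or a
# stale CUDA_VISIBLE_DEVICES setting) produces "invalid device ordinal".
```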

Good

👋 👋😊

Error - requirements.txt - opencv-contrib-python==4.3.0.36 not available

The requirements.txt (line 37) asks for opencv-contrib-python==4.3.0.36, but since that version has been yanked and is no longer available, it gives me an error:

ERROR: Could not find a version that satisfies the requirement opencv-contrib-python==4.3.0.36
...
ERROR: No matching distribution found for opencv-contrib-python==4.3.0.36

Would it be sufficient to just use the latest opencv-contrib-python?

_pickle.UnpicklingError: pickle data was truncated

After hours of struggling with the installation, I am getting this error. Any solution, please?

(audiogpt) C:\sd\AudioGPT>python audio-chatgpt.py
Initializing AudioGPT
Initializing T2I to cuda:0
C:\Users\nitin\miniconda3\envs\audiogpt\lib\site-packages\huggingface_hub\file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
unet\diffusion_pytorch_model.safetensors not found
Initializing ImageCaptioning to cuda:0
Initializing Make-An-Audio to cuda:0
LatentDiffusion_audio: Running in eps-prediction mode
DiffusionWrapper has 160.22 M params.
making attention of type 'vanilla' with 256 in_channels
making attention of type 'vanilla' with 256 in_channels
making attention of type 'vanilla' with 512 in_channels
making attention of type 'vanilla' with 512 in_channels
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 106, 106) = 44944 dimensions.
making attention of type 'vanilla' with 512 in_channels
making attention of type 'vanilla' with 512 in_channels
making attention of type 'vanilla' with 512 in_channels
making attention of type 'vanilla' with 512 in_channels
making attention of type 'vanilla' with 256 in_channels
making attention of type 'vanilla' with 256 in_channels
making attention of type 'vanilla' with 256 in_channels
C:\Users\nitin\miniconda3\envs\audiogpt\lib\site-packages\huggingface_hub\file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
tokenizer_config.json: 100%|████████████████████████████████████████████████████████████████| 48.0/48.0 [00:00<?, ?B/s]
config.json: 100%|████████████████████████████████████████████████████████████████████| 570/570 [00:00<00:00, 36.6kB/s]
vocab.txt: 100%|█████████████████████████████████████████████████████████████████████| 232k/232k [00:00<00:00, 531kB/s]
tokenizer.json: 100%|████████████████████████████████████████████████████████████████| 466k/466k [00:00<00:00, 735kB/s]
model.safetensors: 100%|████████████████████████████████████████████████████████████| 440M/440M [00:35<00:00, 12.3MB/s]
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
TextEncoder comes with 111.32 M params.
Traceback (most recent call last):
  File "audio-chatgpt.py", line 1377, in <module>
    bot = ConversationBot()
  File "audio-chatgpt.py", line 1057, in __init__
    self.t2a = T2A(device="cuda:0")
  File "audio-chatgpt.py", line 144, in __init__
    self.sampler = self._initialize_model('text_to_audio/Make_An_Audio/configs/text_to_audio/txt2audio_args.yaml', 'text_to_audio/Make_An_Audio/useful_ckpts/ta40multi_epoch=000085.ckpt', device=device)
  File "audio-chatgpt.py", line 150, in _initialize_model
    model.load_state_dict(torch.load(ckpt, map_location='cpu')["state_dict"], strict=False)
  File "C:\Users\nitin\miniconda3\envs\audiogpt\lib\site-packages\torch\serialization.py", line 713, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "C:\Users\nitin\miniconda3\envs\audiogpt\lib\site-packages\torch\serialization.py", line 920, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: pickle data was truncated
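
`pickle data was truncated` generally points to an incomplete checkpoint download. A sketch that compares the local file size with the copy on the Hub, with the repo id and filename assumed from the traceback above:

```python
# Sketch: compare the local checkpoint size with the copy on the Hub.
# Repo id and filename are assumptions based on the traceback above.
import os
from huggingface_hub import get_hf_file_metadata, hf_hub_url

fname = "text_to_audio/Make_An_Audio/useful_ckpts/ta40multi_epoch=000085.ckpt"
meta = get_hf_file_metadata(hf_hub_url("AIGC-Audio/AudioGPT", fname))
print("remote size:", meta.size, "bytes; local size:", os.path.getsize(fname))
# A smaller local file confirms the truncated download: delete it and re-fetch.
```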


Build and run steps

Congrats on releasing this amazing project! Could you please provide clear steps on how to build and run these pre-trained models? Hopefully that is feasible on a regular laptop.

KeyError: binary_data_dir error when trying to launch

I installed all the files and even downloaded the whole repo from Hugging Face, but I get "KeyError: binary_data_dir" when I try to launch the Python script.

C:\audiogpt2\NeuralSeq\inference\tts\base_tts_infer.py:20 in __init__                            │
│                                                                                                  │
│    17 │   │   │   device = 'cuda' if torch.cuda.is_available() else 'cpu'                        │
│    18 │   │   self.hparams = hparams                                                             │
│    19 │   │   self.device = device                                                               │
│ ❱  20 │   │   self.data_dir = hparams['binary_data_dir']                                         │
│    21 │   │   self.preprocessor, self.preprocess_args = load_data_preprocessor()                 │
│    22 │   │   self.ph_encoder, self.word_encoder = self.preprocessor.load_dict(self.data_dir)    │
│    23 │   │   self.ds_cls = FastSpeechWordDataset                                                │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
KeyError: 'binary_data_dir'
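
Since `binary_data_dir` is read straight out of the loaded hparams dict, the KeyError means the model's config.yaml was either not found or not merged into hparams before inference. A sketch of a clearer guard around the same lookup (the key name comes from the traceback; the error text is mine):

```python
# Sketch: same lookup as base_tts_infer.py line 20, with a clearer failure.
def get_binary_data_dir(hparams: dict) -> str:
    try:
        return hparams["binary_data_dir"]
    except KeyError:
        raise KeyError(
            "hparams has no 'binary_data_dir': the model's config.yaml was "
            "probably not found or not merged into hparams before inference"
        ) from None

get_binary_data_dir({})  # an empty hparams dict reproduces the report above
```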

Real-time Audio -> Text

I'm looking for a solution that can ultimately listen to an audio-out device (audio-in is possible, but I dislike that route for users) and run speaker diarization to split voices, hopefully in real time.

Real time audio translation to voice

It would be great to have live speech-to-text, translation, and text-to-speech, with natural diction and no breaks.
-> How can a text stream that transcribes itself as it goes along be spoken fluently by a synthetic voice?

I see several opportunities:

  • Webinars in several languages
  • Live shows
  • Translation of sermons in church for missionaries or visitors, etc.

In other words: real-time speech-to-speech!

Can't install espnet

During pip install -r requirements.txt, I get this error:

ERROR: Packages installed from PyPI cannot depend on packages which are not also hosted on PyPI.
espnet depends on torch-complex@ git+https://github.com/kamo-naoyuki/pytorch_complex.git 

However, manually installing torch-complex with pip install torch-complex prints:

Requirement already satisfied: torch-complex in /home/alejandro/mambaforge/envs/audiogpt/lib/python3.8/site-packages (0.4.3)
Requirement already satisfied: numpy in /home/alejandro/mambaforge/envs/audiogpt/lib/python3.8/site-packages (from torch-complex) (1.22.4)

Failed building wheel

Hi,
When I am installing the packages, I am facing errors while building the wheels:
ERROR: Could not build wheels for pyworld, webrtcvad, ctc-segmentation, pycocotools, which is required to install pyproject.toml-based projects

Can you please help me with it?
Thank you.

Multilingual Question

Hello, awesome tool, guys/gals!
I would like to know if there will be any multilingual support in the future.
I want to use it for German speech-to-text; is that possible?
Thanks in advance.

A "generate a singing voice" prompt ALWAYS generates the same voice singing the same song

I gave it different lyrics and even tried uploading a different audio file, etc., but prompts that involve generating a singing voice always produce the exact same output:

Generate a piece of singing voice. Text sequence is 小酒窝长睫毛AP是你最美的记号. Note sequence is C#4/Db4 | F#4/Gb4 | G#4/Ab4 | A#4/Bb4 F#4/Gb4 | F#4/Gb4 C#4/Db4 | C#4/Db4 | rest | C#4/Db4 | A#4/Bb4 | G#4/Ab4 | A#4/Bb4 | G#4/Ab4 | F4 | C#4/Db4. Note duration sequence is 0.407140 | 0.376190 | 0.242180 | 0.509550 0.183420 | 0.315400 0.235020 | 0.361660 | 0.223070 | 0.377270 | 0.340550 | 0.299620 | 0.344510 | 0.283770 | 0.323390 | 0.360340.

Either something is hardcoded somewhere or the model has been overfitted.

Missing libSM.so.6

While following run.md to install and run the program on a Debian VM with an NVIDIA A100 GPU, the last step gives a missing-library error (below).

Google searches show that the missing library comes from OpenCV. pip install opencv-python says the requirements are already satisfied.

This is the error:

(audiogpt) root@debian-gpu4gb:~/AudioGPT# python audio-chatgpt.py

/root/miniconda3/envs/audiogpt/lib/python3.8/site-packages/whisper/timing.py:58: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
  def backtrace(trace: np.ndarray):

Traceback (most recent call last):
  File "audio-chatgpt.py", line 34, in <module>
    from audio_infer.pytorch.models import PVT
  File "/root/AudioGPT/audio_detection/audio_infer/pytorch/models.py", line 25, in <module>
    from mmdet.utils import get_root_logger
  File "/root/miniconda3/envs/audiogpt/lib/python3.8/site-packages/mmdet/__init__.py", line 2, in <module>
    import mmcv
  File "/root/miniconda3/envs/audiogpt/lib/python3.8/site-packages/mmcv/__init__.py", line 4, in <module>
    from .fileio import *
  File "/root/miniconda3/envs/audiogpt/lib/python3.8/site-packages/mmcv/fileio/__init__.py", line 2, in <module>
    from .file_client import BaseStorageBackend, FileClient
  File "/root/miniconda3/envs/audiogpt/lib/python3.8/site-packages/mmcv/fileio/file_client.py", line 15, in <module>
    from mmcv.utils.misc import has_method
  File "/root/miniconda3/envs/audiogpt/lib/python3.8/site-packages/mmcv/utils/__init__.py", line 40, in <module>
    from .env import collect_env
  File "/root/miniconda3/envs/audiogpt/lib/python3.8/site-packages/mmcv/utils/env.py", line 9, in <module>
    import cv2
  File "/root/miniconda3/envs/audiogpt/lib/python3.8/site-packages/cv2/__init__.py", line 5, in <module>
    from .cv2 import *
ImportError: libSM.so.6: cannot open shared object file: No such file or directory

(audiogpt) root@debian-gpu4gb:~/AudioGPT#
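
libSM.so.6 is a system X11 library that the desktop OpenCV build links against, which is why pip reports OpenCV as satisfied while the shared library is still missing. A quick check of whether the dynamic loader can resolve it:

```python
# Sketch: ask the dynamic loader whether libSM.so.6 can be resolved.
import ctypes

try:
    ctypes.CDLL("libSM.so.6")
    print("libSM.so.6 found")
except OSError as exc:
    # On Debian/Ubuntu this library ships in the libsm6 package; installing
    # it (or switching to opencv-python-headless) removes the dependency.
    print("missing:", exc)
```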

Question about Image-to-Audio

Can this model generate music according to the image content?

For example, if the person in the picture is resting, it would generate a brisk song; if the people in the picture are exercising, a more exciting song would be generated.

Where is the portaspeech module?

This line throws:

Traceback (most recent call last):
  File "/workspaces/AudioGPT/audio-chatgpt.py", line 1377, in <module>
    bot = ConversationBot()
  File "/workspaces/AudioGPT/audio-chatgpt.py", line 1058, in __init__
    self.tts = TTS(device="cpu")
  File "/workspaces/AudioGPT/audio-chatgpt.py", line 277, in __init__
    from inference.tts.PortaSpeech import TTSInference
  File "/workspaces/AudioGPT/NeuralSeq/inference/tts/PortaSpeech.py", line 4, in <module>
    from modules.portaspeech.portaspeech import PortaSpeech
ModuleNotFoundError: No module named 'modules.portaspeech'

I see that PR #15 by @Rongjiehuang deleted the module.

Was this a mistake?

Issue running the app

After installing all necessary packages and dependencies, including "mmdet", I still keep getting the following error:

Traceback (most recent call last):
  File "/Users/kc/AudioGPT/audio-chatgpt.py", line 34, in <module>
    from audio_infer.pytorch.models import PVT
  File "/Users/kc/AudioGPT/audio_detection/audio_infer/pytorch/models.py", line 25, in <module>
    from mmdet.utils import get_root_logger
ImportError: cannot import name 'get_root_logger' from 'mmdet.utils' (/opt/homebrew/Caskroom/miniconda/base/lib/python3.10/site-packages/mmdet/utils/__init__.py)
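
`get_root_logger` exists in the mmdet 2.x series but appears to have been removed in later releases, so this is likely a version mismatch rather than a broken install. A quick check (the 2.x/3.x boundary is my assumption; verify against the version this repo pins):

```python
# Sketch: check which mmdet is installed and whether the symbol exists.
import mmdet

print("mmdet version:", mmdet.__version__)
try:
    from mmdet.utils import get_root_logger  # present in mmdet 2.x
    print("get_root_logger importable")
except ImportError:
    print("get_root_logger missing; install the mmdet release this repo "
          "was developed against (a 2.x version)")
```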

Requires CUDA

I am trying to run it on a MacBook Air M1.

After installing and trying to run, I get the error:

| /opt/homebrew/anaconda3/envs/audiogpt/lib/python3.8/site-packages/torch/cuda/__init__.py:211 in  │
│ _lazy_init                                                                                       │
│                                                                                                  │
│   208 │   │   │   │   "Cannot re-initialize CUDA in forked subprocess. To use CUDA with "        │
│   209 │   │   │   │   "multiprocessing, you must use the 'spawn' start method")                  │
│   210 │   │   if not hasattr(torch._C, '_cuda_getDeviceCount'):                                  │
│ ❱ 211 │   │   │   raise AssertionError("Torch not compiled with CUDA enabled")                   │
│   212 │   │   if _cudart is None:                                                                │
│   213 │   │   │   raise AssertionError(                                                          │
│   214 │   │   │   │   "libcudart functions unavailable. It looks like you have a broken build?   │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
AssertionError: Torch not compiled with CUDA enabled
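
The failure comes from the hard-coded `cuda:0` device strings in audio-chatgpt.py. A sketch of a portable device pick that falls back to Apple's MPS backend and then to CPU, assuming the surrounding code accepts any torch device string:

```python
# Sketch: portable device selection instead of a hard-coded "cuda:0".
import torch

if torch.cuda.is_available():
    device = "cuda:0"
elif getattr(torch.backends, "mps", None) and torch.backends.mps.is_available():
    device = "mps"  # Apple Silicon GPU backend (torch >= 1.12)
else:
    device = "cpu"

print("using device:", device)
```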

What is the Scope of this Project.

I am not good at English, so I want to know what this tool is actually used for. I want to contribute: if this is being built for "talking" with ChatGPT, please let me join in. I am a beginner with GitHub and Python and want to expand my skills. Please let me know.

Thanks

CUDA kernel errors

CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1. How can I fix this? I cannot execute audio-chatgpt.py successfully.

Dependencies in `requirements.txt` have module conflicts.

Background

Dependencies in requirements.txt have module conflicts.

Description

There are multiple dependencies mentioned in the requirements.txt file (the -> means an indirect dependency):

opencv-contrib-python
basicsr->opencv-python
albumentations->opencv-python-headless
invisible-watermark->opencv-python
mmcv->opencv-python
qudida->opencv-python-headless

The official spec mentioned that the opencv-python package is for the desktop environment, while opencv-python-headless is for the server environment. The documentation also states that these packages cannot be installed simultaneously (the exact wording is: “There are four different packages (see options 1, 2, 3, and 4 below) and you should SELECT ONLY ONE OF THEM.”). This is because they both use the same module name cv2.

During the installation process using pip, the package installed later will override the cv2 module from the previously installed package (specifically, the modules within the cv2 folders that exist in both packages). Furthermore, the dependency graph even includes different versions of these two packages. It is certain that the common files with the same path in these two packages contain different contents. Therefore, there may be functional implications when using them. However, without analyzing the specific code and function call hierarchy of this project, it can be stated that issues related to overwriting and module conflicts do exist.
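
As a purely diagnostic illustration (not part of the project), a small sketch that reports which of the conflicting wheels currently owns the `cv2` module in an environment:

```python
# Diagnostic sketch: report which OpenCV wheels are installed and which
# one currently provides the cv2 module.
import cv2
from importlib.metadata import distributions

print("cv2 loaded from:", cv2.__file__, "version:", cv2.__version__)
wheels = sorted(
    dist.metadata["Name"]
    for dist in distributions()
    if (dist.metadata["Name"] or "").startswith("opencv-")
)
print("installed OpenCV wheels:", wheels)
# More than one entry means the wheels have overwritten each other's cv2
# files, which is exactly the conflict described above.
```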

Steps to Reproduce

pip install -r requirements.txt

Desired Change

Indeed, it is not an ideal behavior for modules to be overwritten, even if they are not actively used or if the overwritten module is the one being called. It introduces uncertainty and can cause issues in the long run, especially if there are changes or updates to the overwritten modules in future development. It is generally recommended to avoid such conflicts and ensure that only the necessary and compatible dependencies are declared in the requirements to maintain a stable and predictable environment for the project.

We understand that this project can only control its direct dependencies and that indirect dependencies are a black box; still, it would be better to add an explanation rather than directly declaring both conflicting packages in requirements.txt, or to audit the dependencies and remove the redundant ones.

Adding extra explanations or documentation about the potential conflicts and the need to choose only one of the conflicting packages can help developers understand the issue and make informed decisions. Including a clear instruction or warning in the project’s documentation can guide users to choose the appropriate package based on their specific requirements.

Mac lacking Nvidia graphics capabilities: AssertionError: Torch not compiled with CUDA enabled

Hi there,

"I am utilizing a Macintosh computer, which lacks Nvidia graphics capabilities. Could someone kindly provide instructions on how to execute tasks using the CPU? Additionally, I am curious if there exists an alternative to CUDA. I've observed that stable diffusion functions smoothly on the CPU, whereas AudiioGPT seems to encounter issues in that regard.

Steps followed:

create a new environment

conda create -n audiogpt python=3.8

prepare the basic environments

pip install -r requirements.txt

download the foundation models you need

bash download.sh

prepare your private openAI private key

export OPENAI_API_KEY={Your_Private_Openai_Key}

Start AudioGPT !

python audio-chatgpt.py


(audiogpt) Micky@Micky-iMac AudioGPT % python audio-chatgpt.py
/Users/Micky/anaconda3/envs/audiogpt/lib/python3.8/site-packages/whisper/timing.py:58: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
def backtrace(trace: np.ndarray):
Initializing AudioGPT
Initializing T2I to cuda:0
Traceback (most recent call last):
File "audio-chatgpt.py", line 1379, in
bot = ConversationBot()
File "audio-chatgpt.py", line 1057, in init
self.t2i = T2I(device="cuda:0")
File "audio-chatgpt.py", line 116, in init
self.pipe.to(device)
File "/Users/Micky/anaconda3/envs/audiogpt/lib/python3.8/site-packages/diffusers/pipelines/pipeline_utils.py", line 681, in to
module.to(torch_device, torch_dtype)
File "/Users/Micky/anaconda3/envs/audiogpt/lib/python3.8/site-packages/transformers/modeling_utils.py", line 1749, in to
return super().to(*args, **kwargs)
File "/Users/Micky/anaconda3/envs/audiogpt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 927, in to
return self._apply(convert)
File "/Users/Micky/anaconda3/envs/audiogpt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 579, in _apply
module._apply(fn)
File "/Users/Micky/anaconda3/envs/audiogpt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 579, in _apply
module._apply(fn)
File "/Users/Micky/anaconda3/envs/audiogpt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 579, in _apply
module._apply(fn)
[Previous line repeated 1 more time]
File "/Users/Micky/anaconda3/envs/audiogpt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 602, in _apply
param_applied = fn(param)
File "/Users/Micky/anaconda3/envs/audiogpt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 925, in convert
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
File "/Users/Micky/anaconda3/envs/audiogpt/lib/python3.8/site-packages/torch/cuda/init.py", line 211, in _lazy_init
raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
(audiogpt) Micky@Mickys-iMac AudioGPT %

ModuleNotFoundError: No module named 'ldm.util'; 'ldm' is not a package

When I was installing this project, I got this error message:
(audiogpt) administrator@zhaowt:~/AudioGPT$ python audio-chatgpt.py
[2023-04-27 09:31:28] audio-chatgpt.py - - - - - - - - - - - - - - - - - - - - - eprint(line:60) :: Error when calling Cognitive Face API:
status_code: 401
code: 401
message: Access denied due to invalid subscription key or wrong API endpoint. Make sure to provide a valid key for an active subscription and use a correct regional API endpoint for your resource.

[2023-04-27 09:31:28] audio-chatgpt.py - - - - - - - - - - - - - - - - - - - - - eprint(line:60) :: img_url:https://raw.githubusercontent.com/Microsoft/Cognitive-Face-Windows/master/Data/detection1.jpg
[2023-04-27 09:31:29] audio-chatgpt.py - - - - - - - - - - - - - - - - - - - - - eprint(line:60) :: Error when calling Cognitive Face API:
status_code: 401
code: 401
message: Access denied due to invalid subscription key or wrong API endpoint. Make sure to provide a valid key for an active subscription and use a correct regional API endpoint for your resource.

[2023-04-27 09:31:29] audio-chatgpt.py - - - - - - - - - - - - - - - - - - - - - eprint(line:60) :: img_url:/data1/mingmingzhao/label/data_sets_teacher_1w/47017613_1510574400_out-video-jzc70f41fa6f7145b4b66738f81f082b65_f_1510574403268_t_1510575931221.flv_0001.jpg
[]
Traceback (most recent call last):
File "audio-chatgpt.py", line 26, in
from ldm.util import instantiate_from_config
ModuleNotFoundError: No module named 'ldm.util'; 'ldm' is not a package


Does anyone know how to solve this problem?

KeyError: 'binary_data_dir'

class BaseTTSInfer:
    def __init__(self, hparams, device=None):
        if device is None:
            device = 'cuda' if torch.cuda.is_available() else 'cpu'
        self.hparams = hparams
        self.device = device
        self.data_dir = hparams['binary_data_dir']

This line raises "KeyError: 'binary_data_dir'". I have tried many times, but it does not work.

What is a Model assignment in code?

I read your paper, and it was a very interesting read. As a result, I summarized my review of the paper on my Notion page. However, I had a question that came up while doing so. Could you provide some insight into what the "Model Assignment" refers to in the actual code of AudioGPT?
