
audiogpt's Introduction

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head


We provide our implementation and pretrained models as open source in this repository.

Get Started

Please refer to run.md

Capabilities

Here we list the capabilities of AudioGPT at this time. More supported models and tasks are coming soon. For prompt examples, refer to asset.

Currently, not every model has a public repository.

Speech

| Task | Supported Foundation Models | Status |
| --- | --- | --- |
| Text-to-Speech | FastSpeech, SyntaSpeech, VITS | Yes (WIP) |
| Style Transfer | GenerSpeech | Yes |
| Speech Recognition | whisper, Conformer | Yes |
| Speech Enhancement | ConvTasNet | Yes (WIP) |
| Speech Separation | TF-GridNet | Yes (WIP) |
| Speech Translation | Multi-decoder | WIP |
| Mono-to-Binaural | NeuralWarp | Yes |

Sing

| Task | Supported Foundation Models | Status |
| --- | --- | --- |
| Text-to-Sing | DiffSinger, VISinger | Yes (WIP) |

Audio

| Task | Supported Foundation Models | Status |
| --- | --- | --- |
| Text-to-Audio | Make-An-Audio | Yes |
| Audio Inpainting | Make-An-Audio | Yes |
| Image-to-Audio | Make-An-Audio | Yes |
| Sound Detection | Audio-transformer | Yes |
| Target Sound Detection | TSDNet | Yes |
| Sound Extraction | LASSNet | Yes |

Talking Head

| Task | Supported Foundation Models | Status |
| --- | --- | --- |
| Talking Head Synthesis | GeneFace | Yes (WIP) |

Acknowledgement

We appreciate the following open-source projects:

ESPNet, NATSpeech, Visual ChatGPT, Hugging Face, LangChain, Stable Diffusion

audiogpt's People

Contributors

a-quarter-mile, ftshijt, lmzjms, miletaa, moonintheriver, peppapiggeee, rayeren, rongjiehuang, simpleoier, yangdongchao, yerfor


audiogpt's Issues

Can it run on CPU?

python audio-chatgpt.py
Initializing AudioGPT
Initializing Make-An-Audio to cpu
LatentDiffusion_audio: Running in eps-prediction mode
DiffusionWrapper has 160.22 M params.
making attention of type 'vanilla' with 256 in_channels
making attention of type 'vanilla' with 256 in_channels
making attention of type 'vanilla' with 512 in_channels
making attention of type 'vanilla' with 512 in_channels
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 106, 106) = 44944 dimensions.
making attention of type 'vanilla' with 512 in_channels
making attention of type 'vanilla' with 512 in_channels
making attention of type 'vanilla' with 512 in_channels
making attention of type 'vanilla' with 512 in_channels
making attention of type 'vanilla' with 256 in_channels
making attention of type 'vanilla' with 256 in_channels
making attention of type 'vanilla' with 256 in_channels
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.weight']

- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
TextEncoder comes with 111.32 M params.
Traceback (most recent call last):
File "audio-chatgpt.py", line 1378, in
bot = ConversationBot()
File "audio-chatgpt.py", line 1057, in init
self.t2a = T2A(device="cpu")
File "audio-chatgpt.py", line 144, in init
self.sampler = self._initialize_model('text_to_audio/Make_An_Audio/configs/text_to_audio/txt2audio_args.yaml', 'text_to_audio/Make_An_Audio/useful_ckpts/ta40multi_epoch=000085.ckpt', device=device)
File "audio-chatgpt.py", line 150, in _initialize_model
model.load_state_dict(torch.load(ckpt, map_location='cpu')["state_dict"], strict=False)
File "/root/anaconda3/envs/audiogpt/lib/python3.8/site-packages/torch/serialization.py", line 713, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/root/anaconda3/envs/audiogpt/lib/python3.8/site-packages/torch/serialization.py", line 920, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '<'.
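
The `invalid load key, '<'` error means the first byte of the "checkpoint" is `<`, which is how an HTML page starts: the download most likely saved a Hugging Face error or blob page instead of the binary file. A minimal diagnostic sketch, assuming the checkpoint path from the traceback above:

```python
# Minimal diagnostic sketch, assuming the checkpoint path from the traceback.
# A torch/pickle checkpoint never starts with '<'; an HTML error page does.
ckpt = "text_to_audio/Make_An_Audio/useful_ckpts/ta40multi_epoch=000085.ckpt"

with open(ckpt, "rb") as f:
    head = f.read(64)

if head.lstrip().startswith(b"<"):
    print("This file is HTML, not a checkpoint; re-download it from the")
    print("'resolve/main' URL (not the 'blob/main' page URL).")
else:
    print("Header looks binary:", head[:16])
```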

talking head status

Amazing work! I would like to know the status of the talking head powered by audio, as described in your paper. Is it currently possible to test? Thanks again for the cool work!

ImportError: cannot import name dataclass_transform

Fresh install following run.md.

python audio-chatgpt.py
Traceback (most recent call last):
  File "audio-chatgpt.py", line 9, in <module>
    import gradio as gr
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/gradio/__init__.py", line 3, in <module>
    import gradio.components as components
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/gradio/components.py", line 32, in <module>
    from fastapi import UploadFile
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/fastapi/__init__.py", line 7, in <module>
    from .applications import FastAPI as FastAPI
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/fastapi/applications.py", line 15, in <module>
    from fastapi import routing
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/fastapi/routing.py", line 22, in <module>
    from fastapi import params
  File "/anaconda/envs/azureml_py38/lib/python3.8/site-packages/fastapi/params.py", line 4, in <module>
    from pydantic.fields import FieldInfo, Undefined
  File "pydantic/__init__.py", line 2, in init pydantic.__init__
  File "pydantic/dataclasses.py", line 39, in init pydantic.dataclasses
    # +=========+=========================================+
ImportError: cannot import name dataclass_transform
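
For context: pydantic 1.10+ imports `dataclass_transform` from `typing_extensions`, which only ships it from version 4.1.0 onwards, so this usually indicates a stale `typing_extensions` in the environment. A quick check (a sketch, not part of the repo):

```python
# Environment check: pydantic 1.10+ imports dataclass_transform from
# typing_extensions, which only provides it from version 4.1.0 onwards.
from importlib.metadata import version

print("pydantic:", version("pydantic"))
print("typing_extensions:", version("typing_extensions"))

# This import fails on an old typing_extensions; upgrading it with
# `pip install -U typing_extensions` (or pinning pydantic lower) resolves it.
from typing_extensions import dataclass_transform  # noqa: F401
print("dataclass_transform available")
```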

Cannot find https://huggingface.co/AIGC-Audio/AudioGPT/resolve/main/audio_to_text

wget https://huggingface.co/AIGC-Audio/AudioGPT/blob/main/audio_to_text/audiocaps_cntrstv_cnn14rnn_trm/config.yaml https://huggingface.co/AIGC-Audio/AudioGPT/resolve/main/audio_to_text/audiocaps_cntrstv_cnn14rnn_trm/swa.pth
--2024-06-24 14:59:41-- https://huggingface.co/AIGC-Audio/AudioGPT/blob/main/audio_to_text/audiocaps_cntrstv_cnn14rnn_trm/config.yaml
Resolving huggingface.co (huggingface.co)... 168.143.162.58, 2a03:2880:f11c:8183:face:b00c:0:25de
Connecting to huggingface.co (huggingface.co)|168.143.162.58|:443... connected.
HTTP request sent, awaiting response... 401 Unauthorized
Authorization failed.
--2024-06-24 14:59:42-- https://huggingface.co/AIGC-Audio/AudioGPT/resolve/main/audio_to_text/audiocaps_cntrstv_cnn14rnn_trm/swa.pth
Connecting to huggingface.co (huggingface.co)|168.143.162.58|:443... connected.
HTTP request sent, awaiting response... 401 Unauthorized
Authorization failed.
(base) [root@VM-231-58-tencentos audiocaps_cntrstv_cnn14rnn_trm]# wget https://huggingface.co/AIGC-Audio/AudioGPT/blob/main/audio_to_text/audiocaps_cntrstv_cnn14rnn_trm/config.yaml
--2024-06-24 14:59:51-- https://huggingface.co/AIGC-Audio/AudioGPT/blob/main/audio_to_text/audiocaps_cntrstv_cnn14rnn_trm/config.yaml
Resolving huggingface.co (huggingface.co)... 168.143.162.58, 2a03:2880:f11c:8183:face:b00c:0:25de
Connecting to huggingface.co (huggingface.co)|168.143.162.58|:443... connected.
HTTP request sent, awaiting response... 401 Unauthorized
Authorization failed.
(base) [root@VM-231-58-tencentos audiocaps_cntrstv_cnn14rnn_trm]# wget https://huggingface.co/AIGC-Audio/AudioGPT/resolve/main/audio_to_text/audiocaps_cntrstv_cnn14rnn_trm/swa.pth
--2024-06-24 15:00:01-- https://huggingface.co/AIGC-Audio/AudioGPT/resolve/main/audio_to_text/audiocaps_cntrstv_cnn14rnn_trm/swa.pth
Resolving huggingface.co (huggingface.co)... 168.143.162.58, 2a03:2880:f11c:8183:face:b00c:0:25de
Connecting to huggingface.co (huggingface.co)|168.143.162.58|:443... connected.
HTTP request sent, awaiting response... 401 Unauthorized
Authorization failed.
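
Note that `blob/main` URLs return an HTML page rather than the file, and a 401 on `resolve/main` suggests the repo is gated or the files have moved. A sketch of fetching through `huggingface_hub` instead of raw wget, with the repo id and file paths taken from the logs above (availability not guaranteed):

```python
# Sketch: fetch the files through huggingface_hub instead of wget.
# Repo id and file paths come from the logs above; if the repo is gated
# or the files were moved, this raises an explicit HTTP error instead of
# silently saving an HTML page.
from huggingface_hub import hf_hub_download

for fname in (
    "audio_to_text/audiocaps_cntrstv_cnn14rnn_trm/config.yaml",
    "audio_to_text/audiocaps_cntrstv_cnn14rnn_trm/swa.pth",
):
    path = hf_hub_download(repo_id="AIGC-Audio/AudioGPT", filename=fname)
    print("saved to", path)
```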

No module named 'ldm'

File "audio-chatgpt.py", line 24, in
from ldm.util import instantiate_from_config
ModuleNotFoundError: No module named 'ldm'

The modules utils.hparams and audio_infer.utils are also missing.
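
`ldm` is not a pip package here; it appears to be vendored inside the Make-An-Audio subtree, so the script has to run with those subdirectories on the import path. A sketch, with directory names assumed from the repository layout:

```python
# Sketch: put the vendored sub-packages on sys.path before importing them.
# The directory names below are assumptions based on the repository layout.
import os
import sys

repo_root = os.path.dirname(os.path.abspath(__file__))
for sub in ("text_to_audio/Make_An_Audio", "NeuralSeq", "audio_detection"):
    sys.path.insert(0, os.path.join(repo_root, sub))

from ldm.util import instantiate_from_config  # should now resolve
```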

Bark audio model and talking head additions

  • Would be amazing if you could:
  1. Turn the "talking head" images into animated GIFs lip-synced to the WAV audio generated by TTS using Bark (Bark is currently the best and most realistic, emotion-driven audio model that is free to use, even better than the best commercial closed-source model, Eleven Labs).
  2. Then generate an MP4 from the combination of the animated GIF and WAV audio on the fly, replacing the starting-point animated GIF on the screen.

This can be done by integrating code from one of the following choices:

Bark oobabooga TTS extension:
https://github.com/wsippel/bark_tts

RuntimeError: CUDA error: invalid device ordinal

RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17    Driver Version: 525.105.17    CUDA Version: 12.0   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:07:00.0 Off |                    0 |
| N/A   36C    P0    24W / 250W |      4MiB / 16384MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
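
`invalid device ordinal` means the code asked for a GPU index that does not exist on this machine; the nvidia-smi output above shows a single GPU, so only index 0 is valid. A quick sketch to confirm what PyTorch can see:

```python
# Sketch: list the CUDA devices PyTorch can actually see before picking one.
import torch

print("CUDA available:", torch.cuda.is_available())
print("device count:", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print(f"cuda:{i} ->", torch.cuda.get_device_name(i))

# On a single-GPU machine only "cuda:0" exists; any higher index (or a
# stale CUDA_VISIBLE_DEVICES setting) produces "invalid device ordinal".
```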

Good

👋 👋😊

Error - requirements.txt - opencv-contrib-python==4.3.0.36 not available

The requirements.txt (line 37) asks for opencv-contrib-python==4.3.0.36, but since that version has been yanked and is no longer available, it gives me an error:

ERROR: Could not find a version that satisfies the requirement opencv-contrib-python==4.3.0.36
...
ERROR: No matching distribution found for opencv-contrib-python==4.3.0.36

Would it be sufficient to just use the latest opencv-contrib-python?

_pickle.UnpicklingError: pickle data was truncated

After hours of struggling with the installation, I am getting this error. Any solution, please?

(audiogpt) C:\sd\AudioGPT>python audio-chatgpt.py
Initializing AudioGPT
Initializing T2I to cuda:0
C:\Users\nitin\miniconda3\envs\audiogpt\lib\site-packages\huggingface_hub\file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
unet\diffusion_pytorch_model.safetensors not found
Initializing ImageCaptioning to cuda:0
Initializing Make-An-Audio to cuda:0
LatentDiffusion_audio: Running in eps-prediction mode
DiffusionWrapper has 160.22 M params.
making attention of type 'vanilla' with 256 in_channels
making attention of type 'vanilla' with 256 in_channels
making attention of type 'vanilla' with 512 in_channels
making attention of type 'vanilla' with 512 in_channels
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 106, 106) = 44944 dimensions.
making attention of type 'vanilla' with 512 in_channels
making attention of type 'vanilla' with 512 in_channels
making attention of type 'vanilla' with 512 in_channels
making attention of type 'vanilla' with 512 in_channels
making attention of type 'vanilla' with 256 in_channels
making attention of type 'vanilla' with 256 in_channels
making attention of type 'vanilla' with 256 in_channels
C:\Users\nitin\miniconda3\envs\audiogpt\lib\site-packages\huggingface_hub\file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
tokenizer_config.json: 100%|████████████████████████████████████████████████████████████████| 48.0/48.0 [00:00<?, ?B/s]
config.json: 100%|████████████████████████████████████████████████████████████████████| 570/570 [00:00<00:00, 36.6kB/s]
vocab.txt: 100%|█████████████████████████████████████████████████████████████████████| 232k/232k [00:00<00:00, 531kB/s]
tokenizer.json: 100%|████████████████████████████████████████████████████████████████| 466k/466k [00:00<00:00, 735kB/s]
model.safetensors: 100%|████████████████████████████████████████████████████████████| 440M/440M [00:35<00:00, 12.3MB/s]
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.weight', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
TextEncoder comes with 111.32 M params.
Traceback (most recent call last):
  File "audio-chatgpt.py", line 1377, in <module>
    bot = ConversationBot()
  File "audio-chatgpt.py", line 1057, in __init__
    self.t2a = T2A(device="cuda:0")
  File "audio-chatgpt.py", line 144, in __init__
    self.sampler = self._initialize_model('text_to_audio/Make_An_Audio/configs/text_to_audio/txt2audio_args.yaml', 'text_to_audio/Make_An_Audio/useful_ckpts/ta40multi_epoch=000085.ckpt', device=device)
  File "audio-chatgpt.py", line 150, in _initialize_model
    model.load_state_dict(torch.load(ckpt, map_location='cpu')["state_dict"], strict=False)
  File "C:\Users\nitin\miniconda3\envs\audiogpt\lib\site-packages\torch\serialization.py", line 713, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "C:\Users\nitin\miniconda3\envs\audiogpt\lib\site-packages\torch\serialization.py", line 920, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: pickle data was truncated
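
`pickle data was truncated` generally points to an incomplete checkpoint download. A sketch that compares the local file size with the copy on the Hub, with the repo id and filename assumed from the traceback above:

```python
# Sketch: compare the local checkpoint size with the copy on the Hub.
# Repo id and filename are assumptions based on the traceback above.
import os
from huggingface_hub import get_hf_file_metadata, hf_hub_url

fname = "text_to_audio/Make_An_Audio/useful_ckpts/ta40multi_epoch=000085.ckpt"
meta = get_hf_file_metadata(hf_hub_url("AIGC-Audio/AudioGPT", fname))
print("remote size:", meta.size, "bytes; local size:", os.path.getsize(fname))
# A smaller local file confirms the truncated download: delete it and re-fetch.
```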


Build and run steps

Congrats on releasing this amazing project! Could you please provide clear steps on how to build and run these pre-trained models? Hopefully that is feasible on a regular laptop.

KeyError: binary_data_dir error when trying to launch

I installed all the files and even downloaded the whole repo from Hugging Face, but I get "KeyError: binary_data_dir" when I try to launch the Python script.

C:\audiogpt2\NeuralSeq\inference\tts\base_tts_infer.py:20 in __init__                            │
│                                                                                                  │
│    17 │   │   │   device = 'cuda' if torch.cuda.is_available() else 'cpu'                        │
│    18 │   │   self.hparams = hparams                                                             │
│    19 │   │   self.device = device                                                               │
│ ❱  20 │   │   self.data_dir = hparams['binary_data_dir']                                         │
│    21 │   │   self.preprocessor, self.preprocess_args = load_data_preprocessor()                 │
│    22 │   │   self.ph_encoder, self.word_encoder = self.preprocessor.load_dict(self.data_dir)    │
│    23 │   │   self.ds_cls = FastSpeechWordDataset                                                │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
KeyError: 'binary_data_dir'
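
Since `binary_data_dir` is read straight out of the loaded hparams dict, the KeyError means the model's config.yaml was either not found or not merged into hparams before inference. A sketch of a clearer guard around the same lookup (the key name comes from the traceback; the error text is mine):

```python
# Sketch: same lookup as base_tts_infer.py line 20, with a clearer failure.
def get_binary_data_dir(hparams: dict) -> str:
    try:
        return hparams["binary_data_dir"]
    except KeyError:
        raise KeyError(
            "hparams has no 'binary_data_dir': the model's config.yaml was "
            "probably not found or not merged into hparams before inference"
        ) from None

get_binary_data_dir({})  # an empty hparams dict reproduces the report above
```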

Real-time Audio -> Text

I'm looking for a solution that can ultimately listen to an audio-out device (audio-in is possible, but I dislike that route for users) and run speaker diarization to split voices, hopefully in real time.

Real time audio translation to voice

It would be great to have live speech-to-text, translation, and text-to-speech, with natural diction and no breaks.
-> How can a text stream that transcribes itself as it goes along be spoken fluently by a synthetic voice?

I see several opportunities:

  • Webinars in several languages
  • Live shows
  • Translation of sermons in church for missionaries or visitors, etc.

In other words: real-time speech-to-speech!

Can't install espnet

During pip install -r requirements.txt, I get this error:

ERROR: Packages installed from PyPI cannot depend on packages which are not also hosted on PyPI.
espnet depends on torch-complex@ git+https://github.com/kamo-naoyuki/pytorch_complex.git 

However, manually installing torch-complex with pip install torch-complex prints:

Requirement already satisfied: torch-complex in /home/alejandro/mambaforge/envs/audiogpt/lib/python3.8/site-packages (0.4.3)
Requirement already satisfied: numpy in /home/alejandro/mambaforge/envs/audiogpt/lib/python3.8/site-packages (from torch-complex) (1.22.4)

Failed building wheel

Hi,
When I am installing the packages, I am facing errors while building the wheels:
ERROR: Could not build wheels for pyworld, webrtcvad, ctc-segmentation, pycocotools, which is required to install pyproject.toml-based projects

Can you please help me with it?
Thank you.

Multilingual Question

Hello, awesome tool, guys/gals!
I would like to know if there will be any multilingual support in the future.
I want to use it for German speech-to-text; is that possible?
Thanks in advance.

A "generate a singing voice" prompt ALWAYS generates the same voice singing the same song

I gave it different lyrics and even tried uploading a different audio file, etc., but prompts that involve generating a singing voice always produce the exact same output:

Generate a piece of singing voice. Text sequence is 小酒窝长睫毛AP是你最美的记号. Note sequence is C#4/Db4 | F#4/Gb4 | G#4/Ab4 | A#4/Bb4 F#4/Gb4 | F#4/Gb4 C#4/Db4 | C#4/Db4 | rest | C#4/Db4 | A#4/Bb4 | G#4/Ab4 | A#4/Bb4 | G#4/Ab4 | F4 | C#4/Db4. Note duration sequence is 0.407140 | 0.376190 | 0.242180 | 0.509550 0.183420 | 0.315400 0.235020 | 0.361660 | 0.223070 | 0.377270 | 0.340550 | 0.299620 | 0.344510 | 0.283770 | 0.323390 | 0.360340.

Either something is hardcoded somewhere or the model has been overfitted.

Missing libSM.so.6

While following run.md to install and run the program on a Debian VM with an NVIDIA A100 GPU, the last step gives a missing-library error (below).

Google searches show that the missing library comes from OpenCV. pip install opencv-python says the requirements are already satisfied.

This is the error:

(audiogpt) root@debian-gpu4gb:~/AudioGPT# python audio-chatgpt.py

/root/miniconda3/envs/audiogpt/lib/python3.8/site-packages/whisper/timing.py:58: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
  def backtrace(trace: np.ndarray):

Traceback (most recent call last):
  File "audio-chatgpt.py", line 34, in <module>
    from audio_infer.pytorch.models import PVT
  File "/root/AudioGPT/audio_detection/audio_infer/pytorch/models.py", line 25, in <module>
    from mmdet.utils import get_root_logger
  File "/root/miniconda3/envs/audiogpt/lib/python3.8/site-packages/mmdet/__init__.py", line 2, in <module>
    import mmcv
  File "/root/miniconda3/envs/audiogpt/lib/python3.8/site-packages/mmcv/__init__.py", line 4, in <module>
    from .fileio import *
  File "/root/miniconda3/envs/audiogpt/lib/python3.8/site-packages/mmcv/fileio/__init__.py", line 2, in <module>
    from .file_client import BaseStorageBackend, FileClient
  File "/root/miniconda3/envs/audiogpt/lib/python3.8/site-packages/mmcv/fileio/file_client.py", line 15, in <module>
    from mmcv.utils.misc import has_method
  File "/root/miniconda3/envs/audiogpt/lib/python3.8/site-packages/mmcv/utils/__init__.py", line 40, in <module>
    from .env import collect_env
  File "/root/miniconda3/envs/audiogpt/lib/python3.8/site-packages/mmcv/utils/env.py", line 9, in <module>
    import cv2
  File "/root/miniconda3/envs/audiogpt/lib/python3.8/site-packages/cv2/__init__.py", line 5, in <module>
    from .cv2 import *
ImportError: libSM.so.6: cannot open shared object file: No such file or directory

(audiogpt) root@debian-gpu4gb:~/AudioGPT#
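
libSM.so.6 is a system X11 library that the desktop OpenCV build links against, which is why pip reports OpenCV as satisfied while the shared library is still missing. A quick check of whether the dynamic loader can resolve it:

```python
# Sketch: ask the dynamic loader whether libSM.so.6 can be resolved.
import ctypes

try:
    ctypes.CDLL("libSM.so.6")
    print("libSM.so.6 found")
except OSError as exc:
    # On Debian/Ubuntu this library ships in the libsm6 package; installing
    # it (or switching to opencv-python-headless) removes the dependency.
    print("missing:", exc)
```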

Question about Image-to-Audio

Can this model generate music according to the image content?

For example, if the person in the picture is resting, it would generate a brisk song; if the people in the picture are exercising, a more exciting song would be generated.

Where is the portaspeech module?

This line throws:

Traceback (most recent call last):
  File "/workspaces/AudioGPT/audio-chatgpt.py", line 1377, in <module>
    bot = ConversationBot()
  File "/workspaces/AudioGPT/audio-chatgpt.py", line 1058, in __init__
    self.tts = TTS(device="cpu")
  File "/workspaces/AudioGPT/audio-chatgpt.py", line 277, in __init__
    from inference.tts.PortaSpeech import TTSInference
  File "/workspaces/AudioGPT/NeuralSeq/inference/tts/PortaSpeech.py", line 4, in <module>
    from modules.portaspeech.portaspeech import PortaSpeech
ModuleNotFoundError: No module named 'modules.portaspeech'

I see that PR #15 by @Rongjiehuang deleted the module.

Was this a mistake?

Issue running the app

After installing all necessary packages and dependencies, including "mmdet", I still keep getting the following error:

Traceback (most recent call last):
  File "/Users/kc/AudioGPT/audio-chatgpt.py", line 34, in <module>
    from audio_infer.pytorch.models import PVT
  File "/Users/kc/AudioGPT/audio_detection/audio_infer/pytorch/models.py", line 25, in <module>
    from mmdet.utils import get_root_logger
ImportError: cannot import name 'get_root_logger' from 'mmdet.utils' (/opt/homebrew/Caskroom/miniconda/base/lib/python3.10/site-packages/mmdet/utils/__init__.py)
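
`get_root_logger` exists in the mmdet 2.x series but appears to have been removed in later releases, so this is likely a version mismatch rather than a broken install. A quick check (the 2.x/3.x boundary is my assumption; verify against the version this repo pins):

```python
# Sketch: check which mmdet is installed and whether the symbol exists.
import mmdet

print("mmdet version:", mmdet.__version__)
try:
    from mmdet.utils import get_root_logger  # present in mmdet 2.x
    print("get_root_logger importable")
except ImportError:
    print("get_root_logger missing; install the mmdet release this repo "
          "was developed against (a 2.x version)")
```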

Requires CUDA

I am trying to run it on a MacBook Air M1.

After installing and trying to run, I get the error:

| /opt/homebrew/anaconda3/envs/audiogpt/lib/python3.8/site-packages/torch/cuda/__init__.py:211 in  │
│ _lazy_init                                                                                       │
│                                                                                                  │
│   208 │   │   │   │   "Cannot re-initialize CUDA in forked subprocess. To use CUDA with "        │
│   209 │   │   │   │   "multiprocessing, you must use the 'spawn' start method")                  │
│   210 │   │   if not hasattr(torch._C, '_cuda_getDeviceCount'):                                  │
│ ❱ 211 │   │   │   raise AssertionError("Torch not compiled with CUDA enabled")                   │
│   212 │   │   if _cudart is None:                                                                │
│   213 │   │   │   raise AssertionError(                                                          │
│   214 │   │   │   │   "libcudart functions unavailable. It looks like you have a broken build?   │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
AssertionError: Torch not compiled with CUDA enabled
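
The failure comes from the hard-coded `cuda:0` device strings in audio-chatgpt.py. A sketch of a portable device pick that falls back to Apple's MPS backend and then to CPU, assuming the surrounding code accepts any torch device string:

```python
# Sketch: portable device selection instead of a hard-coded "cuda:0".
import torch

if torch.cuda.is_available():
    device = "cuda:0"
elif getattr(torch.backends, "mps", None) and torch.backends.mps.is_available():
    device = "mps"  # Apple Silicon GPU backend (torch >= 1.12)
else:
    device = "cpu"

print("using device:", device)
```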

What is the Scope of this Project.

I am not good at English, so I want to know what this tool is actually used for. I want to contribute: if this is being built for "talking" with ChatGPT, please let me join in. I am a beginner with GitHub and Python and want to expand my skills. Please let me know.

Thanks

CUDA kernel errors

CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1. How can I fix this? I cannot execute audio-chatgpt.py successfully.

Dependencies in `requirements.txt` have module conflicts.

Background

Dependencies in requirements.txt have module conflicts.

Description

There are multiple dependencies mentioned in the requirements.txt file (the -> means an indirect dependency):

opencv-contrib-python
basicsr->opencv-python
albumentations->opencv-python-headless
invisible-watermark->opencv-python
mmcv->opencv-python
qudida->opencv-python-headless

The official spec mentioned that the opencv-python package is for the desktop environment, while opencv-python-headless is for the server environment. The documentation also states that these packages cannot be installed simultaneously (the exact wording is: “There are four different packages (see options 1, 2, 3, and 4 below) and you should SELECT ONLY ONE OF THEM.”). This is because they both use the same module name cv2.

During the installation process using pip, the package installed later will override the cv2 module from the previously installed package (specifically, the modules within the cv2 folders that exist in both packages). Furthermore, the dependency graph even includes different versions of these two packages. It is certain that the common files with the same path in these two packages contain different contents. Therefore, there may be functional implications when using them. However, without analyzing the specific code and function call hierarchy of this project, it can be stated that issues related to overwriting and module conflicts do exist.
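
As a purely diagnostic illustration (not part of the project), a small sketch that reports which of the conflicting wheels currently owns the `cv2` module in an environment:

```python
# Diagnostic sketch: report which OpenCV wheels are installed and which
# one currently provides the cv2 module.
import cv2
from importlib.metadata import distributions

print("cv2 loaded from:", cv2.__file__, "version:", cv2.__version__)
wheels = sorted(
    dist.metadata["Name"]
    for dist in distributions()
    if (dist.metadata["Name"] or "").startswith("opencv-")
)
print("installed OpenCV wheels:", wheels)
# More than one entry means the wheels have overwritten each other's cv2
# files, which is exactly the conflict described above.
```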

Steps to Reproduce

pip install -r requirements.txt

Desired Change

Indeed, it is not an ideal behavior for modules to be overwritten, even if they are not actively used or if the overwritten module is the one being called. It introduces uncertainty and can cause issues in the long run, especially if there are changes or updates to the overwritten modules in future development. It is generally recommended to avoid such conflicts and ensure that only the necessary and compatible dependencies are declared in the requirements to maintain a stable and predictable environment for the project.

We understand that this project can only control its direct dependencies and that indirect dependencies are a black box; still, it would be better to add an explanation rather than directly declaring both conflicting packages in requirements.txt, or to audit the dependencies and remove the redundant ones.

Adding extra explanations or documentation about the potential conflicts and the need to choose only one of the conflicting packages can help developers understand the issue and make informed decisions. Including a clear instruction or warning in the project’s documentation can guide users to choose the appropriate package based on their specific requirements.

Mac lacking Nvidia graphics capabilities: AssertionError: Torch not compiled with CUDA enabled

Hi there,

"I am utilizing a Macintosh computer, which lacks Nvidia graphics capabilities. Could someone kindly provide instructions on how to execute tasks using the CPU? Additionally, I am curious if there exists an alternative to CUDA. I've observed that stable diffusion functions smoothly on the CPU, whereas AudiioGPT seems to encounter issues in that regard.

Steps followed:

create a new environment

conda create -n audiogpt python=3.8

prepare the basic environments

pip install -r requirements.txt

download the foundation models you need

bash download.sh

prepare your private openAI private key

export OPENAI_API_KEY={Your_Private_Openai_Key}

Start AudioGPT !

python audio-chatgpt.py


(audiogpt) Micky@Micky-iMac AudioGPT % python audio-chatgpt.py
/Users/Micky/anaconda3/envs/audiogpt/lib/python3.8/site-packages/whisper/timing.py:58: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
def backtrace(trace: np.ndarray):
Initializing AudioGPT
Initializing T2I to cuda:0
Traceback (most recent call last):
File "audio-chatgpt.py", line 1379, in
bot = ConversationBot()
File "audio-chatgpt.py", line 1057, in init
self.t2i = T2I(device="cuda:0")
File "audio-chatgpt.py", line 116, in init
self.pipe.to(device)
File "/Users/Micky/anaconda3/envs/audiogpt/lib/python3.8/site-packages/diffusers/pipelines/pipeline_utils.py", line 681, in to
module.to(torch_device, torch_dtype)
File "/Users/Micky/anaconda3/envs/audiogpt/lib/python3.8/site-packages/transformers/modeling_utils.py", line 1749, in to
return super().to(*args, **kwargs)
File "/Users/Micky/anaconda3/envs/audiogpt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 927, in to
return self._apply(convert)
File "/Users/Micky/anaconda3/envs/audiogpt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 579, in _apply
module._apply(fn)
File "/Users/Micky/anaconda3/envs/audiogpt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 579, in _apply
module._apply(fn)
File "/Users/Micky/anaconda3/envs/audiogpt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 579, in _apply
module._apply(fn)
[Previous line repeated 1 more time]
File "/Users/Micky/anaconda3/envs/audiogpt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 602, in _apply
param_applied = fn(param)
File "/Users/Micky/anaconda3/envs/audiogpt/lib/python3.8/site-packages/torch/nn/modules/module.py", line 925, in convert
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
File "/Users/Micky/anaconda3/envs/audiogpt/lib/python3.8/site-packages/torch/cuda/init.py", line 211, in _lazy_init
raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
(audiogpt) Micky@Mickys-iMac AudioGPT %

ModuleNotFoundError: No module named 'ldm.util'; 'ldm' is not a package

When I was installing this project, I got this error message:
(audiogpt) administrator@zhaowt:~/AudioGPT$ python audio-chatgpt.py
[2023-04-27 09:31:28] audio-chatgpt.py - - - - - - - - - - - - - - - - - - - - - eprint(line:60) :: Error when calling Cognitive Face API:
status_code: 401
code: 401
message: Access denied due to invalid subscription key or wrong API endpoint. Make sure to provide a valid key for an active subscription and use a correct regional API endpoint for your resource.

[2023-04-27 09:31:28] audio-chatgpt.py - - - - - - - - - - - - - - - - - - - - - eprint(line:60) :: img_url:https://raw.githubusercontent.com/Microsoft/Cognitive-Face-Windows/master/Data/detection1.jpg
[2023-04-27 09:31:29] audio-chatgpt.py - - - - - - - - - - - - - - - - - - - - - eprint(line:60) :: Error when calling Cognitive Face API:
status_code: 401
code: 401
message: Access denied due to invalid subscription key or wrong API endpoint. Make sure to provide a valid key for an active subscription and use a correct regional API endpoint for your resource.

[2023-04-27 09:31:29] audio-chatgpt.py - - - - - - - - - - - - - - - - - - - - - eprint(line:60) :: img_url:/data1/mingmingzhao/label/data_sets_teacher_1w/47017613_1510574400_out-video-jzc70f41fa6f7145b4b66738f81f082b65_f_1510574403268_t_1510575931221.flv_0001.jpg
[]
Traceback (most recent call last):
File "audio-chatgpt.py", line 26, in
from ldm.util import instantiate_from_config
ModuleNotFoundError: No module named 'ldm.util'; 'ldm' is not a package


Does anyone know how to solve this problem?

KeyError: 'binary_data_dir'

class BaseTTSInfer:
    def __init__(self, hparams, device=None):
        if device is None:
            device = 'cuda' if torch.cuda.is_available() else 'cpu'
        self.hparams = hparams
        self.device = device
        self.data_dir = hparams['binary_data_dir']

This line raises "KeyError: 'binary_data_dir'". I have tried many times, but it does not work.

What is a Model assignment in code?

I read your paper, and it was a very interesting read. As a result, I summarized my review of the paper on my Notion page. However, I had a question that came up while doing so. Could you provide some insight into what the "Model Assignment" refers to in the actual code of AudioGPT?
