
bark-gui's People

Contributors

alyxdow, c0untfloyd, derklinge, ding3li, fiq, gitmylo, gkucsko, jn-jairo, jonathanfly, kmfreyberg, mcamac, mikeyshulman, no2chem, orlandohohmeier, tongbaojia, vaibhavs10, ylacombe, zygi


bark-gui's Issues

Install error

FileNotFoundError: [WinError 2] The system cannot find the file specified: 'bark-gui'

OS: Windows 11 Pro x64

API Mode

This works great. I couldn't get the original to work on my machine, but yours worked perfectly. Can I use it in API mode, like AUTOMATIC1111/stable-diffusion-webui? Feed it a text string from another app and get the voice back?
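For programmatic access, Gradio apps of this generation expose a generic JSON `/run/predict` endpoint. The sketch below shows the shape of such a client call; the component order in the `data` list (text, voice, two temperatures) is an assumption and should be checked against the inputs wired up in bark-gui's webui.py.

```python
import json
import urllib.request

def build_payload(text, voice="en_speaker_0", text_temp=0.7, wave_temp=0.7):
    # Field order here is hypothetical -- match it to the actual input
    # components registered in bark-gui's webui.py before relying on it.
    return json.dumps({"data": [text, voice, text_temp, wave_temp]})

def tts_request(base_url, text):
    # POST to Gradio's generic prediction endpoint and return the parsed reply.
    req = urllib.request.Request(
        base_url.rstrip("/") + "/run/predict",
        data=build_payload(text).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())  # {"data": [...]} holding the audio result
```

With a running server this would be called as `tts_request("http://127.0.0.1:7860", "Hello World!")`.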

Where is the one-click installer?

I appreciate you making this available; it's an awesome application.
However, I'm unsure where to find the one-click installer for Windows.

Also, is there a preferred location where this should go, so that it gets linked to CUDA etc.?
i.e. C:\Users\ABC\Bark

Or, if I want to link it to oobabooga, would it be better in that path?

Thanks and best regards

Getting AttributeError while running StartBark.bat on Windows 11

Traceback (most recent call last):
File "C:\projects\bark-gui\webui.py", line 295, in
gr.Markdown(f"### {APPTITLE}")
File "C:\Python39\lib\site-packages\gradio\components.py", line 5931, in __init__
IOComponent.__init__(
File "C:\Python39\lib\site-packages\gradio\components.py", line 215, in __init__
else self.postprocess(initial_value)
File "C:\Python39\lib\site-packages\gradio\components.py", line 5950, in postprocess
return self.md.render(unindented_y)
File "C:\Python39\lib\site-packages\markdown_it\main.py", line 267, in render
return self.renderer.render(self.parse(src, env), self.options, env)
File "C:\Python39\lib\site-packages\markdown_it\main.py", line 252, in parse
self.core.process(state)
File "C:\Python39\lib\site-packages\markdown_it\parser_core.py", line 32, in process
rule(state)
File "C:\Python39\lib\site-packages\markdown_it\rules_core\linkify.py", line 30, in linkify
raise ModuleNotFoundError("Linkify enabled but not installed.")
ModuleNotFoundError: Linkify enabled but not installed.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\projects\bark-gui\webui.py", line 408, in
barkgui.close()
File "C:\Python39\lib\site-packages\gradio\blocks.py", line 1389, in __exit__
self.config = self.get_config_file()
File "C:\Python39\lib\site-packages\gradio\blocks.py", line 1356, in get_config_file
props = block.get_config() if hasattr(block, "get_config") else {}
File "C:\Python39\lib\site-packages\gradio\components.py", line 5954, in get_config
"value": self.value,
AttributeError: 'Markdown' object has no attribute 'value'
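The first traceback is the root cause: gr.Markdown enables markdown-it-py's linkify rule, which depends on the optional linkify-it-py package. A guarded check like the sketch below distinguishes the two cases; that installing linkify-it-py (e.g. via the `markdown-it-py[linkify]` pip extra) resolves the error is my reading of the traceback, not something stated in the issue.

```python
def linkify_status():
    """Report whether markdown-it-py's optional linkify dependency is importable."""
    try:
        import linkify_it  # noqa: F401 -- provided by the linkify-it-py package
        return "ok"
    except ImportError:
        return "missing: try `pip install markdown-it-py[linkify]`"
```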

Generated voice is weird, Error

Here is a voice-clone attempt of a person with a somewhat odd accent.

Text prompt
Now what i like you to do [clears throat] is grab -- some paint brush [laughs] and start painting just like me

sound file
http://sndup.net/n9ty

Error:
Generating Text (1/1) -> custom\MeMyselfAndI (Seed 2155879416):Now what i like you to do [clears throat] is grab -- some paint brush [laughs] and start painting just like me
2023-06-16 01:28:55 | ERROR | asyncio | Exception in callback _ProactorBasePipeTransport._call_connection_lost(None)
handle: <Handle _ProactorBasePipeTransport._call_connection_lost(None)>
Traceback (most recent call last):
File "D:\bark_win\installer_files\env\lib\asyncio\events.py", line 80, in _run
self._context.run(self._callback, *self._args)
File "D:\bark_win\installer_files\env\lib\asyncio\proactor_events.py", line 165, in _call_connection_lost
self._sock.shutdown(socket.SHUT_RDWR)
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host
2023-06-16 01:28:55 | ERROR | asyncio | Exception in callback _ProactorBasePipeTransport._call_connection_lost(None)
handle: <Handle _ProactorBasePipeTransport._call_connection_lost(None)>
Traceback (most recent call last):
File "D:\bark_win\installer_files\env\lib\asyncio\events.py", line 80, in _run
self._context.run(self._callback, *self._args)
File "D:\bark_win\installer_files\env\lib\asyncio\proactor_events.py", line 165, in _call_connection_lost
self._sock.shutdown(socket.SHUT_RDWR)
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host

How it happened:
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host
I didn't close anything myself; it actually stopped abruptly and displayed this line.

ImportError: cannot import name 'pack' from 'einops' and 2 more

Thanks @C0untFloyd for resolving the "pytorch_seed" issue! After that fix I encountered the following (forgot to mention I'm running StartBark.bat on Windows 10):

Traceback (most recent call last):
File "webui.py", line 21, in
from cloning.clonevoice import clone_voice
File "C:\Users\J\bark-gui-main\cloning\clonevoice.py", line 4, in
from bark.hubert.pre_kmeans_hubert import CustomHubert
File "C:\Users\J\bark-gui-main\bark\hubert\pre_kmeans_hubert.py", line 14, in
from einops import pack, unpack
ImportError: cannot import name 'pack' from 'einops' (C:\Users\J\AppData\Local\Programs\Python\Python38\lib\site-packages\einops\__init__.py)

I checked, and einops is already installed. Thanks!

ModuleNotFoundError: No module named 'gradio'

The install was successful and all modules are present in D:\bark_win\installer_files\env\Lib\site-packages, including gradio, but when I clicked StartBark I got this error:

DeprecationWarning: The distutils package is deprecated and slated for removal in Python 3.12. Use setuptools or check PEP 632 for potential alternatives
from distutils.command.check import check
Traceback (most recent call last):
File "D:\bark_win\bark-gui\webui.py", line 5, in
import gradio as gr
ModuleNotFoundError: No module named 'gradio'

ModuleNotFoundError on Windows

I just updated my version of the project (currently at 664e2d8), and when I tried to run Bark I got the following error after installing requirements:

Successfully installed bark-ui-enhanced-0.7.0
Traceback (most recent call last):
  File "C:\Users\user\dev\bark_win\bark-gui\webui.py", line 5, in <module>
    import gradio as gr
  File "C:\Users\user\AppData\Roaming\Python\Python310\site-packages\gradio\__init__.py", line 3, in <module>
    import gradio.components as components
  File "C:\Users\user\AppData\Roaming\Python\Python310\site-packages\gradio\components.py", line 26, in <module>
    import altair as alt
  File "C:\Users\user\AppData\Roaming\Python\Python310\site-packages\altair\__init__.py", line 607, in <module>
    from .vegalite import *
  File "C:\Users\user\AppData\Roaming\Python\Python310\site-packages\altair\vegalite\__init__.py", line 2, in <module>
    from .v5 import *
  File "C:\Users\user\AppData\Roaming\Python\Python310\site-packages\altair\vegalite\v5\__init__.py", line 2, in <module>
    from .schema import *
  File "C:\Users\user\AppData\Roaming\Python\Python310\site-packages\altair\vegalite\v5\schema\__init__.py", line 2, in <module>
    from .core import *
  File "C:\Users\user\AppData\Roaming\Python\Python310\site-packages\altair\vegalite\v5\schema\core.py", line 4, in <module>
    from altair.utils.schemapi import SchemaBase, Undefined, _subclasses
  File "C:\Users\user\AppData\Roaming\Python\Python310\site-packages\altair\utils\__init__.py", line 1, in <module>
    from .core import (
  File "C:\Users\user\AppData\Roaming\Python\Python310\site-packages\altair\utils\core.py", line 15, in <module>
    import pandas as pd
ModuleNotFoundError: No module named 'pandas'

I have tried installing pandas directly but pip says the dependency is already installed.

Windows 10

Cloned voice/custom prompt with "use coarse history" failing to generate voice

I used Bark-GUI to clone a prompt from an audio sample; that worked great. When I try to create speech from text using the custom voice I get the following error, although I am able to create arbitrary audio from text using the pre-built prompts. This error only happens when I have "use coarse history" checked. Possibly this is a bark-gui problem.

Generating Text (1/1) -> custom\MeMyselfAndI:Hello Sir, How can I help you today?
Traceback (most recent call last):
File "C:\Users\toor\Desktop\Ai\seait_installers_version_0.1.4\bark-gui\venv\Lib\site-packages\gradio\routes.py", line 399, in run_predict
output = await app.get_blocks().process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\toor\Desktop\Ai\seait_installers_version_0.1.4\bark-gui\venv\Lib\site-packages\gradio\blocks.py", line 1299, in process_api
result = await self.call_function(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\toor\Desktop\Ai\seait_installers_version_0.1.4\bark-gui\venv\Lib\site-packages\gradio\blocks.py", line 1022, in call_function
prediction = await anyio.to_thread.run_sync(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\toor\Desktop\Ai\seait_installers_version_0.1.4\bark-gui\venv\Lib\site-packages\anyio\to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\toor\Desktop\Ai\seait_installers_version_0.1.4\bark-gui\venv\Lib\site-packages\anyio_backends_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
^^^^^^^^^^^^
File "C:\Users\toor\Desktop\Ai\seait_installers_version_0.1.4\bark-gui\venv\Lib\site-packages\anyio_backends_asyncio.py", line 867, in run
result = context.run(func, *args)
^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\toor\Desktop\Ai\seait_installers_version_0.1.4\bark-gui\venv\Lib\site-packages\gradio\helpers.py", line 588, in tracked_fn
response = fn(*args)
^^^^^^^^^
File "C:\Users\toor\Desktop\Ai\seait_installers_version_0.1.4\bark-gui\webui.py", line 114, in generate_text_to_speech
audio_array = generate_audio(text, selected_speaker, text_temp, waveform_temp)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\toor\Desktop\Ai\seait_installers_version_0.1.4\bark-gui\bark\api.py", line 113, in generate_audio
out = semantic_to_waveform(
^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\toor\Desktop\Ai\seait_installers_version_0.1.4\bark-gui\bark\api.py", line 54, in semantic_to_waveform
coarse_tokens = generate_coarse(
^^^^^^^^^^^^^^^^
File "C:\Users\toor\Desktop\Ai\seait_installers_version_0.1.4\bark-gui\bark\generation.py", line 592, in generate_coarse
round(x_coarse_history.shape[-1] / len(x_semantic_history), 1)
AssertionError

3 possible improvements

Hi there! Great work, I really like it, but I have three questions about possible improvements, along the lines of Automatic1111's Stable Diffusion web UI.

  1. Local model storage. Currently all models (on Windows) download to the Users/.cache folder. It would be great to store them locally in the repository folder instead, for example bark/models.
  2. Add VRAM run options: -medram, -lowram
  3. Local CUDA support without a CUDA install
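For item 1, a stopgap that may work without code changes is redirecting the cache location through an environment variable before bark loads its models. A sketch, assuming bark's download path honours XDG_CACHE_HOME (worth verifying, especially on Windows):

```python
import os

# Point the user-cache root at a repo-local folder *before* importing bark,
# so model downloads land inside the project tree instead of Users/.cache.
# Whether bark's model loader respects XDG_CACHE_HOME on Windows is an
# assumption to verify against bark's generation code.
os.environ["XDG_CACHE_HOME"] = os.path.join(os.getcwd(), "models")
```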

ModuleNotFoundError: No module named 'pytorch_seed'

Hi I'm a newbie here.

I followed all the steps but am still getting this. Does anyone know what's going on? I tried Google but couldn't find anything related to 'pytorch_seed'.

Traceback (most recent call last):
File "webui.py", line 11, in
import pytorch_seed
ModuleNotFoundError: No module named 'pytorch_seed'

Please help, thanks!

Assertion Error when generating custom voice

I uploaded a clip of me talking for around 1 minute and 20 seconds. I uploaded the script, then tried to generate a clip saying "Hello, I am an AI."

It failed and I received this error message:

Generating Text (1/1) -> custom\me1:Hello, I am an AI..
Traceback (most recent call last):
File "J:\bark-gui\installer\installer_files\env\lib\site-packages\gradio\routes.py", line 399, in run_predict
output = await app.get_blocks().process_api(
File "J:\bark-gui\installer\installer_files\env\lib\site-packages\gradio\blocks.py", line 1299, in process_api
result = await self.call_function(
File "J:\bark-gui\installer\installer_files\env\lib\site-packages\gradio\blocks.py", line 1022, in call_function
prediction = await anyio.to_thread.run_sync(
File "J:\bark-gui\installer\installer_files\env\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "J:\bark-gui\installer\installer_files\env\lib\site-packages\anyio_backends_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "J:\bark-gui\installer\installer_files\env\lib\site-packages\anyio_backends_asyncio.py", line 867, in run
result = context.run(func, *args)
File "J:\bark-gui\installer\installer_files\env\lib\site-packages\gradio\helpers.py", line 588, in tracked_fn
response = fn(*args)
File "J:\bark-gui\installer\bark-gui\webui.py", line 114, in generate_text_to_speech
audio_array = generate_audio(text, selected_speaker, text_temp, waveform_temp)
File "J:\bark-gui\installer\bark-gui\bark\api.py", line 113, in generate_audio
out = semantic_to_waveform(
File "J:\bark-gui\installer\bark-gui\bark\api.py", line 54, in semantic_to_waveform
coarse_tokens = generate_coarse(
File "J:\bark-gui\installer\bark-gui\bark\generation.py", line 592, in generate_coarse
round(x_coarse_history.shape[-1] / len(x_semantic_history), 1)
AssertionError

Special tags not working

No special tags work when prompts are input directly, not even singing; the content is simply read out with the tags ignored. The XML format works better, but still only some tags worked.

Has anyone encountered the same thing and knows how it can be solved? Thanks.

No module named Fairseq

This is not an error from your installer, but probably a glitch at my end.

As noted previously, the installer works great and opens the GUI, which is really nice to use.
However, when I later try to run the GUI using StartBark.bat I get an error:

File "C:\Users\ABC\Bark-tts\bark_win\bark-gui\bark\hubert\pre_kmeans_hubert.py", line 16, in
import fairseq
ModuleNotFoundError: No module named 'fairseq'

I know the module is in there somewhere, because it comes up during the install; I just don't know how to get the GUI launch to find it.

At present I start the GUI by simply reinstalling with windows_run.bat,
which works fine but seems unnecessary.
Putting this out there in case others have a similar issue.

Any suggestions would be appreciated - and thanks in advance.

Installation hints for beginners

It is best to create a folder named AI (or anything else) on an SSD. Under Windows 11, open that folder, type cmd or PowerShell into the address bar, and in the command line enter:
git clone https://github.com/C0untFloyd/bark-gui
This downloads a bark-gui folder into the AI folder you created.
Then enter the bark-gui folder from the command line (or type cmd in the address bar inside that folder) and run:
pip install .
Otherwise it will complain that the .py files cannot be found. Also run:
pip install gradio
pip install soundfile
If you have Stable Diffusion installed, you may already have a working Windows environment with PyTorch 2.0; do not run windows_start.bat directly, or it will download and install everything again. The script is not very smart and not great for novices. The author's hints were hard for me to understand as a beginner and I wasted a lot of time; no conversational setup scripts are included. I hope these notes help others avoid the same mistakes.

Using Curl possibility?

I really am liking what you have here!

I was wondering whether there is a way to use curl after launching webui.py to simulate a REST-style call.
(I'm a super n00b, haha.)

Just playing around I got something like this:

$ curl -i -N -H "Connection: Upgrade" -H "Upgrade: websocket" -H "Host: 127.0.0.1" -H "Origin: http://127.0.0.1:7860" -H "Sec-WebSocket-Key: 4lgr09kc4AzoVnxzSr64UA==" -H "Sec-WebSocket-Version: 13" http://127.0.0.1:7860/queue/join 

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: 0qVPWFBZ4HMS+aOLQhY0EgTvl+0=
date: Sun, 30 Apr 2023 17:10:00 GMT
server: uvicorn

�{"msg": "send_hash"}��

But what I am really trying to accomplish is getting the payload response, something like:

curl -i -N -H "Connection: Upgrade" -H "Upgrade: websocket" -H "Host: 127.0.0.1" -H "Origin: http://127.0.0.1:7860" -H "Sec-WebSocket-Key: 4lgr09kc4AzoVnxzSr64UA==" -H "Sec-WebSocket-Version: 13" http://127.0.0.1:7860/queue/join -s -X POST -H "Content-Type: application/json" -d "{"prompt":"Hello World!"}" http://127.0.0.1:7860

In any case, I'm having a lot of fun with this. Super new stuff that I'm sure will mature quickly! :)

Unable to enablemps

Thank you for putting this together.

Testing on a Mac M2, I was not able to use enablemps.

Also needed to do a pip install cchardet

(screenshots attached)

Let me know if I'm not doing this correctly.

No audio I/O backend is available

When cloning a voice, this error is raised:

File "X:\LC\bark-gui\venv\lib\site-packages\torchaudio\backend\no_backend.py", line 16, in load
raise RuntimeError("No audio I/O backend is available.")
RuntimeError: No audio I/O backend is available.

How can I fix it?

A suggestion for a simple workaround on bad generations for lengthy sentences

Firstly, I would like to congratulate @C0untFloyd on the great work on this GUI, which is the fastest one I've used so far in terms of generation speed, and the UI itself is very pleasant and easy to use.

However, we all know Bark can be incredibly annoying with its hallucinations and bad audio generations, which unfortunately are far from uncommon.

So I have a suggestion to improve the overall experience: an option letting users manually decide whether each generated sample from a lengthy sentence needs to be regenerated or is good to go; once a sample is approved, proceed to the next queued item in the batch and repeat, until a final audio output is produced by merging all of the user-approved samples.

This is a simple idea that I believe does not require huge changes to the UI or backend logic, but it would greatly improve the user experience.

Let's discuss good alternatives to overcome the glaring issues Bark unfortunately has.

docker version

Please create a docker compose file that we can use.

Btw, nice work, sir.

How to train the voice

Hi, this is indeed a huge improvement. But I wonder how to train a custom voice. I clicked "Generate" and it gave an error. Do I need to upload the dataset myself? Thanks!

(screenshot attached)

Failed building wheel for scikit_learn

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for scikit_learn
Failed to build scikit_learn
ERROR: Could not build wheels for scikit_learn, which is required to install pyproject.toml-based projects

Installer not downloading text_2.pt, can't use bark-gui.

The latest installer always gets stuck while downloading the text_2.pt model from Hugging Face.

(screenshot 1)

If I close it, download the model myself, put it in bark-gui's model folder, and then try to run bark-gui, I get an error:

(screenshot 2)

I suppose the installer is meant to do more after downloading the model, but since I close it while it's stuck, I end up with a non-working bark-gui.

Is there any solution to this?

"No cuda device detected, fallback to CPU!" despite cuda functioning elsewhere?

"No GPU being used. Careful, inference might be very slow!"

For whatever reason, this one doesn't seem to play along with my CUDA/GPU (4070 Ti).

Invoke AI (an image generator with a web UI) and Oobabooga (a text generator with the same kind of web UI) have no trouble running on the GPU on this system, and they use CUDA as well, so I can't believe CUDA isn't set up correctly. Perhaps it's a version issue?

Since it's just this one having issues, I don't have much idea of what to share to help debug it, so I'll share the nvcc version output:

F:\AItools\AudioGens\bark-gui> nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:36:15_Pacific_Daylight_Time_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0

and nvidia-smi minus the processes as i doubt they matter here
PS F:\AItools\AudioGens\bark-gui> nvidia-smi
Mon May 1 01:00:00 2023

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.50                 Driver Version: 535.50       CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                     TCC/WDDM  | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4070 Ti   WDDM  | 00000000:01:00.0  On |                  N/A |
|  0%   30C    P8               5W / 285W |    615MiB / 12282MiB |      5%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

The main difference I can think of is the installation method: although similar, this one had the fewest steps and was fastest, so possibly it's not as thorough.
If you have an answer for this behaviour, perhaps it would be nice to add a short section on setting up CUDA to work with it, as nobody seems to have written anything up on this for any fork of Bark.

Don't hesitate to ask for further information; I'm willing to help debug this, as it would benefit everyone if this kind of issue is documented for all to see and learn from.
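A quick way to narrow this down is to ask PyTorch directly whether it can see the device; a working nvcc/nvidia-smi only proves the driver and toolkit side. The sketch below is a standalone diagnostic, not part of bark-gui; a common cause of this exact symptom is a CPU-only torch wheel:

```python
def cuda_report():
    """Return a one-line CUDA status string (diagnostic sketch only)."""
    try:
        import torch
    except ImportError:
        return "torch is not installed in this environment"
    if not torch.cuda.is_available():
        # nvcc/nvidia-smi working while torch sees nothing usually means a
        # CPU-only torch build was installed, not a driver problem.
        return f"torch {torch.__version__} sees no CUDA device (CPU-only wheel?)"
    return f"torch {torch.__version__} using {torch.cuda.get_device_name(0)}"

print(cuda_report())
```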

No module named 'build.lib'

Hi, your recent update was amazing! It fixed the noise problem with short prompts. However, on Colab there is no module named 'build.lib', so I replaced "from build.lib.bark.api import save_as_prompt" in webui.py with "from bark.api import save_as_prompt" and it worked fine. I wonder if there is a specific reason you used 'build.lib', and whether the above change could cause any other problems? Thanks! It's really great work.

Refresh button

I suggest adding a refresh button for reloading prompt files.
Please take a look at this suggestion and leave a comment if you have any feedback.

Background

I wanted to load a custom prompt file made from the 'Clone Voice' tab and create a custom voice, but right after generating the clone-voice prompt file, I could not find it in the TTS tab.
A refresh button would therefore enhance the user experience.

Details

The stable-diffusion-webui already provides such a refresh button (its code can serve as a reference),

and it might be added in this part:

bark-gui/webui.py

Lines 340 to 342 in 3ac2500

with gr.Column():
    gr.Markdown("[Voice Prompt Library](https://suno-ai.notion.site/8b8e8749ed514b0cbf3f699013548683?v=bc67cff786b04b50b3ceb756fd05f68c)")
    speaker = gr.Dropdown(speakers_list, value=speakers_list[0], label="Voice")

by making a function of this part

bark-gui/webui.py

Lines 282 to 292 in 3ac2500

for root, dirs, files in os.walk("./bark/assets/prompts"):
    for file in files:
        if file.endswith(".npz"):
            pathpart = root.replace("./bark/assets/prompts", "")
            name = os.path.join(pathpart, file[:-4])
            if name.startswith("/") or name.startswith("\\"):
                name = name[1:]
            speakers_list.append(name)
speakers_list = sorted(speakers_list, key=lambda x: x.lower())
speakers_list.insert(0, 'None')
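Factoring that loop into a function, as suggested, would let a Refresh button's callback rebuild the dropdown choices on demand. A sketch; the function name and the wiring to the Gradio dropdown are my own, only the scanning logic comes from webui.py:

```python
import os

def scan_speakers(prompts_dir="./bark/assets/prompts"):
    """Collect .npz speaker prompt names under prompts_dir.

    Refactors the inline loop from webui.py so a Refresh button callback
    can rebuild the dropdown choices (hooking the result into
    gr.Dropdown is left to webui.py).
    """
    speakers = []
    for root, dirs, files in os.walk(prompts_dir):
        for file in files:
            if file.endswith(".npz"):
                pathpart = root.replace(prompts_dir, "")
                name = os.path.join(pathpart, file[:-4])
                if name.startswith(("/", "\\")):
                    name = name[1:]
                speakers.append(name)
    speakers = sorted(speakers, key=lambda x: x.lower())
    speakers.insert(0, "None")
    return speakers
```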

AssertionError while using a Clone Voice generated file

  1. I input a .wav file and generated a .npz with the corresponding text content.
  2. I hit the following problem when using my file:

Version Info:
Launching Bark GUI - please edit windows_run.bat to customize commandline arguments
Check for Updates? [y/n]n
smallmodels=True
enablemps=False
offloadcpu=False
forcecpu=False
autolaunch=True

Preloading Models

Loading text_small model from ./models\text.pt to cuda
Loading coarse_small model from ./models\coarse.pt to cuda
Loading fine_small model from ./models\fine.pt to cuda
Launching Bark UI Enhanced v0.4.6 Server


AssertionError
Generating Text (1/1) -> custom\MeMyselfAndI (Seed 4124450773):WOMAN: I would like an oatmilk latte please. MAN: Wow, that's expensive!
Traceback (most recent call last):
  File "E:\bark-gui\installer_files\env\lib\site-packages\gradio\routes.py", line 414, in run_predict
    output = await app.get_blocks().process_api(
  File "E:\bark-gui\installer_files\env\lib\site-packages\gradio\blocks.py", line 1320, in process_api
    result = await self.call_function(
  File "E:\bark-gui\installer_files\env\lib\site-packages\gradio\blocks.py", line 1048, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "E:\bark-gui\installer_files\env\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "E:\bark-gui\installer_files\env\lib\site-packages\anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "E:\bark-gui\installer_files\env\lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "E:\bark-gui\installer_files\env\lib\site-packages\gradio\helpers.py", line 589, in tracked_fn
    response = fn(*args)
  File "E:\bark-gui\bark-gui\webui.py", line 91, in generate_text_to_speech
    audio_array = generate_with_settings(text_prompt=text, voice_name=selected_speaker, semantic_temp=text_temp, coarse_temp=waveform_temp, eos_p=eos_prob)
  File "E:\bark-gui\bark-gui\bark\api.py", line 19, in generate_with_settings
    x_coarse_gen = generate_coarse(
  File "E:\bark-gui\bark-gui\bark\generation.py", line 603, in generate_coarse
    round(x_coarse_history.shape[-1] / len(x_semantic_history), 1)
AssertionError

(screenshot attached)

Add automatic split to the audio when using the Voice Swap feature

Greetings,

Something I noticed is that the Voice Swap feature tends not to work well with lengthy audio (audio that surpasses Bark's 13-second limit), because the voices start to change past that time and end up sounding nothing like the intended voice being cloned. Ideally, there should be a system in place that:

1 - Detects whether the audio is longer than 13 seconds
2 - If so, cuts the audio at the first silent pause before the 13-second mark, defines that as the starting point of the next fragment, and repeats for the rest of the audio in case any fragment exceeds the limit
3 - Merges all audio fragments into one

I think these changes should provide results that better respect the chosen speaker model.
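The three steps can be prototyped with a simple energy scan: within each 13-second budget, cut at the quietest analysis window. A pure-Python sketch operating on raw sample lists; real audio loading (e.g. via soundfile or torchaudio) and the 13-second figure itself are assumptions carried over from the suggestion:

```python
def split_at_silence(samples, sr, max_seconds=13.0, window=0.02):
    """Split a mono sample sequence into chunks of at most max_seconds,
    preferring to cut where the signal is quietest (step 2 above)."""
    max_len = int(max_seconds * sr)
    win = max(1, int(window * sr))
    chunks, start = [], 0
    while len(samples) - start > max_len:
        best_cut, best_energy = start + max_len, float("inf")
        # Scan analysis windows inside the budget; ties resolve toward the
        # latest window, so silence-free audio still cuts near the limit.
        for w in range(start + win, start + max_len + 1, win):
            energy = sum(abs(s) for s in samples[w - win:w]) / win
            if energy <= best_energy:
                best_energy, best_cut = energy, w
        chunks.append(samples[start:best_cut])
        start = best_cut
    chunks.append(samples[start:])  # step 3: the caller concatenates results
    return chunks
```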

Improvement request: Network service accessibility

Hi, amazing work you're doing here; I really appreciate your contribution to the open-source and AI communities.

I love the idea of having a UI for Bark's features, but it would be awesome if you could add network accessibility through flags like --listen and --port, just as we've seen in a couple of other frontends.
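A sketch of how such flags could map onto Gradio's launch() parameters; the flag names --listen/--port are borrowed from the other frontends mentioned, not existing bark-gui options, while server_name/server_port are Gradio's launch arguments:

```python
import argparse

def parse_launch_args(argv=None):
    """Translate --listen/--port command-line flags into launch() kwargs."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--listen", action="store_true",
                        help="bind to 0.0.0.0 so other machines can connect")
    parser.add_argument("--port", type=int, default=7860)
    args = parser.parse_args(argv)
    return {
        "server_name": "0.0.0.0" if args.listen else "127.0.0.1",
        "server_port": args.port,
    }

# Inside webui.py this would become something like:
# barkgui.launch(**parse_launch_args())
```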

Food for thought :)

Thanks for all the work and commitment!

Training a language

Hello and thanks for your hard work!

I've been messing around with Bark and it's pretty rad.
However, I wonder how I could train a Spanish-language tokenizer model for the voice-cloning mode.

So far the prepare-dataset tab has errored out, and there doesn't seem to be any training guide in this repo.

Is there any guide I can follow, or what would the steps be to train a new tokenizer? I already have a bunch of Spanish books ready to download as plain text.

Related error trace

To create a public link, set `share=True` in `launch()`.
Traceback (most recent call last):
  File "D:\Animax\vclone\bark_win\installer_files\env\lib\site-packages\gradio\routes.py", line 437, in run_predict
    output = await app.get_blocks().process_api(
  File "D:\Animax\vclone\bark_win\installer_files\env\lib\site-packages\gradio\blocks.py", line 1352, in process_api
    result = await self.call_function(
  File "D:\Animax\vclone\bark_win\installer_files\env\lib\site-packages\gradio\blocks.py", line 1077, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "D:\Animax\vclone\bark_win\installer_files\env\lib\site-packages\anyio\to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "D:\Animax\vclone\bark_win\installer_files\env\lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "D:\Animax\vclone\bark_win\installer_files\env\lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "D:\Animax\vclone\bark_win\installer_files\env\lib\site-packages\gradio\helpers.py", line 602, in tracked_fn
    response = fn(*args)
  File "D:\Animax\vclone\bark_win\bark-gui\webui.py", line 192, in training_prepare
    prepare_semantics_from_text()
TypeError: prepare_semantics_from_text() missing 1 required positional argument: 'num_generations'

Doesn't work very well with short prompts

Hi there, I found that this repo doesn't work very well with short prompts of two or three words, like "hello there". I wonder if this is related to some parameters you've set. Thanks! It's still an amazing fork of Bark!

[Suggestion] Support for .srt

It would be cool to have support for the .srt subtitle format as text input, with each cue voiced according to its timings.
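A minimal sketch of the parsing half of this idea: extract (start, end, text) cues from .srt content, so a synthesis step could voice each cue and place it at its start time. Pure stdlib; no bark-gui integration is implied:

```python
import re

# Matches "HH:MM:SS,mmm --> HH:MM:SS,mmm" timing lines (dot also accepted).
TIME = re.compile(r"(\d+):(\d+):(\d+)[,.](\d+) --> (\d+):(\d+):(\d+)[,.](\d+)")

def parse_srt(text):
    """Return a list of (start_seconds, end_seconds, subtitle_text) cues."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            m = TIME.search(line)
            if m:
                h1, m1, s1, ms1, h2, m2, s2, ms2 = map(int, m.groups())
                start = h1 * 3600 + m1 * 60 + s1 + ms1 / 1000
                end = h2 * 3600 + m2 * 60 + s2 + ms2 / 1000
                cues.append((start, end, " ".join(lines[i + 1:]).strip()))
                break
    return cues
```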

Error in StartBark.bat

Error in install:

 File "C:\Users\green\AppData\Local\Temp\pip-build-env-z0acumc2\overlay\Lib\site-packages\setuptools\msvc.py", line 168, in _msvc14_get_vc_env
      raise distutils.errors.DistutilsPlatformError(
  distutils.errors.DistutilsPlatformError: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for scikit-learn
Failed to build scikit-learn
ERROR: Could not build wheels for scikit-learn, which is required to install pyproject.toml-based projects
Traceback (most recent call last):
File "C:\Users\green\Desktop\bark_win\bark-gui\webui.py", line 5, in
import gradio as gr
ModuleNotFoundError: No module named 'gradio'

Done!
Press any key to continue . . .
...
AND
...
Error in StartBark.bat:

Traceback (most recent call last):
File "C:\Users\green\Desktop\bark-gui-main\webui.py", line 5, in
import gradio as gr
ModuleNotFoundError: No module named 'gradio'
Press any key to continue . . .

Apple GPU

It seems to run on CPU only on my MacBook Pro M1 Max, although Apple GPUs are stated to be supported.

How can I configure it to run on the GPU?
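A guarded diagnostic (a sketch, not part of bark-gui) helps tell whether the blocker is the torch build or the app's settings; bark-gui's own startup log shows an enablemps flag that also has to be on:

```python
def mps_report():
    """Report whether PyTorch can use Apple's Metal (MPS) backend."""
    try:
        import torch
    except ImportError:
        return "torch is not installed in this environment"
    mps = getattr(torch.backends, "mps", None)
    if mps is None or not mps.is_available():
        # MPS needs macOS 12.3+ and an arm64 build of torch >= 1.12
        return "MPS backend unavailable in this torch build"
    return "MPS available -- also set enablemps on and forcecpu off in bark-gui"
```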

Requirements missing funcy

Latest install and StartBark.bat: it complains about missing funcy on startup.
pip install funcy gets it working again.

Possible fairseq problems, container building errors

I am running the container version of this on a Linux host. The current Dockerfile doesn't build, so I made some small edits. Pip complains if the dependencies are installed on top of system packages, and while I'm comfortable with Docker I'm not very familiar with Python package management. I added python3-venv to the system package list, then changed the dependency install steps to use a virtual environment so that I could get a build going.

FROM debian:stable

# Install system packages
RUN DEBIAN_FRONTEND=noninteractive apt update && apt install -y \
        git \
        python3-venv \
        pip \
        ffmpeg

# Create non-root user
RUN useradd -m -d /bark bark

# Run as new user
USER bark
WORKDIR /bark

# Clone git repo
RUN git clone https://github.com/C0untFloyd/bark-gui 

# Switch to git directory
WORKDIR /bark/bark-gui

# Append pip bin path to PATH
ENV PATH=$PATH:/bark/.local/bin

# Install dependencies
RUN python3 -m venv .venv &&\
        . .venv/bin/activate &&\
        pip install . &&\
        pip install -r requirements.txt

# Listen on all addresses, since we are in a container.
RUN sed -i "s/server_name: ''/server_name: 0.0.0.0/g" ./config.yaml

# Suggested volumes
VOLUME /bark/bark-gui/assets/prompts/custom
VOLUME /bark/bark-gui/models
VOLUME /bark/.cache/huggingface/hub

# Default port for web-ui
EXPOSE 7860/tcp

# Start script
CMD . .venv/bin/activate && python3 webui.py

The build now succeeds, but at runtime I get this error:

Traceback (most recent call last):
  File "/bark/bark-gui/webui.py", line 21, in <module>
    from cloning.clonevoice import clone_voice
  File "/bark/bark-gui/cloning/clonevoice.py", line 4, in <module>
    from bark.hubert.pre_kmeans_hubert import CustomHubert
  File "/bark/bark-gui/bark/hubert/pre_kmeans_hubert.py", line 16, in <module>
    import fairseq
  File "/bark/bark-gui/.venv/lib/python3.11/site-packages/fairseq/__init__.py", line 20, in <module>
    from fairseq.distributed import utils as distributed_utils
  File "/bark/bark-gui/.venv/lib/python3.11/site-packages/fairseq/distributed/__init__.py", line 7, in <module>
    from .fully_sharded_data_parallel import (
  File "/bark/bark-gui/.venv/lib/python3.11/site-packages/fairseq/distributed/fully_sharded_data_parallel.py", line 10, in <module>
    from fairseq.dataclass.configs import DistributedTrainingConfig
  File "/bark/bark-gui/.venv/lib/python3.11/site-packages/fairseq/dataclass/__init__.py", line 6, in <module>
    from .configs import FairseqDataclass
  File "/bark/bark-gui/.venv/lib/python3.11/site-packages/fairseq/dataclass/configs.py", line 1104, in <module>
    @dataclass
     ^^^^^^^^^
  File "/usr/lib/python3.11/dataclasses.py", line 1220, in dataclass
    return wrap(cls)
           ^^^^^^^^^
  File "/usr/lib/python3.11/dataclasses.py", line 1210, in wrap
    return _process_class(cls, init, repr, eq, order, unsafe_hash,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/dataclasses.py", line 958, in _process_class
    cls_fields.append(_get_field(cls, name, type, kw_only))
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/dataclasses.py", line 815, in _get_field
    raise ValueError(f'mutable default {type(f.default)} for field '
ValueError: mutable default <class 'fairseq.dataclass.configs.CommonConfig'> for field common is not allowed: use default_factory

Is this an error in my Python install, or something else?
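It is likely neither: the ValueError at the bottom of the traceback points at a known incompatibility between fairseq and Python 3.11. Python 3.11 tightened the dataclasses check so that any unhashable default — including a dataclass instance such as fairseq's CommonConfig — is rejected at class creation time, whereas 3.10 only rejected list/dict/set defaults. The minimal reproduction below (an illustration, not fairseq code) uses a list so it fails on every Python version, and shows the default_factory pattern fairseq's configs.py would need; in practice the workarounds are pinning the container to a Python 3.10 base image or using a patched fairseq build:

```python
from dataclasses import dataclass, field

def build_broken():
    # Mutable default -> ValueError at class creation time for list defaults
    # on every Python version; on 3.11+ the same check also rejects dataclass
    # instances such as fairseq's CommonConfig.
    @dataclass
    class Broken:
        items: list = []
    return Broken

try:
    build_broken()
except ValueError as exc:
    print("rejected:", exc)

# The pattern fairseq's configs.py would need to pass the 3.11 check:
@dataclass
class Fixed:
    items: list = field(default_factory=list)

print(Fixed().items)
```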
