resemble-ai / resemble-enhance

AI powered speech denoising and enhancement

Home Page: https://huggingface.co/spaces/ResembleAI/resemble-enhance

License: MIT License

Python 100.00%

denoise speech-denoising speech-enhancement speech-processing

resemble-enhance's Introduction

Resemble Enhance

Resemble Enhance is an AI-powered tool that aims to improve the overall quality of speech by performing denoising and enhancement. It consists of two modules: a denoiser, which separates speech from noisy audio, and an enhancer, which further boosts perceptual audio quality by restoring audio distortions and extending the audio bandwidth. Both models are trained on high-quality 44.1 kHz speech data so that the enhanced speech retains high quality.

Usage

Installation

Install the stable version:

pip install resemble-enhance --upgrade

Or try the latest pre-release version:

pip install resemble-enhance --upgrade --pre

Enhance

resemble-enhance in_dir out_dir

Denoise only

resemble-enhance in_dir out_dir --denoise_only
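The two invocations above can also be scripted. The following is a minimal sketch, assuming the executable is installed on PATH under the name `resemble-enhance` (the console-script name that appears in the tracebacks quoted in the issues below). The helper names `build_command` and `run` are hypothetical and not part of the package; only the positional arguments and the `--denoise_only` flag shown in this README are used.

```python
import subprocess
from typing import List


def build_command(in_dir: str, out_dir: str, denoise_only: bool = False) -> List[str]:
    """Build the argv list for the CLI (hypothetical helper, not a package API)."""
    cmd = ["resemble-enhance", in_dir, out_dir]
    if denoise_only:
        # Skip the enhancer and run the denoiser only.
        cmd.append("--denoise_only")
    return cmd


def run(in_dir: str, out_dir: str, denoise_only: bool = False) -> None:
    # check=True raises CalledProcessError if the CLI exits with a non-zero status.
    subprocess.run(build_command(in_dir, out_dir, denoise_only), check=True)
```

Passing the arguments as a list avoids shell quoting problems with directory names that contain spaces.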

Web Demo

We provide a web demo built with Gradio. You can try it out here, or run it locally:

python app.py

Train your own model

Data Preparation

You need to prepare a foreground speech dataset and a background non-speech dataset. In addition, you need to prepare a RIR dataset (examples).

data
├── fg
│   ├── 00001.wav
│   └── ...
├── bg
│   ├── 00001.wav
│   └── ...
└── rir
    ├── 00001.npy
    └── ...
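Before starting a training run, it may help to confirm that the directory matches the layout above. This is a minimal sketch, assuming only the three sub-directories and file extensions shown in the tree; the function name `count_dataset_files` is hypothetical and the check says nothing about audio content or sample rate.

```python
from pathlib import Path
from typing import Dict

# Sub-directory name -> expected file extension, taken from the tree above.
EXPECTED = {"fg": ".wav", "bg": ".wav", "rir": ".npy"}


def count_dataset_files(root: str) -> Dict[str, int]:
    """Return the number of matching files per sub-directory (hypothetical helper)."""
    counts = {}
    for sub, ext in EXPECTED.items():
        folder = Path(root) / sub
        if not folder.is_dir():
            raise FileNotFoundError(f"missing directory: {folder}")
        # rglob also finds files in nested folders, in case the data is sharded.
        counts[sub] = sum(1 for p in folder.rglob(f"*{ext}") if p.is_file())
    return counts
```

A count of zero for any of the three keys suggests that the folder exists but holds files with a different extension.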

Training

Denoiser Warmup

Though the denoiser is trained jointly with the enhancer, it is recommended to warm it up with a separate training run first.

python -m resemble_enhance.denoiser.train --yaml config/denoiser.yaml runs/denoiser

Enhancer

Then, you can train the enhancer in two stages. The first stage trains the autoencoder and vocoder. The second stage trains the latent conditional flow matching (CFM) model.

Stage 1

python -m resemble_enhance.enhancer.train --yaml config/enhancer_stage1.yaml runs/enhancer_stage1

Stage 2

python -m resemble_enhance.enhancer.train --yaml config/enhancer_stage2.yaml runs/enhancer_stage2
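The three training commands above run in a fixed order: denoiser warmup, then enhancer stage 1, then enhancer stage 2. A minimal sketch of a driver script follows, assuming it is launched from the repository root so that the `config/` and `runs/` paths resolve. The function names are hypothetical, and the commands are copied verbatim from this README.

```python
import subprocess
import sys
from typing import List


def training_commands() -> List[List[str]]:
    """Return the three training commands in the order the README describes."""
    py = sys.executable  # reuse the interpreter of the active environment
    return [
        [py, "-m", "resemble_enhance.denoiser.train",
         "--yaml", "config/denoiser.yaml", "runs/denoiser"],
        [py, "-m", "resemble_enhance.enhancer.train",
         "--yaml", "config/enhancer_stage1.yaml", "runs/enhancer_stage1"],
        [py, "-m", "resemble_enhance.enhancer.train",
         "--yaml", "config/enhancer_stage2.yaml", "runs/enhancer_stage2"],
    ]


def run_all() -> None:
    for cmd in training_commands():
        # Stop at the first failing stage instead of training on a broken checkpoint.
        subprocess.run(cmd, check=True)
```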

Blog

Learn more on our website!

resemble-enhance's Issues

Windows posixpath error

Set up a test environment (voc_resembleenhance) and started the UI. The web page opens. I select a noisy wav, and it opens and plays OK. Then, when I click Submit, the UI shows Error in both output spots, and I get this PosixPath error on the command line. Is this Linux-specific? Is there any fix for Windows?

Traceback (most recent call last):
  File "D:\Tests\Resemble Enhance\voc_resembleenhance\lib\site-packages\gradio\queueing.py", line 456, in call_prediction
    output = await route_utils.call_process_api(
  File "D:\Tests\Resemble Enhance\voc_resembleenhance\lib\site-packages\gradio\route_utils.py", line 232, in call_process_api
    output = await app.get_blocks().process_api(
  File "D:\Tests\Resemble Enhance\voc_resembleenhance\lib\site-packages\gradio\blocks.py", line 1522, in process_api
    result = await self.call_function(
  File "D:\Tests\Resemble Enhance\voc_resembleenhance\lib\site-packages\gradio\blocks.py", line 1144, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "D:\Tests\Resemble Enhance\voc_resembleenhance\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "D:\Tests\Resemble Enhance\voc_resembleenhance\lib\site-packages\anyio\_backends\_asyncio.py", line 2134, in run_sync_in_worker_thread
    return await future
  File "D:\Tests\Resemble Enhance\voc_resembleenhance\lib\site-packages\anyio\_backends\_asyncio.py", line 851, in run
    result = context.run(func, *args)
  File "D:\Tests\Resemble Enhance\voc_resembleenhance\lib\site-packages\gradio\utils.py", line 674, in wrapper
    response = f(*args, **kwargs)
  File "D:\Tests\Resemble Enhance\app.py", line 24, in _fn
    wav1, new_sr = denoise(dwav, sr, device)
  File "D:\Tests\Resemble Enhance\voc_resembleenhance\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\Tests\Resemble Enhance\resemble_enhance\enhancer\inference.py", line 29, in denoise
    enhancer = load_enhancer(run_dir, device)
  File "D:\Tests\Resemble Enhance\resemble_enhance\enhancer\inference.py", line 17, in load_enhancer
    hp = HParams.load(run_dir)
  File "D:\Tests\Resemble Enhance\resemble_enhance\hparams.py", line 109, in load
    hps.append(cls.from_yaml(run_dir / "hparams.yaml"))
  File "D:\Tests\Resemble Enhance\resemble_enhance\hparams.py", line 94, in from_yaml
    return cls(**dict(OmegaConf.merge(cls(), OmegaConf.load(path))))
  File "D:\Tests\Resemble Enhance\voc_resembleenhance\lib\site-packages\omegaconf\omegaconf.py", line 190, in load
    obj = yaml.load(f, Loader=get_yaml_loader())
  File "D:\Tests\Resemble Enhance\voc_resembleenhance\lib\site-packages\yaml\__init__.py", line 81, in load
    return loader.get_single_data()
  File "D:\Tests\Resemble Enhance\voc_resembleenhance\lib\site-packages\yaml\constructor.py", line 51, in get_single_data
    return self.construct_document(node)
  File "D:\Tests\Resemble Enhance\voc_resembleenhance\lib\site-packages\yaml\constructor.py", line 60, in construct_document
    for dummy in generator:
  File "D:\Tests\Resemble Enhance\voc_resembleenhance\lib\site-packages\yaml\constructor.py", line 413, in construct_yaml_map
    value = self.construct_mapping(node)
  File "D:\Tests\Resemble Enhance\voc_resembleenhance\lib\site-packages\omegaconf\_utils.py", line 151, in construct_mapping
    return super().construct_mapping(node, deep=deep)
  File "D:\Tests\Resemble Enhance\voc_resembleenhance\lib\site-packages\yaml\constructor.py", line 218, in construct_mapping
    return super().construct_mapping(node, deep=deep)
  File "D:\Tests\Resemble Enhance\voc_resembleenhance\lib\site-packages\yaml\constructor.py", line 143, in construct_mapping
    value = self.construct_object(value_node, deep=deep)
  File "D:\Tests\Resemble Enhance\voc_resembleenhance\lib\site-packages\yaml\constructor.py", line 100, in construct_object
    data = constructor(self, node)
  File "D:\Tests\Resemble Enhance\voc_resembleenhance\lib\site-packages\omegaconf\_utils.py", line 183, in <lambda>
    lambda loader, node: pathlib.PosixPath(*loader.construct_sequence(node)),
  File "D:\Python\lib\pathlib.py", line 962, in __new__
    raise NotImplementedError("cannot instantiate %r on your system"
NotImplementedError: cannot instantiate 'PosixPath' on your system

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "D:\Tests\Resemble Enhance\voc_resembleenhance\lib\site-packages\gradio\queueing.py", line 501, in process_events
    response = await self.call_prediction(awake_events, batch)
  File "D:\Tests\Resemble Enhance\voc_resembleenhance\lib\site-packages\gradio\queueing.py", line 465, in call_prediction
    raise Exception(str(error) if show_error else None) from error
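The traceback ends in `pathlib.PosixPath(...)`, which Python refuses to instantiate on Windows. A workaround often used for this class of error is to alias `PosixPath` to `WindowsPath` temporarily while the file is loaded. The sketch below is not the project's fix and is untested against this repository; the context-manager name `posix_path_compat` is hypothetical.

```python
import contextlib
import os
import pathlib


@contextlib.contextmanager
def posix_path_compat():
    """Temporarily alias pathlib.PosixPath to WindowsPath on Windows only."""
    if os.name != "nt":
        # Nothing to patch on POSIX systems.
        yield
        return
    original = pathlib.PosixPath
    pathlib.PosixPath = pathlib.WindowsPath
    try:
        yield
    finally:
        # Always restore the original class, even if loading fails.
        pathlib.PosixPath = original
```

Under this assumption, the call that reads `hparams.yaml` would be wrapped in `with posix_path_compat(): ...`. The patch is process-wide while the block is active, so it should be kept as short as possible.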

Any paper on your method?

Hi, thanks for your work!
Have you published a paper describing your enhancement method (CFM, DiffWave-like WN)?

API for denoise only?

Hi, thanks for this project. It works perfectly on my GPU device. However, I found that it seems to handle only English audio files, so I think it would be better to have an API that only denoises. Is that possible?

Windows: Errors while installing

I tried using the enhancer on a bark audio file on hugging face, and it works wonders!

Trying to pip install it, I get this error:

>>> ERROR
1
  error: subprocess-exited-with-error

  python setup.py egg_info did not run successfully.
  exit code: 1

  [15 lines of output]
  test.c
  LINK : fatal error LNK1181: cannot open input file 'aio.lib'
  Traceback (most recent call last):
    File "<string>", line 2, in <module>
    File "<pip-setuptools-caller>", line 34, in <module>
    File "C:\Users\45239\AppData\Local\Temp\pip-install-phvt4rr_\deepspeed_96322bcee24e46919b71c01d496a21e7\setup.py", line 182, in <module>
      abort(f"Unable to pre-compile {op_name}")
    File "C:\Users\45239\AppData\Local\Temp\pip-install-phvt4rr_\deepspeed_96322bcee24e46919b71c01d496a21e7\setup.py", line 52, in abort
      assert False, msg
  AssertionError: Unable to pre-compile async_io
  DS_BUILD_OPS=1
   [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
   [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
   [WARNING]  One can disable async_io with DS_BUILD_AIO=0
   [ERROR]  Unable to pre-compile async_io
  [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

Encountered error while generating package metadata.

See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
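The log above states that `async_io` can be disabled with `DS_BUILD_AIO=0`, and it shows `DS_BUILD_OPS=1` in effect during the build. One thing to try is re-running the install with both variables set to `0`. This is a minimal sketch under that assumption; whether it resolves the Windows build is not confirmed here, and the helper names are hypothetical.

```python
import os
import subprocess
import sys
from typing import Dict, Optional


def deepspeed_build_env(base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy the environment and disable DeepSpeed op pre-compilation."""
    env = dict(os.environ if base is None else base)
    env["DS_BUILD_AIO"] = "0"  # named in the warning text of the log above
    env["DS_BUILD_OPS"] = "0"  # the log shows this set to 1 when the build failed
    return env


def install() -> None:
    # Same pip command as in the Installation section, with the modified environment.
    subprocess.run(
        [sys.executable, "-m", "pip", "install", "resemble-enhance", "--upgrade"],
        env=deepspeed_build_env(),
        check=True,
    )
```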

Command line error

I'm using this great app on an Apple M1.

I used a virtual environment to run it in Python 3.10. After it has been installed the Web App works great. But when I try to run the command line version I get a cascade of errors.

This occurs on the dev and main branch, in Python 3.10 and 3.11

(my_env) degner@Davids-MacBook-Pro-2 Desktop % resemble-enhance /Users/degner/Desktop/To\ Clean /Users/degner/Desktop/Cleaned
[2024-03-02 00:32:41,743] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to mps (auto detect)
[2024-03-02 00:32:41,850] torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs.
CUDA is not available but --device is set to cuda, using CPU instead
Processing /Users/degner/Desktop/Cleaned/Main - synced 2 - Cut down 2 copy.wav:   0%|                                                                                                                                                                                         | 0/2 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "/Users/degner/Desktop/my_env/bin/resemble-enhance", line 8, in <module>
    sys.exit(main())
  File "/Users/degner/Desktop/my_env/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/Users/degner/Desktop/my_env/lib/python3.10/site-packages/resemble_enhance/enhancer/__main__.py", line 110, in main
    hwav, sr = enhance(
  File "/Users/degner/Desktop/my_env/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/Users/degner/Desktop/my_env/lib/python3.10/site-packages/resemble_enhance/enhancer/inference.py", line 39, in enhance
    enhancer = load_enhancer(run_dir, device)
  File "/Users/degner/Desktop/my_env/lib/python3.10/site-packages/resemble_enhance/enhancer/inference.py", line 20, in load_enhancer
    state_dict = torch.load(path, map_location="cpu")["module"]
  File "/Users/degner/Desktop/my_env/lib/python3.10/site-packages/torch/serialization.py", line 1028, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/Users/degner/Desktop/my_env/lib/python3.10/site-packages/torch/serialization.py", line 1246, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, 'v'.
(my_env) degner@Davids-MacBook-Pro-2 Desktop %
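An `invalid load key, 'v'` error from `torch.load` is consistent with the checkpoint being a Git LFS pointer file rather than the real weights, because pointer files are small text files that begin with the word `version`. That is only one plausible cause. The sketch below checks for it; the function name `is_git_lfs_pointer` is hypothetical, and the path to check is whatever checkpoint file `load_enhancer` tries to open.

```python
# First line of every Git LFS pointer file, per the Git LFS pointer format.
LFS_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_git_lfs_pointer(path: str) -> bool:
    """Return True if the file starts with the Git LFS pointer header."""
    with open(path, "rb") as f:
        return f.read(len(LFS_PREFIX)) == LFS_PREFIX
```

If this returns True, the weights were never downloaded; installing git-lfs and pulling again would be the next thing to try.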

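After inference there is almost no change, and the result differs greatly from the Hugging Face demo

Thanks to the authors for open-sourcing such a great project!
Does the web demo apply any extra processing, such as AGC, or have I set the parameters incorrectly? I used the default parameters throughout.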

Pretrained Model

Is there a plan to release pretrained models?

How to export models to ONNX

Thank you for your work, the performance of this model is quite good. I would like to deploy and use it. Is there a way to export it to ONNX?

Web Application throws Errors on Submit

Hi,

I am running the Web App (app.py), and whenever I upload a file and click Submit, I see the following output on the web server:

Traceback (most recent call last):
  File "/opt/resemble-enhance/.venv/lib/python3.11/site-packages/gradio/queueing.py", line 456, in call_prediction
    output = await route_utils.call_process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/resemble-enhance/.venv/lib/python3.11/site-packages/gradio/route_utils.py", line 232, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/resemble-enhance/.venv/lib/python3.11/site-packages/gradio/blocks.py", line 1522, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/resemble-enhance/.venv/lib/python3.11/site-packages/gradio/blocks.py", line 1144, in call_function
    prediction = await anyio.to_thread.run_sync(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/resemble-enhance/.venv/lib/python3.11/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/resemble-enhance/.venv/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2134, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "/opt/resemble-enhance/.venv/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 851, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/resemble-enhance/.venv/lib/python3.11/site-packages/gradio/utils.py", line 674, in wrapper
    response = f(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^
  File "/opt/resemble-enhance/app.py", line 24, in _fn
    wav1, new_sr = denoise(dwav, sr, device)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/resemble-enhance/.venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/resemble-enhance/resemble_enhance/enhancer/inference.py", line 29, in denoise
    enhancer = load_enhancer(run_dir, device)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/resemble-enhance/resemble_enhance/enhancer/inference.py", line 16, in load_enhancer
    run_dir = download()
              ^^^^^^^^^^
  File "/opt/resemble-enhance/resemble_enhance/enhancer/download.py", line 40, in download
    run_command(["git", "-C", str(REPO_DIR), "lfs", "pull"], "Failed to pull latest changes, please try again.")
  File "/opt/resemble-enhance/resemble_enhance/enhancer/download.py", line 17, in run_command
    raise RuntimeError(msg) from e
RuntimeError: Failed to pull latest changes, please try again.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/resemble-enhance/.venv/lib/python3.11/site-packages/gradio/queueing.py", line 501, in process_events
    response = await self.call_prediction(awake_events, batch)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/resemble-enhance/.venv/lib/python3.11/site-packages/gradio/queueing.py", line 465, in call_prediction
    raise Exception(str(error) if show_error else None) from error
Exception: None

How can I solve this?
Thanks in advance.

Errors when running the project

I tried two ways of running the project.

First, I downloaded the latest source code and installed all of the dependencies with pip, but I hit the error below. The environment is Python 3.9 in a Conda virtual environment.

(voice_enhance) root@jack-B450M-S2H:~/project/resemble-enhance# python3 app.py 
Traceback (most recent call last):
  File "/root/project/resemble-enhance/app.py", line 5, in <module>
    from resemble_enhance.enhancer.inference import denoise, enhance
  File "/root/project/resemble-enhance/resemble_enhance/enhancer/inference.py", line 7, in <module>
    from ..inference import inference
  File "/root/project/resemble-enhance/resemble_enhance/inference.py", line 11, in <module>
    from .hparams import HParams
  File "/root/project/resemble-enhance/resemble_enhance/hparams.py", line 36, in <module>
    class HParams:
  File "/root/project/resemble-enhance/resemble_enhance/hparams.py", line 105, in HParams
    def load(cls, run_dir, yaml: Path | None = None):
TypeError: unsupported operand type(s) for |: 'type' and 'NoneType'
(voice_enhance) root@jack-B450M-S2H:~/project/resemble-enhance# ls
LICENSE  README.md  app.py  config  packages.txt  pyproject.toml  requirements.txt  resemble_enhance  setup.py
(voice_enhance) root@jack-B450M-S2H:~/project/resemble-enhance# resemble-enhance 
Traceback (most recent call last):
  File "/root/anaconda3/envs/voice_enhance/bin/resemble-enhance", line 33, in <module>
    sys.exit(load_entry_point('resemble-enhance==0.0.1', 'console_scripts', 'resemble-enhance')())
  File "/root/anaconda3/envs/voice_enhance/bin/resemble-enhance", line 25, in importlib_load_entry_point
    return next(matches).load()
  File "/root/anaconda3/envs/voice_enhance/lib/python3.9/importlib/metadata.py", line 86, in load
    module = import_module(match.group('module'))
  File "/root/anaconda3/envs/voice_enhance/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/root/anaconda3/envs/voice_enhance/lib/python3.9/site-packages/resemble_enhance-0.0.1-py3.9.egg/resemble_enhance/enhancer/__main__.py", line 10, in <module>
    from .inference import denoise, enhance
  File "/root/anaconda3/envs/voice_enhance/lib/python3.9/site-packages/resemble_enhance-0.0.1-py3.9.egg/resemble_enhance/enhancer/inference.py", line 6, in <module>
    from ..inference import inference
  File "/root/anaconda3/envs/voice_enhance/lib/python3.9/site-packages/resemble_enhance-0.0.1-py3.9.egg/resemble_enhance/inference.py", line 11, in <module>
    from .hparams import HParams
  File "/root/anaconda3/envs/voice_enhance/lib/python3.9/site-packages/resemble_enhance-0.0.1-py3.9.egg/resemble_enhance/hparams.py", line 36, in <module>
    class HParams:
  File "/root/anaconda3/envs/voice_enhance/lib/python3.9/site-packages/resemble_enhance-0.0.1-py3.9.egg/resemble_enhance/hparams.py", line 105, in HParams
    def load(cls, run_dir, yaml: Path | None = None):
TypeError: unsupported operand type(s) for |: 'type' and 'NoneType'

Second, I installed the resemble-enhance package with pip, but enhancing a wav file fails with the errors below.

(base) root@jack-B450M-S2H:/mnt/disk1/test/audio# resemble-enhance input/ output/
[2024-02-08 08:40:35,596] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
CUDA is not available but --device is set to cuda, using CPU instead
Processing output/part2.wav:   0%|                                                                                                                          | 0/1 [04:29<?, ?it/s]
Traceback (most recent call last):
  File "/root/anaconda3/lib/python3.11/urllib/request.py", line 1348, in do_open
    h.request(req.get_method(), req.selector, req.data, headers,
  File "/root/anaconda3/lib/python3.11/http/client.py", line 1286, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/root/anaconda3/lib/python3.11/http/client.py", line 1332, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/root/anaconda3/lib/python3.11/http/client.py", line 1281, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/root/anaconda3/lib/python3.11/http/client.py", line 1041, in _send_output
    self.send(msg)
  File "/root/anaconda3/lib/python3.11/http/client.py", line 979, in send
    self.connect()
  File "/root/anaconda3/lib/python3.11/http/client.py", line 1451, in connect
    super().connect()
  File "/root/anaconda3/lib/python3.11/http/client.py", line 945, in connect
    self.sock = self._create_connection(
                ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/lib/python3.11/socket.py", line 851, in create_connection
    raise exceptions[0]
  File "/root/anaconda3/lib/python3.11/socket.py", line 836, in create_connection
    sock.connect(sa)
TimeoutError: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/anaconda3/bin/resemble-enhance", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/root/anaconda3/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/lib/python3.11/site-packages/resemble_enhance/enhancer/__main__.py", line 110, in main
    hwav, sr = enhance(
               ^^^^^^^^
  File "/root/anaconda3/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/lib/python3.11/site-packages/resemble_enhance/enhancer/inference.py", line 39, in enhance
    enhancer = load_enhancer(run_dir, device)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/lib/python3.11/site-packages/resemble_enhance/enhancer/inference.py", line 16, in load_enhancer
    run_dir = download()
              ^^^^^^^^^^
  File "/root/anaconda3/lib/python3.11/site-packages/resemble_enhance/enhancer/download.py", line 27, in download
    torch.hub.download_url_to_file(url, str(path))
  File "/root/anaconda3/lib/python3.11/site-packages/torch/hub.py", line 620, in download_url_to_file
    u = urlopen(req)
        ^^^^^^^^^^^^
  File "/root/anaconda3/lib/python3.11/urllib/request.py", line 216, in urlopen
    return opener.open(url, data, timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/lib/python3.11/urllib/request.py", line 519, in open
    response = self._open(req, data)
               ^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/lib/python3.11/urllib/request.py", line 536, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/lib/python3.11/urllib/request.py", line 496, in _call_chain
    result = func(*args)
             ^^^^^^^^^^^
  File "/root/anaconda3/lib/python3.11/urllib/request.py", line 1391, in https_open
    return self.do_open(http.client.HTTPSConnection, req,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/anaconda3/lib/python3.11/urllib/request.py", line 1351, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 110] Connection timed out

By the way, I installed the resemble-enhance package in a virtual environment with Python 3.10 because Python 3.9 returns a different error:

(voice_enhance) root@jack-B450M-S2H:~/project/resemble-enhance# resemble-enhance 
Traceback (most recent call last):
  File "/root/anaconda3/envs/voice_enhance/bin/resemble-enhance", line 33, in <module>
    sys.exit(load_entry_point('resemble-enhance==0.0.1', 'console_scripts', 'resemble-enhance')())
  File "/root/anaconda3/envs/voice_enhance/bin/resemble-enhance", line 25, in importlib_load_entry_point
    return next(matches).load()
  File "/root/anaconda3/envs/voice_enhance/lib/python3.9/importlib/metadata.py", line 86, in load
    module = import_module(match.group('module'))
  File "/root/anaconda3/envs/voice_enhance/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/root/anaconda3/envs/voice_enhance/lib/python3.9/site-packages/resemble_enhance-0.0.1-py3.9.egg/resemble_enhance/enhancer/__main__.py", line 10, in <module>
    from .inference import denoise, enhance
  File "/root/anaconda3/envs/voice_enhance/lib/python3.9/site-packages/resemble_enhance-0.0.1-py3.9.egg/resemble_enhance/enhancer/inference.py", line 6, in <module>
    from ..inference import inference
  File "/root/anaconda3/envs/voice_enhance/lib/python3.9/site-packages/resemble_enhance-0.0.1-py3.9.egg/resemble_enhance/inference.py", line 11, in <module>
    from .hparams import HParams
  File "/root/anaconda3/envs/voice_enhance/lib/python3.9/site-packages/resemble_enhance-0.0.1-py3.9.egg/resemble_enhance/hparams.py", line 36, in <module>
    class HParams:
  File "/root/anaconda3/envs/voice_enhance/lib/python3.9/site-packages/resemble_enhance-0.0.1-py3.9.egg/resemble_enhance/hparams.py", line 105, in HParams
    def load(cls, run_dir, yaml: Path | None = None):
TypeError: unsupported operand type(s) for |: 'type' and 'NoneType'
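The `TypeError` on `Path | None` arises because the `X | Y` union syntax in annotations is evaluated at definition time and is only supported from Python 3.10 (PEP 604). On Python 3.9 the equivalent spelling is `Optional[Path]`. The sketch below illustrates the difference with a stand-in function; `load_sketch` is hypothetical and does not reproduce the real `HParams.load`.

```python
import sys
from pathlib import Path
from typing import Optional


def supports_pep604_unions() -> bool:
    """True when `Path | None` can be evaluated in an annotation at runtime."""
    return sys.version_info >= (3, 10)


def load_sketch(run_dir: Path, yaml: Optional[Path] = None) -> Path:
    # Optional[Path] works on Python 3.9 as well as on newer versions.
    return yaml if yaml is not None else run_dir / "hparams.yaml"
```

Running the package under Python 3.10 or later avoids the error without touching the source.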

processing one file

Is it possible to process one specific file in a directory with many files?
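Since the command line shown in the README takes an input directory rather than a file, one workaround is to copy the single file into a fresh staging directory and pass that directory to the CLI. This is a minimal sketch under that assumption; `stage_single_file` is a hypothetical helper and not part of the package.

```python
import shutil
import tempfile
from pathlib import Path


def stage_single_file(src: str) -> Path:
    """Copy one audio file into a new temporary directory and return that directory."""
    staging = Path(tempfile.mkdtemp(prefix="enhance_in_"))
    # copy2 preserves the file name and metadata; the original file is left untouched.
    shutil.copy2(src, staging / Path(src).name)
    return staging
```

The returned directory can then be used as `in_dir`, for example `resemble-enhance <staging_dir> out_dir`, and deleted afterwards.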

Singing Voice Enhancement

Hi, thank you for the brilliant work! I have a question about singing voice enhancement. I tried the service with a piece of clean, studio-recorded singing to see whether the model would return an almost identical waveform. In my case, I cannot determine in advance whether the input is already clean, and I need clean and intelligible results for both singing and speech. It seemed that the clean singing voice was partially removed by the model, which made the result less intelligible. I guess that's because there is little singing data in the dataset. Is there a pretrained model that works well on both singing and speech inputs? Thank you so much!

About using the API

I cannot use this API in China; it fails with an error like this:
websockets.exceptions.InvalidStatusCode: server rejected WebSocket connection: HTTP 403

Enhancing Live Audio from PCM/OPUS RTP Stream

I am currently working with a live RTP stream that uses PCM/OPUS codecs. I am interested in enhancing the audio quality in real-time and would like to know if it is possible to achieve this using your library. Specifically, I am looking for guidance or examples on how to apply audio enhancement techniques to a live audio stream.

Any assistance or recommendations on how to integrate these enhancements with the RTP stream would be greatly appreciated.

On Ubuntu 22.04, the application stops with an error that says nothing except that cloning the repo has failed

This occurs because the git-lfs package is not installed on your OS. Install git-lfs for your distro. For Ubuntu:

sudo apt install git-lfs

This also fixes the Gradio app reporting that an error occurred after 2-3 seconds.

datasets

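Which datasets were used for the foreground, background, and RIR data? How was the data split between training and testing, and was cross-validation used?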

reading hparams

I got nothing while reading params

Pre-trained usage with CPU / without Nvidia GPU

Is there any chance of enabling the usage of the pre-trained model with a CPU / non-Nvidia GPU?
(-> RuntimeError: Found no NVIDIA driver on your system)

Or is there no hope due to cuda?

Thank you for the awesome work!
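
A log further down this page shows the command line printing "CUDA is not available but --device is set to cuda, using CPU instead", which suggests that at least some versions fall back to the CPU. The sketch below illustrates that kind of fallback. The function `resolve_device` is hypothetical; in real code the second argument would come from `torch.cuda.is_available()`.

```python
def resolve_device(requested: str, cuda_available: bool) -> str:
    """Return the device to use, falling back to CPU when CUDA was requested but is missing."""
    if requested.startswith("cuda") and not cuda_available:
        return "cpu"
    return requested


# Example: a machine with an AMD GPU and no NVIDIA driver.
device = resolve_device("cuda", cuda_available=False)
print(device)
```

CPU inference is expected to be much slower than GPU inference, so this is a workaround rather than a replacement.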

May I ask why the audio length after noise reduction is inconsistent

The denoised audio does not have the same duration as the original. What is causing this? I am saving the denoised audio locally, but its length differs from that of the original audio:
`def denoise_audio(audio_path, output_path):
    dwav, sr = torchaudio.load(audio_path)
    dwav = dwav.mean(dim=0)
    wav_denoised, _ = denoise(dwav, sr, device)
    if wav_denoised.ndim == 1:
        wav_denoised = wav_denoised.unsqueeze(0)

    wav_denoised_tensor = torch.from_numpy(wav_denoised.cpu().numpy())

    sf.write(output_path, wav_denoised_tensor.T, 24000)`
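
One thing worth checking in the snippet above is the sample rate: the second value returned by `denoise` is discarded, and the file is written with a hard-coded 24000. The README says the models are trained on 44.1 kHz speech, so if `denoise` returns 44.1 kHz audio (an assumption, not confirmed here), the saved file would play back longer than the original. The arithmetic can be sketched as follows:

```python
def duration_seconds(num_samples: int, sample_rate: int) -> float:
    """Playback duration is the number of samples divided by the declared sample rate."""
    return num_samples / sample_rate


original_seconds = 10
model_rate = 44_100  # assumption: the rate returned by denoise()
num_samples = original_seconds * model_rate

correct = duration_seconds(num_samples, model_rate)  # written with the returned rate
wrong = duration_seconds(num_samples, 24_000)        # written with the hard-coded rate
print(correct, wrong)
```

If this is the cause, keeping the returned rate (for example `wav_denoised, new_sr = denoise(...)`) and passing it to `sf.write` should make the durations match.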

SSL: CERTIFICATE_VERIFY_FAILED on macOS Sonoma 14.5 AND on macOS Sequoia 15.0 beta 2

When running python3 app.py on macOS (the web interface supplied with resemble-enhance),
using either a public share link or localhost:7860,
the following error was raised in resemble_enhance/enhancer/download.py:
def download(run_dir: str | Path | None = None):

...

       torch.hub.download_url_to_file(url, str(path))   # <---- SSL CERTIFICATE_VERIFY_FAILED Error

Is this because macOS and Windows do not permit multiple redirects?
Is the only workaround at the moment to run on Linux?
Can this be run locally without contacting the server that performs multiple redirects?

Here is the detailed stack trace...
Downloads/resemble-enhance-main % python app.py
[2024-06-26 21:37:44,816] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to mps (auto detect)
[2024-06-26 21:37:45,218] torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs.
Running on local URL: http://127.0.0.1:7860
Running on public URL: https://02876e1f2a3e690306.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run gradio deploy from Terminal to deploy to Spaces (https://huggingface.co/spaces)
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/urllib/request.py", line 1348, in do_open
h.request(req.get_method(), req.selector, req.data, headers,
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/http/client.py", line 1303, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/http/client.py", line 1349, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/http/client.py", line 1298, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/http/client.py", line 1058, in _send_output
self.send(msg)
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/http/client.py", line 996, in send
self.connect()
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/http/client.py", line 1475, in connect
self.sock = self._context.wrap_socket(self.sock,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/ssl.py", line 517, in wrap_socket
return self.sslsocket_class._create(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/ssl.py", line 1104, in _create
self.do_handshake()
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/ssl.py", line 1382, in do_handshake
self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1006)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/Users/williammccarthy/.virtualenvs/asj/lib/python3.11/site-packages/gradio/queueing.py", line 541, in process_events
response = await route_utils.call_process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/williammccarthy/.virtualenvs/asj/lib/python3.11/site-packages/gradio/route_utils.py", line 276, in call_process_api
output = await app.get_blocks().process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/williammccarthy/.virtualenvs/asj/lib/python3.11/site-packages/gradio/blocks.py", line 1928, in process_api
result = await self.call_function(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/williammccarthy/.virtualenvs/asj/lib/python3.11/site-packages/gradio/blocks.py", line 1514, in call_function
prediction = await anyio.to_thread.run_sync(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/williammccarthy/.virtualenvs/asj/lib/python3.11/site-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/williammccarthy/.virtualenvs/asj/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
return await future
^^^^^^^^^^^^
File "/Users/williammccarthy/.virtualenvs/asj/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 859, in run
result = context.run(func, *args)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/williammccarthy/.virtualenvs/asj/lib/python3.11/site-packages/gradio/utils.py", line 833, in wrapper
response = f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/Users/williammccarthy/Downloads/resemble-enhance-main/app.py", line 24, in _fn
wav1, new_sr = denoise(dwav, sr, device)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/williammccarthy/.virtualenvs/asj/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/Users/williammccarthy/Downloads/resemble-enhance-main/resemble_enhance/enhancer/inference.py", line 29, in denoise
enhancer = load_enhancer(run_dir, device)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/williammccarthy/Downloads/resemble-enhance-main/resemble_enhance/enhancer/inference.py", line 16, in load_enhancer
run_dir = download(run_dir)
^^^^^^^^^^^^^^^^^
File "/Users/williammccarthy/Downloads/resemble-enhance-main/resemble_enhance/enhancer/download.py", line 48, in download
torch.hub.download_url_to_file(url, str(path))
File "/Users/williammccarthy/.virtualenvs/asj/lib/python3.11/site-packages/torch/hub.py", line 620, in download_url_to_file
u = urlopen(req)
^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/urllib/request.py", line 216, in urlopen
return opener.open(url, data, timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/urllib/request.py", line 519, in open
response = self._open(req, data)
^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/urllib/request.py", line 536, in _open
result = self._call_chain(self.handle_open, protocol, protocol +
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/urllib/request.py", line 496, in _call_chain
result = func(*args)
^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/urllib/request.py", line 1391, in https_open
return self.do_open(http.client.HTTPSConnection, req,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/urllib/request.py", line 1351, in do_open
raise URLError(err)
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1006)>
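
The final error, "unable to get local issuer certificate", usually means that Python's OpenSSL could not find a CA bundle to verify the server against. The "Redirects" warning earlier in the log appears to be unrelated, since it comes from torch.distributed's handling of process output. The sketch below only inspects the local CA configuration using the standard library; the function name `describe_ca_config` is hypothetical.

```python
import os
import ssl


def describe_ca_config() -> dict:
    """Report where this Python build looks for CA certificates."""
    paths = ssl.get_default_verify_paths()
    return {
        "cafile": paths.cafile,                  # None if no CA file was found
        "capath": paths.capath,                  # None if no CA directory was found
        "openssl_cafile": paths.openssl_cafile,  # compiled-in default location
        "openssl_cafile_exists": os.path.exists(paths.openssl_cafile or ""),
        "env_override": os.environ.get(paths.openssl_cafile_env),
    }


ca_info = describe_ca_config()
for key, value in ca_info.items():
    print(f"{key}: {value}")
```

If both `cafile` and `capath` come back as `None`, no CA bundle is installed for this interpreter. On python.org builds for macOS (the `/Library/Frameworks/Python.framework` path in the traceback suggests one), running the "Install Certificates.command" script shipped in the Python application folder is the usual remedy.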

Any chance you will release the discriminator for the supplied model?

Would be nice to be able to fine tune this model rather than train from scratch...

Any chance it could be added to HuggingFace with the generator?

How can I convert the .pt file to a Core ML model

How can I convert the mp_rank_00_model_states.pt file to an Apple Core ML model file? Has anyone done this successfully?

resemble_enhance/enhancer/__main__.py fixes

Running the enhancer from the command line raises errors. Fixes:

Line 94: dwav, sr = torchaudio.load(path) >> dwav, sr = torchaudio.load(str(path))

Line 115: torchaudio.save(out_path, hwav[None], sr) >> torchaudio.save(str(out_path), hwav[None], sr)
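
The fix converts `pathlib` objects to plain strings before they reach a backend that only accepts `str`. A minimal sketch of that conversion, using only the standard library (the helper name `to_str_path` is hypothetical):

```python
import os
from pathlib import Path


def to_str_path(path) -> str:
    """Convert a str or pathlib.Path to a plain str path."""
    # os.fspath returns the str representation for Path objects and passes str through.
    return os.fspath(path)


audio_path = Path("tmp") / "myfile.wav"
as_string = to_str_path(audio_path)
print(type(audio_path).__name__, "->", type(as_string).__name__)
```

`str(path)` gives the same result for `Path` objects; `os.fspath` additionally raises `TypeError` for values that are not paths at all.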

garbled output from the web demo

Screen.Recording.2023-12-15.at.9.21.35.AM.mov

Why is the output like this? Am I missing some parameters?

colab

Thank you for this great tool. I made a colab version here: https://github.com/hopperrr/resemble-enhance

error: subprocess-exited-with-error

D:\AI\resemble-enhance>pip install resemble-enhance --upgrade
Collecting resemble-enhance
Downloading resemble_enhance-0.0.1-py3-none-any.whl.metadata (3.4 kB)
Collecting celluloid==0.2.0 (from resemble-enhance)
Downloading celluloid-0.2.0-py3-none-any.whl (5.4 kB)
Collecting deepspeed==0.12.4 (from resemble-enhance)
Downloading deepspeed-0.12.4.tar.gz (1.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 2.3 MB/s eta 0:00:00
Installing build dependencies ... done
Getting requirements to build wheel ... error
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> [20 lines of output]
[WARNING] Unable to import torch, pre-compiling ops will be disabled. Please visit https://pytorch.org/ to see how to properly install torch on your system.
[WARNING] unable to import torch, please install it if you want to pre-compile any deepspeed ops.
DS_BUILD_OPS=1
Traceback (most recent call last):
File "C:\Users\as-aj\AppData\Local\Programs\Python\Python310\lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 353, in <module>
main()
File "C:\Users\as-aj\AppData\Local\Programs\Python\Python310\lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 335, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
File "C:\Users\as-aj\AppData\Local\Programs\Python\Python310\lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 118, in get_requires_for_build_wheel
return hook(config_settings)
File "C:\Users\as-aj\AppData\Local\Temp\pip-build-env-dqrtwetr\overlay\Lib\site-packages\setuptools\build_meta.py", line 325, in get_requires_for_build_wheel
return self._get_build_requires(config_settings, requirements=['wheel'])
File "C:\Users\as-aj\AppData\Local\Temp\pip-build-env-dqrtwetr\overlay\Lib\site-packages\setuptools\build_meta.py", line 295, in _get_build_requires
self.run_setup()
File "C:\Users\as-aj\AppData\Local\Temp\pip-build-env-dqrtwetr\overlay\Lib\site-packages\setuptools\build_meta.py", line 480, in run_setup
super(_BuildMetaLegacyBackend, self).run_setup(setup_script=setup_script)
File "C:\Users\as-aj\AppData\Local\Temp\pip-build-env-dqrtwetr\overlay\Lib\site-packages\setuptools\build_meta.py", line 311, in run_setup
exec(code, locals())
File "<string>", line 147, in <module>
AssertionError: Unable to pre-compile ops without torch installed. Please install torch before attempting to pre-compile ops.
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
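
The assertion in the log says that the deepspeed build step needs torch to be installed before it runs. A small pre-flight check along these lines can confirm whether torch is visible to the interpreter that pip uses. The helper names are hypothetical, and the check itself does not install anything.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


def missing_prerequisites(required=("torch",)) -> list:
    """List the required top-level modules that are not installed."""
    return [name for name in required if not is_importable(name)]


missing = missing_prerequisites()
if missing:
    print("Install these first:", ", ".join(missing))
else:
    print("Prerequisites found.")
```

If torch is reported missing, installing it first (following the instructions on pytorch.org) and then re-running `pip install resemble-enhance --upgrade` is the order the error message asks for.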

Enhancing changes words

Whenever I enhance audio, some of the spoken words change. Why does this happen?

PosixPath error when running on MacOS

Trying to run on MacOS, but to no avail.
I've placed a "1.wav" file in the current folder and start resemble-enhance like this:
resemble-enhance ./ ./
The same output is given when using an absolute path.

Output:

[2024-01-03 20:17:18,615] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to mps (auto detect)
[2024-01-03 20:17:18,938] torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs.
CUDA is not available but --device is set to cuda, using CPU instead
Processing 1.wav:   0%|                                   | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/usr/local/bin/resemble-enhance", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/resemble_enhance/enhancer/__main__.py", line 100, in main
    dwav, sr = torchaudio.load(path)
               ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/torchaudio/_backend/utils.py", line 204, in load
    return backend.load(uri, frame_offset, num_frames, normalize, channels_first, format, buffer_size)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/torchaudio/_backend/sox.py", line 42, in load
    ret = torch.ops.torchaudio.sox_io_load_audio_file(
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/torch/_ops.py", line 692, in __call__
    return self._op(*args, **kwargs or {})
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: torchaudio::sox_io_load_audio_file() Expected a value of type 'str' for argument '_0' but instead found type 'PosixPath'.
Position: 0
Value: PosixPath('1.wav')
Declaration: torchaudio::sox_io_load_audio_file(str _0, int? _1, int? _2, bool? _3, bool? _4, str? _5) -> (Tensor _0, int _1)
Cast error details: Unable to cast Python instance of type <class 'pathlib.PosixPath'> to C++ type '?' (#define PYBIND11_DETAILED_ERROR_MESSAGES or compile in debug mode for details)

pre-trained model ?

Query Regarding the Impact of Varied Acoustic Environments on Model Performance

Dear Resemble Enhance Team,

I hope this message finds you well. I am reaching out to inquire about the robustness of the Resemble Enhance AI models, particularly in relation to their performance across diverse acoustic environments.

Having perused your documentation and successfully utilised your tool for speech enhancement and denoising, I've observed impressive results in standard settings. However, I am curious about the model's adaptability when confronted with audio data recorded in atypical acoustic spaces, which may not be well-represented in the training datasets.

Specifically, my questions are as follows:

How does the model cope with audio inputs recorded in highly reverberant spaces, or those with unique echo characteristics that might diverge significantly from the RIR datasets used during training?
Is there a recommended approach to fine-tuning the model on a custom dataset that includes such unique acoustic characteristics, to better tailor the enhancement capabilities to specific environments?
Could you provide insights into the model's limitations when dealing with extreme noise conditions or non-linear distortions that are not commonly found in everyday scenarios?

Understanding these aspects is crucial for my ongoing project, which involves processing archival audio recordings that exhibit a wide range of acoustic anomalies.

I appreciate the cutting-edge work your team has accomplished with Resemble Enhance and look forward to any guidance you can provide on the aforementioned queries.

Best regards,
yihong1120

Installation error

Hey,

System: MacOS Ventura 13.5.1
After running the installation command as described in the README, I get this error:

ERROR: Could not find a version that satisfies the requirement resemble-enhance (from versions: none)
ERROR: No matching distribution found for resemble-enhance

I tried to install both versions (stable and pre-release), no luck.

Any ideas what could be wrong?

CUDA memory cache increasing

I ran the Gradio app from the repo and saw CUDA memory grow rapidly from ~3 GB to ~12 GB.
With a short audio file it also increases, but not as much.

CUDA memory with a short (1 second) audio file:

CUDA memory after a long (2.5 minutes) audio file:

Also, I checked torch.cuda.memory_allocated() and it stayed constant, but torch.cuda.memory_cached() kept increasing.
Can someone explain why the CUDA cache is growing?
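
This pattern is consistent with how PyTorch's caching allocator behaves: memory freed by tensors is kept in a cache for reuse instead of being returned to the driver, so the cached (reserved) figure tracks the peak demand while the allocated figure drops back. The toy model below is a deliberate simplification for illustration, not PyTorch's actual allocator. The class name and the unit sizes are made up.

```python
class ToyCachingAllocator:
    """Toy model: freed memory goes to a cache instead of back to the device."""

    def __init__(self):
        self.allocated = 0    # units held by live tensors
        self.cached_free = 0  # units freed by tensors but kept for reuse

    @property
    def reserved(self):
        return self.allocated + self.cached_free

    def malloc(self, size):
        # Reuse cached memory first; only the remainder grows the reservation.
        self.cached_free -= min(size, self.cached_free)
        self.allocated += size

    def free(self, size):
        self.allocated -= size
        self.cached_free += size

    def empty_cache(self):
        self.cached_free = 0


alloc = ToyCachingAllocator()
alloc.malloc(3)                    # model weights stay resident
alloc.malloc(1); alloc.free(1)     # short clip: small temporary buffers
after_short = (alloc.allocated, alloc.reserved)
alloc.malloc(9); alloc.free(9)     # long clip: large temporary buffers
after_long = (alloc.allocated, alloc.reserved)
alloc.empty_cache()
after_empty = (alloc.allocated, alloc.reserved)
print(after_short, after_long, after_empty)
```

In real PyTorch, `torch.cuda.memory_cached()` is the older name for `torch.cuda.memory_reserved()`, and `torch.cuda.empty_cache()` releases unused cached blocks back to the driver.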

here is how to Use in colab

👇
https://github.com/rachitdeveloper/resemble-enhance-colab

Command line instructions

The GitHub instructions use "resemble_enhance" as the command instead of "resemble-enhance".

Great app, Thanks!

Docker installation possible?

Hi,

Is there a Docker image you can provide for easy installation?

Regards,

When installing on Win10, an error occurred: UnicodeDecodeError: 'gbk' codec can't decode byte 0x80 in position 1907: illegal multibyte sequence
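
An error like this typically appears when a UTF-8 text file is opened without an explicit encoding on a Windows system whose default code page is GBK. Which file triggers it during installation is not stated here, so the sketch below only illustrates the mechanism. The file name and helper names are hypothetical.

```python
import tempfile
from pathlib import Path


def read_text_utf8(path) -> str:
    """Read a text file with an explicit encoding instead of the locale default."""
    return Path(path).read_text(encoding="utf-8")


def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if the bytes are valid in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "README.md"
    sample.write_text("Resemble Enhance \u201cdemo\u201d", encoding="utf-8")
    content = read_text_utf8(sample)

# Byte 0x80 is not a valid lead byte in GBK, which is what the error reports.
gbk_ok = decodes_as(b"\x80a", "gbk")
print(content, gbk_ok)
```

As a workaround that does not require editing any files, setting the environment variable `PYTHONUTF8=1` before running pip switches Python to UTF-8 mode; whether that resolves this particular installation is an assumption.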

Training data / time

Few questions:

How much data was the demo model trained on?
How many epochs?
How long did training take and what system?

non english speech transformed to weird language

Peace. Non-English speech is transformed into a strange, unrecognizable language; I think it only works with English speech right now.

Is an Nvidia GPU required?

I guessed that I had to install git-lfs, which made the code run a bit longer. But now I am getting:
"RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx"

I am on Kubuntu 22.04 with an AMD GPU and no Nvidia hardware in this machine. If an Nvidia GPU is required, it would be nice to state that somewhere. Full output is:

[2024-06-15 18:29:21,064] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Processing enhanced/ipraaudiowav.wav:   0%|                                                                                                                                               | 0/1 [00:00<?, ?it/s]Already up to date.
Processing enhanced/ipraaudiowav.wav:   0%|                                                                                                                                               | 0/1 [00:05<?, ?it/s]
Traceback (most recent call last):
  File "/home/soren/.local/bin/resemble-enhance", line 8, in <module>
    sys.exit(main())
  File "/home/soren/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/soren/.local/lib/python3.10/site-packages/resemble_enhance/enhancer/__main__.py", line 104, in main
    hwav, sr = enhance(
  File "/home/soren/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/soren/.local/lib/python3.10/site-packages/resemble_enhance/enhancer/inference.py", line 39, in enhance
    enhancer = load_enhancer(run_dir, device)
  File "/home/soren/.local/lib/python3.10/site-packages/resemble_enhance/enhancer/inference.py", line 23, in load_enhancer
    enhancer.to(device)
  File "/home/soren/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1160, in to
    return self._apply(convert)
  File "/home/soren/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "/home/soren/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "/home/soren/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  [Previous line repeated 1 more time]
  File "/home/soren/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 833, in _apply
    param_applied = fn(param)
  File "/home/soren/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1158, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
  File "/home/soren/.local/lib/python3.10/site-packages/torch/cuda/__init__.py", line 298, in _lazy_init
    torch._C._cuda_init()
RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx

Btw: there's a typo in the documentation, which uses _ instead of - in "resemble-enhance".

Question about Dataset

Hi, thanks for releasing this model! Which dataset did you use for background and RIR? Thanks!

RuntimeError in torchaudio_sox::load_audio_file()

torchaudio_sox::load_audio_file() seems to expect str, not PosixPath (not sure why though):

Traceback (most recent call last):
  File "/Users/bwords/Devel/nightingale/./venv/bin/resemble-enhance", line 8, in <module>
    sys.exit(main())
  File "/Users/bwords/Devel/nightingale/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/Users/bwords/Devel/nightingale/venv/lib/python3.10/site-packages/resemble_enhance/enhancer/__main__.py", line 94, in main
    dwav, sr = torchaudio.load(path)
  File "/Users/bwords/Devel/nightingale/venv/lib/python3.10/site-packages/torchaudio/_backend/utils.py", line 205, in load
    return backend.load(uri, frame_offset, num_frames, normalize, channels_first, format, buffer_size)
  File "/Users/bwords/Devel/nightingale/venv/lib/python3.10/site-packages/torchaudio/_backend/sox.py", line 44, in load
    ret = sox_ext.load_audio_file(uri, frame_offset, num_frames, normalize, channels_first, format)
  File "/Users/bwords/Devel/nightingale/venv/lib/python3.10/site-packages/torch/_ops.py", line 755, in __call__
    return self._op(*args, **(kwargs or {}))
RuntimeError: torchaudio_sox::load_audio_file() Expected a value of type 'str' for argument '_0' but instead found type 'PosixPath'.
Position: 0
Value: PosixPath('tmp/myfile.wav')
Declaration: torchaudio_sox::load_audio_file(str _0, int? _1, int? _2, bool? _3, bool? _4, str? _5) -> (Tensor _0, int _1)
Cast error details: Unable to cast Python instance of type <class 'pathlib.PosixPath'> to C++ type '?' (#define PYBIND11_DETAILED_ERROR_MESSAGES or compile in debug mode for details)

Environment: Mac OS Ventura, python 3.10, venv