bshall / acoustic-model

Acoustic models for: A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion

Home Page: https://bshall.github.io/soft-vc/

License: MIT License

Python 100.00%

pytorch representation-learning speech voice-conversion

acoustic-model's Issues

Validation loss not decreasing after 25k steps when fine-tuning with small target data

@bshall Thank you for this great work.

The validation loss stops decreasing after 25k steps when fine-tuning on a small target dataset (~1 hour).

The best model (25k steps) does not work properly. The content is transferred correctly, but the pitch is not converted from the source to the target, and the output does not sound close to the target speaker's tone.

@bshall, could you please share your suggestions?

Thanks

RuntimeError when loading fine-tuned model: Error(s) in loading state_dict for AcousticModel

@bshall Thank you for this great work.

I fine-tuned the pre-trained LJSpeech acoustic model on my custom dataset (~1 hour):

python train.py --resume checkpoints/hubert-soft-0321fd7e.pt data/ finetuned_checkpoints/

Fine-tuning produced a best model (model-best.pt) at 20,000 steps. I modified the code (https://github.com/bshall/acoustic-model/blob/main/acoustic/model.py#L119) to load from my checkpoint path instead of calling torch.hub.load_state_dict_from_url, but I got the error below. The error log and my modified function follow for reference.

Can you please help me resolve this issue?

Thanks

Traceback (most recent call last):
  File "/root/Experiments/soft-vc/inference.py", line 12, in <module>
    acoustic = hubert_soft().cuda()
  File "/root/Experiments/soft-vc/acoustic/acoustic/model.py", line 165, in hubert_soft
    return _acoustic(
  File "/root/Experiments/soft-vc/acoustic/acoustic/model.py", line 133, in _acoustic
    acoustic.load_state_dict(checkpoint["acoustic-model"])
  File "/root/anaconda3/envs/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1406, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for AcousticModel:
        Missing key(s) in state_dict: "encoder.prenet.net.0.weight", "encoder.prenet.net.0.bias", "encoder.prenet.net.3.weight", "encoder.prenet.net.3.bias", "encoder.convs.0.weight", "encoder.convs.0.bias", "encoder.convs.3.weight", "encoder.convs.3.bias", "encoder.convs.4.weight", "encoder.convs.4.bias", "encoder.convs.7.weight", "encoder.convs.7.bias", "decoder.prenet.net.0.weight", "decoder.prenet.net.0.bias", "decoder.prenet.net.3.weight", "decoder.prenet.net.3.bias", "decoder.lstm1.weight_ih_l0", "decoder.lstm1.weight_hh_l0", "decoder.lstm1.bias_ih_l0", "decoder.lstm1.bias_hh_l0", "decoder.lstm2.weight_ih_l0", "decoder.lstm2.weight_hh_l0", "decoder.lstm2.bias_ih_l0", "decoder.lstm2.bias_hh_l0", "decoder.lstm3.weight_ih_l0", "decoder.lstm3.weight_hh_l0", "decoder.lstm3.bias_ih_l0", "decoder.lstm3.bias_hh_l0", "decoder.proj.weight". 
        Unexpected key(s) in state_dict: "module.encoder.prenet.net.0.weight", "module.encoder.prenet.net.0.bias", "module.encoder.prenet.net.3.weight", "module.encoder.prenet.net.3.bias", "module.encoder.convs.0.weight", "module.encoder.convs.0.bias", "module.encoder.convs.3.weight", "module.encoder.convs.3.bias", "module.encoder.convs.4.weight", "module.encoder.convs.4.bias", "module.encoder.convs.7.weight", "module.encoder.convs.7.bias", "module.decoder.prenet.net.0.weight", "module.decoder.prenet.net.0.bias", "module.decoder.prenet.net.3.weight", "module.decoder.prenet.net.3.bias", "module.decoder.lstm1.weight_ih_l0", "module.decoder.lstm1.weight_hh_l0", "module.decoder.lstm1.bias_ih_l0", "module.decoder.lstm1.bias_hh_l0", "module.decoder.lstm2.weight_ih_l0", "module.decoder.lstm2.weight_hh_l0", "module.decoder.lstm2.bias_ih_l0", "module.decoder.lstm2.bias_hh_l0", "module.decoder.lstm3.weight_ih_l0", "module.decoder.lstm3.weight_hh_l0", "module.decoder.lstm3.bias_ih_l0", "module.decoder.lstm3.bias_hh_l0", "module.decoder.proj.weight".

def _acoustic(
    name: str,
    discrete: bool,
    upsample: bool,
    pretrained: bool = True,
    progress: bool = True,
) -> AcousticModel:
    acoustic = AcousticModel(discrete, upsample)
    if pretrained:
        # checkpoint = torch.hub.load_state_dict_from_url(URLS[name], progress=progress)
        # consume_prefix_in_state_dict_if_present(checkpoint["acoustic-model"], "module.")
        
        load_path = "/root/Experiments/soft-vc/acoustic/finetuned_checkpoints/model-best.pt"
        checkpoint = torch.load(load_path)
        acoustic.load_state_dict(checkpoint["acoustic-model"])
        acoustic.eval()
    return acoustic
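
The traceback above shows every saved key carrying a "module." prefix, which is the prefix PyTorch's DataParallel and DistributedDataParallel wrappers add to parameter names. The original loader handled this with the consume_prefix_in_state_dict_if_present call that is commented out in the modified function. A minimal sketch of the same idea in plain Python follows, assuming the checkpoint keys differ from the model's keys only by that prefix; strip_prefix is a hypothetical helper, not part of the repository.

```python
from collections import OrderedDict


def strip_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with `prefix` removed from keys that start with it.

    Hypothetical helper for illustration; PyTorch's own
    consume_prefix_in_state_dict_if_present does the same job in place.
    """
    return OrderedDict(
        (key[len(prefix):] if key.startswith(prefix) else key, value)
        for key, value in state_dict.items()
    )


# Stand-in for checkpoint["acoustic-model"]; the integers replace real tensors.
saved = OrderedDict(
    [
        ("module.encoder.prenet.net.0.weight", 1),
        ("module.decoder.proj.weight", 2),
    ]
)
cleaned = strip_prefix(saved)
print(list(cleaned))
```

In the modified _acoustic function this would amount to passing strip_prefix(checkpoint["acoustic-model"]) to acoustic.load_state_dict, or simply restoring the commented-out consume_prefix_in_state_dict_if_present line before loading.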

Bug: Training crash with missing argument `discrete`

Summary

Training AcousticModel with train.py crashes with a missing-attribute error.
It is caused by a missing argparse argument, discrete.
It can be fixed by adding the argument, so I made a pull request (#5).

Phenomena

When train.py is run with a proper dataset-dir and checkpoint-dir, it crashes.
The error message reports that the attribute discrete is missing.

Error Message

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/content/softVC_AM/train.py", line 96, in train
    discrete=args.discrete,
AttributeError: 'Namespace' object has no attribute 'discrete'

Cause

In train.py, args.discrete is used, but there is no corresponding parser.add_argument call.

acoustic-model/train.py

Lines 87 to 91 in df6eba9

    train_dataset = MelDataset(
        root=args.dataset_dir,
        train=True,
        discrete=args.discrete,
    )

Fix idea

As described in the paper, softVC-AM seems to support both soft and discrete units.
So we can add a discrete flag (by default, it works in soft mode).
When I add it, the bug disappears.
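
A minimal sketch of such a flag, assuming train.py builds its arguments with argparse. The positional argument names and help strings here are assumptions for illustration, and the actual change in the pull request may differ; only --resume and args.discrete appear in the reports on this page.

```python
import argparse


def build_parser():
    # Hypothetical reconstruction of the train.py parser; names are assumptions.
    parser = argparse.ArgumentParser(description="Train the acoustic model (sketch).")
    parser.add_argument("dataset_dir", help="path to the dataset directory")
    parser.add_argument("checkpoint_dir", help="path to the checkpoint directory")
    parser.add_argument("--resume", help="optional checkpoint to resume from")
    # store_true makes the flag default to False, i.e. soft units unless requested.
    parser.add_argument(
        "--discrete",
        action="store_true",
        help="use discrete units instead of soft units",
    )
    return parser


args = build_parser().parse_args(["data/", "checkpoints/"])
print(args.discrete)
```

With the flag defined, args.discrete always exists on the namespace, so the MelDataset call quoted above no longer raises AttributeError.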

Notes

I made a pull request (#5) that fixes this bug.

Thanks for your great OSS! I would be happy if this helps you and the community.

Information about a complete training pipeline?

Greetings.

I am aware that there are separate repositories for building a voice conversion model. However, the repositories contain little information about the complete training pipeline. Could the README.md file be extended with instructions for training a voice conversion model from scratch, similar to the information provided in your parallel hubert repository? Information such as:

Repository requirements in a requirements.txt file
Dataset requirements, in terms of audio characteristics, number of speakers (e.g. input and output voices) and directory structure
Steps required for training a model from scratch. e.g. execute preprocess.py -i foo -o bar, then train.py -i bar -o model_output...

Thanks in advance for your time.

Vietnamese language VC

Hi @bshall, can the pre-trained hubert-soft or discrete model be used to encode Mandarin Chinese data? I want to train a model for Vietnamese VC, but train only the acoustic model and the HiFiGAN vocoder on a Vietnamese dataset.

map_location argument is not supported

Typically it's possible to load torch models onto the CPU or GPU by using the map_location argument.

This doesn't work for the acoustic model:

TypeError: hubert_soft() got an unexpected keyword argument 'map_location'

On a CPU-only machine loading this model gives the error:

RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
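
As the error message itself suggests, torch.load accepts a map_location argument. A minimal sketch of loading a checkpoint file on whichever device is available follows, assuming the checkpoint has already been downloaded to a local path; load_checkpoint is a hypothetical helper, not a function provided by the repository.

```python
import torch


def load_checkpoint(path):
    """Load a checkpoint onto the GPU if one is visible, otherwise onto the CPU.

    Hypothetical helper for illustration only.
    """
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    # map_location remaps tensors that were saved on a CUDA device to `device`.
    checkpoint = torch.load(path, map_location=device)
    return checkpoint, device
```

The returned dictionary could then be passed to load_state_dict on a model constructed without pretrained weights, which sidesteps the missing keyword on hubert_soft().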

Bug: `generate.py` failed with No such file error

Summary

Unit-to-mel inference with generate.py crashes with a missing-file error.
It is caused by a variable-name mistake in generate.py.
It can be fixed with a one-line change, so I made a pull request (#2).

Phenomena

When generate.py is run with a proper in-dir and out-dir, it crashes.
The error message reports No such file or directory: 'path'.

Error messages

Generating from sample_softVC -> o_test
  0% 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "./generate.py", line 57, in <module>
    generate(args)
  File "./generate.py", line 22, in generate
    units = np.load("path")
  File "/usr/local/lib/python3.7/dist-packages/numpy/lib/npyio.py", line 417, in load
    fid = stack.enter_context(open(os_fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: 'path'

Cause

In generate.py, the variable path is mistakenly written as the string literal "path".

acoustic-model/generate.py

Line 16 in c30a7c3

units = np.load("path")

When I fix it, the bug disappears.

Notes

I made a pull request (#2) that fixes this bug.
I am very impressed with the softVC project, so if this PR helps this super cool project, I am glad.

Will you consider encoding F0 into the acoustic-model as one input?

switch to bigvgan

Hello,
I've been trying to drop in BigVGAN in place of HiFi-GAN, but I keep running into an issue with the number of mel channels: the acoustic model is trained on 128 channels, whereas BigVGAN uses 100. Is there a simple way to fix this, or does the acoustic model need to be trained with 100 mel channels?

MultiSpeaker setup

Have you tried this in a multi-speaker setup?

	train_dataset = MelDataset(
	root=args.dataset_dir,
	train=True,
	discrete=args.discrete,
	)
