
acoustic-model's People

Contributors

bshall, qgentry, seastar105, tarepan


acoustic-model's Issues

Loading a fine-tuned model fails with RuntimeError: Error(s) in loading state_dict for AcousticModel

@bshall Thank you for this great work.

I fine-tuned the pre-trained LJSpeech acoustic model on my custom dataset (~1 hour).

python train.py --resume checkpoints/hubert-soft-0321fd7e.pt data/ finetuned_checkpoints/

I fine-tuned for 20000 steps and took the best checkpoint (model-best.pt). I then modified the code (https://github.com/bshall/acoustic-model/blob/main/acoustic/model.py#L119) to load from my checkpoint path instead of calling torch.hub.load_state_dict_from_url, but I got the error below. The error log and my modified function are included for reference.

Can you please help me resolve this issue?

Thanks

Traceback (most recent call last):
  File "/root/Experiments/soft-vc/inference.py", line 12, in <module>
    acoustic = hubert_soft().cuda()
  File "/root/Experiments/soft-vc/acoustic/acoustic/model.py", line 165, in hubert_soft
    return _acoustic(
  File "/root/Experiments/soft-vc/acoustic/acoustic/model.py", line 133, in _acoustic
    acoustic.load_state_dict(checkpoint["acoustic-model"])
  File "/root/anaconda3/envs/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1406, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for AcousticModel:
        Missing key(s) in state_dict: "encoder.prenet.net.0.weight", "encoder.prenet.net.0.bias", "encoder.prenet.net.3.weight", "encoder.prenet.net.3.bias", "encoder.convs.0.weight", "encoder.convs.0.bias", "encoder.convs.3.weight", "encoder.convs.3.bias", "encoder.convs.4.weight", "encoder.convs.4.bias", "encoder.convs.7.weight", "encoder.convs.7.bias", "decoder.prenet.net.0.weight", "decoder.prenet.net.0.bias", "decoder.prenet.net.3.weight", "decoder.prenet.net.3.bias", "decoder.lstm1.weight_ih_l0", "decoder.lstm1.weight_hh_l0", "decoder.lstm1.bias_ih_l0", "decoder.lstm1.bias_hh_l0", "decoder.lstm2.weight_ih_l0", "decoder.lstm2.weight_hh_l0", "decoder.lstm2.bias_ih_l0", "decoder.lstm2.bias_hh_l0", "decoder.lstm3.weight_ih_l0", "decoder.lstm3.weight_hh_l0", "decoder.lstm3.bias_ih_l0", "decoder.lstm3.bias_hh_l0", "decoder.proj.weight". 
        Unexpected key(s) in state_dict: "module.encoder.prenet.net.0.weight", "module.encoder.prenet.net.0.bias", "module.encoder.prenet.net.3.weight", "module.encoder.prenet.net.3.bias", "module.encoder.convs.0.weight", "module.encoder.convs.0.bias", "module.encoder.convs.3.weight", "module.encoder.convs.3.bias", "module.encoder.convs.4.weight", "module.encoder.convs.4.bias", "module.encoder.convs.7.weight", "module.encoder.convs.7.bias", "module.decoder.prenet.net.0.weight", "module.decoder.prenet.net.0.bias", "module.decoder.prenet.net.3.weight", "module.decoder.prenet.net.3.bias", "module.decoder.lstm1.weight_ih_l0", "module.decoder.lstm1.weight_hh_l0", "module.decoder.lstm1.bias_ih_l0", "module.decoder.lstm1.bias_hh_l0", "module.decoder.lstm2.weight_ih_l0", "module.decoder.lstm2.weight_hh_l0", "module.decoder.lstm2.bias_ih_l0", "module.decoder.lstm2.bias_hh_l0", "module.decoder.lstm3.weight_ih_l0", "module.decoder.lstm3.weight_hh_l0", "module.decoder.lstm3.bias_ih_l0", "module.decoder.lstm3.bias_hh_l0", "module.decoder.proj.weight". 
def _acoustic(
    name: str,
    discrete: bool,
    upsample: bool,
    pretrained: bool = True,
    progress: bool = True,
) -> AcousticModel:
    acoustic = AcousticModel(discrete, upsample)
    if pretrained:
        # checkpoint = torch.hub.load_state_dict_from_url(URLS[name], progress=progress)
        # consume_prefix_in_state_dict_if_present(checkpoint["acoustic-model"], "module.")
        
        load_path = "/root/Experiments/soft-vc/acoustic/finetuned_checkpoints/model-best.pt"
        checkpoint = torch.load(load_path)
        acoustic.load_state_dict(checkpoint["acoustic-model"])
        acoustic.eval()
    return acoustic 
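
Every unexpected key in the log carries a `module.` prefix, which is what a model wrapped in DataParallel or DistributedDataParallel typically produces when its state dict is saved. The original loader removed that prefix with the `consume_prefix_in_state_dict_if_present` call that is commented out above. A minimal plain-Python sketch of the same idea follows; `strip_prefix` is a hypothetical helper, not part of the repository, and the integer values stand in for tensors.

```python
from collections import OrderedDict


def strip_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with `prefix` removed from keys that start with it."""
    return OrderedDict(
        (key[len(prefix):] if key.startswith(prefix) else key, value)
        for key, value in state_dict.items()
    )


# Stand-in values; a real checkpoint maps these keys to tensors.
saved = {
    "module.encoder.prenet.net.0.weight": 1,
    "module.decoder.proj.weight": 2,
}
cleaned = strip_prefix(saved)
print(list(cleaned))
```

In the modified `_acoustic`, this would sit between `torch.load` and `load_state_dict`, for example `acoustic.load_state_dict(strip_prefix(checkpoint["acoustic-model"]))`. Restoring the commented-out `consume_prefix_in_state_dict_if_present` line should have the same effect.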

Bug: Training crash with missing argument `discrete`

Summary

Training AcousticModel with train.py crashes with a missing-attribute error.
The cause is that the argparse argument discrete is never defined.
It can be fixed by adding the argument, so I made a pull request (#5).

Phenomena

When train.py is run with a proper dataset-dir and checkpoint-dir, it crashes.
The error message says that the attribute discrete is missing.

Error Message

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/content/softVC_AM/train.py", line 96, in train
    discrete=args.discrete,
AttributeError: 'Namespace' object has no attribute 'discrete'

Cause

In train.py, args.discrete is used, but there is no corresponding parser.add_argument call.

acoustic-model/train.py

Lines 87 to 91 in df6eba9

train_dataset = MelDataset(
    root=args.dataset_dir,
    train=True,
    discrete=args.discrete,
)

Fix idea

As described in the paper, the softVC acoustic model seems to support both soft and discrete units.
So we can add a discrete flag (off by default, so training runs in soft mode).
With the flag added, the bug disappears.
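
A minimal sketch of what such a flag could look like with argparse. The positional names `dataset_dir` and `checkpoint_dir` and the `--discrete` spelling are assumptions based on the names used in this report; the actual change in the pull request may differ.

```python
import argparse


def build_parser():
    # Hypothetical parser; only `--discrete` matters for the bug described here.
    parser = argparse.ArgumentParser(description="Train the acoustic model (sketch).")
    parser.add_argument("dataset_dir", help="path to the dataset directory")
    parser.add_argument("checkpoint_dir", help="path to the checkpoint directory")
    parser.add_argument(
        "--discrete",
        action="store_true",
        help="use discrete units instead of soft units",
    )
    return parser


# Without the flag, args.discrete exists and defaults to False (soft mode).
args = build_parser().parse_args(["data/", "checkpoints/"])
print(args.discrete)  # False
```

With `action="store_true"`, existing commands keep working unchanged, and `args.discrete` is always defined, so the `AttributeError` cannot occur.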

Notes

I made a pull request (#5) that fixes this bug.

Thanks for your great OSS! I would be happy if this helps you and the community.

Information about a complete training pipeline?

Greetings.

I am aware that there are separate repositories for the different parts of a voice conversion model. However, they contain little information about the training pipeline as a whole. Could the README.md file be extended with instructions for training a voice conversion model from scratch, similar to what is provided in your parallel repository hubert, so that a full training pipeline can be run? Information such as:

  • Repository requirements in a requirements.txt file
  • Dataset requirements, in terms of audio characteristics, number of speakers (e.g. input and output voices) and directory structure
  • Steps required for training a model from scratch. e.g. execute preprocess.py -i foo -o bar, then train.py -i bar -o model_output...

Thanks in advance for your time.

Vietnamese language VC

Hi @bshall, can the pre-trained hubert-soft or discrete model be used to encode Mandarin Chinese data? I want to train a model for Vietnamese VC, but train only the acoustic model and the HiFiGAN vocoder on a Vietnamese dataset.

map_location argument is not supported

Typically it's possible to load torch models to cpu / gpu by using the map_location argument.

This doesn't work for the acoustic model:

TypeError: hubert_soft() got an unexpected keyword argument 'map_location'

On a CPU-only machine loading this model gives the error:

RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
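
As a workaround sketch, a checkpoint file can be loaded directly with `torch.load` and `map_location="cpu"`, which remaps CUDA storages to CPU memory. `load_checkpoint_on_cpu` is a hypothetical helper, and the tiny checkpoint written below is a stand-in for demonstration only, not the real model file.

```python
import os
import tempfile

import torch


def load_checkpoint_on_cpu(path):
    # map_location="cpu" remaps any CUDA storages in the file to CPU memory.
    return torch.load(path, map_location="cpu")


# Demo with a tiny stand-in checkpoint written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.pt")
    torch.save({"acoustic-model": {"proj.weight": torch.zeros(2, 2)}}, demo_path)
    checkpoint = load_checkpoint_on_cpu(demo_path)
```

`torch.hub.load_state_dict_from_url` also accepts a `map_location` argument, so one possible fix on the library side would be for `hubert_soft` to forward it to that call. Whether the repository adopted that approach is not stated in this issue.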

Bug: `generate.py` failed with No such file error

Summary

Unit-to-mel inference with generate.py crashes with a missing-file error.
It is caused by a variable-name mistake in generate.py.
It can be fixed with a one-line change, so I made a pull request (#2).

Phenomena

When generate.py is run with a proper in-dir and out-dir, it crashes.
The error message says: No such file or directory: 'path'.

Error messages

Generating from sample_softVC -> o_test
  0% 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "./generate.py", line 57, in <module>
    generate(args)
  File "./generate.py", line 22, in generate
    units = np.load("path")
  File "/usr/local/lib/python3.7/dist-packages/numpy/lib/npyio.py", line 417, in load
    fid = stack.enter_context(open(os_fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: 'path'

Cause

In generate.py, the variable path is mistakenly written as the string literal "path".

units = np.load("path")

When I fix it, the bug disappears.
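
The fix amounts to passing the loop variable rather than a string literal. A minimal sketch, assuming the script iterates over `.npy` unit files under the input directory; `load_units` and the file layout are hypothetical and not the repository's actual code.

```python
from pathlib import Path

import numpy as np


def load_units(in_dir):
    """Yield (path, units) for every .npy file found under in_dir."""
    for path in sorted(Path(in_dir).rglob("*.npy")):
        # Pass the variable `path`, not the string literal "path".
        units = np.load(path)
        yield path, units
```

With the literal `"path"`, NumPy looks for a file named `path` in the current working directory, which is why the error appears regardless of the input directory given.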

Notes

I made a pull request (#2) that fixes this bug.
I am so impressed with the softVC project, so I would be glad if this PR helps this super cool project.

Switch to BigVGAN

Hello,
I've been trying to drop in BigVGAN in place of HiFiGAN, but I keep running into an issue with the number of mel channels: the acoustic model is trained on 128 channels, whereas BigVGAN uses 100. Is there a simple way to fix this, or does the acoustic model need to be retrained with 100 mel channels?
