facodec's Issues

Are the requirements incomplete?

For example, TensorBoard is not listed, and at runtime I get an error saying that audiotools is required, which I also don't see in the requirements. Should the requirements be completed?
The error log also says that some modules were compiled against an older version of NumPy (1.x series), while the current environment uses NumPy 2.0.0, which may cause crashes. According to the log, supporting both NumPy 1.x and 2.x requires recompiling these modules against NumPy 2.0. Does this require an update?
I also get AttributeError: _ARRAY_API not found. May I ask what that API is?
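
A note on the NumPy error above: as far as I know, `_ARRAY_API` is the C-API handle that compiled extension modules look up inside NumPy at import time, and the "not found" error typically appears when an extension built against NumPy 1.x is loaded under NumPy 2.x. A common workaround is to install `numpy<2` until the dependencies ship NumPy 2 builds. The sketch below only inspects the installed version; the helper names `needs_numpy_pin` and `installed_numpy_version` are made up for illustration and are not part of this repository.

```python
from importlib import metadata


def needs_numpy_pin(version: str) -> bool:
    """Return True if the given NumPy version string is 2.x or newer."""
    major = int(version.split(".")[0])
    return major >= 2


def installed_numpy_version():
    """Return the installed NumPy version string, or None if NumPy is absent."""
    try:
        return metadata.version("numpy")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_numpy_version()
    if found is None:
        print("NumPy is not installed")
    elif needs_numpy_pin(found):
        print(f"NumPy {found} found; extensions built for 1.x may fail to import")
    else:
        print(f"NumPy {found} found; extensions built for 1.x should load")
```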

Dataset of the pre-trained checkpoint

Hi, I have a question about the dataset.

As far as I know, the official FACodec checkpoint was trained using Librilight.
Was this version of the checkpoint also trained using Librilight?
The README only mentions 50k hours of training data and the possibility of multiple languages.
I'm confused because Librilight is known to contain 60k hours, while Libriheavy is known to contain 50k hours.
I wonder about the details of the training data.

Thanks.

Does the prosody codes[0] work?

I tried to test the prosody codes specifically, but the prosody seemed to be tied to codes[1] together with the content?

What do the loss curves look like during your successful training?

Hello,

I've attempted to train FAcodec using my own dataset. However, whether I start from scratch or fine-tune your provided checkpoint, the reconstructed audio clips are just noise. I fine-tuned the model using around 128 hours of Common Voice 18 ZH-TW data. After approximately 20k steps, the loss seemed to converge. Some losses, like feature loss, decreased successfully, while others, such as mel loss and waveform loss, were oscillating.

Do all losses decrease during your training process?
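
When comparing curves like these, raw per-step losses in adversarial training tend to be noisy, so it may help to look at an exponentially smoothed version before concluding that the mel or waveform loss is not decreasing. A minimal sketch, assuming the losses are available as a plain list of floats; the function name `ema_smooth` and the default smoothing factor are arbitrary choices, not taken from the repository.

```python
def ema_smooth(values, alpha=0.98):
    """Exponential moving average; a higher alpha gives a smoother curve."""
    smoothed = []
    running = None
    for value in values:
        # The first point seeds the average; later points are blended in.
        running = value if running is None else alpha * running + (1 - alpha) * value
        smoothed.append(running)
    return smoothed


if __name__ == "__main__":
    noisy = [1.0, 0.2, 1.1, 0.3, 0.9, 0.1]
    print(ema_smooth(noisy, alpha=0.5))
```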

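Question about the model

I'd like to ask whether you have gotten the code running end to end and verified the results.
I ask because many of the weight settings are inconsistent with the paper.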

There seems to be a bug

Was the clamp on line 68 of meldataset.py meant to be clip?
Also, line 84,
max_wave_length = max([b[0].size(0) for b in batch])
raises
TypeError: 'int' object is not callable
Should it be changed to shape[0], as on the previous line?
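
For context on the TypeError above: NumPy arrays expose `size` as an integer attribute (the total element count), whereas PyTorch tensors expose `size()` as a method, so `b[0].size(0)` only works when `b[0]` is a tensor. Both types provide `shape`, so `shape[0]` works in either case. The sketch below uses stand-in objects instead of real arrays so that it runs without dependencies; `max_first_dim` is a hypothetical helper, not code from the repository.

```python
from types import SimpleNamespace


def max_first_dim(batch):
    """Longest first dimension among the first element of each batch item."""
    return max(item[0].shape[0] for item in batch)


# Stand-ins mimicking NumPy arrays: `size` is an int, `shape` is a tuple.
wave_a = SimpleNamespace(shape=(16000,), size=16000)
wave_b = SimpleNamespace(shape=(24000,), size=24000)
batch = [(wave_a, "label_a"), (wave_b, "label_b")]

print(max_first_dim(batch))  # 24000

try:
    max(b[0].size(0) for b in batch)
except TypeError as err:
    # Calling an int attribute reproduces the error from the issue.
    print("size(0) fails on array-like objects:", err)
```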

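Some questions about the training and inference workflow

1. Is my workflow correct?
(1) Modify meldataset.py to use my own dataloader, with the VCTK dataset and pseudo-labels generated by wav2vec, and run train.py to produce several ckpt files.
(2) Use the last ckpt as the pre-trained model and run train_redecoder.py. (One question here: is it fine for train_redecoder.py to use the same dataset as train.py?)
(3) Use the ckpt produced by train.py together with the ckpt produced by train_redecoder.py in reconstruct_redecoder.py to perform timbre conversion.

2. Do the ckpt files produced by train.py and train_redecoder.py have the same structure and parameters as the pre-trained bin models you provide?

Thanks for your answers!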

Error when running train.py: AttributeError: 'EncDecSpeakerLabelModel' object has no attribute 'infer_segment'

Hello, I ran into the following error when running train.py:

Traceback (most recent call last):
  File "/home/tts/ref/ns3/train.py", line 496, in <module>
    main(args)
  File "/home/tts/ref/ns3/train.py", line 342, in main
    spk_logits = torch.cat([speaker_model.infer_segment(w16.cpu()[..., :wl])[1] for w16, wl in zip(waves_16k, wave_lengths)], dim=0)
  File "/home/tts/ref/ns3/train.py", line 342, in <listcomp>
    spk_logits = torch.cat([speaker_model.infer_segment(w16.cpu()[..., :wl])[1] for w16, wl in zip(waves_16k, wave_lengths)], dim=0)
  File "/opt/conda/envs/amphion/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1688, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'EncDecSpeakerLabelModel' object has no attribute 'infer_segment'

My nemo-toolkit version is 1.21.0.
I searched the NeMo issues for this error but could not find a related report.
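
One possible cause, stated here as an assumption rather than a confirmed diagnosis, is that the installed nemo-toolkit release does not provide the method that train.py calls. A fail-fast check at startup at least turns the mid-training AttributeError into an early, readable message. The helper `require_method` and the stand-in class below are hypothetical and only illustrate the check.

```python
def require_method(obj, name):
    """Return obj.<name> if it is callable, otherwise raise a readable error."""
    method = getattr(obj, name, None)
    if not callable(method):
        raise RuntimeError(
            f"{type(obj).__name__} has no callable '{name}'; "
            "check the installed library version"
        )
    return method


class OldSpeakerModel:
    """Stand-in for a model class that lacks infer_segment."""

    def get_embedding(self, path):
        return path


if __name__ == "__main__":
    try:
        require_method(OldSpeakerModel(), "infer_segment")
    except RuntimeError as err:
        print(err)
```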

pytorch_model.bin key error and WN activated function

Hi, thank you for sharing the training code of FACodec! I've come across a couple of points:
1. Fine-tuning the redecoder:
I'm interested in fine-tuning the redecoder using the provided encoder and redecoder bin files. However, I noticed that there's no 'net' key in the bin file, which seems to cause an issue when loading the checkpoint. Could you provide some guidance on how to properly load these files for fine-tuning?
2. Additional activation function:
I noticed that there's an additional WN gated activation function applied after the timbre layer norm, which differs from the original code and description in the paper. I'm curious about the reasoning behind this architectural change. Could you share some insights into why this modification was made and how it impacts the model's performance?
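
On the first point, one workaround might be to wrap the released weights in the layout the training loader expects. This is a sketch under assumptions: that the loader reads the modules from a 'net' entry (the training checkpoint keys reported in another issue in this list are 'net', 'optimizer', 'scheduler', 'iters', 'epoch'), and that the bin file stores per-module state dicts at the top level. The function name `wrap_for_finetuning` is hypothetical.

```python
def wrap_for_finetuning(module_states, iters=0, epoch=0):
    """Wrap weights-only module states in a training-style checkpoint dict.

    Optimizer and scheduler state are deliberately left out, so a loader
    using this dict would have to initialise them from scratch.
    """
    return {"net": dict(module_states), "iters": iters, "epoch": epoch}


# Toy stand-in for the contents of a released bin file.
released = {"encoder": {}, "decoder": {}}
ckpt = wrap_for_finetuning(released)
print(sorted(ckpt))         # ['epoch', 'iters', 'net']
print(sorted(ckpt["net"]))  # ['decoder', 'encoder']
```

If the loader also indexes 'optimizer' or 'scheduler' unconditionally, those lookups would need to be made optional as well.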

Question about a code detail

Hello, in the forward_v2 function of FApredictors in FAcodec/modules/quantize.py, the line
spk_pred = self.timbre_predictor(timbre)[0]
is commented out, so the timbre prediction is None. This later causes

     spk_pred_logits = preds['timbre']
     spk_loss = F.cross_entropy(spk_pred_logits, spk_labels)

to fail, because spk_pred_logits is None. Is this a bug?
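
If the timbre predictor is disabled on purpose (which is not confirmed here), the training loop could skip the speaker loss instead of passing None to the loss function. A minimal sketch of such a guard; `optional_loss` is a hypothetical helper, and the toy loss stands in for `F.cross_entropy` so the example runs without PyTorch.

```python
def optional_loss(loss_fn, prediction, target):
    """Return loss_fn(prediction, target), or None when there is no prediction."""
    if prediction is None:
        return None
    return loss_fn(prediction, target)


def toy_loss(prediction, target):
    """Stand-in loss: absolute difference between two numbers."""
    return abs(prediction - target)


preds = {"timbre": None}
spk_loss = optional_loss(toy_loss, preds["timbre"], 3.0)
print(spk_loss)  # None, so the caller can leave it out of the total loss
```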

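Hello, I have a question about checkpoints

I noticed that the pre-trained checkpoints you provide all seem to be weights-only files in bin format, whereas the checkpoints produced by train in this repository are in pth format. The file sizes alone differ by 2.5 GB.
Since I can reach neither HF nor the HF mirror, I decided to try my own trained checkpoint first: I renamed it to pytorch_model.bin and placed it, together with the config, in checkpoints.
I then found that the trained model cannot be used for audio reconstruction, because during reconstruct the model's keys are:
dict_keys(['encoder', 'quantizer', 'decoder', 'discriminator', 'fa_predictors'])
while the checkpoint's keys are:
Keys in ckpt_params: dict_keys(['net', 'optimizer', 'scheduler', 'iters', 'epoch'])
Is this by design, or am I using it incorrectly?
Finally, how do you disentangle the timbre, content, and pitch of an audio clip without any labels or annotations? Which function in which file does this?
Thanks for your answers.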
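
Given the two key sets quoted above, one plausible reading is that the training checkpoint nests the module weights under 'net' next to optimizer and scheduler state, while the reconstruct script expects the module entries at the top level. The sketch below strips a training-style dict down to the 'net' entry; that 'net' holds entries such as 'encoder' and 'decoder' is an assumption, and `to_inference_weights` is a hypothetical name.

```python
def to_inference_weights(train_ckpt):
    """Keep only the per-module weights stored under 'net'."""
    if "net" not in train_ckpt:
        raise KeyError("expected a training checkpoint with a 'net' entry")
    return dict(train_ckpt["net"])


# Toy stand-in for a loaded .pth file; real values would be tensors.
train_ckpt = {
    "net": {"encoder": {}, "quantizer": {}, "decoder": {}},
    "optimizer": {},
    "scheduler": {},
    "iters": 20000,
    "epoch": 3,
}
weights = to_inference_weights(train_ckpt)
print(sorted(weights))  # ['decoder', 'encoder', 'quantizer']
```

With PyTorch, the dict would come from `torch.load(path, map_location="cpu")` and the result could be written with `torch.save(weights, "pytorch_model.bin")`. Dropping the optimizer state may also account for part of the size difference.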

Number of GPUs

Hello, the current config uses a batch size of 4, but the paper uses 8 GPUs with a batch size of 32. How many GPUs were used to train the released pre-trained model? Thanks.
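
For reference, in data-parallel training the global batch size is usually the per-GPU batch size times the number of GPUs times any gradient accumulation steps. If the config value of 4 is per GPU, then 8 GPUs would give 32; whether that is how the config and the paper's figure relate is an assumption, not something stated in the repository.

```python
def effective_batch_size(per_gpu_batch, num_gpus, grad_accum_steps=1):
    """Global batch size for data-parallel training."""
    return per_gpu_batch * num_gpus * grad_accum_steps


print(effective_batch_size(4, 8))     # 32
print(effective_batch_size(4, 1, 8))  # 32, one GPU with 8 accumulation steps
```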
