plachtaa / FAcodec
Training code for FAcodec presented in NaturalSpeech3
For example, tensorboard is not listed there, and I get an error at runtime: audiotools is required, but I don't see it in the requirements either. Should the requirements file be completed?
The error log says that some modules were compiled against an older NumPy (the 1.x series), while the current environment uses NumPy 2.0.0, which may crash the program. To support both NumPy 1.x and 2.x, these modules would have to be recompiled against NumPy 2.0. Does this require an update?
AttributeError: _ARRAY_API not found — may I ask what that API is?
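For context, that error is the symptom NumPy 2 raises when importing extension modules built against NumPy 1.x. A minimal sketch of detecting the mismatch up front (pinning `numpy<2` in requirements would be the usual workaround; that pin is my assumption, not something the repo states):

```python
import numpy as np

# Extensions compiled against NumPy 1.x look up a private C-API handle
# (_ARRAY_API) that changed in NumPy 2.x, hence the AttributeError.
major = int(np.__version__.split(".")[0])
compatible_with_1x_builds = major < 2
print(np.__version__, "compatible with 1.x-built modules:", compatible_with_1x_builds)
```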
Hi, I have a question about the dataset.
As far as I know, the official FACodec checkpoint was trained using Librilight.
Was this version of the checkpoint also trained using Librilight?
The README only mentions 50k hours of training data and possible multi-language coverage.
I'm confused because Librilight is known to contain 60k hours, while Libriheavy is known to contain 50k hours.
I wonder about the details of the training data.
Thanks.
I tried to test the code specifically for prosody, but it seems that the prosody is entangled with the content in codes[1]?
Hello,
I've attempted to train FAcodec using my own dataset. However, whether I start from scratch or fine-tune your provided checkpoint, the reconstructed audio clips are just noise. I fine-tuned the model using around 128 hours of Common Voice 18 ZH-TW data. After approximately 20k steps, the loss seemed to converge. Some losses, like feature loss, decreased successfully, while others, such as mel loss and waveform loss, were oscillating.
Do all losses decrease during your training process?
Hello, can the decoder support streaming output?
The two GRL layers, gr_content_f0 and gr_prosody_phone, don't seem to be used, which is inconsistent with the original paper. Have you investigated the impact of these two parts?
Can you indicate in which file you implemented this feature?
Also, as you wrote in the README: \t<speaker_id>\t\t<script>\t<phonemized_transcript> — if these fields cannot be replaced with placeholders, will their presence or absence affect the performance of the final trained model?
May I ask whether you have run the code end to end and verified the results?
I ask because many of the weight settings seem inconsistent with the paper.
In meldataset.py, is the `clamp` on line 68 meant to be `clip`? Also, line 84:
max_wave_length = max([b[0].size(0) for b in batch])
raises
TypeError: 'int' object is not callable
Should it be changed to `shape[0]`, like the line above it?
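For reference, a minimal sketch of the suggested fix (assuming `b[0]` is a NumPy array, whose `.size` is an int attribute rather than a method, which is exactly what makes `b[0].size(0)` raise `TypeError: 'int' object is not callable`):

```python
import numpy as np

# Two waveforms of different lengths, as (wave, label) pairs.
batch = [(np.zeros(16000), 0), (np.zeros(24000), 1)]

# Broken: max([b[0].size(0) for b in batch])
#   -> TypeError: 'int' object is not callable (ndarray.size is an int)
# Fixed: index the shape tuple, as the neighbouring line already does.
max_wave_length = max(b[0].shape[0] for b in batch)
print(max_wave_length)  # 24000
```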
I. Is my workflow correct?
1. Modify meldataset.py into my own dataloader, use the VCTK dataset with pseudo-labels generated by wav2vec, and train several ckpt files with train.py.
2. Use the last trained ckpt as the pretrained model and run train_redecoder.py (one question: can train_redecoder.py use the same dataset as train.py?).
3. Use the ckpt trained by train.py together with the ckpt trained by train_redecoder.py in reconstruct_redecoder.py to perform voice conversion.
II. Do the ckpt files produced by train.py and train_redecoder.py have the same structure and parameters as the pretrained bin model you provide?
Thanks for your help!
Hello, I hit the following error while running train.py:
Traceback (most recent call last):
File "/home/tts/ref/ns3/train.py", line 496, in <module>
main(args)
File "/home/tts/ref/ns3/train.py", line 342, in main
spk_logits = torch.cat([speaker_model.infer_segment(w16.cpu()[..., :wl])[1] for w16, wl in zip(waves_16k, wave_lengths)], dim=0)
File "/home/tts/ref/ns3/train.py", line 342, in <listcomp>
spk_logits = torch.cat([speaker_model.infer_segment(w16.cpu()[..., :wl])[1] for w16, wl in zip(waves_16k, wave_lengths)], dim=0)
File "/opt/conda/envs/amphion/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1688, in __getattr__
raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'EncDecSpeakerLabelModel' object has no attribute 'infer_segment'
The nemo-toolkit version is 1.21.0.
I searched for this error in nemo's issues but could not find anything related.
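One defensive pattern for this class of error is to check for the method before training starts; a sketch with a stand-in class (the real fix is installing the nemo-toolkit version the repo expects, which I cannot verify here):

```python
class EncDecSpeakerLabelModel:
    """Stand-in for the NeMo speaker model named in the traceback (hypothetical)."""

speaker_model = EncDecSpeakerLabelModel()

# Fail fast with a clear message instead of an AttributeError mid-epoch.
has_method = hasattr(speaker_model, "infer_segment")
if not has_method:
    print("speaker model lacks 'infer_segment'; check the pinned nemo-toolkit version")
```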
The reconstruct and redecoder reconstruct scripts in this project seem to work only with the pretrained files, i.e. the bin files. Can the pth files produced by train.py be used for inference?
Also, which file implements the method that disentangles the audio attributes without any labels at all?
Thanks for your help.
Hi, thank you for sharing the training code of FACodec! I've come across a couple of points:
1. Fine-tuning the redecoder:
I'm interested in fine-tuning the redecoder using the provided encoder and redecoder bin files. However, I noticed that there's no 'net' key in the bin file, which seems to cause an issue when loading the checkpoint. Could you provide some guidance on how to properly load these files for fine-tuning?
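A hedged sketch of loading either layout, assuming (as described in this thread) that training checkpoints nest everything under a 'net' key while the released bin files store the module state dicts at the top level:

```python
def unwrap_net(ckpt: dict) -> dict:
    """Return the module state dicts whether or not they are nested under 'net'.

    Training checkpoint: {'net': {...}, 'optimizer': ..., 'scheduler': ...}
    Released bin file:   {'encoder': ..., 'quantizer': ..., 'decoder': ...}
    """
    return ckpt["net"] if "net" in ckpt else ckpt

# usage (hypothetical path):
# state = unwrap_net(torch.load("pytorch_model.bin", map_location="cpu"))
```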
2. Additional activation function:
I noticed that there's an additional WN gated activation function applied after the timbre layer norm, which differs from the original code and description in the paper. I'm curious about the reasoning behind this architectural change. Could you share some insights into why this modification was made and how it impacts the model's performance?
Hello, in FAcodec/modules/quantize.py, the forward_v2 function of FApredictors comments out the line
spk_pred = self.timbre_predictor(timbre)[0]
so the timbre prediction is None. As a result, further down,
spk_pred_logits = preds['timbre']
spk_loss = F.cross_entropy(spk_pred_logits, spk_labels)
spk_pred_logits is None and this raises an error. Is this a bug?
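If the timbre predictor is intentionally disabled in forward_v2, the loss side would need a matching guard; a minimal sketch of the idea (not the repo's actual code):

```python
import torch
import torch.nn.functional as F

def speaker_loss(preds: dict, spk_labels: torch.Tensor) -> torch.Tensor:
    """Skip the speaker loss when the timbre predictor is disabled (preds['timbre'] is None)."""
    spk_pred_logits = preds.get("timbre")
    if spk_pred_logits is None:
        return torch.zeros(())  # contribute zero loss instead of crashing
    return F.cross_entropy(spk_pred_logits, spk_labels)
```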
That is, for whether a given frame is voiced — is it computed by checking whether f0 exceeds some threshold?
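If that reading is right, the voiced flag would be a simple threshold on f0; a sketch (the threshold value here is an arbitrary assumption, not taken from the repo):

```python
import numpy as np

def voiced_flags(f0: np.ndarray, threshold: float = 10.0) -> np.ndarray:
    """Per-frame voiced/unvoiced flag: 1.0 where f0 exceeds the threshold."""
    return (f0 > threshold).astype(np.float32)

flags = voiced_flags(np.array([0.0, 120.5, 0.0, 98.3]))
print(flags)  # [0. 1. 0. 1.]
```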
I noticed that the pretrained checkpoints you provide are all bin files containing only weights, while the checkpoints produced by train.py in this repo are pth files; the file sizes alone differ by about 2.5 GB.
Since I can reach neither HF nor the HF mirror, I tried my own trained checkpoint first: I renamed it to pytorch_model.bin and put it into checkpoints together with the config.
I then found that the trained model cannot be used for speech reconstruction, because at reconstruct time the model's keys are:
dict_keys(['encoder', 'quantizer', 'decoder', 'discriminator', 'fa_predictors'])
while the checkpoint's keys are:
Keys in ckpt_params: dict_keys(['net', 'optimizer', 'scheduler', 'iters', 'epoch'])
Is this by design, or am I using it incorrectly?
Finally, I'd like to ask: how do you disentangle the timbre, content, and pitch of an audio signal without any labels or annotations? Which function in which file does this?
Thanks for your help.
Thanks for your great work on implementing FACodec!
I found the data file in https://github.com/Plachtaa/FAcodec/blob/master/data/val.txt has some labels, like speaker id, phonemes. How can I get these labels? Will these labels be auto-generated in the training process?
Hello, the current config uses a batch size of 4, but the paper uses 8 GPUs with a batch size of 32. May I ask how many GPUs were used to train the released pre-trained model? Thanks.
Hi! Nice work!
Could you share how many steps would be sufficient to train a new model? I'm trying to train a 16k FAcodec. The results reconstructed by the 130,000-step ckpt still sound different from the real speech, especially the speaker timbre.