Official implementation of "Automatic Tuning of Loss Trade-offs without Hyper-parameter Search in End-to-End Zero-Shot Speech Synthesis", Interspeech 2023
Hi, thank you for the great paper and repo.
I ran into some confusion while debugging your repo alongside your Interspeech 2023 paper.
The paper says that you jointly train the speaker encoder alongside the VITS model, but when I inspect the code, I cannot find where that joint training is implemented. As I understand it, your code uses the speaker encoder as a layer in the VITS model.
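To make the question concrete, here is a minimal PyTorch sketch of what I think might be going on. All names (`ToySpeakerEncoder`, `ToySynthesizer`, the layer sizes, the dummy loss) are hypothetical and are not taken from this repo. It only illustrates the general point that a speaker encoder registered as a submodule receives gradients and is updated by the same optimizer, which would amount to joint training without a separate training loop.

```python
import torch
import torch.nn as nn


class ToySpeakerEncoder(nn.Module):
    """Hypothetical stand-in for a speaker encoder (not the repo's class)."""

    def __init__(self, n_mels=80, emb_dim=16):
        super().__init__()
        self.proj = nn.Linear(n_mels, emb_dim)

    def forward(self, mel):
        # mel: (batch, frames, n_mels) -> speaker embedding: (batch, emb_dim)
        return self.proj(mel).mean(dim=1)


class ToySynthesizer(nn.Module):
    """Hypothetical VITS-like model that owns the speaker encoder as a layer."""

    def __init__(self, n_mels=80, emb_dim=16):
        super().__init__()
        self.speaker_encoder = ToySpeakerEncoder(n_mels, emb_dim)
        self.decoder = nn.Linear(emb_dim, n_mels)

    def forward(self, mel):
        g = self.speaker_encoder(mel)  # embedding computed inside the model
        return self.decoder(g)         # (batch, n_mels)


torch.manual_seed(0)
model = ToySynthesizer()
# model.parameters() includes the speaker encoder's weights as well.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

mel = torch.randn(4, 50, 80)
before = model.speaker_encoder.proj.weight.detach().clone()

# Dummy reconstruction loss, only to push gradients through both modules.
loss = (model(mel) - mel.mean(dim=1)).pow(2).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()

encoder_has_grad = model.speaker_encoder.proj.weight.grad is not None
encoder_updated = not torch.equal(
    before, model.speaker_encoder.proj.weight.detach()
)
print(encoder_has_grad, encoder_updated)
```

If the repo works this way (the speaker encoder's parameters are passed to the generator's optimizer and are not frozen or detached), then I suppose that is the joint training described in the paper, but I would like to confirm.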
Could you please clarify this for me?
Thank you!