seed-tts-eval

💥 This repository contains the objective test set as proposed in our project, seed-TTS, along with the scripts for metric calculations. Due to considerations for AI safety, we will NOT be releasing the source code and model weights of seed-TTS. We invite you to experience the speech generation feature within ByteDance products. 💥

To evaluate the zero-shot speech generation ability of our model, we propose an out-of-domain objective evaluation test set. This test set consists of samples extracted from English (EN) and Mandarin (ZH) public corpora that are used to measure the model's performance on various objective metrics. Specifically, we employ 1,000 samples from the Common Voice dataset and 2,000 samples from the DiDiSpeech-2 dataset.

Requirements

To install all dependencies, run:

pip3 install -r requirements.txt

Metrics

The word error rate (WER) and speaker similarity (SIM) metrics are adopted for objective evaluation.

  • For WER, we employ Whisper-large-v3 and Paraformer-zh as the automatic speech recognition (ASR) engines for English and Mandarin, respectively.
  • For SIM, we use WavLM-large fine-tuned on the speaker verification task (model link) to extract speaker embeddings, then compute the cosine similarity between the embedding of each test utterance and that of its reference clip.
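As a rough illustration of what the two metrics measure, the sketch below computes WER as the token-level edit distance divided by the reference length, and SIM as the cosine similarity of two embedding vectors. It is not the repository's implementation: the function names `word_error_rate` and `cosine_similarity` are hypothetical, the ASR and speaker-embedding models are left out entirely, and the inputs are assumed to be already-tokenized transcripts and plain lists of floats. Any text normalization or tokenization applied by the released scripts is not reproduced here.

```python
import math
from typing import Sequence


def word_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Edit distance between two token sequences, divided by the reference length."""
    if not reference:
        raise ValueError("reference must contain at least one token")
    # prev[j] holds the edit distance between the reference prefix processed
    # so far and hypothesis[:j]; only one previous row is needed at a time.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_tok in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_tok in enumerate(hypothesis, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(reference)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    ref = "the cat sat on the mat".split()
    hyp = "the cat sit on mat".split()
    print(word_error_rate(ref, hyp))
    print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))
```

In this sketch a lower WER means the ASR transcript of the generated audio is closer to the target text, and a higher SIM means the generated voice is closer to the reference speaker.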

Dataset

You can download the test set for all tasks from this link. The test set is organized with meta files. Each line of a meta file has the following fields: filename | the text of the prompt | the audio of the prompt | the text to be synthesized | the ground truth counterpart corresponding to the text to be synthesized (if it exists). Different tasks use different meta files:

  • Zero-shot text-to-speech (TTS):
    • EN: seed-tts-eval/en/meta.lst
    • ZH: seed-tts-eval/zh/meta.lst
    • ZH (hard case): seed-tts-eval/zh/hardcase.lst
  • Zero-shot voice conversion (VC):
    • EN: seed-tts-eval/en/non_para_reconstruct_meta.lst
    • ZH: seed-tts-eval/zh/non_para_reconstruct_meta.lst
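The meta file layout described above can be read with a few lines of Python. The following is a minimal sketch, not code from this repository: `MetaEntry` and `parse_meta_line` are hypothetical names, and it assumes the fields are separated by a literal `|`, that whitespace around each field can be stripped, and that no text field itself contains a `|`. The sample line in the usage part is made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetaEntry:
    filename: str                        # name of the utterance to generate
    prompt_text: str                     # transcript of the prompt audio
    prompt_audio: str                    # path to the prompt audio
    target_text: str                     # text to be synthesized
    ground_truth: Optional[str] = None   # ground-truth counterpart, if present


def parse_meta_line(line: str) -> MetaEntry:
    """Split one meta-file line into its four or five '|'-separated fields."""
    fields = [field.strip() for field in line.rstrip("\n").split("|")]
    if len(fields) not in (4, 5):
        raise ValueError(f"expected 4 or 5 fields, got {len(fields)}: {line!r}")
    # The fifth field is optional; treat an empty one as missing.
    ground_truth = fields[4] if len(fields) == 5 and fields[4] else None
    return MetaEntry(fields[0], fields[1], fields[2], fields[3], ground_truth)


if __name__ == "__main__":
    # Hypothetical line, only to show the field order.
    sample = "utt_0001|hello there|prompts/utt_0001.wav|good morning everyone"
    print(parse_meta_line(sample))
```

Reading a whole file then amounts to calling `parse_meta_line` on each non-empty line of, for example, `seed-tts-eval/en/meta.lst`.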

Code

We also release the evaluation code for both metrics:

# WER
bash cal_wer.sh {the path of the meta file} {the directory of synthesized audio} {language: zh or en}
# SIM
bash cal_sim.sh {the path of the meta file} {the directory of synthesized audio} {path/wavlm_large_finetune.pth}
