LightningFastSpeech

WARNING: This is a work in progress and until version 0.1 (which will be out very soon), it might be hard to get running on your own machine. Thanks for your patience.

Large Pretrained TTS

In the NLP community, and more recently in speech recognition, large pre-trained models and how they can be used for downstream tasks have become an exciting area of research.

In TTS, however, little similar work exists. With this project, I hope to take a first step toward bringing pretrained models to TTS. The original FastSpeech 2 model has 27M parameters and models a single speaker. Our version models more than 2,000 speakers; it would have almost 2B parameters without the improvements from LightSpeech, which bring its size down to a manageable 135M.
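One of the LightSpeech changes this project adopts is the depthwise separable convolution (listed under the v0.1 goals). The following is a minimal sketch of why that kind of layer is smaller than a standard convolution. The function names and the layer sizes are hypothetical and chosen for illustration only; they are not taken from the LightningFastSpeech configuration, and the sketch does not attempt to reproduce the 2B or 135M figures.

```python
def conv1d_params(c_in: int, c_out: int, kernel_size: int, bias: bool = True) -> int:
    """Parameter count of a standard 1D convolution."""
    return c_in * c_out * kernel_size + (c_out if bias else 0)


def depthwise_separable_conv1d_params(
    c_in: int, c_out: int, kernel_size: int, bias: bool = True
) -> int:
    """Parameter count of a depthwise conv followed by a 1x1 pointwise conv."""
    # Depthwise: one kernel per input channel, no mixing across channels.
    depthwise = c_in * kernel_size + (c_in if bias else 0)
    # Pointwise: a kernel-size-1 convolution that mixes channels.
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


# Illustrative sizes only (assumption, not the project's real hyperparameters).
channels, kernel_size = 256, 9
standard = conv1d_params(channels, channels, kernel_size)
separable = depthwise_separable_conv1d_params(channels, channels, kernel_size)
print(f"standard: {standard}, separable: {separable}")
```

With these example sizes the separable layer needs roughly an eighth of the parameters, because the kernel size multiplies only the per-channel depthwise term rather than the full channel-by-channel weight matrix.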

A big upside of this implementation is that it is based on PyTorch Lightning, which makes it easy to do multi-GPU training, load pre-trained models, and a lot more.

LightningFastSpeech couldn't exist without the amazing open source work of many others; for a full list, see Attribution.

Current Status

This library is a work in progress, and until v1.0, updates might break things occasionally.

Goals

v0.1

0.1 is right around the corner! For this version, the core functionality is already there, and what's missing are mostly quality-of-life improvements that we should get out of the way now.

  • Replicate the original FastSpeech 2 architecture
  • Include depthwise separable convolutions found in LightSpeech
  • Dataloader which computes prosody features online
  • Synthesis of both individual utterances and whole datasets
  • Configurable training script
  • Configurable synthesis script
  • First large pre-trained model (LibriTTS, 2k speakers, 135M)
  • Documentation & tutorials
  • Configurable metrics
  • LJSpeech support
  • PyPI package
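To illustrate what "computes prosody features online" can mean, here is a minimal sketch of a map-style dataset that derives a feature inside `__getitem__` instead of reading a precomputed file. Everything in it is an assumption made for illustration: the class name `OnlineProsodyDataset`, the helper `frame_energy`, and the choice of frame-level RMS energy as a stand-in prosody feature. The source does not say which features the project's dataloader computes or how.

```python
import math
from typing import Dict, List, Sequence


def frame_energy(wave: Sequence[float], frame_length: int, hop_length: int) -> List[float]:
    """Root-mean-square energy of each full frame of a mono waveform."""
    energies = []
    # Only full frames are used; a waveform shorter than one frame yields [].
    for start in range(0, len(wave) - frame_length + 1, hop_length):
        frame = wave[start:start + frame_length]
        energies.append(math.sqrt(sum(x * x for x in frame) / frame_length))
    return energies


class OnlineProsodyDataset:
    """Hypothetical dataset that computes features on access, not ahead of time."""

    def __init__(self, waveforms: List[Sequence[float]],
                 frame_length: int = 2, hop_length: int = 2) -> None:
        self.waveforms = waveforms
        self.frame_length = frame_length
        self.hop_length = hop_length

    def __len__(self) -> int:
        return len(self.waveforms)

    def __getitem__(self, index: int) -> Dict[str, object]:
        wave = self.waveforms[index]
        # The feature is derived here, each time the item is requested.
        energy = frame_energy(wave, self.frame_length, self.hop_length)
        return {"wave": wave, "energy": energy}


dataset = OnlineProsodyDataset([[1.0, 1.0, 1.0, 1.0, 0.0, 0.0]])
item = dataset[0]
print(item["energy"])
```

A class with `__len__` and `__getitem__` like this follows the map-style dataset protocol that `torch.utils.data.DataLoader` accepts, so the same idea carries over to a real training pipeline. The trade-off is extra compute per batch in exchange for not storing or regenerating feature files when the settings change.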

v1.0

It will take a while to get to 1.0. The goal is to allow everyone to easily fine-tune our models and to do controllable synthesis of utterances.

  • Allow models to be loaded from the Hugging Face Hub.
  • Streamlit interface for synthesising utterances and generating datasets.
  • Tract and tractjs integration to export models for on-device and web use.
  • Make it easy to add new datasets and to fine-tune models with them.
  • Add HiFi-GAN fine-tuning to the pipeline.
  • A range of pre-trained models with different domains and sizes (e.g. multi-lingual, noisy/clean).

Attribution

This would not be possible without the many amazing open source projects already present in the TTS space. Please cite their work when appropriate!

Contributors

minixc
