MTGAN

MTGAN: Speaker Verification through Multitasking Triplet Generative Adversarial Networks

Abstract

In this paper, we propose an enhanced triplet method that improves the encoding process of embeddings by jointly utilizing generative adversarial mechanism and multitasking optimization. We extend our triplet encoder with Generative Adversarial Networks (GANs) and softmax loss function. GAN is introduced for increasing the generality and diversity of samples, while softmax is for reinforcing features about speakers. For simplification, we term our method Multitasking Triplet Generative Adversarial Networks (MTGAN). Experiment on short utterances demonstrates that MTGAN reduces the verification equal error rate (EER) by 67% (relatively) and 32% (relatively) over conventional i-vector method and state-of-the-art triplet loss method respectively. This effectively indicates that MTGAN outperforms triplet methods in the aspect of expressing the high-level feature of speaker information.

Instruction

This is an unofficial MTGAN implementation. It provides only preliminary code for the neural networks in the model architecture, without carefully worked-out details such as convolutional kernel sizes and pooling kernel sizes.

In this repository, I use a cosine similarity-based triplet loss instead of the Euclidean distance-based triplet loss used by the authors of the paper.
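A minimal sketch of what a cosine similarity-based triplet loss can look like, written with NumPy for illustration. The function names, the margin value of 0.2, and the batch layout are assumptions made for this example and are not taken from the repository code.

```python
import numpy as np

def cosine_similarity(a, b):
    """Row-wise cosine similarity between two (batch, dim) arrays."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return np.sum(a * b, axis=1)

def cosine_triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge loss on cosine similarities (margin=0.2 is an assumed value).

    A triplet contributes zero loss once the anchor is more similar to the
    positive than to the negative by at least `margin`.
    Returns the mean loss and the number of triplets with non-zero loss.
    """
    sim_ap = cosine_similarity(anchor, positive)
    sim_an = cosine_similarity(anchor, negative)
    per_triplet = np.maximum(0.0, sim_an - sim_ap + margin)
    return per_triplet.mean(), int(np.count_nonzero(per_triplet))
```

With a Euclidean loss the hinge is taken on `d(a, p) - d(a, n) + margin`; with cosine similarity the sign flips because a larger similarity means a closer pair.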

Dataset

The original dataset is private. Anyone using this code can download the VoxCeleb dataset as a substitute.

I substituted the private dataset with the VoxCeleb1 Dev set to train both the original triplet model and the MTGAN model. The models are evaluated on the VoxCeleb1 Test set.

Original Triplet Loss

In the original_triplet directory, I implemented a simple triplet loss that uses a random hard sampling strategy. It serves as a comparison experiment for MTGAN, so its encoder is the same as MTGAN's.
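One possible reading of a random hard sampling strategy is sketched below: for every anchor-positive pair in a batch, a negative is drawn uniformly at random from those that still violate the margin. This is an assumption about the strategy, not a copy of the code in original_triplet; the function name, the margin of 0.2, and the use of cosine similarity are all illustrative.

```python
import numpy as np

def sample_random_hard_triplets(embeddings, labels, margin=0.2, rng=None):
    """Return (anchor, positive, negative) index triplets from one batch.

    For each anchor-positive pair, one negative is picked at random among
    the negatives that violate the margin, i.e. sim(a, n) + margin > sim(a, p).
    Pairs with no violating negative produce no triplet.
    """
    rng = np.random.default_rng() if rng is None else rng
    labels = np.asarray(labels)
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = unit @ unit.T  # pairwise cosine similarities

    triplets = []
    for a in range(len(labels)):
        negatives = np.flatnonzero(labels != labels[a])
        for p in np.flatnonzero(labels == labels[a]):
            if p == a:
                continue
            hard = negatives[sim[a, negatives] + margin > sim[a, p]]
            if hard.size > 0:
                triplets.append((a, int(p), int(rng.choice(hard))))
    return triplets
```

Under this reading, the number of triplets returned per batch shrinks as training progresses, because fewer negatives violate the margin.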

After 500 epochs, the model has converged. The trends of the loss and related metrics are shown in the following figures.

Triplet loss

[Figure: triplet loss]

Number of non-zero triplets

[Figure: number of non-zero triplets]

EER

[Figure: EER]

EER: 9.020%
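For reference, the equal error rate is the operating point where the false acceptance rate equals the false rejection rate. The sketch below shows one simple way to estimate it from verification scores; it is an illustration, not the evaluation script used for the number above. The function name and the convention that higher scores mean "same speaker" are assumptions.

```python
import numpy as np

def compute_eer(scores, labels):
    """Estimate the EER from trial scores.

    scores: similarity per trial, higher means more likely the same speaker.
    labels: 1 for target (same speaker) trials, 0 for non-target trials.
    A trial is accepted when its score is >= the threshold.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    targets = scores[labels == 1]
    nontargets = scores[labels == 0]

    thresholds = np.unique(scores)
    # False acceptance: non-target trials that pass the threshold.
    far = np.array([(nontargets >= t).mean() for t in thresholds])
    # False rejection: target trials that fall below the threshold.
    frr = np.array([(targets < t).mean() for t in thresholds])

    # Take the threshold where the two rates are closest and average them.
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0
```

This sweeps every observed score as a threshold, which is fine for small trial lists; a sorted cumulative count would be faster for large ones.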

I-Vector System

I used the Kaldi sre16 scripts to build an i-vector system.

EER: 5.509%

ResNet18 + LMCL

Another neural network-based system, ResNet18 with LMCL, was trained with data augmentation using MUSAN.

EER: 4.32%
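Assuming LMCL here refers to the large margin cosine loss (also known as CosFace), the idea can be sketched as follows: embeddings and class weights are L2-normalized, the margin is subtracted from the target-class cosine, and the scaled result goes through a softmax cross-entropy. The function name and the scale and margin defaults are illustrative assumptions, not values from this repository.

```python
import numpy as np

def lmcl_loss(embeddings, weights, labels, scale=30.0, margin=0.35):
    """Large margin cosine loss sketch (scale and margin are assumed values).

    embeddings: (batch, dim) speaker embeddings.
    weights:    (classes, dim) one weight vector per training speaker.
    labels:     (batch,) integer speaker indices.
    """
    labels = np.asarray(labels, dtype=int)
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cosine = x @ w.T  # (batch, classes)

    rows = np.arange(len(labels))
    logits = scale * cosine
    # Subtract the margin from the target-class cosine only.
    logits[rows, labels] = scale * (cosine[rows, labels] - margin)

    # Numerically stable log-softmax followed by cross-entropy.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[rows, labels].mean()
```

Setting `margin=0` reduces this to a normalized softmax loss, which makes the effect of the margin easy to compare.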

MTGAN Performance

Work in progress.

Contact

Email: [email protected]

WeChat: zengchang-_-

Contributors

zengchang233
