Vggvox-TensorFlow

This repository provides a TensorFlow model for speaker identification.

The model can be trained, or a pre-trained checkpoint can be tested, on the VoxCeleb1 dataset described in the following paper:

[1] A. Nagrani*, J. S. Chung*, A. Zisserman, VoxCeleb: a large-scale speaker identification dataset, INTERSPEECH, 2017

Demo

http://chuyuan.vipgz1.idcfengye.com/

Dependencies

[1] tensorflow-gpu=1.11.0

[2] librosa=0.6.3

[3] scipy=1.1.0

Platform

[1] Ubuntu 16.04

[2] Tesla P40

[3] Python 3.5

Train

Before training or testing, prepare the dataset and set up the environment.

The full dataset can be downloaded for free from VoxCeleb.

Once you have the full dataset, use the split file included in this repo at [utils/vox1_split_backup.txt], set up a GPU environment, and start training.

Training:

$ python3 train.py train --voxceleb_wav_dir [wav_dir] --vox_split_txt_file [split_file] --batch_size [bs] --lr [lr] --ckpt_save_dir [save_dir]

The args:

  1. --voxceleb_wav_dir: Path to the wav directory that holds all the audio of the downloaded dataset.
  2. --vox_split_txt_file: The split file. It is included in this repo as [utils/vox1_split_backup.txt] and is also available from VoxCeleb.
  3. --batch_size: Batch size.
  4. --lr: Learning rate. The default optimizer is Adam.
  5. --ckpt_save_dir: Directory where the ckpt files are saved. By default, at most 3 ckpt files are kept.
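As a rough illustration of how the train and test modes and their flags fit together, the command lines in this README could be parsed along the following lines. This is only a sketch and not the code in train.py: the use of argparse subcommands, the `build_parser` name, and the default values (taken from the example commands in this README) are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags documented in this README."""
    parser = argparse.ArgumentParser(prog="train.py")
    sub = parser.add_subparsers(dest="mode")
    sub.required = True  # a mode (train or test) must be given

    def add_common(p):
        # Flags shared by both modes.
        p.add_argument("--voxceleb_wav_dir", required=True)
        p.add_argument("--vox_split_txt_file", required=True)
        # Assumed default, copied from the README example.
        p.add_argument("--batch_size", type=int, default=32)

    train = sub.add_parser("train")
    add_common(train)
    # Assumed default, copied from the README example.
    train.add_argument("--lr", type=float, default=0.001)
    train.add_argument("--ckpt_save_dir", required=True)

    test = sub.add_parser("test")
    add_common(test)
    test.add_argument("--ckpt_restore_file", required=True)
    return parser
```

Calling `build_parser().parse_args()` on a command line shaped like the training command above would give a namespace with `mode == "train"` plus one attribute per flag.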

Test

Before testing, you need a pre-trained ckpt file.

Test:

$ python3 train.py test --voxceleb_wav_dir [wav_dir] --vox_split_txt_file [split_file] --batch_size [bs] --ckpt_restore_file [ckpt_file]

The args:

  1. --voxceleb_wav_dir: Path to the wav directory that holds all the audio of the downloaded dataset.
  2. --vox_split_txt_file: The split file. It is included in this repo as [utils/vox1_split_backup.txt] and is also available from VoxCeleb.
  3. --batch_size: Batch size.
  4. --ckpt_restore_file: Path to the pre-trained ckpt file to restore.

Example

Training:

$ python3 train.py train --voxceleb_wav_dir '/data/ChuyuanXiong/up/wav/' --vox_split_txt_file 'utils/vox1_split_backup.txt' --batch_size 32 --lr 0.001 --ckpt_save_dir '/data/ChuyuanXiong/backup/speaker_real318_ckpt' 

Testing:

$ python3 train.py test --voxceleb_wav_dir '/data/ChuyuanXiong/up/wav/' --vox_split_txt_file 'utils/vox1_split_backup.txt' --batch_size 32 --ckpt_restore_file '/data/ChuyuanXiong/backup/triplet_backup2/Speaker_vox_iter_51500.ckpt' --random_seed 100

Dataset

Set     # POIs   Utterances
Train   1251     145265
Test    1251     8251
Total   1251     153516

Dataset     My Result
voxceleb1   acc=0.7, eer=0.08
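For readers unfamiliar with the eer figure, the equal error rate is the operating point where the false acceptance rate (impostor pairs accepted) equals the false rejection rate (genuine pairs rejected). The sketch below shows one simple way to estimate it from verification scores. It is an illustration only and not the evaluation code of this repo: the `compute_eer` name, the score convention (higher means "same speaker"), and the threshold sweep are assumptions.

```python
def compute_eer(scores, labels):
    """Estimate the equal error rate from similarity scores.

    scores: iterable of floats, higher means "more likely the same speaker".
    labels: iterable of truthy/falsy values, truthy for genuine pairs.
    """
    scores = list(scores)
    labels = list(labels)
    genuine = [s for s, y in zip(scores, labels) if y]
    impostor = [s for s, y in zip(scores, labels) if not y]
    if not genuine or not impostor:
        raise ValueError("need at least one genuine and one impostor score")

    best_gap, eer = float("inf"), 1.0
    # Sweep every observed score as a candidate accept threshold.
    for t in sorted(set(scores)):
        far = sum(s >= t for s in impostor) / len(impostor)  # impostors accepted
        frr = sum(s < t for s in genuine) / len(genuine)     # genuine rejected
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer
```

With this convention, an eer of 0.08 means that at the balancing threshold about 8% of impostor pairs are accepted and about 8% of genuine pairs are rejected.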

Models

Model (password: 8gy0): trained by me and usable with this repo. Identification accuracy is about 70% and could be optimized further; verification eer=8%.

Model trained by the paper's authors: this one is for Matlab.

Citation

@InProceedings{Nagrani17,
  author       = "Nagrani, A. and Chung, J.~S. and Zisserman, A.",
  title        = "VoxCeleb: a large-scale speaker identification dataset",
  booktitle    = "INTERSPEECH",
  year         = "2017",
}
