Git Product home page Git Product logo

deep-casa's Introduction

Deep CASA for talker-independent monaural speaker separation

Introduction

This is the Tensorflow implementation of "Divide and conquer: A deep CASA approach to talker-independent monaural speaker separation", IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 27, pp. 2092-2102. Please find demos and keynote lecture slides at ASRU-19 here.

Contents

  • ./feat/exp_prepare_folder.sh: prepares folders for experiments.
  • ./feat/feat_gen.py: generates STFT featuers for training, validation and test.
  • ./feat/stft.py: defines STFT and iSTFT.
  • ./nn/simul_group.py: training/validation/test of the simultaneous grouping stage.
  • ./nn/seq_group.py: training/validation/test of the sequential grouping stage.
  • ./nn/utility.py: defines various functions in simul_group.py and seq_group.py.

Experimental setup

This codebase has been tested on AWS EC2 p3.2xlarge nodes with Deep Learning AMI (Ubuntu 16.04) Version 26.0.

Follow instructions in turn to set up the environment and run experiments.

  1. Requirements:

    • Tensorflow 1.15.0.
      Activate the environment on EC2:
      source activate tensorflow_p27
      
    • gflags
      pip install python-gflags
      
    • Please install other necessary python packages if not using AWS deep Learning AMI (Ubuntu 16.04) Version 26.0.
  2. Before running experiments, activate the tensorflow environment on EC2 using:

    source activate tensorflow_p27
    
  3. Generate the WSJ0-2mix dataset using http://www.merl.com/demos/deep-clustering/create-speaker-mixtures.zip. Copy the generated files to the EC2 instance.

  4. Start feature extraction by running the following command in the main directory:

    python feat/feat_gen.py
    

    Thre are two arguments in feat_gen.py, data_folder and wav_list_folder. Change them to where your WSJ0-2mix dataset and file list locate.

  5. Train the simultaneous grouping stage using:

    TIME_STAMP=train_simul_group
    python nn/simul_group.py --time_stamp $TIME_STAMP --is_deploy 0 --batch_size 1 
    
    • Due to utterance-level training and limited GPU memory, batch_size can be selected as 1 or 2.
    • Change data_folder and wav_list_folder accordingly.
    • You can also change other hyperparameters, e.g., the number of epochs and learning rate, using gflags arguments.
  6. Run inference of simultaneous grouping (tt set) using:

    RESUME_MODEL=exp/deep_casa_wsj/models/train_simul_group/deep_casa_wsj_model.ckpt_step_1
    python nn/simul_group.py --is_deploy 1 --resume_model $RESUME_MODEL
    
    • $RESUME_MODEL is the model to be loaded for inference. Change it accordingly.
    • Mixtures, clean references and Dense-UNet estimates will be generated and saved in folder ./exp/deep_casa_wsj/output_tt/files/.
    • Please use your own scripts to generate results in different metrics.
  7. Generate temporary .npy file for the next stage (sequential grouping):

    RESUME_MODEL=exp/deep_casa_wsj/models/train_simul_group/deep_casa_wsj_model.ckpt_step_1
    python nn/simul_group.py --is_deploy 2 --resume_model $RESUME_MODEL
    
    • Setting is_deploy to 2 will generate unorganized estimates by Dense-UNet, and save them as .npy files for the sequential grouping stage.
    • tr, cv and tt data are generated in turn, and saved in ./exp/deep_casa_wsj/feat/.
  8. Train the sequential grouping stage using:

    TIME_STAMP=train_seq_group
    python nn/seq_group.py --time_stamp $TIME_STAMP --is_deploy 0
    

    Change data_folder and wav_list_folder accordingly. You can also change other hyperparameters, e.g., the number of epochs and learning rate, using gflags arguments.

  9. Run inference of sequential grouping (tt set) using:

     RESUME_MODEL=exp/deep_casa_wsj/models/train_seq_group/deep_casa_wsj_model.ckpt_step_1
     python nn/seq_group.py --is_deploy 1 --resume_model $RESUME_MODEL
    
    • $RESUME_MODEL is the model to be loaded for inference. Change it accordingly.
    • Mixtures, clean references and estimates will be saved in folder ./exp/deep_casa_wsj/output_tt/files/.
    • Please use your own scripts to generate results in different metrics.

deep-casa's People

Contributors

yuzhou-git avatar

Watchers

James Cloos avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.