icse's Introduction

About this work

Most recent studies on deep learning based speech enhancement (SE) focused on improving denoising performance. However, successful SE applications require striking a desirable balance between denoising performance and computational cost in real scenarios. In this study, we propose a novel parameter pruning (PP) technique, which removes redundant channels in a neural network. In addition, a parameter quantization (PQ) technique was applied to reduce the size of a neural network by representing weights with fewer cluster centroids. Because the techniques are derived based on different concepts, the PP and PQ can be integrated to provide even more compact SE models. The experimental results show that the PP and PQ techniques produce a compacted SE model with a size of only 9.76% compared to that of the original model, resulting in minor performance losses from 0.85 to 0.84 for STOI and from 2.55 to 2.52 for PESQ. The promising results suggest that the PP and PQ techniques can be used in an SE system in devices with limited storage and computation resources.

PP & PQ schematic

PP

We found high redundancy among the channels of well-trained FCN layers: many channels provide similar latent information about an input test utterance. We therefore define a sparsity threshold and prune the channels identified as redundant. The process is illustrated in the figure below.

image

As shown in panel (c), we use a "soft pruning" technique, which retrains the model at specific pruning rates. This lets the remaining channels adjust their latent behavior after pruning.
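The thresholding step can be sketched in NumPy as follows. This is a minimal illustration, not the repository's implementation: the sparsity measure (fraction of near-zero weights per output channel), the kernel layout, and the names `channel_sparsity`, `prune_channels`, `eps`, and `sparsity_threshold` are all assumptions made for the example. The retraining that soft pruning performs between pruning rates is not shown.

```python
import numpy as np

def channel_sparsity(kernel, eps):
    """Fraction of near-zero weights in each output channel.

    kernel: array of shape (filter_length, in_channels, out_channels)
            (an assumed layout for a 1-D convolution kernel).
    eps:    magnitude below which a weight counts as near zero (assumed).
    """
    near_zero = np.abs(kernel) < eps
    return near_zero.mean(axis=(0, 1))  # one value per output channel

def prune_channels(kernel, bias, eps, sparsity_threshold):
    """Drop output channels whose sparsity exceeds the threshold."""
    sparsity = channel_sparsity(kernel, eps)
    keep = sparsity <= sparsity_threshold
    return kernel[:, :, keep], bias[keep], keep

# Toy layer with 8 output channels, two of which carry almost no signal.
rng = np.random.RandomState(0)
kernel = rng.normal(size=(11, 1, 8))
kernel[:, :, [2, 5]] *= 1e-3
bias = np.zeros(8)

pruned_kernel, pruned_bias, keep = prune_channels(
    kernel, bias, eps=0.01, sparsity_threshold=0.5)
print(pruned_kernel.shape)       # channels 2 and 5 are removed
print(np.flatnonzero(~keep))
```

In a full network, removing an output channel from one layer also requires removing the matching input channel from the next layer's kernel, so the layers stay shape-compatible.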

PQ

The PQ process, including the construction of the codebook, is shown in the figure below: image
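As a rough sketch of the idea, a codebook can be built by clustering a layer's weights with 1-D k-means and storing, for each weight, only the index of its nearest centroid. The initialisation, iteration count, cluster count, and the helper name `build_codebook` are assumptions for illustration; the figure, not this code, describes the procedure used in this work.

```python
import numpy as np

def build_codebook(weights, n_clusters, n_iter=20):
    """1-D k-means over a weight tensor: returns (codebook, indices)."""
    flat = weights.ravel().astype(np.float64)
    # Initialise centroids evenly across the weight range (assumed choice).
    codebook = np.linspace(flat.min(), flat.max(), n_clusters)
    for _ in range(n_iter):
        # Assign each weight to its nearest centroid.
        indices = np.abs(flat[:, None] - codebook[None, :]).argmin(axis=1)
        # Move each centroid to the mean of the weights assigned to it.
        for k in range(n_clusters):
            members = flat[indices == k]
            if members.size:
                codebook[k] = members.mean()
    indices = np.abs(flat[:, None] - codebook[None, :]).argmin(axis=1)
    return codebook, indices.reshape(weights.shape)

rng = np.random.RandomState(0)
weights = rng.normal(size=(11, 4, 8))
codebook, indices = build_codebook(weights, n_clusters=16)

# The layer is now represented by the small codebook plus one index per weight.
quantized = codebook[indices]
bits_per_index = int(np.ceil(np.log2(len(codebook))))
print(quantized.shape)
print(bits_per_index)
```

With 16 centroids each weight needs a 4-bit index instead of a 32-bit float, which is where the size reduction comes from.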

Integration of PP & PQ

The best PP and PQ combination that we propose is shown in the figure below: image

Experimental Results

Integrating the two approaches achieved a model compression ratio of about 10 times with only a minor performance drop, as shown below:

  • PESQ image
  • STOI image
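The compression ratio follows directly from the 9.76% model size reported in the abstract. The short calculation below checks that figure and then shows, under stated simplifications, how pruning and quantization multiply together. The helper `size_fraction` and the example settings are hypothetical and are not the configuration used in this work.

```python
def size_fraction(keep_ratio, index_bits, float_bits=32):
    """Approximate compressed size relative to the original model.

    keep_ratio: fraction of parameters left after pruning.
    index_bits: bits per codebook index after quantization.
    Ignores codebook storage and any per-layer overhead.
    """
    return keep_ratio * float(index_bits) / float_bits

reported_fraction = 0.0976                  # 9.76% of the original size
compression_ratio = 1.0 / reported_fraction
print(round(compression_ratio, 2))

# Hypothetical settings, chosen only to illustrate the arithmetic.
example = size_fraction(keep_ratio=0.5, index_bits=8)
print(example)
```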

ICSE technical summary

(A) Training/Testing environment setup

  • Conda 8.0
  • tensorflow-gpu 1.4.0
  • Python 2.7
  • Keras 1.1
  • Nvidia GTX-1080Ti

How to use TIMIT_FCN_MSE.py

- Set up a Python 2.7 environment.
- Install Keras 1.1 (if you already have a later version of Keras, please reinstall this version).
- Fill in the GPU to use (default = 0 for a single GPU; -1 to use no GPU, i.e. CPU only).
- Fill in the paths of the data to train and test with.
- Run the command python TIMIT_FCN_MSE.py to obtain the model used in this work.
- This baseline model follows the settings of the FCN by Fu et al.
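One common way to select the GPU for a TensorFlow script is the `CUDA_VISIBLE_DEVICES` environment variable, where a device index exposes that GPU and -1 hides all GPUs so computation falls back to the CPU. Whether TIMIT_FCN_MSE.py uses exactly this mechanism is an assumption; the helper `select_device` below is hypothetical and only illustrates the setting.

```python
import os

def select_device(gpu_id):
    """Expose one GPU to CUDA, or hide all GPUs when gpu_id is -1."""
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    return os.environ["CUDA_VISIBLE_DEVICES"]

# Set this before TensorFlow/Keras is imported so the setting takes effect.
select_device(0)   # use the first GPU
```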

(B) Baseline models

Normally, the learning curve of this FCN model looks like the following graph: image

The model we used in the following experiments can be found here.

(C) Dataset

In this paper, we used the TIMIT dataset as our training and testing corpus.

(D) Additional Experimental Results

- Denoising task on different datasets:

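| Data Set | Method                | PESQ | STOI |
|----------|-----------------------|------|------|
| CHiME-2  | Noisy                 | 1.95 | 0.60 |
| CHiME-2  | FCN                   | 2.03 | 0.75 |
| CHiME-2  | PP+PQ (8x compressed) | 2.01 | 0.74 |
| MHINT    | Noisy                 | 1.54 | 0.81 |
| MHINT    | FCN                   | 2.17 | 0.86 |
| MHINT    | PP+PQ (10x compressed)| 2.08 | 0.84 |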
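To put the drop from FCN to PP+PQ in perspective, the snippet below computes the share of the baseline score that the compressed model keeps, using only the numbers in the table above. The helper name `retained` is made up for this example.

```python
# dataset: (FCN PESQ, PP+PQ PESQ, FCN STOI, PP+PQ STOI), copied from the table
results = {
    "CHiME-2": (2.03, 2.01, 0.75, 0.74),
    "MHINT": (2.17, 2.08, 0.86, 0.84),
}

def retained(baseline, compressed):
    """Share of the baseline score kept by the compressed model, in percent."""
    return round(100.0 * compressed / baseline, 1)

for name in sorted(results):
    pesq_fcn, pesq_c, stoi_fcn, stoi_c = results[name]
    print("%s: PESQ %.1f%%, STOI %.1f%%"
          % (name, retained(pesq_fcn, pesq_c), retained(stoi_fcn, stoi_c)))
```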

- Denoising+Dereverberation joint training/testing:

Denoising & Dereverberation Test

image

(E) Evaluation Metrics

We adopt PESQ and STOI to evaluate the proposed ICSE. The tools we used can be found here.

(F) Computational Cost

The results show that the computational load in terms of simulated cycles is reduced from 23,821,318 to 19,084,879 (a factor of 1.25), and in terms of FLOPs from 0.6M to 0.48M per input (a speech utterance of arbitrary length). The results were computed by ARM software simulation.
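The reduction factors can be checked directly from the reported numbers. The variable names are chosen for this example only.

```python
cycles_before, cycles_after = 23821318, 19084879
flops_before, flops_after = 0.6e6, 0.48e6

cycle_ratio = float(cycles_before) / cycles_after
flop_ratio = flops_before / flops_after
cycle_saving = 1.0 - float(cycles_after) / cycles_before

print("cycle ratio: %.2f" % cycle_ratio)
print("FLOP ratio: %.2f" % flop_ratio)
print("cycles saved: %.1f%%" % (100.0 * cycle_saving))
```

Both measures agree on a reduction of about 1.25 times, which corresponds to roughly 20% fewer cycles.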
