Most recent studies on deep learning based speech enhancement (SE) have focused on improving denoising performance. However, successful SE applications require striking a desirable balance between denoising performance and computational cost in real scenarios. In this study, we propose a novel parameter pruning (PP) technique, which removes redundant channels in a neural network. In addition, a parameter quantization (PQ) technique was applied to reduce the size of a neural network by representing weights with fewer cluster centroids. Because the techniques are derived from different concepts, PP and PQ can be integrated to provide even more compact SE models. The experimental results show that the PP and PQ techniques produce a compacted SE model with a size of only 9.76% of the original model, with minor performance losses from 0.85 to 0.84 for STOI and from 2.55 to 2.52 for PESQ. These promising results suggest that the PP and PQ techniques can be used in an SE system on devices with limited storage and computation resources.
We found high redundancy among the channels of the well-trained FCN layers, which provide similar latent information for an input test utterance. Thus, we define a sparsity threshold to prune these redundant channels; the process is shown in the graph below:
As shown in (c), we used a "soft pruning" technique that retrains the model at specific pruning rates. This allows the remaining channels to adjust their latent behavior after pruning.
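A minimal NumPy sketch of the channel-pruning idea (function names, the near-zero tolerance, and the sparsity threshold here are illustrative, not the repository's actual code):

```python
import numpy as np

def channel_sparsity(kernel, eps=1e-3):
    """Fraction of near-zero weights in each output channel.

    kernel: array of shape (filter_len, in_channels, out_channels),
    as in a 1-D convolutional (FCN) layer.
    """
    near_zero = np.abs(kernel) < eps
    return near_zero.reshape(-1, kernel.shape[-1]).mean(axis=0)

def prune_channels(kernel, threshold=0.5):
    """Keep only channels whose sparsity is below the threshold."""
    keep = channel_sparsity(kernel) < threshold
    return kernel[:, :, keep], keep

# Toy example: 4 output channels, two of them entirely zero (redundant).
rng = np.random.default_rng(0)
kernel = rng.normal(size=(55, 1, 4))
kernel[:, :, 1] *= 0.0
kernel[:, :, 3] *= 0.0
pruned, keep = prune_channels(kernel, threshold=0.5)
print(keep)          # [ True False  True False]
print(pruned.shape)  # (55, 1, 2)
```

Under the soft-pruning scheme described above, one would prune a fraction of channels with this criterion, retrain the smaller model, and then repeat at the next pruning rate.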
The PQ process, i.e., the construction of the codebook, is shown in the graph below:
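The codebook construction can be sketched as a 1-D k-means over a layer's weights; each weight is then replaced by its cluster centroid, so only a small codebook plus low-bit indices need to be stored. The cluster count and iteration budget below are illustrative assumptions:

```python
import numpy as np

def build_codebook(weights, k=16, iters=20, seed=0):
    """Cluster a layer's weights into k scalar centroids (1-D k-means)."""
    w = weights.ravel()
    rng = np.random.default_rng(seed)
    centroids = rng.choice(w, size=k, replace=False)
    for _ in range(iters):
        # assign each weight to its nearest centroid
        assign = np.abs(w[:, None] - centroids[None, :]).argmin(axis=1)
        # move each centroid to the mean of its assigned weights
        for j in range(k):
            if np.any(assign == j):
                centroids[j] = w[assign == j].mean()
    assign = np.abs(w[:, None] - centroids[None, :]).argmin(axis=1)
    return centroids, assign

def quantize(weights, centroids, assign):
    """Replace every weight with its cluster centroid."""
    return centroids[assign].reshape(weights.shape)

rng = np.random.default_rng(1)
w = rng.normal(size=(55, 1, 32))          # one conv layer's weights
codebook, assign = build_codebook(w, k=16)
wq = quantize(w, codebook, assign)
print(np.unique(wq).size)                 # at most 16 distinct values
```

With k = 16, each weight is stored as a 4-bit index instead of a 32-bit float, which is where the PQ size reduction comes from.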
The best setup of the PP + PQ combination that we propose is shown in the graph below:
Integrating these two approaches achieves a 10× model compression ratio with only a minor performance drop.

Environment used in this work:
- CUDA 8.0
- tensorflow-gpu 1.4.0
- Python 2.7
- Keras 1.1
- Nvidia GTX-1080Ti
How to use TIMIT_FCN_MSE.py
- Set up a Python 2.7 environment.
- Install Keras 1.1 (if you already have a later version of Keras, please reinstall this version).
- Fill in the GPU to be used (default = 0 for a single-GPU setup; -1 for CPU-only computation).
- Fill in the paths of the data you want to train/test with.
- Run `python TIMIT_FCN_MSE.py` to obtain the model used in this work.
- This baseline model follows the settings of Fu et al.'s FCN.
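For reference, this FCN maps a raw noisy waveform directly to an enhanced waveform through stacked 1-D convolutions with no fully connected layers, so it accepts utterances of arbitrary length. A minimal NumPy forward-pass sketch (the layer sizes, channel counts, and activations here are illustrative, not the exact settings of the script):

```python
import numpy as np

def conv1d(x, kernels, activation=np.tanh):
    """'Same'-padded 1-D convolution.

    x: (length, in_channels); kernels: (filter_len, in_channels, out_channels).
    """
    flen, cin, cout = kernels.shape
    pad = flen // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.zeros((x.shape[0], cout))
    for t in range(x.shape[0]):
        # correlate the window at t with every output-channel filter
        out[t] = np.tensordot(xp[t:t + flen], kernels, axes=([0, 1], [0, 1]))
    return activation(out)

rng = np.random.default_rng(0)
# three hidden conv layers and a single-channel output layer
layers = [rng.normal(scale=0.1, size=s) for s in
          [(55, 1, 8), (55, 8, 8), (55, 8, 8), (55, 8, 1)]]

waveform = rng.normal(size=(512, 1))  # any utterance length works
h = waveform
for k in layers:
    h = conv1d(h, k)
print(h.shape)  # (512, 1) -- enhanced waveform, same length as the input
```

Because every layer is convolutional, pruning an output channel in one layer also removes the matching input-channel slice of the next layer's kernels, which is what makes channel pruning shrink the whole model.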
Normally, the learning curve of this model looks like the following graph:
The model we used in the following experiments can be found here.
In this paper, we used the TIMIT dataset as our training and testing corpus.
Data Set | Method | PESQ | STOI |
---|---|---|---|
CHiME-2 | Noisy | 1.95 | 0.60 |
CHiME-2 | FCN | 2.03 | 0.75 |
CHiME-2 | PP+PQ (8x compressed) | 2.01 | 0.74 |
MHINT | Noisy | 1.54 | 0.81 |
MHINT | FCN | 2.17 | 0.86 |
MHINT | PP+PQ (10x compressed) | 2.08 | 0.84 |
We adopt PESQ and STOI to evaluate the proposed ICSE. The tools we used can be found here.
The results show that the computational load in terms of simulated cycles is reduced from 23,821,318 to 19,084,879 (1.25×), and in terms of FLOPs from 0.6M to 0.48M per input (a speech utterance of arbitrary length). The results are computed by ARM software simulation.
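As a quick check, the reported reductions correspond to roughly the same speedup factor under both metrics:

```python
# figures reported above: simulated ARM cycles and FLOPs per input
cycles = (23_821_318, 19_084_879)
flops = (600_000, 480_000)  # 0.6M -> 0.48M FLOPs

print(round(cycles[0] / cycles[1], 2))  # 1.25
print(round(flops[0] / flops[1], 2))    # 1.25
```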