kaldi-cnn

Introduction

This Git repository contains CNN source code built on nnet2 (Dan's DNN implementation) in the KALDI Speech Recognition Toolkit. It implements the paper: Lee, Hwaran, et al. "Deep CNNs Along the Time Axis With Intermap Pooling for Robustness to Spectral Variations." IEEE Signal Processing Letters 23.10 (2016): 1310-1314. [paper] [demo]

We provide:

  1. 2D convolution layer (ConvolutionComponent)
  2. 3D max pooling layer (MaxpoolComponent)
  3. Fully connected layer (FullyConnectedComponent), a plain version that differs from the 'AffineComponentPreconditioned' in nnet2.
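
As a rough illustration of what a 3D max pooling layer computes, the sketch below pools over height, width, and channel on the CPU. It is not the repository's CUDA implementation: the function name MaxPool3D, the channel-fastest memory layout, and the non-overlapping pooling windows are all assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper (not part of kaldi-cnn): non-overlapping 3D max pooling
// over one flattened input of shape height x width x channel.
// Assumed layout: index = (h * width + w) * channel + c (channel varies fastest).
std::vector<float> MaxPool3D(const std::vector<float> &in,
                             int height, int width, int channel,
                             int pool_h, int pool_w, int pool_c) {
  assert(pool_h > 0 && pool_w > 0 && pool_c > 0);
  // Assumption: every dimension is an exact multiple of its pooling size.
  assert(height % pool_h == 0 && width % pool_w == 0 && channel % pool_c == 0);
  assert(in.size() == static_cast<std::size_t>(height) * width * channel);

  const int out_h = height / pool_h;
  const int out_w = width / pool_w;
  const int out_c = channel / pool_c;
  std::vector<float> out(static_cast<std::size_t>(out_h) * out_w * out_c,
                         std::numeric_limits<float>::lowest());

  // Each input element updates the maximum of the output cell it falls into.
  for (int h = 0; h < height; ++h) {
    for (int w = 0; w < width; ++w) {
      for (int c = 0; c < channel; ++c) {
        const std::size_t src =
            (static_cast<std::size_t>(h) * width + w) * channel + c;
        const std::size_t dst =
            (static_cast<std::size_t>(h / pool_h) * out_w + w / pool_w) * out_c +
            c / pool_c;
        out[dst] = std::max(out[dst], in[src]);
      }
    }
  }
  return out;
}
```

For a 2 x 2 x 2 input, a pooling window of (2, 2, 1) reduces each channel to its spatial maximum, while a window of (1, 1, 2) pools only across channels.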

Install

  1. Download and install the Kaldi Speech Recognition Toolkit from [kaldi-git-trunk].

  2. In the file "src/cudamatrix/cu-matrix.h", copy and paste the following declarations as member functions of class CuMatrixBase:

     // Convolve 'this' with kernel => out
     // this matrix: rows = num_chunks, cols = in_height * in_width * in_channel
    
     void Conv2D(const CuMatrixBase<Real> &kernel,
     	int32 in_height,
     	int32 in_width,
     	int32 in_channel,
     	int32 kernel_height,
     	int32 kernel_width,
     	int32 group,
     	CuMatrixBase<Real> *out,
     	bool concat) const;
    
     // if vec = [1 2 3] and rep = 2 => vec2 = [ 1 1 2 2 3 3];
     // this = this * repmat(vec2, NumRows(), 1);
     void AddMatRepVec(const CuVectorBase<Real> &vec, int32 rep) const;
    
     // Flip 2D matrix. this [(kernel_height*kernel_width*in_channel) x group]
     // flip [(kernel_height*kernel_width*group) x in_channel]
     void FlipMat(int32 kernel_height, int32 kernel_width, int32 in_channel, int32 group, CuMatrix<Real> *flip) const;
    
     // zero padding along the edge
     // zero [ (NumRows() + pad_height*2) x (NumCols() + pad_width*2) ]
     void PaddingZero(int32 orig_height, int32 orig_width, int32 orig_channel, int32 kernel_height, int32 kernel_width, CuMatrix<Real> *padmat) const;
    
     void TpBlock(int32 in_channel, int32 block_size, CuMatrix<Real> *out) const;
    
     void TpInsideBlock(int32 group, int32 block_size, CuMatrix<Real> *out) const;
    
     void ModPermuteRow(int32 in_channel, int32 block_size, CuMatrix<Real> *out) const;
    
     void Maxpool_prop(int32 in_height, int32 in_width, int32 pool_height_dim, int32 pool_width_dim, int32 pool_channel_dim, CuMatrixBase<Real> *out) const;
     void Maxpool_backprop(const CuMatrixBase<Real> &out_value, const CuMatrixBase<Real> &out_deriv, CuMatrix<Real> *in_deriv,
                           int32 in_height, int32 in_width, int32 pool_height_dim, int32 pool_width_dim, int32 pool_channel_dim) const;
    
  3. Add the new nnet0 components to nnet2's header and source files.

  • In the file "src/nnet2/nnet-component.cc"
    • add: #include "nnet0/nnet-component-nnet0.h"
    • add the following branches under "Component* Component::NewComponentOfType(const std::string &component_type)":

  } else if (component_type == "ConvolutionComponent") {
    ans = new cnsl::nnet0::ConvolutionComponent();
  } else if (component_type == "MaxpoolComponent") {
    ans = new cnsl::nnet0::MaxpoolComponent();
  } else if (component_type == "FullyConnectedComponent") { 
    ans = new cnsl::nnet0::FullyConnectedComponent();
  }
  • In the file "src/nnet0/nnet-component-nnet0.h"
    • Change ChunkInfo's private member variables to "public"
    • In the class NonlinearComponent, change the 'UpdateStats' function to "public"
  4. Copy the "src/cnslmat" folder into the Kaldi trunk and build it with make.

  5. Copy the "src/nnet0" folder into the Kaldi trunk and build it with make.

  6. In the file "src/nnet2bin/Makefile", add the following entries to ADDLIBS: ../nnet0/cnsl-nnet0.a ../cnslmat/cnsl-cnslmat.a

  7. Build all source files:

     cd ../src
     make
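
The comment on AddMatRepVec in step 2 describes expanding a vector by repeating each element and then applying the result to every row of the matrix. A minimal CPU sketch of that description follows. RepeatEach and ScaleRowsByRepVec are hypothetical names, the row-major std::vector storage is an assumption, and the element-wise multiplication follows the comment literally rather than the actual GPU kernel, which is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: repeat each element of vec `rep` times,
// e.g. vec = [1 2 3], rep = 2 -> [1 1 2 2 3 3].
std::vector<float> RepeatEach(const std::vector<float> &vec, int rep) {
  assert(rep > 0);
  std::vector<float> out;
  out.reserve(vec.size() * static_cast<std::size_t>(rep));
  for (float v : vec)
    out.insert(out.end(), static_cast<std::size_t>(rep), v);
  return out;
}

// Hypothetical helper: scale every row of a row-major matrix element-wise by
// the repeated vector, i.e. mat = mat .* repmat(vec2, num_rows, 1).
// Assumption: the matrix has exactly vec.size() * rep columns.
void ScaleRowsByRepVec(std::vector<float> *mat, std::size_t num_rows,
                       const std::vector<float> &vec, int rep) {
  const std::vector<float> vec2 = RepeatEach(vec, rep);
  const std::size_t num_cols = vec2.size();
  assert(mat->size() == num_rows * num_cols);
  for (std::size_t r = 0; r < num_rows; ++r)
    for (std::size_t c = 0; c < num_cols; ++c)
      (*mat)[r * num_cols + c] *= vec2[c];
}
```

A reference like this can be handy for checking a GPU kernel on small inputs before running a full training job.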

Guide to run the library

  1. To train a CNN, run "local/nnet0/run_nnet.sh". Before running it, place a network configuration file "nnet.conf" in your experiment directory. If the network includes dropout layers, a "dropout_scale.config" file is also required.

Note

  • Implemented by Hwaran Lee (Computational NeuroSystems Labs, KAIST)
  • Developed under KALDI revision 4510
  • Last updated: 2015. 05. 15.
