
honkling's Introduction

Honkling: JavaScript-based Keyword Spotting System

Honkling is a novel web application with an in-browser keyword spotting system implemented with TensorFlow.js.

Honkling can efficiently identify simple commands (e.g., "stop" and "go") in-browser without a network connection. It demonstrates cross-platform speech recognition capabilities for interactive intelligent agents with its pure JavaScript implementation. For more details, please consult our writeup:

Honkling implements a residual convolutional neural network [1] and uses the Speech Commands Dataset for training.

Honkling-node & Honkling-assistant

A Node.js implementation of Honkling is also available under the Honkling-node folder.

Honkling-assistant is a customizable voice-enabled virtual assistant implemented using Honkling-node and Electron.

Details about Honkling-node and Honkling-assistant can be found in:

Personalization

Honkling can be personalized to an individual user by recognizing the user's accent. In our experiments, as few as 5 recordings of each keyword increased accuracy by up to 10%. With a GPU, personalization completes within 8 seconds.

Pre-trained Weights

Pre-trained weights are available at Honkling-models.

Please run the following command to obtain pre-trained weights:

git submodule update --init --recursive

Customizing Honkling

Please refer to the honkling branch of honk to customize the keyword set or train a new model.

Once you obtain a weight file in JSON format using honk, move the file into the weights/ directory and append weights[<weight_id>] = to link it to the weights object.

Depending on the change, config.js has to be updated, and a model object can then be instantiated as let model = new SpeechResModel(<weight_id>, commands);
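The registration step can be pictured with a minimal sketch. Everything below is an assumption made for illustration: the weights object is a stand-in for the one the files under weights/ attach to, the id MY_CUSTOM_MODEL, the layer names, the numeric values, the getWeights helper, and the commands list are all hypothetical and are not taken from the Honkling source.

```javascript
// Hypothetical stand-in for the shared registry that files in weights/ add to.
const weights = {};

// Assumed layout: the JSON exported by honk maps layer names to nested arrays.
// The id, layer names, and values here are illustrative only.
weights['MY_CUSTOM_MODEL'] = {
  conv0: [[0.1, -0.2], [0.3, 0.4]],
  output: [[0.5], [-0.6]],
};

// Hypothetical helper: fail early if a weight id was never registered.
function getWeights(weightId) {
  if (!Object.prototype.hasOwnProperty.call(weights, weightId)) {
    throw new Error(`No weights registered under "${weightId}"`);
  }
  return weights[weightId];
}

// Assumed keyword list; it has to match the list used for training in honk.
const commands = ['silence', 'unknown', 'stop', 'go'];

console.log(Object.keys(getWeights('MY_CUSTOM_MODEL')), commands.length);
```

Checking the id before constructing the model makes a missing or misspelled weight id fail with a clear message instead of surfacing later as a shape error.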

Performance Evaluation

It is possible to evaluate the in-browser neural network inference performance of your device on the Evaluate Performance page of Honkling.

Evaluation is conducted on a subset of the validation and test sets used in training. Once the evaluation is complete, it will generate reports on input processing time (MFCC) and inference time.
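As a rough illustration of how the two reported numbers could be collected, the sketch below averages wall-clock time over repeated calls. It is not the code behind the Evaluate Performance page: averageTimeMs, fakeInputProcessing, and fakeInference are hypothetical names, and the two workloads are placeholders for the real MFCC extraction and model prediction.

```javascript
// Use the high-resolution timer when available; fall back to Date.now().
const now = (typeof performance !== 'undefined' && typeof performance.now === 'function')
  ? () => performance.now()
  : () => Date.now();

// Hypothetical helper: mean wall-clock time in milliseconds of `fn` over `runs` calls.
function averageTimeMs(fn, runs) {
  const start = now();
  for (let i = 0; i < runs; i++) {
    fn();
  }
  return (now() - start) / runs;
}

// Placeholder workloads (assumptions): swap in the real MFCC extraction
// and the real model prediction call.
function fakeInputProcessing() {
  let sum = 0;
  for (let i = 0; i < 10000; i++) sum += Math.sqrt(i);
  return sum;
}

function fakeInference() {
  let acc = 0;
  for (let i = 0; i < 10000; i++) acc += Math.tanh(i / 10000);
  return acc;
}

const report = {
  inputProcessingMs: averageTimeMs(fakeInputProcessing, 50),
  inferenceMs: averageTimeMs(fakeInference, 50),
};
console.log(report);
```

When timing a real TensorFlow.js model, the measured function should also read the result back (for example with dataSync()), since GPU-backed operations may otherwise return before the computation has finished.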

As part of our research, we explored the network slimming [2] technique to analyze trade-offs between accuracy and inference latency. With Honkling, it is possible to evaluate the performance of a pruned model as well!

The following is the evaluation result on a MacBook Pro (2017) with Firefox:

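| Model | Amount Pruned (%) | Accuracy (%) | Input Processing (ms) | Inference (ms) |
| --- | --- | --- | --- | --- |
| RES8-NARROW | - | 90.78 | 21 | 10 |
| RES8-NARROW-40 | 40 | 88.99 | 21 | 9 |
| RES8-NARROW-80 | 80 | 84.90 | 22 | 9 |
| RES8 | - | 93.96 | 23 | 24 |
| RES8-40 | 40 | 93.99 | 23 | 17 |
| RES8-80 | 80 | 91.66 | 22 | 11 |
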
  • Note that WebGL is disabled on Chrome and enabled on Firefox by default
  • Honkling uses RES8-NARROW
  • Details on model architecture can be found in the paper

Reference

  1. Raphael Tang and Jimmy Lin. Deep Residual Learning for Small-Footprint Keyword Spotting. Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2018), pages 5484-5488.
  2. Zhuang Liu, Jianguo Li, Zhiqiang Shen, Gao Huang, Shoumeng Yan, Changshui Zhang. Learning Efficient Convolutional Networks through Network Slimming. Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV 2017), pages 2755-2763.

honkling's People

Contributors

daemon, dependabot[bot], lintool, ljj7975, masatoprc

honkling's Issues

Weight file format

The weight file created by training.py does not have the same format as node_js/src/weights/RES8_NARROW.js. How can the file created by training.py be converted into a format similar to node_js/src/weights/RES8_NARROW.js? Thanks.

Input shape mismatch error during training

I am getting the following error:
python training.py -d ../data/go/ -cgo
tensorflow 1.12.0
keras 2.2.4

training model with learning rate = 0.1, num epochs = 26, batch size = 64
training start time = 2019-02-22 15:54:26
Traceback (most recent call last):
  File "training.py", line 445, in <module>
    main()
  File "training.py", line 410, in main
    verbose=0)
  File "/home/psasatte/.local/lib/python2.7/site-packages/keras/engine/training.py", line 952, in fit
    batch_size=batch_size)
  File "/home/psasatte/.local/lib/python2.7/site-packages/keras/engine/training.py", line 751, in _standardize_user_data
    exception_prefix='input')
  File "/home/psasatte/.local/lib/python2.7/site-packages/keras/engine/training_utils.py", line 128, in standardize_input_data
    'with shape ' + str(data_shape))
ValueError: Error when checking input: expected input to have 4 dimensions, but got array with shape (0, 1)

2048 game demo

Let's build a voice-controlled 2048 demo using the system we currently have...

Find an online impl of 2048 and let's slap our KWS front-end on it.

For the initial impl, it's okay to press "space" every time to make a move...

UI webpage tweaks

@ljj7975 Can we add a spectrograph visualization to the web page? Sometimes the demo doesn't work, and I'm not sure if it's because the microphone isn't working or because of some other issue. A spectrograph vis would help...

Building a script for checking accuracy of the model in the browser

To personalize the model with extra browser-side training, we need a way to evaluate the accuracy of the model on a given set of test data.

Steps may include

  1. a Python script to generate a JSON file with file names and corresponding keyword labels
  2. generating an audio tag with the list of test files
  3. playing each audio file and retrieving its MFCC
  4. predicting the label and comparing it with the actual label
  5. collecting the results and displaying them
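
Steps 4 and 5 above could look roughly like the following sketch. The manifest format ({ file, label } entries), the file names, the evaluate function, and the lookup-based fakePredict are all assumptions made for illustration; in the real flow, the predictor would play the audio, extract the MFCC, and run the model.

```javascript
// Assumed manifest format from step 1: one { file, label } entry per test clip.
const manifest = [
  { file: 'stop_01.wav', label: 'stop' },
  { file: 'stop_02.wav', label: 'stop' },
  { file: 'go_01.wav', label: 'go' },
  { file: 'go_02.wav', label: 'go' },
];

// Hypothetical evaluation loop: compare each prediction with the actual label
// and collect both the accuracy and the individual mistakes.
function evaluate(entries, predict) {
  let correct = 0;
  const mistakes = [];
  for (const { file, label } of entries) {
    const predicted = predict(file);
    if (predicted === label) {
      correct += 1;
    } else {
      mistakes.push({ file, expected: label, predicted });
    }
  }
  const total = entries.length;
  return { total, correct, accuracy: total === 0 ? 0 : correct / total, mistakes };
}

// Placeholder predictor (assumption): a fixed lookup with one deliberate error,
// standing in for "play audio, compute MFCC, run the model".
const fakeOutputs = {
  'stop_01.wav': 'stop',
  'stop_02.wav': 'go',
  'go_01.wav': 'go',
  'go_02.wav': 'go',
};
const fakePredict = (file) => fakeOutputs[file];

const result = evaluate(manifest, fakePredict);
console.log(result.accuracy, result.mistakes);
```

Keeping the list of mistakes alongside the accuracy makes it easier to see which keywords the personalized model still confuses.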
