Git Product home page Git Product logo

iwatake2222 / pico-loud_talking_detector Goto Github PK

View Code? Open in Web Editor NEW
20.0 4.0 0.0 10.76 MB

A tinyML system using a Raspberry Pi Pico and TensorFlow Lite for Microcontrollers to detect loud talking. It can be utilized to encourage people to eat quietly to prevent the spread of the coronavirus and help in the fight against COVID

License: Apache License 2.0

Python 0.99% CMake 0.03% C++ 4.55% C 0.91% Jupyter Notebook 93.52%
raspberry-pi-pico tinyml cpp raspberrypi tensorflow tensorflowlite

pico-loud_talking_detector's Introduction

Loud Talking Detector in FRISK

  • This tinyML system uses a Raspberry Pi Pico and TensorFlow Lite for Microcontrollers to detect loud talking. It can be utilized to encourage people in restaurants/cafes to eat quietly to prevent the spread of the coronavirus and help in the fight against COVID
    • It detects "talking" when people talk loudly
    • It doesn't detect "talking" when people talk quietly or the sound is not talking (e.g. noise, music, etc.)

00_doc/pic.jpg

YouTube

00_doc/youtube.jpg

System overview

  • Deep Learning Model
    • A deep learning model is created to classify 10 or 5 seconds of audio to two types of sound ("Talking", "Not Talking")
      • Change the value of CLIP_DURATION and kClipDuration to switch clip duration (10sec/5sec)
    • The model is converted to TensorFlow Lite for Microcontrollers format
    • The training runs on Google Colaboratory
  • Device
    • The model is deployed to a Raspberry Pi Pico
    • A microphone and a display are connected to the Raspberry Pi Pico
    • The Raspberry Pi Pico captures sound from the microphone, judges whether it's loud talking and outputs a result to the display

system_overview.png

Sound Category

  • Not Talking:
    • Quiet
    • Voice from a distance
    • Music
    • Noise
    • Others
  • Talking:
    • Voice
    • Voice (more than one person)
    • Voice + Music
    • Voice + Noise

type_category.png

How to make

Components

Connections

00_doc/connections.txt

How to Build

git clone https://github.com/iwatake2222/pico-loud_talking_detector.git
cd pico-loud_talking_detector
git submodule update --init
cd pico-sdk && git submodule update --init && cd ..
mkdir build && cd build

# For Windows Visual Studio 2019 (Developer Command Prompt for VS 2019)
# cmake .. -G "NMake Makefiles" -DCMAKE_BUILD_TYPE=Debug -DPICO_DEOPTIMIZED_DEBUG=on
cmake .. -G "NMake Makefiles"
nmake

# For Windows MSYS2 (Run the following commands on MSYS2)
# cmake .. -G "MSYS Makefiles" -DCMAKE_BUILD_TYPE=Debug -DPICO_DEOPTIMIZED_DEBUG=on
cmake .. -G "MSYS Makefiles" 
make

How to Debug on PC

  • If you want to debug this project on PC, please run cmake in pj_loud_talking_detector/pj_loud_talking_detector directory
  • The created project uses TestBuffer instead of microphone. You can change audio data by modifying C array in test_audio_data.h
  • wave2array.py is useful to convert wave file to C array.
  • I tested on Visual Studio 2019 and I'm not sure it works on other environments

About Deep Learning Model

How to Create a Deep Learning Model

  • Run the training script 01_script/training/train_micro_speech_model_talking_10sec.ipynb on Google Colaboratory. It takes around 10 hours to train the model using GPU instance
  • The original script is https://colab.research.google.com/github/tensorflow/tensorflow/blob/master/tensorflow/lite/micro/examples/micro_speech/train/train_micro_speech_model.ipynb
  • I made some modifications:
    • Use my dataset
    • Mix noise manually:
      1. Prepare original data: [Talking]
      2. Mix background: [Talking, Talking + Background]
      3. Mix noise: [Talking, Talking + Background, Talking + Noise, Talking + Background + Noise]
    • Separate test data from training data completely
      • Some clips are generated from the same video, so data leakage may happen if I randomly separate data following the original script
    • Change the wanted word list from [yes, no] to [talking, not_talking]
    • Remove "SILENCE" and "UNKNOWN" category
      • Because "SILENCE" and "UNKNOWN" are parts of "Not Talking"
    • Change clip duration from 1 sec to 10 sec
    • Increase training steps
  • Note: You cannot run the script because data download will fail. I don't share dataset due to copyright restrictions

Dataset

  • Details
    • Talking:
      • Talk show from YouTube
      • TV show
    • Not Talking:
    • Background (for augmentation and "Not Talking")
      • Restaurant / Coffee shop ambience
    • Noise (for augmentation and "Not Talking")
      • White noise, pink noise
  • The number of data
    • 10 sec model
      • Talking: 22,144
      • Not Talking: 20,364
    • 5 sec model
      • Talking: 38,854
      • Not Talking: 36,557

Software Design

Ths original project is from https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/micro/examples/micro_speech .

Dataflow

dataflow.png

Modules

modules.png

  • AudioBuffer:
    • provides an interface to access storead audio data in a ring block buffer
    • has three implementations: ADC (for analog mic connected to ADC), PDM (for PDM mic), TestBuffer (prepared data array). I use PDM in this project
  • RingBlockBuffer:
    • consists of some blocks. The block size is 512 Byte and the size is equal to DMA's transfer size and the size is equal to the buffer size for PDM mic module
    • 512 Byte ( 32 msec @16kHz ) is also convenient to work with FeatureProvider which generates feature data using 30 msec of audio data at 20 msec intervals
  • AudioProvider:
    • copies data from the ring block buffer to the local buffer for the requested time
    • converts data from uint8_t to int16_t if needed
    • allocates the data on sequential memory address
  • FeatureProvider:
    • almost the same as the original code.
  • Judgement:
    • judges whether the captured sound is "talking" using the following conditions:
      • current score of "talking" >= 0.8
      • average score of "talking" >= 0.6
      • decibel >= -20 [dB] (0 [dB] is the max value of input (65536))

Performance

10 sec model 5 sec model
Accuracy 96.1 [%] 94.8 [%]
Processing time --- ---
__Total 554 [msec] 298 [msec]
__Preprocess 61 [msec] 31 [msec]
__Inference 455 [msec] 226 [msec]
__Other 38 [msec] 41 [msec]
Power consumption 3.3 [V] x 21 [mA] 3.3 [V] x 22 [mA]

Note: Power consumption is measured without OLED (with OLED, it's around 26 [mA]).

Future works

order_call.png

  • This is a very tiny system ( fittiing in FRISK ! ) , so that it can be implemented in an order call system in restaurants to encourage customers to eat quietly to prevent the spread of the coronavirus
  • Need to reduce power consumption
    • Current system continuously captures audio and runs inference. However, a quick response is not so important for many cases. The frequency of inference can be decreased, probably once every several seconds or once a minute is enouhgh
    • Or using an analog circuit to check voice level and kick pico in sleep mode may be a good idea
  • Need to improve accuracy
    • So far, the training data is very limited (most of them are Japanese)

Acknowledgements

pico-loud_talking_detector's People

Contributors

iwatake2222 avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.