lilianemomeni / kws-net

60.0 60.0 12.0 2.96 MB

Seeing Wake Words: Audio-visual Keyword Spotting

License: MIT License

Python 99.66% Shell 0.34%

kws-net's Issues

training environment and time

Thanks for this amazing work!

May I ask what GPU environment you used and how long training takes?

Audio only, Audio-Visual implementation availability

The implementations for audio-only training and combined audio-visual training seem to be missing from this repository. Could you please point me to where I can access them so I can reproduce the experiments from the paper?

Cheers!

The processed features

Thanks for this amazing work!

The data pre-processing stages are a bit confusing to me.
Could you please share the processed features? That way I can avoid differences in results caused by the feature extraction step.

Colab for easy replication

First, I have to say this is truly amazing work; I can see use cases beyond what you present.

I know you have a very extensive README, but I was wondering if you could make it a bit simpler to run. Specifically, could you provide a Google Colab notebook that:

installs the necessary apt and pip packages
downloads the pre-trained models
allows inputting a video path/url and a list of words to spot
runs the demo and spots these words

Such a notebook would be ideal for quick experimentation with your models. For me, for example, it would allow testing additional languages that I'm interested in (besides English, French, and German).

Audio only model preprocessing

Hi, thanks for sharing this amazing work!

After finding that the repository does not contain a pre-trained audio-only KWS model or its model class, I've been trying to train an audio-only KWS model on my own.

I extracted mel-spectrograms, and I'm wondering whether you applied any kind of normalization. I would much appreciate it if you could share the type of normalization you used, or the rough value range of the input mel-spectrogram data.
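
To make the question concrete, here is a minimal sketch of one common choice: log compression followed by per-utterance mean and variance normalization of each mel band. This is an assumption for illustration only, not the preprocessing confirmed by the authors. The function name `normalize_log_mel`, the `eps` value, and the 40-band random input are all hypothetical.

```python
import numpy as np


def normalize_log_mel(mel_power, eps=1e-8):
    """Log-compress a mel spectrogram and normalize each band over time.

    mel_power: array of shape (n_mels, n_frames) with non-negative values.
    Returns an array of the same shape in which every mel band has
    approximately zero mean and unit variance across frames.
    """
    mel_power = np.asarray(mel_power, dtype=np.float64)
    # Log compression; eps avoids log(0) on silent bins.
    log_mel = np.log(mel_power + eps)
    # Statistics are computed per mel band, over the time axis.
    mean = log_mel.mean(axis=1, keepdims=True)
    std = log_mel.std(axis=1, keepdims=True)
    return (log_mel - mean) / (std + eps)


# Hypothetical input: random non-negative values standing in for a
# real mel spectrogram with 40 bands and 100 frames.
rng = np.random.default_rng(0)
fake_mel = rng.random((40, 100))
features = normalize_log_mel(fake_mel)
```

With this scheme the input range depends only on the utterance itself, so features from different recordings end up on a comparable scale. Whether the original model expects this or a global (dataset-level) normalization is exactly what the question above asks.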

Hope you have a great day, bye!
