bedangsen / voicesens

A Voice Biometric Application using Watson Speech to Text

License: Apache License 2.0

Python 28.93% JavaScript 35.99% CSS 5.00% HTML 30.08%

speech watson-speech python voice-biometric-solutions cybersecurity

voicesens's Introduction

VoiceSens - Adding Voice Biometrics to your Application

VoiceSens is a text-independent voice biometric solution developed to address some of the shortcomings of standard authentication techniques such as passwords and PIN codes, as well as of currently available voice biometric solutions. The solution is developed in Python and uses Watson Speech to Text for speech recognition.

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Install and set up Python 3.
Sign up for an IBM Cloud account.
Create an instance of the Speech to Text service and get your credentials:
- Go to the Speech to Text page in the IBM Cloud Catalog.
- Log in to your IBM Cloud account.
- Click Create.
- Click Show to view the service credentials.
- Copy the iam_apikey and url values.

Configuring the application

Open the sample_config.py file and replace the APIKEY and URL placeholders with the iam_apikey and url values of your Speech to Text service. Then rename the file to config.py.

APIKEY = "APIKEY"  
URL = "URL"
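
As a minimal sketch, assuming config.py exposes only the two names shown above (APIKEY and URL), a check like the following can catch a config file that still holds the placeholder values before the application calls the service. The helper name `validate_config` is hypothetical and is not part of the project.

```python
def validate_config(apikey, url):
    """Return a list of problems found in the given config values.

    Hypothetical helper: it only checks that the placeholders from
    sample_config.py ("APIKEY" and "URL") have been replaced.
    """
    problems = []
    if not apikey or apikey == "APIKEY":
        problems.append("APIKEY is still the placeholder value")
    if not url or url == "URL":
        problems.append("URL is still the placeholder value")
    return problems


if __name__ == "__main__":
    # With the untouched sample values, both problems are reported.
    for problem in validate_config("APIKEY", "URL"):
        print(problem)
```

In practice the two arguments would come from `import config` (that is, `config.APIKEY` and `config.URL`) once the file has been renamed.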

Running locally

Clone the repository.

git clone https://github.com/bedangSen/VoiceSens.git

Move into the project directory.
```
cd VoiceSens
```
(Optional) Run it in a virtual environment.
1. Download and install virtualenv.
```
pip install virtualenv
```
1. Create the virtual environment in Python 3.
```
virtualenv -p path\to\your\python.exe test_env
```
1. Activate the test environment.
  1. For Windows:
```
test_env\Scripts\Activate
```
  1. For Unix:
```
source test_env/bin/activate
```
Install the required libraries listed in the requirements.txt file.
```
pip install -r requirements.txt
```
Run the application.
```
python voice.py
```
Go to http://localhost:8080

Demo

1. VoiceSens Homepage

The first thing that you see when you open the web page are two options:

Enroll a new user
Authenticate an existing user

2. Enrollment Page

If you haven't created a voice sample, the first step is to create an account and enroll your voice samples. The model then generates a voice print from the voice samples provided.

3. Authentication Page

Once you have created an account, you can authenticate yourself by recording a voice sample, generating a voice print, and then comparing that voice print to the voice prints in the database.

4. Voice Biometrics Page

When you record your voice sample, the first thing you do is record the environmental sound. This creates a baseline for noise in the following recording, increasing the accuracy of your results. Once you are done with that you can proceed with reciting the randomly generated words. If the fuzzy matching ratio between the generated words and recognised words is less than 65, the recorded voice phrase will not be accepted, and you will be asked to record your voice sample again.
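
The acceptance rule described above can be sketched as follows. This is an illustration only, not the project's code: it uses `difflib.SequenceMatcher` from the standard library as a stand-in for the Fuzzy Wuzzy ratio, and the function names and the way words are joined into a phrase are assumptions. Only the threshold of 65 comes from the text.

```python
from difflib import SequenceMatcher

ACCEPT_THRESHOLD = 65  # threshold quoted in the description above


def match_ratio(expected, recognised):
    """Similarity of two strings on a 0-100 scale (stand-in for a fuzzy ratio)."""
    return int(round(100 * SequenceMatcher(None, expected, recognised).ratio()))


def phrase_accepted(generated_words, recognised_words):
    """Accept the recording only if the ratio is not below the threshold."""
    expected = " ".join(generated_words).lower()
    recognised = " ".join(recognised_words).lower()
    return match_ratio(expected, recognised) >= ACCEPT_THRESHOLD


if __name__ == "__main__":
    words = ["apple", "river", "stone"]
    print(phrase_accepted(words, ["apple", "river", "stone"]))
    print(phrase_accepted(words, ["zzz"]))
```

A recording whose recognised words fall below the threshold would be rejected and the user asked to record again.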

Key Components

IBM Watson Speech to Text - The Speech to Text Service used.
SciPy - SciPy is a Python-based ecosystem of open-source software for mathematics, science, and engineering.
Speech Recognition - Library for performing speech recognition, with support for several engines and APIs, online and offline.
Python Speech Features - This library provides common speech features for ASR including MFCCs and filterbank energies.
Fuzzy Wuzzy - Fuzzy string matching like a boss. It uses Levenshtein Distance to calculate the differences between sequences in a simple-to-use package.
Random Words - A simple Python package to generate random English words.
Scikit-learn Gaussian Mixture Models - sklearn.mixture is a package that enables one to learn Gaussian Mixture Models.

To Do

Hash the audio files and sign them with the client's private key, to prevent man-in-the-middle attacks.
Improve the accuracy of the GMM model.
Add solution architecture.
Store the models in secure object storage.

voicesens's Issues

Always Showing "Scanning for Environmental Sound"

The app always shows "Scanning for Environmental Sound", and after I tried to enroll using my voice it shows "Analyzing audio" for a long time.

WML API

Look into how to deploy the created model as a WML model and generate an endpoint for scoring, so that it can be used as an API.

Only One Person Enrolled

Really nice project! I'd like to ask about the case where only one person is enrolled as a user. If that person logs in and then a different person speaks into the mic, the "max" will always return the same person, so there will always be a match. Is there a threshold for the log likelihood that we can set, so that at some point none of the enrolled users match the user that's speaking?
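
The thresholding idea raised in this question can be sketched as below. This is an assumption about how such a check might look, not the project's implementation: the function name, the shape of `scores` (a dict from user name to the log likelihood of the sample under that user's model), and the threshold value are all hypothetical, and a real threshold would need to be tuned on recorded data.

```python
def identify_speaker(scores, threshold):
    """Return the best-scoring user, or None if no score reaches the threshold.

    scores: dict mapping user name -> log likelihood of the recorded sample
            under that user's model (hypothetical input format).
    threshold: minimum log likelihood required to accept a match.
    """
    if not scores:
        return None
    best_user = max(scores, key=scores.get)
    if scores[best_user] < threshold:
        # Even the closest enrolled user is too unlikely: reject the speaker.
        return None
    return best_user


if __name__ == "__main__":
    # With one enrolled user, a plain max would always return "alice";
    # the threshold lets a poor score be rejected instead.
    print(identify_speaker({"alice": -40.0}, threshold=-30.0))
    print(identify_speaker({"alice": -20.0}, threshold=-30.0))
```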

Storing the models in a secure Object storage

How to create My own gmm models for voice analyzing

When a user enrolls, the application creates a Users directory, but during verification it looks in Models. How am I supposed to create my own .gmm files so that I can be validated as a user?

Performance Monitoring

Configure performance monitoring to evaluate and retrain the model periodically to ensure the model performance is acceptable. You will need an existing PostgreSQL or IBM Db2 Warehouse on Cloud connection associated with your project to be used as your feedback data connection.

Can this be used in the project?

Microphone not detected during the execution

The browser does not ask for microphone access when I run the program. During User Voice Print Enrollment, after I click Start Recording, the microphone doesn't seem to activate. Please help.

Stuck in "Scanning for Environmental Sound"

Hey there,
I'm trying to set up the demo of this awesome project but I'm experiencing a bug. I've managed to add my IBM credentials and start up the Flask server, but when I try to complete the enrollment process, after pressing the "Start recording" button, the web app stays stuck on "Scanning for Environmental Sound" for much longer than 5 seconds. I'm silent and I can see sound waves in the web app, but nothing happens. I don't know if I'm supposed to start speaking or do anything else.

Thanks for your time and hope to get this working!

Hashing the audio files

Hash the audio files and sign them with the client's private key, to prevent man-in-the-middle attacks.

Suggest to loosen the dependency on fuzzywuzzy

Dear developers,

Your project VoiceSens requires "fuzzywuzzy==0.17.0" in its dependencies. After analyzing the source code, we found that the following versions of fuzzywuzzy are also suitable without affecting your project: fuzzywuzzy 0.8.0, 0.8.1, 0.8.2, 0.9.0, 0.10.0, 0.11.0, 0.11.1, 0.12.0, 0.13.0, 0.14.0, 0.15.0, 0.15.1, 0.16.0, 0.18.0. Therefore, we suggest loosening the dependency on fuzzywuzzy from "fuzzywuzzy==0.17.0" to "fuzzywuzzy>=0.8.0,<=0.18.0" to avoid possible conflicts when importing more packages or for downstream projects that may use VoiceSens.

May I open a pull request to loosen the dependency on fuzzywuzzy?

By the way, could you please tell us whether this kind of dependency analysis would help make dependencies easier to maintain during your development?

Details:

Your project (commit id: 92c6def) directly uses 2 APIs from package fuzzywuzzy.

fuzzywuzzy.fuzz.partial_ratio, fuzzywuzzy.fuzz.ratio

Starting from these, 11 functions are then called indirectly, including 7 of fuzzywuzzy's internal APIs and 4 outside APIs, as follows:

[/bedangSen/VoiceSens]
+--fuzzywuzzy.fuzz.partial_ratio
|      +--fuzzywuzzy.utils.make_type_consistent
|      +--difflib.SequenceMatcher
|      +--fuzzywuzzy.StringMatcher.StringMatcher.__init__
|      |      +--warnings.warn
|      |      +--fuzzywuzzy.StringMatcher.StringMatcher._reset_cache
|      +--difflib.SequenceMatcher.get_matching_blocks
|      +--fuzzywuzzy.StringMatcher.StringMatcher.get_matching_blocks
|      |      +--fuzzywuzzy.StringMatcher.StringMatcher.get_opcodes
|      +--difflib.SequenceMatcher.ratio
|      +--fuzzywuzzy.StringMatcher.StringMatcher.ratio
|      +--fuzzywuzzy.utils.intr
+--fuzzywuzzy.fuzz.ratio
|      +--fuzzywuzzy.utils.make_type_consistent
|      +--difflib.SequenceMatcher
|      +--fuzzywuzzy.StringMatcher.StringMatcher.__init__
|      +--fuzzywuzzy.utils.intr
|      +--difflib.SequenceMatcher.ratio
|      +--fuzzywuzzy.StringMatcher.StringMatcher.ratio

None of these functions changed between 0.17.0 and any of the following versions of package "fuzzywuzzy": [0.8.0, 0.8.1, 0.8.2, 0.9.0, 0.10.0, 0.11.0, 0.11.1, 0.12.0, 0.13.0, 0.14.0, 0.15.0, 0.15.1, 0.16.0, 0.18.0]. Therefore, we believe it is safe to loosen the corresponding dependency.

background_voice.wav file

The background_voice.wav file is missing from static/audio/background_voice.wav.
Can you please say what type of file it is?

Integrate Watson Speech to Text directly into the application

Microphone not detected

/verify

When signing in with an existing user, the app brings me back to the enrollment page instead of the verify page. When I try to go to the verify page manually, it produces this error. Does anyone know how to resolve this?

Traceback (most recent call last):
  File "/home/michael/.local/lib/python3.8/site-packages/flask/app.py", line 2091, in __call__
    return self.wsgi_app(environ, start_response)
  File "/home/michael/.local/lib/python3.8/site-packages/flask/app.py", line 2076, in wsgi_app
    response = self.handle_exception(e)
  File "/home/michael/.local/lib/python3.8/site-packages/flask/app.py", line 2073, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/michael/.local/lib/python3.8/site-packages/flask/app.py", line 1518, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/michael/.local/lib/python3.8/site-packages/flask/app.py", line 1516, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/michael/.local/lib/python3.8/site-packages/flask/app.py", line 1502, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
  File "/home/michael/VoiceSens/voice.py", line 291, in verify
    (rate, signal) = scipy.io.wavfile.read(filename_wav)
  File "/home/michael/.local/lib/python3.8/site-packages/scipy/io/wavfile.py", line 647, in read
    fid = open(filename, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: ''
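
The last frame shows that the path passed to the WAV reader was an empty string, which suggests that no recording had been saved before the verify page was opened. A minimal sketch of a guard for that case follows. The helper name `resolve_wav_path` is hypothetical; only the variable name `filename_wav` comes from the traceback, and how voice.py actually sets it is not shown here.

```python
import os


def resolve_wav_path(filename_wav):
    """Return filename_wav if it names an existing file, otherwise None.

    Hypothetical guard: a caller could check the result and redirect the
    user back to the recording step instead of trying to read the file.
    """
    if not filename_wav:
        # Empty string or None: nothing has been recorded yet.
        return None
    if not os.path.isfile(filename_wav):
        return None
    return filename_wav


if __name__ == "__main__":
    print(resolve_wav_path(""))
```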

Create a front end webpage

Make a pull request to the SpeechRecognition library to update the ibm_recognize service to the updated IBM credentials service.

list index out of range

I get an error at line 198 of voice.py, at recognised_words.
Once the audio detection completes, I get a "list index out of range" error.
Please help me with this.

Add solution architecture

Failed installing

Running on an M1 MacBook Pro, I encountered this while following the installation instructions:

 Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Preparing metadata (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [4 lines of output]
      <string>:12: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
      error in pandas setup command: 'install_requires' must be a string or list of strings containing valid project/version requirement specifiers; Expected end or semicolon (after version specifier)
          pytz >= 2011k
               ~~~~~~~^
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.

I recorded my voice sample, but when I try to authenticate my voice print, the app reports that no user was found even though the user is present on my system. The message reads: "This username does not exist. Please try another user name, or enroll the user".