
swanton's Introduction

swanton

Swanton Pacific Ranch chatbot with a knowledge graph on a Raspberry Pi. Please refer to the Cal Poly CSAI Docs for a conceptual understanding of the project and this repository's wiki for more information about setup and operation.

Getting Started

Step 1 - Set up a fresh Debian 10 or Raspbian image

On a Raspberry Pi, we recommend using the AIY Voice Kit image from here for your SD card.

sudo apt update \
    && sudo apt install -y git \
    && git clone https://github.com/calpoly-csai/swanton \
    && cd swanton \
    && ./debian_setup.sh
    
source $HOME/.poetry/env

Step 2 - Verify Versions

python3 >= 3.6

$ python3 --version
Python 3.7.3

pip3 >= 20 using python >= 3.6

$ pip3 --version
pip 20.1.1 from /usr/local/lib/python3.7/dist-packages/pip (python 3.7)

poetry

$ poetry --version
Poetry version 1.0.10

Step 3 - Install Python Dependencies

poetry install

swanton's People

Contributors

chidiewenike, gwholland3, jason-ku, masonmcelvain, mfekadu, snekiam, taylor-nguyen-987

Forkers

mjwesolo

swanton's Issues

TTS Implementation to RPi

Objective

Take the TTS module created by @gwholland3 and implement it into the RPi4 with audio output.

Key Result

Deploy text_to_speech.py to the Raspberry Pi.

Details

Remove generic data in mapping JSON

Objective

Remove the generic data that is generated when running the Speech Generator program.

Key Result

Removes the dependence on the mapper for randomly selecting responses, which Rasa Core now handles natively.

Details

The current iteration of the Speech Generator traverses through the generic utterances CSV and creates a mapping of these utterances to be randomly selected by the voice assistant.

Add a comment portion to every story

Objective

Modify the story conversion code to add a comment portion to each story.

Key Result

Obtain feedback from users after every story to improve the system.

Details

To make improvements to the system, we need to obtain feedback from users. To accomplish this, we will add a comment portion to each story to ask users if they would like to provide feedback about their experience.

Bash script to monitor a directory for code updates

Objective

We want to be able to update the model and code by pushing a file to a pre-defined directory.

Key Result

Code on the Pi, as well as models, should be updatable.

Details

The zip file can be formatted however the implementer would like, but I'd suggest that it simply be a zip of the GitHub repo.
I'd suggest using a cron job to monitor the target directory. Once a new file is dropped in, unzip it and copy some target files (we don't have a definitive list yet) to a target directory (also not defined yet).
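
The polling step described above can be sketched as follows. This is only an illustration: it uses Python rather than Bash to match the rest of the repository, the names `apply_pending_updates`, `drop_dir`, and `target_dir` are hypothetical, and it extracts the whole archive because the list of target files is not yet defined.

```python
import zipfile
from pathlib import Path


def apply_pending_updates(drop_dir, target_dir):
    """Unpack every zip archive found in drop_dir into target_dir.

    Returns the names of the archives that were applied, oldest first.
    Both directory arguments are placeholders; the issue has not fixed them yet.
    """
    drop_dir, target_dir = Path(drop_dir), Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    applied = []
    # Apply archives in modification-time order so the newest upload wins.
    for archive in sorted(drop_dir.glob("*.zip"), key=lambda p: p.stat().st_mtime):
        if not zipfile.is_zipfile(archive):
            continue  # skip partially uploaded or corrupt files
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target_dir)
        archive.unlink()  # delete the archive so it is not applied twice
        applied.append(archive.name)
    return applied
```

A cron job could call this function once a minute. Because each archive is deleted after extraction, repeated runs are harmless.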


Analyze Deepspeech Feedback Transcriptions

Objective

Generate a dataset of feedback users could give to the chatbot and log its transcription.

Key Result

A CSV with the ground-truth feedback and how the feedback was transcribed using Deepspeech.

Details

We want to obtain feedback from users to analyze the user experience and make improvements. One solution is to store the raw audio response and pull that data for analysis. The problem is that audio data is large compared to text, so transferring it could take a long time. The easiest approach is to transcribe the audio to text and log the transcription.

You will need the following DeepSpeech model and DeepSpeech scorer to use run_stt. Ensure that these files are in the same directory as the stream_deepspeech.py program.

Example of feedback dialogue:
Chatbot: "Would you like to provide feedback about your experience?"
User: "Sure."
Chatbot: "Okay, please let me know how your experience was."
User: "Um..." (pauses for 3 seconds to think) "My experience was good. The assistant answered my question and also shared something about the Cheese House that I did not know."

(The last user response is the type of feedback data you will need to generate and test Deepspeech with.)
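
One possible shape for the resulting CSV is sketched below. The column names and the `word_errors` column are assumptions added for illustration (the issue only asks for the ground truth and the transcription). The transcriptions are passed in as plain strings so the sketch does not depend on how `run_stt` is invoked.

```python
import csv


def word_errors(truth, hypothesis):
    """Word-level Levenshtein distance between two sentences (case-insensitive)."""
    ref, hyp = truth.lower().split(), hypothesis.lower().split()
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,                          # word missing from the transcription
                cur[j - 1] + 1,                       # extra word in the transcription
                prev[j - 1] + (ref_word != hyp_word)  # substitution, or a match at no cost
            ))
        prev = cur
    return prev[-1]


def write_feedback_report(pairs, csv_path):
    """Write (ground_truth, transcription) pairs to csv_path, one row per pair."""
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["ground_truth", "transcription", "word_errors"])
        for truth, transcription in pairs:
            writer.writerow([truth, transcription, word_errors(truth, transcription)])
```

The error count gives a rough way to sort the rows and find the feedback sentences that DeepSpeech handles worst.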

Improve Speech-To-Text (ML Approach)

Objective

The DeepSpeech Speech-To-Text system needs to be improved to handle uncommon & non-English words. The machine learning approach is to retrain the DeepSpeech model with new audio data and analyze the results.

Key Result

Using the run_stt function of stream_deepspeech.py, return a string of audio input that is correctly transcribed.

def run_stt(time_len=TIME_LEN):

Details

Correctly transcribe all QA pairs from the question-answer pairs Google Sheet. To train a new DeepSpeech model, you can follow these instructions.

You will need the following DeepSpeech model and DeepSpeech scorer to use run_stt.

If in need of assistance, please ask @chidiewenike

Additional Resources

Text-to-Speech Solution Benchmarking

Objective

Generate metrics for the most viable offline/local text-to-speech solution and graph the results.

Key Result

Separate graphs for memory usage and run time per library.

Details

Run the answer for each QA pair in the given document through each library and record the runtime and memory metrics. Do three runs per string and also record the average of the three runs. Check out the header example below. You could write the results to a CSV and then graph them in Excel. @chidiewenike can help you with the graphing step if needed.

Additional context

Ex Header per library (each column is delimited with a | ):
String | (Library)-Runtime#1 | (Library)-Runtime#2 | (Library)-Runtime#3 | (Library)-Runtime Avg | (Library)-Memory Usage#1 | (Library)-Memory Usage#2 | (Library)-Memory Usage#3 | (Library)-Memory Usage Avg|
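
A minimal sketch of how these columns could be produced is shown below. It assumes each library is wrapped in a hypothetical `synthesize(text)` callable. It also measures memory with `tracemalloc`, which only sees allocations made through Python, so a TTS engine that allocates in native code would need a process-level measurement instead.

```python
import csv
import time
import tracemalloc
from statistics import mean

RUNS = 3  # three runs per string, as described above


def benchmark(synthesize, text, runs=RUNS):
    """Return (runtimes_in_seconds, peak_bytes) for `runs` calls of synthesize(text)."""
    runtimes, peaks = [], []
    for _ in range(runs):
        tracemalloc.start()
        start = time.perf_counter()
        synthesize(text)
        runtimes.append(time.perf_counter() - start)
        _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
        tracemalloc.stop()
        peaks.append(peak)
    return runtimes, peaks


def write_report(synthesize, library, strings, csv_path):
    """Write one row per string using the header layout from the issue."""
    header = ["String"]
    header += [f"{library}-Runtime#{i}" for i in range(1, RUNS + 1)]
    header.append(f"{library}-Runtime Avg")
    header += [f"{library}-Memory Usage#{i}" for i in range(1, RUNS + 1)]
    header.append(f"{library}-Memory Usage Avg")
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for text in strings:
            runtimes, peaks = benchmark(synthesize, text)
            writer.writerow([text, *runtimes, mean(runtimes), *peaks, mean(peaks)])
```

Calling `write_report` once per library yields one CSV per library, which can then be graphed side by side.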

Text-To-Speech Module

Objective

Explore offline Text-To-Speech (TTS) libraries that will convert a string to audio.

Key Result

Create a function that will output raw audio bytes from a string input.

Details

The function will take, as input, a string which will be the output from Rasa. The string is then converted to raw audio bytes by an offline/local TTS library. When selecting the appropriate library, priorities are as follows:

  1. Memory
  2. Runtime
  3. Speaker Quality
  4. Customizability of sound properties

Uploading Code Changes

Objective

We need a script that stops the assistant, unzips the new code, Rasa model, stories, etc., and updates the cron jobs. It will be triggered by the phone app after new code is uploaded. It should delete the update file once the unzip, update, and restart of the Rasa service are complete.

Log Queries to Google Sheets

Objective

Log user queries to a shared Google Sheets page to analyze user experiences and make improvements in both the UX and NLP model.

Key Result

Create two separate functions:

  1. The first function must set up the Google Sheets API service with the necessary configs.
  2. The second function must take the user's question and the system's answer and log them on a single row of a Google Sheets page, along with a timestamp of the log entry.

Details

You can use this tutorial to achieve the required functionality for this program.

To log entries to a Google Sheet page, you will need to use the Google Sheets API.

To use the Google Sheets API, you will need an authentication JSON and a Google Sheet ID. Define the path to the authentication JSON and the Google Sheet ID as string constants in the script in which the function resides.

A log entry row will look as follows:

| Question | Answer | Timestamp |
| "How big is Swanton Pacific Ranch?" | "3200 acres" | "12/22/2020 12:45:23" |

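The row layout above can be sketched without touching the Sheets API itself. In this illustration, `append_row` is a hypothetical callable standing in for whatever the API service set up by the first function exposes. The timestamp format is inferred from the example row.

```python
from datetime import datetime

# Inferred from the example entry "12/22/2020 12:45:23".
TIMESTAMP_FORMAT = "%m/%d/%Y %H:%M:%S"


def build_log_row(question, answer, now=None):
    """Return [question, answer, timestamp] in the column order shown above."""
    now = now or datetime.now()
    return [question.strip(), answer.strip(), now.strftime(TIMESTAMP_FORMAT)]


def log_entry(append_row, question, answer, now=None):
    """Build one row and hand it to append_row (hypothetical Sheets writer)."""
    row = build_log_row(question, answer, now)
    append_row(row)
    return row
```

Keeping the row construction separate from the API call lets the formatting be tested offline, which is useful given the limited connectivity at the ranch.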
Programmatically control Raspberry Pi WiFi

The plan is to use the Pi as a Wi-Fi access point so a user can connect and pull logs off, and we want to be able to turn this on and off programmatically from Python. Here is some info on running the Pi as a hotspot (access point).

Logging for run_assistant.py

Objective

We need questions and responses from the run_assistant process to be logged. This should be done with something like this, with logs rotated daily at a specified time. The log should not include some of the output that is currently printed to the console, such as the following:

JackShmReadWritePtr::~JackShmReadWritePtr - Init not done for -1, skipping unlock
JackShmReadWritePtr::~JackShmReadWritePtr - Init not done for -1, skipping unlock
ALSA lib pcm_oss.c:377:(_snd_pcm_oss_open) Unknown field port
ALSA lib pcm_oss.c:377:(_snd_pcm_oss_open) Unknown field port
ALSA lib pcm_a52.c:823:(_snd_pcm_a52_open) a52 is only for playback
ALSA lib conf.c:5014:(snd_config_expand) Unknown parameters {AES0 0x6 AES1 0x82 AES2 0x0 AES3 0x2  CARD 0}
ALSA lib pcm.c:2565:(snd_pcm_open_noupdate) Unknown PCM iec958:{AES0 0x6 AES1 0x82 AES2 0x0 AES3 0x2  CARD 0}
ALSA lib pcm_usb_stream.c:486:(_snd_pcm_usb_stream_open) Invalid type for card
ALSA lib pcm_usb_stream.c:486:(_snd_pcm_usb_stream_open) Invalid type for card
ALSA lib pcm_dmix.c:1108:(snd_pcm_dmix_open) unable to open slave
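
One way to get daily rotation at a chosen time is the standard library's `TimedRotatingFileHandler`, sketched below. The logger name, message format, and default rotation time are assumptions. The ALSA and JACK lines above appear to be written to stderr by native audio libraries rather than through Python's `logging`, so a dedicated file logger like this one would not pick them up.

```python
import datetime
import logging
from logging.handlers import TimedRotatingFileHandler


def build_qa_logger(log_path, rotate_at=datetime.time(hour=0, minute=0), keep_days=7):
    """Return a logger that writes to log_path and rotates once a day at rotate_at."""
    logger = logging.getLogger("swanton.qa")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep QA entries out of the root logger and console
    if not logger.handlers:  # avoid adding duplicate handlers on repeated calls
        handler = TimedRotatingFileHandler(
            log_path, when="midnight", atTime=rotate_at, backupCount=keep_days
        )
        handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
        logger.addHandler(handler)
    return logger


def log_exchange(logger, question, answer):
    """Record one question/response pair on a single line."""
    logger.info("Q: %s | A: %s", question, answer)
```

`run_assistant.py` could build the logger once at startup and call `log_exchange` after each answer is produced.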

Add Desktop Voice Assistant

Objective

Develop a script comparable to the voice assistant program that is deployed on the Raspberry Pi.

Key Result

A desktop program will allow users to test the RPi functionality without the need for a Raspberry Pi.

Details

Possession of an RPi is an immediate barrier of entry to test the voice assistant. Developing a comparable program that is usable on a desktop will allow anyone to test the system's functionality.

Create Greeting NLP Dialogue with Rasa

Objective

Create a greeting dialogue using Rasa 1.x to be used by the voice assistant.

Key Result

The dialogue should be able to understand any greeting by the user and respond accordingly. The deliverables are the nlu and stories Markdown files that will be used to train the model.

Details

Rasa is an open-source NLP framework that allows users to create dialogues using Markdown files as training data. You can get an idea of how Rasa functions using the Rasa Playground. For the Swanton voice assistant, we will be using Rasa 1.x. The setup is simple, and training models is straightforward as well. Most of the work with Rasa is understanding what data you have and interpreting why the model behaves the way it does based on that data. This is a great introduction to machine learning, data science, and natural language processing.

Speech-To-Text Module

Objective

Explore offline Speech-To-Text (STT) libraries that will convert raw audio bytes to a string.

Key Result

Create a function that will output a string from raw audio bytes input.

Details

The function will take raw audio bytes as input. The properties of the audio are TBD. The raw audio bytes are then converted to a string by an offline/local STT library. Beyond memory, the priority should be a library that allows custom speech adaptation. Speech adaptation will allow some sort of user input (list of words, transcripts, etc.) to disambiguate uncommon words.

When selecting the appropriate library, priorities are as follows:

  1. Memory
  2. Customizable speech understanding
  3. Customizability of sound properties
  4. Runtime

Mozilla Text-To-Speech

Objective

Test the Mozilla Text-To-Speech module as an offline TTS option which sounds more natural and human-like.

Key Result

A function that takes a string as input and outputs it as audio.

Details

The current TTS solution provides decent output with preferred memory usage & runtime. The next step would be a deep-learning approach which traditionally allows for more natural-sounding speech. Mozilla's Text-To-Speech module shows promise and could be a possible solution. The output audio will need to be tested/analyzed as well as benchmarking memory usage/runtime.

Pipenv To Organize Python Packages

Objective

A consolidated list of Python packages for simple setup when pulling the repository.

Key Result

Users will be able to acquire the necessary Python packages to utilize the repository.

Details

Pipenv has a comprehensive interface to organize Python packages needed for running the programs on our repository.

Improve Speech-To-Text (Rules-Based Approach)

Objective

The DeepSpeech Speech-To-Text system needs to be improved to handle uncommon & non-English words. The rules-based approach is to inspect the output of the current DeepSpeech model and create a mapping from transcribed audio to the actual expected output. The program will look up the substrings of the transcribed text to find known transcription errors and replace those substrings with the expected text. You can store these mappings in a JSON file.

Key Result

Using the run_stt function of stream_deepspeech.py, return a string of audio input that is correctly transcribed.

def run_stt(time_len=TIME_LEN):

Example

Expected Transcription: "What is Casa Verde used for?" => DeepSpeech transcription: "What is cause uh very day used for?"

# Map each known mis-transcription to the phrase it should have produced.
mapping = {
    "cause uh very day": "Casa Verde"
}
transcription_substring = "cause uh very day"
print(mapping[transcription_substring])  # Output: Casa Verde
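
The lookup above handles one known substring. Applying the mapping to a full transcription could look like the sketch below. The function name `apply_corrections` is hypothetical, and the mapping is written inline here rather than loaded from a JSON file.

```python
import re

# Known mis-transcriptions and their corrections (this could be loaded from JSON).
MAPPING = {"cause uh very day": "Casa Verde"}


def apply_corrections(transcription, mapping=MAPPING):
    """Replace every known mis-transcribed phrase in the transcription."""
    # Replace longer phrases first so a short key cannot break up a longer one.
    for wrong in sorted(mapping, key=len, reverse=True):
        pattern = r"\b" + re.escape(wrong) + r"\b"  # match whole words only
        right = mapping[wrong]
        # A function is used as the replacement so that any backslashes
        # in the correction are inserted literally.
        transcription = re.sub(
            pattern, lambda _match, right=right: right, transcription, flags=re.IGNORECASE
        )
    return transcription


print(apply_corrections("What is cause uh very day used for?"))
# Output: What is Casa Verde used for?
```

Matching on word boundaries keeps a key from firing in the middle of an unrelated word. The case-insensitive flag is an assumption about how DeepSpeech output may be cased.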

Details

Correctly transcribe all QA pairs from the question-answer pairs Google Sheet.

You will need the following DeepSpeech model and DeepSpeech scorer to use run_stt. Ensure that these files are in the same directory as the stream_deepspeech.py program.

If in need of assistance, please ask @chidiewenike

Project Constraints

Describe the bug

This project has some challenging constraints

1. RAM limitations of Raspberry Pi

2. Internet Connectivity Limitations @ Swanton Pacific Ranch

swanton image

3. Budget Limitations?

We have some Google Voice Kit devices, but they lack sufficient RAM, and a Raspberry Pi with sufficient RAM costs more.

It would be nice to keep it cheap and keep the code minimal.

Solutions:

¯\_(ツ)_/¯

What are your thoughts?

Convert QA Pairs CSV to Yaml Files

Objective

Takes the QA pair CSV and outputs the necessary yaml files to train the Rasa model.

Key Result

Produces yaml files needed to train a Rasa model.

Details

The program will take the path to the QA pair CSV as a cmd-line argument and generate the yaml files as a result. Testing will be done by taking the output yaml files and testing them on the Rasa Playground. Once confirmed on the playground, a model will be trained on the Raspberry Pi to be used for the final product.
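
A rough sketch of the conversion is shown below. Several things here are assumptions rather than project decisions: the CSV column names `question` and `answer`, the `qa_<n>` intent naming, and the Rasa 2.x YAML layout for the NLU and domain files. Stories or rules files are left out.

```python
import csv
import json


def qa_csv_to_yaml(csv_path):
    """Return (nlu_yaml, domain_yaml) text built from a question/answer CSV."""
    nlu = ['version: "2.0"', "nlu:"]
    intents, responses = ["intents:"], ["responses:"]
    with open(csv_path, newline="") as f:
        for i, row in enumerate(csv.DictReader(f)):
            intent = f"qa_{i}"  # hypothetical naming scheme: one intent per QA pair
            nlu += [
                f"- intent: {intent}",
                "  examples: |",
                f"    - {row['question'].strip()}",
            ]
            intents.append(f"- {intent}")
            # json.dumps yields a double-quoted string that is also valid YAML.
            responses += [
                f"  utter_{intent}:",
                f"  - text: {json.dumps(row['answer'].strip())}",
            ]
    domain = ['version: "2.0"'] + intents + responses
    return "\n".join(nlu) + "\n", "\n".join(domain) + "\n"


def main(argv):
    """Command-line entry point: argv[1] is the path to the QA pair CSV."""
    nlu_yaml, domain_yaml = qa_csv_to_yaml(argv[1])
    with open("nlu.yml", "w") as f:
        f.write(nlu_yaml)
    with open("domain.yml", "w") as f:
        f.write(domain_yaml)
```

With only one example per intent, the generated data is enough to check the file format but probably not to train a useful model. Paraphrases of each question would need to be added as further example lines.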

Install Rasa on Raspbian (Debian) Buster

Objective

Install a working version of Rasa on Raspbian Buster for use on a Raspberry Pi.

Key Result

Details

Explore solutions for installing Rasa, which requires both TensorFlow and spaCy, on Raspbian Buster.

Logs input to a local CSV

Objective

Stores the dictionary input to a local CSV

Key Result

The result is a CSV that logs question and answer pairs, the time, and the confidence score of the prediction.

Details

A function which takes a dictionary as input. The input will be as follows

input_dict = {
    "question": "User input string [string]",
    "answer": "Predicted answer [string]",
    "time": "Current time [HH:MM:SS] (PST) and date [MM:DD:YYYY]",
    "score": "Confidence score of the input [float]"
}

The function will open a CSV, store the log with the columns for the row in the same order as the dict (question, answer, time, score), and then close the CSV. This CSV will be extracted at a later time.
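
A minimal sketch of such a function follows. The default file name `qa_log.csv` and the header row written to a new file are assumptions that the issue does not specify.

```python
import csv
import os

# Column order required by the issue.
FIELDS = ["question", "answer", "time", "score"]


def log_to_csv(input_dict, csv_path="qa_log.csv"):
    """Append one log entry to csv_path, creating the file if needed."""
    # Write a header only when the file is new or empty (an assumption).
    write_header = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    # The with-block closes the file as soon as the row is written.
    with open(csv_path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if write_header:
            writer.writeheader()
        writer.writerow({key: input_dict[key] for key in FIELDS})
```

Opening in append mode means a power loss between calls costs at most the entry being written, which matters on a device that may be unplugged in the field.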

Upload log to a database

Objective

Checks for internet connection and when connected to the internet, uploads the log to a database.

Key Result

Uploads QA logs to a database once the device is within Wi-Fi range, so that the log data can be extracted.

Details

The function will check for an internet connection. If there is no connection, it will simply return. If there is a connection, it will take the contents of the local CSV that logs user QA and store them in a database. It will then remove the CSV log.
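
The flow described above can be sketched as follows. The database is not specified, so `insert_rows` is a hypothetical callable that stores the rows. The connectivity probe (a TCP connection to a public DNS server) is likewise just one possible check.

```python
import csv
import os
import socket


def is_online(host="8.8.8.8", port=53, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds (assumed probe target)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def upload_log(csv_path, insert_rows, online=is_online):
    """Upload the CSV log through insert_rows and delete it; return True on upload."""
    if not os.path.exists(csv_path) or not online():
        return False  # nothing to upload, or no connection: simply return
    with open(csv_path, newline="") as f:
        rows = list(csv.reader(f))
    insert_rows(rows)  # if this raises, the CSV is left in place for a retry
    os.remove(csv_path)
    return True
```

The CSV is removed only after `insert_rows` returns, so a failed upload does not lose log data.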
