Audio Analysis

This application is an Application Starter Kit (ASK) that is designed to get you up and running quickly with a common industry pattern, and to provide information about best practices around Watson services. The Audio Analysis application was created to highlight the combination of the Speech to Text (STT) and AlchemyLanguage services as an Audio Analysis tool. This application can serve as the basis for your own applications that follow that pattern.

Give it a try! Click the button below to fork the repository that contains the source code for this application into IBM DevOps Services, which then deploys your own copy of this application on Bluemix automatically:

You can see a version of this app that is already running by clicking here.

Note: This sample application only works on desktop computer systems, and then only in the Firefox and Chrome web browsers.

How this app works
Getting Started
Running the application locally
About the Audio Analysis pattern
User interface in this sample application
Troubleshooting

How this app works

The Audio Analysis application extracts concepts from YouTube videos.

To begin, select or specify a YouTube video. As the video streams, the Speech to Text service transcribes its audio track. That text is then piped to the [AlchemyLanguage][alchemylanguage] service for analysis. The AlchemyLanguage extracts concepts from the trascription. When AlchemyLanguage identifies a concept from the audio, it returns that concept and an associated confidence score.

Getting started

The application is written in Node.js and uses npm. Instructions for downloading and installing these are included in the following procedure.

The following instructions explain how to fork the project on GitHub and push that fork to Bluemix using the cf command-line interface (CLI) for Cloud Foundry. If you want to run the application locally, see the next section, Running the application locally:

Log into GitHub and fork the project repository. Clone your fork to a folder on your local system and change to that folder.
Create a Bluemix account. Sign up in Bluemix or use an existing account. Watson services in beta are free to use, as are GA services in the standard plan below a certain usage threshold.
If it is not already installed on your system, download and install the Cloud-foundry CLI tool.
If it is not already installed on your system, install Node.js. Installing Node.js will also install the npm command.

5. Edit the manifest.yml file in the folder that contains your fork and replace audio-analysis with a unique name for your copy of the application. The name that you specify determines the application's URL, such as application-name.mybluemix.net. The relevant portion of the manifest.yml file looks like the following:

```yml
applications:
- services:
  - speech-to-text-service
  name: application-name
  command: npm start
  path: .
  memory: 512M
```

Connect to Bluemix by running the following commands in a terminal window:

cf api https://api.ng.bluemix.net
cf login -u <your-Bluemix-ID> -p <your-Bluemix-password>

Create an instance of the Speech to Text in Bluemix by running the following command:

cf create-service speech_to_text standard speech-to-text-service

Create the AlchemyLanguage service:

cf create-service alchemy_language standard my-alchemylanguage

Push the updated application live by running the following command:

cf push

See the User interface in this sample application section for information about modifying the existing user interface to support other video sources.

Running the application locally

First, make sure that you followed steps 1 through 9 in the previous section and that you are still logged in to Bluemix. Next:

Create a .env file in the root directory of the project with the following content:

ALCHEMY_LANGUAGE_API_KEY=
SPEECH_TO_TEXT_USERNAME=
SPEECH_TO_TEXT_PASSWORD=

Copy the username, password credentials from your speech-to-text-service and the api key from my-alchemylanguage services in Bluemix to the previous file. To see the service credentials, run the following command, replacing <application-name> with the name of the application that you specified in your manifest.yml file:

cf env <application-name>

Your output should contain a section like the following:

System-Provided:
{
"VCAP_SERVICES": {
  "speech_to_text": [{
    "credentials": {
      "url": "<url>",
      "password": "<password>",
      "username": "<username>"
    },
    "label": "speech-to-text",
    "name": "speech-to-text-service",
    "plan": "standard"
 }]
}
}

Install any dependencies that a local version of your application requires:

npm install

Start the application by running:

$ node app.js

Open http://localhost:3000 to see the running application.

About the Audio Analysis pattern

First, make sure you read the Reference Information to understand the services that are involved in this pattern. The following image shows a flow diagram for spoken language analysis using the Speech-To-Text and AlchemyLanguage services:

Using the Speech To Text and the AlchemyLanguage services

When a quality audio signal contains terms found in Wikipedia (the current source of concepts in AlchemyLanguage), the combination of Speech To Text and AlchemyLanguage can be used to analyze the audio source to build summaries, indices, and to provide recommendations for additional related content. Though the Speech-To-Text service supports several languages, the AlchemyLanguage service currently only supports English.

The Audio Analysis app uses the node.js Speech-To-Text JavaScript SDK, which is a client-side library for audio transcriptions from the Speech To Text service. It also uses the annotate_text and conceptual_search APIs from AlchemyLanguage to extract concepts and provide content recommendations. In order to change the source of content for recommendations, the user should ingest the content of interest into the AlchemyLanguage service using the API set under the prefix /corpus.

This app generates a new recommendation when it detects an existing concept from the streaming audio transcript. Typically, the app will be reasonably confident about the top 3 concepts after one to two minutes. After the app is confident about the top 3 concepts, the recommendations will stabilize. One can make small modifications to the app to produce different behaviors.

Additional experiences can be created using these services. For example, the transcripts of multiple audio files can be added to a corpus in AlchemyLanguage (using the API endpoint /corpus) to be used as the content source for recommendations. This in turn can be used to provide a "fuzzy" search function that is able to locate reference to a topic of interest. For example, a search for “algae” will locate those portions of a video where not only “algae” is mentioned, but also “coral reefs” and “biodiversity”.

When to use this pattern

You need to analyze or index content contained within speech.
You want to make content recommendations based on speech.

Best practices

The quality of the audio source determines the quality of the transcript, which affects the quality of extracted concepts and recommendations.
The quality and confidence of the extracted concepts increases with the amount of transcribed text.

Reference information

The following links provide more information about the AlchemyLanguage and Speech to Text services, including tutorials on using those services:

AlchemyLanguage

API documentation: Get an in-depth understanding of the AlchemyLanguage service
API explorer: Try out the REST API

Speech To Text

API documentation: Get an in-depth understanding of the Speech To Text service
API reference: SDK code examples and reference
API Explorer: Try out the API

User interface in this sample application

The user interface that this sample application provides is intended as an example, and is not proposed as the user interface for your applications. However, if you want to use this user interface, you will want to modify the following files:

src/views/index.ejs - Lists the YouTube videos and footer values that are shown on the demo application's landing page. These items are defined using string values that are set in the CSS for the application. (See the next bullet in this list.) By default, the items in the footer are placeholders for IBM-specific values because they are used in the running instance of this sample application. For example, the Terms and Conditions do not apply to your use of the source code, to which the Apache license applies.
src/views/videoplay.js - Maps YouTube video URLs to API calls and initiates streaming. You will want to expand or modify this if you want to use another video source or player.
src/index.sj - Supports multiple types of YouTube URLs. You will want to expand or modify this if you want to use another video source or player.

Troubleshooting

When troubleshooting your Bluemix app, the most useful source of information is the execution logs. To see them, run:

$ cf logs <application-name> --recent

Open Source @ IBM

Find more open source projects on the IBM GitHub Page

License

This sample code is licensed under the Apache 2.0 license. Full license text is available in LICENSE.

Contributing

See CONTRIBUTING.

nagyistge / watson-audio-analysis Goto Github PK

watson-audio-analysis's Introduction

Audio Analysis

Table of Contents