audio_search_on_podcasts

A Python Flask audio search application

Often, while listening to a podcast or to a course's video/audio files, we want to jump straight to the topic of interest rather than going through the entire recording again and again. But finding topics and keywords across an entire recording can be challenging.

In this code pattern, we will create an application that lets you search within video/audio files. It not only searches but also highlights the part of the video/audio where the searched term or context occurs. The code pattern performs a natural language query search on audio files and returns results with the time frame in which the search topic is being discussed.

In this example, we will use a Watson Machine Learning Introduction Video to illustrate the process.

When the reader has completed this code pattern, they will understand how to:

  • Prepare audio/video data and break it into smaller chunks to work with.
  • Work with the Watson Speech to Text service through API calls to convert audio/video to text.
  • Work with the Watson Discovery service through API calls to perform search on text chunks.
  • Create a Python Flask application and deploy it on IBM Cloud.

architecture

Flow

  1. The user uploads the video/audio file through the UI.
  2. The video/audio is processed with the Python libraries moviepy and pydub, which break it into smaller chunks to work with.
  3. The user interacts with the Watson Speech to Text service via the provided application UI. The audio chunks are converted into text chunks with Watson Speech to Text.
  4. The text chunks are uploaded to Watson Discovery by calling the Discovery APIs through the Python SDK.
  5. The user submits a search query, which is run against Discovery.
  6. The results are shown on the UI.
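
Steps 2 and 3 of the flow can be sketched in plain Python. This is a minimal illustration, not the code in app.py: the 60-second chunk length and the function names chunk_windows, transcript_from_stt and text_chunks are assumptions made for the example, and the response dictionary only mirrors the results/alternatives/transcript layout that Speech to Text returns. The point is that each text chunk keeps the time window of the audio it came from, which is what later lets a search hit be mapped back to a position in the recording.

```python
from typing import Dict, List, Tuple


def chunk_windows(duration_s: float, chunk_s: float = 60.0) -> List[Tuple[float, float]]:
    """Split a recording into consecutive (start, end) windows, in seconds.

    chunk_s = 60.0 is an assumed default, not a value taken from the repo.
    """
    duration_s = float(duration_s)
    if duration_s < 0 or chunk_s <= 0:
        raise ValueError("duration_s must be >= 0 and chunk_s must be > 0")
    windows = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)  # last window may be shorter
        windows.append((start, end))
        start = end
    return windows


def transcript_from_stt(response: Dict) -> str:
    """Join the top alternative of every result in a Speech to Text style response."""
    parts = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            parts.append(alternatives[0].get("transcript", "").strip())
    return " ".join(p for p in parts if p)


def text_chunks(windows: List[Tuple[float, float]], responses: List[Dict]) -> List[Dict]:
    """Pair each time window with the transcript of the matching audio chunk."""
    return [
        {"start": start, "end": end, "text": transcript_from_stt(resp)}
        for (start, end), resp in zip(windows, responses)
    ]
```

For a 150-second recording, chunk_windows(150, 60) yields three windows, the last one 30 seconds long.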

Included components

  • IBM Watson Speech to Text: easily convert audio and voice into written text for quick understanding of content.
  • IBM Watson Discovery: with IBM Watson Discovery, you can ingest, normalize, enrich, and search your unstructured data (JSON, HTML, PDF, Word, and more) with speed and accuracy.

Featured technologies

  • Python Flask: Flask is a lightweight WSGI web application framework. It is designed to make getting started quick and easy, with the ability to scale up to complex applications.
  • IBM Watson Speech to Text: easily convert audio and voice into written text for quick understanding of content.
  • IBM Watson Discovery: with IBM Watson Discovery, you can ingest, normalize, enrich, and search your unstructured data (JSON, HTML, PDF, Word, and more) with speed and accuracy.

Steps

  1. Clone the repo
  2. Create Watson Speech to Text
  3. Create Watson Discovery
  4. Run The Application Locally

1. Clone the repo

Clone the audio_search_on_podcasts repo locally. In a terminal, run:

git clone https://github.com/IBM/audio_search_on_podcasts/

2. Create Watson Speech to Text

Create the service:

  • Click on Watson Speech to Text. It will take you to the Catalog on IBM Cloud. Click the Create button.

Note: To perform customization, you need to select the Standard paid plan. For this code pattern, however, the Lite plan is sufficient.

From your Watson Speech to Text service instance, select the Manage tab:

  • On the Manage page, click Show Credentials to view the credentials used to authenticate to your service instance.
  • Copy the API Key and URL values, as they will be needed in later steps.

If no credentials exist, select the New Credential button to create a new set of credentials. Then save the API Key and URL values.

Create Service Credential

3. Create Watson Discovery

Create the service:

  • Click on Watson Discovery. It will take you to the Catalog on IBM Cloud. Click the Create button.

Note: For this code pattern, the Lite plan is sufficient.

If no credentials exist, select the New Credential button to create a new set of credentials. Then save the API Key and URL values.

4. Run The Application Locally

4.1. Update global variables in app.py

In the root folder of the repo, open the app.py file.

Global Variables

  • Enter the Discovery API Key and Discovery URL saved in the earlier steps into the placeholders in the Flask server code, as shown above. You can find these on lines 29 and 30.
  • Similarly, enter the Speech to Text API Key and Speech to Text URL saved in the earlier steps into the placeholders in the Flask server code, as shown above. You can find these on lines 31 and 32.
  • Enter the desired name for your Discovery Environment, or use the name of your existing environment, by updating the variable envname. You can find this on line 34.
  • Enter the desired name for the Collection that will be created for this project by updating the variable collection_name. You can find this on line 35.
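
If you would rather not keep credentials in the source file, one possible alternative is to read them from environment variables. The sketch below is an assumption about how that could look, not what app.py does: the environment variable names, the load_settings helper and the two default names are all hypothetical. Only envname and collection_name are variable names taken from this README.

```python
import os
from typing import Dict, Mapping

# Hypothetical environment variable names, chosen for this example only.
REQUIRED = ("DISCOVERY_APIKEY", "DISCOVERY_URL", "STT_APIKEY", "STT_URL")


def load_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Collect the four credentials plus the two Discovery names.

    Raises RuntimeError listing every required value that is missing or empty,
    so a misconfigured run fails before any Watson service is called.
    """
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing settings: " + ", ".join(missing))
    settings = {name.lower(): env[name] for name in REQUIRED}
    # envname and collection_name are the variables this README asks you to set;
    # the fallback values here are placeholders, not names used by the repo.
    settings["envname"] = env.get("DISCOVERY_ENVNAME", "audio-search-env")
    settings["collection_name"] = env.get("DISCOVERY_COLLECTION", "audio-search-collection")
    return settings
```

The resulting dictionary could then be used to fill the same global variables that are otherwise edited by hand.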

Note: After creating a Watson Discovery instance, you need to create an Environment, which holds your collections (the documents that form the basis for Discovery queries). For more details, visit https://cloud.ibm.com/docs/discovery?topic=discovery-getting-started
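
To make the idea of a collection concrete, here is a rough sketch of how transcribed chunks might be turned into JSON documents before upload. The field names chunk_id, start_s, end_s and text, and the to_documents helper, are assumptions for illustration; the documents that app.py actually uploads may be shaped differently.

```python
import json
from typing import Dict, List


def to_documents(chunks: List[Dict]) -> List[str]:
    """Serialize text chunks into one JSON document per chunk.

    Each input chunk is expected to carry "start", "end" (seconds) and "text".
    Keeping the time window inside the document means a search hit can be
    traced back to a position in the recording.
    """
    docs = []
    for index, chunk in enumerate(chunks):
        doc = {
            "chunk_id": index,          # hypothetical field name
            "start_s": chunk["start"],  # hypothetical field name
            "end_s": chunk["end"],      # hypothetical field name
            "text": chunk["text"],
        }
        docs.append(json.dumps(doc))
    return docs
```

Each returned string is a standalone JSON document that could be added to the collection.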

4.2. Install the requirements

  • Open a terminal in the cloned repo folder.
  • Run the command:
pip install -r requirements.txt

4.3. Run the Flask app

  • Run the following command:
python app.py

Sample output

Visit http://localhost:8080 in your browser.

  • We’ll use the video file video/watson_studio_tutorial_part1.mp4 from the cloned folder. Click the Upload button and wait 8-10 minutes for the video to finish processing and for results to come back from the Watson services.

    Upload Video

  • Once processing is done, you will see an Intro Page where you can enter a query, as shown.

  • Now play the video, navigate to the query box, and enter your desired search keyword. For our video, we enter the search key: machine learning.

  • Wait 20-30 seconds for a response; the searched keyword then appears in the Table of Contents section. Clicking the keyword, in this case machine learning, starts the video from the most relevant occurrence of the topic.

    Query

  • Similarly, you can repeat this process for other keywords; we entered supervised learning and deep learning. On the right side of the screen, all previously searched keys are listed as a table of contents.

    TOC
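
The behaviour described in the steps above, jumping to the most relevant occurrence and remembering earlier searches, can be sketched as follows. This is an illustrative model only: it assumes each search result carries a relevance score and the start time of its chunk under the hypothetical keys score and start_s, and the TableOfContents class is invented for the example rather than taken from app.py.

```python
from typing import Dict, List, Optional, Tuple


def best_start_time(results: List[Dict]) -> Optional[float]:
    """Return the start time (seconds) of the highest-scoring result, or None."""
    if not results:
        return None
    best = max(results, key=lambda r: r["score"])
    return best["start_s"]


class TableOfContents:
    """Remember each searched key with the playback position it maps to."""

    def __init__(self) -> None:
        # dict keeps insertion order, so entries stay in the order first searched
        self._entries: Dict[str, float] = {}

    def add(self, query: str, results: List[Dict]) -> Optional[float]:
        """Record the query if it produced a hit and return the seek position."""
        start = best_start_time(results)
        if start is None:
            return None  # nothing found, so nothing is listed
        self._entries[query.strip().lower()] = start
        return start

    def entries(self) -> List[Tuple[str, float]]:
        """Return (query, start time) pairs for display."""
        return list(self._entries.items())
```

Clicking an entry in the UI would then correspond to seeking the player to the stored start time.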

Deploy on IBM Cloud

Instructions for deploying the web application on Cloud Foundry can be found here.

Learn more

  • Artificial Intelligence Code Patterns: Enjoyed this Code Pattern? Check out our other AI Code Patterns
  • AI and Data Code Pattern Playlist: Bookmark our playlist with all of our Code Pattern videos
  • With Watson: Want to take your Watson app to the next level? Looking to utilize Watson Brand assets? Join the With Watson program to leverage exclusive brand, marketing, and tech resources to amplify and accelerate your Watson embedded commercial solution.

License

This code pattern is licensed under the Apache License, Version 2. Separate third-party code objects invoked within this code pattern are licensed by their respective providers pursuant to their own separate licenses. Contributions are subject to the Developer Certificate of Origin, Version 1.1 and the Apache License, Version 2.

Apache License FAQ

Contributors

smruthi33, rahulreddyravipally, neha-setia, stevemar