
lukereichold / speechtimestamper

Stars: 17 · Watchers: 3 · Forks: 1 · Size: 6.03 MB

Generate an accurate, timestamped transcript given an audio file and its text using Google Cloud's Speech-to-Text API via gRPC.

License: MIT License

Ruby 0.04% Shell 0.14% Swift 68.61% Makefile 0.22% Starlark 30.99%
natural-language-processing nlp audio-transcribing speech-to-text timestamp grpc gcp string-alignment needleman-wunsch google-cloud-platform

speechtimestamper's Introduction

Speech Timestamper


A demo iOS application that generates an accurately timestamped transcript from an audio file and its pre-supplied text.

It uses the time offsets feature of Google Cloud Platform's Speech-to-Text API, which returns the start and end time of each recognized word relative to the beginning of the audio. The pre-supplied text and a sequence alignment algorithm can then correct any words the service mistranscribes.
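The following minimal sketch shows one way to model these offsets. It is not taken from the project, and the TimedWord type and timeInterval(seconds:nanos:) helper are hypothetical. The Speech-to-Text API returns word offsets only when word time offsets are enabled in the recognition config (enable_word_time_offsets). Each word's start and end times arrive as protobuf Durations split into whole seconds and nanoseconds, and the helper combines those into a single value in seconds.

```swift
import Foundation

/// A recognized word with its offsets from the start of the audio.
/// Hypothetical type for illustration; not part of the project or the API.
struct TimedWord: Equatable {
    let word: String
    let start: TimeInterval  // seconds from the beginning of the audio
    let end: TimeInterval
}

/// Combines a protobuf-style duration (whole seconds plus nanoseconds)
/// into a single TimeInterval measured in seconds.
func timeInterval(seconds: Int64, nanos: Int32) -> TimeInterval {
    return Double(seconds) + Double(nanos) / 1_000_000_000
}
```

For example, a start time of 1 second and 500,000,000 nanoseconds becomes 1.5 seconds.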

Why is this useful?

When the exact text of a piece of audio is known upfront, we can use it to obtain correct word-level timestamps. These timestamps are useful in contexts such as interactive language education and synchronized music lyrics.

Big idea

  1. Record some speech
  2. Provide a manual transcript of what was said (a String)
  3. Generate a timestamped transcription of the audio using GCP's Speech-to-Text API
  4. Align the provided transcript with the API's transcript using a sequence alignment algorithm, pairing each correct word with the timestamp the API reported for it.
  5. Voilà!
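Steps 3 and 4 can be sketched as a global sequence alignment. The repository's topics include needleman-wunsch, so this sketch uses that algorithm. The scoring values (match +1, mismatch -1, gap -1), the normalization rule and all type and function names are assumptions for illustration, not the project's actual code. Each provided word takes the timestamp of the recognized word it aligns to. A provided word that aligns to a gap gets no timestamp, and extra recognized words are dropped.

```swift
import Foundation

/// A word returned by the speech service with its time offsets in seconds.
/// Hypothetical type for this sketch.
struct WordTiming {
    let word: String
    let start: Double
    let end: Double
}

/// A word from the provided transcript paired with the timing borrowed from
/// the recognized word it aligned to (nil when it aligned to a gap).
struct AlignedWord: Equatable {
    let word: String
    let start: Double?
    let end: Double?
}

/// Lowercases and strips punctuation so "Hello," compares equal to "hello".
func normalize(_ word: String) -> String {
    return String(word.lowercased().filter { $0.isLetter || $0.isNumber })
}

/// Needleman-Wunsch global alignment of provided words against recognized
/// words. Scores (match +1, mismatch -1, gap -1) are assumed values.
func align(reference: [String], recognized: [WordTiming]) -> [AlignedWord] {
    let match = 1, mismatch = -1, gap = -1
    let n = reference.count, m = recognized.count
    let ref = reference.map(normalize)
    let rec = recognized.map { normalize($0.word) }

    func score(_ i: Int, _ j: Int) -> Int {
        return ref[i] == rec[j] ? match : mismatch
    }

    // Fill the (n+1) x (m+1) scoring matrix; row/column 0 are all-gap prefixes.
    var dp = Array(repeating: Array(repeating: 0, count: m + 1), count: n + 1)
    for i in 0...n { dp[i][0] = i * gap }
    for j in 0...m { dp[0][j] = j * gap }
    if n > 0 && m > 0 {
        for i in 1...n {
            for j in 1...m {
                dp[i][j] = max(dp[i - 1][j - 1] + score(i - 1, j - 1),
                               dp[i - 1][j] + gap,
                               dp[i][j - 1] + gap)
            }
        }
    }

    // Trace back from the bottom-right corner, emitting one entry per provided word.
    var result: [AlignedWord] = []
    var i = n, j = m
    while i > 0 || j > 0 {
        if i > 0 && j > 0 && dp[i][j] == dp[i - 1][j - 1] + score(i - 1, j - 1) {
            // Match or substitution: keep the provided word, borrow the timing.
            let t = recognized[j - 1]
            result.append(AlignedWord(word: reference[i - 1], start: t.start, end: t.end))
            i -= 1
            j -= 1
        } else if i > 0 && dp[i][j] == dp[i - 1][j] + gap {
            // Provided word the service never recognized: no timing available.
            result.append(AlignedWord(word: reference[i - 1], start: nil, end: nil))
            i -= 1
        } else {
            // Extra recognized word absent from the provided text: skip it.
            j -= 1
        }
    }
    return Array(result.reversed())
}
```

Suppose the provided words are "The quick brown fox" and the recognized words are "the quack brown". The alignment keeps "quick" with the timing of "quack" and leaves "fox" without a timestamp. One option for such gaps is to interpolate between neighboring timestamps.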

Getting Started

  • Rename Secrets.template.plist to Secrets.plist and provide your Google Cloud Platform API key for the field GOOGLE_API_KEY.
  • Build and run the SpeechTimestamper target in Xcode.
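One way the app might read the key at runtime is sketched below. The function names and the SecretsError type are hypothetical. Only the Secrets.plist file name and the GOOGLE_API_KEY field come from the steps above. Parsing is kept separate from bundle loading so the parsing logic can be exercised without an app bundle.

```swift
import Foundation

enum SecretsError: Error {
    case missingFile(String)
    case missingKey(String)
}

/// Extracts a non-empty GOOGLE_API_KEY string from property-list data.
func googleAPIKey(fromPlist data: Data) throws -> String {
    let plist = try PropertyListSerialization.propertyList(from: data, options: [], format: nil)
    guard let dict = plist as? [String: Any],
          let key = dict["GOOGLE_API_KEY"] as? String,
          !key.isEmpty else {
        throw SecretsError.missingKey("GOOGLE_API_KEY")
    }
    return key
}

/// Loads Secrets.plist from a bundle (the main bundle by default) and reads the key.
/// Hypothetical helper; the project may load its secrets differently.
func loadGoogleAPIKey(bundle: Bundle = .main) throws -> String {
    guard let url = bundle.url(forResource: "Secrets", withExtension: "plist") else {
        throw SecretsError.missingFile("Secrets.plist")
    }
    return try googleAPIKey(fromPlist: try Data(contentsOf: url))
}
```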

gRPC

This project uses gRPC instead of REST to communicate with GCP services. There is not yet an official Google Cloud Client Library for Swift, so this project can serve as an example of how to interact with Google Cloud services (polling long-running operations, etc.) using Swift with gRPC. For the basics of the gRPC API for Swift, see the Swift gRPC Overview.
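In the Speech-to-Text API, the LongRunningRecognize call returns a google.longrunning.Operation, which the client polls until its done flag is set. The sketch below shows only that polling loop, using capped exponential backoff. It does not use any gRPC library. OperationSnapshot is a hypothetical stand-in for the generated Operation message, and the fetch closure is where a real GetOperation call would go. The delay and attempt limits are arbitrary assumptions.

```swift
import Foundation

/// Minimal stand-in for google.longrunning.Operation (hypothetical; the real
/// generated message also carries the response or error payload).
struct OperationSnapshot {
    let name: String
    let done: Bool
}

enum PollingError: Error {
    case timedOut(attempts: Int)
}

/// Fetches the operation until `done` is true, waiting between attempts with
/// capped exponential backoff. `wait` is injectable so tests need not sleep.
func pollUntilDone(
    initialDelay: TimeInterval = 1.0,
    maxDelay: TimeInterval = 10.0,
    maxAttempts: Int = 30,
    wait: (TimeInterval) -> Void = { Thread.sleep(forTimeInterval: $0) },
    fetch: () throws -> OperationSnapshot
) throws -> OperationSnapshot {
    var delay = initialDelay
    for _ in 0..<maxAttempts {
        let op = try fetch()
        if op.done { return op }
        wait(delay)
        delay = min(delay * 2, maxDelay)  // double the delay, up to the cap
    }
    throw PollingError.timedOut(attempts: maxAttempts)
}
```

Injecting the wait function keeps the backoff policy testable and leaves the choice of blocking sleep versus a timer to the caller.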

If you'd like to generate the most up-to-date protobuf definitions, install protobuf and then run a fresh pod update.

speechtimestamper's People

Contributors

lukereichold

Forkers

tarsbase

speechtimestamper's Issues

request: colab demo

can you please create a google colab notebook to demonstrate this amazing project, thanks!
