Git Product home page Git Product logo

gauge_voice_transcription's Introduction

Gauge Voice Transcription

Calculate various quantitative gauges of the accuracy of voice transcription.


Slides of Hack and Tell presentation

These slides may not display correctly on GitHub.com; if not, clone this repository to a regular file system and open the html file from there.


Note that this software is designed for use only with Python 2, at or above version 2.6. Running in other environments will fail by design, since the NLTK dependency requires Python 2.

To run

The required arguments of the main function are the original string and a string containing its transcription. Output is a rounded-off value showing the percentage of the original string found intact, within some tolerance, in the transcription:

In [1]: import gauge_voice_transcription_py2 as G
In [2]: G.main('I would like a banana, too.', 'i would likes, the bananas chew')
By character:       0.8
By word:            0.3
By normalized word: 0.7

Guide to the output:

  • By character compares the strings character by character, with punctuation stripped and upper- and lower-case differences eliminated. Comparing the strings by character not a good principle when well-formed whole English words are expected to be in the second string, which is usually the case with voice transcription output.
  • By word compares arrays of whole words, with punctuation stripped and upper- and lower-case differences eliminated.
  • By normalized word is just like "by word", but with additional normalization of most nouns and verbs to their "lemma" (morphologically neutral dictionary-headword) forms, and with most contractions expanded (can't => cannot, won't => will not, etc.).

"By normalized word" is the most useful option if one wants to ignore minor differences in morphology, such as likes for like and bananas for banana.

By default, results are rounded to a single decimal place. For other precision, use a third argument places:

In [4]: G.main('I would like a banana, too.', 'i would likes, the bananas chew', places=2)
By character:       0.76
By word:            0.33
By normalized word: 0.67

The unrounded results are returned by the percent_matching function:

In [5]: G.percent_matching('I would like a banana, too.', 'i would likes, the bananas chew')
Out[6]: (0.7636363636363637, 0.3333333333333333, 0.6666666666666666)

Testing

There is a rudimentary test suite in test, intended for use with pytest. To run:

$ pip install pytest
$ py.test test

Algorithm

The algorithm used was described as "the gestalt approach" to pattern-matching in the 1988 article introducing it, in an assembly-language implementation. Dating to 1983, it was intended for use in educational software, for automating the checking of student responses on exams; it was also used in early spell-checkers. Known as the Ratcliff/Obershelp pattern-matching algorithm, it is implemented as difflib.SequenceMatcher in the Python standard library, with cubic worst-case time complexity.

The original algorithm is designed for comparing two strings of characters. Here, I am interested in comparison not of strings but of arrays, and want to measure not single-character differences but the matching of whole, well-formed words in standard English (the usual output of voice-transcription software). But difflib.SequenceMatcher is able to operate on arrays of indivisible elements.


Still to be done

  • Times of day (08:30, e.g.) occasionally appear in Google Voice transcriptions and need to be handled. This will require adding the numbers 11 to 59 to digit2word.

[end]

gauge_voice_transcription's People

Contributors

brannerchinese avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.