textra

A command-line application to extract text from images, PDFs, and audio files using Apple's Vision and Speech APIs.

[Screenshot: a terminal running "textra The-Mueller-Report.pdf -o report.txt", converting the 448-page input PDF to the full-text output report.txt, with a progress bar at 16 of 448 pages (ETA 00:05:21 at 1.34 it/s)]

Installation

textra requires macOS 13 or later to access the latest VisionKit APIs.

The easiest way to install textra is to open a terminal window and run the following command:

curl -L https://github.com/freedmand/textra/raw/main/install.sh | bash

Alternatively, download the latest release, unzip it, and place the textra executable somewhere on your $PATH.

Usage

textra [options] FILE1 [FILE2...] [outputOptions]

Options

-h, --help: Show advanced help

-s, --silent: Suppress non-essential output

-l, --locale: Specify a locale (e.g. en-US) for text recognition

-v, --version: Show version number

Output options

-x, --outputStdout: Output everything to stdout (default)

-o, --outputText: Output everything to a single text file

-t, --outputPageText: Output each file/page to a text file

-p, --outputPositions: Output positional text for each file/page to JSON (experimental; results may differ from page text)

Examples

textra audio.mp3: Extract the text from "audio.mp3" and output to stdout

textra page1.png page2.png -o combined.txt: Extract the text from "page1.png" and "page2.png" and output the combined text to "combined.txt"

textra doc.pdf -o doc.txt -t doc/page-{}.txt: Extract text from "doc.pdf" and output it in two formats: 1) the combined text of all pages, stored in "doc.txt", and 2) the text of each page, written to files matching the pattern "doc/page-{}.txt" (e.g. "doc/page-1.txt", "doc/page-2.txt", etc.)

textra image1.png -o text1.txt image2.png -o text2.txt: Extract text from "image1.png" and output at "text1.txt"; extract text from "image2.png" and output at "text2.txt"

textra image.png --outputPositions positionalText.json: Extract positional text from "image.png" and output at "positionalText.json"

Instructions

To use textra, you must provide at least one input file.

textra will then extract all the text from the input image, PDF, and audio files. By default, textra prints the output to stdout, where it can be viewed or piped into another program.
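Because stdout is the default output, another program can capture the extracted text directly. The following is a minimal Python sketch, assuming textra is installed somewhere on your $PATH; the helper name extract_text is hypothetical and not part of textra.

```python
import subprocess
from typing import Sequence


def extract_text(files: Sequence[str], *options: str, executable: str = "textra") -> str:
    """Run textra on `files` and return what it prints to stdout.

    Hypothetical helper. `options` (e.g. "-s", or "-l", "en-US") are placed
    before the files, following the usage line: textra [options] FILE1 [FILE2...]
    """
    if not files:
        # textra needs at least one input file
        raise ValueError("at least one input file is required")
    result = subprocess.run(
        [executable, *options, *files],
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode the output as str rather than bytes
        check=True,           # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

For example, extract_text(["page1.png", "page2.png"], "-l", "en-US") would return the recognized text of both images as one string. With check=True, a non-zero exit status from textra raises subprocess.CalledProcessError rather than being silently ignored.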

You can use the output options above at any point in the command to write the extracted text to disk in various formats. For instance, textra doc.png -o page.txt -p page.json will extract "doc.png" in two formats: as plain text to "page.txt" and as positional text to "page.json".

You can punctuate chains of inputs with output options to finely control where multiple extracted documents will end up. For example, textra doc.png -o image.txt speech.mp3 -o audio.txt will extract "doc.png" to "image.txt" and "speech.mp3" to "audio.txt" respectively.
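When the mapping from inputs to output files comes from a script, the chained command can be assembled programmatically. This is a minimal Python sketch that relies only on the -o chaining behavior described here; chained_command is a hypothetical helper name.

```python
from typing import List, Mapping


def chained_command(outputs: Mapping[str, str], executable: str = "textra") -> List[str]:
    """Build one textra argv that sends each input to its own text file.

    Hypothetical helper: each input path is immediately followed by its own
    -o option, e.g. doc.png -o image.txt speech.mp3 -o audio.txt
    """
    argv = [executable]
    for input_path, text_path in outputs.items():  # dicts keep insertion order
        argv += [input_path, "-o", text_path]
    return argv
```

Passing the result to subprocess.run(chained_command({"doc.png": "image.txt", "speech.mp3": "audio.txt"}), check=True) would run a single textra invocation for all the inputs instead of one process per file.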

For output options that write one file per page (-t, -p), textra allows an output path that contains curly braces {}. The braces are replaced with the page number for a PDF file, the base file name for an image file, or baseFileName-pageNumber when there are multiple PDF files. If the path contains no braces, textra appends a dash followed by the page number or base file name to the specified path.
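The substitution rules can be modeled in a few lines. This Python sketch illustrates the documented behavior and is not textra's implementation: page_key and page_output_path are hypothetical names, "base file name" is assumed to mean the file name without its extension, and the no-braces branch follows the sentence above literally (appending to the very end of the path), which may not match where textra actually inserts the suffix.

```python
from pathlib import Path
from typing import Optional


def page_key(input_path: str, page: Optional[int], multiple_pdfs: bool) -> str:
    """Value substituted for {} (hypothetical model of the README's rules)."""
    base = Path(input_path).stem  # ASSUMPTION: base file name = name without extension
    if page is None:              # image input: its base file name
        return base
    if multiple_pdfs:             # several PDF inputs: baseFileName-pageNumber
        return f"{base}-{page}"
    return str(page)              # single PDF input: just the page number


def page_output_path(pattern: str, key: str) -> str:
    """Fill the output pattern with `key` for one file/page."""
    if "{}" in pattern:
        return pattern.replace("{}", key)
    # ASSUMPTION: literal reading of "append a dash ... to the specified path"
    return f"{pattern}-{key}"
```

For example, page_output_path("doc/page-{}.txt", page_key("doc.pdf", 1, False)) yields "doc/page-1.txt", matching the example earlier in this README.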

Troubleshooting

  • ERROR: Speech recognizer does not support on-device recognition:

    If you get this error, you may need dictation enabled, which you can accomplish in System Settings -> Keyboard -> Dictation -> Enable dictation.

    Flipping the dictation setting may not immediately fix the error. If textra still reports this error, or if you cannot toggle the setting, try clicking the "Edit" menu in the top menu bar while you're in an application (e.g. Terminal) and clicking "Start dictation." This may prompt you to enable "Dictation" again, and a microphone prompt may appear (which you can immediately dismiss by clicking "Done").

    Try textra again. If it works, you can safely disable dictation in System Settings at any time. If it does not, please file an issue.

License

MIT

Contributions

This repo is in its early stages, but contributions are welcome. Please submit an issue, or feel free to fork the repo and contribute a pull request.

Credits

Many thanks to Brandon Roberts and Marcos Huerta for their help and encouragement with positional text extraction.

Contributors

freedmand, rolandcrosby

textra's Issues

Add support for online speech recognition (behind a flag)

Apparently on-device speech recognition is only supported on some Mac devices.

https://support.apple.com/guide/mac-help/if-dictation-on-mac-doesnt-work-as-expected-mchlc480652b/mac:
Tip: If you need to dictate text when you’re offline, use Voice Control. See Control your Mac and apps using Voice Control. On a Mac with Apple silicon, dictation requests are processed on your device in many languages—no internet connection is required. When dictating in a search box, dictated text may be sent to the search provider in order to process the search.

This issue tracks adding support for online (server-based) speech recognition powered by Apple. Since it would require sending data to Apple's servers, it should not be enabled by default and should require a flag.

Also, the current error message, "ERROR: Speech recognizer does not support on-device recognition", is not very descriptive. It could be revised to include more context or to suggest the proposed flag.

Transcript to STDOUT?

It would be very convenient if textra could write its output to STDOUT, via some optional command-line argument.

This would make it easier to, say, use textra with Python's subprocess module to run it on images automatically and capture the text as a string (which could then be used programmatically however one likes).

It's not hard to work around this by using the file system, but it would make things a little easier.

ImageAnalyzer running VNRecognizeTextRequest in the background?

On Apple platforms, there are two main ways to get text from images: VisionKit's ImageAnalyzer and the Vision framework's VNRecognizeTextRequest.

Based on some OCR tests, I'm seeing that the outputs from these two methods are different. Initially, I thought ImageAnalyzer was running VNRequestTextRecognitionLevel.fast because it's for Live Text, but the outputs from ImageAnalyzer are sometimes better than VNRequestTextRecognitionLevel.accurate.

VNRecognizeTextRequest does have more options, including language correction and custom words.

Do you know what ImageAnalyzer calls in the background? Is it essentially running VNRecognizeTextRequest, or is it a separate model/pipeline? This naturally raises the question of which model is better for OCR. My initial tests show fairly similar aggregate performance between ImageAnalyzer and VNRequestTextRecognitionLevel.accurate, but the results for individual test cases can vary widely between the two.

For documentation, and in case this is outside the scope of your expertise, I've asked the same question on the Apple Developer Forums.

locale support?

Hello,
I get the following error when trying to convert a file. I think it might be because my system locale is set to Spanish (most likely) or because the file I tried to convert was in Spanish. Any hints? The system should have the locale resources installed. Or does this recognizer only work when the system is set to English?

2023-04-03 15:20:48.276 textra[54391:4225664] Required assets are not available for Locale:es_ES
ERROR: Speech recognizer does not support on-device recognition
