martinbede / second-sight

This project forked from tensorflow/tensorflow


An Android app designed to assist the visually impaired using TensorFlow

License: Apache License 2.0

Python 29.71% C++ 52.06% C 1.09% CMake 0.21% Protocol Buffer 0.72% Java 0.77% Jupyter Notebook 7.62% HTML 4.81% Objective-C 0.01% JavaScript 0.11% TypeScript 2.33% CSS 0.01% Shell 0.55%


Second Sight is an Android 6.0+ app that can read text "in the wild" (e.g. price tags, product labels, text on shirts) using TensorFlow and Google Cloud Vision. It's my Udacity Machine Learning Nanodegree capstone project.

Features

  • Detect text using the camera
  • Read detected text (requires an Internet connection)
  • Interrupt speech by touching the lower portion of the screen

Check out the Conclusion section of the Project Report for a description of the overall performance.

How to install

A prebuilt APK is available in the bin directory.

The following instructions are from the TensorFlow Android Demo README:

If adb debugging is enabled on your Android 5.0 or later device, you can use the following command from your workspace root to install the APK once it is built:

$ adb install -r -g bazel-bin/second-sight/second-sight.apk

Some older versions of adb might complain about the -g option (returning: "Error: Unknown option: -g"). In this case, if your device runs Android 6.0 or later, then make sure you update to the latest adb version before trying the install command again. If your device runs earlier versions of Android, however, you can issue the install command without the -g option.
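The flag logic in the paragraph above can be scripted. The sketch below is a hypothetical helper, not part of the project: it reads the device's API level with `adb shell getprop ro.build.version.sdk` and adds `-g` only on API level 23 (Android 6.0) or later, which is where runtime permissions begin. The function names are assumptions for illustration.

```python
import subprocess

# APK path taken from the install command above (relative to the workspace root).
APK = "bazel-bin/second-sight/second-sight.apk"

# Android 6.0 is API level 23, the first release with runtime permissions,
# so "-g" (grant all runtime permissions) is only added from there on.
RUNTIME_PERMISSIONS_API = 23


def device_api_level():
    """Read the connected device's API level via adb (needs adb on PATH)."""
    out = subprocess.check_output(
        ["adb", "shell", "getprop", "ro.build.version.sdk"])
    return int(out.decode("ascii").strip())


def adb_install_command(api_level, apk=APK):
    """Build the install command; "-r" replaces an existing installation."""
    cmd = ["adb", "install", "-r"]
    if api_level >= RUNTIME_PERMISSIONS_API:
        cmd.append("-g")
    cmd.append(apk)
    return cmd
```

Usage: `subprocess.check_call(adb_install_command(device_api_level()))`. On Android 6.0+ the helper always passes `-g`. If an old adb rejects the flag there, the advice above still applies: update adb rather than dropping the flag.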

Tips for modifying

The neural network that runs on the device and detects text is trained in Jupyter Notebooks with the following Python 2.7 libraries:

  • NumPy 1.10.4
  • Python Imaging Library 1.1.7
  • Matplotlib 1.4.3
  • TensorFlow 0.7.1
  • SciPy 0.16.0
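Because the notebooks were written against these exact versions, it can help to compare your environment with them before training. The following is a minimal sketch, not part of the project. It assumes the usual import names (`numpy`, `PIL`, `matplotlib`, `tensorflow`, `scipy`) and that each module exposes `__version__`. Old PIL releases may not do so and will then be reported as missing/unknown. The code avoids Python-3-only syntax so it also runs under Python 2.7.

```python
import importlib

# Versions listed in the README, keyed by (assumed) import name.
PINNED = {
    "numpy": "1.10.4",
    "PIL": "1.1.7",
    "matplotlib": "1.4.3",
    "tensorflow": "0.7.1",
    "scipy": "0.16.0",
}


def version_tuple(version):
    """Turn '1.10.4' into (1, 10, 4); non-numeric suffixes like 'rc1' are dropped."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def installed_version(module_name):
    """Return the module's __version__, or None if it is missing or unknown."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def compare(pinned, installed):
    """Map each pinned library to (wanted, found, status)."""
    report = {}
    for name, wanted in sorted(pinned.items()):
        found = installed.get(name)
        if found is None:
            status = "missing/unknown"
        elif version_tuple(found) == version_tuple(wanted):
            status = "ok"
        else:
            status = "differs"
        report[name] = (wanted, found, status)
    return report
```

Usage: `compare(PINNED, dict((n, installed_version(n)) for n in PINNED))`. A "differs" entry does not necessarily break the notebooks. It only flags where results may diverge from the author's setup.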

These are the most important files of the project:

  • notebooks/Prepare data.ipynb: Downloads the datasets and prepares the data for training
  • notebooks/Create and freeze graph.ipynb: Defines, trains, and saves the classifier, which is a ConvNet that looks for text in images
  • second-sight/src/.../TensorflowImageListener.java: Preprocesses images, calls the JNI code to do inference, calls the Cloud Vision API, etc.
  • second-sight/jni/tensorflow_jni.cc: The inference happens here
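The app's Java code calls the Cloud Vision API once text has been detected. The Python sketch below is not that Java implementation. It only illustrates the shape of such a call through the public REST endpoint `v1/images:annotate`, using a `TEXT_DETECTION` feature with the image sent as base64 content and the API key in the query string. It assumes that the full detected text is in the first `textAnnotations` entry's `description`. The function names are hypothetical.

```python
import base64
import json
from urllib.request import Request, urlopen

# Cloud Vision REST endpoint (v1); the API key is passed as a query parameter.
VISION_URL = "https://vision.googleapis.com/v1/images:annotate?key={key}"


def build_text_detection_request(image_bytes):
    """JSON body asking for TEXT_DETECTION on one base64-encoded image."""
    return json.dumps({
        "requests": [{
            "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
            "features": [{"type": "TEXT_DETECTION"}],
        }]
    })


def extract_text(response_body):
    """Return the full detected text (first textAnnotation), or '' if none."""
    responses = json.loads(response_body).get("responses", [])
    if not responses:
        return ""
    annotations = responses[0].get("textAnnotations", [])
    return annotations[0].get("description", "") if annotations else ""


def detect_text(image_bytes, api_key):
    """POST one image to Cloud Vision and return the text it found."""
    request = Request(
        VISION_URL.format(key=api_key),
        data=build_text_detection_request(image_bytes).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(request) as response:
        return extract_text(response.read().decode("utf-8"))
```

Separating request construction and response parsing from the network call keeps those two parts testable offline.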

The notebooks use bash commands to create/delete directories and to download and unzip archives. These might not run on some systems (e.g. Windows). In that case, you have to do those tasks manually.
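One way to do those tasks on systems without bash is to replace the shell commands with portable Python equivalents. The helpers below are a hypothetical sketch and do not appear in the notebooks. They assume archive file names can be taken from the last component of the URL. `shutil.unpack_archive` picks the format from the file extension, so it handles `.tar.gz` files as well as `.zip` files.

```python
import os
import shutil
from urllib.request import urlretrieve


def reset_dir(path):
    """Portable equivalent of `rm -rf path && mkdir -p path`."""
    if os.path.isdir(path):
        shutil.rmtree(path)
    os.makedirs(path)


def archive_filename(url):
    """Last path component of a URL, e.g. 'http://host/sets/train.zip' gives 'train.zip'."""
    return url.rstrip("/").split("/")[-1]


def download(url, dest_dir):
    """Portable equivalent of `wget -P dest_dir url`; skips files already present."""
    target = os.path.join(dest_dir, archive_filename(url))
    if not os.path.exists(target):
        urlretrieve(url, target)
    return target


def unpack(archive, dest_dir):
    """Portable equivalent of `unzip archive -d dest_dir` (also .tar, .tar.gz, ...)."""
    shutil.unpack_archive(archive, dest_dir)
```

Skipping existing downloads matters for a dataset of this size, because a rerun of the notebook would otherwise fetch it again.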

Also note that training the neural network requires a computer with a relatively recent NVIDIA GPU (e.g. GeForce GTX 980 Ti), at least 8 GB of RAM, and a fast Internet connection (the dataset used for training is larger than 10 GB).
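Because the dataset is larger than 10 GB, you may want to check free disk space before starting the download. The following is a rough sketch with assumptions of its own. It assumes the archives are kept alongside an extracted copy of similar size and adds a 10% margin. Neither figure comes from the project. `shutil.disk_usage` requires Python 3.3 or later.

```python
import shutil

GIB = 1024 ** 3


def required_bytes(dataset_gib, keep_archives=True, headroom=0.1):
    """Rough estimate: archives plus an extracted copy of similar size
    (an assumption), plus a safety margin."""
    copies = 2 if keep_archives else 1
    return int(dataset_gib * GIB * copies * (1 + headroom))


def has_free_space(path, needed_bytes):
    """True if the filesystem holding `path` has at least needed_bytes free."""
    return shutil.disk_usage(path).free >= needed_bytes
```

Usage: `has_free_space(".", required_bytes(10))`. Since the README gives 10 GB only as a lower bound, treat the result as a minimum, not a guarantee.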

Build instructions

  1. Make sure you have the Android SDK (API level 23, build tools 23.0.1), NDK r10e (see download links below), TensorFlow (for Python 2.7), and Bazel 0.1.5 installed. Ubuntu or OS X is strongly recommended for building the application, because Bazel's Windows support is experimental.
  2. Get the project from GitHub:
$ git clone https://github.com/martinbede/second-sight --recursive
  3. Modify the WORKSPACE file so that the paths to the Android SDK and NDK are correct
  4. Get a Google Cloud Vision key and create a file at second-sight/res/values/keys.xml with the following content:
<?xml version="1.0" encoding="utf-8"?>
<resources>
    <string name="CloudVisionApiKey">ENTER_KEY_HERE!!</string>
</resources>
  5. Build the project:
$ bazel build //second-sight:second-sight
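Step 4 can be scripted so the key never has to be typed into a tracked file by hand. The following is a hypothetical sketch, not part of the project: it renders the same `keys.xml` content shown in step 4 and XML-escapes the key. Reading the key from an environment variable named `CLOUD_VISION_API_KEY` is an assumption for illustration. Android string resources treat some characters specially (e.g. apostrophes), but typical API keys contain only letters, digits, `-` and `_`.

```python
import os
from xml.sax.saxutils import escape

# Same structure as the keys.xml shown in step 4.
KEYS_XML = """<?xml version="1.0" encoding="utf-8"?>
<resources>
    <string name="CloudVisionApiKey">{key}</string>
</resources>
"""


def render_keys_xml(api_key):
    """Return keys.xml content with the key XML-escaped (&, <, >)."""
    return KEYS_XML.format(key=escape(api_key))


def write_keys_xml(api_key, path="second-sight/res/values/keys.xml"):
    """Write keys.xml, creating the values directory if it does not exist."""
    directory = os.path.dirname(path)
    if directory and not os.path.isdir(directory):
        os.makedirs(directory)
    with open(path, "w") as f:
        f.write(render_keys_xml(api_key))
```

Usage, from the workspace root: `write_keys_xml(os.environ["CLOUD_VISION_API_KEY"])`. Generating the file this way also makes it easy to keep `keys.xml` out of version control.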

To build, install, and run the app:
$ bazel mobile-install //second-sight:second-sight --start_app

If you get build errors about protocol buffers, run the following command and build again:
$ git submodule update --init

NDK r10e download links

Used works

