Git Product home page Git Product logo

gaze-pointer-controller's Introduction

Computer Pointer Controller

This is the last Project in order to obtain the Intel Edge AI Nanodegree from Udacity, the objective is to create an application capable of moving the computer pointer using only the movement from the head and eyes. This involves many considerations:

  • We need several models working together, each one covering a needed functionality. For example, a face detection model is used to feed a face landmarks model.
  • We need to control the application logic to create an optimal flow, as the models involved could use a lot of resources.
  • Regarding the models themselves, it could be interesting to check if we could use different precisions models in order to gain more performance.

In general, the flow looks like this:

Project Set Up and Installation

Step 1. Install and configure OpenVINO

It is necessary to have installed OpenVINO, follow the instructions here depending on your Operating System.

Note: This project was created using OpenVINO version 2020.1, it should work in older and newer versions but keep this in mind if you face any issue.

Step 2. Clone this Repository

Create a new folder and clone this repository. As provided, it should have everything to run the application.

Step 3. Create and configure a Python Virtual Environment

This is recommended as a good practice. This project assumes you have Python>=3.6.9, if not, please check the related documentation.

Having Python installed, create a virtual environment running the following shell command:

python3 -m venv <your-env>

Activate it by running:

source your-env/bin/activate

Install required libraries and dependencies provided in requirements.txt:

 pip3 install -r requirements.txt

Step 4. Download OpenVINO pretrained models:

This repo already includes the models to use under "intel" folder. But if you want to download and use different ones then you should download them inside "intel" folder. For example running inside gaze-pointer-controller folder:

python3 <openvino-installation-path>/deployment_tools/tools/model_downloader/downloader.py --name face-detection-adas-0001 --precisions FP32

Will download the model face-detection-adas-0001 model under the "intel" folder.

Note: In Linux, "openvino-installation-path" usually is: /opt/intel/opnevino

For example, the working directory tree for gaze-pointer-controller I used was:

├── bin
│   └── demo.mp4
├── intel
│   ├── face-detection-adas-0001
│   │   ├── FP16
│   │   │   ├── face-detection-adas-0001.bin
│   │   │   └── face-detection-adas-0001.xml
│   │   └── FP32
│   │       ├── face-detection-adas-0001.bin
│   │       └── face-detection-adas-0001.xml
│   ├── gaze-estimation-adas-0002
│   │   ├── FP16
│   │   │   ├── gaze-estimation-adas-0002.bin
│   │   │   └── gaze-estimation-adas-0002.xml
│   │   └── FP32
│   │       ├── gaze-estimation-adas-0002.bin
│   │       └── gaze-estimation-adas-0002.xml
│   ├── head-pose-estimation-adas-0001
│   │   ├── FP16
│   │   │   ├── head-pose-estimation-adas-0001.bin
│   │   │   └── head-pose-estimation-adas-0001.xml
│   │   └── FP32
│   │       ├── head-pose-estimation-adas-0001.bin
│   │       └── head-pose-estimation-adas-0001.xml
│   └── landmarks-regression-retail-0009
│       ├── FP16
│       │   ├── landmarks-regression-retail-0009.bin
│       │   └── landmarks-regression-retail-0009.xml
│       └── FP32
│           ├── landmarks-regression-retail-0009.bin
│           └── landmarks-regression-retail-0009.xml
├── README.md
├── requirements.txt
├── resources
│   ├── FP16.png
│   ├── FP32.png
│   └── pipeline.png
├── src
│   ├── input_feeder.py
│   ├── main.py
│   ├── model.py
│   ├── mouse_controller.py
└── utils.py

Demo

A demo could be run using a video file already provided:

python3 src/main.py -i bin/demo.mp4

Or using a webcamera:

python src/main.py -i CAM

Documentation

Input Flag:

This is the only required flag, this is used to indicate the input file or to state that you want to use the webcamera:

-i or --input: Location of the input file. Type 'CAM' to use the webcamera.

Visualization Flags:

This project uses many different models, as explained at the beginning of this document. Each output could be showed on screen using appropriate command line flags

-vf or --view_face: To view a bounding box over the detected face.

-ve or --view_eyes: To view bounding boxes surrounding both eyes.

-vh or --view_headpose: To view Yaw, Pitch and Roll angles.

-vg or --view_gaze: To view estimated gaze direction.

Models Flags:

To use different models than the default ones, you could use the following flags. However, keep in mind that model output shape could change, potentially crashing the application.

-mf or --model_facedetector: To indicate the path to the face detector model .xml file.

-ml or --model_facelm: To indicate the path to the face landmarks detector model .xml file.

-mh or --model_headpose: To indicate the path to the headpose model .xml file.

-mg or --model_gaze: To indicate the path to the gaze detector model .xml file

Other Flags:

These flags control other behaviours of the code logic:

-l or --cpu_extension: If needed, this flag should be used to provide the path for an OpenVINO cpu extension.

-d or --device: To specify the hardware where the models will be running. Default is CPU.

-pt or --prob_threshold: To state the confidence the models need to have in order to consider an actual detection.

-fc or --frame_count: To state how many frames count before actually doing a detection, useful for debugging and for low resources hardware.

Benchmarks

Processor Characteristics: Intel® Core i5-3230M @2.6 GHz Ivy Bridge (3rd Generation)

Mean inference time per precision:


Models load time (in secs):

Model FP32 FP16
Face 0.3105 0.3457
Face Landmarks 0.0729 0.076
Headpose 0.1233 0.116
Gaze 0.1412 0.1586

Results

In this case, both model precisions tested had around the same performance. what is remarkably is that the face detection model is, by far, the most expensive computationally speaking, representing around 50% of all the inference time taken per frame.

In order to improve the application, it is suggested to check if there is another available face detection model that could reach similar performance, because at this point the biggest issue is the time taken for this model.

Regarding accuracy, it wasn't observed big differences between FP32 and FP16 models. This could be relevant because with FP16 models we can use hardware accelerators like VPUs without worrying about accuracy loss.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.