
cvnd---gesture-recognition's Introduction

Hand Gesture Recognition Tutorial

These scripts are modified from TwentyBN's GulpIO-benchmarks repository, written by Raghav Goyal and the TwentyBN team. They serve as a starting point for creating your own gesture recognition system using a 3D CNN.

Requirements

  • Python 3.x
  • PyTorch 0.4.0

Instructions

1. Download The Jester Dataset

In order to train the gesture recognition system, we will use TwentyBN's Jester Dataset. This dataset consists of 148,092 labeled videos depicting 25 different classes of human hand gestures. It is made available under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license (CC BY-NC-ND 4.0) and can be used free of charge for academic research. To get access to the dataset, you will need to register.

The Jester dataset is provided as one large TGZ archive split into 23 parts of about 1 GB each, for a total download size of 22.8 GB. After downloading all the parts, you can extract the videos using:

cat 20bn-jester-v1-?? | tar zx

The CSV files containing the labels for the videos in the Jester dataset have already been downloaded for you and can be found in the 20bn-jester-v1/annotations folder.

More information, including alternative ways to download the dataset, is available on the Jester Dataset website.
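
If you want to confirm the extraction worked, a quick sanity check like the following can help. This is just a sketch: it assumes the default locations mentioned in the next section and that each video extracts to its own numbered folder of frames.

    import os

    # Assumed default locations (see the config section below).
    videos_dir = './20bn-jester-v1/videos/'
    annotations_dir = './20bn-jester-v1/annotations/'

    # Each video should have been extracted to its own numbered folder.
    print(len(os.listdir(videos_dir)), 'video folders found')  # expect 148,092
    print('Annotation CSVs:', sorted(os.listdir(annotations_dir)))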

2. Modify The Config File

In the configs folder you will find two config files:

  • config.json
  • config_quick_testing.json

The config.json file should be used for training the network and the config_quick_testing.json file should be used for quickly testing models. These files need to be modified to indicate the location of both the CSV files and the videos from the Jester dataset. The default location is ./20bn-jester-v1/annotations/ for the CSV files and ./20bn-jester-v1/videos/ for the videos.

These config files also contain the parameters used during training and quick testing, such as the number of epochs, batch size, and learning rate. Feel free to modify these parameters as you see fit.

Please note that the number of epochs defaults to -1 in the config.json file, which the training script interprets as 999999 epochs (effectively, training until you stop it).
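
As an illustration, the paths can also be updated programmatically. Note that the key names commented out in this sketch ('data_folder', 'train_data_csv') are hypothetical placeholders; open configs/config.json and use the key names you actually find there.

    import json

    config_path = 'configs/config_quick_testing.json'
    with open(config_path) as f:
        cfg = json.load(f)

    print(sorted(cfg.keys()))  # inspect the real key names first

    # Hypothetical keys -- replace with the names found in the file.
    # cfg['data_folder'] = './20bn-jester-v1/videos/'
    # cfg['train_data_csv'] = './20bn-jester-v1/annotations/jester-v1-train.csv'

    with open(config_path, 'w') as f:
        json.dump(cfg, f, indent=4)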

3. Create Your Own Model

The model.py module already has a simple 3D CNN model that you can use to train your gesture recognition system. You are encouraged to modify model.py to create your own 3D CNN architecture.
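
For orientation, here is a minimal sketch of what a custom 3D CNN could look like in PyTorch. The layer sizes, clip length, and input resolution are illustrative assumptions, not the exact architecture shipped in model.py.

    import torch
    import torch.nn as nn

    class Simple3DCNN(nn.Module):
        """Toy 3D CNN: input is a clip of shape (batch, 3, frames, H, W)."""

        def __init__(self, num_classes):
            super(Simple3DCNN, self).__init__()
            self.features = nn.Sequential(
                nn.Conv3d(3, 32, kernel_size=3, padding=1),
                nn.BatchNorm3d(32),
                nn.ReLU(inplace=True),
                nn.MaxPool3d(kernel_size=2),
                nn.Conv3d(32, 64, kernel_size=3, padding=1),
                nn.BatchNorm3d(64),
                nn.ReLU(inplace=True),
                nn.MaxPool3d(kernel_size=2),
            )
            self.pool = nn.AdaptiveAvgPool3d(1)  # collapse time and space
            self.fc = nn.Linear(64, num_classes)

        def forward(self, x):
            x = self.features(x)
            x = self.pool(x).view(x.size(0), -1)
            return self.fc(x)

    # Quick shape check with a dummy 16-frame, 84x84 clip (sizes assumed).
    model = Simple3DCNN(num_classes=27)
    print(model(torch.randn(2, 3, 16, 84, 84)).shape)  # torch.Size([2, 27])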

4. Modify the CSV Files For Quick Testing (Optional)

In the 20bn-jester-v1/annotations folder you will find the following CSV files:

  • jester-v1-labels-quick-testing.csv
  • jester-v1-train-quick-testing.csv
  • jester-v1-validation-quick-testing.csv

These files are used when quickly testing models and can be modified as you see fit. By default:

  • jester-v1-labels-quick-testing.csv contains labels for only 4 classes of hand gestures plus 1 label for "Doing other things"
  • jester-v1-train-quick-testing.csv contains the video IDs and corresponding labels of only 8 training videos
  • jester-v1-validation-quick-testing.csv contains the video IDs and corresponding labels of only 4 validation videos

Feel free to add more classes of hand gestures or more videos to the training and validation sets. To add more classes, simply copy and paste from the jester-v1-labels.csv file, which contains all 25 classes of hand gestures. Similarly, to add more videos to the training and validation sets, copy and paste from the jester-v1-train.csv and jester-v1-validation.csv files, which contain the video IDs and corresponding labels for the entire Jester dataset.
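
If you'd rather script the copy-and-paste, something like the following could work. It assumes the Jester CSVs are semicolon-separated with no header row (verify against your local files), and it only reads the full training CSV, in line with the note below.

    import pandas as pd

    ann = '20bn-jester-v1/annotations/'
    cols = ['video_id', 'label']

    # Read the full training split (read-only) and the quick-testing split.
    full = pd.read_csv(ann + 'jester-v1-train.csv',
                       sep=';', header=None, names=cols)
    quick = pd.read_csv(ann + 'jester-v1-train-quick-testing.csv',
                        sep=';', header=None, names=cols)

    # Add up to 20 more videos whose labels already appear in the quick split.
    extra = full[full['label'].isin(set(quick['label'])) &
                 ~full['video_id'].isin(set(quick['video_id']))].head(20)

    pd.concat([quick, extra]).to_csv(ann + 'jester-v1-train-quick-testing.csv',
                                     sep=';', header=False, index=False)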

NOTE: In this folder you will also find the CSV files used for training: jester-v1-labels.csv, jester-v1-train.csv, and jester-v1-validation.csv. These CSV files should NOT be modified.

CPU/GPU Option

You can choose whether to train the network using only a CPU or a GPU. Because the Jester dataset is very large, it is strongly recommended that you perform training only on a GPU. CPU mode is best reserved for quickly testing models.

To specify that you want to use the CPU for your computation, use the --use_gpu=False flag as described below.
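
For reference, a common PyTorch pattern for honoring such a flag looks like this (a sketch of the idea, not necessarily how train.py implements it):

    import torch
    import torch.nn as nn

    use_gpu = False  # e.g., parsed from the --use_gpu flag
    device = torch.device('cuda' if use_gpu and torch.cuda.is_available() else 'cpu')

    model = nn.Linear(10, 2).to(device)    # stand-in model; move yours the same way
    batch = torch.randn(4, 10).to(device)  # inputs must live on the same device
    print(model(batch).shape)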

Procedure

Testing

It is recommended that you quickly test your models before training them on the full Jester dataset. When quickly testing models, we suggest using the config_quick_testing.json file and the CPU. To do this, use the following command:

python train.py --config configs/config_quick_testing.json --use_gpu=False

Training

When training a model, you should use the config.json file and a GPU (strongly recommended). To train your model using a GPU, use the following command:

python train.py --config configs/config.json -g 0

cvnd---gesture-recognition's People

Contributors

  • juanudacity
  • rbudacprojects
  • ronny-udacity
  • spground
  • sudkul


cvnd---gesture-recognition's Issues

20bn Jester Dataset

Hi, I have downloaded the 20BN Jester dataset from their website, but I don't know how to extract it. Where or how should I run the command "cat 20bn-jester-v1-?? | tar zx"? Sorry, I do not know whether this is the right channel for me to ask, but I hope you can help me with this. I have been trying to figure this out for days, as I am very unfamiliar with all of this. Thank you.

Plotting graphs

Once I've trained my model, it creates a new folder called 'plots' which is supposed to store plots for accuracy, loss, and learning rate. My accuracy graph is created, but there are no data points plotted on it.

Is anyone else having this same problem? And does anyone know the reason behind it?

"IndexError: list index out of range" when running train.py

Hi everyone, when I run train.py I get the following error:

(base) root@bf5f0cf0f3d5:/app/uda-code/CVND---Gesture-Recognition-master/CVND---Gesture-Recognition-master# python train.py --config configs/config_quick_testing.json --use_gpu=False
=> Output folder for this run -- jester_conv_4_classes

Using 8 processes for data loader.
Training is getting started...
Training takes 1 epochs.
Current LR : 0.001
Traceback (most recent call last):
  File "train.py", line 328, in <module>
    main()
  File "train.py", line 185, in main
    val_loss, val_top1, val_top5 = validate(val_loader, model, criterion)
  File "train.py", line 257, in validate
    for i, (input, target) in enumerate(val_loader):
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 286, in __next__
    return self._process_next_batch(batch)
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 307, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
IndexError: Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 57, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 57, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/app/uda-code/CVND---Gesture-Recognition-master/CVND---Gesture-Recognition-master/data_loader.py", line 38, in __getitem__
    img_paths = self.get_frame_names(item.path)
  File "/app/uda-code/CVND---Gesture-Recognition-master/CVND---Gesture-Recognition-master/data_loader.py", line 73, in get_frame_names
    frame_names += [frame_names[-1]] * (num_frames_necessary - num_frames)
IndexError: list index out of range


I don't know what's wrong; I did not change any code. I would appreciate it if anyone could give me some advice.

Segmentation Fault 11

Hi @eleow,

I am getting the following error; the dataset has been correctly extracted. The only thing I changed was the number of classes in the config file, to match len(train_data.classes).

python train.py --config ./config_quick_testing.json --use_gpu=False

=> Output folder for this run -- jester_conv_4_classes
=> no checkpoint found at 'trainings/jpeg_model/jester_conv_4_classes/checkpoint.pth.tar'
 > Using 8 processes for data loader.
27
len(train_data.classes)= 27 num classes= 27

{'Swiping Left': 0, 0: 'Swiping Left', 'Swiping Right': 1, 1: 'Swiping Right', 'Swiping Down': 2, 2: 'Swiping Down', 'Swiping Up': 3, 3: 'Swiping Up', 'Pushing Hand Away': 4, 4: 'Pushing Hand Away', 'Pulling Hand In': 5, 5: 'Pulling Hand In', 'Sliding Two Fingers Left': 6, 6: 'Sliding Two Fingers Left', 'Sliding Two Fingers Right': 7, 7: 'Sliding Two Fingers Right', 'Sliding Two Fingers Down': 8, 8: 'Sliding Two Fingers Down', 'Sliding Two Fingers Up': 9, 9: 'Sliding Two Fingers Up', 'Pushing Two Fingers Away': 10, 10: 'Pushing Two Fingers Away', 'Pulling Two Fingers In': 11, 11: 'Pulling Two Fingers In', 'Rolling Hand Forward': 12, 12: 'Rolling Hand Forward', 'Rolling Hand Backward': 13, 13: 'Rolling Hand Backward', 'Turning Hand Clockwise': 14, 14: 'Turning Hand Clockwise', 'Turning Hand Counterclockwise': 15, 15: 'Turning Hand Counterclockwise', 'Zooming In With Full Hand': 16, 16: 'Zooming In With Full Hand', 'Zooming Out With Full Hand': 17, 17: 'Zooming Out With Full Hand', 'Zooming In With Two Fingers': 18, 18: 'Zooming In With Two Fingers', 'Zooming Out With Two Fingers': 19, 19: 'Zooming Out With Two Fingers', 'Thumb Up': 20, 20: 'Thumb Up', 'Thumb Down': 21, 21: 'Thumb Down', 'Shaking Hand': 22, 22: 'Shaking Hand', 'Stop Sign': 23, 23: 'Stop Sign', 'Drumming Fingers': 24, 24: 'Drumming Fingers', 'No gesture': 25, 25: 'No gesture', 'Doing other things': 26, 26: 'Doing other things'}

 > Training is getting started...
 > Training takes 1 epochs.
 > Current LR : 0.001
Segmentation fault: 11

Thanks!

Related paper

Hi,
Can you please provide the paper related to this implementation?

Great Job!!!

How to predict using webcam feed?

Hello, I am learning computer vision through Udacity and trying to build a gesture recognition system. It would be awesome if anyone could help me understand how I can predict the result from a single video file.

Thanks in advance

RuntimeError: expected a non-empty list of Tensors

Hi

I am trying to explore the field of computer vision and came across this repository. When I try to run the code using the procedure given in the README, I get this error:
RuntimeError: expected a non-empty list of Tensors

Any help resolving this issue would be great!
Thanks

Jester Dataset extraction

Except for file 00, all the other files return errors on Mac and Windows.
Mac: Unrecognized archive format
Windows: Not in gzip format

How do I solve this?

Prediction

Hi, can you please help me run prediction on a sample video using your trained checkpoint file? I would really appreciate it if you could provide a Python prediction script.

Changing model for prediction

Hi there, I'm having another problem when I try to change the model used for prediction. It gives me this error:

usage: webcam.py [-h] [-e EXECUTE] [-d DEBUG] [-u USE_GPU] [-g GPUS]
                 [-v VIDEO] [-vb VERBOSE] [-cp CHECKPOINT] [-m MAPPING]

optional arguments:
  -h, --help            show this help message and exit
  -e EXECUTE, --execute EXECUTE
                        Bool indicating whether to map output to
                        keyboard/mouse commands
  -d DEBUG, --debug DEBUG
                        In debug mode, show webcam input
  -u USE_GPU, --use_gpu USE_GPU
                        Bool indicating whether to use GPU. False - CPU, True
                        - GPU
  -g GPUS, --gpus GPUS  GPU ids to use
  -v VIDEO, --video VIDEO
                        Path to video file if using an offline file
  -vb VERBOSE, --verbose VERBOSE
                        Verbosity mode. 0 - Silent. 1 - Print info messages.
                        2 - Print info and debug messages
  -cp CHECKPOINT, --checkpoint CHECKPOINT
                        Location of model checkpoint file
  -m MAPPING, --mapping MAPPING
                        Location of mapping file for gestures to commands

Using GPU for inference
Traceback (most recent call last):
  File "webcam.py", line 112, in <module>
    model.load_state_dict(checkpoint['state_dict'])
  File "C:\Users\User\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\nn\modules\module.py", line 847, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for ConvColumn:
    Missing key(s) in state_dict: "conv_layer1.0.weight", "conv_layer1.0.bias", "conv_layer1.1.weight", "conv_layer1.1.bias", "conv_layer1.1.running_mean", "conv_layer1.1.running_var", "conv_layer2.0.weight", "conv_layer2.0.bias", "conv_layer2.1.weight", "conv_layer2.1.bias", "conv_layer2.1.running_mean", "conv_layer2.1.running_var", "conv_layer3.0.weight", "conv_layer3.0.bias", "conv_layer3.1.weight", "conv_layer3.1.bias", "conv_layer3.1.running_mean", "conv_layer3.1.running_var", "conv_layer4.0.weight", "conv_layer4.0.bias", "conv_layer4.1.weight", "conv_layer4.1.bias", "conv_layer4.1.running_mean", "conv_layer4.1.running_var", "fc5.weight", "fc5.bias", "fc6.weight", "fc6.bias".
    Unexpected key(s) in state_dict: "yer1.0.weight", "yer1.0.bias", "yer1.1.weight", "yer1.1.bias", "yer1.1.running_mean", "yer1.1.running_var", "yer1.1.num_batches_tracked", "yer2.0.weight", "yer2.0.bias", "yer2.1.weight", "yer2.1.bias", "yer2.1.running_mean", "yer2.1.running_var", "yer2.1.num_batches_tracked", "yer3.0.weight", "yer3.0.bias", "yer3.1.weight", "yer3.1.bias", "yer3.1.running_mean", "yer3.1.running_var", "yer3.1.num_batches_tracked", "yer4.0.weight", "yer4.0.bias", "yer4.1.weight", "yer4.1.bias", "yer4.1.running_mean", "yer4.1.running_var", "yer4.1.num_batches_tracked", "ght", "s".

Does anyone know what the problem here could be?

Jester dataset extraction

How long does it take to extract the dataset on a machine with 8 GB of RAM? Also, is there any way to extract only the videos for the required labels?

Testing accuracy on manual video clips very poor

The testing accuracy on the test dataset provided by TwentyBN is pretty good.
But when I manually record gestures using my Mac camera, convert them to frames, and feed them to the trained model, I get very poor results.

All suggestions are welcome.
