
ehpi_action_recognition's Introduction

We've released the code for PedRecNet and EHPI3D, which provides:

  • Human BB Detection (via YoloV4).
  • Human Tracking
  • 2D Human Pose Estimation
  • 3D Human Pose Estimation
  • Human Body Orientation (currently only Phi) Estimation
  • Human Head Orientation (currently only Phi) Estimation
  • "Pedestrian recognizes the camera" Estimation
  • Human Action Recognition (via EHPI3D)

You can find the new repository here: https://github.com/noboevbo/PedRec

This repository contains the code for our real-time pose-based action recognition pipeline, which was introduced in our paper "Simple yet efficient real-time pose-based action recognition" (submitted to ITSC 2019, as of 2019-04-23). It also contains code to reconstruct the results of another publication, XYZ (submitted to the IEEE Transactions on ITS, Special Issue ITSC 2018, as of 2019-04-23), which is based on an alternative approach using an LSTM.

The approach is based on encoding human poses over a period of time in an image-like data format (Encoded Human Pose Images, EHPIs) that can then be classified using standard CNNs. The entire pipeline is lightweight and runs at 20-60 FPS, depending on the settings. It was tested on an XMG Ultra 17 laptop with an i7-8700 desktop CPU, an NVIDIA GTX 1080 GPU and 32 GB RAM, running KDE Neon (Ubuntu 18.04) with CUDA 10.0 and cuDNN 7.5.
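To illustrate the idea, here is a minimal, hedged sketch of such an encoding (not the repository's exact implementation; the real one lives in the feature_vec_producer). It assumes 15 joints and a window of 32 frames, as used in this project, and uses a simplified normalization:

import numpy as np

NUM_JOINTS = 15   # joints per skeleton, as used in this repository
NUM_FRAMES = 32   # length of the temporal window

def encode_ehpi(pose_sequence):
    """pose_sequence: iterable of (NUM_JOINTS, 2) arrays with x/y joint coordinates."""
    ehpi = np.zeros((NUM_FRAMES, NUM_JOINTS, 3), dtype=np.float32)
    for frame_idx, joints in enumerate(list(pose_sequence)[-NUM_FRAMES:]):
        ehpi[frame_idx, :, 0] = joints[:, 0]  # x coordinates -> first channel
        ehpi[frame_idx, :, 1] = joints[:, 1]  # y coordinates -> second channel
        # the third channel is left at zero
    for channel in range(2):  # normalize x and y independently to [0, 1]
        values = ehpi[:, :, channel]
        span = values.max() - values.min()
        if span > 0:
            ehpi[:, :, channel] = (values - values.min()) / span
    return np.transpose(ehpi, (2, 0, 1))  # CNN input layout: 3 x 32 x 15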

We currently provide example code and a pre-trained model for our use case, which required us to differentiate between the actions idle, walk and wave. We also provide training code etc. that can be used to train your own models with different or additional actions. To see how an EHPI vector is generated, have a look at the feature_vec_producer and the action recognition network. I plan to release the source code for our database-powered dataset management tool, which contains importers and exporters for different datasets and formats, but it needs a little cleanup time first.

EHPI Example

Installation

Prerequisites

  • Python 3.6+
  • CUDA (tested with 9.0 and 10.0)
  • CUDNN (tested with 7.5)
  • PyTorch (tested with 1.0)
  • OpenCV with Python bindings

A basic setup guide for Ubuntu 18.04 is available at: https://dennisnotes.com/note/20180528-ubuntu-18.04-machine-learning-setup/. I set up my system like this, with the difference that I now use CUDA 10.0 and cuDNN 7.5; the blog post will be updated at some point.

Note: The code runs on Windows but with decreased performance, see Known Bugs.

Setup

I use two of my own libraries in this code, nobos_commons and nobos_torch_lib. These and their dependencies have to be installed first. The following code example assumes a Python installation with virtualenvwrapper; if you do not use it, adapt the commands accordingly. The snippet creates a new virtual environment, installs PyTorch (with CUDA 10.0), clones the required repositories, installs their dependencies and finally downloads the required model weights from our web server. The weights for YoloV3 and 2D human pose estimation originally come from https://github.com/ayooshkathuria/pytorch-yolo-v3 and https://github.com/Microsoft/human-pose-estimation.pytorch; we host them on our server to ensure availability and a fixed version.

git clone https://github.com/noboevbo/nobos_commons.git
git clone https://github.com/noboevbo/nobos_torch_lib.git
git clone https://github.com/noboevbo/ehpi_action_recognition.git
mkvirtualenv ehpi_action_recognition -p python3 --system-site-packages
workon ehpi_action_recognition
pip install https://download.pytorch.org/whl/cu100/torch-1.0.1.post2-cp36-cp36m-linux_x86_64.whl
pip install torchvision
pip install -r nobos_commons/requirements.txt 
pip install -r nobos_torch_lib/requirements.txt
pip install -r ehpi_action_recognition/requirements.txt  
pip install --upgrade nobos_commons
pip install --upgrade nobos_torch_lib
cd ehpi_action_recognition
sh get_models.sh

An example showing the whole pipeline on the webcam can be executed as follows:

export PYTHONPATH="~/path/to/ehpi_action_recognition:$PYTHONPATH"
python ehpi_action_recognition/run_ehpi.py

I haven't adapted everything to the command line yet; changes have to be made directly in the code. Examples for training and evaluation can be found in the files "train_ehpi.py" and "evaluate_ehpi.py".
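For orientation only, here is a hedged sketch of what training a CNN on EHPIs could look like; the actual training pipeline is in train_ehpi.py, and the tiny CNN as well as the random placeholder data below are illustrative assumptions (the input shape 3 x 32 x 15 and the three classes idle, walk and wave follow this README):

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

num_classes = 3  # idle, walk, wave

model = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(64, num_classes),
)

# Placeholder data; in practice the EHPIs are loaded from the dataset files.
ehpis = torch.rand(256, 3, 32, 15)
labels = torch.randint(0, num_classes, (256,))
loader = DataLoader(TensorDataset(ehpis, labels), batch_size=32, shuffle=True)

optimizer = torch.optim.SGD(model.parameters(), lr=0.05, momentum=0.9)
criterion = nn.CrossEntropyLoss()

for epoch in range(10):
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()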

Configuration Options

There are some configuration options available in run_ehpi.py (a sketch of the corresponding settings follows the list):

  • image_size = ImageSize(width=640, height=360): The image size to be used. Higher resolutions usually help Yolo to detect objects.
  • camera_number = 0: The webcam id
  • fps = 30: FPS which should be used for the input source (webcam or image folder)
  • buffer_size = 20: The size of the action buffer. In this project it is not really used beyond carrying over the detected humans from frame n-1.
  • action_names = [Action.IDLE.name, Action.WALK.name, Action.WAVE.name]: The names corresponding to the action class vector output by the action recognition network. These need to be updated when you train your own models with different action classes.
  • use_action_recognition = True: Turns action recognition on or off.
  • use_quick_n_dirty = False: If set to true, object detection is deactivated completely once a human skeleton has been found. The skeleton keeps being tracked, but new humans will not be detected. This improves performance by a huge margin.
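The options above could look roughly like this at the top of run_ehpi.py (names taken from the list above; the actual file may differ slightly, and ImageSize and Action are provided by the helper libraries, imports omitted):

image_size = ImageSize(width=640, height=360)  # input resolution for the pipeline
camera_number = 0                              # webcam id
fps = 30                                       # frame rate of the input source
buffer_size = 20                               # action buffer size (humans from frame n-1)
action_names = [Action.IDLE.name, Action.WALK.name, Action.WAVE.name]
use_action_recognition = True                  # toggle action recognition
use_quick_n_dirty = False                      # skip object detection once a skeleton is found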

Known Bugs

  • The code runs on Windows, but there is a bug somewhere: on our system it only reaches 10-30% of the FPS achieved on Linux (Ubuntu 18.04).
  • When use_quick_n_dirty is set to false, there is sometimes a merge bug where a person gets two skeletons assigned.

Reconstruct paper results

This repository contains code for our (submitted, as of 2019-04-23) publications at ITSC 2019 and in the ITS Journal Special Issue ITSC 2018. As the EHPI publication is not yet published and citable, we used an LSTM approach for action recognition in the ITS Journal publication, which is based on the normalized EHPI inputs. We want to ensure that the results from our papers can be reproduced; therefore, we provide our training and evaluation code in this repository. The results in our papers are reported as mean values over five training runs with different seeds. As seeds we use 0, 104, 123, 142 and 200. We use fixed values so that the results are 100% reproducible: seeds 142 and 200 were randomly selected, 0 and 123 are seeds often used in other work, and 104 is our office room number.
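As an illustration only, a minimal seeding sketch (the repository's own training scripts handle this; it is shown here just to make the five fixed seeds and the averaging procedure explicit):

import random
import numpy as np
import torch

SEEDS = [0, 104, 123, 142, 200]

def set_seed(seed: int) -> None:
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

for seed in SEEDS:
    set_seed(seed)
    # train and evaluate once per seed; the reported results are the mean over the five runs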

IEEE Intelligent Transportation Systems Conference (ITSC 2019)

Here is an example of the standard setup that should allow our training and evaluation code to be used directly:

mkdir ./ehpi_action_recognition/data
mkdir ./ehpi_action_recognition/data/datasets
mkdir ./ehpi_action_recognition/data/models

cd ./ehpi_action_recognition/data/datasets

wget https://mkiserv114.reutlingen-university.de/pub/files/2019_04_ehpi/itsc_2019_datasets.tar.gz
tar -xvf itsc_2019_datasets.tar.gz

cd ../models

wget https://mkiserv114.reutlingen-university.de/pub/files/2019_04_ehpi/itsc_2019_models.tar.gz
tar -xvf itsc_2019_models.tar.gz

Here is the direct link to the training code for the JHMDB dataset: JHMDB Training Code
And here to the evaluation code: JHMDB Evaluation Code

Here is the direct link to the training code for the Use Case dataset: Use Case Training Code
And here to the evaluation code: Use Case Evaluation Code

IEEE Transactions on Intelligent Transportation Systems - Special Issue 21st IEEE Intelligent Transportation Systems Conference (ITSC 2018)

Here is an example of the standard setup that should allow our training and evaluation code to be used directly:

mkdir ./ehpi_action_recognition/data
mkdir ./ehpi_action_recognition/data/datasets
mkdir ./ehpi_action_recognition/data/models

cd ./ehpi_action_recognition/data/datasets

wget https://mkiserv114.reutlingen-university.de/pub/files/2019_04_ehpi/its_2019_datasets.tar.gz
tar -xvf its_2019_datasets.tar.gz

cd ../models

wget https://mkiserv114.reutlingen-university.de/pub/files/2019_04_ehpi/its_2019_lstm_models.tar.gz
tar -xvf its_2019_lstm_models.tar.gz

Here is the direct link to the training code for both datasets (ActionSim and Office): ITS Training Code
And here to the evaluation code: ITS Evaluation Code

Citation

Please cite the following papers if this code is helpful in your research. Currently, the publications related to this repository have been submitted but are not yet accepted or published. I will update the entries as soon as I have feedback on the submissions. A preprint of the ITSC 2019 publication is available on arxiv.org.

Edit 2019-06-25: The EHPI ITSC 2019 publication is accepted and will be presented at ITSC 2019 (Oct 27-30).

D. Ludl, T. Gulde, and C. Curio, “Simple yet efficient real-time pose-based action recognition,” in 22nd Int. Conf. on Intelligent Transportation Systems (ITSC), 2019, pp. 581–588, doi: 10.1109/ITSC.2019.8917128.

Open Source Acknowledgments

I used parts of the following open source projects in my code:

Thank you for making this code available!


ehpi_action_recognition's Issues

Annotating new human actions

I would like to ask how you annotated your actions for recognition and what tools or code you used. I don't know how to generate the training files (.csv).

Real-time viability for multiple persons?

Since a top-down approach is used for human pose estimation, the required computational power increases with the number of people. Will this model still be good enough for real-time applications?

PS: I will try this out too, but I'm just asking whether someone here has already tried it.

EHPI generation

Hi,

Very interesting paper.
I was looking at your code to generate the EHPI images and I have some questions.

for human_id, action_vecs in humans_for_action_rec.items():
    ehpi_img = np.zeros((32, 15, 3), dtype=np.float32)
    for frame_num, action_vec in enumerate(action_vecs):
        if action_vec is None:
            continue
        ehpi_img[frame_num] = action_vec
    ehpi_img = np.transpose(ehpi_img, (2, 0, 1))
    # Set Blue Channel to zero
    ehpi_img[2, :, :] = 0
    # Normalize EHPI
    tmp_dict = {'x': ehpi_img}
    tmp_dict['x'] = self.remove(tmp_dict)['x']
    ehpi_img = self.normalize(tmp_dict)['x']
    net_input = np.zeros((1, 3, 32, 15), dtype=np.float32)
    net_input[0] = ehpi_img

    input_seq = Variable(torch.tensor(net_input, dtype=torch.float)).cuda()
    tag_scores = self.model(input_seq).data.cpu().numpy()[0]
    outputs[human_id] = tag_scores
return outputs

How would I change the code to encode x, y, and z information and normalize it?
I'm having some issues understanding how you normalized each coordinate.
Could you provide a script to generate and visualize the EHPI images?

Thanks

When trying to run run_ehpi.py, this error pops up

Traceback (most recent call last):
File "ehpi_action_recognition/run_ehpi.py", line 86, in
pose_model = pose_model.cuda()
File "/home/sarthak/.virtualenvs/ehpi_action_recognition/lib/python3.7/site-packages/torch/nn/modules/module.py", line 304, in cuda
return self._apply(lambda t: t.cuda(device))
File "/home/sarthak/.virtualenvs/ehpi_action_recognition/lib/python3.7/site-packages/torch/nn/modules/module.py", line 201, in _apply
module._apply(fn)
File "/home/sarthak/.virtualenvs/ehpi_action_recognition/lib/python3.7/site-packages/torch/nn/modules/module.py", line 223, in _apply
param_applied = fn(param)
File "/home/sarthak/.virtualenvs/ehpi_action_recognition/lib/python3.7/site-packages/torch/nn/modules/module.py", line 304, in
return self._apply(lambda t: t.cuda(device))
File "/home/sarthak/.virtualenvs/ehpi_action_recognition/lib/python3.7/site-packages/torch/cuda/init.py", line 196, in _lazy_init
_check_driver()
File "/home/sarthak/.virtualenvs/ehpi_action_recognition/lib/python3.7/site-packages/torch/cuda/init.py", line 94, in _check_driver
raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

Dead download links for pretrained weights and datasets

Hello @noboevbo,
Thanks for sharing your excellent work! I am quite interested in it and have been attempting to reproduce the demo to explore potential improvements and new insights from your repository.

I discovered that all artifacts fail to download, since the domain mkiserv114.reutlingen-university.de seems to be unavailable at the moment (please correct me if I'm mistaken). Could you kindly provide an alternative download mirror? If only temporary access could be granted, I would happily offer you a mirror on my private cloud!

Broken Links

Pretrained Weights - Files inside get_models.sh

Dataset - IEEE Intelligent Transportation Systems Conference (ITSC 2019)

Dataset - IEEE Transactions on Intelligent Transportation Systems - Special Issue 21st IEEE Intelligent Transportation Systems Conference (ITSC 2018)

Only one frame is used at inference with the web camera

Thanks for providing the pipeline for action recognition.
I tried the script "run_ehpi.py" and took a look at the code. It seems the action net only uses one frame to predict the action, while in your paper a sequence of 32 skeletons is concatenated to get the action result. Although there is a parameter called buffer_size in this script, it doesn't seem to contribute to action recognition over consecutive frames.
I feel a bit confused; is there anything I misunderstood?

Problem while saving the pose video

Hello, thank you for such a nice code. I need a little information regarding saving the generated pose video: I uploaded a video file for action recognition instead of using a webcam. I would like to know how I can save the resultant video, just like you save the image.
Please give me some guidance on the code.

Dataset dimensions

Thanks for making this great work available.
My question is how the transformation happened in the datasets.
The datasets (JHMDB and use case) have dimensions of m rows x 1440 columns,
while the paper states:

The encoded joints are assigned in a fixed order in a 1 × n × 3 matrix, where n stands for the number of joints. After the human pose for a frame has been encoded into such a matrix, it is appended as last column to a m × n × 3 matrix and replaces the first column of this matrix if it already contains m frames.

I think I'm lost here; can you please explain it?

JHMDB dataset generation/parsing

Hello,
I would like to understand how to read the .csv dataset files for the JHMDB dataset. What is the format? What do the lines and columns signify?
How were those files created from the .mat joint positions of the original JHMDB dataset?
Thank you very much in advance

Shape of EHPI image

Hello,

I just want to inquire about the shape of the EHPI image. The paper mentions it as 3 x 15 x 32, such that the image channels hold the x and y coordinates and each row represents a joint value at each timestep. However, when debugging the file "action_rec_net_ehpi.py", the input to the model shows up as (3, 32, 15).

ShuffleNetV2 vs Simple network architecture that we used to classify the EHPIs on the JHMDB dataset

Hi @noboevbo,
I wanted to know whether you have done a quantitative (FPS and accuracy) and qualitative analysis of ShuffleNetV2 versus the simple network architecture that was used to classify the EHPIs on the JHMDB dataset.

You have mentioned this in the paper as the reason for preferring ShuffleNetV2:
"Since we have used considerably more data in our use case than is available in the JHMDB [28] dataset (see section V), the network is no longer sufficient. Expanding the network with further convolutional layers and also increasing the size of the fully connected layer would result in the network having more parameters than some existing and efficient CNNs for classification. Therefore we employ the ShuffleNetv2 [29] architecture with which we also demonstrate the application of standard computer vision algorithms to EHPIs"

If our application uses around 4 actions, which model do you think is better?

Guide to train from video files?

Hello, thank you for such great work. Would you mind sharing how to train on our own dataset, e.g. video or image files? I'm a little bit stuck here, thanks.

Demo always labels the action as 'wave'

I tried the demo "run_ehpi.py" and everything got set up and ran very easily, well done for that. However, the results on live footage were very inaccurate. I wonder if I am doing anything wrong or if the demo code relies on an assumption that I am not taking into account.
The demo starts, and even if the person is idle, the classifier labels it as 'wave' after a few seconds and keeps the label no matter how still the person remains. I've tried it multiple times at various distances from the camera, but every time the demo labels the person's action as 'wave' and then does not change.
Can you comment or provide any guidance, as this does not seem to be the intended behaviour?
