
ehpi_action_recognition's Issues

Dead download links of Pretrained weight and Dataset

Hello, @noboevbo,
Thanks for sharing your excellent work! I am quite interested in it and have been attempting to reproduce the demo to explore potential improvements and new insights from your repository.

I discovered that all artifacts fail to download, since the domain mkiserv114.reutlingen-university.de seems to be unavailable at the moment (please correct me if I'm mistaken). Could you kindly provide an alternative download mirror? If even temporary access could be granted, I would happily host a mirror on my private cloud!

Broken Links

Pretrained Weights - Files inside get_models.sh

Dataset - IEEE Intelligent Transportation Systems Conference (ITSC 2019)

Dataset - IEEE Transactions on Intelligent Transportation Systems - Special Issue 21st IEEE Intelligent Transportation Systems Conference (ITSC 2018)
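
For anyone who wants to re-check link availability before mirroring, here is a minimal Python sketch; the URL below is a placeholder, so substitute the actual links from get_models.sh:

    import urllib.request

    urls = [
        "http://mkiserv114.reutlingen-university.de/",  # placeholder, use the URLs from get_models.sh
    ]

    for url in urls:
        try:
            with urllib.request.urlopen(url, timeout=10) as response:
                print("reachable:", response.status, url)
        except OSError as error:  # URLError, timeouts and DNS failures are all OSErrors
            print("unreachable:", url, "-", error)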

JHMDB dataset generation/parsing

Hello,
I would like to understand how to read the .csv dataset files for the JHMDB dataset. What is the format? What do the lines and columns signify?
How were these files created from the .mat joint positions in the original JHMDB dataset?
Thank you very much in advance
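
For reference, a minimal sketch of reading the original JHMDB annotations with SciPy; the 'pos_img' key and its (2, n_joints, n_frames) layout are assumptions based on the standard JHMDB joint_positions.mat files, so verify them against your copy of the dataset:

    from scipy.io import loadmat

    mat = loadmat("joint_positions.mat")
    pos_img = mat["pos_img"]  # assumed shape (2, n_joints, n_frames): image-space x/y per joint
    x_coords = pos_img[0]     # (n_joints, n_frames)
    y_coords = pos_img[1]     # (n_joints, n_frames)
    print(pos_img.shape)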

Demo always labels the action as 'wave'

I tried the demo "run_ehpi.py" and everything was set up and running very easily; well done for that. However, the results on live footage were very inaccurate. I wonder if I am doing anything wrong or if the demo code relies on an assumption that I am not taking into account.
The demo starts, and even if the person is idle, the classifier labels the action as 'wave' after a few seconds and keeps that label no matter how still the person remains. I've tried it multiple times at various distances from the camera, but every time the demo labels the person's action as 'wave' and then never changes it.
Can you comment or provide any guidance, as this does not seem to be the intended behaviour?

Only one frame used during inference with the web camera

Thanks for providing the pipeline for action recognition.
I tried the script "run_ehpi.py" and took a look at the code. It seems the action net only uses one frame to predict the action, while in your paper a sequence of 32 skeletons is concatenated to get the action result. Although there is a parameter called buffer_size in this script, it does not contribute to multi-frame action recognition.
I feel a little confused; is there anything I misunderstood?
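
For illustration, a minimal sketch of the buffering the paper describes: keep the last 32 pose vectors per person and only run the classifier on a full sequence (pose_vec and classify are placeholders, not names from the repository):

    from collections import deque

    BUFFER_SIZE = 32  # number of skeleton frames per EHPI, as in the paper
    buffer = deque(maxlen=BUFFER_SIZE)

    def on_new_frame(pose_vec, classify):
        buffer.append(pose_vec)
        if len(buffer) == BUFFER_SIZE:     # only predict on a full 32-frame window
            return classify(list(buffer))  # the concatenated skeleton sequence
        return None                        # not enough history yet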

Dataset dimensions

Thanks for making this great work available.
My question is about how the transformation happens in the datasets.
The datasets (JHMDB and use case) have dimensions of m rows × 1440 columns, while the paper says:

The encoded joints are assigned in a fixed order in a 1 × n × 3 matrix, where n stands for the number of joints. After the human pose for a frame has been encoded into such a matrix, it is appended as last column to a m × n × 3 matrix and replaces the first column of this matrix if it already contains m frames.

I think I'm lost here; can you please explain it?
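
One way to read those numbers: 1440 columns = 32 frames × 15 joints × 3 channels, so each CSV row plausibly holds one flattened EHPI. A minimal sketch under that assumption (the exact flattening order should be verified against the dataset loader in the repository):

    import numpy as np

    row = np.zeros(1440, dtype=np.float32)     # one row from the dataset CSV
    ehpi = row.reshape(32, 15, 3)              # (frames, joints, channels), assumed order
    net_input = np.transpose(ehpi, (2, 0, 1))  # (3, 32, 15), channels first as in the code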

Real-time viability for multiple persons?

Since the top-down approach is used for human pose estimation, the computational power required will increase as the number of people increases. Will this model still be good enough for real-time applications?

PS: I will try this out too, but I'm just asking in case someone here has already tried it.

Problem while saving pose video

Hello, thank you for such nice code. I need a little information regarding saving the generated pose video: I uploaded a video file for action recognition instead of using a webcam. I would like to know how I can save the resulting video, just like you save the image.
Please give me some guidance on the code.
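
For the video-saving part, here is a minimal OpenCV sketch, analogous to saving single images; the codec, frame rate and file name are assumptions to adjust to your input:

    import cv2

    writer = None

    def write_pose_frame(frame):
        global writer
        if writer is None:  # open the writer once the frame size is known
            height, width = frame.shape[:2]
            fourcc = cv2.VideoWriter_fourcc(*"mp4v")
            writer = cv2.VideoWriter("pose_output.mp4", fourcc, 30.0, (width, height))
        writer.write(frame)  # frame must be a BGR uint8 image, as OpenCV produces

    # after processing the last frame:
    # writer.release()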

Guide to training from video files?

Hello, thank you for such great work. Would you mind sharing how to train on our own dataset, such as video or image files? I'm just a little bit stuck here. Thanks!

Annotating new human actions

I would like to ask how to annotate new actions for your action recognition and what tools or code are used for that.
I don't know how to generate the training files (.csv).
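
A minimal sketch of how a training row could be produced from an EHPI, matching the m × 1440 layout of the shipped datasets; how labels are stored is an assumption, so check the repository's dataset readers before relying on this:

    import csv
    import numpy as np

    def append_sample(csv_path, ehpi, label):
        # Flatten one (32, 15, 3) EHPI into the 1440 columns seen in the datasets
        assert ehpi.shape == (32, 15, 3)
        row = ehpi.reshape(-1).tolist()
        with open(csv_path, "a", newline="") as f:
            csv.writer(f).writerow(row + [label])  # trailing label column is hypothetical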

EHPI generation

Hi,

Very interesting paper.
I was looking at your code to generate the EHPI images and I have some questions.

    for human_id, action_vecs in humans_for_action_rec.items():
        # Collect up to 32 buffered pose vectors into one EHPI (frames x joints x channels)
        ehpi_img = np.zeros((32, 15, 3), dtype=np.float32)
        for frame_num, action_vec in enumerate(action_vecs):
            if action_vec is None:
                continue
            ehpi_img[frame_num] = action_vec
        # Reorder to channels-first: (3, 32, 15)
        ehpi_img = np.transpose(ehpi_img, (2, 0, 1))
        # Set blue channel to zero
        ehpi_img[2, :, :] = 0
        # Normalize EHPI
        tmp_dict = {'x': ehpi_img}
        tmp_dict['x'] = self.remove(tmp_dict)['x']
        ehpi_img = self.normalize(tmp_dict)['x']
        # Add a batch dimension and run the classifier
        net_input = np.zeros((1, 3, 32, 15), dtype=np.float32)
        net_input[0] = ehpi_img

        input_seq = Variable(torch.tensor(net_input, dtype=torch.float)).cuda()
        tag_scores = self.model(input_seq).data.cpu().numpy()[0]
        outputs[human_id] = tag_scores
    return outputs

How would I change the code to encode x, y, and z information and normalize it?
I'm having some issues understanding how you normalized each coordinate.
Could you provide a script to generate and visualize the EHPI images?

Thanks
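
As a starting point, here is a minimal sketch of one plausible normalization: min-max scale each coordinate channel over the whole 32-frame window to [0, 1], and keep the third channel for z instead of zeroing it. This mirrors the structure of the code above but is an assumption; it is not guaranteed to match the repository's normalize() transform.

    import numpy as np

    def normalize_ehpi(ehpi):
        # ehpi: (3, 32, 15) array with x, y, z coordinate channels
        out = ehpi.astype(np.float32).copy()
        for channel in range(out.shape[0]):
            low, high = out[channel].min(), out[channel].max()
            if high > low:  # avoid division by zero for constant channels
                out[channel] = (out[channel] - low) / (high - low)
        return out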

Shape of EHPI image

Hello,

I just want to inquire about the shape of the EHPI image. The paper describes it as 3 × 15 × 32, such that the image channels hold the x and y coordinates and each row represents a joint's value at each timestep. However, when debugging the file "action_rec_net_ehpi.py", the input to the model is (3, 32, 15).
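
Both layouts carry the same values; they differ only in whether the frame axis or the joint axis comes second, so converting between them is a single transpose. A minimal check:

    import numpy as np

    ehpi = np.zeros((3, 32, 15))                  # (channels, frames, joints), as seen in the code
    paper_layout = np.transpose(ehpi, (0, 2, 1))  # (channels, joints, frames) = 3 x 15 x 32
    print(paper_layout.shape)                     # (3, 15, 32)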

When trying to run run_ehpi.py, this error pops up

    Traceback (most recent call last):
      File "ehpi_action_recognition/run_ehpi.py", line 86, in <module>
        pose_model = pose_model.cuda()
      File "/home/sarthak/.virtualenvs/ehpi_action_recognition/lib/python3.7/site-packages/torch/nn/modules/module.py", line 304, in cuda
        return self._apply(lambda t: t.cuda(device))
      File "/home/sarthak/.virtualenvs/ehpi_action_recognition/lib/python3.7/site-packages/torch/nn/modules/module.py", line 201, in _apply
        module._apply(fn)
      File "/home/sarthak/.virtualenvs/ehpi_action_recognition/lib/python3.7/site-packages/torch/nn/modules/module.py", line 223, in _apply
        param_applied = fn(param)
      File "/home/sarthak/.virtualenvs/ehpi_action_recognition/lib/python3.7/site-packages/torch/nn/modules/module.py", line 304, in <lambda>
        return self._apply(lambda t: t.cuda(device))
      File "/home/sarthak/.virtualenvs/ehpi_action_recognition/lib/python3.7/site-packages/torch/cuda/__init__.py", line 196, in _lazy_init
        _check_driver()
      File "/home/sarthak/.virtualenvs/ehpi_action_recognition/lib/python3.7/site-packages/torch/cuda/__init__.py", line 94, in _check_driver
        raise AssertionError("Torch not compiled with CUDA enabled")
    AssertionError: Torch not compiled with CUDA enabled
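
The traceback shows that this PyTorch build has no CUDA support, so the unconditional pose_model.cuda() call in run_ehpi.py fails. A minimal sketch of a device guard that avoids the crash (note that other parts of the code, such as the inference snippet in the EHPI generation issue above, also call .cuda() and would need the same treatment for a CPU-only run):

    import torch

    def to_best_device(model):
        # Fall back to CPU when PyTorch was built without CUDA support,
        # instead of crashing on an unconditional .cuda() call.
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        return model.to(device)

    # In run_ehpi.py, replace `pose_model = pose_model.cuda()` with:
    # pose_model = to_best_device(pose_model)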

ShuffleNetV2 vs Simple network architecture that we used to classify the EHPIs on the JHMDB dataset

Hi @noboevbo,
I wanted to know whether you have done a quantitative (FPS and accuracy) and qualitative comparison between ShuffleNetV2 and the simple network architecture used to classify the EHPIs on the JHMDB dataset.

You have mentioned this in the paper as the reason for preferring ShuffleNetV2:
"Since we have used considerably more data in our use case than is available in the JHMDB [28] dataset (see section V), the network is no longer sufficient. Expanding the network with further convolutional layers and also increasing the size of the fully connected layer would result in the network having more parameters than some existing and efficient CNNs for classification. Therefore we employ the ShuffleNetv2 [29] architecture with which we also demonstrate the application of standard computer vision algorithms to EHPIs"

What if our application only uses around 4 actions? Which model do you think is better in that case?
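
For the FPS side of such a comparison, here is a minimal benchmarking sketch; model stands for whichever classifier you load (ShuffleNetV2 or the small CNN), and accuracy numbers would have to come from the evaluation scripts:

    import time
    import torch

    def measure_fps(model, iters=200, device="cpu"):
        model = model.to(device).eval()
        x = torch.zeros(1, 3, 32, 15, device=device)  # one EHPI-shaped input
        with torch.no_grad():
            for _ in range(10):   # warm-up iterations
                model(x)
            start = time.perf_counter()
            for _ in range(iters):
                model(x)
        return iters / (time.perf_counter() - start)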
