noboevbo / ehpi_action_recognition
Simple yet efficient real-time pose based action recognition
License: MIT License
Hello, @noboevbo,
Thanks for sharing your excellent work! I am quite interested in it and have been attempting to reproduce the demo to explore potential improvements and new insights from your repository.
I discovered that all artifacts fail to download, since the domain mkiserv114.reutlingen-university.de
seems to be unavailable at the moment (please correct me if I'm mistaken). Could you kindly provide an alternative download mirror? If even temporary access could be granted, I would happily offer you a mirror on my private cloud!
Pretrained Weights - Files inside get_models.sh
Dataset - IEEE Intelligent Transportation Systems Conference (ITSC 2019)
Dataset - IEEE Transactions on Intelligent Transportation Systems - Special Issue 21st IEEE Intelligent Transportation Systems Conference (ITSC 2018)
Hello,
I would like to understand how to read the .csv dataset files for the JHMDB dataset. What is the format? What do the lines and columns signify?
How were those files created from the .mat joint positions in the original JHMDB dataset?
Thank you very much in advance
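I haven't found official documentation for the CSV layout either, but the dataset files reportedly have 1440 columns per row, and the demo code builds EHPIs of 32 frames × 15 joints × 3 channels (32 · 15 · 3 = 1440), so one plausible reading is that each CSV row is a flattened EHPI sample. A sketch under that assumption (`load_ehpi_rows` is a hypothetical helper, not part of the repository):

```python
import numpy as np

def load_ehpi_rows(csv_path):
    """Load EHPI samples from a CSV file, assuming each row holds
    32 frames x 15 joints x 3 values = 1440 columns (an assumption,
    not confirmed by the repository docs)."""
    data = np.loadtxt(csv_path, delimiter=",", dtype=np.float32)
    data = np.atleast_2d(data)  # handle single-row files uniformly
    # Reshape every row into (frames, joints, channels).
    return data.reshape(-1, 32, 15, 3)
```

If the rows carry an extra label column, the reshape would need to skip it first; checking the column count against 1440 should reveal that quickly.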
Hi,
Does anyone know how to train their own model?
How can I extract data in the same format as the CSV files provided by the author?
I really want to train my own models, thanks!
I tried the demo "run_ehpi.py" and everything got set up and ran very easily, well done for that. However, the results on live footage were very inaccurate. I wonder if I am doing anything wrong or if the demo code relies on some assumption that I am not taking into account.
The demo starts and, even if the person is idle, the classifier labels them as 'wave' after a few seconds and keeps that label no matter how still the person remains. I've tried it multiple times at various distances from the camera, but every time the demo labels the person's action as 'wave' and then does not change.
Can you comment or provide any guidance, as this does not seem to be the intended behaviour?
Thanks for providing the pipeline for action recognition.
I tried the script "run_ehpi.py" and took a look at the code. It seems the action net only uses one frame to predict the action, while in your paper a sequence of 32 skeletons is concatenated to get the action result. Although there is a parameter called buffer_size
in this script, it doesn't seem to contribute to recognizing actions over consecutive frames.
I feel a little bit confused; is there anything I misunderstood?
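If the goal is to feed the classifier a sequence of 32 skeletons as described in the paper, one way is a per-human rolling buffer. The sketch below is illustrative only: the class name is made up, and the shapes of 32 frames × 15 joints × 3 channels are taken from the demo code, not from an official API.

```python
from collections import deque
import numpy as np

class EhpiBuffer:
    """Keeps the last `seq_len` pose frames for one tracked human
    (illustrative sketch, not the repository's implementation)."""
    def __init__(self, seq_len=32, num_joints=15):
        self.seq_len = seq_len
        self.num_joints = num_joints
        # deque with maxlen silently drops the oldest frame when full
        self.frames = deque(maxlen=seq_len)

    def push(self, joints):
        # joints: (num_joints, 3) array, e.g. x, y, confidence per joint
        self.frames.append(np.asarray(joints, dtype=np.float32))

    def as_ehpi(self):
        # Zero-pad until seq_len frames have been observed.
        ehpi = np.zeros((self.seq_len, self.num_joints, 3), dtype=np.float32)
        for i, frame in enumerate(self.frames):
            ehpi[i] = frame
        return ehpi
```

Calling `as_ehpi()` after each new frame would then always yield a (32, 15, 3) window ready to be transposed and classified.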
Thanks for making this great work available.
My question is how the transformation happens in the datasets.
The datasets (JHMDB and use case) have dimensions of m rows × 1440 columns.
In the paper it is written:
The encoded joints are assigned in a
fixed order in a 1 × n × 3 matrix, where n stands for the
number of joints. After the human pose for a frame has been
encoded into such a matrix, it is appended as last column
to a m × n × 3 matrix and replaces the first column of
this matrix if it already contains m frames.
I think I'm lost here; can you please explain it?
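As I read the quoted passage, it describes a sliding window: each frame's 1 × n × 3 pose matrix becomes the newest entry of an m × n × 3 matrix, and once the matrix already holds m frames, the oldest frame is dropped. With m = 32 frames and n = 15 joints (the shapes used in the code), a sketch:

```python
import numpy as np

M_FRAMES, N_JOINTS = 32, 15  # m frames, n joints, as in the paper

def append_frame(ehpi, frame):
    """Advance the EHPI window by one frame: the oldest frame falls out
    and `frame` (shape (N_JOINTS, 3)) becomes the newest entry."""
    ehpi = np.roll(ehpi, -1, axis=0)  # shift frames toward index 0
    ehpi[-1] = frame                  # new pose replaces the last slot
    return ehpi
```

Flattening such a (32, 15, 3) window row-wise gives exactly 32 · 15 · 3 = 1440 values, which matches the column count you observed in the dataset files.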
Since the top-down approach is being used for human pose estimation, the computational power required will increase as the number of people increases. Will this model still be good enough for real-time applications?
PS: I will try this out too, but I'm just asking if someone here has tried it already.
Hello, thank you for such nice code. I need a little information regarding saving the generated pose video. I uploaded a video file for action recognition rather than using a webcam. I would like to know how I can save the resultant video, just like you save the image.
Please give me some guidance on the code.
Hello, thank you for such great work. Would you mind sharing how to train on our own dataset, such as video or image files? I'm just a little bit stuck here, thanks.
I would like to ask how you annotated the actions for recognition, and what tools or code were used.
I don't know how to generate the training files (.csv).
Hi,
Very interesting paper.
I was looking at your code to generate the EHPI images and I have some questions.
for human_id, action_vecs in humans_for_action_rec.items():
    # One EHPI per human: 32 frames x 15 joints x 3 channels
    ehpi_img = np.zeros((32, 15, 3), dtype=np.float32)
    for frame_num, action_vec in enumerate(action_vecs):
        if action_vec is None:
            continue
        ehpi_img[frame_num] = action_vec
    # Reorder to channels-first: (3, 32, 15)
    ehpi_img = np.transpose(ehpi_img, (2, 0, 1))
    # Set blue channel to zero
    ehpi_img[2, :, :] = 0
    # Normalize EHPI
    tmp_dict = {'x': ehpi_img}
    tmp_dict['x'] = self.remove(tmp_dict)['x']
    ehpi_img = self.normalize(tmp_dict)['x']
    # Add the batch dimension and run the classifier
    net_input = np.zeros((1, 3, 32, 15), dtype=np.float32)
    net_input[0] = ehpi_img
    input_seq = Variable(torch.tensor(net_input, dtype=torch.float)).cuda()
    tag_scores = self.model(input_seq).data.cpu().numpy()[0]
    outputs[human_id] = tag_scores
return outputs
How would I change the code to encode x, y, and z information and normalize it?
I'm having some issues understanding how you normalized each coordinate.
Could you provide a script to generate and visualize the EHPI images?
Thanks
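I can't speak for the authors' exact normalization, but one common scheme that extends naturally to a z channel is per-channel min-max scaling over the whole sequence, ignoring zero (missing) entries. A sketch under that assumption (`normalize_ehpi` is illustrative, not the repository's method):

```python
import numpy as np

def normalize_ehpi(ehpi):
    """Min-max normalize each coordinate channel of an EHPI to [0, 1].
    ehpi: (frames, joints, channels) with channels = x, y (and optionally z).
    Zero entries (missing joints) are skipped. Illustrative only; the
    repository's own normalization may differ."""
    out = ehpi.copy()
    for c in range(ehpi.shape[2]):
        chan = ehpi[:, :, c]
        mask = chan != 0
        if not mask.any():
            continue  # channel entirely empty (e.g. unused blue channel)
        lo, hi = chan[mask].min(), chan[mask].max()
        if hi > lo:
            out[:, :, c][mask] = (chan[mask] - lo) / (hi - lo)
    return out
```

Adding z would then only mean building (32, 15, 3) EHPIs with x, y, z channels instead of zeroing the third channel, and normalizing z the same way.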
Hello,
I just want to inquire about the shape of the EHPI image. The paper mentions it as 3 x 15 x 32, such that the image channels hold the x and y coordinates and each row represents a joint's value at each timestep. However, when debugging the file "action_rec_net_ehpi.py", the input to the model appears as (3, 32, 15).
Traceback (most recent call last):
  File "ehpi_action_recognition/run_ehpi.py", line 86, in <module>
    pose_model = pose_model.cuda()
  File "/home/sarthak/.virtualenvs/ehpi_action_recognition/lib/python3.7/site-packages/torch/nn/modules/module.py", line 304, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/home/sarthak/.virtualenvs/ehpi_action_recognition/lib/python3.7/site-packages/torch/nn/modules/module.py", line 201, in _apply
    module._apply(fn)
  File "/home/sarthak/.virtualenvs/ehpi_action_recognition/lib/python3.7/site-packages/torch/nn/modules/module.py", line 223, in _apply
    param_applied = fn(param)
  File "/home/sarthak/.virtualenvs/ehpi_action_recognition/lib/python3.7/site-packages/torch/nn/modules/module.py", line 304, in <lambda>
    return self._apply(lambda t: t.cuda(device))
  File "/home/sarthak/.virtualenvs/ehpi_action_recognition/lib/python3.7/site-packages/torch/cuda/__init__.py", line 196, in _lazy_init
    _check_driver()
  File "/home/sarthak/.virtualenvs/ehpi_action_recognition/lib/python3.7/site-packages/torch/cuda/__init__.py", line 94, in _check_driver
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
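The traceback shows that the installed PyTorch build has no CUDA support, so every `.cuda()` call raises. A common workaround is to pick the device dynamically and move models and tensors with `.to(device)` (a sketch, not a patch from the repository):

```python
import torch

# Fall back to CPU when PyTorch was built without CUDA support.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Then, in the scripts, replace hard-coded .cuda() calls, e.g.:
#   pose_model = pose_model.cuda()          ->  pose_model = pose_model.to(device)
#   input_seq = torch.tensor(net_input).cuda()  ->  torch.tensor(net_input).to(device)
example = torch.zeros(2, device=device)  # works on both CPU and GPU builds
```

Alternatively, installing a CUDA-enabled PyTorch wheel that matches the local CUDA driver resolves the error without code changes.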
Hi @noboevbo,
I wanted to know if you have done a quantitative (FPS and accuracy) and qualitative analysis comparing ShuffleNetV2 and the simple network architecture used to classify the EHPIs on the JHMDB dataset models?
You mentioned this in the paper as the reason for preferring ShuffleNetV2:
"Since we have used considerably more data in our use case than is available in the JHMDB [28] dataset (see section V), the network is no longer sufficient. Expanding the network with further convolutional layers and also increasing the size of the fully connected layer would result in the network having more parameters than some existing and efficient CNNs for classification. Therefore we employ the ShuffleNetv2 [29] architecture with which we also demonstrate the application of standard computer vision algorithms to EHPIs"
What if our application uses only around 4 actions? Which model do you think is better in that case?
Hi, I love your project, and I launched your demo "run_ehpi.py" code easily.
I wonder whether this project can be trained with new datasets.
I can download your datasets (csv files) from this link:
https://mkiserv114.reutlingen-university.de/pub/files/2019_04_ehpi/itsc_2019_datasets.tar.gz
But I can't find out how to produce data in that format from my own datasets (such as other videos or images).
Please give me some advice. Thank you.