
egoprocel-egocentric-procedure-learning's Introduction

I am a first-year PhD student at the University of Bristol working with Prof. Dima Damen. My research interests are Computer Vision, Pattern Recognition, and Machine Learning. Currently, I am working on devising learning-based methods for understanding and exploring various aspects of first-person (egocentric) vision. Previously, at CVIT, IIIT Hyderabad, I worked with Prof. C.V. Jawahar and Prof. Chetan Arora on unsupervised procedure learning from egocentric videos.

Earlier, I worked on improving word recognition and retrieval in large document collections with Prof. C.V. Jawahar, and on 3D Computer Vision with Prof. Shanmuganathan Raman.

My ultimate goal is to contribute to the development of systems capable of understanding the world as we do. I'm an inquisitive person, and I'm always willing to learn about fields including, but not limited to, science, technology, astrophysics, and physics.

Website / CV / Google Scholar / LinkedIn / arXiv / ORCID


egoprocel-egocentric-procedure-learning's People

Contributors

sid2697


egoprocel-egocentric-procedure-learning's Issues

Questions about the code

Hi Sid, great work! I am trying to understand the codebase and have quite a few questions. Could you please clarify the following?

  1. Is the TCC model trained separately for each task of each dataset? (e.g., is there a separate model for CMU-Kitchens salad, another for CMU-Kitchens pizza, and so on?)
  2. If so, do we need to create separate data directories containing just one task of a dataset? For example, in https://github.com/Sid2697/EgoProceL-egocentric-procedure-learning/blob/main/configs/demo_config.yaml, TCC DATA_PATH is /scratch/sid/CMUMMAC/videos/48448_6510211; does this directory contain videos of only task id 48448 and view id 651...?
  3. If not, and all views of the same task remain in one directory (e.g., the salad_ego video and the salad_top/back video together, as the annotation folder is currently structured), which part of the code reads only the corresponding view's videos and ignores the other views?
  4. In general, after downloading the data, what steps do I need to follow to run the code and reproduce the results (how should I structure the videos)? Could you please suggest the steps?

(I may add more questions as I go, thanks a lot for your help)
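To make question 1 concrete, this is the per-task setup I imagine, as a minimal sketch; `train_tcc_for_task` is a hypothetical stand-in for the repo's actual training entry point, and the directory names are invented:

```python
import pathlib
import tempfile

def train_tcc_for_task(task_dir: pathlib.Path) -> str:
    """Hypothetical stand-in for the repo's TCC training entry point."""
    return f"model-for-{task_dir.name}"

# Toy layout: one sub-directory per (task, view) pair, mirroring what the
# TCC DATA_PATH in demo_config.yaml seems to suggest.
root = pathlib.Path(tempfile.mkdtemp()) / "videos"
for task_view in ["48448_view1", "48448_view2", "50501_view1"]:
    (root / task_view).mkdir(parents=True)

# One model per task/view directory, as question 1 conjectures.
models = {d.name: train_tcc_for_task(d)
          for d in sorted(root.iterdir()) if d.is_dir()}
print(sorted(models))
```

Is this the intended workflow, or does a single model cover several tasks?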

Confusion about data paths

Hello, I am trying to train on only the Sandwich videos on my PC, but it seems I am doing something wrong. For example, in demo_config.yaml, what is the difference between CMU_KITCHEN: VIDEOS_PATH and TCC: DATA_PATH?

In addition, what should the folder structure for the videos be? The same as the annotations folder structure, or different?

Could you please share a detailed explanation of the data folder structure and an example config file for the Sandwich videos?
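To make the question concrete, here is how I currently imagine the two keys relate; the paths are invented and this guess may well be wrong:

```yaml
# Hypothetical sketch only; the real keys live in configs/demo_config.yaml.
CMU_KITCHEN:
  VIDEOS_PATH: /data/CMU_Kitchens/videos           # root with all raw CMU-Kitchens videos?
TCC:
  DATA_PATH: /data/CMU_Kitchens/videos/sandwich    # only the task subset fed to TCC training?
```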

Frames extraction

Hi @Sid2697, I am having an issue with the .mp4/.avi-to-frames h5py extraction. Even for the pc_assembly/disassembly videos, many of the resulting .h5 files seem too small and error out during reading. Has anyone faced this issue? Any suggestions on how to solve it? Thanks
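For reference, this is the minimal check I use to separate readable from corrupt .h5 files; the dataset name and the corrupt-file simulation below are my own illustration, not the repo's extraction code:

```python
import os
import tempfile

import h5py
import numpy as np

def is_readable_h5(path: str) -> bool:
    """Return True iff the file opens as HDF5 and its top-level datasets read back."""
    try:
        with h5py.File(path, "r") as f:
            for key in f:
                obj = f[key]
                if isinstance(obj, h5py.Dataset):
                    _ = obj[()]  # force a full read; corrupt files fail here
        return True
    except OSError:
        return False

tmp = tempfile.mkdtemp()

# A valid frames file, as the extraction step should produce.
good = os.path.join(tmp, "good.h5")
with h5py.File(good, "w") as f:
    f.create_dataset("frames", data=np.zeros((4, 8, 8), dtype=np.uint8))

# Simulate a truncated/corrupt extraction result.
bad = os.path.join(tmp, "bad.h5")
with open(bad, "wb") as f:
    f.write(b"\x00" * 16)

print(is_readable_h5(good), is_readable_h5(bad))  # prints: True False
```

In my case many of the extracted files fail this check, which is why I suspect the extraction step itself.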

Memory & Time Cost for the experiments.

Hello.
First of all, thank you for your interesting work.

Could you provide the training time needed to run the experiments, along with details of the GPU used?

Thank you.
Regards.

Code for the "Determining Order of the Key-Steps" part

Hi,

I find this work really interesting and I just started looking at the code.
As described in the paper, the last part of the pipeline produces a ranked list of key-steps for the same procedure. I cannot find the implementation of this module in the code; am I missing something?

Thanks in advance.
Alessandro Flaborea

About instructions for the code

Hello @Sid2697,

I am very interested in procedure learning on egocentric videos.
I have tried to run the released code, but it failed due to the lack of instructions.

When will you upload the instructions for the code?

Regards,
Yu-Jung

Download links not working

Hi, thank you for sharing your work! It sounds very interesting!
I can't access the OneDrive download links; they just return a 404 page-not-found error.
I would love to take a closer look at the dataset and its annotations :)

Questions about the code and data

Hi Sid,
Thanks for your wonderful work! When I ran the code, I ran into a few issues:

  1. Some annotations are missing, for example "Head_9.csv" in pc_assembly; a similar situation appears in MECCANO;
  2. For CrossTask and ProceL, some videos cannot be found due to invalid URLs;
  3. The function "generate_unique_video_steps" in "RepLearn/TCC/utils.py" seems to be wrong: running it raises a tensor-dimension mismatch error.

Could you check these issues and provide the complete datasets so that I can follow your work smoothly? Thanks a million!
