
Comments (5)

tmanh commented on June 24, 2024

Oh, I read his code and found out that he used the SVM binary features from this link: http://pr.cs.cornell.edu/humanactivities/data/features.tar

Once you have downloaded it, you need to edit and run readData.py to convert the SVM binary features into node and edge features (as described in his paper). After that, you can run his model :)
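
For anyone who wants to script that first step, here is a minimal sketch of the download and extraction (the target folder name is only an assumption; adjust it to match where readData.py expects the data):

# Sketch only: download and unpack the Cornell feature tarball.
# The destination folder name is an assumption, not part of the original code.
import tarfile
import urllib.request

url = "http://pr.cs.cornell.edu/humanactivities/data/features.tar"
urllib.request.urlretrieve(url, "features.tar")
with tarfile.open("features.tar") as tar:
    tar.extractall("features_cad120_ground_truth_segmentation")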


pschydlo commented on June 24, 2024

Thank you so much for the tip! :)

The code does not make it easy to figure out what is happening at times; some comments would be really useful! Nevertheless, it's nice to have the source code to replicate the experiments!

Now I have the following directory structure:

/features_cad120_ground_truth_segmentation

  • /features_binary_svm_format
  • /segments_svm_format

Am I right to assume that this directory:
/scail/scratch/group/cvgl/ashesh/activity-anticipation/features_ground_truth

corresponds to the main directory
/features_cad120_ground_truth_segmentation

or to
/features_cad120_ground_truth_segmentation/features_binary_svm_format ?

The author also mentions this cryptic file path:
'/scail/scratch/group/cvgl/ashesh/activity-anticipation/activityids_fold{0}.txt'

Have you figured out where I can find or how to create this file?

Thank you for the help!

EDIT: Managed to generate the .pik files; in case anyone else has the same problem:

  • Follow Ahn's link, download the features, and extract them into the folders features_binary_svm_format and segments_svm_format.
  • Substitute the file paths in the readData file: where it says s='' just plug in s = <path to the feature folder>/features_binary_svm
  • The fold files are just newline-separated lists of the activity ids, divided into N sets. You can generate them by running "ls | tr -d '.txt' | split -l 32 - fold" in the /segments_svm_format folder; this lists all the activities and stores them in 4 files (32 activities each). A Python equivalent is sketched after this list.
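
A rough Python equivalent of the fold-file step (the output file names follow the activityids_fold{0}.txt pattern mentioned above, and the fold size of 32 comes from my shell command; both are assumptions about your setup):

# Sketch only: split the activity ids in segments_svm_format into
# newline-separated fold files. Paths, fold size and file names are assumptions.
import os

segments_dir = "features_cad120_ground_truth_segmentation/segments_svm_format"
activity_ids = sorted(f[:-len(".txt")] for f in os.listdir(segments_dir) if f.endswith(".txt"))

fold_size = 32  # 4 folds of 32 activities, as in the shell command above
for fold, start in enumerate(range(0, len(activity_ids), fold_size)):
    with open("activityids_fold{0}.txt".format(fold), "w") as out:
        out.write("\n".join(activity_ids[start:start + fold_size]) + "\n")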

Good luck!


pschydlo commented on June 24, 2024

I'm having the same problem. Are you using the CAD-120 dataset directly? How do you preprocess the dataset? Would it be possible to release the prepared dataset for training?

Thank you very much in advance!


pschydlo commented on June 24, 2024

After reading through the readData code, it's really hard to decipher what the feature arrays represent. Have you managed to figure it out?

For example, X_tr_human_disjoint is an array with dimensions 25x93x790, where 790 is the dimension of the feature vector. Do you know what the other two dimensions are?

The same goes for X_tr_objects_disjoint, whose dimensions are 25x226x620, where 620 is the dimension of the object feature vector.

In the human feature structure, the 25, as far as I understood, stands for the maximum number of segments and the 93 for the size of the training set (in segments), but this is not consistent with the dimensions of the object structure. What does the 226 stand for?

Thanks in advance for your time and attention!

EDIT: In case anyone has the same question, the mysterious 226 is a dimension that represents the concatenation of the objects: to avoid having a variable-sized frame, the author simply concatenates the objects along the activity dimension. The 93 corresponds to the activities and the 226 to the sum of the objects over all activities; the average number of objects per activity is about 2.43, and 2.43 * 93 ≈ 226, the total number of objects ever seen across all segments (not distinct!).
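
To make those shapes concrete, here is a small sketch (the per-activity object counts are made up; only the T x N x D layout and the human/object feature dimensions match the arrays above):

# Sketch only: illustrates how object features from all activities can be
# concatenated along one axis to avoid a variable-sized frame.
# The object counts are illustrative, not the real CAD-120 numbers.
import numpy as np

T, D_HUMAN, D_OBJ = 25, 790, 620               # time steps, human/object feature dims
n_activities = 93
rng = np.random.default_rng(0)
objects_per_activity = rng.integers(1, 5, size=n_activities)  # 1-4 objects per activity, illustrative

human = np.zeros((T, n_activities, D_HUMAN))   # analogous to X_tr_human_disjoint
objects = np.concatenate(                      # analogous to X_tr_objects_disjoint
    [np.zeros((T, n, D_OBJ)) for n in objects_per_activity], axis=1)
print(human.shape, objects.shape)              # (25, 93, 790) (25, total objects, 620)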

Since the author never stores which object corresponds to which activity, I now wonder how he is able to reconstruct the original structure in the end?


tmanh commented on June 24, 2024
  1. The dimensions of the features are T x N x D,
    where T is the number of time steps (segments), N is the number of training (or testing) samples, and D is the dimension of the feature vector.

  2. Why are the numbers of training samples for objects and humans different?

  • Actually, CAD-120 is a human-object interaction dataset (in one activity there can be 2-3 interacting objects). The CAD-120 dataset is not only used for activity recognition; the authors also used it for object affordance recognition. Briefly, an affordance is the possibility of an action on an object or environment, so one object at one time has one affordance. Because X_tr_objects_disjoint is used for both object affordance detection and anticipation, it has many more training examples than X_tr_human_disjoint, which is only used for human activity labelling.
  • So, you just misunderstood the purpose of X_tr_objects_disjoint.

  3. How is the author able to reconstruct the original structure in the end?

  • I think you have not fully understood the S-RNN paper. He did mention the parameter-sharing mechanism: he trained human activity recognition and affordance recognition at the same time.
loss_layer_1 = self.train_layer_1(X_shared_1_minibatch, X_1_minibatch, Y_1_minibatch)  # human activity branch
loss_layer_2 = self.train_layer_2(X_shared_2_minibatch, X_2_minibatch, Y_2_minibatch)  # object affordance branch
  • And the human-object relation features (for human activity recognition) and the object-human relation features (for object affordance recognition) are fed into the same RNN node (shared_layers), while X_tr_human_disjoint is fed into layer_1 and X_tr_objects_disjoint into layer_2.
self.X = shared_layers[0].input    # shared human-object relation features
self.X_1 = layer_1[0].input        # human features (X_tr_human_disjoint)
self.X_2 = layer_2[0].input        # object features (X_tr_objects_disjoint)
  • You can find the above code in the sharedRNN file. To see how he used SharedRNN, you can read this code in activity-rnn-full-model:
# Shared edge RNN over the joint human-object relation features
shared_input_layer = TemporalInputFeatures(inputJointFeatures)
shared_hidden_layer = LSTM('tanh', 'sigmoid', lstm_init, 4, 128, rng=rng)
shared_layers = [shared_input_layer, shared_hidden_layer]

# Human branch: predicts sub-activity labels
human_layers = [ConcatenateFeatures(inputHumanFeatures),
                LSTM('tanh', 'sigmoid', lstm_init, 4, 256, rng=rng),
                softmax(num_sub_activities, softmax_init, rng=rng)]

# Object branch: predicts affordance labels
object_layers = [ConcatenateFeatures(inputObjectFeatures),
                 LSTM('tanh', 'sigmoid', lstm_init, 4, 256, rng=rng),
                 softmax(num_affordances, softmax_init, rng=rng)]

trY_1 = T.lmatrix()   # ground-truth sub-activity labels
trY_2 = T.lmatrix()   # ground-truth affordance labels
sharedrnn = SharedRNN(shared_layers, human_layers, object_layers, softmax_loss, trY_1, trY_2, 1e-3)

Good luck!!!

