hongluzhou / composer
Compositional Reasoning of Group Activity in Videos with Keypoint-Only Modality
The Volleyball data folder cannot be opened. Can you update the link? Thank you very much.
Fixed
We tried to reproduce the results on the original Volleyball dataset but failed. We used the two-stage training strategy described in the README and ran it twice, with the following results:
| No. | Stage 1 acc | Stage 2 acc |
|---|---|---|
| paper | 93.7% | |
| 1 | Test Prec@1: 93.044% | Test Prec@1: 93.119% |
| 2 | Test Prec@1: 93.044% | Test Prec@1: 92.595% |
I am wondering whether we are using the wrong configs, since in our second trial the stage 2 accuracy even decreased. The commands and configs we used are as follows:
python main.py train --mode train --cfg configs/volleyball_stage_1.yml
python main.py train --mode train --cfg configs/volleyball_stage_2.yml --load_pretrained 1 --checkpoint checkpoints/xxxxxxxxx.pth
# configs/volleyball_stage_1.yml (stage 1)
exp_name: composer_vd_original
# -- Dataset settings
dataset_name: volleyball
dataset_dir: /home/guangyi.chen/workspace/yifan/composer/volleyball/volleyball
olympic_split: False
ball_trajectory_use: True
joints_folder_name: joints
tracklets_file_name: tracks_normalized.pkl
person_action_label_file_name: tracks_normalized_with_person_action_label.pkl
ball_trajectory_folder_name: volleyball_ball_annotation
horizontal_flip_augment: True
horizontal_flip_augment_purturb: True
horizontal_move_augment: True
horizontal_move_augment_purturb: True
vertical_move_augment: True
vertical_move_augment_purturb: True
agent_dropout_augment: True
image_h: 720
image_w: 1280
num_classes: 8
num_person_action_classes: 10
frame_start_idx: 5
frame_end_idx: 14
frame_sampling: 1
N: 12
J: 17
T: 10
recollect_stats_train: True
# -- Training settings
seed: -1
batch_size: 256
num_epochs: 40
num_workers: -1
optimizer: 'adam'
learning_rate: 0.0005
weight_decay: 0.001
# -- Learning objective settings
loss_coe_fine: 1
loss_coe_mid: 1
loss_coe_coarse: 1
loss_coe_group: 1
loss_coe_last_TNT: 3
loss_coe_person: 1
use_group_activity_weights: True
use_person_action_weights: True
# -- Contrastive cluster assignment
nmb_prototypes: 100
temperature: 0.1
sinkhorn_iterations: 3
loss_coe_constrastive_clustering: 1
# -- Model settings
model_type: composer
group_person_frame_idx: 5
joint_initial_feat_dim: 8
joint2person_feat_dim: 2
num_gcn_layers: 3
max_num_tokens: 10
max_times_embed: 100
time_position_embedding_type: absolute_learned_1D
max_image_positions_h: 1000
max_image_positions_w: 1500
image_position_embedding_type: learned_fourier_2D
# ------ Multiscale Transformer settings
projection_batchnorm: False
projection_dropout: 0
TNT_hidden_dim: 256
TNT_n_layers: 2
innerTx_nhead: 2
innerTx_dim_feedforward: 1024
innerTx_dropout: 0.5
innerTx_activation: relu
middleTx_nhead: 8
middleTx_dim_feedforward: 1024
middleTx_dropout: 0.2
middleTx_activation: relu
outerTx_nhead: 2
outerTx_dim_feedforward: 1024
outerTx_dropout: 0.2
outerTx_activation: relu
groupTx_nhead: 2
groupTx_dim_feedforward: 1024
groupTx_dropout: 0
groupTx_activation: relu
# ------ Final classifier settings
classifier_use_batchnorm: False
classifier_dropout: 0
# -- Runtime settings
gpu:
- 0
- 1
- 2
- 3
# - 4
# - 5
# - 6
# - 7
dev: 0
# -- Output settings
checkpoint_dir: ./checkpoints/
log_dir: ./logs/
# configs/volleyball_stage_2.yml (stage 2)
exp_name: composer_vd_original
# -- Dataset settings
dataset_name: volleyball
dataset_dir: /home/guangyi.chen/workspace/yifan/composer/volleyball/volleyball
olympic_split: False
ball_trajectory_use: True
joints_folder_name: joints
tracklets_file_name: tracks_normalized.pkl
person_action_label_file_name: tracks_normalized_with_person_action_label.pkl
ball_trajectory_folder_name: volleyball_ball_annotation
horizontal_flip_augment: True
horizontal_flip_augment_purturb: True
horizontal_move_augment: True
horizontal_move_augment_purturb: True
vertical_move_augment: True
vertical_move_augment_purturb: True
agent_dropout_augment: True
image_h: 720
image_w: 1280
num_classes: 8
num_person_action_classes: 10
frame_start_idx: 5
frame_end_idx: 14
frame_sampling: 1
N: 12
J: 17
T: 10
recollect_stats_train: False
# -- Training settings
seed: -1
batch_size: 256
num_epochs: 5
num_workers: -1
optimizer: 'adam'
learning_rate: 0.0001
weight_decay: 0.001
# -- Learning objective settings
loss_coe_fine: 1
loss_coe_mid: 1
loss_coe_coarse: 1
loss_coe_group: 1
loss_coe_last_TNT: 3
loss_coe_person: 1
use_group_activity_weights: True
use_person_action_weights: True
# -- Contrastive cluster assignment
nmb_prototypes: 100
temperature: 0.1
sinkhorn_iterations: 3
loss_coe_constrastive_clustering: 1
# -- Model settings
model_type: composer
group_person_frame_idx: 5
joint_initial_feat_dim: 8
joint2person_feat_dim: 2
num_gcn_layers: 3
max_num_tokens: 10
max_times_embed: 100
time_position_embedding_type: absolute_learned_1D
max_image_positions_h: 1000
max_image_positions_w: 1500
image_position_embedding_type: learned_fourier_2D
# ------ Multiscale Transformer settings
projection_batchnorm: False
projection_dropout: 0
TNT_hidden_dim: 256
TNT_n_layers: 2
innerTx_nhead: 2
innerTx_dim_feedforward: 1024
innerTx_dropout: 0.5
innerTx_activation: relu
middleTx_nhead: 8
middleTx_dim_feedforward: 1024
middleTx_dropout: 0.2
middleTx_activation: relu
outerTx_nhead: 2
outerTx_dim_feedforward: 1024
outerTx_dropout: 0.2
outerTx_activation: relu
groupTx_nhead: 2
groupTx_dim_feedforward: 1024
groupTx_dropout: 0
groupTx_activation: relu
# ------ Final classifier settings
classifier_use_batchnorm: False
classifier_dropout: 0
# -- Runtime settings
gpu:
- 0
- 1
- 2
- 3
# - 4
# - 5
# - 6
# - 7
dev: 0
# -- Output settings
checkpoint_dir: ./checkpoints/
log_dir: ./logs/
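The two configs above are long and nearly identical, so the stage-specific settings are easy to miss. Below is a minimal sketch for listing only the keys that differ. It assumes each file is a flat list of `key: value` lines; `parse_flat_config` and `diff_configs` are hypothetical helpers written for this illustration (not the repository's config loader), and the two strings are abbreviated excerpts of the listings above.

```python
def parse_flat_config(text):
    """Parse flat `key: value` lines; skip blank lines, comments and list items."""
    config = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or line.startswith("-"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            config[key.strip()] = value.strip()
    return config


def diff_configs(first, second):
    """Return {key: (first_value, second_value)} for keys whose values differ."""
    keys = sorted(set(first) | set(second))
    return {k: (first.get(k), second.get(k)) for k in keys if first.get(k) != second.get(k)}


# Abbreviated excerpts of the two configs quoted above (assumption: the
# remaining keys are identical, as they appear to be in the listings).
STAGE_1 = """
recollect_stats_train: True
num_epochs: 40
learning_rate: 0.0005
weight_decay: 0.001
"""
STAGE_2 = """
recollect_stats_train: False
num_epochs: 5
learning_rate: 0.0001
weight_decay: 0.001
"""

differences = diff_configs(parse_flat_config(STAGE_1), parse_flat_config(STAGE_2))
for key, (stage_1_value, stage_2_value) in differences.items():
    print(f"{key}: stage 1 = {stage_1_value}, stage 2 = {stage_2_value}")
```

In the listings as posted, the only keys that differ are `recollect_stats_train`, `num_epochs` and `learning_rate`.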
Hello, congrats on the amazing work!
Table 2 of the supplementary material shows that the 1-scale keypoint model achieved 91.2 test accuracy.
I was wondering if you could say a little about which portion of the model this was. Was it a model that used only the first ("inner") transformer layer and not the other three (middle, outer, group)?
Hi @hongluzhou ,
Thanks for kindly open-sourcing the Composer code.
I downloaded the Collective Activity dataset from the link in your README, but I am not sure I understand what the annotations actually mean. Could you please correct or add to the description below?
cheers,
xu
I would like to download the Volleyball dataset (joint pickle files).
I downloaded it about 3 months ago, but I can't download it now.
Please check the uploaded files and upload a fixed version.
I can download the CAD dataset.
Thank you.
Hi, I reproduced the results on the Volleyball dataset following your steps, and got: === Test Result === Prec@1: 16.904% / Prec@3: 45.176 % / Person Prec@1: 67.433 % / Person Prec@3: 82.342 @epoch-5
This differs from the result reported in the paper. Could you tell me which of my parameters might be set incorrectly?
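For readers interpreting the log line above: Prec@1 and Prec@3 read as top-1 and top-3 accuracy. The sketch below shows how such a metric is commonly computed, assuming per-class scores and integer labels. `precision_at_k` and the toy data are hypothetical and not taken from the repository. As a point of reference, with 8 group activity classes (`num_classes: 8` in the configs quoted earlier) and balanced classes, uniform random guessing would give 12.5% top-1 and 37.5% top-3.

```python
def precision_at_k(scores, labels, k):
    """Percentage of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for sample_scores, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(sample_scores)), key=lambda c: sample_scores[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(labels)


# Toy example with 4 classes (hypothetical data).
toy_scores = [
    [0.10, 0.70, 0.15, 0.05],  # true label 1: ranked first
    [0.50, 0.12, 0.30, 0.08],  # true label 2: ranked second
    [0.40, 0.30, 0.20, 0.10],  # true label 3: ranked last
    [0.02, 0.06, 0.12, 0.80],  # true label 3: ranked first
]
toy_labels = [1, 2, 3, 3]

print(f"Prec@1: {precision_at_k(toy_scores, toy_labels, 1):.3f} %")
print(f"Prec@3: {precision_at_k(toy_scores, toy_labels, 3):.3f} %")
```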
Hello @hongluzhou ,
In the online clustering section of the paper, you mention that the vectors are set to ones to enforce equipartition. As far as I understand, the Sinkhorn algorithm finds the optimal transport between two distributions given a cost matrix.
My questions:
Can you please help?
cheers,
xu
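As background for the equipartition question above, here is a minimal NumPy sketch of Sinkhorn-Knopp normalization with uniform marginals, in the style popularized by SwAV. It is an illustration under stated assumptions, not the repository's implementation: `sinkhorn_equipartition`, `epsilon` and the toy scores are hypothetical. In optimal-transport terms, the cost matrix is the negative sample-to-prototype similarity, and the two "distributions" are uniform marginals, i.e. vectors of ones scaled by 1/K over prototypes and 1/B over samples, which is what pushes every prototype to receive roughly the same total mass.

```python
import numpy as np


def sinkhorn_equipartition(scores, epsilon=0.05, n_iters=3):
    """Soft-assign B samples to K prototypes with (approximately) equal prototype usage.

    scores: array of shape (B, K) with sample-to-prototype similarities.
    Returns an array of shape (B, K) whose rows each sum to 1.
    """
    scores = np.asarray(scores, dtype=np.float64)
    # Subtracting the max keeps the exponent non-positive for numerical stability.
    q = np.exp((scores - scores.max()) / epsilon).T  # shape (K, B)
    q /= q.sum()
    n_prototypes, n_samples = q.shape
    for _ in range(n_iters):
        # Prototype marginal: every prototype gets total mass 1/K.
        q /= q.sum(axis=1, keepdims=True)
        q /= n_prototypes
        # Sample marginal: every sample gets total mass 1/B.
        q /= q.sum(axis=0, keepdims=True)
        q /= n_samples
    return (q * n_samples).T  # each sample's assignment now sums to 1


rng = np.random.default_rng(0)
toy_scores = rng.uniform(-1.0, 1.0, size=(16, 4))  # B = 16 samples, K = 4 prototypes
assignments = sinkhorn_equipartition(toy_scores, epsilon=1.0, n_iters=100)
print("per-sample sums:", assignments.sum(axis=1))
print("per-prototype sums:", assignments.sum(axis=0))
```

With enough iterations, the per-prototype sums approach B/K, which is the equipartition constraint. With only a few iterations (the configs above set `sinkhorn_iterations: 3`), the constraint holds only approximately.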
Hello Honglu Zhou,
As described in the title, this file/class/method is not found.
Can you please update the missing file?
cheers,
I tried to find the model checkpoint at https://drive.google.com/file/d/1rBQSoGS7MrhvDaPRtVi0TbZygoe63Fo_/view?usp=sharing
but it returns 'Pages Not Found'.
I would greatly appreciate it if you could update the link. Thanks.
Hi, I noticed that the dataset sharing link has expired. Could you share a new one?
Thanks.
Hello @hongluzhou ,
When running on the Collective Activity dataset, I found that the differences in joint coordinates between successive frames are large.
(Pdb) len(joint_dxcoords)
19293300
(Pdb) np.histogram(joint_dxcoords)
(array([ 4455, 5897, 9256, 8636, 1998641, 17242240,
7888, 7894, 5020, 3373]), array([-7.99904283e+02, -6.40013798e+02, -4.80123314e+02, -3.20232829e+02,
-1.60342345e+02, -4.51860162e-01, 1.59438624e+02, 3.19329109e+02,
4.79219593e+02, 6.39110078e+02, 7.99000562e+02]))
We can see that tens of thousands of elements in joint_dxcoords exceed 100 in magnitude, although they make up a small proportion of the total. Does this violate the assumptions of TOKS? Will it affect the training of the model?
cheers,
xu
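To put a number on "small proportion": in the histogram above, the two central bins (roughly -160 to 159) hold all but 52,419 of the 19,293,300 values, so about 0.27% of the differences exceed roughly 160 in magnitude. The sketch below redoes that arithmetic from the posted counts and shows one way to measure the fraction for any threshold. The `(T, N, J)` array layout, `large_displacement_fraction` and the toy array are assumptions made for illustration, not the repository's data format.

```python
import numpy as np

# Bin counts copied from the np.histogram output above.
counts = np.array([4455, 5897, 9256, 8636, 1998641, 17242240, 7888, 7894, 5020, 3373])
total = int(counts.sum())
outside_central_bins = int(total - counts[4:6].sum())
print(total, outside_central_bins, f"{100.0 * outside_central_bins / total:.3f} %")


def large_displacement_fraction(joint_x, threshold=100.0):
    """Fraction of frame-to-frame x differences whose magnitude exceeds `threshold`.

    joint_x: array of shape (T, N, J) with x coordinates per frame, person and joint
    (an assumed layout for this sketch).
    """
    dx = np.diff(np.asarray(joint_x, dtype=np.float64), axis=0)
    return float(np.mean(np.abs(dx) > threshold))


# Toy track: 4 frames, 1 person, 2 joints, with one abrupt jump in the first joint.
toy_x = np.array([[[10.0, 12.0]], [[11.0, 13.0]], [[400.0, 14.0]], [[401.0, 15.0]]])
toy_fraction = large_displacement_fraction(toy_x, threshold=100.0)
print(toy_fraction)
```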