hongluzhou / composer

Compositional Reasoning of Group Activity in Videos with Keypoint-Only Modality

Python 100.00%

composer's Issues

The Volleyball data folder cannot be opened

The Volleyball data folder cannot be opened. Can you update the link? Thank you very much.

Unable to reproduce the results

We tried to reproduce the results on the original Volleyball dataset but failed. We used the two-stage training strategy described in the README and ran it twice, with the following results:

No.    stage 1 acc             stage 2 acc
paper  -                       93.7%
1      Test Prec@1: 93.044 %   Test Prec@1: 93.119 %
2      Test Prec@1: 93.044 %   Test Prec@1: 92.595 %

I am wondering whether we are using the wrong configs, since in our second trial the stage 2 accuracy even decreased. The commands and configs we used are as follows:

python main.py train --mode train --cfg configs/volleyball_stage_1.yml
python main.py train --mode train --cfg configs/volleyball_stage_2.yml --load_pretrained 1 --checkpoint checkpoints/xxxxxxxxx.pth

stage 1

exp_name: composer_vd_original


# -- Dataset settings
dataset_name: volleyball
dataset_dir: /home/guangyi.chen/workspace/yifan/composer/volleyball/volleyball
olympic_split: False
ball_trajectory_use: True
joints_folder_name: joints
tracklets_file_name: tracks_normalized.pkl
person_action_label_file_name: tracks_normalized_with_person_action_label.pkl
ball_trajectory_folder_name: volleyball_ball_annotation

horizontal_flip_augment: True
horizontal_flip_augment_purturb: True
horizontal_move_augment: True
horizontal_move_augment_purturb: True
vertical_move_augment: True
vertical_move_augment_purturb: True
agent_dropout_augment: True

image_h: 720
image_w: 1280
num_classes: 8
num_person_action_classes: 10
frame_start_idx: 5
frame_end_idx: 14
frame_sampling: 1
N: 12 
J: 17
T: 10
recollect_stats_train: True


# -- Training settings
seed: -1
batch_size: 256
num_epochs: 40
num_workers: -1
optimizer: 'adam'
learning_rate: 0.0005
weight_decay: 0.001


# -- Learning objective settings
loss_coe_fine: 1
loss_coe_mid: 1
loss_coe_coarse: 1
loss_coe_group: 1
loss_coe_last_TNT: 3
loss_coe_person: 1
use_group_activity_weights: True
use_person_action_weights: True


# -- Contrastive cluster assignment
nmb_prototypes: 100
temperature: 0.1
sinkhorn_iterations: 3
loss_coe_constrastive_clustering: 1


# -- Model settings
model_type: composer
group_person_frame_idx: 5
joint_initial_feat_dim: 8
joint2person_feat_dim: 2
num_gcn_layers: 3
max_num_tokens: 10
max_times_embed: 100
time_position_embedding_type: absolute_learned_1D 
max_image_positions_h: 1000
max_image_positions_w: 1500
image_position_embedding_type: learned_fourier_2D
# ------ Multiscale Transformer settings
projection_batchnorm: False
projection_dropout: 0
TNT_hidden_dim: 256
TNT_n_layers: 2
innerTx_nhead: 2 
innerTx_dim_feedforward: 1024
innerTx_dropout: 0.5
innerTx_activation: relu 
middleTx_nhead: 8
middleTx_dim_feedforward: 1024
middleTx_dropout: 0.2
middleTx_activation: relu 
outerTx_nhead: 2
outerTx_dim_feedforward: 1024
outerTx_dropout: 0.2
outerTx_activation: relu 
groupTx_nhead: 2
groupTx_dim_feedforward: 1024
groupTx_dropout: 0
groupTx_activation: relu 
# ------ Final classifier settings
classifier_use_batchnorm: False
classifier_dropout: 0



# -- Runtime settings
gpu:
  - 0
  - 1
  - 2
  - 3
#   - 4
#   - 5
#   - 6
#   - 7
dev: 0
  
  
# -- Output settings
checkpoint_dir: ./checkpoints/
log_dir: ./logs/

stage 2

exp_name: composer_vd_original


# -- Dataset settings
dataset_name: volleyball
dataset_dir: /home/guangyi.chen/workspace/yifan/composer/volleyball/volleyball
olympic_split: False
ball_trajectory_use: True
joints_folder_name: joints
tracklets_file_name: tracks_normalized.pkl
person_action_label_file_name: tracks_normalized_with_person_action_label.pkl
ball_trajectory_folder_name: volleyball_ball_annotation

horizontal_flip_augment: True
horizontal_flip_augment_purturb: True
horizontal_move_augment: True
horizontal_move_augment_purturb: True
vertical_move_augment: True
vertical_move_augment_purturb: True
agent_dropout_augment: True

image_h: 720
image_w: 1280
num_classes: 8
num_person_action_classes: 10
frame_start_idx: 5
frame_end_idx: 14
frame_sampling: 1
N: 12 
J: 17
T: 10
recollect_stats_train: False


# -- Training settings
seed: -1
batch_size: 256
num_epochs: 5
num_workers: -1
optimizer: 'adam'
learning_rate: 0.0001
weight_decay: 0.001


# -- Learning objective settings
loss_coe_fine: 1
loss_coe_mid: 1
loss_coe_coarse: 1
loss_coe_group: 1
loss_coe_last_TNT: 3
loss_coe_person: 1
use_group_activity_weights: True
use_person_action_weights: True


# -- Contrastive cluster assignment
nmb_prototypes: 100
temperature: 0.1
sinkhorn_iterations: 3
loss_coe_constrastive_clustering: 1


# -- Model settings
model_type: composer
group_person_frame_idx: 5
joint_initial_feat_dim: 8
joint2person_feat_dim: 2
num_gcn_layers: 3
max_num_tokens: 10
max_times_embed: 100
time_position_embedding_type: absolute_learned_1D 
max_image_positions_h: 1000
max_image_positions_w: 1500
image_position_embedding_type: learned_fourier_2D
# ------ Multiscale Transformer settings
projection_batchnorm: False
projection_dropout: 0
TNT_hidden_dim: 256
TNT_n_layers: 2
innerTx_nhead: 2 
innerTx_dim_feedforward: 1024
innerTx_dropout: 0.5
innerTx_activation: relu 
middleTx_nhead: 8
middleTx_dim_feedforward: 1024
middleTx_dropout: 0.2
middleTx_activation: relu 
outerTx_nhead: 2
outerTx_dim_feedforward: 1024
outerTx_dropout: 0.2
outerTx_activation: relu 
groupTx_nhead: 2
groupTx_dim_feedforward: 1024
groupTx_dropout: 0
groupTx_activation: relu 
# ------ Final classifier settings
classifier_use_batchnorm: False
classifier_dropout: 0



# -- Runtime settings
gpu:
  - 0
  - 1
  - 2
  - 3
#   - 4
#   - 5
#   - 6
#   - 7
dev: 0
  
  
# -- Output settings
checkpoint_dir: ./checkpoints/
log_dir: ./logs/
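
Reading the two configs above side by side, only three keys appear to differ between stage 1 and stage 2. A minimal sketch of how such a comparison could be automated is given here. The helper name `diff_configs` is hypothetical, and the dictionaries hand-copy only a few values from the configs rather than parsing the YAML files, to keep the example dependency-free.

```python
def diff_configs(a, b):
    """Return {key: (a_value, b_value)} for every key whose value differs.

    A key missing from one side is reported with None on that side.
    """
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Values hand-copied from the stage 1 and stage 2 configs quoted above.
stage_1 = {
    "recollect_stats_train": True,
    "num_epochs": 40,
    "learning_rate": 0.0005,
    "batch_size": 256,
}
stage_2 = {
    "recollect_stats_train": False,
    "num_epochs": 5,
    "learning_rate": 0.0001,
    "batch_size": 256,
}

changed = diff_configs(stage_1, stage_2)
for key, (first, second) in changed.items():
    print(f"{key}: stage 1 = {first}, stage 2 = {second}")
```

Both configs also set `seed: -1`. If that leaves the run unseeded (an assumption, not confirmed here), some run-to-run variation in the stage 2 accuracy would not be surprising.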

Table 2 of the supplementary material shows that the 1-scale keypoint model achieved 91.2 test accuracy.
Could you say a little about which portion of the model this was? Was it a model that used only the first ("inner") transformer layer and none of the other three (middle, outer, group)?

About the Collective Activity dataset

Hi @hongluzhou ,

Thank you for kindly open-sourcing the Composer code.
I downloaded the Collective Activity dataset from the link in your README, but I am not sure I understand what the annotations actually mean. Could you please correct or add to the descriptions below?

annotations.pkl: a dict of the form {video_index: sub_dict}. Each sub_dict has the form {index_of_every_10_frames: sub_sub_dict}, and each sub_sub_dict has dict_keys(['frame_id', 'group_activity', 'actions', 'bboxes']). I am NOT sure what 'group_activity', 'actions', and 'bboxes' actually refer to.
joints: the subfolders are named by video_index and contain files named index_of_every_10_frames.pickle. Each pickle file contains a dict of the form {frame_index: value_item}, where value_item has shape (13, 17, 3). I am NOT sure what this value_item is.
tracks_normalized.pkl: a dict whose keys have the form (video_index, index_of_every_10_frames) and whose values are sub_dicts of the form {index_of_frame: value_item}, where value_item has shape (13, 4). I am NOT sure what this value_item is.
tracks_normalized_with_person_action_label.pkl: similar to tracks_normalized.pkl, but the final value_item has shape (13, 1). I am NOT sure what this value_item is.
videos: the 44 videos, grouped by 10 frames.

cheers,
xu
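
A structural description like the one above can be produced, and re-checked by other readers, with a small inspection helper. This is only a sketch: `summarize` and `summarize_pickle` are hypothetical names that are not part of the repository, and anything exposing a `.shape` attribute is treated as an array. Loading pickles that contain NumPy arrays requires NumPy to be installed.

```python
import pickle


def summarize(obj, depth=0, max_depth=8, max_keys=2):
    """Return text lines describing the nesting of dicts and arrays in obj."""
    pad = "  " * depth
    shape = getattr(obj, "shape", None)
    if shape is not None:
        # Duck-typed array: report its shape only.
        return [f"{pad}array, shape={tuple(shape)}"]
    if isinstance(obj, dict):
        lines = [f"{pad}dict, {len(obj)} keys"]
        if depth < max_depth:
            # Only descend into the first few keys to keep the output short.
            for key in list(obj)[:max_keys]:
                lines.append(f"{pad}  {key!r}:")
                lines.extend(summarize(obj[key], depth + 2, max_depth, max_keys))
        return lines
    if isinstance(obj, (list, tuple)):
        return [f"{pad}{type(obj).__name__}, length {len(obj)}"]
    return [f"{pad}{type(obj).__name__}: {obj!r}"]


def summarize_pickle(path):
    """Load a pickle file and summarize its structure."""
    with open(path, "rb") as f:
        return summarize(pickle.load(f))


# Example call, assuming the file sits in the current directory:
# print("\n".join(summarize_pickle("tracks_normalized.pkl")))
```

The helper reports structure only. What each array dimension means (for example, whether 13 is a padded maximum number of persons) still has to be confirmed by the author.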

Cannot download volleyball.zip

I would like to download the Volleyball dataset (joint pickle files).

I downloaded it about 3 months ago, but I cannot download it now.

Please check the uploaded files and upload a fixed version.

I can download the CAD dataset.

Thank you.

hello

Hi, I tried to reproduce the results on the Volleyball dataset by following your steps, and I got: === Test Result === Prec@1: 16.904% / Prec@3: 45.176 % / Person Prec@1: 67.433 % / Person Prec@3: 82.342 @epoch-5
This differs from the result reported in the paper. Could you tell me which of my parameters might be set incorrectly?

The sinkhorn algorithm

Hello @hongluzhou ,

In the online clustering section of the paper, you mention that, to enforce equipartition, the vectors are set to ones. As far as I understand, the Sinkhorn algorithm finds the optimal transport plan between two distributions given a cost matrix.

My questions:

Why is equipartition enforced?
Is there a cost matrix here? If yes, which one?

Can you please help?

cheers,
xu
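
For context, here is a minimal sketch of the Sinkhorn-Knopp normalization as it is used in SwAV-style online clustering, which may differ from what this repository does. In that formulation the cost matrix is the negative sample-to-prototype similarity, and the two marginal vectors are uniform (all ones, scaled). The uniform marginals force each prototype to receive roughly the same share of a batch, which rules out the trivial solution where every sample collapses onto one prototype. The function name, the `epsilon` value, and the toy scores are assumptions. `n_iters=3` mirrors `sinkhorn_iterations: 3` in the configs quoted earlier on this page.

```python
import math


def sinkhorn(scores, epsilon=0.5, n_iters=3):
    """Turn a B x K similarity matrix into soft assignments Q.

    Each row of Q sums to 1 (one unit of mass per sample) and each
    column sums to approximately B / K (equipartition over prototypes).
    """
    B, K = len(scores), len(scores[0])
    top = max(max(row) for row in scores)  # subtract the max for stability
    Q = [[math.exp((s - top) / epsilon) for s in row] for row in scores]
    for _ in range(n_iters):
        # Column step: every prototype receives total mass 1 / K.
        col = [sum(Q[i][j] for i in range(B)) for j in range(K)]
        Q = [[Q[i][j] / (col[j] * K) for j in range(K)] for i in range(B)]
        # Row step: every sample receives total mass 1 / B.
        Q = [[q / (sum(row) * B) for q in row] for row in Q]
    # Rescale so that each row is a probability distribution.
    return [[q * B for q in row] for row in Q]


# Toy batch: 4 samples, 2 prototypes; three samples prefer prototype 0.
toy_scores = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.2, 0.7]]
assignments = sinkhorn(toy_scores, n_iters=100)
```

With enough iterations the column sums approach B / K, so one of the three samples that prefer prototype 0 is pushed largely toward prototype 1.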

ModuleNotFoundError: No module named 'datasets.collective'

Hello Honglu Zhou,

As described in the title, this module cannot be found.
Could you please add the missing file?

cheers,
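
When debugging an error like this, it can help to check where Python resolves the module from, since a missing file and a differently installed package named `datasets` shadowing a local one produce the same message. The sketch below uses the standard `importlib.util.find_spec`. The helper name `locate_module` is hypothetical, and whether shadowing is the cause here is an assumption.

```python
import importlib.util


def locate_module(name):
    """Return the file Python would import `name` from, or None if not found."""
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:
        # Raised when a parent package of a dotted name cannot be imported.
        return None
    return None if spec is None else spec.origin


# Example: run from the repository root to see which files would be used.
# print(locate_module("datasets"))
# print(locate_module("datasets.collective"))
```

If the first call prints a path outside the repository, the local package is being shadowed. If it prints a path inside the repository and the second call prints None, the file itself is missing.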

Cannot find the checkpoint link

I tried to find the model checkpoint at https://drive.google.com/file/d/1rBQSoGS7MrhvDaPRtVi0TbZygoe63Fo_/view?usp=sharing

But the link returns 'Pages Not Found'.

I would greatly appreciate it if you could update the link. Thanks.

The dataset sharing link no longer works, could you please share a new one?

Hi, I noticed that the dataset sharing link no longer works. Could you share a new one?
Thanks.

The difference in joint coordinates between successive frames is large

Hello @hongluzhou ,

When running on the Collective Activity dataset, I found that the difference in joint coordinates between successive frames can be large.

(Pdb) len(joint_dxcoords)
19293300
(Pdb) np.histogram(joint_dxcoords)
(array([    4455,     5897,     9256,     8636,  1998641, 17242240,
            7888,     7894,     5020,     3373]),
 array([-7.99904283e+02, -6.40013798e+02, -4.80123314e+02, -3.20232829e+02,
        -1.60342345e+02, -4.51860162e-01,  1.59438624e+02,  3.19329109e+02,
         4.79219593e+02,  6.39110078e+02,  7.99000562e+02]))

We can see that tens of thousands of elements in joint_dxcoords are larger than 100 in magnitude, although they make up only a small proportion of the total. Does this violate the assumptions of TOKS? Will it affect the training of the model?

cheers,
xu
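
The proportion mentioned above can be read directly off the histogram. The sketch below uses only the counts printed in the pdb session. Because the bin edges fall near ±160 rather than ±100, it measures the share of differences outside the two central bins (roughly -160 to 159), not the share above 100. The variable names are illustrative.

```python
# Bin counts copied from the np.histogram output above.
counts = [4455, 5897, 9256, 8636, 1998641, 17242240, 7888, 7894, 5020, 3373]

total = sum(counts)                # matches len(joint_dxcoords)
central = counts[4] + counts[5]    # bins spanning roughly -160 to 159
outliers = total - central         # everything outside those two bins
outlier_fraction = outliers / total

print(f"{outliers} of {total} differences ({outlier_fraction:.4%}) "
      "fall outside the two central bins")
```

By this count, about 52 thousand of the 19.3 million differences, roughly 0.27%, lie outside the central range.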
