composer's People

Contributors

hongluzhou

composer's Issues

Unable to reproduce the results

We tried to reproduce the results on the original Volleyball dataset but could not. We used the two-stage training strategy described in the README and ran it twice, with the following results:

No.    stage 1 acc            stage 2 acc
paper  93.7% (reported accuracy)
1      Test Prec@1: 93.044 %  Test Prec@1: 93.119 %
2      Test Prec@1: 93.044 %  Test Prec@1: 92.595 %

I am wondering whether we are using the wrong configs, since in our second trial the stage 2 accuracy even decreased. The commands and configs we used are as follows:

python main.py train --mode train --cfg configs/volleyball_stage_1.yml
python main.py train --mode train --cfg configs/volleyball_stage_2.yml --load_pretrained 1 --checkpoint checkpoints/xxxxxxxxx.pth

stage 1

exp_name: composer_vd_original


# -- Dataset settings
dataset_name: volleyball
dataset_dir: /home/guangyi.chen/workspace/yifan/composer/volleyball/volleyball
olympic_split: False
ball_trajectory_use: True
joints_folder_name: joints
tracklets_file_name: tracks_normalized.pkl
person_action_label_file_name: tracks_normalized_with_person_action_label.pkl
ball_trajectory_folder_name: volleyball_ball_annotation

horizontal_flip_augment: True
horizontal_flip_augment_purturb: True
horizontal_move_augment: True
horizontal_move_augment_purturb: True
vertical_move_augment: True
vertical_move_augment_purturb: True
agent_dropout_augment: True

image_h: 720
image_w: 1280
num_classes: 8
num_person_action_classes: 10
frame_start_idx: 5
frame_end_idx: 14
frame_sampling: 1
N: 12 
J: 17
T: 10
recollect_stats_train: True


# -- Training settings
seed: -1
batch_size: 256
num_epochs: 40
num_workers: -1
optimizer: 'adam'
learning_rate: 0.0005
weight_decay: 0.001


# -- Learning objective settings
loss_coe_fine: 1
loss_coe_mid: 1
loss_coe_coarse: 1
loss_coe_group: 1
loss_coe_last_TNT: 3
loss_coe_person: 1
use_group_activity_weights: True
use_person_action_weights: True


# -- Contrastive cluster assignment
nmb_prototypes: 100
temperature: 0.1
sinkhorn_iterations: 3
loss_coe_constrastive_clustering: 1


# -- Model settings
model_type: composer
group_person_frame_idx: 5
joint_initial_feat_dim: 8
joint2person_feat_dim: 2
num_gcn_layers: 3
max_num_tokens: 10
max_times_embed: 100
time_position_embedding_type: absolute_learned_1D 
max_image_positions_h: 1000
max_image_positions_w: 1500
image_position_embedding_type: learned_fourier_2D
# ------ Multiscale Transformer settings
projection_batchnorm: False
projection_dropout: 0
TNT_hidden_dim: 256
TNT_n_layers: 2
innerTx_nhead: 2 
innerTx_dim_feedforward: 1024
innerTx_dropout: 0.5
innerTx_activation: relu 
middleTx_nhead: 8
middleTx_dim_feedforward: 1024
middleTx_dropout: 0.2
middleTx_activation: relu 
outerTx_nhead: 2
outerTx_dim_feedforward: 1024
outerTx_dropout: 0.2
outerTx_activation: relu 
groupTx_nhead: 2
groupTx_dim_feedforward: 1024
groupTx_dropout: 0
groupTx_activation: relu 
# ------ Final classifier settings
classifier_use_batchnorm: False
classifier_dropout: 0



# -- Runtime settings
gpu:
  - 0
  - 1
  - 2
  - 3
#   - 4
#   - 5
#   - 6
#   - 7
dev: 0
  
  
# -- Output settings
checkpoint_dir: ./checkpoints/
log_dir: ./logs/


stage 2

exp_name: composer_vd_original


# -- Dataset settings
dataset_name: volleyball
dataset_dir: /home/guangyi.chen/workspace/yifan/composer/volleyball/volleyball
olympic_split: False
ball_trajectory_use: True
joints_folder_name: joints
tracklets_file_name: tracks_normalized.pkl
person_action_label_file_name: tracks_normalized_with_person_action_label.pkl
ball_trajectory_folder_name: volleyball_ball_annotation

horizontal_flip_augment: True
horizontal_flip_augment_purturb: True
horizontal_move_augment: True
horizontal_move_augment_purturb: True
vertical_move_augment: True
vertical_move_augment_purturb: True
agent_dropout_augment: True

image_h: 720
image_w: 1280
num_classes: 8
num_person_action_classes: 10
frame_start_idx: 5
frame_end_idx: 14
frame_sampling: 1
N: 12 
J: 17
T: 10
recollect_stats_train: False


# -- Training settings
seed: -1
batch_size: 256
num_epochs: 5
num_workers: -1
optimizer: 'adam'
learning_rate: 0.0001
weight_decay: 0.001


# -- Learning objective settings
loss_coe_fine: 1
loss_coe_mid: 1
loss_coe_coarse: 1
loss_coe_group: 1
loss_coe_last_TNT: 3
loss_coe_person: 1
use_group_activity_weights: True
use_person_action_weights: True


# -- Contrastive cluster assignment
nmb_prototypes: 100
temperature: 0.1
sinkhorn_iterations: 3
loss_coe_constrastive_clustering: 1


# -- Model settings
model_type: composer
group_person_frame_idx: 5
joint_initial_feat_dim: 8
joint2person_feat_dim: 2
num_gcn_layers: 3
max_num_tokens: 10
max_times_embed: 100
time_position_embedding_type: absolute_learned_1D 
max_image_positions_h: 1000
max_image_positions_w: 1500
image_position_embedding_type: learned_fourier_2D
# ------ Multiscale Transformer settings
projection_batchnorm: False
projection_dropout: 0
TNT_hidden_dim: 256
TNT_n_layers: 2
innerTx_nhead: 2 
innerTx_dim_feedforward: 1024
innerTx_dropout: 0.5
innerTx_activation: relu 
middleTx_nhead: 8
middleTx_dim_feedforward: 1024
middleTx_dropout: 0.2
middleTx_activation: relu 
outerTx_nhead: 2
outerTx_dim_feedforward: 1024
outerTx_dropout: 0.2
outerTx_activation: relu 
groupTx_nhead: 2
groupTx_dim_feedforward: 1024
groupTx_dropout: 0
groupTx_activation: relu 
# ------ Final classifier settings
classifier_use_batchnorm: False
classifier_dropout: 0



# -- Runtime settings
gpu:
  - 0
  - 1
  - 2
  - 3
#   - 4
#   - 5
#   - 6
#   - 7
dev: 0
  
  
# -- Output settings
checkpoint_dir: ./checkpoints/
log_dir: ./logs/
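
When comparing two long configs like the ones above, it can help to list only the keys that differ. The following is a minimal sketch, not part of the composer code base: `parse_flat_config` and `diff_configs` are hypothetical helpers, and the parser only handles simple top-level `key: value` lines (it skips comments and list items such as the `gpu` entries). The inline strings contain just a few keys copied from the two configs for illustration.

```python
def parse_flat_config(text):
    """Parse simple top-level `key: value` lines (hypothetical helper).

    Blank lines, comments and list items are skipped.
    """
    config = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#") or stripped.startswith("-"):
            continue
        key, sep, value = stripped.partition(":")
        if sep:
            config[key.strip()] = value.strip()
    return config


def diff_configs(a, b):
    """Return {key: (value_in_a, value_in_b)} for keys whose values differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# A few keys copied from the stage 1 and stage 2 configs above.
stage_1 = parse_flat_config("""
recollect_stats_train: True
num_epochs: 40
learning_rate: 0.0005
weight_decay: 0.001
""")
stage_2 = parse_flat_config("""
recollect_stats_train: False
num_epochs: 5
learning_rate: 0.0001
weight_decay: 0.001
""")

for key, (v1, v2) in diff_configs(stage_1, stage_2).items():
    print(f"{key}: stage 1 = {v1}, stage 2 = {v2}")
```

Reading the two configs line by line, stage 2 appears to differ from stage 1 only in `recollect_stats_train`, `num_epochs` and `learning_rate`; running the same helpers on the full YAML files would be one way to confirm that.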

Q about ablations in paper

Hello, congrats on the amazing work!

Table 2 of the supplementary material shows that the 1-scale keypoint model achieved 91.2 test accuracy.
Could you say a little about which portion of the model this was? Was it a model that used only the first transformer layer ("inner") and not the other three transformer layers (middle, outer, group)?

about dataset collective activity

Hi @hongluzhou ,

Thanks for kindly open-sourcing the composer code.
I downloaded the Collective Activity dataset from the link in your README, but I am not sure I understand what the annotations actually mean. Could you please correct or extend the description below?

  • annotations.pkl: a dict like {video_index: sub_dict}. sub_dict is like {index_of_every_10_frames: sub_sub_dict}, and sub_sub_dict has dict_keys(['frame_id', 'group_activity', 'actions', 'bboxes']). NOT sure what 'group_activity', 'actions' and 'bboxes' really refer to.
  • joints: the subfolders are named by video_index and contain files named index_of_every_10_frames.pickle. Each pickle file contains a dict like {frame_index: value_item}, where value_item has shape (13, 17, 3). NOT sure what this value_item is.
  • tracks_normalized.pkl: a dict whose keys are like (video_index, index_of_every_10_frames) and whose values are sub_dicts. sub_dict is like {index_of_frame: value_item}, where value_item has shape (13, 4). NOT sure what this value_item is.
  • tracks_normalized_with_person_action_label.pkl: similar to tracks_normalized.pkl, but the final value_item has shape (13, 1). NOT sure what this value_item is.
  • videos: the 44 videos, grouped by 10 frames.

cheers,
xu
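
The nesting described in the list above can be checked mechanically. Below is a minimal sketch of one way to do that: `describe` and `describe_pickle` are hypothetical helpers, not part of the composer code, and the file name in the usage comment is only assumed to be in the working directory. It reports the container type at each level, follows the first key of each dict, and prints the `shape` of anything array-like. Only unpickle files from a source you trust.

```python
import pickle


def describe(obj, max_depth=3, depth=0):
    """Return a short summary of a nested container (hypothetical helper)."""
    indent = "  " * depth
    shape = getattr(obj, "shape", None)
    if shape is not None:
        # Anything exposing `.shape` (e.g. a NumPy array) is reported by shape.
        return f"{indent}array-like, shape={tuple(shape)}"
    if isinstance(obj, dict):
        lines = [f"{indent}dict with {len(obj)} keys"]
        if obj and depth < max_depth:
            first_key = next(iter(obj))
            lines.append(f"{indent}  e.g. key {first_key!r} ->")
            lines.append(describe(obj[first_key], max_depth, depth + 2))
        return "\n".join(lines)
    if isinstance(obj, (list, tuple)):
        return f"{indent}{type(obj).__name__} of length {len(obj)}"
    return f"{indent}{type(obj).__name__}: {obj!r}"


def describe_pickle(path):
    """Load a pickle file and summarize its structure."""
    with open(path, "rb") as f:
        return describe(pickle.load(f))


# Usage (path is an assumption; NumPy must be installed if the file stores arrays):
# print(describe_pickle("tracks_normalized.pkl"))
```

This only reveals structure and shapes; what each axis means (for example whether the 13 in (13, 17, 3) is a maximum number of people) still has to come from the authors.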

volleyball.zip can't download

I would like to download the Volleyball dataset (the joint pickle files).

I downloaded it about 3 months ago, but I cannot download it now.

Please check the uploaded files and upload a fixed version.

I can still download the CAD dataset.

thank you

hello

Hi, I followed your steps to reproduce the results on the Volleyball dataset and got: === Test Result === Prec@1: 16.904% / Prec@3: 45.176 % / Person Prec@1: 67.433 % / Person Prec@3: 82.342 @epoch-5
This differs from the result reported in the paper. Which of my parameters might be set incorrectly?

The sinkhorn algorithm

Hello @hongluzhou ,

In the online clustering section of the paper, you mention that, to enforce equipartition, the vectors are set to ones. As far as I understand, the Sinkhorn algorithm finds the optimal transport between two distributions given a cost matrix.

My questions:

  1. Why is equipartition enforced?
  2. Is there a cost matrix here? If yes, which one?

Can you please help?

cheers,
xu
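
For readers following this question, here is a minimal pure-Python sketch of the Sinkhorn-Knopp normalization as it is commonly used in SwAV-style online clustering. It is an illustration of the general technique under stated assumptions, not a statement of how composer implements it: the function name `sinkhorn`, the `epsilon` parameter and the default values are all hypothetical. In this formulation the input `scores` are feature-to-prototype similarities, so the negative scores play the role of the cost matrix, and the uniform row and column targets are what "equipartition" refers to.

```python
import math


def sinkhorn(scores, epsilon=0.1, n_iters=3):
    """Soft-assign B samples to K prototypes with uniform marginals.

    scores: B x K list of lists of similarities (higher = closer).
    Returns a B x K list of lists whose rows each sum to 1.
    """
    B, K = len(scores), len(scores[0])
    # Subtract the global max before exponentiating, for numerical stability.
    m = max(max(row) for row in scores)
    Q = [[math.exp((s - m) / epsilon) for s in row] for row in scores]
    total = sum(sum(row) for row in Q)
    Q = [[q / total for q in row] for row in Q]
    for _ in range(n_iters):
        # Each prototype (column) should receive total mass 1/K.
        col_sums = [sum(Q[b][k] for b in range(B)) for k in range(K)]
        Q = [[Q[b][k] / (col_sums[k] * K) for k in range(K)] for b in range(B)]
        # Each sample (row) should carry total mass 1/B.
        row_sums = [sum(row) for row in Q]
        Q = [[q / (row_sums[b] * B) for q in Q[b]] for b in range(B)]
    # Rescale so that each sample's assignment is a probability vector.
    return [[q * B for q in row] for row in Q]


scores = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.6, 0.4]]
assignments = sinkhorn(scores)
for row in assignments:
    print([round(q, 3) for q in row])
```

Without the column constraint, every sample in this toy batch would pick the first prototype, since its score is always higher. With more iterations the column sums approach B/K, so the batch is spread evenly across prototypes, which is the usual motivation for equipartition: it rules out the trivial solution where all samples collapse onto one cluster.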

the difference of joints coordinates of successive frames is big

Hello @hongluzhou ,

When running on the Collective Activity dataset, I found that the difference between the joint coordinates of successive frames can be large.

(Pdb) len(joint_dxcoords)
19293300
(Pdb) np.histogram(joint_dxcoords)
(array([ 4455, 5897, 9256, 8636, 1998641, 17242240,
7888, 7894, 5020, 3373]), array([-7.99904283e+02, -6.40013798e+02, -4.80123314e+02, -3.20232829e+02,
-1.60342345e+02, -4.51860162e-01, 1.59438624e+02, 3.19329109e+02,
4.79219593e+02, 6.39110078e+02, 7.99000562e+02]))

We can see that tens of thousands of elements in joint_dxcoords are larger than 100 in magnitude, although they make up only a small proportion of the total. Does this violate the assumptions of TOKS? Will it affect the training of the model?

cheers,
xu
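
The proportion mentioned above can be worked out directly from the posted histogram. The sketch below copies the counts and bin edges from the pdb output; `count_outside` is a hypothetical helper, and the threshold of 159 is chosen only because the 10-bin histogram cannot resolve anything finer (the two central bins span roughly -160 to 159), so a cutoff of exactly 100 cannot be checked from this output alone.

```python
# Counts and bin edges copied from the np.histogram output above.
counts = [4455, 5897, 9256, 8636, 1998641, 17242240, 7888, 7894, 5020, 3373]
edges = [-7.99904283e+02, -6.40013798e+02, -4.80123314e+02, -3.20232829e+02,
         -1.60342345e+02, -4.51860162e-01, 1.59438624e+02, 3.19329109e+02,
         4.79219593e+02, 6.39110078e+02, 7.99000562e+02]


def count_outside(counts, edges, threshold):
    """Sum the counts of bins lying entirely outside [-threshold, threshold]."""
    total = 0
    for i, c in enumerate(counts):
        lo, hi = edges[i], edges[i + 1]
        if hi <= -threshold or lo >= threshold:
            total += c
    return total


total = sum(counts)
large = count_outside(counts, edges, threshold=159.0)
print(total, large, f"{100 * large / total:.2f}%")
```

The bin counts sum to 19293300, matching len(joint_dxcoords), and the eight outer bins hold 52419 elements, roughly 0.27% of the total.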
