3c-net's Introduction

3C-Net

Overview

This package is a PyTorch implementation of our paper 3C-Net: Category Count and Center Loss for Weakly-Supervised Action Localization, published at ICCV 2019.

Data

We use the same I3D features for the Thumos14 and ActivityNet 1.2 datasets released by Sujoy Paul. The annotations are already included in this repository.
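
For reference, such released features are typically packaged as NumPy arrays of per-snippet I3D descriptors. A minimal loading sketch (the file name and array layout below are assumptions for illustration, not the package's actual names):

import numpy as np

# hypothetical file name; substitute the actual path of the downloaded features
features = np.load('Thumos14-I3D-Features.npy', allow_pickle=True)
# assumed layout: one entry per video, each a (num_snippets, feature_dim) array
print(len(features), features[0].shape)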

Training 3C-Net

The model can be trained using the following commands. See options.py for additional command-line arguments.

# Running on Thumos14 
python main.py --dataset-name Thumos14
# Running on ActivityNet 1.2
python main.py --dataset-name ActivityNet1.2 --activity-net --num-class 100

Citation

Please cite the following work if you use this package.

@inproceedings{narayan20193c,
  title={3c-net: Category count and center loss for weakly-supervised action localization},
  author={Narayan, Sanath and Cholakkal, Hisham and Khan, Fahad Shahbaz and Shao, Ling},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2019}
}

Dependencies

This codebase was built on the W-TALC package and has the following dependencies.

  1. PyTorch 0.4.1, Tensorboard Logger 0.1.0
  2. Python 3.6
  3. numpy and scipy, among others
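
A possible environment setup based on the versions listed above (a sketch; exact pinning may need adjustment for your platform):

pip install torch==0.4.1 tensorboard_logger==0.1.0 numpy scipy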


3c-net's Issues

Values are not matching the reported values

Hi,

I'm trying to reproduce the results reported in the paper using this repo, but the value for action classification on the Thumos14 dataset does not match the reported result (I am getting 71.67 mAP instead of the 86.9 reported in the paper). I have used all the default parameters. Is there something I am missing?

The other values are reasonably close to those reported.

I am waiting for your reply. Thanks in advance.
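
For reference, video-level classification mAP (the 86.9 figure discussed here) is conventionally the mean over classes of average precision. A minimal sketch with scikit-learn, using illustrative variable names rather than this repo's code:

import numpy as np
from sklearn.metrics import average_precision_score

# scores: (num_videos, num_classes) predicted class scores
# labels: (num_videos, num_classes) binary multi-label ground truth
scores = np.random.rand(10, 20)
labels = (np.random.rand(10, 20) > 0.8).astype(int)

ap_per_class = [average_precision_score(labels[:, c], scores[:, c])
                for c in range(labels.shape[1]) if labels[:, c].any()]
print('classification mAP: %.2f' % (100 * np.mean(ap_per_class)))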

About the center loss

Regarding the line

if (labels[i] > 0).sum() == 0 or ((labels[i] > 0).sum() != 1 and itr < itr_th): continue

in the following function:

import torch
import torch.nn.functional as F

def CENTERLOSS(features, logits, labels, seq_len, criterion, itr, device):
    lab = torch.zeros(0).to(device)
    feat = torch.zeros(0).to(device)
    itr_th = 5000
    for i in range(features.size(0)):
        # skip videos with no positive labels; skip multi-label videos
        # until itr_th iterations have elapsed
        if (labels[i] > 0).sum() == 0 or ((labels[i] > 0).sum() != 1 and itr < itr_th):
            continue
        # categories present in the video
        labi = torch.arange(labels.size(1))[labels[i]>0]
        atn = F.softmax(logits[i][:seq_len[i]], dim=0)
        atni = atn[:,labi]
        # aggregate features category-wise
        for l in range(len(labi)):
            labl = labi[[l]].float()
            atnl = atni[:,[l]]
            # suppress snippets with below-average attention
            atnl[atnl<atnl.mean()] = 0
            sum_atn = atnl.sum()
            if sum_atn > 0:
                atnl = atnl.expand(seq_len[i],features.size(2))
                # attention-weighted feature aggregation
                featl = torch.sum(features[i][:seq_len[i]]*atnl,dim=0,keepdim=True)/sum_atn
                feat = torch.cat([feat, featl], dim=0)
                lab = torch.cat([lab, labl], dim=0)

    if feat.numel() > 0:
        # average the center loss over the aggregated features
        loss = criterion(feat, lab)
        return loss / feat.size(0)
    else:
        return 0

Does this mean the center loss is used on multi-label training videos?
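
To illustrate the gating in that condition (a sketch using the itr_th = 5000 from the code above): single-label videos contribute to the center loss from the start, while multi-label videos contribute only once itr reaches itr_th.

import torch

labels = torch.tensor([[1., 0., 0.],   # single-label video
                       [1., 1., 0.],   # multi-label video
                       [0., 0., 0.]])  # video with no positive labels
itr_th = 5000
for itr in (0, itr_th):
    for i in range(labels.size(0)):
        n = (labels[i] > 0).sum()
        skipped = n == 0 or (n != 1 and itr < itr_th)
        print('itr=%d video=%d %s' % (itr, i, 'skipped' if skipped else 'used'))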

Do you utilize additional data from THUMOS14 val set for temporal action localization?

Nice work and congrats to your ICCV paper. Thanks for sharing your code.

As you mentioned in the paper, for THUMOS14 you follow the setting in STPN (CVPR 18): use the 200 videos (20 categories) in the val set for training.

I did not carefully go through every line of your code, but it seems that you use all 1010 videos in the val set to train your classifier, which is fair for the action classification task. However, it also seems that you use the same network to perform the temporal action localization task, and I don't think this is the standard protocol for weakly-supervised temporal action localization.

How do you make predictions for videos?

How do you make a prediction for a video? What threshold do you usually choose? I am asking about the following lines in the paper:

After training the 3C-Net, the CLS module (see Fig. 2 and Eq. 2) is used to compute the action-class scores (pmf) at the video-level using the final T-CAM, for the action classification task.
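
In this line of work (W-TALC-style methods), the video-level pmf is commonly obtained by top-k temporal pooling of the T-CAM followed by a softmax over classes. A minimal sketch under that assumption; this is not necessarily the exact procedure in this repo, and k_ratio is an illustrative parameter:

import torch
import torch.nn.functional as F

def video_level_pmf(tcam, k_ratio=8):
    # tcam: (num_snippets, num_classes) temporal class activation map
    k = max(1, tcam.size(0) // k_ratio)
    topk, _ = torch.topk(tcam, k, dim=0)   # top-k snippet scores per class
    class_scores = topk.mean(dim=0)        # aggregate to video level
    return F.softmax(class_scores, dim=0)  # pmf over action classes

pmf = video_level_pmf(torch.randn(400, 20))
print(pmf.sum())  # ~1.0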

Evaluation Code Problem

Hi, Sanath

I ran the code recently using the I3D features and got the same result reported in the paper, mAP@0.5 = 26.6, on the THUMOS14 dataset.
However, when I saved the predictions and used the official code for evaluation, I got 23.
Are the results reported in the paper obtained with the evaluation code in detectionMAP.py or with the official code?
Could you please release the model checkpoint if you are using the official code?

Thank you
Fan
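
For context, mAP@0.5 counts a predicted segment as a true positive when its temporal IoU with a same-class ground-truth segment is at least 0.5. A minimal temporal-IoU sketch, illustrative rather than the repo's evaluation code:

def temporal_iou(pred, gt):
    # pred, gt: (start, end) in seconds
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

print(temporal_iou((1.0, 5.0), (3.0, 7.0)))  # 2 / 6 = 0.33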

Can't achieve mAP as reported in the paper

I ran the code several times; the mAP at IoU=0.5 is 25.34%. Then I used exponential lr scheduling, and it increased to 26.06%, but this is still lower than the reported 26.6%. Can you explain what could be wrong? Thanks in advance.
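
For reference, exponential lr scheduling in PyTorch can be set up as below. This is a sketch; the model and the gamma value are illustrative assumptions, not the commenter's settings:

import torch

model = torch.nn.Linear(2048, 20)  # placeholder model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.99)

for epoch in range(3):
    # ... run training steps with optimizer.step() here ...
    scheduler.step()  # lr *= gamma each epoch
    print(optimizer.param_groups[0]['lr'])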
