3c-net's Introduction

3C-Net

Overview

This package is a PyTorch implementation of our paper 3C-Net: Category Count and Center Loss for Weakly-Supervised Action Localization, published at ICCV 2019.

Data

We use the same I3D features for the Thumos14 and ActivityNet 1.2 datasets released by Sujoy Paul. The annotations are already included in this repository.
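
For reference, such released features are typically packaged as NumPy arrays of per-snippet I3D descriptors. A minimal loading sketch (the file name and array layout below are assumptions for illustration, not the package's actual names):

import numpy as np

# hypothetical file name; substitute the actual path of the downloaded features
features = np.load('Thumos14-I3D-Features.npy', allow_pickle=True)
# assumed layout: one entry per video, each a (num_snippets, feature_dim) array
print(len(features), features[0].shape)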

Training 3C-Net

The model can be trained using the following commands. See options.py for additional command-line arguments.

# Running on Thumos14 
python main.py --dataset-name Thumos14
# Running on ActivityNet 1.2
python main.py --dataset-name ActivityNet1.2 --activity-net --num-class 100

Citation

Please cite the following work if you use this package.

@inproceedings{narayan20193c,
  title={3c-net: Category count and center loss for weakly-supervised action localization},
  author={Narayan, Sanath and Cholakkal, Hisham and Khan, Fahad Shahbaz and Shao, Ling},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2019}
}

Dependencies

This codebase was built on the W-TALC package and has the following dependencies.

  1. PyTorch 0.4.1, Tensorboard Logger 0.1.0
  2. Python 3.6
  3. numpy and scipy, among others
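
A possible environment setup based on the versions listed above (a sketch; exact pinning may need adjustment for your platform):

pip install torch==0.4.1 tensorboard_logger==0.1.0 numpy scipy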


3c-net's Issues

Values are not matching the reported values

Hi,

I'm trying to reproduce the results reported in the paper using this repo, but the value for action classification on the Thumos14 dataset does not match the reported result (I am getting 71.67 mAP instead of the 86.9 reported in the paper). I have used all the default parameters. Is there something I am missing?

The other values are reasonably close to those reported.

I am waiting for your reply. Thanks in advance.
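
For reference, video-level classification mAP (the 86.9 figure discussed here) is conventionally the mean over classes of average precision. A minimal sketch with scikit-learn, using illustrative variable names rather than this repo's code:

import numpy as np
from sklearn.metrics import average_precision_score

# scores: (num_videos, num_classes) predicted class scores
# labels: (num_videos, num_classes) binary multi-label ground truth
scores = np.random.rand(10, 20)
labels = (np.random.rand(10, 20) > 0.8).astype(int)

ap_per_class = [average_precision_score(labels[:, c], scores[:, c])
                for c in range(labels.shape[1]) if labels[:, c].any()]
print('classification mAP: %.2f' % (100 * np.mean(ap_per_class)))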

About the center loss

Regarding the line

if (labels[i] > 0).sum() == 0 or ((labels[i] > 0).sum() != 1 and itr < itr_th): continue

in the following function:

import torch
import torch.nn.functional as F

def CENTERLOSS(features, logits, labels, seq_len, criterion, itr, device):
    lab = torch.zeros(0).to(device)
    feat = torch.zeros(0).to(device)
    itr_th = 5000
    for i in range(features.size(0)):
        # skip videos with no positive labels; skip multi-label videos
        # until itr_th iterations have elapsed
        if (labels[i] > 0).sum() == 0 or ((labels[i] > 0).sum() != 1 and itr < itr_th):
            continue
        # categories present in the video
        labi = torch.arange(labels.size(1))[labels[i]>0]
        atn = F.softmax(logits[i][:seq_len[i]], dim=0)
        atni = atn[:,labi]
        # aggregate features category-wise
        for l in range(len(labi)):
            labl = labi[[l]].float()
            atnl = atni[:,[l]]
            # suppress snippets with below-average attention
            atnl[atnl<atnl.mean()] = 0
            sum_atn = atnl.sum()
            if sum_atn > 0:
                atnl = atnl.expand(seq_len[i],features.size(2))
                # attention-weighted feature aggregation
                featl = torch.sum(features[i][:seq_len[i]]*atnl,dim=0,keepdim=True)/sum_atn
                feat = torch.cat([feat, featl], dim=0)
                lab = torch.cat([lab, labl], dim=0)

    if feat.numel() > 0:
        # average the center loss over the aggregated features
        loss = criterion(feat, lab)
        return loss / feat.size(0)
    else:
        return 0

Does this mean the center loss is used on multi-label training videos?
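
To illustrate the gating in that condition (a sketch using the itr_th = 5000 from the code above): single-label videos contribute to the center loss from the start, while multi-label videos contribute only once itr reaches itr_th.

import torch

labels = torch.tensor([[1., 0., 0.],   # single-label video
                       [1., 1., 0.],   # multi-label video
                       [0., 0., 0.]])  # video with no positive labels
itr_th = 5000
for itr in (0, itr_th):
    for i in range(labels.size(0)):
        n = (labels[i] > 0).sum()
        skipped = n == 0 or (n != 1 and itr < itr_th)
        print('itr=%d video=%d %s' % (itr, i, 'skipped' if skipped else 'used'))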

Do you utilize additional data from THUMOS14 val set for temporal action localization?

Nice work and congrats to your ICCV paper. Thanks for sharing your code.

As you mentioned in the paper, for THUMOS14 you follow the setting in STPN (CVPR 18): use the 200 videos (20 categories) in the val set for training.

I did not carefully go through every line of your code, but it seems that you use all 1010 videos in the val set to train your classifier, which is fair for the action classification task. However, it also seems that you use the same network to perform the temporal action localization task, and I don't think this is the standard protocol for weakly-supervised temporal action localization.

How do you make predictions for videos?

How do you make a prediction for a video? What threshold do you usually choose? I am asking about the following lines in the paper:

After training the 3C-Net, the CLS module (see Fig. 2 and Eq. 2) is used to compute the action-class scores (pmf) at the video-level using the final T-CAM, for the action classification task.
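
In this line of work (W-TALC-style methods), the video-level pmf is commonly obtained by top-k temporal pooling of the T-CAM followed by a softmax over classes. A minimal sketch under that assumption; this is not necessarily the exact procedure in this repo, and k_ratio is an illustrative parameter:

import torch
import torch.nn.functional as F

def video_level_pmf(tcam, k_ratio=8):
    # tcam: (num_snippets, num_classes) temporal class activation map
    k = max(1, tcam.size(0) // k_ratio)
    topk, _ = torch.topk(tcam, k, dim=0)   # top-k snippet scores per class
    class_scores = topk.mean(dim=0)        # aggregate to video level
    return F.softmax(class_scores, dim=0)  # pmf over action classes

pmf = video_level_pmf(torch.randn(400, 20))
print(pmf.sum())  # ~1.0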

Evaluation Code Problem

Hi, Sanath

I ran the code recently using the I3D features and got the same result reported in the paper, mAP@0.5 = 26.6, on the THUMOS14 dataset.
However, when I saved the predictions and used the official code for evaluation, I got 23.
Are the results reported in the paper obtained with the evaluation code in detectionMAP.py or with the official code?
Could you please release the model checkpoint if you are using the official code?

Thank you
Fan
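
For context, mAP@0.5 counts a predicted segment as a true positive when its temporal IoU with a same-class ground-truth segment is at least 0.5. A minimal temporal-IoU sketch, illustrative rather than the repo's evaluation code:

def temporal_iou(pred, gt):
    # pred, gt: (start, end) in seconds
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

print(temporal_iou((1.0, 5.0), (3.0, 7.0)))  # 2 / 6 = 0.33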

Can't achieve mAP as reported in the paper

I ran the code several times; the mAP at IoU=0.5 is 25.34%. Then I used exponential lr scheduling, and it increased to 26.06%, but this is still lower than the reported 26.6%. Can you explain what could be wrong? Thanks in advance.
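
For reference, exponential lr scheduling in PyTorch can be set up as below. This is a sketch; the model and the gamma value are illustrative assumptions, not the commenter's settings:

import torch

model = torch.nn.Linear(2048, 20)  # placeholder model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.99)

for epoch in range(3):
    # ... run training steps with optimizer.step() here ...
    scheduler.step()  # lr *= gamma each epoch
    print(optimizer.param_groups[0]['lr'])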
