moabitcoin / ig65m-pytorch

PyTorch 3D video classification models pre-trained on 65 million Instagram videos

License: MIT License

Makefile 4.28% Python 95.62% Shell 0.10%

action-recognition deeplearnig machine-learning pytorch torchvision

ig65m-pytorch's Issues

Port missing r(2+1)d 34-layer IG65-M model

Blocked by official weights not being released so far:

facebookresearch/VMZ#87

Citation information

I was wondering how I should cite this work in a paper. I have used the pretrained weights provided by this repository in my work.

Fix released weight asset filenames and broken pb file

@sandhawalia looks like one of the onnx pb files is broken:

and the filenames do not match the syntax for torch hub:

https://github.com/pytorch/pytorch/blob/v1.2.0/torch/hub.py#L53-L54

Tasks

re-upload broken pb file
change release file name hash suffixes from .._ig65m_0aa0550b.pth to _ig65m-0aa0550b.pth (last delimiter has to be -)
adapt urls in readme
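
The delimiter requirement can be illustrated with a short sketch. The pattern below is assumed to mirror the one at the linked lines of torch/hub.py (v1.2.0); check the link before relying on it. The `example_` file name prefix and the `hash_prefix` helper are hypothetical and used only for illustration.

```python
import re

# Assumption: this mirrors the hash-matching pattern at the linked lines of
# torch/hub.py (v1.2.0). Verify against the link before relying on it.
HASH_REGEX = re.compile(r"-([a-f0-9]*)\.")


def hash_prefix(filename):
    """Return the hex digits between a '-' and the following '.', or None if absent."""
    match = HASH_REGEX.search(filename)
    return match.group(1) if match else None


# Hypothetical file names: only the last delimiter differs.
print(hash_prefix("example_ig65m-0aa0550b.pth"))
print(hash_prefix("example_ig65m_0aa0550b.pth"))
```

Under that assumption, only the name with `-` before the hash yields a hash prefix, which is why the release assets need renaming.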

Provide r(2+1)d 152-layer models and weights

Right now we only provide the model and weights for the r(2+1)d 34-layer model, which strikes a perfect balance between model size/runtime and accuracy in practice (sorry Kaggle folks).

We should make our code modular so that we can provide e.g. the CSN models or the 152-layer r(2+1)d model (although in practice it is quite heavy for the tiny bump in accuracy).

Upstream our changes to PyTorch r(2+1)d architecture

The torchvision r(2+1)d architecture needs two modifications to get it in sync with the official Caffe2 implementation (see facebookresearch/VMZ#89) and our provided code:

Number of midplanes in the downsampling blocks
Batchnorm

We should upstream both modifications to torchvision.

Add Travis CI integration

We should add Travis for CI.

Travis should check (among other things)

original weights can still be downloaded / url reachable
tools work as expected, tests
flake8, black
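
A minimal sketch of the first check, assuming a plain HEAD request is enough to decide whether a weight URL is still reachable. The helper names `is_reachable` and `find_unreachable` are hypothetical, and the list of URLs to check would have to come from the README or the release page.

```python
import urllib.request


def is_reachable(url, timeout=10.0):
    """Return True if a HEAD request to `url` completes with a 2xx or 3xx status."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, ValueError):
        # OSError covers URLError, HTTPError and timeouts; ValueError covers malformed URLs.
        return False


def find_unreachable(urls, probe=is_reachable):
    """Return the subset of `urls` for which `probe` reports failure."""
    return [url for url in urls if not probe(url)]
```

A CI job could fail the build whenever `find_unreachable` returns a non-empty list. The `probe` argument exists so the logic can be tested without network access.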

other pretrained models?

Dear author:
Thanks for sharing the large-scale pretrained checkpoint. It helps. I wonder whether any lightweight models could be shared, such as TSM, X3D, or R(2+1)D-18? Thank you very much.

are there any preprocessing to the input video clips?

Hi, great work, it has helped me a lot! However, I still need some help.
I am not familiar with Caffe2 and could not find out whether the Caffe2 version of the IG65M model applied any pre-processing to the input video clips.
In my experiment, I simply normalized the pixels to [0, 1], but the performance was not very good (about 92% on UCF101 with the IG65M pretrained model after some fine-tuning on UCF101; otherwise the performance was even worse). So I wonder whether we need to apply some specific pre-processing to the video clips, such as subtracting the mean or something else?
Thanks for your attention and kind help :)
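
For illustration only: a script quoted in another issue on this page applies the repository's Normalize transform with per-channel mean and standard deviation after converting frames to tensors. Whether this matches the original Caffe2 pipeline is not confirmed here. The sketch below works through the arithmetic for a single pixel, assuming 8-bit RGB input that is first scaled to [0, 1]; `normalize_pixel` is a hypothetical helper, not part of the repository.

```python
# Per-channel statistics copied from the Normalize(...) call quoted elsewhere on this page.
MEAN = (0.43216, 0.394666, 0.37645)
STD = (0.22803, 0.22145, 0.216989)


def normalize_pixel(rgb_uint8, mean=MEAN, std=STD):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    return tuple(
        (value / 255.0 - m) / s
        for value, m, s in zip(rgb_uint8, mean, std)
    )


# A black pixel maps to negative values, i.e. it lies below the channel means.
print(normalize_pixel((0, 0, 0)))
```

If this is the right pre-processing, scaling to [0, 1] alone would leave the inputs shifted and scaled relative to what the pretrained weights expect.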

Need help with prediction codes

Hi, I am a beginner working on video classification with a pretrained model. I borrowed code from extract.py in the cli and from other sources. The following code did produce some results, but they did not seem correct. In addition, for some videos there were indices larger than 400 in max_indices. I would appreciate it if anyone could help with the code!

classes.json is from here.

import sys

import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torch.autograd import Variable

from torchvision.transforms import Compose

import numpy as np
from einops.layers.torch import Rearrange, Reduce
from tqdm import tqdm

from ig65m.models import r2plus1d_34_32_kinetics
from ig65m.datasets import VideoDataset
from ig65m.transforms import ToTensor, Resize, Normalize
from pathlib import Path
import json

class VideoModel(nn.Module):
    def __init__(self, pool_spatial="mean", pool_temporal="mean"):
        super().__init__()

        self.model = r2plus1d_34_32_kinetics(num_classes=400, pretrained=True, progress=True)

        self.pool_spatial = Reduce("n c t h w -> n c t", reduction=pool_spatial)
        self.pool_temporal = Reduce("n c t -> n c", reduction=pool_temporal)

    def forward(self, x):
        x = self.model.stem(x)
        x = self.model.layer1(x)
        x = self.model.layer2(x)
        x = self.model.layer3(x)
        x = self.model.layer4(x)

        x = self.pool_spatial(x)
        x = self.pool_temporal(x)

        return x

if __name__ == "__main__":
    if torch.cuda.is_available():
        print("🐎 Running on GPU(s)", file=sys.stderr)
        device = torch.device("cuda")
        torch.backends.cudnn.benchmark = True
    else:
        print("🐌 Running on CPU(s)", file=sys.stderr)
        device = torch.device("cpu")

    model = VideoModel(pool_spatial="mean",
                       pool_temporal="mean")
    
    model.eval()

    for params in model.parameters():
        params.requires_grad = False

    model = model.to(device)
    model = nn.DataParallel(model)
    with open('classes.json','r') as load_f:
        load_dict = json.load(load_f)
    class_names = np.array(load_dict)

    transform = Compose([
        ToTensor(),
        Rearrange("t h w c -> c t h w"),
        Resize(128),
        Normalize(mean=[0.43216, 0.394666, 0.37645], std=[0.22803, 0.22145, 0.216989]),
    ])

    dataset = VideoDataset(Path("./Yoga3.mp4"), clip=32, transform=transform)
    loader = DataLoader(dataset, batch_size=1, num_workers=0, shuffle=False, pin_memory=True)

    video_outputs = []

    with torch.no_grad():
        for inputs in tqdm(loader, total=len(dataset) // 1):
            inputs = inputs.to(device)
            outputs = model(inputs)

            video_outputs.append(outputs.cpu().data)


    video_outputs = torch.cat(video_outputs)

    results = {
        'video': "Yoga-3.mp4",
        'clips': []
    }
    
    _, max_indices = video_outputs.max(dim=1)

    for i in range(video_outputs.size(0)):
        clip_results = {}
        clip_results['label'] = class_names[max_indices[i]]
        results['clips'].append(clip_results)
    
    print(results)
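
One possible explanation, offered as a sketch rather than a confirmed diagnosis: the forward method above stops after layer4 and pools the result, so it returns a feature vector rather than 400 class scores. Taking the arg-max of a feature vector with more than 400 entries would produce indices above 399. Calling the full model, including its own pooling and final classification layer, should give one score per class. The `ClipClassifier` and `predict_labels` names below are hypothetical, and the backbone is assumed to return raw class scores of shape (batch, num_classes).

```python
import torch
import torch.nn as nn


class ClipClassifier(nn.Module):
    """Wrap a video backbone and return per-class probabilities for each clip."""

    def __init__(self, backbone):
        super().__init__()
        self.backbone = backbone

    def forward(self, x):
        # Run the whole network (including pooling and the final linear layer)
        # so the output has shape (batch, num_classes), not (batch, features).
        logits = self.backbone(x)
        return logits.softmax(dim=1)


def predict_labels(model, clips, class_names):
    """Return the top-1 class name for each clip in an (n, c, t, h, w) batch."""
    model.eval()
    with torch.no_grad():
        probabilities = model(clips)
    if probabilities.size(1) != len(class_names):
        raise ValueError("model output size does not match the number of class names")
    return [class_names[i] for i in probabilities.argmax(dim=1).tolist()]
```

With the model from the script above, the wrapper would be built as `ClipClassifier(r2plus1d_34_32_kinetics(num_classes=400, pretrained=True))`. The size check in `predict_labels` turns a mismatch between model output and class list into an explicit error.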

How can I fine-tune the whole model using the torch-hub ?

fine tune the pre-trained model on UCF101

If I want to fine-tune the pre-trained model on UCF101, how can I get the performance of 96.8%?
In my setup, I fine-tuned only layer4 and the fully connected layer with a learning rate of 0.0001. Is that wrong? The result I got was just 78.5%. Can you help me? Thank you!
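
A minimal sketch of fine-tuning the whole network instead of only layer4 and the classifier. It assumes the model exposes its final linear layer as `fc` (as torchvision's video ResNets do). The helper names and the learning rates are illustrative and untuned, and nothing here is claimed to reproduce the 96.8% figure.

```python
import torch
import torch.nn as nn


def prepare_for_finetuning(model, num_classes):
    """Replace the classification head and make every layer trainable."""
    # Assumption: the final linear layer is available as `model.fc`.
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    for parameter in model.parameters():
        parameter.requires_grad = True
    return model


def make_optimizer(model, backbone_lr=1e-4, head_lr=1e-3):
    """Use a smaller learning rate for pretrained layers than for the new head."""
    head = list(model.fc.parameters())
    head_ids = {id(p) for p in head}
    backbone = [p for p in model.parameters() if id(p) not in head_ids]
    groups = [
        {"params": backbone, "lr": backbone_lr},
        {"params": head, "lr": head_lr},
    ]
    return torch.optim.SGD(groups, lr=backbone_lr, momentum=0.9)
```

For UCF101 this would be called with `num_classes=101`. The model itself could come from `torch.hub.load("moabitcoin/ig65m-pytorch", "r2plus1d_34_32_kinetics", num_classes=400, pretrained=True)`, assuming the hub entrypoint carries the same name as the function imported in the script quoted in the previous issue.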

Provide CSN models and weights

Similar to #8: provide CSN models and weights.

r2plus1d_34_8_ig65m with 16 frames input

Hi,
I am new to training video models. I have been reading papers on action recognition that use newer models such as R3D and R2plus1D with 16-frame inputs. Is there a way to use R2plus1D_34 with the IG65 pretrained weights and fine-tune it on the Kinetics-400 dataset?

Provide convenient fine-tuning tool for users

We should provide a tool where the user can load up a pre-trained model, point it at a dataset (e.g. directory of labeled videos), and have it fine-tune the model automatically.

Look into PyTorch Hub integration

Make downloading and using the model and weights easier.

Right now users have to copy our model architecture function, download the weights from our Github releases, and then load them into the model - all manually.

It should be easy and convenient to get started.

Something wrong when run code : torch.hub.list("moabitcoin/ig65m-pytorch")

Hi, I'm new to PyTorch. When I run

torch.hub.list("moabitcoin/ig65m-pytorch")

I get a runtime error: IncompleteRead: IncompleteRead(0 bytes read)

I then tried running:

import http.client
http.client.HTTPConnection._http_vsn = 1
http.client.HTTPConnection._http_vsn_str = 'HTTP/1.0'
torch.hub.list("moabitcoin/ig65m-pytorch")

but I got another runtime error: BadZipFile: File is not a zip file

Can someone help me solve this problem?
Thanks

Run on full Kinetics-400 dataset to verify accuracy claims

We validated the ported weights and model only on subset of Kinetics-400 we had at hand.

We should run over the full Kinetics-400 dataset and verify what the folks claim in:

https://github.com/facebookresearch/vmz

Let CI build docker images

We should provide pre-built docker images for convenience and ease of use.

Once we have Travis CI (#13) we should let Travis build and push images automatically.

Users then should be able to say something along the lines of

docker run --runtime=nvidia --ipc=host moabitcoin/ig65m extract myvideo.mp4

without having to install our requirements or even cloning this repository.

See e.g. https://github.com/mapbox/robosat/blob/master/.travis.yml for how this can be set up.

Provide nvidia-docker based Dockerfile

The Dockerfile we provide right now is cpu based only.

We used that for porting and toying around with the example.

We should provide an additional gpu Dockerfile as an example and to get folks started.

Take inspiration e.g. from my https://github.com/daniel-j-h/efficientnet

using R(2+1)D features extraction instead of C3D

I am a newbie in machine learning and am working on an implementation of this paper (https://github.com/WaqasSultani/AnomalyDetectionCVPR2018). It uses C3D for the feature extraction phase of its anomaly detection network, and I am trying to use R(2+1)D instead of C3D.
Can you please tell me the overall steps for replacing it?

Thanks so much

Thanks!

Thanks for porting the models and sharing the code in such an easy to use manner. It's great.

about extract features

Thank you for your wonderful work!

Example for running on GPUs via nvidia-docker:

docker run --runtime=nvidia --ipc=host -v $PWD:/data moabitcoin/ig65m-pytorch:latest-gpu \
    extract /data/myvideo.mp4 /data/myfeatures.npy

Can this Docker image extract features for a whole batch of videos in one run? 😥

Provide convenient and efficient feature extraction tool for users

Right now our extract.py tool can extract features for a single video at a time, and parallelizes the data loading across ranges of frames. We should change that:

Extract features for a dataset of videos / directory of videos
Dump features into file keyed by e.g. frame ids
Allow extracting by means of PyTorch hooks for arbitrary activations
Parallelize on video level not range of frames per video
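
The hook-based item could look roughly like the sketch below, which uses PyTorch's `register_forward_hook` to capture the outputs of named submodules during one forward pass. The `capture_activations` helper is hypothetical and not part of extract.py. It assumes each hooked layer returns a single tensor.

```python
import torch
import torch.nn as nn


def capture_activations(model, layer_names, inputs):
    """Run `model` once and return {layer name: output tensor} for the named submodules."""
    modules = dict(model.named_modules())
    captured = {}
    handles = []

    def make_hook(name):
        def hook(module, hook_inputs, output):
            # Detach and move to CPU so stored activations do not hold GPU memory.
            captured[name] = output.detach().cpu()
        return hook

    for name in layer_names:
        handles.append(modules[name].register_forward_hook(make_hook(name)))

    try:
        model.eval()
        with torch.no_grad():
            model(inputs)
    finally:
        # Always remove the hooks so later forward passes are unaffected.
        for handle in handles:
            handle.remove()

    return captured
```

For the models in this repository, the layer names would be entries such as "layer3" or "layer4", assuming the submodules are registered under the attribute names used in the prediction script quoted earlier on this page.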

can I download IG65M dataset?

I tried looking for the IG65M dataset to download and pre-train my model. Do you know if it's public?
Sorry for the naive question!

Script to convert caffe to pytorch

Can you share any tool with which we can convert a Caffe model to PyTorch? Or could you provide the same features for the ip-CSN model?

The models 'r2plus1d_34_clip32_ft_kinetics_from_ig65m' fine-tuned on UCF101 can't get good performance, just 78%, why?

Thanks for your models. I ran into a problem when using the model 'IG-65M+KINETICS+32X112X112'. I fine-tuned it on the UCF101 dataset, but the result is approximately 78%, which is far from the results in the paper. Did you get the same results as the paper's authors? If so, can you share more details about your experiments? Thanks very much!

Update project to recent stable dependencies: pytorch, cuda, pip deps

This project needs an update to the recent stable releases e.g. for

cuda dockerfiles
dependencies in dockerfiles we compile from source
pytorch
dependencies in the requirements lockfile

otherwise nothing should have changed and it should Just Work ™️

cc @sandhawalia aren't you using this at the moment?
