
ig65m-pytorch's People

Contributors

daniel-j-h, hsandhawalia, sandhawalia, yunseokjang


ig65m-pytorch's Issues

Citation information

I was wondering how I should cite this work in a paper. I have used the pretrained weights provided by this repository in my work.

Provide r(2+1)d 152-layer models and weights

Right now we only provide the model and weights for the 34-layer r(2+1)d, which strikes a good balance between model size/runtime and accuracy in practice (sorry, Kaggle folks).

We should make our code modular so that we can also provide e.g. the CSN models or the 152-layer r(2+1)d model (although in practice it is quite heavy for the small bump in accuracy).
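
For illustration, a hedged sketch of what such a modular entry point could look like, built on torchvision's video ResNet building blocks; the depth table is an assumption for illustration, not our current code:

# Hypothetical factory sketch: map a depth to its block configuration, so that
# adding e.g. the 152-layer r(2+1)d becomes a one-line change. Illustrative only.
from torchvision.models.video.resnet import (
    VideoResNet, R2Plus1dStem, BasicBlock, Bottleneck, Conv2Plus1D,
)

def r2plus1d(depth=34, num_classes=400):
    # Standard ResNet layer counts; 152 uses Bottleneck blocks.
    configs = {34: ([3, 4, 6, 3], BasicBlock), 152: ([3, 8, 36, 3], Bottleneck)}
    layers, block = configs[depth]
    return VideoResNet(block=block, conv_makers=[Conv2Plus1D] * 4,
                       layers=layers, stem=R2Plus1dStem, num_classes=num_classes)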

Add Travis CI integration

We should add Travis for CI.

Travis should check, among other things, that:

  • the original weights can still be downloaded / the release URL is reachable (see the sketch below)
  • the tools work as expected (tests)
  • flake8 and black pass
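
As a sketch of the first check, a minimal pytest-style test that verifies the weight URLs still respond; the URL list is a placeholder to be filled with the actual release assets:

# Hedged CI sketch: check that the pretrained-weight release URLs are reachable.
# WEIGHT_URLS is a placeholder; fill in the repository's real release assets.
import urllib.request

WEIGHT_URLS = [
    "https://github.com/moabitcoin/ig65m-pytorch/releases/download/...",  # placeholder
]

def test_weights_reachable():
    for url in WEIGHT_URLS:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=30) as response:
            assert response.status == 200  # urlopen follows redirects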

Other pretrained models?

Dear author,
Thanks for sharing the large-scale pretrained checkpoint; it helps. I wonder whether any lightweight models can be shared, such as TSM, X3D, or R(2+1)D-18? Thank you very much.

Is there any preprocessing applied to the input video clips?

Hi, great work, and it gave me lots of help! However, I still need some help.
I am really not familiar with caffe2 and could not find out whether the caffe2 version of the IG65M model applies any preprocessing to the input video clips.
In my experiment, I simply normalized the pixels to [0, 1], but the performance didn't look very good (about 92% on UCF101 after fine-tuning the IG65M pretrained model on UCF101, and sometimes even worse). So I wonder if we need to apply some specific preprocessing to the video clips, such as subtracting the means or something else?
Thanks for your attention and kind help :)
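
For reference, here is the transform pipeline from the prediction script quoted later on this page (itself borrowed from this repository's extract.py): pixels are scaled to [0, 1] and then normalized with Kinetics channel statistics. Whether the original caffe2 IG65M models used the same statistics is exactly the open question in this issue.

from torchvision.transforms import Compose
from einops.layers.torch import Rearrange
from ig65m.transforms import ToTensor, Resize, Normalize

transform = Compose([
    ToTensor(),                       # uint8 t h w c frames -> float in [0, 1]
    Rearrange("t h w c -> c t h w"),  # channel-first clip layout
    Resize(128),
    Normalize(mean=[0.43216, 0.394666, 0.37645],
              std=[0.22803, 0.22145, 0.216989]),
])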

Need help with prediction code

Hi, I am working on using a pretrained model for video classification, and I'm a beginner. I borrowed code from extract.py in the CLI and other sources. The code below did produce some results, but they seemed incorrect. In addition, for some videos there were indices larger than 400 in max_indices. I would appreciate it if anyone could help with the code!

classes.json is from here.

import sys

import torch
import torch.nn as nn
from torch.utils.data import DataLoader

from torchvision.transforms import Compose

import numpy as np
from einops.layers.torch import Rearrange, Reduce
from tqdm import tqdm

from ig65m.models import r2plus1d_34_32_kinetics
from ig65m.datasets import VideoDataset
from ig65m.transforms import ToTensor, Resize, Normalize
from pathlib import Path
import json

class VideoModel(nn.Module):
    def __init__(self, pool_spatial="mean", pool_temporal="mean"):
        super().__init__()

        self.model = r2plus1d_34_32_kinetics(num_classes=400, pretrained=True, progress=True)

        self.pool_spatial = Reduce("n c t h w -> n c t", reduction=pool_spatial)
        self.pool_temporal = Reduce("n c t -> n c", reduction=pool_temporal)

    def forward(self, x):
        x = self.model.stem(x)
        x = self.model.layer1(x)
        x = self.model.layer2(x)
        x = self.model.layer3(x)
        x = self.model.layer4(x)

        x = self.pool_spatial(x)
        x = self.pool_temporal(x)

        return x

if __name__ == "__main__":
    if torch.cuda.is_available():
        print("🐎 Running on GPU(s)", file=sys.stderr)
        device = torch.device("cuda")
        torch.backends.cudnn.benchmark = True
    else:
        print("🐌 Running on CPU(s)", file=sys.stderr)
        device = torch.device("cpu")

    model = VideoModel(pool_spatial="mean",
                       pool_temporal="mean")
    
    model.eval()

    for params in model.parameters():
        params.requires_grad = False

    model = model.to(device)
    model = nn.DataParallel(model)
    with open('classes.json','r') as load_f:
        load_dict = json.load(load_f)
    class_names = np.array(load_dict)

    transform = Compose([
        ToTensor(),
        Rearrange("t h w c -> c t h w"),
        Resize(128),
        Normalize(mean=[0.43216, 0.394666, 0.37645], std=[0.22803, 0.22145, 0.216989]),
    ])

    dataset = VideoDataset(Path("./Yoga3.mp4"), clip=32, transform=transform)
    loader = DataLoader(dataset, batch_size=1, num_workers=0, shuffle=False, pin_memory=True)

    video_outputs = []

    with torch.no_grad():
        for inputs in tqdm(loader, total=len(dataset) // 1):
            inputs = inputs.to(device)
            outputs = model(inputs)

            video_outputs.append(outputs.cpu().data)


    video_outputs = torch.cat(video_outputs)

    results = {
        'video': "Yoga-3.mp4",
        'clips': []
    }
    
    _, max_indices = video_outputs.max(dim=1)

    for i in range(video_outputs.size(0)):
        clip_results = {}
        clip_results['label'] = class_names[max_indices[i]]
        results['clips'].append(clip_results)
    
    print(results)
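
One plausible cause, offered as a hedged guess: the forward pass in VideoModel returns the 512-channel pooled features from layer4 and never applies the classification head, so the argmax runs over feature channels (hence indices up to 511) rather than over 400 class logits. Assuming the underlying model exposes a .fc head as torchvision's VideoResNet does, the fix would be:

# Hedged fix sketch: apply the classification head so the output holds 400
# class logits instead of 512 pooled feature channels.
def forward(self, x):
    x = self.model.stem(x)
    x = self.model.layer1(x)
    x = self.model.layer2(x)
    x = self.model.layer3(x)
    x = self.model.layer4(x)
    x = self.pool_spatial(x)
    x = self.pool_temporal(x)
    return self.model.fc(x)  # assumes a .fc head, as in torchvision's VideoResNet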

Fine-tuning the pre-trained model on UCF101

If I want to fine-tune the pre-trained model on UCF101, how can I reach the reported performance of 96.8%?
In my setup, fine-tuning only trains layer4 and the fully connected layer, with a learning rate of 0.0001. Am I doing something wrong? The result I got is just 78.5%. Can you help me? Thank you!
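
For concreteness, a sketch of the setup described above, assuming the Kinetics model exposes .layer4 and .fc as torchvision's VideoResNet does; whether full fine-tuning is required to reach 96.8% is the open question:

# Sketch of the partial fine-tuning setup from this issue: freeze everything,
# then unfreeze layer4 and a fresh UCF101 classification head.
import torch
from ig65m.models import r2plus1d_34_32_kinetics

model = r2plus1d_34_32_kinetics(num_classes=400, pretrained=True, progress=True)
model.fc = torch.nn.Linear(model.fc.in_features, 101)  # new UCF101 head

for p in model.parameters():
    p.requires_grad = False
for p in model.layer4.parameters():
    p.requires_grad = True
for p in model.fc.parameters():
    p.requires_grad = True

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)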

r2plus1d_34_8_ig65m with 16-frame input

Hi,
I am new to training video models. I have been reading papers on action recognition that use models like R3D and R(2+1)D with 16-frame inputs. Is there a way to use R(2+1)D-34 with the IG65M pretrained weights and fine-tune it on the Kinetics-400 dataset?
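
One note, hedged: R(2+1)D is fully convolutional in time, so assuming the model ends in adaptive average pooling as torchvision's VideoResNet does, a checkpoint pretrained on 8- or 32-frame clips will accept 16-frame input; how well the weights transfer to that clip length is untested here.

# Sketch: feed a 16-frame clip to the 8-frame IG65M model. Relies on adaptive
# pooling; num_classes must match the pretrained head (487 per the README).
import torch
from ig65m.models import r2plus1d_34_8_ig65m

model = r2plus1d_34_8_ig65m(num_classes=487, pretrained=True, progress=True)
model.eval()

clip = torch.rand(1, 3, 16, 112, 112)  # (batch, channels, frames, height, width)
with torch.no_grad():
    logits = model(clip)
print(logits.shape)  # torch.Size([1, 487])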

Provide convenient fine-tuning tool for users

We should provide a tool where the user can load a pre-trained model, point it at a dataset (e.g. a directory of labeled videos), and have it fine-tune the model automatically.
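
As a sketch of what the tool's interface could look like; every flag and default below is hypothetical, not an existing interface:

# Hypothetical CLI sketch for the proposed fine-tuning tool.
import argparse

parser = argparse.ArgumentParser("ig65m finetune")
parser.add_argument("dataset", help="directory of labeled videos, one subdirectory per class")
parser.add_argument("--model", default="r2plus1d_34_32_ig65m")
parser.add_argument("--epochs", type=int, default=10)
parser.add_argument("--lr", type=float, default=1e-4)
parser.add_argument("--checkpoint", default="finetuned.pth")
args = parser.parse_args()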

Look into PyTorch Hub integration

Make downloading and using the model and weights easier.

Right now users have to copy our model architecture function, download the weights from our GitHub releases, and then load them into the model, all manually.

It should be easy and convenient to get started.
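
A sketch of the desired workflow with torch.hub; the entrypoint name matches the model functions used elsewhere on this page, while the hub registration itself is what this issue proposes:

# Sketch: one torch.hub call downloads both the architecture and the weights.
import torch

model = torch.hub.load("moabitcoin/ig65m-pytorch", "r2plus1d_34_32_kinetics",
                       num_classes=400, pretrained=True)
model.eval()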

Something wrong when running torch.hub.list("moabitcoin/ig65m-pytorch")

Hi, I'm new to PyTorch. When I run torch.hub.list("moabitcoin/ig65m-pytorch") I get:

IncompleteRead: IncompleteRead(0 bytes read)

Then I tried forcing HTTP/1.0 first:

import http.client
http.client.HTTPConnection._http_vsn = 10
http.client.HTTPConnection._http_vsn_str = 'HTTP/1.0'

torch.hub.list("moabitcoin/ig65m-pytorch")

but I got another error: BadZipFile: File is not a zip file.
Can someone help me solve this problem? Thanks!

Let CI build docker images

We should provide pre-built docker images for convenience and ease of use.

Once we have Travis CI (#13), we should let Travis build and push images automatically.

Users should then be able to run something along the lines of

docker run --runtime=nvidia --ipc=host moabitcoin/ig65m extract myvideo.mp4

without having to install our requirements or even clone this repository.

See e.g. https://github.com/mapbox/robosat/blob/master/.travis.yml for how this can be set up.

Thanks!

Thanks for porting the models and sharing the code in such an easy-to-use manner. It's great.

About extracting features

Thank you for your wonderful work!

Example for running on GPUs via nvidia-docker:

docker run --runtime=nvidia --ipc=host -v $PWD:/data moabitcoin/ig65m-pytorch:latest-gpu \
    extract /data/myvideo.mp4 /data/myfeatures.npy

Can this docker image extract features for a batch of videos at a time? 😥

Provide convenient and efficient feature extraction tool for users

Right now our extract.py tool can extract features for a single video at a time, and it parallelizes data loading across ranges of frames. We should change that:

  • Extract features for a dataset of videos / a directory of videos
  • Dump features into a file keyed by e.g. frame ids
  • Allow extraction via PyTorch hooks for arbitrary activations (see the sketch after this list)
  • Parallelize at the video level, not over ranges of frames per video
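
For the hooks bullet, a minimal sketch of capturing an arbitrary intermediate activation with a PyTorch forward hook, here on the Kinetics model used elsewhere on this page:

# Sketch: grab intermediate activations (here layer3) with a forward hook.
import torch
from ig65m.models import r2plus1d_34_32_kinetics

model = r2plus1d_34_32_kinetics(num_classes=400, pretrained=True, progress=True)
model.eval()

activations = {}

def save_activation(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

model.layer3.register_forward_hook(save_activation("layer3"))

with torch.no_grad():
    model(torch.rand(1, 3, 32, 112, 112))  # dummy 32-frame clip

print(activations["layer3"].shape)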

Can I download the IG65M dataset?

I tried looking for the IG65M dataset to download and pre-train my model; do you know if it's public?
Sorry for the naive question!
