moabitcoin / ig65m-pytorch
PyTorch 3D video classification models pre-trained on 65 million Instagram videos
License: MIT License
Blocked by official weights not being released so far:
I was wondering how I should cite this work in a paper. I have used the pretrained weights provided by this repository in my work.
@sandhawalia looks like one of the ONNX pb files is broken:
and the filenames do not match the syntax for torch hub:
https://github.com/pytorch/pytorch/blob/v1.2.0/torch/hub.py#L53-L54
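The naming rule behind this report can be checked with a short script. This is a minimal sketch, assuming the linked lines of torch/hub.py define the hash pattern as `-([a-f0-9]*)\.`; the helper name `hash_prefix_from_filename` is hypothetical and not part of torch.hub.

```python
import re

# Assumption: mirrors the hash pattern used by torch.hub around v1.2.0.
HASH_REGEX = re.compile(r"-([a-f0-9]*)\.")


def hash_prefix_from_filename(filename):
    """Return the hex prefix embedded in the filename, or None if absent."""
    match = HASH_REGEX.search(filename)
    return match.group(1) if match else None


# Underscore as the last delimiter: no hash prefix is found.
print(hash_prefix_from_filename("_ig65m_0aa0550b.pth"))
# Hyphen as the last delimiter: the prefix is picked up.
print(hash_prefix_from_filename("_ig65m-0aa0550b.pth"))
```

Under that assumption, only the hyphenated name yields a prefix that can be compared against the checksum of the downloaded file.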
Tasks
.._ig65m_0aa0550b.pth to _ig65m-0aa0550b.pth (last delimiter has to be -)
Right now we only provide the model and weights for the 34-layer r(2+1)d, which strikes a perfect balance between model size/runtime and accuracy in practice (sorry Kaggle folks).
We should make our code modular so that we can provide e.g. the CSN models or the 152-layer r(2+1)d model (although in practice it is quite heavy for the tiny bump in accuracy).
The torchvision r(2+1)d architecture needs two modifications to get it in sync with the official Caffe2 implementation (see facebookresearch/VMZ#89) and our provided code:
We should upstream both modifications to torchvision.
We should add Travis for CI.
Travis should check (amongst others)
Dear author:
Thanks for sharing the large-scale pretrained checkpoint. It helps. I wonder whether any lightweight models can be shared, such as TSM, X3D, or R(2+1)D-18? Thank you very much.
Hi, great work, and it gave me lots of help! However, I still need some help.
I am really not familiar with Caffe2 and could not find out whether the Caffe2 version of the IG65M model applies any pre-processing to the input video clips.
In my experiment, I simply normalized the pixels to [0,1], but the performance didn't look very good (about 92% on UCF101 with the IG65M pretrained model; I did some fine-tuning on UCF101, otherwise the performance was even worse). So I wonder whether we need to do some specific pre-processing on the video clips, like subtracting the means or something else?
Thanks for your attention and kind help :)
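For reference, mean/std standardization on top of [0,1] scaling can be sketched as follows. The per-channel values are the ones passed to Normalize(...) in the script quoted in another question on this page. Whether this matches the pre-processing of the original Caffe2 pipeline is an assumption, not something confirmed here, and `normalize_pixel` is a hypothetical helper written only for illustration.

```python
# Per-channel statistics copied from the Normalize(...) call quoted on this page.
MEAN = (0.43216, 0.394666, 0.37645)
STD = (0.22803, 0.22145, 0.216989)


def normalize_pixel(rgb_uint8, mean=MEAN, std=STD):
    """Scale one 8-bit RGB pixel to [0, 1], then standardize each channel."""
    return tuple(
        (value / 255.0 - m) / s for value, m, s in zip(rgb_uint8, mean, std)
    )


# A black and a white pixel after scaling and standardization.
print(normalize_pixel((0, 0, 0)))
print(normalize_pixel((255, 255, 255)))
```

The point of the sketch is that scaling to [0,1] alone leaves the inputs uncentered, while the standardized values are spread around zero.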
Hi, I am working on using a pretrained model to do video classification and I'm a beginner. I borrowed code from extract.py in cli and from other sources. The following code did produce some results, but they did not seem correct. In addition, for some videos, there were indices larger than 400 in max_indices. I would appreciate it if anyone could help with the code!
classes.json is from here.
import sys
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torch.autograd import Variable
from torchvision.transforms import Compose
import numpy as np
from einops.layers.torch import Rearrange, Reduce
from tqdm import tqdm
from ig65m.models import r2plus1d_34_32_kinetics
from ig65m.datasets import VideoDataset
from ig65m.transforms import ToTensor, Resize, Normalize
from pathlib import Path
import json


class VideoModel(nn.Module):
    def __init__(self, pool_spatial="mean", pool_temporal="mean"):
        super().__init__()
        self.model = r2plus1d_34_32_kinetics(num_classes=400, pretrained=True, progress=True)
        self.pool_spatial = Reduce("n c t h w -> n c t", reduction=pool_spatial)
        self.pool_temporal = Reduce("n c t -> n c", reduction=pool_temporal)

    def forward(self, x):
        x = self.model.stem(x)
        x = self.model.layer1(x)
        x = self.model.layer2(x)
        x = self.model.layer3(x)
        x = self.model.layer4(x)
        x = self.pool_spatial(x)
        x = self.pool_temporal(x)
        return x


if __name__ == "__main__":
    if torch.cuda.is_available():
        print("🐎 Running on GPU(s)", file=sys.stderr)
        device = torch.device("cuda")
        torch.backends.cudnn.benchmark = True
    else:
        print("🐌 Running on CPU(s)", file=sys.stderr)
        device = torch.device("cpu")

    model = VideoModel(pool_spatial="mean",
                       pool_temporal="mean")
    model.eval()
    for params in model.parameters():
        params.requires_grad = False
    model = model.to(device)
    model = nn.DataParallel(model)

    with open('classes.json', 'r') as load_f:
        load_dict = json.load(load_f)
    class_names = np.array(load_dict)

    transform = Compose([
        ToTensor(),
        Rearrange("t h w c -> c t h w"),
        Resize(128),
        Normalize(mean=[0.43216, 0.394666, 0.37645], std=[0.22803, 0.22145, 0.216989]),
    ])
    dataset = VideoDataset(Path("./Yoga3.mp4"), clip=32, transform=transform)
    loader = DataLoader(dataset, batch_size=1, num_workers=0, shuffle=False, pin_memory=True)

    video_outputs = []
    with torch.no_grad():
        for inputs in tqdm(loader, total=len(dataset) // 1):
            inputs = inputs.to(device)
            outputs = model(inputs)
            video_outputs.append(outputs.cpu().data)
    video_outputs = torch.cat(video_outputs)

    results = {
        'video': "Yoga-3.mp4",
        'clips': []
    }
    _, max_indices = video_outputs.max(dim=1)
    for i in range(video_outputs.size(0)):
        clip_results = {}
        clip_results['label'] = class_names[max_indices[i]]
        results['clips'].append(clip_results)
    print(results)
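One possible explanation, offered as an assumption and not confirmed in the thread: the forward pass above stops after layer4 and the pooling steps, so it never applies the final classification layer. Each clip then yields a pooled feature vector (512 channels, if the model follows torchvision's VideoResNet layout) and not 400 class scores, which would allow argmax indices above 399. The sketch below shows a length check that would surface such a mismatch early; `top_label` and the stand-in data are hypothetical.

```python
def top_label(scores, class_names):
    """Return the best-scoring class name, refusing mismatched lengths."""
    if len(scores) != len(class_names):
        raise ValueError(
            f"got {len(scores)} scores for {len(class_names)} classes; "
            "is the classification layer being skipped?"
        )
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return class_names[best_index]


# Stand-in for the 400 names loaded from classes.json.
class_names = [f"class_{i}" for i in range(400)]

# Hypothetical pooled feature vector with 512 entries instead of 400 scores.
features = [0.0] * 512
features[450] = 1.0

try:
    top_label(features, class_names)
except ValueError as error:
    print(error)
```

If that is the cause, running the full model (including its final layer) instead of the truncated forward pass should give 400 scores per clip.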
If I want to fine-tune the pre-trained model on UCF101, how can I get the performance of 96.8%?
In my setup, fine-tuning was only applied to layer4 and the fully connected layer, with a learning rate of 0.0001. Am I doing something wrong? The result I got is just 78.5%. Can you help me? Thank you!
Similar to #8: provide CSN models and weights.
Hi
I am new to training video models. I have been reading papers on action recognition that use newer models such as R3D and R2plus1D with 16-frame inputs. Is there a way to take R2plus1D_34 with the IG65M pretrained weights and fine-tune it on the Kinetics-400 dataset?
We should provide a tool where the user can load up a pre-trained model, point it at a dataset (e.g. directory of labeled videos), and have it fine-tune the model automatically.
Make downloading and using the model and weights easier.
Right now users have to copy our model architecture function, download the weights from our Github releases, and then load them into the model - all manually.
It should be easy and convenient to get started.
Hi, I'm new to PyTorch, and when I run the code:
torch.hub.list("moabitcoin/ig65m-pytorch")
I get the runtime error: IncompleteRead: IncompleteRead(0 bytes read)
Then I tried running this code:
import http.client
http.client.HTTPConnection._http_vsn = 1
http.client.HTTPConnection._http_vsn_str = 'HTTP/1.0'
torch.hub.list("moabitcoin/ig65m-pytorch")
but I got another runtime error: BadZipFile: File is not a zip file
Can someone help me solve this problem?
Thanks
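A BadZipFile error after an interrupted download often means a truncated archive was left on disk. That is only a guess for this report. The sketch below scans a directory for files ending in .zip that are not valid archives, using only the standard library; `find_broken_zips` is a hypothetical helper, and the directory to scan (for example, whatever cache directory is in use) is left to the reader.

```python
import os
import tempfile
import zipfile


def find_broken_zips(directory):
    """List files ending in .zip under `directory` that are not valid zip archives."""
    broken = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            if name.endswith(".zip") and not zipfile.is_zipfile(path):
                broken.append(path)
    return sorted(broken)


# Demonstration on a throwaway directory with one valid and one truncated file.
demo_dir = tempfile.mkdtemp()
with zipfile.ZipFile(os.path.join(demo_dir, "good.zip"), "w") as archive:
    archive.writestr("hello.txt", "hello")
with open(os.path.join(demo_dir, "bad.zip"), "wb") as handle:
    handle.write(b"partial download")

print(find_broken_zips(demo_dir))
```

Deleting any file reported this way and retrying the download is a reasonable first step before looking at network or proxy settings.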
We validated the ported weights and model only on the subset of Kinetics-400 we had at hand.
We should run over the full Kinetics-400 dataset and verify what the folks claim in:
We should provide pre-built docker images for convenience and ease of use.
Once we have Travis CI (#13) we should let Travis build and push images automatically.
Users then should be able to say something along the lines of
docker run --runtime=nvidia --ipc=host moabitcoin/ig65m extract myvideo.mp4
without having to install our requirements or even cloning this repository.
See e.g. https://github.com/mapbox/robosat/blob/master/.travis.yml for how this can be set up.
The Dockerfile we provide right now is CPU-only.
We used that for porting and toying around with the example.
We should provide an additional GPU Dockerfile as an example and to get folks started.
Take inspiration e.g. from my https://github.com/daniel-j-h/efficientnet
I am a newbie in machine learning and am working on an implementation of this paper (https://github.com/WaqasSultani/AnomalyDetectionCVPR2018). It uses C3D for the feature extraction phase of its anomaly detection network, and I am trying to use R(2+1)D instead of C3D.
Can you please tell me the overall steps for making this replacement?
Thanks so much
Thanks for porting the models and sharing the code in such an easy to use manner. It's great.
Thank you for your wonderful work!
Example for running on GPUs via nvidia-docker:
docker run --runtime=nvidia --ipc=host -v $PWD:/data moabitcoin/ig65m-pytorch:latest-gpu \
extract /data/myvideo.mp4 /data/myfeatures.npy
Could this Docker image extract features for one batch at a time? 😥
Right now our extract.py tool can extract features for a single video at a time, and parallelizes the data loading across ranges of frames. We should change that:
I tried looking for the IG65M dataset to download and pre-train my model. Do you know if it's public?
Sorry for the naive question!
Can you share any tool with which we can convert a Caffe model to PyTorch? Or could you provide the same features for the ip-CSN model?
Thanks for your models. I ran into a problem when using the model 'IG-65M+KINETICS+32X112X112'. I fine-tuned it on the UCF101 dataset, but the result is approximately 78%, which is far from the results in the paper. Did you get the same results as the paper's authors? If so, can you share more details about your experiments? Thanks very much!
This project needs an update to the recent stable releases e.g. for
otherwise nothing should have changed and it should Just Work ™️
cc @sandhawalia aren't you using this at the moment?