
moments_models's People

Contributors

alexandonian · cabreraalex · jqueguiner · mmonfort · zhoubolei


moments_models's Issues

When can this dataset be downloaded?

Hi Bolei,
It is exciting to see this large video dataset, and I want to try our model on it.
However, I cannot find where to download it. Has this dataset been released, or is it still in preparation?

Suggestion for Readme.md

When trying to follow the instructions in the current Readme.md, I noticed that, while it is not explicitly stated, the user is assumed to have already installed wget, opencv, pytorch, torchvision, and moviepy.

It would be awesome if the Readme could be updated to reflect these software dependencies!
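For reference, something like the following would likely cover the Python dependencies (the pip package names here are my guess at the equivalents; wget would come from the system package manager):

    pip install torch torchvision opencv-python moviepy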

License

Hi,

Thanks for sharing the pre-trained models.
Could you include a license specifying the terms under which your code and models may be used?

I wouldn't be able to use it otherwise.

Thanks,
Raghav

Error running test_model.py

I ran into the following error message:

Traceback (most recent call last):
  File "test_model.py", line 64, in <module>
    model = load_model(modelID, categories)
  File "test_model.py", line 33, in load_model
    state_dict = {str.replace(k, 'module.', ''): v for k, v in checkpoint['state_dict'].items()}
  File "test_model.py", line 33, in <dictcomp>
    state_dict = {str.replace(k, 'module.', ''): v for k, v in checkpoint['state_dict'].items()}
TypeError: descriptor 'replace' requires a 'str' object but received a 'unicode'

My guess is that this is related to Python 2.7 (which I am using) vs. Python 3.6. Do you have a version of the code tested on Python 2.7?
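A minimal sketch of a Python 2/3-compatible workaround (a suggestion, not a confirmed fix from the authors): calling .replace() on the key itself avoids the unbound str.replace, which rejects unicode keys under Python 2.

    # Works for both str (Python 3) and unicode (Python 2) keys.
    state_dict = {k.replace('module.', ''): v
                  for k, v in checkpoint['state_dict'].items()}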

Pre-recorded Streaming video

Since most of our videos are on the web rather than stored locally on the computer, is there a way to use test_video on such media?

resnet3d50 model architecture

Hello and thanks for this amazing code,

Is the resnet3d50 network architecture the same as the I3D model described in your paper released with moments_models here? Or is this strictly a 3D convnet that wasn't documented in the publication linked above?

If this is a different model what sort of performance should we expect compared to I3D?

Thanks again.

Number of classes

Hello,
The original dataset had 339 classes. Why do you have 305?
Best Regards

Data prep scripts for the Moments in Time dataset

Thank you for releasing the pre-trained models. Could you point me to any scripts/repos for the data preparation involved with the Moments in Time dataset? Specifically, scripts to extract frames from the videos, generate the index files for the train, val, and test splits, and set up the train, validation, and category meta files.

Thanks,
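For reference, a minimal frame-extraction sketch (my own, not an official prep script; it assumes ffmpeg is on the PATH and dumps JPEGs at a fixed frame rate):

    import os
    import subprocess

    def extract_frames(video_path, out_dir, fps=5):
        # Dump frames as frame_00001.jpg, frame_00002.jpg, ... via ffmpeg.
        os.makedirs(out_dir, exist_ok=True)
        subprocess.run([
            'ffmpeg', '-i', video_path,
            '-vf', 'fps={}'.format(fps),
            os.path.join(out_dir, 'frame_%05d.jpg'),
        ], check=True)

The index files for the splits would then typically just be text files listing each video's frame directory, frame count, and label.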

2D or 3D

Is the moments_RGB_resnet50_imagenetpretrained.pth.tar model 2D or 3D?

issue in str replace when loading model

Python 2.7:

root@16e6dd7bd439:/torch/moments_models# python test_model.py
Traceback (most recent call last):
  File "test_model.py", line 54, in <module>
    model = load_model(modelID, categories)
  File "test_model.py", line 31, in load_model
    state_dict = {str.replace(k,'module.',''): v for k,v in checkpoint['state_dict'].items()}
  File "test_model.py", line 31, in <dictcomp>
    state_dict = {str.replace(k,'module.',''): v for k,v in checkpoint['state_dict'].items()}
TypeError: descriptor 'replace' requires a 'str' object but received a 'unicode'

Difference in result on TRN-pytorch example

Hi all,

For the example mentioned, I observe a slight difference in the result compared to what is reported on the website.

python test_video.py --arch InceptionV3 --dataset moments --weight pretrain/TRN_moments_RGB_InceptionV3_TRNmultiscale_segment8_best.pth.tar --frame_folder sample_data/bolei_juggling
Multi-Scale Temporal Relation Network Module in use ['8-frame relation', '7-frame relation', '6-frame relation', '5-frame relation', '4-frame relation', '3-frame relation', '2-frame relation']
Freezing BatchNorm2D except the first one.
Loading frames in sample_data/bolei_juggling
RESULT ON sample_data/bolei_juggling
0.994 -> juggling
0.002 -> flipping
0.001 -> spinning
0.001 -> stacking
0.001 -> drumming

Though the difference is minor, I wanted to understand what could cause it, since I see no errors while running the example or installing the packages. I am using Python 3.6 with all dependencies installed.

Looking forward to your response.

Kind Regards,
Ankit Shah

test_model.py update

I got the following message when running test_model.py, although it still returned a list of top actions:

"test_model.py:75: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead."

Should I go ahead and make a pull request with the suggested change?
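For context, this is the change the warning points at (a sketch under the assumption that test_model.py currently wraps the input in a volatile Variable, as pre-0.4 PyTorch code did):

    # Old (PyTorch < 0.4):
    #     input_var = Variable(input, volatile=True)
    #     logits = model(input_var)
    # New (PyTorch >= 0.4): disable autograd with a context manager instead.
    with torch.no_grad():
        logits = model(input)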

test_video.py - AttributeError: 'NoneType' object has no attribute 'groups'

Hi,

I am trying to run inference against a test video /test_1.mp4 and I am getting the following error. Any ideas? I am running on Windows with Anaconda, Python 3.6, PyTorch 0.3.0, etc. test_model.py works, but the video script does not.

(momentsintime) C:\Users\Pablo\moments_models>python test_video.py --video_file /test_1.mp4
Extracting frames using ffmpeg...
Traceback (most recent call last):
  File "test_video.py", line 120, in <module>
    frames = extract_frames(args.video_file, args.num_segments)
  File "test_video.py", line 50, in extract_frames
    duration = re_duration.search(str(output[1])).groups()[0]
AttributeError: 'NoneType' object has no attribute 'groups'
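The error means the duration regex found no match in ffmpeg's output, so search() returned None. A defensive sketch of the failing line (a suggestion only; the surrounding names come from the traceback):

    match = re_duration.search(str(output[1]))
    if match is None:
        # ffmpeg's stderr had no parsable 'Duration:' line, which usually
        # means the video path is wrong or ffmpeg is not installed.
        raise RuntimeError('Could not parse video duration from ffmpeg output; '
                           'check the --video_file path and the ffmpeg install.')
    duration = match.groups()[0]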

The mAP for res3d50 model is much lower than 0.6

I used the test_video.py file to load the pretrained multi_moments_resnet3d50_wlsep.pth.tar weights and tested on the validation set of the Multi-Moments in Time dataset. However, the mAP reported by torchnet.meter is 0.3948, which is noticeably lower than the 0.6 reported in the paper.

In the test script, I set the number of frames to 16 and use the decord library to extract frames from each video. Before feeding the data to the model, I apply basic preprocessing such as resizing and normalization, and afterwards I use torchnet's mAP and AP meters to compute the score. However, the result is frustrating: I get roughly 0.4 instead of roughly 0.6.

So, is there anything I'm missing?

"""Test pre-trained RGB model on a single video.

Date: 01/15/18
Authors: Bolei Zhou and Alex Andonian

This script accepts an mp4 video as the command line argument --video_file
and averages ResNet50 (trained on Moments) predictions on num_segment equally
spaced frames (extracted using ffmpeg).

Alternatively, one may instead provide the path to a directory containing
video frames saved as jpgs, which are sorted and forwarded through the model.

ResNet50 trained on Moments is used to predict the action for each frame,
and these class probabilities are averaged to produce a video-level prediction.

Optionally, one can generate a new video --rendered_output from the frames
used to make the prediction with the predicted category in the top-left corner.

"""

import os
import argparse
#import moviepy.editor as mpy

import torch.optim
import torch.nn.parallel
from torch.nn import functional as F

import models
from utils import extract_frames, load_frames, render_frames
from torch.utils.data import DataLoader, Dataset

from decord import VideoReader
from decord import cpu, gpu
import numpy as np
import PIL
from PIL import Image
import torch
import torchnet.meter as meter


# options
parser = argparse.ArgumentParser(description="test TRN on a single video")
group = parser.add_mutually_exclusive_group(required=True)
group.add_argument('--video_file', type=str, default=None)
group.add_argument('--frame_folder', type=str, default=None)
parser.add_argument('--rendered_output', type=str, default=None)
parser.add_argument('--num_segments', type=int, default=16)
parser.add_argument('--arch', type=str, default='resnet3d50', choices=['resnet50', 'resnet3d50'])
args = parser.parse_args()

# Load model
model = models.load_model(args.arch)

# Get dataset categories
categories = models.load_categories()

# Load the video frame transform
transform = models.load_transform()

# Obtain video frames
"""
if args.frame_folder is not None:
    print('Loading frames in {}'.format(args.frame_folder))
    import glob
    # here make sure after sorting the frame paths have the correct temporal order
    frame_paths = sorted(glob.glob(os.path.join(args.frame_folder, '*.jpg')))
    frames = load_frames(frame_paths)
    print("Loading videos in {}".format(args.frame_folder))
else:
    print('Extracting frames using ffmpeg...')
    frames = extract_frames(args.video_file, args.num_segments)
"""
data_path = "/workdir/wwn/Multi_Moments_in_Time/videos/"
annotation_path = "./validationSet.txt"
class TestVideo(Dataset):
    def __init__(self, data_path, annotation_path):
        self.data_path = data_path
        self.records = []
        with open(annotation_path) as f:
            lines = f.readlines()
            for line in lines:
                self.records.append(line.strip())

    def __len__(self):
        return len(self.records)

    def __getitem__(self, index):
        single_record = self.records[index]
        parts = single_record.split(",")
        video_path = parts[0]
        video_path = os.path.join(self.data_path, video_path)
        vr = VideoReader(video_path, ctx=cpu(0))
        # Sample 16 frames uniformly spaced across the video.
        gap = int(len(vr) / 16)  # TODO change this to 5 (5 fps) to see results
        index_list = [i * gap for i in range(16)]
        frames = vr.get_batch(index_list).asnumpy()
        frames = [Image.fromarray(frames[i]).convert("RGB") for i in range(16)]
        frames = torch.stack([transform(frame) for frame in frames],1)

        # Build a multi-hot target vector over the 313 Multi-Moments classes.
        label = [0] * 313
        target_classes = parts[1:]
        for item in target_classes:
            label[int(item)] = 1
        label = torch.LongTensor(label)
        return {"frames":frames, "label":label}

"""
# Prepare input tensor
if args.arch == 'resnet3d50':
    # [1, num_frames, 3, 224, 224]
    input = torch.stack([transform(frame) for frame in frames], 1).unsqueeze(0)
else:
    # [num_frames, 3, 224, 224]
    input = torch.stack([transform(frame) for frame in frames])
"""
# Dataset and loader for the Multi-Moments validation set.
val_dataset = TestVideo(data_path, annotation_path)
val_loader = DataLoader(val_dataset, batch_size=1, shuffle=False, num_workers=4, pin_memory=True)

device = torch.device("cuda", 0)
model = model.to(device)

mAP = meter.mAPMeter()
mAP.reset()
AP = meter.APMeter()
AP.reset()

# Make video prediction
for index, data in enumerate(val_loader):
    frames = data["frames"].to(device)
    target = data["label"].to(device)
    with torch.no_grad():
        logits = model(frames)
        
        #h_x = F.softmax(logits, 1).mean(dim=0)#TODO check output value range
        #probs, idx = h_x.sort(0, True)
        mAP.add(logits.data, target.data)
        AP.add(logits.data, target.data)
print(mAP.value())
print(AP.value())
np.save("res3d50-ap.npy", AP.value().data.cpu().numpy())

"""
# Output the prediction.
video_name = args.frame_folder if args.frame_folder is not None else args.video_file
print('RESULT ON ' + video_name)
for i in range(0, 5):
    print('{:.3f} -> {}'.format(probs[i], categories[idx[i]]))

# Render output frames with prediction text.
if args.rendered_output is not None:
    prediction = categories[idx[0]]
    rendered_frames = render_frames(frames, prediction)
    clip = mpy.ImageSequenceClip(rendered_frames, fps=4)
    clip.write_videofile(args.rendered_output)
"""

Not an issue, but a question

For the videos, does test_video.py support only local files? In other words, if one wants to analyze videos from the web, does one have to download the videos first, then call "python test_video.py --video_file directory_path"?
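As far as I can tell, yes: test_video.py reads local files, so a web video would need to be downloaded first. A minimal sketch of that workflow (the URL is a placeholder):

    import subprocess
    import urllib.request

    url = 'https://example.com/clip.mp4'  # placeholder; any direct mp4 link
    local_path = 'clip.mp4'
    urllib.request.urlretrieve(url, local_path)
    subprocess.run(['python', 'test_video.py', '--video_file', local_path],
                   check=True)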
