
moments_models's People

Contributors

alexandonian · cabreraalex · jqueguiner · mmonfort · zhoubolei


moments_models's Issues

When can this dataset be downloaded?

Hi Bolei,
It is exciting to see this large video dataset, and I want to try our model on it.
However, I cannot find where to download it. Has this dataset been released, or is it still in preparation?

Suggestion for Readme.md

When trying to follow the instructions in the current Readme.md, I noticed that, while it is not explicitly stated, the user is assumed to have already installed wget, opencv, pytorch, torchvision, and moviepy.

It would be awesome if the Readme could be updated to reflect these software dependencies!
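For reference, something like the following would likely cover the Python dependencies (the pip package names here are my guess at the equivalents; wget would come from the system package manager):

    pip install torch torchvision opencv-python moviepy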

License

Hi,

Thanks for sharing the pre-trained models.
Could you include a license specifying the terms under which your code and models may be used?

I wouldn't be able to use it otherwise.

Thanks,
Raghav

Error running test_model.py

I ran into the following error message:

Traceback (most recent call last):
  File "test_model.py", line 64, in <module>
    model = load_model(modelID, categories)
  File "test_model.py", line 33, in load_model
    state_dict = {str.replace(k, 'module.', ''): v for k, v in checkpoint['state_dict'].items()}
  File "test_model.py", line 33, in <dictcomp>
    state_dict = {str.replace(k, 'module.', ''): v for k, v in checkpoint['state_dict'].items()}
TypeError: descriptor 'replace' requires a 'str' object but received a 'unicode'

My guess is that this is related to Python 2.7 (which I am using) vs. Python 3.6. Do you have a version of the code tested on Python 2.7?
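A minimal sketch of a Python 2/3-compatible workaround (a suggestion, not a confirmed fix from the authors): calling .replace() on the key itself avoids the unbound str.replace, which rejects unicode keys under Python 2.

    # Works for both str (Python 3) and unicode (Python 2) keys.
    state_dict = {k.replace('module.', ''): v
                  for k, v in checkpoint['state_dict'].items()}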

Pre-recorded Streaming video

Since most of our videos are on the web rather than stored locally on the computer, is there a way to use test_video on such media?

resnet3d50 model architecture

Hello and thanks for this amazing code,

Is the resnet3d50 network architecture the same as the I3D model described in your paper released with moments_models here? Or is this strictly a 3D convnet that wasn't documented in the publication linked above?

If this is a different model what sort of performance should we expect compared to I3D?

Thanks again.

Number of classes

Hello,
The original dataset had 339 classes. Why do you have 305?
Best Regards

Data prep scripts for the Moments in Time dataset

Thank you for releasing the pre-trained models. Could you point me to any scripts/repos for the data preparation involved with the Moments in Time dataset? Specifically, scripts to extract frames from the videos, generate the index files for the train, val, and test splits, and set up the train, validation, and category meta files.

Thanks,
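For reference, a minimal frame-extraction sketch (my own, not an official prep script; it assumes ffmpeg is on the PATH and dumps JPEGs at a fixed frame rate):

    import os
    import subprocess

    def extract_frames(video_path, out_dir, fps=5):
        # Dump frames as frame_00001.jpg, frame_00002.jpg, ... via ffmpeg.
        os.makedirs(out_dir, exist_ok=True)
        subprocess.run([
            'ffmpeg', '-i', video_path,
            '-vf', 'fps={}'.format(fps),
            os.path.join(out_dir, 'frame_%05d.jpg'),
        ], check=True)

The index files for the splits would then typically just be text files listing each video's frame directory, frame count, and label.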

2D or 3D

Is the moments_RGB_resnet50_imagenetpretrained.pth.tar model 2D or 3D?

issue in str replace when loading model

Python 2.7:

root@16e6dd7bd439:/torch/moments_models# python test_model.py
Traceback (most recent call last):
  File "test_model.py", line 54, in <module>
    model = load_model(modelID, categories)
  File "test_model.py", line 31, in load_model
    state_dict = {str.replace(k,'module.',''): v for k,v in checkpoint['state_dict'].items()}
  File "test_model.py", line 31, in <dictcomp>
    state_dict = {str.replace(k,'module.',''): v for k,v in checkpoint['state_dict'].items()}
TypeError: descriptor 'replace' requires a 'str' object but received a 'unicode'

Difference in result on TRN-pytorch example

Hi all,

For the example mentioned, I observe a slight difference in the result compared to what is reported on the website.

python test_video.py --arch InceptionV3 --dataset moments --weight pretrain/TRN_moments_RGB_InceptionV3_TRNmultiscale_segment8_best.pth.tar --frame_folder sample_data/bolei_juggling
Multi-Scale Temporal Relation Network Module in use ['8-frame relation', '7-frame relation', '6-frame relation', '5-frame relation', '4-frame relation', '3-frame relation', '2-frame relation']
Freezing BatchNorm2D except the first one.
Loading frames in sample_data/bolei_juggling
RESULT ON sample_data/bolei_juggling
0.994 -> juggling
0.002 -> flipping
0.001 -> spinning
0.001 -> stacking
0.001 -> drumming

Though the difference is minor, I wanted to understand what could cause it, since I see no errors while running the example or installing the packages. I am using Python 3.6 with all dependencies installed.

Looking forward to your response.

Kind Regards,
Ankit Shah

test_model.py update

I got the following message when running test_model.py, although it still returned a list of top actions:

"test_model.py:75: UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead."

Should I go ahead and make a pull request with the suggested change?
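For context, this is the change the warning points at (a sketch under the assumption that test_model.py currently wraps the input in a volatile Variable, as pre-0.4 PyTorch code did):

    # Old (PyTorch < 0.4):
    #     input_var = Variable(input, volatile=True)
    #     logits = model(input_var)
    # New (PyTorch >= 0.4): disable autograd with a context manager instead.
    with torch.no_grad():
        logits = model(input)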

test_video.py - AttributeError: 'NoneType' object has no attribute 'groups'

Hi,

I am trying to run inference against a test video /test_1.mp4 and I am getting the following error. Any ideas? I am running on Windows with Anaconda, Python 3.6, PyTorch 0.3.0, etc. test_model.py works, but the video script does not.

(momentsintime) C:\Users\Pablo\moments_models>python test_video.py --video_file /test_1.mp4
Extracting frames using ffmpeg...
Traceback (most recent call last):
  File "test_video.py", line 120, in <module>
    frames = extract_frames(args.video_file, args.num_segments)
  File "test_video.py", line 50, in extract_frames
    duration = re_duration.search(str(output[1])).groups()[0]
AttributeError: 'NoneType' object has no attribute 'groups'
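The error means the duration regex found no match in ffmpeg's output, so search() returned None. A defensive sketch of the failing line (a suggestion only; the surrounding names come from the traceback):

    match = re_duration.search(str(output[1]))
    if match is None:
        # ffmpeg's stderr had no parsable 'Duration:' line, which usually
        # means the video path is wrong or ffmpeg is not installed.
        raise RuntimeError('Could not parse video duration from ffmpeg output; '
                           'check the --video_file path and the ffmpeg install.')
    duration = match.groups()[0]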

The mAP for res3d50 model is much lower than 0.6

I used the test_video.py file to load the pretrained multi_moments_resnet3d50_wlsep.pth.tar weights and tested on the validation set of the Multi-Moments in Time dataset. However, the mAP reported by torchnet.meter is 0.3948, which is noticeably lower than the 0.6 reported in the paper.

In the test script, I set the number of frames to 16 and use the decord library to extract frames from each video. Before feeding the data to the model, I apply basic preprocessing such as resizing and normalization, and afterwards I use torchnet's mAP and AP meters to compute the score. However, the result is frustrating: I get roughly 0.4 instead of roughly 0.6.

So, is there anything I'm missing?

"""Test pre-trained RGB model on a single video.

Date: 01/15/18
Authors: Bolei Zhou and Alex Andonian

This script accepts an mp4 video as the command line argument --video_file
and averages ResNet50 (trained on Moments) predictions on num_segment equally
spaced frames (extracted using ffmpeg).

Alternatively, one may instead provide the path to a directory containing
video frames saved as jpgs, which are sorted and forwarded through the model.

ResNet50 trained on Moments is used to predict the action for each frame,
and these class probabilities are averaged to produce a video-level prediction.

Optionally, one can generate a new video --rendered_output from the frames
used to make the prediction with the predicted category in the top-left corner.

"""

import os
import argparse
#import moviepy.editor as mpy

import torch.optim
import torch.nn.parallel
from torch.nn import functional as F

import models
from utils import extract_frames, load_frames, render_frames
from torch.utils.data import DataLoader, Dataset

from decord import VideoReader
from decord import cpu, gpu
import numpy as np
import PIL
from PIL import Image
import torch
import torchnet.meter as meter


# options
parser = argparse.ArgumentParser(description="test TRN on a single video")
group = parser.add_mutually_exclusive_group(required=True)
group.add_argument('--video_file', type=str, default=None)
group.add_argument('--frame_folder', type=str, default=None)
parser.add_argument('--rendered_output', type=str, default=None)
parser.add_argument('--num_segments', type=int, default=16)
parser.add_argument('--arch', type=str, default='resnet3d50', choices=['resnet50', 'resnet3d50'])
args = parser.parse_args()

# Load model
model = models.load_model(args.arch)

# Get dataset categories
categories = models.load_categories()

# Load the video frame transform
transform = models.load_transform()

# Obtain video frames
"""
if args.frame_folder is not None:
    print('Loading frames in {}'.format(args.frame_folder))
    import glob
    # here make sure after sorting the frame paths have the correct temporal order
    frame_paths = sorted(glob.glob(os.path.join(args.frame_folder, '*.jpg')))
    frames = load_frames(frame_paths)
    print("Loading videos in {}".format(args.frame_folder))
else:
    print('Extracting frames using ffmpeg...')
    frames = extract_frames(args.video_file, args.num_segments)
"""
data_path = "/workdir/wwn/Multi_Moments_in_Time/videos/"
annotation_path = "./validationSet.txt"
class TestVideo(Dataset):
    def __init__(self, data_path, annotation_path):
        self.data_path = data_path
        self.records = []
        with open(annotation_path) as f:
            lines = f.readlines()
            for line in lines:
                self.records.append(line.strip())

    def __len__(self):
        return len(self.records)

    def __getitem__(self, index):
        single_record = self.records[index]
        parts = single_record.split(",")
        video_path = parts[0]
        video_path = os.path.join(self.data_path, video_path)
        vr = VideoReader(video_path, ctx=cpu(0))
        # Sample 16 frames uniformly spaced across the video.
        gap = int(len(vr) / 16)  # TODO change this to 5 (5 fps) to see results
        index_list = [i * gap for i in range(16)]
        frames = vr.get_batch(index_list).asnumpy()
        frames = [Image.fromarray(frames[i]).convert("RGB") for i in range(16)]
        frames = torch.stack([transform(frame) for frame in frames],1)

        # Build a multi-hot target vector over the 313 Multi-Moments classes.
        label = [0] * 313
        target_classes = parts[1:]
        for item in target_classes:
            label[int(item)] = 1
        label = torch.LongTensor(label)
        return {"frames":frames, "label":label}

"""
# Prepare input tensor
if args.arch == 'resnet3d50':
    # [1, num_frames, 3, 224, 224]
    input = torch.stack([transform(frame) for frame in frames], 1).unsqueeze(0)
else:
    # [num_frames, 3, 224, 224]
    input = torch.stack([transform(frame) for frame in frames])
"""
# Dataset and loader for the Multi-Moments validation set.
val_dataset = TestVideo(data_path, annotation_path)
val_loader = DataLoader(val_dataset, batch_size=1, shuffle=False, num_workers=4, pin_memory=True)

device = torch.device("cuda", 0)
model = model.to(device)

mAP = meter.mAPMeter()
mAP.reset()
AP = meter.APMeter()
AP.reset()

# Make video prediction
for index, data in enumerate(val_loader):
    frames = data["frames"].to(device)
    target = data["label"].to(device)
    with torch.no_grad():
        logits = model(frames)
        
        #h_x = F.softmax(logits, 1).mean(dim=0)#TODO check output value range
        #probs, idx = h_x.sort(0, True)
        mAP.add(logits.data, target.data)
        AP.add(logits.data, target.data)
print(mAP.value())
print(AP.value())
np.save("res3d50-ap.npy", AP.value().data.cpu().numpy())

"""
# Output the prediction.
video_name = args.frame_folder if args.frame_folder is not None else args.video_file
print('RESULT ON ' + video_name)
for i in range(0, 5):
    print('{:.3f} -> {}'.format(probs[i], categories[idx[i]]))

# Render output frames with prediction text.
if args.rendered_output is not None:
    prediction = categories[idx[0]]
    rendered_frames = render_frames(frames, prediction)
    clip = mpy.ImageSequenceClip(rendered_frames, fps=4)
    clip.write_videofile(args.rendered_output)
"""

Not an issue, but a question

For the videos, does test_video.py support only local files? In other words, if one wants to analyze videos from the web, does one have to download the videos first, then call "python test_video.py --video_file directory_path"?
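As far as I can tell, yes: test_video.py reads local files, so a web video would need to be downloaded first. A minimal sketch of that workflow (the URL is a placeholder):

    import subprocess
    import urllib.request

    url = 'https://example.com/clip.mp4'  # placeholder; any direct mp4 link
    local_path = 'clip.mp4'
    urllib.request.urlretrieve(url, local_path)
    subprocess.run(['python', 'test_video.py', '--video_file', local_path],
                   check=True)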
