
torchvideo's Introduction

torchvideo


A PyTorch library for video-based computer vision tasks. torchvideo provides dataset loaders specialised for video, video frame samplers, and transformations specifically for video.

Get started

Set up an accelerated environment in conda

$ conda env create -f environment.yml -n torchvideo
$ conda activate torchvideo

# The following steps are taken from
# https://docs.fast.ai/performance.html#installation

$ conda uninstall -y --force pillow pil jpeg libtiff
$ pip uninstall -y pillow pil jpeg libtiff
$ conda install -y -c conda-forge libjpeg-turbo
$ CFLAGS="${CFLAGS} -mavx2" pip install --upgrade --no-cache-dir --force-reinstall --no-binary :all: --compile pillow-simd
$ conda install -y jpeg libtiff

NOTE: If the installation of pillow-simd fails, you can try installing GCC from conda-forge and retrying the install:

$ conda install -y gxx_linux-64
$ export CXX=x86_64-conda_cos6-linux-gnu-g++
$ export CC=x86_64-conda_cos6-linux-gnu-gcc
$ CFLAGS="${CFLAGS} -mavx2" pip install --upgrade --no-cache-dir --force-reinstall --no-binary :all: --compile pillow-simd
$ conda install -y jpeg libtiff

If you install any new packages, check that pillow-simd hasn't been overwritten by an alternative pillow install by running:

$ python -c "from PIL import Image; print(Image.PILLOW_VERSION)"

You should see something like

6.0.0.post0

Pillow doesn't release with post suffixes, so if you have post in the version name, it's likely you have pillow-simd installed.

Install torchvideo

$ pip install git+https://github.com/willprice/torchvideo.git@master

Learn how to use torchvideo

Check out the example notebooks; you can launch these on Binder without having to install anything locally!

Acknowledgements

Thanks to the following people and projects

  • yjxiong for his work on TSN and his publicly available PyTorch implementation, from which many of the transforms in this project are derived.
  • dukebw for his excellent lintel FFmpeg video loading library.
  • hypothesis and the team behind it. This has been used heavily in testing the project.


torchvideo's Issues

Don't raise exceptions from samplers

If a video has too few frames, we shouldn't throw an exception, since this kills the process; it should be up to the user what to do: either discard the video or oversample it somehow. Consider how oversampling should work for each of the samplers (one possible policy is sketched below).
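
A minimal sketch of one possible oversampling policy (an assumption for discussion, not anything implemented in the library): loop the frame indices when the video is shorter than the clip.

def oversample_frame_idx(video_length: int, clip_length: int) -> list:
    """Return clip_length frame indices, repeating frames when the video is too short."""
    return [i % video_length for i in range(clip_length)]

# e.g. a 3-frame video sampled for an 8-frame clip:
assert oversample_frame_idx(3, 8) == [0, 1, 2, 0, 1, 2, 0, 1]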

Improve hypothesis sampling to prevent errors

=================================== FAILURES ===================================
___________ TestClipSampler.test_produces_frame_idx_of_given_length ____________
[gw1] linux -- Python 3.7.1 /home/travis/miniconda/envs/torchvideo/bin/python
self = <test_samplers.TestClipSampler object at 0x7fe8e94c1780>
    @given(st.integers(1, 100), st.integers(1, 100), st.integers(1, 5))
>   def test_produces_frame_idx_of_given_length(
        self, video_length, clip_length, step_size
    ):
E   hypothesis.errors.FailedHealthCheck: It looks like your strategy is filtering out a lot of data. Health check found 50 filtered examples but only 9 good ones. This will make your tests much slower, and also will probably distort the data generation quite a lot. You should adapt your strategy to filter less. This can also be caused by a low max_leaves parameter in recursive() calls
E   See https://hypothesis.readthedocs.io/en/latest/healthchecks.html for more information about this. If you want to disable just this health check, add HealthCheck.filter_too_much to the suppress_health_check settings for this test.
tests/unit/samplers/test_samplers.py:50: FailedHealthCheck

https://travis-ci.org/willprice/torchvideo/jobs/476864126
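
If the filtering comes from rejecting clips longer than the video, one hedged sketch of a fix (the composite strategy below is illustrative and the real test signature, a method on TestClipSampler, is simplified here): draw clip_length relative to video_length so that no generated example has to be filtered out.

from hypothesis import given, strategies as st

@st.composite
def video_and_clip_lengths(draw, max_length=100):
    video_length = draw(st.integers(1, max_length))
    clip_length = draw(st.integers(1, video_length))  # clip always fits inside the video
    return video_length, clip_length

@given(video_and_clip_lengths(), st.integers(1, 5))
def test_produces_frame_idx_of_given_length(lengths, step_size):
    video_length, clip_length = lengths
    ...  # sampler assertions go here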

Issues while installing

Hi! The 2 errors below come up when installing from source:

  1. ERROR: Could not find a version that satisfies the requirement gulpio>=540.66 (from torchvideo) (from versions: none)

Add dataset transform tests

VideoFolderDataset wasn't calling the transform on the frames. Ensure that it is called by adding unit tests.

Add Flow JPEG dataset

Support loading frames from datasets stored in one of the following formats:

dataset_root
├─ video_1
│  └─ u
│     ├─ frame_000001.jpg
│     ├─ ...
│     └─ frame_000010.jpg
└─ video_2
   └─ v
      ├─ frame_000001.jpg
      ├─ ...
      └─ frame_000010.jpg
dataset_root
├─ u
│  ├─ frame_u_000001.jpg
│  ├─ frame_v_000001.jpg
│  ├─ ...
│  ├─ frame_u_000010.jpg
│  └─ frame_v_000010.jpg
dataset_root
├─ u
│  ├─ video_1
│  │  ├─ frame_000001.jpg
│  │  ├─ ...
│  │  └─ frame_000010.jpg
│  └─ video_2
│     ├─ frame_000001.jpg
│     ├─ ...
│     └─ frame_000010.jpg
└─ v
   ├─ video_1
   │  ├─ frame_000001.jpg
   │  ├─ ...
   │  └─ frame_000010.jpg
   └─ video_2
      ├─ frame_000001.jpg
      ├─ ...
      └─ frame_000010.jpg

Some users use x and y instead of u and v, so we should support that too.

The dataset should yield a List[PIL.Image] containing u frames at even indices and v frames at odd indices.

The samplers should still work with this dataset like others, i.e. the frame indices used for loading RGB images should load the corresponding flow images with this dataset.
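
As a minimal sketch of the interleaved layout described above (interleave_flow_frames is illustrative, not an existing torchvideo function):

from typing import List
from PIL import Image

def interleave_flow_frames(u_frames: List[Image.Image],
                           v_frames: List[Image.Image]) -> List[Image.Image]:
    """Return [u_0, v_0, u_1, v_1, ...]: u frames at even indices, v frames at odd indices."""
    frames = []
    for u, v in zip(u_frames, v_frames):
        frames.extend((u, v))
    return frames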

Error applying NormalizeVideo transform to ImageFolderVideoDataset

An AttributeError is raised when trying to apply the NormalizeVideo transform. A notebook showing minimal reproducible code and output is here: https://gist.github.com/gngdb/3101d6078f711b8d2e3fc2f77da33390

Or, as a script:

import tempfile
import os
from PIL import Image
import numpy as np
from torchvideo import datasets, transforms
from pathlib import Path

def generate_image():
    x = np.uint8(np.random.rand(112, 112, 3)*255.)
    return Image.fromarray(x)

def generate_data(tempdir):
    for i in range(2):
        clip_dir = os.path.join(tempdir, f"clip{i}")
        os.mkdir(clip_dir)
        for t in range(1,5):
            im = generate_image()
            impath = os.path.join(tempdir, f"clip{i}", f"frame_{t:03d}.jpg")
            im.save(impath)

with tempfile.TemporaryDirectory() as tmpdirname:
    generate_data(tmpdirname)
    mean = [0.43216, 0.394666, 0.37645]
    std = [0.22803, 0.22145, 0.216989]
    transform = transforms.NormalizeVideo(mean, std)
    dataset = datasets.ImageFolderVideoDataset(tmpdirname, "frame_{:03d}.jpg", transform=transform)
    print(dataset[0])

VideoDataset filtering

Support filtering which videos within a folder are part of the dataset (e.g. when the train/test/val splits all have their videos in the same folder), as sketched after the list below.

  • VideoFolderDataset
  • ImageFolderVideoDataset
  • GulpVideoDataset
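
A hypothetical sketch of what such a filtering API could look like (the filter parameter and the hard-coded video IDs are illustrative, not part of the current implementation):

from torchvideo.datasets import VideoFolderDataset

train_video_ids = {"video_1", "video_5", "video_9"}  # e.g. read from a split file

dataset = VideoFolderDataset(
    "dataset_root",
    filter=lambda video_path: video_path.stem in train_video_ids,
)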

Unify video_id API of datasets

Currently we have video_dirs, video_paths, and _video_ids for unique video identification in each dataset. We want to unify this and expose it publicly (probably also make it a requirement on the superclass).

Support passing in GulpDirectory to GulpVideoDataset

The current implementation sucks if you want to define a label set on a GulpDirectory: you have to first instantiate a GulpDirectory, then create your label set on it, then pass that in to the GulpVideoDataset, which then creates its own GulpDirectory.
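
A hypothetical sketch of the proposed API, where the GulpDirectory is constructed once and shared (the gulp_directory parameter is part of the proposal, not the current signature):

from gulpio import GulpDirectory
from torchvideo.datasets import GulpVideoDataset

# Construct the GulpDirectory once...
gulp_dir = GulpDirectory("/path/to/gulped_dataset")
# ...build whatever label set you need from gulp_dir's metadata here, then reuse the
# same object rather than letting the dataset create its own:
dataset = GulpVideoDataset(gulp_directory=gulp_dir)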

Access to labels from dataset object

It'd be nice to get the label for an example by index; however, currently the labels are pulled on the fly. This is fine most of the time, and pre-computing labels for examples up front might be desirable in the action recognition case; however, segmentation masks and the like are quite expensive and large, so pre-computing those might cause issues.

Run examples on travis

Currently the examples are broken because show_video moved package; we should run the examples on Travis to ensure they always work.

Write targeted unit tests for MultiScaleCropVideo

Whilst we basically have a fuzzing test now that ensures the cropped video is of the correct length, width, and height, we're lacking tests that actually ensure the settings do what they claim to.

  1. This transform might be too large, i.e. doing too many things; perhaps we need to split it up?
  2. This transform has quite a lot of random behaviour and there's no clean separation between sampling the random data and using it. If we can separate them, we can check that the randomly generated data is within bounds and also test that the function does what we want through a deterministic interface (see the sketch after this list).
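
A hedged sketch of the separation suggested in point 2, loosely following the torchvision convention of a static get_params method (the class below is illustrative, not the existing MultiScaleCropVideo): all randomness is drawn in one place, and the crop itself is deterministic given those parameters.

import random

class MultiScaleCropSketch:
    def __init__(self, scales=(1.0, 0.875, 0.75, 0.66)):
        self.scales = scales

    @staticmethod
    def get_params(frame_size, scales):
        """Sample a crop scale and offset; trivial to test for in-bounds values."""
        width, height = frame_size
        scale = random.choice(scales)
        crop_w, crop_h = int(width * scale), int(height * scale)
        x = random.randint(0, width - crop_w)
        y = random.randint(0, height - crop_h)
        return x, y, crop_w, crop_h

    def __call__(self, frames):
        # Deterministic given the sampled parameters, so it can be unit tested directly.
        x, y, w, h = self.get_params(frames[0].size, self.scales)
        return [frame.crop((x, y, x + w, y + h)) for frame in frames]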

Determine flow strategy

This issue covers the broad strategy for supporting optical flow in the library. We can split the concerns that need to be considered into two broad categories: data loading and transforms.

Data loading

Optical flow for action recognition is typically stored as separate u/v grayscale JPEGs. This means we have to support loading data in this format, but we should also step back and consider whether this is optimal; if not, we should propose alternative data storage strategies and support those too.

Data storage

An optical flow F describes the apparent motion between two frames I_t and I_{t+1} in terms of pixels. F[x, y] is a 2-element real motion vector. Typically F is split into 2 separate components: motion in the x (or u) direction, and motion in the y (or v) direction. These are then quantised to the range [0, 255] for storage as greyscale JPEGs, as storing these fields uncompressed is prohibitive. Using image formats for storing the flow fields is definitely the way to go.

CNNs typically take in stacked (u, v) flow frames, so you always want the u, v frame pair together. Most people store these frames as separate files on disk, which is suboptimal: since we know that whenever we load one we will also want the other, we should aggregate the files into a single file in some way.

Regardless of the optimal format, we have to support storing u, v frames as separate files as this is how most people have their data and we don't want to force them to restructure it.

When using GulpIO, we have previously stored flow in a [u_0, v_0, u_1, v_1, ...] format.

Transformations

Some of the transforms are different for optical flow frames than RGB. Cropping is the same for optical flow frames as RGB, but horizontal flipping isn't.

For a (u, v) flow frame pair with values ranging from 0 to 255, each of size W x H and indexed as [x, y], the horizontal flip is:

  • v_flip[x, y] = v[W - x, y]
  • u_flip[x, y] = 255 - u[W - x, y]

The reason that the u flow frame is 'negated' is that flipping the image mirrors the horizontal axis, so to preserve consistency the horizontal component of the flow vector also needs its sign flipped.
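
A minimal numpy sketch of the flip rule above, assuming u and v are uint8 arrays of shape (H, W) quantised to [0, 255] with 128 representing zero motion:

import numpy as np

def hflip_flow(u: np.ndarray, v: np.ndarray):
    """Horizontally flip a (u, v) flow frame pair, negating the horizontal component."""
    u_flipped = 255 - u[:, ::-1]  # mirror the columns and negate horizontal motion
    v_flipped = v[:, ::-1]        # mirror the columns only; vertical motion is unchanged
    return u_flipped, v_flipped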

Whilst this is only a single transform, it does highlight the need to despatch different transform variants based on modality, which may also crop up in future with new transforms that treat the u/v flow components differently.

Flow video representation

RGB videos are represented as List[PIL.Image]. To maintain maximum compatibility with the RGB transforms, we should probably use the same format for flow videos. These videos will be approximately twice as long as their RGB counterparts as we will be storing alternating u, v frames: u frames at even indices and v frames at odd indices, following the original proposal in 2SCNN (note they use 1-based indexing, so they store u frames at odd indices and v frames at even indices).

Concrete actions

  • Implement data loading from separate u, v flow frames (#12)
  • Implement data loading from flow stored in [u_0, v_0, u_1, v_1, ...] format in GulpIO. (#30)
  • Implement horizontal flipping for flow (#31)

Add IdentityTransform

An IdentityTransform would be useful: using lambda xs: xs shows up with a useless repr when printing out a Compose-d transform pipeline, whereas an IdentityTransform lets the user know that no transform is being applied.
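
A minimal sketch of what such a transform could look like (illustrative, not an existing class):

class IdentityTransform:
    """Returns its input unchanged, but shows up with a meaningful repr in Compose pipelines."""

    def __call__(self, frames):
        return frames

    def __repr__(self):
        return self.__class__.__name__ + "()"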

_get_videofile_frame_count will get N/A in some cases

https://github.com/torchvideo/torchvideo/blob/master/src/torchvideo/internal/readers.py#L71

According to https://stackoverflow.com/questions/2017843/fetch-frame-count-with-ffmpeg, not all formats (such as Matroska) report the number of frames, resulting in an output of N/A.

My video information:

ffprobe version 4.0 Copyright (c) 2007-2018 the FFmpeg developers
  built with gcc 7.2.0 (crosstool-NG fa8859cb)
  configuration: --prefix=/opt/conda --cc=/opt/conda/conda-bld/ffmpeg_1531088893642/_build_env/bin/x86_64-conda_cos6-linux-gnu-cc --disable-doc --enable-shared --enable-static --enable-zlib --enable-pic --enable-gpl --enable-version3 --disable-nonfree --enable-hardcoded-tables --enable-avresample --enable-libfreetype --disable-openssl --disable-gnutls --enable-libvpx --enable-pthreads --enable-libopus --enable-postproc --disable-libx264
  libavutil      56. 14.100 / 56. 14.100
  libavcodec     58. 18.100 / 58. 18.100
  libavformat    58. 12.100 / 58. 12.100
  libavdevice    58.  3.100 / 58.  3.100
  libavfilter     7. 16.100 /  7. 16.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  1.100 /  5.  1.100
  libswresample   3.  1.100 /  3.  1.100
  libpostproc    55.  1.100 / 55.  1.100
Input #0, flv, from 'xxx.flv':
  Metadata:
    metadatacreator : Agora.io SDK
    encoder         : Agora.io Encoder
  Duration: 00:07:31.22, start: 0.000000, bitrate: 1335 kb/s
    Stream #0:0: Audio: aac (LC), 44100 Hz, mono, fltp
    Stream #0:1: Video: h264 (High), yuv420p(progressive), 480x864, 20 fps, 20 tbr, 1k tbn, 40 tbc
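
One hedged workaround (an assumption about how this could be handled, not what readers.py currently does): ask ffprobe to decode the stream and count frames explicitly when the container doesn't report them. It is slower, but it avoids the N/A result described above.

import subprocess

def count_frames(video_path: str) -> int:
    result = subprocess.run(
        [
            "ffprobe", "-v", "error",
            "-count_frames",                      # decode to count frames rather than trusting metadata
            "-select_streams", "v:0",
            "-show_entries", "stream=nb_read_frames",
            "-of", "default=nokey=1:noprint_wrappers=1",
            video_path,
        ],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())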

Add tools for dealing with file lists

File lists were originally part of Caffe and were subsequently used by TSN, TRN, and TSM. Since they're quite prevalent, we should provide tools for extracting the number of frames and the label of examples.
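
A hedged sketch assuming the common TSN-style format of one example per line, "<frame_dir> <num_frames> <label>" (the exact format varies between codebases):

from typing import List, NamedTuple

class FileListEntry(NamedTuple):
    path: str
    num_frames: int
    label: int

def read_file_list(file_list_path: str) -> List[FileListEntry]:
    entries = []
    with open(file_list_path) as f:
        for line in f:
            path, num_frames, label = line.strip().rsplit(" ", 2)
            entries.append(FileListEntry(path, int(num_frames), int(label)))
    return entries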

Optimise travis build

Travis takes ages to build... maybe try caching the conda env, as creating it takes ~5 mins (although caching might take just as long).
