
torchvideo's Introduction

torchvideo


A PyTorch library for video-based computer vision tasks. torchvideo provides dataset loaders specialised for video, video frame samplers, and transformations specifically for video.

Get started

Set up an accelerated environment in conda

$ conda env create -f environment.yml -n torchvideo
$ conda activate torchvideo

# The following steps are taken from
# https://docs.fast.ai/performance.html#installation

$ conda uninstall -y --force pillow pil jpeg libtiff
$ pip uninstall -y pillow pil jpeg libtiff
$ conda install -y -c conda-forge libjpeg-turbo
$ CFLAGS="${CFLAGS} -mavx2" pip install --upgrade --no-cache-dir --force-reinstall --no-binary :all: --compile pillow-simd
$ conda install -y jpeg libtiff

NOTE: If the installation of pillow-simd fails, you can try installing GCC from conda-forge and retrying the install:

$ conda install -y gxx_linux-64
$ export CXX=x86_64-conda_cos6-linux-gnu-g++
$ export CC=x86_64-conda_cos6-linux-gnu-gcc
$ CFLAGS="${CFLAGS} -mavx2" pip install --upgrade --no-cache-dir --force-reinstall --no-binary :all: --compile pillow-simd
$ conda install -y jpeg libtiff

If you install any new packages, check that pillow-simd hasn't been overwritten by an alternative pillow install by running:

$ python -c "from PIL import Image; print(Image.PILLOW_VERSION)"

You should see something like

6.0.0.post0

Pillow doesn't release with post suffixes, so if you have post in the version name, it's likely you have pillow-simd installed.

Install torchvideo

$ pip install git+https://github.com/willprice/torchvideo.git@master

Learn how to use torchvideo

Check out the example notebooks; you can launch these on Binder without having to install anything locally!

Acknowledgements

Thanks to the following people and projects

  • yjxiong for his work on TSN and his publicly available PyTorch implementation, from which many of the transforms in this project are derived.
  • dukebw for his excellent lintel FFmpeg video loading library.
  • hypothesis and the team behind it. This has been used heavily in testing the project.


torchvideo's Issues

Don't raise exceptions from samplers

If a video has too few frames, we shouldn't throw an exception, since this kills the process; it should be up to the user what to do: either discard the video or oversample it somehow. Consider how oversampling should work for each of the samplers (one possible policy is sketched below).
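
A minimal sketch of one possible oversampling policy (an assumption for discussion, not anything implemented in the library): loop the frame indices when the video is shorter than the clip.

def oversample_frame_idx(video_length: int, clip_length: int) -> list:
    """Return clip_length frame indices, repeating frames when the video is too short."""
    return [i % video_length for i in range(clip_length)]

# e.g. a 3-frame video sampled for an 8-frame clip:
assert oversample_frame_idx(3, 8) == [0, 1, 2, 0, 1, 2, 0, 1]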

Improve hypothesis sampling to prevent errors

=================================== FAILURES ===================================
___________ TestClipSampler.test_produces_frame_idx_of_given_length ____________
[gw1] linux -- Python 3.7.1 /home/travis/miniconda/envs/torchvideo/bin/python
self = <test_samplers.TestClipSampler object at 0x7fe8e94c1780>
    @given(st.integers(1, 100), st.integers(1, 100), st.integers(1, 5))
>   def test_produces_frame_idx_of_given_length(
        self, video_length, clip_length, step_size
    ):
E   hypothesis.errors.FailedHealthCheck: It looks like your strategy is filtering out a lot of data. Health check found 50 filtered examples but only 9 good ones. This will make your tests much slower, and also will probably distort the data generation quite a lot. You should adapt your strategy to filter less. This can also be caused by a low max_leaves parameter in recursive() calls
E   See https://hypothesis.readthedocs.io/en/latest/healthchecks.html for more information about this. If you want to disable just this health check, add HealthCheck.filter_too_much to the suppress_health_check settings for this test.
tests/unit/samplers/test_samplers.py:50: FailedHealthCheck

https://travis-ci.org/willprice/torchvideo/jobs/476864126
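
If the filtering comes from rejecting clips longer than the video, one hedged sketch of a fix (the composite strategy below is illustrative and the real test signature, a method on TestClipSampler, is simplified here): draw clip_length relative to video_length so that no generated example has to be filtered out.

from hypothesis import given, strategies as st

@st.composite
def video_and_clip_lengths(draw, max_length=100):
    video_length = draw(st.integers(1, max_length))
    clip_length = draw(st.integers(1, video_length))  # clip always fits inside the video
    return video_length, clip_length

@given(video_and_clip_lengths(), st.integers(1, 5))
def test_produces_frame_idx_of_given_length(lengths, step_size):
    video_length, clip_length = lengths
    ...  # sampler assertions go here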

Issues while installing

Hi! The 2 errors below come up when installing from source:

  1. ERROR: Could not find a version that satisfies the requirement gulpio>=540.66 (from torchvideo) (from versions: none)

Add dataset transform tests

VideoFolderDataset wasn't calling the transform on the frames. Ensure that it is called by adding unit tests.

Add Flow JPEG dataset

Support loading frames from datasets stored in one of the following formats:

dataset_root
├─ video_1
│  └─ u
│     ├─ frame_000001.jpg
│     ├─ ...
│     └─ frame_000010.jpg
└─ video_2
   └─ v
      ├─ frame_000001.jpg
      ├─ ...
      └─ frame_000010.jpg
dataset_root
├─ u
│  ├─ frame_u_000001.jpg
│  ├─ frame_v_000001.jpg
│  ├─ ...
│  ├─ frame_u_000010.jpg
│  └─ frame_v_000010.jpg
dataset_root
├─ u
│  ├─ video_1
│  │  ├─ frame_000001.jpg
│  │  ├─ ...
│  │  └─ frame_000010.jpg
│  └─ video_2
│     ├─ frame_000001.jpg
│     ├─ ...
│     └─ frame_000010.jpg
└─ v
   ├─ video_1
   │  ├─ frame_000001.jpg
   │  ├─ ...
   │  └─ frame_000010.jpg
   └─ video_2
      ├─ frame_000001.jpg
      ├─ ...
      └─ frame_000010.jpg

Some users use x and y instead of u and v, so we should support that too.

The dataset should yield a List[PIL.Image] containing u frames at even indices and v frames at odd indices.

The samplers should still work with this dataset like others, i.e. the frame indices used for loading RGB images should load the corresponding flow images with this dataset.
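
As a minimal sketch of the interleaved layout described above (interleave_flow_frames is illustrative, not an existing torchvideo function):

from typing import List
from PIL import Image

def interleave_flow_frames(u_frames: List[Image.Image],
                           v_frames: List[Image.Image]) -> List[Image.Image]:
    """Return [u_0, v_0, u_1, v_1, ...]: u frames at even indices, v frames at odd indices."""
    frames = []
    for u, v in zip(u_frames, v_frames):
        frames.extend((u, v))
    return frames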

Error applying NormalizeVideo transform to ImageFolderVideoDataset

An AttributeError is raised when trying to apply the NormalizeVideo transform. A notebook showing minimal reproducible code and output is here: https://gist.github.com/gngdb/3101d6078f711b8d2e3fc2f77da33390

Or, as a script:

import tempfile
import os
from PIL import Image
import numpy as np
from torchvideo import datasets, transforms
from pathlib import Path

def generate_image():
    x = np.uint8(np.random.rand(112, 112, 3)*255.)
    return Image.fromarray(x)

def generate_data(tempdir):
    for i in range(2):
        clip_dir = os.path.join(tempdir, f"clip{i}")
        os.mkdir(clip_dir)
        for t in range(1,5):
            im = generate_image()
            impath = os.path.join(tempdir, f"clip{i}", f"frame_{t:03d}.jpg")
            im.save(impath)

with tempfile.TemporaryDirectory() as tmpdirname:
    generate_data(tmpdirname)
    mean = [0.43216, 0.394666, 0.37645]
    std = [0.22803, 0.22145, 0.216989]
    transform = transforms.NormalizeVideo(mean, std)
    dataset = datasets.ImageFolderVideoDataset(tmpdirname, "frame_{:03d}.jpg", transform=transform)
    print(dataset[0])

VideoDataset filtering

Support filtering which videos within a folder are part of the dataset (e.g. when the train/test/val splits all have their videos in the same folder), as sketched after the list below.

  • VideoFolderDataset
  • ImageFolderVideoDataset
  • GulpVideoDataset
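
A hypothetical sketch of what such a filtering API could look like (the filter parameter and the hard-coded video IDs are illustrative, not part of the current implementation):

from torchvideo.datasets import VideoFolderDataset

train_video_ids = {"video_1", "video_5", "video_9"}  # e.g. read from a split file

dataset = VideoFolderDataset(
    "dataset_root",
    filter=lambda video_path: video_path.stem in train_video_ids,
)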

Unify video_id API of datasets

Currently we have video_dirs, video_paths, and _video_ids for unique video identification in each dataset. We want to unify this and expose it publicly (probably also make it a requirement on the superclass).

Support passing in GulpDirectory to GulpVideoDataset

The current implementation sucks if you want to define a label set on a GulpDirectory: you have to first instantiate a GulpDirectory, then create your label set on it, then pass that in to the GulpVideoDataset, which then creates its own GulpDirectory.
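
A hypothetical sketch of the proposed API, where the GulpDirectory is constructed once and shared (the gulp_directory parameter is part of the proposal, not the current signature):

from gulpio import GulpDirectory
from torchvideo.datasets import GulpVideoDataset

# Construct the GulpDirectory once...
gulp_dir = GulpDirectory("/path/to/gulped_dataset")
# ...build whatever label set you need from gulp_dir's metadata here, then reuse the
# same object rather than letting the dataset create its own:
dataset = GulpVideoDataset(gulp_directory=gulp_dir)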

Access to labels from dataset object

It'd be nice to get the label for an example by index; however, currently the labels are pulled on the fly. This is fine most of the time, and pre-computing labels for examples up front might be desirable in the action recognition case; however, segmentation masks and the like are quite expensive and large, so pre-computing those might cause issues.

Run examples on travis

Currently the examples are broken because show_video moved package; we should run the examples on Travis to ensure they always work.

Write targeted unit tests for MultiScaleCropVideo

Whilst we basically have a fuzzing test now that ensures the cropped video is of the correct length, width, and height, we're lacking tests that actually ensure the settings do what they claim to.

  1. This transform might be too large, i.e. doing too many things; perhaps we need to split it up?
  2. This transform has quite a lot of random behaviour and there's no clean separation between sampling the random data and using it. If we can separate them, we can check that the randomly generated data is within bounds and also test that the function does what we want through a deterministic interface (see the sketch after this list).
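
A hedged sketch of the separation suggested in point 2, loosely following the torchvision convention of a static get_params method (the class below is illustrative, not the existing MultiScaleCropVideo): all randomness is drawn in one place, and the crop itself is deterministic given those parameters.

import random

class MultiScaleCropSketch:
    def __init__(self, scales=(1.0, 0.875, 0.75, 0.66)):
        self.scales = scales

    @staticmethod
    def get_params(frame_size, scales):
        """Sample a crop scale and offset; trivial to test for in-bounds values."""
        width, height = frame_size
        scale = random.choice(scales)
        crop_w, crop_h = int(width * scale), int(height * scale)
        x = random.randint(0, width - crop_w)
        y = random.randint(0, height - crop_h)
        return x, y, crop_w, crop_h

    def __call__(self, frames):
        # Deterministic given the sampled parameters, so it can be unit tested directly.
        x, y, w, h = self.get_params(frames[0].size, self.scales)
        return [frame.crop((x, y, x + w, y + h)) for frame in frames]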

Determine flow strategy

This issue covers the broad strategy for supporting optical flow in the library. We can split the concerns that need to be considered into two broad categories: data loading and transforms.

Data loading

Optical flow for action recognition is typically stored as separate u/v grayscale JPEGs. This means we have to support loading data in this format, but we should also step back and consider whether this is optimal; if not, we should propose alternative data storage strategies and support those too.

Data storage

An optical flow F describes the apparent motion between two frames I_t and I_{t+1} in terms of pixels. F[x, y] is a 2-element real motion vector. Typically F is split into 2 separate components: motion in the x (or u) direction, and motion in the y (or v) direction. These are then quantised to the range [0, 255] for storage as greyscale JPEGs, as storing these fields uncompressed is prohibitive. Using image formats for storing the flow fields is definitely the way to go.

CNNs typically take in stacked (u, v) flow frames, so you always want the u, v frame pair together. Most people store these frames as separate files on disk, which is suboptimal: since we know that whenever we load one we will also want the other, we should aggregate the files into a single file in some way.

Regardless of the optimal format, we have to support storing u, v frames as separate files as this is how most people have their data and we don't want to force them to restructure it.

When using GulpIO, we have previously stored flow in a [u_0, v_0, u_1, v_1, ...] format.

Transformations

Some of the transforms are different for optical flow frames than RGB. Cropping is the same for optical flow frames as RGB, but horizontal flipping isn't.

For a (u, v) flow frame pair with values ranging from 0 to 255, each of size W x H and indexed as [x, y], the horizontal flip is:

  • v_flip[x, y] = v[W - x, y]
  • u_flip[x, y] = 255 - u[W - x, y]

The reason that the u flow frame is 'negated' is that flipping the image mirrors the horizontal axis, so to preserve consistency the horizontal component of the flow vector also needs its sign flipped.
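
A minimal numpy sketch of the flip rule above, assuming u and v are uint8 arrays of shape (H, W) quantised to [0, 255] with 128 representing zero motion:

import numpy as np

def hflip_flow(u: np.ndarray, v: np.ndarray):
    """Horizontally flip a (u, v) flow frame pair, negating the horizontal component."""
    u_flipped = 255 - u[:, ::-1]  # mirror the columns and negate horizontal motion
    v_flipped = v[:, ::-1]        # mirror the columns only; vertical motion is unchanged
    return u_flipped, v_flipped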

Whilst this is only a single transform, it does highlight the need to despatch different transform variants based on modality, which may also crop up in future with new transforms that treat the u/v flow components differently.

Flow video representation

RGB videos are represented as List[PIL.Image]. To maintain maximum compatibility with the RGB transforms, we should probably use the same format for flow videos. These videos will be approximately twice as long as their RGB counterparts as we will be storing alternating u, v frames: u frames at even indices and v frames at odd indices, following the original proposal in 2SCNN (note they use 1-based indexing, so they store u frames at odd indices and v frames at even indices).

Concrete actions

  • Implement data loading from separate u, v flow frames (#12)
  • Implement data loading from flow stored in [u_0, v_0, u_1, v_1, ...] format in GulpIO. (#30)
  • Implement horizontal flipping for flow (#31)

Add IdentityTransform

An IdentityTransform would be useful: using lambda xs: xs shows up with a useless repr when printing out a Compose-d transform pipeline, whereas an IdentityTransform lets the user know that no transform is being applied.
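
A minimal sketch of what such a transform could look like (illustrative, not an existing class):

class IdentityTransform:
    """Returns its input unchanged, but shows up with a meaningful repr in Compose pipelines."""

    def __call__(self, frames):
        return frames

    def __repr__(self):
        return self.__class__.__name__ + "()"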

_get_videofile_frame_count will get N/A in some cases

https://github.com/torchvideo/torchvideo/blob/master/src/torchvideo/internal/readers.py#L71

According to https://stackoverflow.com/questions/2017843/fetch-frame-count-with-ffmpeg, not all formats (such as Matroska) report the number of frames, resulting in an output of N/A.

My video information:

ffprobe version 4.0 Copyright (c) 2007-2018 the FFmpeg developers
  built with gcc 7.2.0 (crosstool-NG fa8859cb)
  configuration: --prefix=/opt/conda --cc=/opt/conda/conda-bld/ffmpeg_1531088893642/_build_env/bin/x86_64-conda_cos6-linux-gnu-cc --disable-doc --enable-shared --enable-static --enable-zlib --enable-pic --enable-gpl --enable-version3 --disable-nonfree --enable-hardcoded-tables --enable-avresample --enable-libfreetype --disable-openssl --disable-gnutls --enable-libvpx --enable-pthreads --enable-libopus --enable-postproc --disable-libx264
  libavutil      56. 14.100 / 56. 14.100
  libavcodec     58. 18.100 / 58. 18.100
  libavformat    58. 12.100 / 58. 12.100
  libavdevice    58.  3.100 / 58.  3.100
  libavfilter     7. 16.100 /  7. 16.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  1.100 /  5.  1.100
  libswresample   3.  1.100 /  3.  1.100
  libpostproc    55.  1.100 / 55.  1.100
Input #0, flv, from 'xxx.flv':
  Metadata:
    metadatacreator : Agora.io SDK
    encoder         : Agora.io Encoder
  Duration: 00:07:31.22, start: 0.000000, bitrate: 1335 kb/s
    Stream #0:0: Audio: aac (LC), 44100 Hz, mono, fltp
    Stream #0:1: Video: h264 (High), yuv420p(progressive), 480x864, 20 fps, 20 tbr, 1k tbn, 40 tbc
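
One hedged workaround (an assumption about how this could be handled, not what readers.py currently does): ask ffprobe to decode the stream and count frames explicitly when the container doesn't report them. It is slower, but it avoids the N/A result described above.

import subprocess

def count_frames(video_path: str) -> int:
    result = subprocess.run(
        [
            "ffprobe", "-v", "error",
            "-count_frames",                      # decode to count frames rather than trusting metadata
            "-select_streams", "v:0",
            "-show_entries", "stream=nb_read_frames",
            "-of", "default=nokey=1:noprint_wrappers=1",
            video_path,
        ],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())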

Add tools for dealing with file lists

File lists were originally part of Caffe and were subsequently used by TSN, TRN, and TSM. Since they're quite prevalent, we should provide tools for extracting the number of frames and the label of examples.
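
A hedged sketch assuming the common TSN-style format of one example per line, "<frame_dir> <num_frames> <label>" (the exact format varies between codebases):

from typing import List, NamedTuple

class FileListEntry(NamedTuple):
    path: str
    num_frames: int
    label: int

def read_file_list(file_list_path: str) -> List[FileListEntry]:
    entries = []
    with open(file_list_path) as f:
        for line in f:
            path, num_frames, label = line.strip().rsplit(" ", 2)
            entries.append(FileListEntry(path, int(num_frames), int(label)))
    return entries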

Optimise travis build

Travis takes ages to build... maybe try caching the conda env, as creating it takes ~5 mins (although caching might take just as long).
