
continual-inference's Introduction

A Python library for Continual Inference Networks in PyTorch

Quick-start • Docs • Principles • Paper • Examples • Modules • Model Zoo • Contribute • License

Continual Inference Networks ensure efficient stream processing

Many of our favorite Deep Neural Network architectures (e.g., CNNs and Transformers) were built for offline processing. Rather than processing inputs one sequence element at a time, they require the whole (spatio-)temporal sequence to be passed as a single input. Yet, many important real-life applications need online predictions on a continual input stream. While CNNs and Transformers can be applied by re-assembling and passing sequences within a sliding window, this is inefficient due to the redundant intermediary computations from overlapping clips.

Continual Inference Networks (CINs) are built to ensure efficient stream processing by employing an alternative computational ordering, which allows sequential computations without sliding-window processing. In general, CINs require approximately L× fewer FLOPs per prediction compared to sliding-window-based inference with non-CINs, where L is the corresponding sequence length of the non-CIN network. For more details, check out the videos below describing Continual 3D CNNs [1] and Transformers [2].

News

  • 2022-12-02: ONNX compatibility for all modules is available from v1.0.0. See test_onnx.py for examples.

Quick-start

Install

pip install continual-inference

Example

co modules are weight-compatible drop-in replacements for torch.nn modules, enhanced with the capability for efficient continual inference:

import torch
import continual as co
                                                           
#                      B, C, T, H, W
example = torch.randn((1, 1, 5, 3, 3))

conv = co.Conv3d(in_channels=1, out_channels=1, kernel_size=(3, 3, 3))

# Same exact computation as torch.nn.Conv3d ✅
output = conv(example)

# But can also perform online inference efficiently 🚀
firsts = conv.forward_steps(example[:, :, :4])
last = conv.forward_step(example[:, :, 4])

assert torch.allclose(output[:, :, : conv.delay], firsts)
assert torch.allclose(output[:, :, conv.delay], last)

# Temporal properties
assert conv.receptive_field == 3
assert conv.delay == 2
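
The same module can also process an ongoing stream one frame at a time. A minimal sketch, continuing the example above with a hypothetical stream tensor:

# Hypothetical continuation of the stream; the conv state is already primed
# by the forward_steps/forward_step calls above.
stream = torch.randn((1, 1, 20, 3, 3))  # B, C, T, H, W
outputs = []
for t in range(stream.shape[2]):
    out = conv.forward_step(stream[:, :, t])  # one (B, C, H, W) frame per call
    outputs.append(out)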

See the network composition and model zoo sections for additional examples.

Library principles

Forward modes

The library components feature three distinct forward modes, which are handy for different situations, namely forward, forward_step, and forward_steps:

forward(input)

Performs a forward computation over multiple time-steps. This function is identical to the corresponding module in torch.nn, ensuring cross-compatibility. Moreover, it's handy for efficient training on clip-based data.

         O            (O: output)
         ↑ 
         N            (N: network module)
         ↑ 
 -----------------    (-: aggregation)
 P   I   I   I   P    (I: input frame, P: padding)

forward_step(input, update_state=True)

Performs a forward computation for a single frame and (optionally) updates internal states accordingly. This function performs efficient continual inference.

O+S O+S O+S O+S   (O: output, S: updated internal state)
 ↑   ↑   ↑   ↑ 
 N   N   N   N    (N: network module)
 ↑   ↑   ↑   ↑ 
 I   I   I   I    (I: input frame)

forward_steps(input, pad_end=False, update_state=True)

Performs a forward computation across multiple time-steps while updating internal states for continual inference (if update_state=True). Start-padding is always accounted for, but end-padding is omitted by default in anticipation of the next input step. It can be added by specifying pad_end=True, in which case the output-input mapping is exactly the same as that of forward.

         O            (O: output)
         ↑ 
 -----------------    (-: aggregation)
 O  O+S O+S O+S  O    (O: output, S: updated internal state)
 ↑   ↑   ↑   ↑   ↑
 N   N   N   N   N    (N: network module)
 ↑   ↑   ↑   ↑   ↑
 P   I   I   I   P    (I: input frame, P: padding)
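
For instance, forward_steps can be used to catch up on a buffer of already-received frames before switching to per-frame processing. A minimal sketch with a hypothetical module and shapes:

net = co.Conv3d(in_channels=1, out_channels=1, kernel_size=(3, 3, 3))
buffered = torch.randn((1, 1, 8, 3, 3))  # B, C, T, H, W: frames received so far
outs = net.forward_steps(buffered)       # processes the buffer and updates internal state
new_frame = torch.randn((1, 1, 3, 3))    # B, C, H, W: the newest frame
out = net.forward_step(new_frame)        # continues from the cached state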

__call__

By default, the __call__ function operates identically to torch.nn and executes forward. We supply two options for changing this behavior, namely the call_mode property and the call_mode context manager. An example of their use follows:

# `net`, `batch`, `channel`, and `time` are assumed to be defined beforehand
timeseries = torch.randn(batch, channel, time)
timestep = timeseries[:, :, 0]

net(timeseries)  # Invokes net.forward(timeseries)

# Assign permanent call_mode property
net.call_mode = "forward_step"
net(timestep)  # Invokes net.forward_step(timestep)

# Assign temporary call_mode with context manager
with co.call_mode("forward_steps"):
    net(timeseries)  # Invokes net.forward_steps(timeseries)

net(timestep)  # Invokes net.forward_step(timestep) again

Composition

Continual Inference Networks require strict handling of internal data delays to guarantee correspondence between forward modes. While it is possible to compose neural networks by defining forward, forward_step, and forward_steps manually, correct handling of delays is cumbersome and time-consuming. Instead, we provide a rich interface of container modules, which handles delays automatically. On top of co.Sequential (which is a drop-in replacement of torch.nn.Sequential), we provide modules for handling parallel and conditional dataflow.

  • co.Sequential: Invoke modules sequentially, passing the output of one module onto the next.
  • co.Broadcast: Broadcast one stream to multiple.
  • co.Parallel: Invoke modules in parallel given each their input.
  • co.ParallelDispatch: Dispatch multiple input streams to multiple output streams flexibly.
  • co.Reduce: Reduce multiple input streams to one.
  • co.BroadcastReduce: Shorthand for Sequential(Broadcast, Parallel, Reduce).
  • co.Residual: Residual connection.
  • co.Conditional: Conditionally checks whether to invoke a module (or another) at runtime.

Composition examples (in addition to import continual as co, these assume import torch.nn as nn and from collections import OrderedDict):

Residual module

Short-hand:

residual = co.Residual(co.Conv3d(32, 32, kernel_size=3, padding=1))

Explicit:

residual = co.Sequential(
    co.Broadcast(2),
    co.Parallel(
        co.Conv3d(32, 32, kernel_size=3, padding=1),
        co.Delay(2),
    ),
    co.Reduce("sum"),
)
3D MobileNetV2 Inverted residual block

Continual 3D version of the MobileNetV2 Inverted residual block.


MobileNetV2 Inverted residual block. Source: https://arxiv.org/pdf/1801.04381.pdf
mb_conv = co.Residual(
    co.Sequential(
      co.Conv3d(32, 64, kernel_size=(1, 1, 1)),
      nn.BatchNorm3d(64),
      nn.ReLU6(),
      co.Conv3d(64, 64, kernel_size=(3, 3, 3), padding=(1, 1, 1), groups=64),
      nn.ReLU6(),
      co.Conv3d(64, 32, kernel_size=(1, 1, 1)),
      nn.BatchNorm3d(32),
    )
)
3D Squeeze-and-Excitation module

Continual 3D version of the Squeeze-and-Excitation module


Squeeze-and-Excitation block. Scale refers to a broadcasted element-wise multiplication. Adapted from: https://arxiv.org/pdf/1709.01507.pdf
se = co.Residual(
    co.Sequential(
        OrderedDict([
            ("pool", co.AdaptiveAvgPool3d((1, 1, 1), kernel_size=7)),
            ("down", co.Conv3d(256, 16, kernel_size=1)),
            ("act1", nn.ReLU()),
            ("up", co.Conv3d(16, 256, kernel_size=1)),
            ("act2", nn.Sigmoid()),
        ])
    ),
    reduce="mul",
)
3D Inception module

Continual 3D version of the Inception module:


Inception module. Source: https://arxiv.org/pdf/1409.4842v1.pdf
def norm_relu(module, channels):
    return co.Sequential(
        module,
        nn.BatchNorm3d(channels),
        nn.ReLU(),
    )

inception_module = co.BroadcastReduce(
    co.Conv3d(192, 64, kernel_size=1),
    co.Sequential(
        norm_relu(co.Conv3d(192, 96, kernel_size=1), 96),
        norm_relu(co.Conv3d(96, 128, kernel_size=3, padding=1), 128),
    ),
    co.Sequential(
        norm_relu(co.Conv3d(192, 16, kernel_size=1), 16),
        norm_relu(co.Conv3d(16, 32, kernel_size=5, padding=2), 32),
    ),
    co.Sequential(
        co.MaxPool3d(kernel_size=(1, 3, 3), padding=(0, 1, 1), stride=1),
        norm_relu(co.Conv3d(192, 32, kernel_size=1), 32),
    ),
    reduce="concat",
)

Input shapes

We enforce a unified ordering of input dimensions for all library modules, namely:

(batch, channel, time, optional_dim2, optional_dim3)
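
For example (shapes purely illustrative):

timeseries = torch.randn(1, 64, 100)     # (batch, channel, time)
video = torch.randn(1, 3, 16, 112, 112)  # (batch, channel, time, height, width)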

Outputs

The outputs produced by forward_step and forward_steps are identical to those of forward, provided the same data was input beforehand and state updates were enabled. As in PyTorch, input and output shapes are not necessarily identical when using forward; they generally depend on the padding, stride, and receptive field of a module.

For the forward_step function, this manifests as None-valued outputs. Specifically, modules with a delay (i.e., with receptive fields larger than the padding + 1) will produce None until the input count exceeds the delay. Moreover, modules with stride > 1 will produce a Tensor output every stride steps and None in the remaining steps. A visual example is shown below:


A mixed example of delay and outputs under padding and stride. Here, we illustrate the step-wise operation of two co module layers: l1 with receptive_field = 3, padding = 2, and stride = 2, and l2 with receptive_field = 3, no padding, and stride = 1. ⧇ denotes a padded zero, ■ is a non-zero step-feature, and ☒ is an empty output.
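
In code, consuming such step-wise outputs might look like the following sketch (net and stream are hypothetical):

outputs = []
for t in range(stream.shape[2]):
    y = net.forward_step(stream[:, :, t])
    if y is None:  # no output this step: still within the delay, or skipped by stride
        continue
    outputs.append(y)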

For more information, please see the library paper.

Handling state

During stream processing, network modules that operate over multiple time-steps, e.g., a convolution with kernel_size > 1 in the temporal dimension, will aggregate and cache state internally. Each module has its own local state, which can be inspected using module.get_state(). During forward_step and forward_steps, the state is updated unless the function is invoked with update_state=False.

A state cleanup can be accomplished via module.clean_state().
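
For example, when switching from one video to another, the cached context of the first can be discarded before processing the second. A minimal sketch (net, clip_a, and clip_b are hypothetical):

for clip in (clip_a, clip_b):    # hypothetical (B, C, T, H, W) tensors
    net.clean_state()            # forget all cached steps from the previous clip
    preds = net.forward_steps(clip)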

Module library

Continual Inference features a rich collection of modules for defining Continual Inference Networks. Specific care was taken to create CIN versions of the PyTorch modules found in torch.nn:

  • Convolutions
  • Pooling
  • Linear
  • Recurrent
  • Transformers

Modules for composing and converting networks are also provided; both composition and utility modules can be used in regular PyTorch module definitions as well.

Composition modules
  • co.Sequential: Invoke modules sequentially, passing the output of one module onto the next.
  • co.Broadcast: Broadcast one stream to multiple.
  • co.Parallel: Invoke modules in parallel given each their input.
  • co.ParallelDispatch: Dispatch multiple input streams to multiple output streams flexibly.
  • co.Reduce: Reduce multiple input streams to one.
  • co.BroadcastReduce: Shorthand for Sequential(Broadcast, Parallel, Reduce).
  • co.Residual: Residual connection.
  • co.Conditional: Conditionally checks whether to invoke a module (or another) at runtime.
Utility modules
  • co.Delay: Pure delay module (e.g. needed in residuals).
  • co.Skip: Skip a predefined number of input steps.
  • co.Reshape: Reshape non-temporal dimensions.
  • co.Lambda: Lambda module which wraps any function.
  • co.Constant: Maps input to an output with constant value.
  • co.Zero: Maps input to output of zeros.
  • co.One: Maps input to output of ones.
Converters
  • co.continual: Conversion function from torch.nn modules to co modules (see the sketch below).
  • co.forward_stepping: Functional wrapper, which enhances temporally local torch.nn modules with the forward_stepping functions.
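
For example, an existing torch.nn layer can be converted directly. A minimal sketch, assuming nn.Conv3d is among the supported conversions:

import torch.nn as nn
import continual as co

co_conv = co.continual(nn.Conv3d(in_channels=3, out_channels=8, kernel_size=(3, 3, 3)))
# co_conv now also provides forward_step and forward_steps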

We support drop-in interoperability with the following torch.nn modules:

Activation
  • nn.Threshold
  • nn.ReLU
  • nn.RReLU
  • nn.Hardtanh
  • nn.ReLU6
  • nn.Sigmoid
  • nn.Hardsigmoid
  • nn.Tanh
  • nn.SiLU
  • nn.Hardswish
  • nn.ELU
  • nn.CELU
  • nn.SELU
  • nn.GLU
  • nn.GELU
  • nn.Hardshrink
  • nn.LeakyReLU
  • nn.LogSigmoid
  • nn.Softplus
  • nn.Softshrink
  • nn.PReLU
  • nn.Softsign
  • nn.Tanhshrink
  • nn.Softmin
  • nn.Softmax
  • nn.Softmax2d
  • nn.LogSoftmax
Normalization
  • nn.BatchNorm1d
  • nn.BatchNorm2d
  • nn.BatchNorm3d
  • nn.GroupNorm
  • nn.InstanceNorm1d (affine=True, track_running_stats=True required)
  • nn.InstanceNorm2d (affine=True, track_running_stats=True required)
  • nn.InstanceNorm3d (affine=True, track_running_stats=True required)
  • nn.LayerNorm (only non-temporal dimensions must be specified)
Dropout
  • nn.Dropout
  • nn.Dropout1d
  • nn.Dropout2d
  • nn.Dropout3d
  • nn.AlphaDropout
  • nn.FeatureAlphaDropout

Model Zoo and Benchmarks

Continual 3D CNNs

Benchmark results for 1-view testing on Kinetics400. For reference, X3D-L scores 69.3% top-1 acc with 19.2 GFLOPs per prediction.

| Arch    | Avg. pool size | Top 1 (%) | FLOPs (G) per step | FLOPs reduction | Params (M) | Code | Weights |
| ------- | -------------- | --------- | ------------------ | --------------- | ---------- | ---- | ------- |
| CoX3D-L | 64             | 71.6      | 1.25               | 15.3x           | 6.2        | link | link    |
| CoX3D-M | 64             | 71.0      | 0.33               | 15.1x           | 3.8        | link | link    |
| CoX3D-S | 64             | 64.7      | 0.17               | 12.1x           | 3.8        | link | link    |
| CoSlow  | 64             | 73.1      | 6.90               | 8.0x            | 32.5       | link | link    |
| CoI3D   | 64             | 64.0      | 5.68               | 5.0x            | 28.0       | link | link    |

FLOPs reduction is noted relative to non-continual inference. Note that on-hardware inference does not reach the speedups that the FLOPs reductions might suggest, due to the overhead of state reads and writes. This overhead is less important for large batch sizes. This applies to all models in the model zoo.

Continual ST-GCNs

Benchmark results on NTU RGB+D 60 for the joint modality. For reference, ST-GCN achieves 86% X-Sub and 93.4% X-View accuracy with 16.73 GFLOPs per prediction.

| Arch     | Receptive field | X-Sub Acc (%) | X-View Acc (%) | FLOPs (G) per step | FLOPs reduction | Params (M) | Code |
| -------- | --------------- | ------------- | -------------- | ------------------ | --------------- | ---------- | ---- |
| CoST-GCN | 300             | 86.3          | 93.8           | 0.16               | 107.7x          | 3.1        | link |
| CoA-GCN  | 300             | 84.1          | 92.6           | 0.17               | 108.7x          | 3.5        | link |
| CoST-GCN | 300             | 86.3          | 92.4           | 0.15               | 107.6x          | 3.1        | link |

Here, you can download pre-trained model weights for the above architectures on NTU RGB+D 60, NTU RGB+D 120, and Kinetics-400 for the joint and bone modalities.

Continual Transformers

Benchmark results on THUMOS14 on top of features extracted using a TSN-ResNet50 backbone pre-trained on Kinetics400. For reference, OadTR achieves 64.4% mAP with 2.5 GFLOPs per prediction.

| Arch       | Receptive field | mAP (%) | FLOPs (G) per step | Params (M) | Code |
| ---------- | --------------- | ------- | ------------------ | ---------- | ---- |
| CoOadTR-b1 | 64              | 64.2    | 0.41               | 15.9       | link |
| CoOadTR-b2 | 64              | 64.4    | 0.01               | 9.6        | link |

The library features complete implementations of the one- and two-block continual transformer encoders as well.

Compatibility

The library modules are built to integrate seamlessly with other PyTorch projects. Specifically, extra care was taken to ensure out-of-the-box compatibility with:

Citation

@inproceedings{hedegaard2022colib,
  title={Continual Inference: A Library for Efficient Online Inference with Deep Neural Networks in PyTorch},
  author={Lukas Hedegaard and Alexandros Iosifidis},
  booktitle={European Conference on Computer Vision Workshops (ECCVW)},
  year={2022}
}

Acknowledgement

This work has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 871449 (OpenDR).

continual-inference's People

Contributors

  • lukashedegaard


continual-inference's Issues

Auto shrink

Hi there again,

is this actually correct?

return input[:, :, self.delay : -self.delay]

To my understanding, it does not make sense that for auto_shrink="lagging" we throw away self.delay samples, but for auto_shrink="centered" we throw away 2*self.delay samples?!

I also think a third option

if self.auto_shrink == "online":
    return input[:, :, self.delay:]

would make sense, at least I could use it ;)

Clear memory

Hi @LukasHedegaard,

Is there any way to quickly reset the memory when I change the visual context, for example, I fed one video as input and then changed it to another. At the same time I want to reset the delay and ignore the previous context.
Should I fill this delay with empty frames? Forgive me in advance if I missed it in the documentation.

Thank you

Advanced routing

In advanced use cases, it would be beneficial to have a set of modules, which can handle parallel streams.
The old co.Parallel essentially performs a BroadcastReduce operation, and could be named accordingly.

A new Parallel module would then take a sequence of tensors, and map one to each of its children, repacking the outputs in a list.
Moreover, we would need

  • Broadcast: one to multiple streams
  • Reduce: multiple to one stream

We could also consider:

  • Route: many-to-many mapping of parallel streams
  • Shuffle: shuffle the order of streams
  • Split: split the input into multiple streams (e.g. by channel)

Context managers

An idea worth considering is to create context managers to change network behaviour.

with co.no_temporal_padding():
    y = net(x)

with co.temporal_pooling_size(16):
    y = net(x)

Docs

The project doesn't supply any documentation other than the README (and the code itself).

Before reaching v1.0.0, this is an absolute must-have.

Continual pooling should use same interface as torch.nn pooling

Currently, separate temporal_xxx parameters must be specified in the definition of pooling modules, e.g.:

nn.MaxPool3d(kernel_size=(2, 2, 2))

co.MaxPool3d(temporal_kernel_size=2, kernel_size=(2, 2))

Ideally, both of the above should be initialised as MaxPool3d(kernel_size=(2, 2, 2))

Receptive field property

Motivation

It would be nice to have a receptive field property, as this has important implications for the model dynamics.

Behaviour

Discussion

It may not be feasible to implement it for other dimensions than the temporal.
Adaptive average pool in the spatial dimensions would effectively make the receptive field infinite.
In the temporal dimension, the receptive field would be finite for all current modules (v0.11.0). Adding RNN support (#4) would make the temporal receptive field infinite in some cases as well.

Proposed behaviour

If the receptive field is finite, return an integer value. If it is infinite, return math.inf.

Implementation

The property's implementation could be close to that of the delay property.

TorchScript support

Hi,

are there any plans to make the continual model (i.e. with call_mode == "forward_step") exportable as TorchScript?
My use case would be using the framework in a C++ environment.
I guess the TensorPlaceholder concept would need to be changed?

Best

Change default value of pad_end to False

Currently, the pad_end option in forward_steps is set to True by default.

Since forward_steps is mostly intended as an initialisation function or to catch up with lost computational steps, it would make better sense to not return values corresponding to a padded end.

The pad_end option is mostly there to serve as an easy way to check the forward and forward_steps implementations against one another.

Additional BatchNorm modules

As of yet, there is only a single BatchNorm module, which hasn't been tested thoroughly.
Additional modules and tests should be added.

Weight loading

Hi Lukas,

I've played around a bit with your framework and it looks great and speeds up my inference stage a lot.

But I'm not really sure what's the optimal way to train the model and then load the weights for inference.

What I've done so far:

  • I have a PyTorch (non-continual) model with trained weights (all layers are supported by your framework and there is no padding on the time dimension).
  • I have (re-)implemented the model in your framework.
  • I realized I can't just load the weights from the PyTorch model into the continual model, since the weight names are different.
  • I have manually mapped the weights from the PyTorch model to the weights of the continual model. This works now, but it's really a big mess and it took me way too long.

Is there an easier way? How are you doing it? ;)

Best wishes,
Sean

Container instance naming

For large networks built with the Sequential and BroadcastReduce modules, the __repr__ string may become very cluttered.
It might be nice to have the option of overloading this __repr__ with a custom name.

E.g.

my_module = co.Sequential(
    ...,
    name="MyModule",
)

assert my_module.__repr__() == "MyModule()"

Support GroupNorm

I've been wondering whether this framework supports GroupNorm?
Since LayerNorm is supported, it seems to me that it might be at least possible?

Great work by the way

Containers don't account for TensorPlaceholder

Currently, the container modules do not take into account whether a contained module outputs a TensorPlaceholder during a step rather than an actual Tensor.

This makes co.Sequential unusable for modules with temporal stride larger than 1.
