zer0n / deepframeworks

Evaluation of Deep Learning Frameworks

Evaluation of Deep Learning Toolkits

Warning: this research was done in late 2015 with slight modifications in early 2016. Many toolkits have improved significantly since then.

Abstract. In this study, I evaluate some popular deep learning toolkits. The candidates are listed in alphabetical order: Caffe, CNTK, TensorFlow, Theano, and Torch.

I also provide ratings in some areas, since many readers find them useful. Keep in mind, however, that ratings are inherently subjective [1].

If you find something wrong or inadequate, please help improve this study by filing an issue.

Table of contents

Modeling Capability
Interfaces
Model Deployment
Performance
Architecture
Ecosystem
Cross-platform

Modeling Capability

In this section, we evaluate each toolkit's ability to train common and state-of-the-art networks without writing too much code. Some of these networks are:

ConvNets: AlexNet, OxfordNet (VGG), GoogLeNet
RecurrentNets: plain RNN, LSTM/GRU, bidirectional RNN
Sequential modeling with attention.

In addition, we also evaluate the flexibility to create a new type of model.

Caffe

Started in late 2013, Caffe is perhaps the first mainstream industry-grade deep learning toolkit, a status it earned through its excellent (at the time) convnet implementation. It is still the most popular toolkit within the computer vision community, with many extensions being actively added.

However, its support for recurrent networks, and for language modeling in general, is poor because of its legacy architecture, whose limitations are detailed in the architecture section.

CNTK

CNTK is a deep learning system started by the speech researchers who kicked off the deep learning craze. It has since grown into a more general, platform-independent deep learning system, but it is still better known in the speech community than in the general deep learning community.

In CNTK (as in TensorFlow and Theano), a network is specified as a symbolic graph of vector operations, such as matrix add/multiply or convolution. A layer is just a composition of those operations. The fine granularity of the building blocks (operations) allows users to invent new complex layer types without implementing them in a low-level language (as in Caffe).
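To make the idea of a symbolic graph concrete, here is a minimal sketch in plain Python. It is not the API of CNTK, TensorFlow, or Theano: the `Node` class and the helpers `placeholder`, `matvec`, `add`, `relu`, and `dense_layer` are hypothetical names invented for this illustration. It only shows that a layer can be written as a composition of fine-grained operations, and that building the graph is separate from evaluating it. Real toolkits also derive gradients from the graph, which this sketch omits.

```python
class Node:
    """A node in a symbolic graph: an operation applied to input nodes."""

    def __init__(self, op, inputs=(), name=None):
        self.op = op              # Python callable, or None for a placeholder
        self.inputs = tuple(inputs)
        self.name = name

    def eval(self, feed):
        # Placeholders are looked up in the feed dict;
        # every other node is computed from its evaluated inputs.
        if self.op is None:
            return feed[self.name]
        return self.op(*(node.eval(feed) for node in self.inputs))


def placeholder(name):
    return Node(None, name=name)


def matvec(W, x):
    # Matrix-vector product on nested lists.
    return Node(lambda w, v: [sum(a * b for a, b in zip(row, v)) for row in w], (W, x))


def add(a, b):
    return Node(lambda u, v: [p + q for p, q in zip(u, v)], (a, b))


def relu(a):
    return Node(lambda u: [max(0.0, p) for p in u], (a,))


def dense_layer(x, W, b):
    """A 'layer' is just a composition of finer-grained operations."""
    return relu(add(matvec(W, x), b))


x, W, b = placeholder("x"), placeholder("W"), placeholder("b")
y = dense_layer(x, W, b)  # builds the graph; nothing is computed yet
result = y.eval({"x": [1.0, 2.0], "W": [[1.0, -1.0], [0.5, 0.5]], "b": [0.0, 1.0]})
```

A new layer type in this style is just another Python function that wires existing operations together, which is the flexibility the paragraph above refers to.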

TensorFlow

State-of-the-art models

The RNN API and implementation are suboptimal. The team has also commented on this here and here.
Bidirectional RNN not available yet
No 3D convolution, which is useful for video recognition

New models. Since TF uses the symbolic-graph-of-vector-operations approach, specifying a new network is fairly easy. Although it doesn't support symbolic loops yet (or at least not in a well tested and documented form, as of 05/2016), RNNs can be made easy and efficient using the bucketing trick.
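The bucketing trick is not spelled out in this study. As it is commonly described, sequences are grouped into a few buckets of fixed maximum length and padded to that length, so that one unrolled graph per bucket can be reused for every batch in it. The sketch below shows only the data-side grouping in plain Python; `assign_buckets`, the bucket sizes, and the padding value are assumptions made for illustration, not part of any TensorFlow API.

```python
def assign_buckets(sequences, bucket_sizes, pad=0):
    """Place each sequence in the smallest bucket that fits and pad it to that length."""
    sizes = sorted(bucket_sizes)
    buckets = {size: [] for size in sizes}
    for seq in sequences:
        for size in sizes:
            if len(seq) <= size:
                buckets[size].append(list(seq) + [pad] * (size - len(seq)))
                break
        else:
            # No bucket is large enough for this sequence.
            raise ValueError("sequence longer than the largest bucket")
    return buckets


sequences = [[3, 1], [7, 2, 9, 4], [5], [8, 8, 8, 8, 8, 8]]
buckets = assign_buckets(sequences, bucket_sizes=[2, 4, 8])
```

With only three bucket sizes, only three unrolled networks are needed, at the cost of some wasted computation on padding.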

However, TF has a major weakness in terms of modeling flexibility. Every computational flow has to be constructed as a static graph. That makes some computations difficult, such as beam search (which is used frequently in sequence prediction tasks).

Theano

State-of-the-art models. Theano has implementations of most state-of-the-art networks, either in the form of a higher-level framework (e.g. Blocks, Keras, etc.) or in pure Theano.

New models. Theano pioneered the trend of using a symbolic graph for programming a network. Theano's symbolic API supports a looping construct, called scan, which makes implementing RNNs easy and efficient. Users don't always have to define a new model at the level of tensor operations: the higher-level frameworks mentioned above make model definition and training simpler.
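The semantics of a scan-style loop can be sketched in a few lines of plain Python. This is not the signature of Theano's `scan`, which builds the loop symbolically inside the graph; the function below runs eagerly and is only meant to show the pattern of threading a state through a sequence. The scalar `rnn_step` and its weights `w_x` and `w_h` are hypothetical.

```python
import math


def scan(step, sequence, initial_state):
    """Apply `step` to each element, threading the state through; return all states."""
    state = initial_state
    outputs = []
    for x in sequence:
        state = step(x, state)
        outputs.append(state)
    return outputs


def rnn_step(x, h, w_x=0.5, w_h=0.1):
    # Scalar RNN cell: h_t = tanh(w_x * x_t + w_h * h_{t-1})
    return math.tanh(w_x * x + w_h * h)


hidden_states = scan(rnn_step, [1.0, 0.0, -1.0], initial_state=0.0)
```

Because the loop is expressed once and applied to a sequence of any length, the recurrent network does not have to be unrolled by hand for each sequence length.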

Torch

State-of-the-art models

Excellent for convnets. Temporal convolution can be done in TensorFlow/Theano via conv2d, but that is a workaround. The native interface for temporal convolution in Torch makes it slightly more intuitive to use.
Rich set of RNNs available through a non-official extension [2]
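The conv2d trick for temporal convolution mentioned above amounts to treating a length-T sequence as a 1 x T image and the temporal filter as a 1 x K kernel. Below is a minimal single-channel sketch in plain Python. `conv2d_valid` and `temporal_conv` are hypothetical helpers written for this illustration and are not the conv2d of TensorFlow or Theano; like most deep learning toolkits, the sketch computes cross-correlation (no kernel flip).

```python
def conv2d_valid(image, kernel):
    """Plain 2D cross-correlation with 'valid' padding on nested lists."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def temporal_conv(sequence, kernel):
    # The trick: wrap the sequence as a one-row image and the filter as a
    # one-row kernel, run the 2D routine, then unwrap the single output row.
    return conv2d_valid([sequence], [kernel])[0]


signal = [1.0, 2.0, 3.0, 4.0]
smoothed = temporal_conv(signal, [0.5, 0.5])
```

The wrapping and unwrapping steps are exactly the bookkeeping that a native temporal convolution interface hides from the user.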

New models. In Torch, there are multiple ways to define a network (as a stack of layers or as a graph of layers), but essentially a network is a graph of layers. Because of this coarser granularity, Torch is sometimes considered less flexible: for a new layer type, users have to implement the full forward, backward, and gradient input update.

However, defining a new layer in Torch is much easier than in Caffe because you don't have to program in C++. Plus, in Torch, the difference between new layer definition and network definition is minimal. In Caffe, layers are defined in C++ while networks are defined via Protobuf.
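To illustrate what a layer-level toolkit asks of the author of a new layer, here is a sketch of a trivial layer in plain Python. It is loosely modeled on the structure of Torch's `nn.Module` (whose Lua methods are `updateOutput`, `updateGradInput`, and `accGradParameters`), but it is a Python analogue rather than Torch code, and the `Scale` layer itself is a hypothetical example.

```python
class Scale:
    """Hypothetical elementwise layer y = w * x with one learnable scalar w."""

    def __init__(self, w):
        self.w = w
        self.grad_w = 0.0

    def forward(self, x):
        self.x = x  # cache the input for the backward pass
        return [self.w * xi for xi in x]

    def update_grad_input(self, grad_output):
        # Gradient with respect to the input: dL/dx_i = w * dL/dy_i
        return [self.w * g for g in grad_output]

    def acc_grad_parameters(self, grad_output):
        # Gradient with respect to the parameter: dL/dw = sum_i x_i * dL/dy_i
        self.grad_w += sum(xi * g for xi, g in zip(self.x, grad_output))

    def backward(self, grad_output):
        self.acc_grad_parameters(grad_output)
        return self.update_grad_input(grad_output)


layer = Scale(w=2.0)
out = layer.forward([1.0, 3.0])
grad_in = layer.backward([1.0, 1.0])
```

Every derivative here is written by hand, whereas in a graph-of-operations toolkit the same layer would be a single multiply node whose gradient is derived automatically. On the other hand, the layer and any network built from it live in the same language, which is the point made above about Torch versus Caffe.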

Torch is more flexible than TensorFlow and Theano in that it is imperative while TF/Theano are declarative (i.e. one has to declare a computational graph). That makes some operations, e.g. beam search, much easier to do in Torch.
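Beam search shows why imperative control flow helps: the set of live hypotheses changes at every step and depends on the scores just computed. The sketch below is plain Python, not Torch code. `beam_search` is a generic textbook version, and `toy_model` with its probability table is a hypothetical stand-in for a trained network.

```python
import math


def beam_search(next_log_probs, start, beam_width, max_len, end_token):
    """Keep the `beam_width` best partial sequences at every step."""
    beams = [([start], 0.0)]
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == end_token:
                candidates.append((seq, score))  # finished hypotheses are carried over
                continue
            for token, logp in next_log_probs(seq).items():
                candidates.append((seq + [token], score + logp))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
        if all(seq[-1] == end_token for seq, _ in beams):
            break
    return beams


def toy_model(seq):
    # Hypothetical next-token distribution that depends only on the last token.
    table = {
        "<s>": {"a": math.log(0.6), "b": math.log(0.4)},
        "a": {"</s>": math.log(0.55), "b": math.log(0.45)},
        "b": {"</s>": math.log(0.9), "a": math.log(0.1)},
    }
    return table[seq[-1]]


best_seq, best_score = beam_search(
    toy_model, "<s>", beam_width=2, max_len=4, end_token="</s>"
)[0]
```

In this toy example a greedy decoder would start with "a" (probability 0.6) and end with total probability 0.33, while the beam keeps "b" alive and finds the sequence with probability 0.36. The data-dependent branching, sorting, and early exit are ordinary statements in an imperative toolkit, but each must be encoded as graph operations in a declarative one.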

Left: graph model of CNTK/Theano/TensorFlow; Right: graph model of Caffe/Torch