Git Product home page Git Product logo

deepframeworks's Introduction

Evaluation of Deep Learning Toolkits

Warning: this research was done in late 2015 with slight modifications in early 2016. Many toolkits have improved significantly since then.

Abstract. In this study, I evaluate some popular deep learning toolkits. The candidates are listed in alphabetical order: Caffe, CNTK, TensorFlow, Theano, and Torch.

I also provide ratings in some areas because for a lot of people, ratings are useful. However, keep in mind that ratings are inherently subjective [1].

If you find something wrong or inadequate, please help improve by filing an issue.

Table of contents

  1. Modeling Capability

Modeling Capability

In this section, we evaluate each toolkit's ability to train common and state-of-the-art networks without writing too much code. Some of these networks are:

  • ConvNets: AlexNet, OxfordNet, GoogleNet
  • RecurrentNets: plain RNN, LSTM/GRU, bidirectional RNN
  • Sequential modeling with attention.

In addition, we also evaluate the flexibility to create a new type of model.

Caffe

Caffe is perhaps the first mainstream industry-grade deep learning toolkit, started in late 2013, due to its excellent convnet implementation (at the time). It is still the most popular toolkit within the computer vision community, with many extensions being actively added.

However, its support for recurrent networks and language modeling in general is poor, due to its legacy architecture, which's limitations are detailed in the architecture section.

CNTK

CNTK is a deep learning system started by the speech people who started the deep learning craze and grown into a more general platform-independent deep learning system. It is better known in the speech community than in the general deep learning community.

In CNTK (as in TensorFlow and Theano), a network is specified as a symbolic graph of vector operations, such as matrix add/multiply or convolution. A layer is just a composition of those operations. The fine granularity of the building blocks (operations) allows users to invent new complex layer types without implementing them in a low-level language (as in Caffe).

TensorFlow

State-of-the-art models

  • RNN API and implementation are suboptimal. The team also commented about it here and here.
  • Bidirectional RNN not available yet
  • No 3D convolution, which is useful for video recognition

New models Since TF uses symbolic graph of vector operations approach, specifying a new network is fairly easy. Although it doesn't support symbolic loop yet (at least not well tested/documented, as of 05/2016), RNNs can be made easy and efficient using the bucketing trick.

However, TF has a major weakness in terms of modeling flexibility. Every computational flow has be constructed as a static graph. That makes some computations difficult, such as beam search (which is used frequently in sequence prediction tasks).

Theano

State-of-the-art models. Theano has implementation for most state-of-the-art networks, either in the form of a higher-level framework (e.g. Blocks, Keras, etc.) or in pure Theano.

New models. Theano pioneered the trend of using symbolic graph for programming a network. Theano's symbolic API supports looping control, so-called scan, which makes implementing RNNs easy and efficient. Users don't always have to define a new model at the tensor operations level. There are a few higher-level frameworks, mentioned above, which make model definition and training simpler.

Torch

State-of-the-art models

  • Excellent for conv nets. It's worth noting that temporal convolution can be done in TensorFlow/Theano via conv2d but that's a trick. The native interface for temporal convolution in Torch makes it slightly more intuitive to use.
  • Rich set of RNNs available through a non-official extension [2]

New models. In Torch, there are multiple ways (stack of layers or graph of layers) to define a network but essentially, a network is defined as a graph of layers. Because of this coarser granularity, Torch is sometimes considered less flexible because for new layer types, users have to implement the full forward, backward, and gradient input update.

However, unlike Caffe, defining a new layer in Torch is much easier because you don't have to program in C++. Plus, in Torch, the difference between new layer definition and network definition is minimal. In Caffe, layers are defined in C++ while networks are defined via Protobuf.

Torch is more flexible than TensorFlow and Theano in that it is imperative while TF/Theano are declarative (i.e. one has to declare a computational graph). That makes some operations, e.g. beam search, much easier to do in Torch.



Left: graph model of CNTK/Theano/TensorFlow; Right: graph model of Caffe/Torch

Interfaces

Caffe

Caffe has pycaffe interface but that's a mere secondary alternative to the command line interface. The model has to be defined in protobuf (usually with a plain text editor), even if you use pycaffe.

CNTK

The way to use CNTK, similar to Caffe, is to specify a config file and run command line. CNTK has Python support since V2.0 and C# support in progress.

TensorFlow

TF supports two interfaces: Python and C++. This means that you can do experiments in a rich, high-level environment and deploy your model in an environment that requires native code or low latency.

It would be perfect if TF supports F# or TypeScript. The lack of static type in Python is just ... painful :).

Theano

Python

Torch

Torch runs on LuaJIT, which is amazingly fast (comparable with industrial languages such as C++/C#/Java). Hence developers don't have to think about symbolic programming, which can be limited. They can just write all kinds of computations without worrying about performance penalty.

However, let's face it, Lua is not yet a mainstream language.

Model Deployment

How easy to deploy a new model?

Caffe

Caffe is C++ based, which can be compiled on a variety of devices. It is cross-platform (windows port is available and maintained here). Which makes Caffe the best choice with respect deployment.

CNTK

Like Caffe, CNTK is also C++ based and is cross-platform. Hence, deployment should be easy in most cases. However, to my understanding, it doesn't work on ARM architecture, which limits its its capability on mobile devices.

TensorFlow

TF supports C++ interface and the library can be compiled/optimized on ARM architectures because it uses Eigen (instead of a BLAS library). This means that you can deploy your trained models on a variety of devices (servers or mobile devices) without having to implement a separate model decoder or load Python/LuaJIT interpreter [3].

TF doesn't work on Windows yet so TF models can't be deployed on Windows devices though.

Theano

The lack of low-level interface and the inefficiency of Python interpreter makes Theano less attractive for industrial users. For a large model, the overhead of Python isn’t too bad but the dogma is still there.

The cross-platform nature (mentioned below) enables a Theano model to be deployed in a Windows environment. Which helps it gain some points.

Torch

Torch require LuaJIT to run models. This makes it less attractive than bare bone C++ support of Caffe/CNTK/TF. It’s not just the performance overhead, which is minimal. The bigger problem is integration, at API level, with a larger production pipeline.

Performance

Single-GPU

All of these toolkits call cuDNN so as long as there’s no major computations or memory allocations at the outer level, they should perform similarly.

Soumith@FB has done some benchmarking for ConvNets. Deep Learning is not just about feedforward convnets, not just about ImageNet, and certainly not just about a few passes over the network. However, Soumith’s benchmark is the only notable one as of today. So we will base the Single-GPU performance rating based on his benchmark.

TensorFlow and Torch

TensorFlow used to be slow when it first came out but as of 05/2016, it has reached the ballpark of other frameworks in terms of ConvNet speed. This is not surprising because every framework nowadays calls CuDNN for the actual computations.

Here's my latest micro benchmark of TensorFlow 0.8 vs before. The measurement is latency, in milliseconds, for one full minibatch forward-backward pass on a single Titan X GPU.

Network TF 0.6 [ref] TF 0.8 [my run] Torch FP32 [my run]
AlexNet 292 97 81
Inception v1 1237 518 470

Theano

On big networks, Theano’s performance is on par with Torch7, according to this benchmark. The main issue of Theano is startup time, which is terrible, because Theano has to compile C/CUDA code to binary. We don’t always train big models. In fact, DL researchers often spend more time debugging than training big models. TensorFlow doesn’t have this problem. It simply maps the symbolic tensor operations to the already-compiled corresponding function calls.

Even import theano takes time because this import apparently does a lot of stuffs. Also, after import Theano, you are stuck with a pre-configured device (e.g. GPU0).

Multi-GPU

TBD

Architecture

Developer Zone

Caffe

Caffe's architecture was considered excellent when it was born but in the modern standard, it is considered average. The main pain points of Caffe are its layer-wise design in C++ and the protobuf interface for model definition.

Layer-wise design. The building block of a network in Caffe is layer.

  • For new layer types, you have to define the full forward, backward, and gradient update. You can see an already long-list of layers implemented in (official) caffe.
  • What's worse is that if you want to support both CPU and GPU, you need to implement extra functions, e.g. Forward_gpu and Backward_gpu.
  • Worse, you need to assign an int id to your layer type and add that to the proto file. If your pull request is not merged early, you may need to change the id because someone else already claims that.

Protobuf. Caffe has pycaffe interface but that's a mere replacement of the command line interface. The model has to be defined in protobuf (usually with a plain text editor), even if you use pycaffe.

[Copied from my own answer on Quora]

CNTK

To be updated ...

TensorFlow

TF has a clean, modular architecture with multiple frontends and execution platforms. Details are in the white paper.

Theano

The architecture is fairly hacky: the whole code base is Python where C/CUDA code is packaged as Python string. This makes it hard to navigate, debug, refactor, and hence contribute as developers.

Torch

Torch7 and nn libraries are also well-designed with clean, modular interfaces.

Ecosystem

  • Caffe and CNTK: C++
  • TensorFlow: Python and C++
  • Theano: Python
  • Torch: Lua is not a mainstream language and hence libraries built for it are not as rich as ones built for Python.

Cross-platform

Caffe, CNTK, TensorFlow and Theano work on all OSes. Torch does not work on Windows and there's no known plan to port from either camp.


___

Footnotes

[1] Note that I don’t aggregate ratings because different users/developers have different priorities.

[2] Disclaimer: I haven’t analyzed this extension carefully.

[3] See my blog post for why this is desirable.

deepframeworks's People

Contributors

dwiel avatar junjieqian avatar zer0n avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

deepframeworks's Issues

Comments about Multi-GPU and Torch

Thanks for creating this comparison page. I think it will be usefull for many people.
Few comments:

  1. CNTK Multi-GPU. The paper you mentioned only presents results for fully-connected networks. And it can not be compared to distributed training of CNNs, for example. It turns out that distributed training of CNNs is much harder in a sense of acheiving high scalability factor. Currently noone demonstrated predictable and fast distributed training of CNNs.
  2. Model deployment. I think Torch mark could be 0.5 higher. The reason is since it is written on C and Lua, and lua in turn also written on C, very compact and embeddable, you actually can make Torch models run on exotic processors or DSPs (and even FPGA) which only have C compiler available.
  3. Architecture. This might be subjective but I would give Torch smaller mark. nn module alone looks good, but Torch is more then just nn, so as an inheritance of using Lua you have to use a lot of other modules.

MATLAB Support

Great writing!

Could you add whether a package supports (Or can Interface) with MATLAB?
Moreover, does it support Windows or Linux only?

Thank You.

Java platforms?

You've done a great review of deep learning toolkits and I thank you for that. However, they all are a pain to interop from Java world. Have you stumbled upon such a framework that would compare well to them and would be closer to the JVM? I can think of deeplearning4j but I'm not sure it fits the kind of evaluation you are doing.

Need update

Torch recently launched pytorch that supports python. Would love a revisit :)

How about Keras?

Hi zer0n
Can you add Keras to the list and compare them.
Thanks!

Tensorflow runs under bash for ubuntu for windows

Hi there. Well written, just wanted to add that even though there isn't any official support for tensorflow under windows, I can confirm it does run under the bash for ubuntu for windows beta. Haven't tried GPU though.

Module deployment

I disagree with your review of Torch on Module deployement. Lua and Torch are easily deployed on smart phones running iOS and Android. As for server to server, Lua can be interfaced with C, C++, CUDA, Java, Python and many other languages. This makes it easy to integrate into existing systems, i.e. to deploy. I agree that it has poor visualisation libraries. So 4 and 1/2 stars instead of 3?

Deployment Stars

Well written article!

I would differ on the Model Deployment stars for Torch though : Torch runs on LuaJIT, and LuaJIT runs on a shocking number of platforms: x86, ARM, PS3, PS4, XBox, PPC, MIPS, Android, iOS, embedded devices.

I would suggest boosting the stars to reflect this.

Computational graph approach for Torch

Torch does have a computational graph approach available via nngraph, or Autograd. On the flip side, Lasagne was created to add layers for Theano. I also prefer the computational graph as an abstraction for neural networks, but clearly both sides have their pros and cons. It should be worth mentioning this, especially considering a merge of nn and nngraph is on the roadmap.

On the other weakness, it's not part of the distributed packages, but there is also a Python interface of sorts. If you can at least write a #subjective tag that would be good - programming languages and the ecosystems that come with them are subject to a wide range of opinions.

Is there this type of deep learning model?

Is there this type of deep learning model?
There are two labeled folders for binary classification.
ex) men and women, cats oand dogs, etc.
And then inserting the images to each folders as training data.
And then just run a simple command to train.
That’s all.
I need these simple training network model. Is there any?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.