Comments (17)

zer0n commented on June 11, 2024

It looks like the backward pass is implemented here.

If you don't implement full forward/backward for new layers, how does Torch know what to compute? As far as I know, it doesn't have auto-diff.

Please correct if I'm missing something. Thanks @soumith !

soumith commented on June 11, 2024

It's because he manually manages his RNN (a design choice for his project) instead of adding it to a standard container.
If he used standard nn or nngraph containers, he would only need a single :backward call that handles everything for him, as in the sketch below.
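
A minimal sketch of what that looks like with a standard nn container; the layer sizes here are arbitrary, purely for illustration:

```lua
require 'torch'
require 'nn'

-- A small toy network assembled in a standard container.
local model = nn.Sequential()
   :add(nn.Linear(10, 20))
   :add(nn.Tanh())
   :add(nn.Linear(20, 2))

local x = torch.randn(10)
local out = model:forward(x)

-- One :backward call propagates gradients through every layer and
-- accumulates gradWeight/gradBias in each Linear module.
local gradInput = model:backward(x, torch.ones(2))
```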

soumith commented on June 11, 2024

The only difference between Theano's autograd and Torch's autograd used to be the granularity at which autograd was done. Theano does it at the individual-operation level; Torch does it at the level of nn.* operations. Torch now has a package that does autograd similarly to Theano: https://github.com/twitter/torch-autograd. It takes an arbitrary function of forward operations and generates the backward function.
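
A minimal sketch of that style, assuming torch-autograd's callable-module API (the function and parameter names here are illustrative, not the package's):

```lua
require 'torch'
local grad = require 'autograd'

-- An arbitrary forward function built from plain torch.* operations.
local function loss(params, x)
   local h = torch.tanh(torch.cmul(params.w, x) + params.b)
   return torch.sum(h)
end

-- autograd generates the backward function from the forward code.
local dloss = grad(loss)

local params = { w = torch.randn(5), b = torch.randn(5) }
local grads, l = dloss(params, torch.randn(5))
-- grads.w and grads.b now hold the gradients of the loss
```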

zer0n commented on June 11, 2024

OK, I'll withhold my comment about Torch's RNNs until I've examined them further. I think we both agree that Theano and Torch do autograd at different granularities. Since Theano does it at the operation level, it gives higher flexibility, although the network definition can be more verbose.

I'm aware of autograd, but it's an extension (which I haven't looked into yet). As stated in the abstract, I only compared the current state of the libraries.

soumith commented on June 11, 2024

@zer0n if you "only compared the current state of the libraries", then Torch is an ndarray library and nothing more. nn is an extension, cutorch is an extension, nngraph is an extension, and so on. Torch is nothing without its ecosystem, and neither is Caffe; ignoring those packages just because they are shipped in a modular way makes for a fairly unfair comparison.

zer0n commented on June 11, 2024

I hear you. I'll dig into autograd and update later. Give me time.

pranv commented on June 11, 2024

You should also give Torch some points for having stateful RNNs. They're sort of hard to do in Theano, and popular Theano frameworks don't support them fully.
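
For context, a minimal sketch of what "stateful" means here, using plain nn modules (sizes and names are illustrative): the hidden state lives with the caller and persists across calls instead of being reset for every sequence.

```lua
require 'torch'
require 'nn'

-- One step of a simple recurrent cell: h_t = tanh(W * [x_t, h_{t-1}]).
local inputSize, hiddenSize = 128, 64
local cell = nn.Sequential()
   :add(nn.JoinTable(1))                        -- concatenate {x_t, h_{t-1}}
   :add(nn.Linear(inputSize + hiddenSize, hiddenSize))
   :add(nn.Tanh())

-- The caller owns the hidden state, so it survives across calls.
local h = torch.zeros(hiddenSize)
local function step(x)
   h = cell:forward({x, h}):clone()             -- carry state to the next step
   return h
end

step(torch.randn(inputSize))                    -- state now reflects input 1
step(torch.randn(inputSize))                    -- ...and inputs 1 and 2
```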

zer0n commented on June 11, 2024

@soumith I've been running this example of torch-autograd. The performance is horrendous. I haven't dug into the code, but it seems that torch-autograd doesn't do symbolic differentiation a priori. Can you verify the performance on your system?

Mine takes about an hour just to get through one epoch. No GPU (I'm testing on a VM), but it's eating all the CPU cores.

soumith commented on June 11, 2024

@zer0n just ran it. Are you looking at performance wrt speed or accuracy?

soumith commented on June 11, 2024

It seems to be taking 2.2 s per 1000 samples, quite horribly slow. The example is also using batch size = 1 on a tiny network (3 layers, 16 feature maps). I don't know what realistic benchmark can be run with that. Modifying it to use:

  • mini-batch inputs
  • a slightly larger network

should increase the throughput by a non-trivial amount.

Modifying the existing network to use mini-batch size 100 brings it to 0.5 s per 1000 samples.
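
A hedged sketch of that kind of mini-batch change (a toy linear model, not the actual example's network; all names are illustrative):

```lua
require 'torch'
local grad = require 'autograd'

-- Toy batched loss: mean squared error of a linear model over a mini-batch.
local function loss(params, x, y)
   local pred = x * params.W                    -- (100 x 10) batch of scores
   return torch.sum(torch.pow(pred - y, 2)) / x:size(1)
end

local dloss = grad(loss)
local params = { W = torch.randn(784, 10) }

-- One step over a mini-batch of 100 samples instead of a single sample.
local x = torch.randn(100, 784)
local y = torch.randn(100, 10)
local grads = dloss(params, x, y)
params.W:add(-0.01, grads.W)                    -- plain SGD update
```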

> I haven't dug into the code but it seems that torch autograd doesn't do symbolic differentiation a priori.

By "a priori", do you mean compiling the backward graph beforehand? No, it doesn't do that. It runs backward on the fly.

zer0n commented on June 11, 2024

Yeah, I saw that it was using mini-batch size 1 as well.

Yes, I meant that it doesn't compile the backprop beforehand. So I don't see how it is competitive with the symbolic-differentiation approach.

soumith commented on June 11, 2024

Hmm, I don't quite understand. It is doing symbolic differentiation. It can do it at the granularity of torch.* operations (which are arbitrary ndarray operations) or at the granularity of nn.* operations (which have compiled backward calls already written in the nn package).

I hope I didn't misunderstand "I don't see how it is competitive with the symbolic differentiation approach."
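
To make the two granularities concrete, a rough sketch (the autograd call is an assumption based on the package's callable-module usage; the nn.Tanh forward/backward pair is standard nn):

```lua
require 'torch'
require 'nn'
local grad = require 'autograd'

-- (a) torch.* granularity: autograd records every tensor operation and
-- differentiates the chain op by op.
local function f(params, x)
   return torch.sum(torch.tanh(torch.cmul(params.w, x)))
end
local df = grad(f)
local grads = df({ w = torch.randn(5) }, torch.randn(5))

-- (b) nn.* granularity: a whole module is one unit whose backward is the
-- hand-written implementation shipped in the nn package.
local m = nn.Tanh()
local y = m:forward(torch.randn(5))
local gradInput = m:backward(torch.randn(5), torch.ones(5))
```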

soumith commented on June 11, 2024

It doesn't optimize its symbolic graph by fusing symbols together, etc., if that's what you mean. In that sense, yeah, it doesn't have those optimizations yet, by the looks of it.

zer0n commented on June 11, 2024

@soumith I didn't even mean that level of optimization yet. The lack of compilation means that many computational steps are repeated on every pass over the network (this and this).

They also highlighted those performance issues here, in the Performance section.

soumith commented on June 11, 2024

@zer0n yeah, it seems to suffer perf issues from being an early release, just like TF. Coincidentally, they seem to have JUST overhauled the library with better perf and debugging. Maybe it's a rapid work in progress: https://twitter.com/clmt/status/666348268327186433

zer0n commented on June 11, 2024

FYI, I've just revamped the review. Please check if my comments on Torch are fair.

soumith commented on June 11, 2024

@zer0n the review overall looks correct now, not just for Torch but also for TensorFlow and Theano. Thanks a lot for revamping it, and thanks for spending all that time on it.
