Comments (12)
from autodiff.
I will start by profiling the example from #37, and then write the results here!
from autodiff.
One thing we need to address is to populate the derivatives directly in the variables themselves, instead of collecting them using Derivatives, which is a std::function that performs a search in a std::unordered_set for the variable whose derivative we are fetching.
What I mean is to replace this:
var x = 1.0;
var y = 2.0;
var z = 3.0;
var u = f(x, y, z);
Derivatives dud = derivatives(u);
double dudx = dud(x);
double dudy = dud(y);
double dudz = dud(z);
with this:
var x = 1.0;
var y = 2.0;
var z = 3.0;
var u = f(x, y, z);
derivatives(u);
double dudx = x.derivative();
double dudy = y.derivative();
double dudz = z.derivative();
Not sure yet if the method names above are the appropriate ones.
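To make the proposal concrete, here is a minimal, self-contained sketch (not autodiff's actual internals; the names Node, Var, and the recursive propagate are illustrative assumptions) of a reverse-mode var where each node stores its own accumulated derivative, so fetching a derivative is a plain member access instead of a hash-table lookup:

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical sketch: each expression node stores its own accumulated
// derivative, so no unordered_set/unordered_map search is needed afterwards.
struct Node {
    double val = 0.0;
    double grad = 0.0;  // derivative stored in the node itself
    std::vector<std::pair<std::shared_ptr<Node>, double>> parents;  // (input, local partial)
};

struct Var {
    std::shared_ptr<Node> node;
    explicit Var(double v) : node(std::make_shared<Node>()) { node->val = v; }
    double value() const { return node->val; }
    double derivative() const { return node->grad; }  // proposed accessor from this issue
};

Var operator*(const Var& a, const Var& b) {
    Var r(a.value() * b.value());
    r.node->parents = {{a.node, b.value()}, {b.node, a.value()}};
    return r;
}

Var operator+(const Var& a, const Var& b) {
    Var r(a.value() + b.value());
    r.node->parents = {{a.node, 1.0}, {b.node, 1.0}};
    return r;
}

// Seed the output with 1 and push derivatives down into each node in place.
void derivatives(const Var& u) {
    std::function<void(Node*, double)> propagate = [&](Node* n, double seed) {
        n->grad += seed;  // accumulate the contribution of this path
        for (auto& [p, w] : n->parents) propagate(p.get(), seed * w);
    };
    propagate(u.node.get(), 1.0);
}
```

With this sketch, the second code snippet above works as written: after derivatives(u), each input returns its derivative via x.derivative() with no map lookup.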
Another thing to check is how much dynamic memory allocation we could prevent during the evaluation of expressions involving var types (or maybe think about a pool of pre-allocated var objects from which the variables are taken). With dual, this is not a concern, because dual numbers are allocated on the stack, and the expressions are optimized with template meta-programming techniques.
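The pool idea could look roughly like the following sketch (the ExprNode type, NodePool name, and fixed capacity are all assumptions for illustration, not an existing autodiff facility): nodes come from one up-front allocation and are recycled between evaluations, so building an expression tree performs no per-node heap allocation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Illustrative placeholder for whatever per-node state an expression needs.
struct ExprNode {
    double val = 0.0;
    double grad = 0.0;
};

// Hypothetical fixed-capacity pool: one allocation up front, slots handed
// out in order, and reset() recycles everything before the next evaluation.
template <std::size_t N>
class NodePool {
public:
    ExprNode* acquire() {
        assert(used_ < N && "pool exhausted");
        return &storage_[used_++];  // hand out pre-allocated slots in order
    }
    void reset() { used_ = 0; }     // recycle all nodes; pointers stay valid
    std::size_t used() const { return used_; }

private:
    std::array<ExprNode, N> storage_{};
    std::size_t used_ = 0;
};
```

A real design would need to decide how pooled nodes interact with shared ownership and tree reuse, but the core point is that acquire() is just a pointer bump.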
Note that var and dual serve different purposes: var is for gradient computations of scalar functions of multiple variables, and dual is for directional derivatives of both scalar and vector functions, of either single or multiple variables.
from autodiff.
Excellent @supersega! This indeed shows that we need to get rid of this usage of std::unordered_map inside the Derivatives type. I think we can save the derivative values in the variables themselves.
from autodiff.
By the way, which profiling software is this?
from autodiff.
This is the profiler from Visual Studio 2017.
from autodiff.
Yes, I think we can store the derivative in var. But I'm not clear on how to deal with derivativesx().
from autodiff.
I think derivativesx() (which allows higher-order derivatives using var) could become deprecated, because I think reverse mode is not the best approach for higher-order derivatives. But still, I think there should be an equivalent way to do this, which we could think about later.
from autodiff.
Hi, I've come back from vacation and continued working on this issue. I implemented the approach of storing derivatives in the expression and found that it boosts performance by a factor of two.
from autodiff.
Hi @allanleal and @supersega
I've been thinking about this too -- I have one small idea and one blue-sky idea.
- We could move the map to be a member of var, then have propagate run during construction with respect to an expression. That way, the derivative map is stored in u, the "assigned-to" variable in u = f(x).
This won't save the heap allocations, nor decrease the hashing (the bottleneck in @supersega's profiling above), but could remove the dependency on std::function.
A basic program might look like:
var x = 2.0;
var u = f(x); // builds Expr tree and propagates derivative info
double dudx = u.derivative(x); // value of du/dx at x = 2.0
Let me know what you think and I can push my ideas to my fork.
- My biggest issue with reverse mode is that we often want to update an input variable and find the derivative at the new point (i.e. gradient ascent). In the current implementation (as far as I can tell), this involves creating a complete tree of Expr variables again, on the heap, then building a new DerivativesMap using propagate on the var containing the VariableExpr.
It would be interesting if there were a way of updating input variables using the already-built tree and map! (I guess some form of tree where the leaves know their parents, to propagate value information.)
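A tiny sketch of that reuse idea, under the assumption that the shape of the computation does not change between evaluations (the Node type, op callback, and reeval/make_square names are hypothetical, not autodiff API): the tree is built once, leaf values are updated in place, and values are recomputed bottom-up without any new heap allocation.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical expression node that can be re-evaluated in place.
struct Node {
    double val = 0.0;
    std::vector<std::shared_ptr<Node>> kids;
    // op recomputes this node's value from its children; leaves leave it null.
    double (*op)(const Node&) = nullptr;
};

// Recompute the whole tree bottom-up, reusing the already-built nodes.
double reeval(Node& n) {
    for (auto& k : n.kids) reeval(*k);  // refresh children first
    if (n.op) n.val = n.op(n);          // interior node: recompute from kids
    return n.val;
}

// Build u = x * x once; afterwards only x->val changes between evaluations.
std::shared_ptr<Node> make_square(std::shared_ptr<Node> x) {
    auto u = std::make_shared<Node>();
    u->kids = {x, x};
    u->op = [](const Node& n) { return n.kids[0]->val * n.kids[1]->val; };
    return u;
}
```

Derivative propagation could be refreshed the same way after each reeval; the hard part, as discussed below in the thread for other reasons too, is that this only works while the tree's shape stays valid for the new input values.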
from autodiff.
Hi @ludkinm - the reverse mode algorithm does need improvement at the moment, and I think only minor changes are needed to improve significantly from where we are: as you suggest, the derivative info would be collected from the variable itself, instead of from the DerivativesMap object (like you show in the example).
As for re-using the expression tree instead of building a new one each time: this will only work as long as there are no branches in the computation (there could possibly be other conditions too that would demand a full tree reconstruction). Below is an example of a function (pseudo-code) that would need the expression tree to be reconstructed at each function call (in the current implementation of the reverse algorithm, the reconstruction happens anyway, even if there are no branches):
var f(var x) { return (x > 0) ? x**x : x*x*x; }
Consider what happens below:
var x, u;
x = 1.0;
u = f(x); // expression tree constructed for x*x
x = -1.0;
u = f(x); // expression tree constructed for x*x*x
I'm not saying it is impossible to somehow cache the expression tree. It would be great if we could keep improving the reverse mode as much as the forward mode (which has received more attention from my side, given that it is what I need most). There is a big PR coming with even more interesting things on the forward mode side (higher-order directional derivatives, Taylor series, etc.).
@supersega has also proposed a new reverse algorithm (via a new number type) which would complement the existing var type.
We can all discuss ideas more in-depth on autodiff's Gitter community channel.
Just thought about something: we could have a ConditionExpr:
var f(var x) { return condition(x > 0, x**x, x*x*x); }
We should then overload condition so that if the number types are generic (using templates), condition would act as the standard ternary operator. That is, condition(x > 0, x**x, x*x*x) === (x > 0) ? x**x : x*x*x.
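For the plain-number case, that overload could be as simple as the sketch below (a hypothetical signature for illustration; the var overload that builds an actual ConditionExpr node, which would re-check the predicate on each evaluation, is not shown):

```cpp
#include <cassert>
#include <cmath>

// Hypothetical generic condition(): for ordinary numeric types it collapses
// to the standard ternary operator. A var overload would instead create a
// ConditionExpr node in the expression tree.
template <typename T>
T condition(bool test, const T& when_true, const T& when_false) {
    return test ? when_true : when_false;
}

// The branching example from above, written with condition() instead of ?:
// (using std::pow for x**x, since C++ has no ** operator).
double f(double x) {
    return condition(x > 0, std::pow(x, x), x * x * x);
}
```

Note that with eager arguments both branches are evaluated before condition() is called, so a real ConditionExpr would also need to defer evaluation if either branch is expensive or invalid outside its region.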
from autodiff.