Comments (9)
It took 2s here :)
Below is the compilation time for the mentioned examples using the reverse.hpp and forward.hpp header files:
example-reverse-single-variable-function.cpp
real 0m2.007s
user 0m1.840s
sys 0m0.084s
example-forward-single-variable-function.cpp
real 0m0.666s
user 0m0.584s
sys 0m0.040s
Clearly, forward mode compiles a lot faster, which is interesting given that it uses a lot of template metaprogramming.
I'll see what I can do to speed up the compilation when using reverse.hpp, but I can't promise when! ;)
Hopefully it should be just a few minor things to change (fingers crossed)!
from autodiff.
I'd like to use forward.hpp but I find it a bit trickier to use...
Thanks for taking a look!
I'd like to use forward.hpp but I find it a bit trickier to use...
I'll be investing quite some time in the future in the foward algorithm, because this is what I'll actually need for my research. But, yes, I'll also support the reverse algorithm and improve it as much as I can.
Let me know what makes the forward mode more difficult for your use case, and I'll try to simplify it.
About forward mode:
If I understand correctly, in forward mode you calculate the derivative of a function rather than a variable. So
dual u = f(x); // the output variable u
VectorXd dudx = gradient(f, x); // evaluate the gradient vector du/dx
whereas in reverse mode you use a variable, i.e.
var y = f(x); // the output variable y
VectorXd dydx = gradient(y, x); // evaluate the gradient vector dy/dx
Reverse mode is easier to use, I think, because you don't have to worry about how the final variable was built. So for example I could say
var y = f(x) * 2.0 * x;
It's a lot less restrictive. I'm finding it quite hard to convert my code to use forward mode because of this.
Am I making sense?
To further illustrate, here is some code I am using in reverse mode:
var w = energy(m, D);
VectorXvar u(2*m.nn);
for (int i = 0; i < m.nn; i++)
{
    u(2*i) = m.one[i];
    u(2*i+1) = m.two[i];
}
VectorXd dwdu = gradient(w, u);
So my objective variable is w, which is calculated from my energy function. Inside is a structure m which contains all the dependent variables, two lists called one and two. I want to get the derivatives of w w.r.t. these dependent variables.
If I understand correctly, doing this with forward mode would require constructing the energy function to take in each variable, i.e. energy(double var1, double var2, double var3, ... etc.), in order to get the gradient, which I cannot do since the number of dependent variables, well, varies.
I am making some assumptions: D is a constant that you don't want derivatives w.r.t. (so I've dropped it for now), and your struct m has only one and two, which are both (raw) arrays of VectorXd, both of the same length.
I think if you refactor a bit and use std::vector (or std::array if you know the size at compile time rather than run time) you could do:
#include <iostream>
#include <eigen3/Eigen/Core>
#include <autodiff/autodiff/forward/forward.hpp>
#include <autodiff/autodiff/forward/eigen.hpp>

using namespace std;
using namespace Eigen;
using namespace autodiff;
using namespace autodiff::forward;

// A version of the struct - could use std::array if you know the size nn at compile time.
struct M {
    uint nn;
    vector<VectorXdual> one;
    vector<VectorXdual> two;
    M(uint n) : nn(n), one(n), two(n) {}
};

// Just an example energy function
dual energy(M m)
{
    dual out = 0.0;
    for (uint i = 0; i < m.nn; ++i) {
        out += m.one[i].squaredNorm();
        out -= m.two[i].squaredNorm();
    }
    return out;
}

int main()
{
    // Construct the struct and fill in the vectors
    M m(10);
    for (uint i = 0; i < m.nn; ++i) {
        m.one[i] = VectorXdual::Ones(5);
        m.two[i] = 2.0 * VectorXdual::Ones(5);
    }

    dual w;
    VectorXd g;

    g = gradient(energy, wrt(m.one[0]), at(m), w);
    cout << g << endl; // expecting 2 2 2 2 2

    g = gradient(energy, wrt(m.two[0]), at(m), w);
    cout << g << endl; // expecting -4 -4 -4 -4 -4

    return 0;
}
Now, getting the derivative w.r.t. each variable requires a loop, but you would need one in the reverse case also, I believe.
Now, getting the derivative w.r.t. each variable requires a loop, but you would need one in the reverse case also, I believe.
This is true in the forward case, whose strength is to compute directional derivatives, the derivative along each individual variable being a specific case. To compute a directional derivative, only one function evaluation is necessary. If one wants this along all unit/variable directions, then a function evaluation is necessary for each such variable (i.e., for each unit direction, along x1, x2, ..., xn).
The reverse algorithm can compute the gradient of a multi-variable scalar function in one go. However, this does not mean it is going to be more efficient than the forward algorithm! Reason: the reverse algorithm usually relies on run-time memory allocation and construction of expression trees at run-time; things that can slow down the computation a lot.
I have an idea of using directional derivative evaluations (and thus forward automatic differentiation) to compute the gradient of multi-variable scalar functions without requiring n function evaluations. In case the gradient is constantly updated (e.g., in an optimization calculation), the previous gradient vector would be used to calculate the new one, perhaps requiring just a few extra forward directional derivatives. However, I have no time to implement this now. In fact, this would not live in autodiff, but in another numerical library I'm planning, which would use autodiff::dual types.
autodiff would then be specialized for forward mode derivative calculations. Those requiring the gradient of scalar multi-variable functions would then be able to compute it with the other lib, without constructing these expression trees at run-time and dealing with dynamic memory allocation (i.e., without using the var type or its equivalent in other automatic differentiation libraries). By relying always on the forward mode algorithm of autodiff, this would imply the use of only stack-allocated variables and the use of template metaprogramming techniques to optimize the compile-time expression trees (e.g., avoiding temporaries, optimizing trivial expressions and identities, organizing the order of operations, etc.).
@kewp said
I want to get the derivatives of w w.r.t. these dependent variables.
All I meant was that one would require something like derivative[i] = .... I was not implying it would be more efficient than backward mode - just that accessing the derivatives would require a loop.
autodiff would then be specialized for forward mode derivative calculations.
Are you implying that you will drop backward mode from the library?
Are you implying that you will drop backward mode from the library?
var should continue to be maintained here, but users would be informed of a potentially more efficient gradient evaluation algorithm (one that actually uses the forward algorithm, but does not require n function evaluations) in that other numerical lib mentioned above.