
Comments (6)

patrick-kidger commented on June 19, 2024

This should be what you get when doing optx.minimise(..., solver=optx.Newton()). :) (EDIT: wrong way around, whoops, see below.)

On line searches: yup, we support a few of these, in particular backtracking.


ataras2 commented on June 19, 2024

My understanding was that optx.Newton() was a root finder, not a minimiser (though I guess minimising = root-finding on the gradient...). I'm looking for something like this - there are almost too many Newton algorithms*

Minimal example below showcases some of my confusion - it works with the BFGS solver but not with Newton (since the function has no root, only a minimum).

import optimistix as optx
import jax.numpy as np

gamma = 10

def fn(x, args):
    return x[0]**2 + gamma*x[1]**2 + 5

# solver = optx.Newton(1e-3, 1e-3)
solver = optx.BFGS(1e-3, 1e-3)
res = optx.minimise(fn, solver, y0=np.array([1.0, 1.0]), max_steps=int(1e4))

print(res.stats["num_steps"], res.value)

Though I now also see that this works:

import jax
solver = optx.Newton(1e-3, 1e-3)
res = optx.root_find(jax.grad(fn), solver, y0=np.array([1.0, 1.0]), max_steps=int(1e4))

print(res.stats["num_steps"], res.value)

*please eliminate three. P.S. I am not a crackpot


packquickly commented on June 19, 2024

We deliberately kept Newton as a root-finder and not a general-purpose minimisation algorithm. Newton's method for minimisation refers to the minimiser you get by applying Newton's method for root-finding to the gradient of the function, as you've done in your example. This means for non-convex problems Newton is happy to converge to any critical point, and not just minima. We've excluded it to deter people from using it without knowing this, and to encourage use of quasi-Newton methods instead. Quasi-Newton methods converge to non-minimal critical points less frequently.
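
For example, a minimal sketch of that failure mode, using only the optx.root_find pattern discussed above (the function and tolerances are purely illustrative): Newton applied to the gradient of a saddle-shaped function converges straight to the saddle point.

import jax
import jax.numpy as jnp
import optimistix as optx

# (0, 0) is a critical point of this function, but a saddle rather than a minimum.
def fn(x, args):
    return x[0]**2 - x[1]**2

solver = optx.Newton(rtol=1e-6, atol=1e-6)
res = optx.root_find(jax.grad(fn), solver, y0=jnp.array([1.0, 1.0]))
print(res.value)  # ~[0., 0.]: Newton happily "converges" to the saddle.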

If you want to apply Newton's method regardless, you can apply it to the gradient as you've done. Unfortunately this doesn't support line searches. You could implement a custom Newton solver with line search quite easily by subclassing AbstractMinimiser and providing the true Hessian in the function info. Take a look at the code for any of the existing minimisers for how to do this, and let me know if you run into trouble.


ataras2 commented on June 19, 2024

That all makes sense, thanks for clarifying!
Would Newton solvers that exploit structure (perhaps at the cost of some ease of use) be in scope too? Or maybe these are already out there and I'm missing them? For example, a solver for a minimisation problem whose Hessian is an arrow matrix. In my mind this could involve reformatting the loss function from fn(x) to fn(sparse_params, dense_params) and providing a function that can invert the Hessian of the sparse params using its structure.


patrick-kidger commented on June 19, 2024

I think this should be doable fairly straightforwardly.

First of all, as a similar example of what is already supported, here's a root-find using Newton's method and a diagonal Jacobian. (Which takes the place of the Hessian when doing a root-find rather than a minimisation.)

import lineax as lx
import optimistix as optx
import jax.numpy as jnp
from jaxtyping import Array, Float

def fn(x: Float[Array, "2"], _) -> Float[Array, "2"]:
    x0, x1 = x
    return jnp.stack([x0**2, (x1 - 3)**2])

# The Jacobian of fn is diagonal, so Newton can use a diagonal linear solve.
solver = optx.Newton(rtol=1e-5, atol=1e-5, linear_solver=lx.Diagonal())
y0 = jnp.array([1., 2.])
sol = optx.root_find(fn, solver, y0, tags=frozenset({lx.diagonal_tag}))
print(sol.value)

Here you can see that we use Lineax to handle linear operators and linear solves.

So in order to support the example you have in mind, you would need to:

  1. Add an "arrow tag" to indicate the structure of your matrix: arrow_tag = object(). (Akin to the lx.diagonal_tag above.)
  2. Create an Arrow solver that knows how to solve matrices with that structure. (Akin to the lx.Diagonal above.)

Neither of these requires any changes to Lineax itself, as every built-in solver is defined in the exact same way as one that is defined by a user.
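
To make the "Arrow solver" half concrete, here is a rough sketch of the O(n) linear algebra such a solver would wrap, written in plain jax.numpy rather than as a Lineax solver class, and assuming a symmetric arrowhead matrix (diagonal apart from a dense last row and column). The solve_arrowhead name is just for illustration.

import jax.numpy as jnp

# Step 1 above: a tag object marking arrowhead structure.
arrow_tag = object()

def solve_arrowhead(matrix, b):
    # Solve A x = b for a symmetric arrowhead A = [[D, z], [z^T, alpha]],
    # with D diagonal, via block elimination on the Schur complement of D.
    d = jnp.diag(matrix)[:-1]   # diagonal block D
    z = matrix[:-1, -1]         # dense last column (= last row, by symmetry)
    alpha = matrix[-1, -1]      # bottom-right corner
    b1, b2 = b[:-1], b[-1]
    w = b1 / d
    u = z / d
    x_last = (b2 - z @ w) / (alpha - z @ u)
    x_rest = w - u * x_last
    return jnp.append(x_rest, x_last)

A = jnp.array([[2.0, 0.0, 1.0],
               [0.0, 3.0, 2.0],
               [1.0, 2.0, 5.0]])
b = jnp.array([1.0, 1.0, 1.0])
print(solve_arrowhead(A, b))  # agrees with jnp.linalg.solve(A, b)

An Arrow solver in the sense of step 2 would perform this solve internally, with the tag on the operator advertising that the structure is present.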

Finally, you can of course do your minimisation by doing a root-find on the gradient, as discussed above. (Which is still generally discouraged as @packquickly notes, but that's up to you. :) )


ataras2 commented on June 19, 2024

Thanks, I'll be sure to have a closer look at Lineax too!

