
Comments (6)

patrick-kidger commented on June 19, 2024

This should be what you get when doing optx.minimise(..., solver=optx.Newton()). :) (EDIT: wrong way around, whoops, see below.)

On line searches: yup, we support a few of these, in particular backtracking.


ataras2 commented on June 19, 2024

My understanding was that optx.Newton() was a root finder, not a minimiser (though I guess minimising = root-finding on the gradient...). I'm looking for something like this - there are almost too many Newton algorithms*

Minimal example below showcases some of my confusion - it works with the BFGS solver but not with Newton (since the function has no root, only a minimum).

import optimistix as optx
import jax.numpy as np

gamma = 10

def fn(x, args):
    return x[0]**2 + gamma*x[1]**2 + 5

# solver = optx.Newton(1e-3, 1e-3)
solver = optx.BFGS(1e-3, 1e-3)
res = optx.minimise(fn, solver, y0=np.array([1.0, 1.0]), max_steps=int(1e4))

print(res.stats["num_steps"], res.value)

Though I now also see that this works:

import jax
solver = optx.Newton(1e-3, 1e-3)
res = optx.root_find(jax.grad(fn), solver, y0=np.array([1.0, 1.0]), max_steps=int(1e4))

print(res.stats["num_steps"], res.value)

*please eliminate three. P.S. I am not a crackpot


packquickly commented on June 19, 2024

We deliberately kept Newton as a root-finder and not a general-purpose minimisation algorithm. Newton's method for minimisation refers to the minimiser you get by applying Newton's method for root-finding to the gradient of the function, as you've done in your example. This means for non-convex problems Newton is happy to converge to any critical point, and not just minima. We've excluded it to deter people from using it without knowing this, and to encourage use of quasi-Newton methods instead. Quasi-Newton methods converge to non-minimal critical points less frequently.
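
For example, a minimal sketch of that failure mode, using only the optx.root_find pattern discussed above (the function and tolerances are purely illustrative): Newton applied to the gradient of a saddle-shaped function converges straight to the saddle point.

import jax
import jax.numpy as jnp
import optimistix as optx

# (0, 0) is a critical point of this function, but a saddle rather than a minimum.
def fn(x, args):
    return x[0]**2 - x[1]**2

solver = optx.Newton(rtol=1e-6, atol=1e-6)
res = optx.root_find(jax.grad(fn), solver, y0=jnp.array([1.0, 1.0]))
print(res.value)  # ~[0., 0.]: Newton happily "converges" to the saddle.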

If you want to apply Newton's method regardless, you can apply it to the gradient as you've done. Unfortunately this doesn't support line searches. You could implement a custom Newton solver with line search quite easily by subclassing AbstractMinimiser and providing the true Hessian in the function info. Take a look at the code for any of the existing minimisers for how to do this, and let me know if you run into trouble.


ataras2 commented on June 19, 2024

That all makes sense, thanks for clarifying!
Would Newton solvers that exploit structure (perhaps at the cost of some ease of use) be in scope too? Or maybe these are already out there and I'm missing them? For example, a solver for a minimisation problem whose Hessian is an arrow matrix. In my mind this could involve reformatting the loss function from fn(x) to fn(sparse_params, dense_params) and providing a function that can invert the Hessian of the sparse params using its structure.


patrick-kidger commented on June 19, 2024

I think this should be doable fairly straightforwardly.

First of all, as a similar example of what is already supported, here's a root-find using Newton's method and a diagonal Jacobian. (Which takes the place of the Hessian when doing a root-find rather than a minimisation.)

import lineax as lx
import optimistix as optx
import jax.numpy as jnp
from jaxtyping import Array, Float

def fn(x: Float[Array, "2"], _) -> Float[Array, "2"]:
    x0, x1 = x
    return jnp.stack([x0**2, (x1 - 3)**2])

# The Jacobian of fn is diagonal, so Newton can use a diagonal linear solve.
solver = optx.Newton(rtol=1e-5, atol=1e-5, linear_solver=lx.Diagonal())
y0 = jnp.array([1., 2.])
sol = optx.root_find(fn, solver, y0, tags=frozenset({lx.diagonal_tag}))
print(sol.value)

Here you can see that we use Lineax to handle linear operators and linear solves.

So in order to support the example you have in mind, you would need to:

  1. Add an "arrow tag" to indicate the structure of your matrix: arrow_tag = object(). (Akin to the lx.diagonal_tag above.)
  2. Create an Arrow solver that knows how to solve matrices with that structure. (Akin to the lx.Diagonal above.)

Neither of these requires any changes to Lineax itself, as every built-in solver is defined in the exact same way as one that is defined by a user.
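
To make the "Arrow solver" half concrete, here is a rough sketch of the O(n) linear algebra such a solver would wrap, written in plain jax.numpy rather than as a Lineax solver class, and assuming a symmetric arrowhead matrix (diagonal apart from a dense last row and column). The solve_arrowhead name is just for illustration.

import jax.numpy as jnp

# Step 1 above: a tag object marking arrowhead structure.
arrow_tag = object()

def solve_arrowhead(matrix, b):
    # Solve A x = b for a symmetric arrowhead A = [[D, z], [z^T, alpha]],
    # with D diagonal, via block elimination on the Schur complement of D.
    d = jnp.diag(matrix)[:-1]   # diagonal block D
    z = matrix[:-1, -1]         # dense last column (= last row, by symmetry)
    alpha = matrix[-1, -1]      # bottom-right corner
    b1, b2 = b[:-1], b[-1]
    w = b1 / d
    u = z / d
    x_last = (b2 - z @ w) / (alpha - z @ u)
    x_rest = w - u * x_last
    return jnp.append(x_rest, x_last)

A = jnp.array([[2.0, 0.0, 1.0],
               [0.0, 3.0, 2.0],
               [1.0, 2.0, 5.0]])
b = jnp.array([1.0, 1.0, 1.0])
print(solve_arrowhead(A, b))  # agrees with jnp.linalg.solve(A, b)

An Arrow solver in the sense of step 2 would perform this solve internally, with the tag on the operator advertising that the structure is present.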

Finally, you can of course do your minimisation by doing a root-find on the gradient, as discussed above. (Which is still generally discouraged as @packquickly notes, but that's up to you. :) )


ataras2 commented on June 19, 2024

Thanks, I'll be sure to have a closer look at Lineax too!

