Comments (17)

ChrisRackauckas commented on August 23, 2024

starter code:

using OrdinaryDiffEq, NeuralNetDiffEq, Plots, Flux

# Lotka-Volterra, in-place form
function f(du,u,p,t)
  du[1] = p[1]*u[1] - p[2]*u[1]*u[2]
  du[2] = -p[3]*u[2] + p[4]*u[1]*u[2]
end
# out-of-place form, used by the out-of-place problem below
function f(u,p,t)
  [p[1]*u[1] - p[2]*u[1]*u[2],-p[3]*u[2] + p[4]*u[1]*u[2]]
end

p = Float32[1.5,1.0,3.0,1.0]
u0 = Float32[1.0,1.0]
prob = ODEProblem(f,u0,(0f0,3f0),p)
prob_oop = ODEProblem{false}(f,u0,(0f0,3f0),p)

true_sol = solve(prob,Tsit5())

chain = Flux.Chain(Dense(1,32, σ), Dense(32,32,tanh), Dense(32,32,tanh), Dense(32,32,tanh), Dense(32,32,tanh), Dense(32,256, σ), Dense(256,1028, tanh), Dense(1028,1028, tanh), Dense(1028,length(u0)))
opt = Flux.Descent(0.00000001)
sol  = solve(prob_oop,NeuralNetDiffEq.NNODE(chain,opt),maxiters = 100, verbose = true, dt=1/5f0)
plot(true_sol)
plot!(sol)

ChrisRackauckas commented on August 23, 2024

From GCI:

using OrdinaryDiffEq, NeuralNetDiffEq, Plots, Flux


function f(du,u,p,t)
  du[1] = p[1]*u[1] - p[2]*u[1]*u[2]
  du[2] = -p[3]*u[2] + p[4]*u[1]*u[2]
end
function f(u,p,t)
  [p[1]*u[1] - p[2]*u[1]*u[2],-p[3]*u[2] + p[4]*u[1]*u[2]]
end

p = Float32[1.5,1.0,3.0,1.0]
u0 = Float32[1.0,1.0]
prob = ODEProblem(f,u0,(0f0,3f0),p)
prob_oop = ODEProblem{false}(f,u0,(0f0,3f0),p)

true_sol = solve(prob,Tsit5())

opt = ADAM(1e-03) #1e-04
# opt = NADAM()
# opt = Nesterov()
# opt = AMSGrad()
# chain = Chain(x -> reshape(x, length(x), 1, 1), Conv((1,), 1=>16, relu), Conv((1,), 16=>8, relu), x -> reshape(x, :, size(x, 4)), Dense(8, 10), softmax)


chain = Chain(
    x -> reshape(x, length(x), 1, 1),
    MaxPool((1,)),
    Conv((1,), 1=>16, relu),
    Conv((1,), 16=>16, relu),
    Conv((1,), 16=>32, relu),
    Conv((1,), 32=>64, relu),
    Conv((1,), 64=>256, relu),
    Conv((1,), 256=>256, relu),
    Conv((1,), 256=>1028, relu),
    Conv((1,), 1028=>1028),
    x -> reshape(x, :, size(x, 4)),
    Dense(1028, 512, tanh),
    Dense(512, 128, relu),
    Dense(128, 64, tanh),
    Dense(64, 2),
    softmax)

sol  = solve(prob_oop,NeuralNetDiffEq.NNODE(chain,opt),maxiters = 100, verbose = true, dt=1/5f0)

plot(true_sol)
plot!(sol)

It's still monotonic. For some reason things keep coming out monotonic!
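
A quick numerical check of that (a minimal sketch, assuming the `sol` returned above converts to a matrix like a standard ODESolution):

u = Array(sol)                     # 2 × n matrix of predictions on the time grid
ismono(v) = all(diff(v) .>= 0) || all(diff(v) .<= 0)
ismono(u[1, :]), ismono(u[2, :])   # (true, true) confirms both components come out monotone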

ChrisRackauckas commented on August 23, 2024

[screenshot of the resulting plot]

ChrisRackauckas commented on August 23, 2024

using OrdinaryDiffEq, DiffEqFlux, NeuralNetDiffEq, Plots, Optim, Flux

function f(du,u,p,t)
  du[1] = p[1]*u[1] - p[2]*u[1]*u[2]
  du[2] = -p[3]*u[2] + p[4]*u[1]*u[2]
end
function f(u,p,t)
  [p[1]*u[1] - p[2]*u[1]*u[2],-p[3]*u[2] + p[4]*u[1]*u[2]]
end

p = Float32[1.5,1.0,3.0,1.0]
u0 = Float32[1.0,1.0]
prob = ODEProblem(f,u0,(0f0,3f0),p)
prob_oop = ODEProblem{false}(f,u0,(0f0,3f0),p)

true_sol = solve(prob,Tsit5())

N = 128
chain = FastChain(FastDense(1,N,tanh), FastDense(N,N,tanh), FastDense(N,N,tanh), FastDense(N,length(u0)))
opt = BFGS()
sol  = solve(prob_oop,NeuralNetDiffEq.NNODE(chain,opt,autodiff=false,diffmode=DiffEqFlux.ReverseDiffMode()),
                                            maxiters = 1000, verbose = true,
                                            dt=1/5f0)
plot(true_sol)
plot!(sol)

sol  = solve(prob_oop,NeuralNetDiffEq.NNODE(chain,opt,autodiff=true),maxiters = 1000, verbose = true, dt=1/5f0)
plot(true_sol)
plot!(sol)

#=
sol  = solve(prob_oop,NeuralNetDiffEq.NNODE(chain,opt,autodiff=true,diffmode=DiffEqFlux.ForwardDiffMode()),
                                            maxiters = 1000, verbose = true,
                                            dt=1/5f0)
plot(true_sol)
plot!(sol)
=#

sol  = solve(prob_oop,NeuralNetDiffEq.NNODE(chain,opt,autodiff=false,diffmode=DiffEqFlux.TrackerDiffMode()),
                                            maxiters = 1000, verbose = true,
                                            dt=1/5f0)
plot(true_sol)
plot!(sol)

shows that using ReverseDiff fixes the convergence. It looks like Zygote is dropping gradients, specifically:

FluxML/Zygote.jl#231
FluxML/Zygote.jl#314

@dhairyagandhi96

CarloLucibello commented on August 23, 2024

Looks like Zygote is dropping gradients, specifically:

This should be fixed on Zygote master.

ChrisRackauckas commented on August 23, 2024

Has it? The issues weren't closed, though?

DhairyaLGandhi commented on August 23, 2024

Yeah, we'd closed both of those issues with the closures; this seems different from that.

ChrisRackauckas commented on August 23, 2024

FluxML/Zygote.jl#510 needs to get rebased, then. Whoever can force push, could you please?

DhairyaLGandhi commented on August 23, 2024

I was getting a tracked-array type mismatch error with the MWE here; is there some setup to see a clearer error? This is with masters of all the relevant packages.

ChrisRackauckas commented on August 23, 2024

I'm tagging everything; it's just Zygote that needs a special branch. There shouldn't be tracked arrays except with ReverseDiff?

ChrisRackauckas commented on August 23, 2024

#70 shows that naive training of a big network still doesn't work that well, though it's getting closer:

using OrdinaryDiffEq, DiffEqFlux, NeuralNetDiffEq, Plots, Optim, Flux

function f(du,u,p,t)
  du[1] = p[1]*u[1] - p[2]*u[1]*u[2]
  du[2] = -p[3]*u[2] + p[4]*u[1]*u[2]
end
function f(u,p,t)
  [p[1]*u[1] - p[2]*u[1]*u[2],-p[3]*u[2] + p[4]*u[1]*u[2]]
end

p = Float32[1.5,1.0,3.0,1.0]
u0 = Float32[1.0,1.0]
prob = ODEProblem(f,u0,(0f0,3f0),p)
prob_oop = ODEProblem{false}(f,u0,(0f0,3f0),p)
true_sol = solve(prob,Tsit5())

N = 128
chain = FastChain(FastDense(1,N,tanh), FastDense(N,N,tanh), FastDense(N,N,tanh), FastDense(N,length(u0)))
opt = ADAM(0.000001)
θ = DiffEqFlux.initial_params(chain)
sol  = solve(prob_oop,NeuralNetDiffEq.NNODE(chain,opt,θ,autodiff=true),maxiters = 10000, verbose = true, dt=1/5f0)
plot(true_sol)
plot!(sol)

using CuArrays
function f(u,p,t)
  cu([p[1]*u[1] - p[2]*u[1]*u[2],-p[3]*u[2] + p[4]*u[1]*u[2]])
end

p = cu(Float32[1.5,1.0,3.0,1.0])
u0 = cu(Float32[1.0,1.0])
N = 512
chain = FastChain(FastDense(1,N,tanh), FastDense(N,N,tanh), FastDense(N,N,tanh), FastDense(N,length(u0)))
θ = cu(DiffEqFlux.initial_params(chain))
prob_oop = ODEProblem{false}(f,u0,(0f0,3f0),p)
sol  = solve(prob_oop,NeuralNetDiffEq.NNODE(chain,opt,θ,autodiff=false),maxiters = 10000, verbose = true, dt=1/5f0)
plot(true_sol)
plot!(sol)

[plot of the true solution vs. the network prediction]

ChrisRackauckas commented on August 23, 2024

I think what is really required is an investigation of new training strategies. I'll open an issue about this.

ChrisRackauckas commented on August 23, 2024

Still can't do this one. Lotka-Volterra is the hardest one 😅

using NeuralPDE, OrdinaryDiffEq, DiffEqFlux, Flux, OptimizationPolyalgorithms

function f(u, p, t)
    [p[1] * u[1] - p[2] * u[1] * u[2], -p[3] * u[2] + p[4] * u[1] * u[2]]
end

p = [1.5, 1.0, 3.0, 1.0]
u0 = [1.0, 1.0]
prob_oop = ODEProblem{false}(f, u0, (0.0, 3.0), p)
true_sol = solve(prob_oop, Tsit5())

N = 1028
chain = FastChain(FastDense(1, N, softplus), FastDense(N, N, softplus), FastDense(N, N, softplus),
                  FastDense(N, N, softplus), FastDense(N, N, softplus), FastDense(N, N, softplus),
                  FastDense(N, N, softplus), FastDense(N, length(u0), softplus))
opt = ADAM(0.001)
θ = Float64.(DiffEqFlux.initial_params(chain))
alg = NeuralPDE.NNODE(chain, opt, θ; strategy = StochasticTraining(2000))
sol = solve(prob_oop, alg, verbose=true, maxiters = 200)

using Plots
plot(sol)
plot!(true_sol)

YichengDWu commented on August 23, 2024

This is because information propagates forward in time from the initial values according to the ODE system. Introducing points with a large time span will not help convergence. This is a simple problem that does not require a large network. NNODE should have "time stepping" like other ODE solvers. The problem can be solved easily if we train on [0,2] first and then train on [0,3], as in the sketch below. Alternatively, we can save the prediction at t=2 and retrain the NN on [2,3].
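
A minimal sketch of that two-stage idea at the Optimization.jl level, where `make_loss` is a hypothetical helper that builds the collocation loss for a given interval and `θ0` are the initial network parameters:

using Optimization, OptimizationOptimisers, Zygote

# Stage 1: warm up on the short interval [0, 2]
loss1 = make_loss((0.0, 2.0))   # hypothetical loss builder
optf1 = OptimizationFunction((θ, _) -> loss1(θ), Optimization.AutoZygote())
res1 = solve(OptimizationProblem(optf1, θ0), Adam(1e-3); maxiters = 1000)

# Stage 2: extend to [0, 3], warm-starting from the stage-1 parameters
loss2 = make_loss((0.0, 3.0))
optf2 = OptimizationFunction((θ, _) -> loss2(θ), Optimization.AutoZygote())
res2 = solve(OptimizationProblem(optf2, res1.u), Adam(1e-3); maxiters = 1000)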

ChrisRackauckas commented on August 23, 2024

I thought I tried that. Worth giving it another try. Another thing could be to just add a weight to the loss that biases it more heavily towards the front. That would then more naturally work itself out over the course of an optimization.
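
Something like the following could express that (a minimal sketch of the weighting idea, where `residual(θ, t)` is a hypothetical helper returning the network's ODE residual at time t):

ts = range(0.0, 3.0; length = 100)   # collocation grid on the time span
λ = 1.0
ws = exp.(-λ .* ts)                  # exponentially heavier weight near t = 0

# front-weighted mean-squared ODE residual
weighted_loss(θ) = sum(ws[i] * sum(abs2, residual(θ, ts[i])) for i in eachindex(ts)) / length(ts)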

YichengDWu commented on August 23, 2024

I would not use weighting to overcome this bias, as it would not capture the error in a narrow area no matter how it's weighted. Sampling more points is a natural way of weighting. Here is what I got:

using OrdinaryDiffEq

function f(u, p, t)
    return [p[1] * u[1] - p[2] * u[1] * u[2], -p[3] * u[2] + p[4] * u[1] * u[2]]
end

p = [1.5, 1.0, 3.0, 1.0]
u0 = [1.0, 1.0]
prob_oop = ODEProblem{false}(f, u0, (0.0, 3.0), p)
true_sol = solve(prob_oop, Tsit5(), saveat=0.01)

using ModelingToolkit
using Sophon, IntervalSets
using Optimization, OptimizationOptimJL, OptimizationOptimisers

@parameters t
@variables x(..), y(..)

Dₜ = Differential(t)

eqs = [Dₜ(x(t)) ~ p[1] * x(t) - p[2] * x(t) * y(t),
      Dₜ(y(t)) ~ -p[3] * y(t) + p[4] * x(t) * y(t)]

domain = [t ∈ 0 .. 3.0]

bcs = [x(0.0) ~ 1.0, y(0.0) ~ 1.0]

@named lotka_volterra = PDESystem(eqs, bcs, domain, [t], [x(t), y(t)])

chain = FullyConnected(1, 1, sin; hidden_dims=6, num_layers=2)
pinn = PINN(x = chain, y = chain)
sampler = QuasiRandomSampler(1, 1) # ignore this line
strategy = NonAdaptiveTraining()   # ignore this line

prob = Sophon.discretize(lotka_volterra, pinn, sampler, strategy)

# more points towards the front
data_1 = rand(1, 100)
data_2 = rand(1, 50) .+ 1.0
data_3 = rand(1, 10) .+ 2.0

data = [data_1 data_2 data_3]

prob.p[1] = data # manually change the dataset
prob.p[2] = data


function callback(p, l)
    println("Current loss is: $l")
    return false
end

res = solve(prob, BFGS(); callback=callback, maxiters=2000)

using Plots

phi = pinn.phi
ts = [true_sol.t...;;]
x_pred = phi.x(ts, res.u.x)
y_pred = phi.y(ts, res.u.y)

plot(vec(ts), vec(x_pred), label="x_pred")
plot!(vec(ts), vec(y_pred), label="y_pred")
plot!(true_sol)

[plot of x_pred and y_pred against the true solution]

YichengDWu commented on August 23, 2024

Additional note: the BFGS optimizer gets really confused by a changing loss function. It's necessary to re-instantiate the optimizer if the weights are updated.
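
For example (a sketch continuing the Sophon example above): rebuilding the solve from the last parameters resets BFGS's internal curvature estimate, which is stale once the dataset changes. The refreshed dataset here is hypothetical.

data_1 = rand(1, 200)   # hypothetical refreshed sampling, still front-heavy
data_2 = rand(1, 100) .+ 1.0
data_3 = rand(1, 20) .+ 2.0
prob.p[1] = [data_1 data_2 data_3]
prob.p[2] = [data_1 data_2 data_3]

prob2 = remake(prob; u0 = res.u)   # fresh problem, warm-started parameters
res2 = solve(prob2, BFGS(); callback = callback, maxiters = 2000)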
