Comments (17)

ChrisRackauckas commented on August 23, 2024

starter code:

using OrdinaryDiffEq, NeuralNetDiffEq, Plots, Flux

# Lotka-Volterra, in-place form
function f(du,u,p,t)
  du[1] = p[1]*u[1] - p[2]*u[1]*u[2]
  du[2] = -p[3]*u[2] + p[4]*u[1]*u[2]
end
# out-of-place form, used by the out-of-place problem below
function f(u,p,t)
  [p[1]*u[1] - p[2]*u[1]*u[2],-p[3]*u[2] + p[4]*u[1]*u[2]]
end

p = Float32[1.5,1.0,3.0,1.0]
u0 = Float32[1.0,1.0]
prob = ODEProblem(f,u0,(0f0,3f0),p)
prob_oop = ODEProblem{false}(f,u0,(0f0,3f0),p)

true_sol = solve(prob,Tsit5())

chain = Flux.Chain(Dense(1,32, σ), Dense(32,32,tanh), Dense(32,32,tanh), Dense(32,32,tanh), Dense(32,32,tanh), Dense(32,256, σ), Dense(256,1028, tanh), Dense(1028,1028, tanh), Dense(1028,length(u0)))
opt = Flux.Descent(0.00000001)
sol  = solve(prob_oop,NeuralNetDiffEq.NNODE(chain,opt),maxiters = 100, verbose = true, dt=1/5f0)
plot(true_sol)
plot!(sol)

ChrisRackauckas commented on August 23, 2024

From GCI:

using OrdinaryDiffEq, NeuralNetDiffEq, Plots, Flux


function f(du,u,p,t)
  du[1] = p[1]*u[1] - p[2]*u[1]*u[2]
  du[2] = -p[3]*u[2] + p[4]*u[1]*u[2]
end
function f(u,p,t)
  [p[1]*u[1] - p[2]*u[1]*u[2],-p[3]*u[2] + p[4]*u[1]*u[2]]
end

p = Float32[1.5,1.0,3.0,1.0]
u0 = Float32[1.0,1.0]
prob = ODEProblem(f,u0,(0f0,3f0),p)
prob_oop = ODEProblem{false}(f,u0,(0f0,3f0),p)

true_sol = solve(prob,Tsit5())

opt = ADAM(1e-03) #1e-04
# opt = NADAM()
# opt = Nesterov()
# opt = AMSGrad()
# chain = Chain(x -> reshape(x, length(x), 1, 1), Conv((1,), 1=>16, relu), Conv((1,), 16=>8, relu), x -> reshape(x, :, size(x, 4)), Dense(8, 10), softmax)


chain = Chain(
    x -> reshape(x, length(x), 1, 1),
    MaxPool((1,)),
    Conv((1,), 1=>16, relu),
    Conv((1,), 16=>16, relu),
    Conv((1,), 16=>32, relu),
    Conv((1,), 32=>64, relu),
    Conv((1,), 64=>256, relu),
    Conv((1,), 256=>256, relu),
    Conv((1,), 256=>1028, relu),
    Conv((1,), 1028=>1028),
    x -> reshape(x, :, size(x, 4)),
    Dense(1028, 512, tanh),
    Dense(512, 128, relu),
    Dense(128, 64, tanh),
    Dense(64, 2),
    softmax)

sol  = solve(prob_oop,NeuralNetDiffEq.NNODE(chain,opt),maxiters = 100, verbose = true, dt=1/5f0)

plot(true_sol)
plot!(sol)

It's still monotonic. For some reason things keep coming out monotonic!
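
A quick numerical check of that (a minimal sketch, assuming the `sol` returned above converts to a matrix like a standard ODESolution):

u = Array(sol)                     # 2 × n matrix of predictions on the time grid
ismono(v) = all(diff(v) .>= 0) || all(diff(v) .<= 0)
ismono(u[1, :]), ismono(u[2, :])   # (true, true) confirms both components come out monotone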

ChrisRackauckas commented on August 23, 2024

[screenshot of the resulting plot]

ChrisRackauckas commented on August 23, 2024

using OrdinaryDiffEq, DiffEqFlux, NeuralNetDiffEq, Plots, Optim, Flux

function f(du,u,p,t)
  du[1] = p[1]*u[1] - p[2]*u[1]*u[2]
  du[2] = -p[3]*u[2] + p[4]*u[1]*u[2]
end
function f(u,p,t)
  [p[1]*u[1] - p[2]*u[1]*u[2],-p[3]*u[2] + p[4]*u[1]*u[2]]
end

p = Float32[1.5,1.0,3.0,1.0]
u0 = Float32[1.0,1.0]
prob = ODEProblem(f,u0,(0f0,3f0),p)
prob_oop = ODEProblem{false}(f,u0,(0f0,3f0),p)

true_sol = solve(prob,Tsit5())

N = 128
chain = FastChain(FastDense(1,N,tanh), FastDense(N,N,tanh), FastDense(N,N,tanh), FastDense(N,length(u0)))
opt = BFGS()
sol  = solve(prob_oop,NeuralNetDiffEq.NNODE(chain,opt,autodiff=false,diffmode=DiffEqFlux.ReverseDiffMode()),
                                            maxiters = 1000, verbose = true,
                                            dt=1/5f0)
plot(true_sol)
plot!(sol)

sol  = solve(prob_oop,NeuralNetDiffEq.NNODE(chain,opt,autodiff=true),maxiters = 1000, verbose = true, dt=1/5f0)
plot(true_sol)
plot!(sol)

#=
sol  = solve(prob_oop,NeuralNetDiffEq.NNODE(chain,opt,autodiff=true,diffmode=DiffEqFlux.ForwardDiffMode()),
                                            maxiters = 1000, verbose = true,
                                            dt=1/5f0)
plot(true_sol)
plot!(sol)
=#

sol  = solve(prob_oop,NeuralNetDiffEq.NNODE(chain,opt,autodiff=false,diffmode=DiffEqFlux.TrackerDiffMode()),
                                            maxiters = 1000, verbose = true,
                                            dt=1/5f0)
plot(true_sol)
plot!(sol)

shows that using ReverseDiff fixes the convergence. It looks like Zygote is dropping gradients, specifically:

FluxML/Zygote.jl#231
FluxML/Zygote.jl#314

@dhairyagandhi96

CarloLucibello commented on August 23, 2024

Looks like Zygote is dropping gradients, specifically:

This should be fixed on Zygote master.

ChrisRackauckas commented on August 23, 2024

Has it? The issues weren't closed, though?

DhairyaLGandhi commented on August 23, 2024

Yeah, we'd closed both of those issues with the closures; this seems different from that.

ChrisRackauckas commented on August 23, 2024

FluxML/Zygote.jl#510 needs to get rebased, then. Whoever can force push, could you please?

DhairyaLGandhi commented on August 23, 2024

I was getting a tracked-array type mismatch error with the MWE here; is there some setup to see a clearer error? This is with masters of all the relevant packages.

ChrisRackauckas commented on August 23, 2024

I'm tagging everything; it's just Zygote that needs a special branch. There shouldn't be tracked arrays except with ReverseDiff?

ChrisRackauckas commented on August 23, 2024

#70 shows that naive training of a big network still doesn't work that well, though it's getting closer:

using OrdinaryDiffEq, DiffEqFlux, NeuralNetDiffEq, Plots, Optim, Flux

function f(du,u,p,t)
  du[1] = p[1]*u[1] - p[2]*u[1]*u[2]
  du[2] = -p[3]*u[2] + p[4]*u[1]*u[2]
end
function f(u,p,t)
  [p[1]*u[1] - p[2]*u[1]*u[2],-p[3]*u[2] + p[4]*u[1]*u[2]]
end

p = Float32[1.5,1.0,3.0,1.0]
u0 = Float32[1.0,1.0]
prob = ODEProblem(f,u0,(0f0,3f0),p)
prob_oop = ODEProblem{false}(f,u0,(0f0,3f0),p)
true_sol = solve(prob,Tsit5())

N = 128
chain = FastChain(FastDense(1,N,tanh), FastDense(N,N,tanh), FastDense(N,N,tanh), FastDense(N,length(u0)))
opt = ADAM(0.000001)
θ = DiffEqFlux.initial_params(chain)
sol  = solve(prob_oop,NeuralNetDiffEq.NNODE(chain,opt,θ,autodiff=true),maxiters = 10000, verbose = true, dt=1/5f0)
plot(true_sol)
plot!(sol)

using CuArrays
function f(u,p,t)
  cu([p[1]*u[1] - p[2]*u[1]*u[2],-p[3]*u[2] + p[4]*u[1]*u[2]])
end

p = cu(Float32[1.5,1.0,3.0,1.0])
u0 = cu(Float32[1.0,1.0])
N = 512
chain = FastChain(FastDense(1,N,tanh), FastDense(N,N,tanh), FastDense(N,N,tanh), FastDense(N,length(u0)))
θ = cu(DiffEqFlux.initial_params(chain))
prob_oop = ODEProblem{false}(f,u0,(0f0,3f0),p)
sol  = solve(prob_oop,NeuralNetDiffEq.NNODE(chain,opt,θ,autodiff=false),maxiters = 10000, verbose = true, dt=1/5f0)
plot(true_sol)
plot!(sol)

[plot of the true solution vs. the network prediction]

ChrisRackauckas commented on August 23, 2024

I think what is really required is an investigation of new training strategies. I'll open an issue about this.

ChrisRackauckas commented on August 23, 2024

Still can't do this one. Lotka-Volterra is the hardest one 😅

using NeuralPDE, OrdinaryDiffEq, DiffEqFlux, Flux, OptimizationPolyalgorithms

function f(u, p, t)
    [p[1] * u[1] - p[2] * u[1] * u[2], -p[3] * u[2] + p[4] * u[1] * u[2]]
end

p = [1.5, 1.0, 3.0, 1.0]
u0 = [1.0, 1.0]
prob_oop = ODEProblem{false}(f, u0, (0.0, 3.0), p)
true_sol = solve(prob_oop, Tsit5())

N = 1028
chain = FastChain(FastDense(1, N, softplus), FastDense(N, N, softplus), FastDense(N, N, softplus),
                  FastDense(N, N, softplus), FastDense(N, N, softplus), FastDense(N, N, softplus),
                  FastDense(N, N, softplus), FastDense(N, length(u0), softplus))
opt = ADAM(0.001)
θ = Float64.(DiffEqFlux.initial_params(chain))
alg = NeuralPDE.NNODE(chain, opt, θ; strategy = StochasticTraining(2000))
sol = solve(prob_oop, alg, verbose=true, maxiters = 200)

using Plots
plot(sol)
plot!(true_sol)

YichengDWu commented on August 23, 2024

This is because information propagates forward in time from the initial values according to the ODE system. Introducing points with a large time span will not help convergence. This is a simple problem that does not require a large network. NNODE should have "time stepping" like other ODE solvers. The problem can be solved easily if we train on [0,2] first and then train on [0,3], as in the sketch below. Alternatively, we can save the prediction at t=2 and retrain the NN on [2,3].
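
A minimal sketch of that two-stage idea at the Optimization.jl level, where `make_loss` is a hypothetical helper that builds the collocation loss for a given interval and `θ0` are the initial network parameters:

using Optimization, OptimizationOptimisers, Zygote

# Stage 1: warm up on the short interval [0, 2]
loss1 = make_loss((0.0, 2.0))   # hypothetical loss builder
optf1 = OptimizationFunction((θ, _) -> loss1(θ), Optimization.AutoZygote())
res1 = solve(OptimizationProblem(optf1, θ0), Adam(1e-3); maxiters = 1000)

# Stage 2: extend to [0, 3], warm-starting from the stage-1 parameters
loss2 = make_loss((0.0, 3.0))
optf2 = OptimizationFunction((θ, _) -> loss2(θ), Optimization.AutoZygote())
res2 = solve(OptimizationProblem(optf2, res1.u), Adam(1e-3); maxiters = 1000)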

ChrisRackauckas commented on August 23, 2024

I thought I tried that. Worth giving it another try. Another thing could be to just add a weight to the loss that biases it more heavily towards the front. That would then more naturally work itself out over the course of an optimization.
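
Something like the following could express that (a minimal sketch of the weighting idea, where `residual(θ, t)` is a hypothetical helper returning the network's ODE residual at time t):

ts = range(0.0, 3.0; length = 100)   # collocation grid on the time span
λ = 1.0
ws = exp.(-λ .* ts)                  # exponentially heavier weight near t = 0

# front-weighted mean-squared ODE residual
weighted_loss(θ) = sum(ws[i] * sum(abs2, residual(θ, ts[i])) for i in eachindex(ts)) / length(ts)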

YichengDWu commented on August 23, 2024

I would not use weighting to overcome this bias, as it would not capture the error in a narrow area no matter how it's weighted. Sampling more points is a natural way of weighting. Here is what I got:

using OrdinaryDiffEq

function f(u, p, t)
    return [p[1] * u[1] - p[2] * u[1] * u[2], -p[3] * u[2] + p[4] * u[1] * u[2]]
end

p = [1.5, 1.0, 3.0, 1.0]
u0 = [1.0, 1.0]
prob_oop = ODEProblem{false}(f, u0, (0.0, 3.0), p)
true_sol = solve(prob_oop, Tsit5(), saveat=0.01)

using ModelingToolkit
using Sophon, IntervalSets
using Optimization, OptimizationOptimJL, OptimizationOptimisers

@parameters t
@variables x(..), y(..)

Dₜ = Differential(t)

eqs = [Dₜ(x(t)) ~ p[1] * x(t) - p[2] * x(t) * y(t),
      Dₜ(y(t)) ~ -p[3] * y(t) + p[4] * x(t) * y(t)]

domain = [t ∈ 0 .. 3.0]

bcs = [x(0.0) ~ 1.0, y(0.0) ~ 1.0]

@named lotka_volterra = PDESystem(eqs, bcs, domain, [t], [x(t), y(t)])

chain = FullyConnected(1, 1, sin; hidden_dims=6, num_layers=2)
pinn = PINN(x = chain, y = chain)
sampler = QuasiRandomSampler(1, 1) # ignore this line
strategy = NonAdaptiveTraining()   # ignore this line

prob = Sophon.discretize(lotka_volterra, pinn, sampler, strategy)

# more points towards the front
data_1 = rand(1, 100)
data_2 = rand(1, 50) .+ 1.0
data_3 = rand(1, 10) .+ 2.0

data = [data_1 data_2 data_3]

prob.p[1] = data # manually change the dataset
prob.p[2] = data


function callback(p, l)
    println("Current loss is: $l")
    return false
end

res = solve(prob, BFGS(); callback=callback, maxiters=2000)

using Plots

phi = pinn.phi
ts = [true_sol.t...;;]
x_pred = phi.x(ts, res.u.x)
y_pred = phi.y(ts, res.u.y)

plot(vec(ts), vec(x_pred), label="x_pred")
plot!(vec(ts), vec(y_pred), label="y_pred")
plot!(true_sol)

[plot of x_pred and y_pred against the true solution]

YichengDWu commented on August 23, 2024

Additional note: the BFGS optimizer gets really confused by a changing loss function. It's necessary to re-instantiate the optimizer if the weights are updated.
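
For example (a sketch continuing the Sophon example above): rebuilding the solve from the last parameters resets BFGS's internal curvature estimate, which is stale once the dataset changes. The refreshed dataset here is hypothetical.

data_1 = rand(1, 200)   # hypothetical refreshed sampling, still front-heavy
data_2 = rand(1, 100) .+ 1.0
data_3 = rand(1, 20) .+ 2.0
prob.p[1] = [data_1 data_2 data_3]
prob.p[2] = [data_1 data_2 data_3]

prob2 = remake(prob; u0 = res.u)   # fresh problem, warm-started parameters
res2 = solve(prob2, BFGS(); callback = callback, maxiters = 2000)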
