
Error in the UODEs training when the network enlarges #689

Closed
junsebas97 opened this issue Mar 3, 2022 · 3 comments

@junsebas97

Hi, I'm currently modeling a dynamical system with a universal ordinary differential equation (UODE). Both the system and the UODE receive an external input f(t) and respond to it. To train the UODE, I have experimental data for only one case. However, for two other cases I know some properties of the output, so I implement them as constraints. Here is an MWE, where the constraints are a positive u1 at five seconds when the external input is zero, and a u1 equal to zero at 100 seconds for a triangular pulse.

## libraries and functions:
using DiffEqFlux, DifferentialEquations, Interpolations

function callback(θ, loss)
   println("loss: $loss")
   return (loss < 1e-3)
end



function my_solver(f, u0, θ, ti)
    tspan = (ti[1], ti[end])   # integration time

    # incorporate the external input into the model
    aux_model(du, u, θ, t) = model(du, u, θ, t, f)

    prob = ODEProblem(aux_model, u0, tspan, θ)
    sol  = solve(prob, RK4(), saveat = ti, sensealg = ForwardSensitivity())

    return Array(sol)'
end



function loss(θ, example)
    # assess the model response for case 1 (experimental data)
    (t1, f1, u01) = example
    u1 = my_solver(f1, u01, θ, t1)

    # assess the model response for case 2
    t2    = 0:0.1:100
    f2(t) = 0.0
    u02   = [1.0, 0.0]
    u2    = my_solver(f2, u02, θ, t2);

    # assess the model response for case 3
    t3  = 0.0:0.1:100
    f3  = LinearInterpolation([0., 30, 60, 100], [0, 20, 0, 0])
    u03 = [0.0, 0.0]
    u3  = my_solver(f3, u03, θ, t3);
    
    # extract the needed quantities of each response
    y1 = u1[:, 1]
    y2 = u2[:, 1]
    y3 = u3[:, 1]

    # compute the loss
    return sum(abs2, 0 .- y1) + (1 - sign(y2[50])) + (abs(y3[end]))
end


## data: define the input of experimental data
t       = collect(0.0:0.1:30)
f(t)    = sin(t)
u0      = [0.0, 0.0]
example = [t, f, u0]

## model:
ANN = FastChain(FastDense(2, 20, tanh), FastDense(20, 1))
function model(du, u, θ, t, f)
    du[1]  = u[2]
    du[2]  = f(t) - ANN(u, θ)[1]
end

## training:
θ_model = initial_params(ANN)
loss(θ) = loss(θ, example)       # incorporate the experimental data in the loss
DiffEqFlux.sciml_train(loss, θ_model, ADAM(0.01), maxiters = 100, cb = callback)

When I run the above code with ten neurons in the hidden layer, it works perfectly. However, with 20 neurons in the hidden layer I get ERROR: LoadError: MethodError: no method matching getindex(::ChainRulesCore.NotImplemented, ::Int64); the full stack trace is:

ERROR: LoadError: MethodError: no method matching getindex(::ChainRulesCore.NotImplemented, ::Int64)
Stacktrace:
 [1] (::ChainRules.var"#1209#1211"{ChainRulesCore.NotImplemented, Tuple{ChainRulesCore.ProjectTo{Float64, NamedTuple{(), Tuple{}}}, ChainRulesCore.ProjectTo{Float64, NamedTuple{(), Tuple{}}}}})(n::Int64)
    @ ChainRules ~/.julia/packages/ChainRules/QbPMo/src/rulesets/Base/array.jl:34
  [2] ntuple
    @ ./ntuple.jl:19 [inlined]
  [3] (::ChainRules.var"#vect_pullback#1210"{2, Tuple{ChainRulesCore.ProjectTo{Float64, NamedTuple{(), Tuple{}}}, ChainRulesCore.ProjectTo{Float64, NamedTuple{(), Tuple{}}}}})(ȳ::ChainRulesCore.NotImplemented)
    @ ChainRules ~/.julia/packages/ChainRules/QbPMo/src/rulesets/Base/array.jl:34
  [4] (::Zygote.ZBack{ChainRules.var"#vect_pullback#1210"{2, Tuple{ChainRulesCore.ProjectTo{Float64, NamedTuple{(), Tuple{}}}, ChainRulesCore.ProjectTo{Float64, NamedTuple{(), Tuple{}}}}}})(dy::ChainRulesCore.NotImplemented)
    @ Zygote ~/.julia/packages/Zygote/bJn8I/src/compiler/chainrules.jl:204
  [5] Pullback
    @ ~/Desktop/short_gilb/main_1.jl:39 [inlined]
 [6] (::typeof(∂(loss)))(Δ::Float64)
    @ Zygote ~/.julia/packages/Zygote/bJn8I/src/compiler/interface2.jl:0
  [7] Pullback
    @ ~/Desktop/short_gilb/main_1.jl:67 [inlined]
 [8] (::typeof(∂(loss)))(Δ::Float64)
    @ Zygote ~/.julia/packages/Zygote/bJn8I/src/compiler/interface2.jl:0
  [9] Pullback
    @ ~/.julia/packages/DiffEqFlux/jpIWG/src/train.jl:84 [inlined]
 [10] #214
    @ ~/.julia/packages/Zygote/bJn8I/src/lib/lib.jl:203 [inlined]
 [11] #1739#back
    @ ~/.julia/packages/ZygoteRules/AIbCs/src/adjoint.jl:67 [inlined]
 [12] Pullback
    @ ~/.julia/packages/SciMLBase/7GnZA/src/problems/basic_problems.jl:107 [inlined]
 [13] #214
    @ ~/.julia/packages/Zygote/bJn8I/src/lib/lib.jl:203 [inlined]
 [14] #1739#back
    @ ~/.julia/packages/ZygoteRules/AIbCs/src/adjoint.jl:67 [inlined]
 [15] Pullback
    @ ~/.julia/packages/GalacticOptim/DHxE0/src/function/zygote.jl:6 [inlined]
 [16] (::typeof(∂(#260)))(Δ::Float64)
    @ Zygote ~/.julia/packages/Zygote/bJn8I/src/compiler/interface2.jl:0
 [17] #214
    @ ~/.julia/packages/Zygote/bJn8I/src/lib/lib.jl:203 [inlined]
 [18] #1739#back
    @ ~/.julia/packages/ZygoteRules/AIbCs/src/adjoint.jl:67 [inlined]
 [19] Pullback
    @ ~/.julia/packages/GalacticOptim/DHxE0/src/function/zygote.jl:8 [inlined]
 [20] (::typeof(∂(#263)))(Δ::Float64)
    @ Zygote ~/.julia/packages/Zygote/bJn8I/src/compiler/interface2.jl:0
 [21] (::Zygote.var"#57#58"{typeof(∂(#263))})(Δ::Float64)
    @ Zygote ~/.julia/packages/Zygote/bJn8I/src/compiler/interface.jl:41
 [22] gradient(f::Function, args::Vector{Float32})
    @ Zygote ~/.julia/packages/Zygote/bJn8I/src/compiler/interface.jl:76
 [23] (::GalacticOptim.var"#261#271"{GalacticOptim.var"#260#270"{OptimizationFunction{true, GalacticOptim.AutoZygote, DiffEqFlux.var"#82#87"{typeof(loss)}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing}, Nothing}})(::Vector{Float32}, ::Vector{Float32})
    @ GalacticOptim ~/.julia/packages/GalacticOptim/DHxE0/src/function/zygote.jl:8
 [24] macro expansion
    @ ~/.julia/packages/GalacticOptim/DHxE0/src/solve/flux.jl:41 [inlined]
 [25] macro expansion
    @ ~/.julia/packages/GalacticOptim/DHxE0/src/utils.jl:35 [inlined]
 [26] __solve(prob::OptimizationProblem{false, OptimizationFunction{false, GalacticOptim.AutoZygote, OptimizationFunction{true, GalacticOptim.AutoZygote, DiffEqFlux.var"#82#87"{typeof(loss)}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing}, GalacticOptim.var"#261#271"{GalacticOptim.var"#260#270"{OptimizationFunction{true, GalacticOptim.AutoZygote, DiffEqFlux.var"#82#87"{typeof(loss)}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing}, Nothing}}, GalacticOptim.var"#264#274"{GalacticOptim.var"#260#270"{OptimizationFunction{true, GalacticOptim.AutoZygote, DiffEqFlux.var"#82#87"{typeof(loss)}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing}, Nothing}}, GalacticOptim.var"#269#279", Nothing, Nothing, Nothing}, Vector{Float32}, SciMLBase.NullParameters, Nothing, Nothing, Nothing, Nothing, Base.Pairs{Symbol, typeof(callback), Tuple{Symbol}, NamedTuple{(:cb,), Tuple{typeof(callback)}}}}, opt::ADAM, data::Base.Iterators.Cycle{Tuple{GalacticOptim.NullData}}; maxiters::Int64, cb::Function, progress::Bool, save_best::Bool, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ GalacticOptim ~/.julia/packages/GalacticOptim/DHxE0/src/solve/flux.jl:39
 [27] #solve#480
    @ ~/.julia/packages/SciMLBase/7GnZA/src/solve.jl:3 [inlined]
 [28] sciml_train(::typeof(loss), ::Vector{Float32}, ::ADAM, ::Nothing; lower_bounds::Nothing, upper_bounds::Nothing, maxiters::Int64, kwargs::Base.Pairs{Symbol, typeof(callback), Tuple{Symbol}, NamedTuple{(:cb,), Tuple{typeof(callback)}}})
    @ DiffEqFlux ~/.julia/packages/DiffEqFlux/jpIWG/src/train.jl:89
 [29] top-level scope
    @ ~/Desktop/short_gilb/main_1.jl:68
in expression starting at /home/junsebas97/Desktop/short_gilb/main_1.jl:68

Does anybody know why this happens when the number of neurons is increased? Is there any solution?
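
For clarity, the only difference between the run that works and the one that errors is, as far as I can tell, the width of the hidden layer in the FastChain definition:

ANN = FastChain(FastDense(2, 10, tanh), FastDense(10, 1))   # 10 hidden neurons: trains without error
ANN = FastChain(FastDense(2, 20, tanh), FastDense(20, 1))   # 20 hidden neurons: fails with the MethodError above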

@junsebas97
Author

anyone? please help!

@lungd

lungd commented Apr 1, 2022

> When I run the above code with ten neurons in the hidden layer, it works perfectly.

When I copy and paste your code I get a

┌ Warning: Instability detected. Aborting
└ @ SciMLBase ~/.julia/packages/SciMLBase/GW7GW/src/integrator_interface.jl:351

It does not crash and the loss decreases, but this could be relevant.
With 20 neurons I get the instability warning, and then it errors with the message you've posted.

It throws the error at the line u03 = [0.0, 0.0].

Interestingly, I still got the same error after I changed every array and range to Float32 by adding f0 suffixes or wrapping with Float32.([]) (initial_params(model) returns Float32 parameters by default), but it then worked with zeros(Float32, 2).
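
In other words, of the variants I tried for that line, only the zeros call avoided the error (they are also marked in the full example below):

u03 = Float32.([0.0, 0.0])   # still errors
u03 = [0f0, 0f0]             # still errors
u03 = zeros(Float32, 2)      # works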

So here is a working example of your code

using DiffEqFlux, OrdinaryDiffEq, Interpolations


function callback(θ, loss)
   println("loss: $loss")
   return (loss < 1e-3)
end



function my_solver(f, u0, θ, ti)
    tspan = (ti[1], ti[end])   # integration time

    # incorporate the external input into the model
    aux_model(du, u, θ, t) = model(du, u, θ, t, f)

    prob = ODEProblem(aux_model, u0, tspan, θ)
    sol  = solve(prob, RK4(), saveat = ti, sensealg = ForwardSensitivity())

    return Array(sol)'
end



function loss(θ, example)
    # assess the model response for case 1 (experimental data)
    (t1, f1, u01) = example
    u1 = my_solver(f1, u01, θ, t1)

    # assess the model response for case 2
    t2    = 0f0:0.1f0:100
    f2(t) = 0.0f0
    u02   = Float32.([1.0, 0.0])
    u2    = my_solver(f2, u02, θ, t2);

    # assess the model response for case 3
    t3  = 0.0f0:0.1f0:100
    f3  = Interpolations.LinearInterpolation(Float32.([0.0, 30.0, 60.0, 100.0]), Float32.([0.0, 20.0, 0.0, 0.0]))
    # u03 = Float32.([0.0, 0.0])         ### ERROR
    # u03 = [0f0, 0f0]                   ### ERROR
    u03 = zeros(Float32, 2)              ### check
    u3  = my_solver(f3, u03, θ, t3);

    # extract the needed quantities of each response
    y1 = u1[:, 1]
    y2 = u2[:, 1]
    y3 = u3[:, 1]

    # compute the loss
    return sum(abs2, 0 .- y1) + (1 - sign(y2[50])) + (abs(y3[end]))
end


## data: define the input of experimental data
t       = collect(0.0f0:0.1f0:30)
f(t)    = sin(t)
u0      = Float32.([0.0, 0.0])
example = [t, f, u0]

## model:
ANN = FastChain(FastDense(2, 20, tanh), FastDense(20, 1))
function model(du, u, θ, t, f)
    du[1]  = u[2]
    du[2]  = f(t) - ANN(u, θ)[1]
end


## training:
θ_model = initial_params(ANN)
loss(θ) = loss(θ, example)       # incorporate the experimental data in the loss
DiffEqFlux.sciml_train(loss, θ_model, ADAM(0.01), maxiters = 100, cb = callback)

julia> versioninfo()
Julia Version 1.7.2
Commit bf53498635 (2022-02-06 15:21 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: AMD Ryzen Threadripper 2970WX 24-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, znver1)

st
  [aae7a2af] DiffEqFlux v1.45.3
  [a98d9a8b] Interpolations v0.13.5
  [1dea7af3] OrdinaryDiffEq v6.7.1

@ChrisRackauckas
Member

Has a working example.
