
Add Lobatto IIIa-c and Radau IIa solvers #178

Merged: 70 commits into SciML:master, Sep 1, 2024

Conversation

axla-io
Contributor

@axla-io axla-io commented Mar 31, 2024

Features

  • 17 new Fully Implicit Runge-Kutta (FIRK) solvers: Lobatto IIIa, IIIb, and IIIc in 2- to 5-stage versions, and Radau IIa in 1-, 2-, 3-, 5-, and 7-stage versions
  • Choice of cache: a nested nonlinear solve (better performance for large problems) or an expanded cache (appears to be more robust)
  • Adaptivity and interpolations
  • Sparse Jacobians
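As background (not part of the PR itself): in a FIRK method all stages are coupled through the Butcher matrix, so every step requires solving for all stages at once, which is what the nested vs. expanded cache choice above is about. A minimal Python sketch using the 2-stage Radau IIA tableau on the linear test problem y' = λy, where the coupled stage system happens to be linear and can be solved directly:

```python
import math

# Butcher tableau for the 2-stage Radau IIA method (classical order 3).
A = [[5/12, -1/12],
     [3/4,   1/4]]
b = [3/4, 1/4]

def solve2(M, r):
    """Solve a 2x2 linear system M x = r by Cramer's rule."""
    det = M[0][0]*M[1][1] - M[0][1]*M[1][0]
    return [(r[0]*M[1][1] - M[0][1]*r[1]) / det,
            (M[0][0]*r[1] - r[0]*M[1][0]) / det]

def radau_step(lam, y, h):
    """One Radau IIA step for y' = lam*y: the coupled stage equations
    (I - h*lam*A) k = lam*y*[1,1] are linear here, so solve directly."""
    M = [[1 - h*lam*A[0][0],    -h*lam*A[0][1]],
         [   -h*lam*A[1][0], 1 - h*lam*A[1][1]]]
    k = solve2(M, [lam*y, lam*y])
    return y + h*(b[0]*k[0] + b[1]*k[1])

def integrate(lam, y0, T, n):
    y, h = y0, T/n
    for _ in range(n):
        y = radau_step(lam, y, h)
    return y

# Error vs. the exact solution exp(-1); halving h should shrink it ~2^3 = 8x.
err = lambda n: abs(integrate(-1.0, 1.0, 1.0, n) - math.exp(-1.0))
print(err(20) / err(40))
```

For a nonlinear right-hand side the stage system becomes a genuine nonlinear solve per step, which is where the two cache strategies in this PR differ.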

Issues with the code

Tests pass for the expanded cache, except in some cases for the convergence tests of the higher-order solvers. I think the reason is that the error is already low in absolute terms, so I've added a check that marks these tests as broken if the solution error is less than 1e-12.
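A toy model (hypothetical numbers, not the package's actual test harness) of why high-order convergence tests break near the floating-point floor: once both measured errors sit at the noise floor, the observed order collapses even though the method is fine.

```python
import math

def observed_order(e_h, e_h2):
    """Empirical convergence order from errors at step sizes h and h/2."""
    return math.log2(e_h / e_h2)

# Hypothetical order-7 method: true error ~ h^7, but the measured error
# cannot drop below a noise floor (the 1e-12 threshold mentioned above).
floor = 1e-12
model = lambda h: max(h**7, floor)

print(observed_order(model(0.1), model(0.05)))    # ~7.0, floor not hit
print(observed_order(model(0.01), model(0.005)))  # 0.0, both errors floored
```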

Also, there is a runtime-dispatch error when running the JET tests. I have commented out the lines in src/solve/firk.jl where the error occurs; I couldn't figure out how to resolve them.

For the nested cache, in addition to the tests that the expanded cache doesn't pass, it also seems to perform poorly on the NLLS and vector-of-vectors initial-condition tests. I haven't found a solver for which the nested cache converges there, so for now those tests are commented out.

Also, the nested cache doesn't support ForwardDiff yet.

Checklist

  • [x] Appropriate tests were added
  • [x] Any code changes were done in a way that does not break the public API
  • [x] All documentation related to code changes was updated
  • The new code follows the
    contributor guidelines, in particular the SciML Style Guide and
    COLPRAC. (I commented out tests for the nested cache as a WIP.)
  • Any new documentation only uses public API (I don't understand what this means :( )

src/adaptivity.jl (outdated review thread, resolved)
@ChrisRackauckas
Member

Tests pass for the expanded cache, except in some cases for the convergence tests of the higher-order solvers. I think the reason is that the error is already low in absolute terms, so I've added a check that marks these tests as broken if the solution error is less than 1e-12.

Can you plot what those look like? For higher order you likely need to increase the dts.

@avik-pal
Member

Also, let's merge the tests for MIRK and the other methods, except Shooting. The only reason Shooting is separate is that the other methods currently don't support interpolation in the solution.

@ChrisRackauckas
Member

That looks like the opposite direction from what the convergence plot recipe normally produces. Is this not just plot(sim)?

@ErikQQY ErikQQY closed this Aug 28, 2024
@ErikQQY ErikQQY reopened this Aug 28, 2024
@ChrisRackauckas
Member

Hit a time limit? We really need to split this into different test groups.

@ErikQQY
Member

ErikQQY commented Aug 28, 2024

Yeah, I will split them into different test groups.

Signed-off-by: ErikQQY <2283984853@qq.com>
@ChrisRackauckas
Member

Is the ReTestItems parallelism actually helpful? BLAS will multithread, so we may be slamming the threads on any large enough problem. I don't think that part makes sense in the BVP library; does it in ML, @avik-pal?

@avik-pal
Member

Is the ReTestItems parallelism actually helpful? BLAS will multithread, so we may be slamming the threads on any large enough problem. I don't think that part makes sense in the BVP library; does it in ML, @avik-pal?

Are any of the tests here BLAS-heavy? In the ML context, none of the Lux tests are heavy enough to cause oversubscription issues, and parallelism brings the build times down to about 25 minutes (compared to over an hour) on decent machines (the CUDA ones on Buildkite). It's not as helpful for GitHub Actions.

That said, for testing heavy workloads (compute or memory), I do disable parallelism in testing. I still keep ReTestItems because it lets you isolate the testitem that fails, and because when testing Enzyme, which can segfault or assert-crash Julia, only the worker crashes, so the test shows up as failed rather than the main process going down.
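The oversubscription concern here is that worker-level test parallelism multiplies with BLAS threads; the usual fix is pinning BLAS to one thread per worker before the numerical backend initializes (in Julia, BLAS.set_num_threads(1) or these environment variables before spawning workers). A language-agnostic sketch in Python:

```python
import os

# Standard environment variables that cap BLAS backend threads; they must
# be set before the numerical library initializes its BLAS thread pool.
for var in ("OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS", "OMP_NUM_THREADS"):
    os.environ[var] = "1"

workers = 4  # hypothetical test-runner worker count
blas_threads = int(os.environ["OPENBLAS_NUM_THREADS"])
# Total concurrent threads is the product; without the cap, 4 workers each
# spawning 8 BLAS threads would oversubscribe an 8-core machine 4x.
print(workers * blas_threads)
```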

@ChrisRackauckas
Member

The LU/QR factorizations will all multithread above 200x200, which I presume we must be hitting?

@avik-pal
Member

https://github.com/SciML/BoundaryValueDiffEq.jl/actions/runs/10617623160/job/29430931485?pr=178#step:6:397 the AD isn't robust to rank deficient matrices. This probably needs to be handled upstream

@ErikQQY
Member

ErikQQY commented Aug 30, 2024

https://github.com/SciML/BoundaryValueDiffEq.jl/actions/runs/10617623160/job/29430931485?pr=178#step:6:397 the AD isn't robust to rank deficient matrices. This probably needs to be handled upstream

From local testing, RadauIIa5 just errors with GaussNewton; the other nonlinear solvers are fine. I have skipped this combination for now.

@ChrisRackauckas
Member

That's the least stable one, so I'm not surprised. It should really default to a trust-region method for this reason.
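A toy illustration (not the actual solver internals) of the failure mode behind these comments: with a rank-deficient Jacobian, the plain Gauss-Newton normal equations are singular, while a Levenberg-Marquardt / trust-region-style damping keeps the step well-posed.

```python
# Rank-deficient J: plain Gauss-Newton normal equations J^T J dx = -J^T r
# are singular, while damping (J^T J + mu*I) stays solvable.
J = [[1.0, 2.0],
     [2.0, 4.0]]  # rank 1: second row is 2x the first

def normal_matrix(J, mu):
    """J^T J + mu*I for a 2-column Jacobian J."""
    return [[sum(J[k][i]*J[k][j] for k in range(len(J)))
             + (mu if i == j else 0.0)
             for j in range(2)] for i in range(2)]

det = lambda M: M[0][0]*M[1][1] - M[0][1]*M[1][0]
print(det(normal_matrix(J, 0.0)))   # singular: Gauss-Newton step undefined
print(det(normal_matrix(J, 1e-3)))  # damped system is invertible
```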

@ErikQQY ErikQQY closed this Aug 30, 2024
@ErikQQY ErikQQY reopened this Aug 30, 2024
@ChrisRackauckas
Member

Oh no, I thought it was going to merge 😅

@ErikQQY
Member

ErikQQY commented Aug 30, 2024

Yeah, it was going to, but CI seemed to have a different opinion. Let's see how it goes now 😅

@ErikQQY
Member

ErikQQY commented Sep 1, 2024

Finally ready to go🎉🎉

@ChrisRackauckas
Member

Amazing, and thanks @axla-io too.

@ChrisRackauckas ChrisRackauckas merged commit 4590c02 into SciML:master Sep 1, 2024
10 checks passed