
Reducing memory requirements of 2N methods for special functions #691

Merged: 10 commits, Mar 16, 2019
Conversation

@kanav99 (Contributor) commented Mar 7, 2019

This is a very experimental PR, just to demonstrate the trick that removes `k` from the 2N low-memory methods for certain problems.
The trick is to wrap a register in a type that overloads `du[i] = ...` into `du[i] += ...`. This removes the dependency on `k`.
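The idea can be sketched roughly like this (illustrative names only; the PR's actual type is `WilliamsonWrapper`):

```julia
# Hypothetical sketch: a wrapper whose setindex! accumulates `dt * value`
# into the wrapped register instead of assigning, so `du[i] = x` inside
# the user's f effectively performs `k[i] += dt * x`.
struct AccumWrapper{T,dtType}
    kref::Vector{T}   # the register being accumulated into
    dt::dtType
end

Base.setindex!(a::AccumWrapper, x, i) = (a.kref[i] += a.dt * x)

tmp = [1.0, 1.0]
u = [1.0, 1.0]
w = AccumWrapper(tmp, 0.01)

# A user f writing `du[1] = 5 * u[1]` now accumulates 0.01 * 5.0 into tmp[1]:
w[1] = 5 * u[1]
tmp[1]  # 1.0 + 0.01 * 5.0 ≈ 1.05
```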

Limitation:
This won't work for functions where `du` appears on the right-hand side. For example:

function f(du, u, p, t)
  du[1] = 5 * u[1]
  du[2] = 2 * du[1]
end

This should not be a problem, because in theory we write du/dt = f(u, p, t), so du can be written purely in terms of u, p, and t. Still, we can add a flag by which the user confirms that this is the case.
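For the example above, the right-hand-side use of `du` can always be substituted away by hand (an illustrative rewrite, not part of this PR):

```julia
# Substitute du[1] = 5 * u[1] into the second line so that du appears
# only on the left-hand side, as the wrapper trick requires.
function f_rewritten(du, u, p, t)
    du[1] = 5 * u[1]
    du[2] = 2 * (5 * u[1])   # was: 2 * du[1]
end

du = zeros(2)
f_rewritten(du, [1.0, 0.0], nothing, 0.0)
du  # [5.0, 10.0]
```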

I tried the following script, and it gave the same output before and after the commit:

using OrdinaryDiffEq

function f(du,u,p,t)
	du[1] = 5 * u[2] + 4 * u[1]
	du[2] = 3 * u[2]
end
u0 = zeros(Float64, 2)
u0[1] = 1.0
u0[2] = 1.0

tspan = (0.0,1.0)
prob = ODEProblem(f,u0,tspan)
integ = init(prob, CarpenterKennedy2N54(),save_start=false,save_end=true,
                          save_everystep = false, calck = false, dt = 0.01)
solve!(integ)
@show(integ.sol)

If this idea is good to go, I will remove the unused registers and add the flag.


Base.setindex!(a::WilliamsonWrapper{kType, dtType}, b::bType, c::cType) where {kType, dtType, bType, cType} = (a.kref[c] += a.dt * b)
Member:

How does this interact with linear algebra?

Contributor (author):

All the linear algebra in the RHS is untouched. This just replaces `du = ...` statements with `du += ...`, so the RHS stays the same.
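Concretely, a Williamson 2N stage `k = A[i]*k + dt*f(u); u = u + B[i]*k` can be reproduced without a separate buffer for `dt*f(u)` by scaling `k` first and letting the wrapper accumulate into it. A hedged, self-contained sketch with a toy RHS and made-up stage coefficients (not the scheme's actual tableau):

```julia
# Illustrative wrapper (same idea as the PR's WilliamsonWrapper).
struct AccumWrapper{T,dtType}
    kref::Vector{T}
    dt::dtType
end
Base.setindex!(a::AccumWrapper, x, i) = (a.kref[i] += a.dt * x)

f(du, u) = (du[1] = 3 * u[1]; du[2] = 2 * u[2])  # toy RHS

dt, A, B = 0.01, -0.5, 0.3   # made-up stage coefficients
u = [1.0, 2.0]

# Standard form: needs a scratch array fu to hold f(u).
k_std = [0.1, 0.2]
fu = similar(u); f(fu, u)
@. k_std = A * k_std + dt * fu
u_std = @. u + B * k_std

# Wrapper form: scale k in place, then f accumulates dt*f(u) into it.
k = [0.1, 0.2]
k .*= A
f(AccumWrapper(k, dt), u)    # k now equals A*k_old + dt*f(u)
u_wrap = @. u + B * k
```

Both forms produce the same `k` and `u`, but the wrapper form never materializes `fu`.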

@ChrisRackauckas (Member)

Yeah, I think a flag in each of the low memory algorithms about this optimization would be good to have. I would even have it set to true by default.

@kanav99 (Contributor, author) commented Mar 8, 2019

Okay, I will add more possibilities to the interface now. I have been traveling, so I will do it by evening (IST). This will need good documentation too.

@codecov bot commented Mar 9, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@a9966b1).
The diff coverage is 85.71%.


@@            Coverage Diff            @@
##             master     #691   +/-   ##
=========================================
  Coverage          ?   72.86%           
=========================================
  Files             ?       93           
  Lines             ?    29111           
  Branches          ?        0           
=========================================
  Hits              ?    21211           
  Misses            ?     7900           
  Partials          ?        0
Impacted Files Coverage Δ
src/alg_utils.jl 34.26% <0%> (ø)
src/algorithms.jl 98.43% <100%> (ø)
src/perform_step/low_storage_rk_perform_step.jl 92.24% <100%> (ø)
src/caches/low_storage_rk_caches.jl 89.54% <88.88%> (ø)

Last update a9966b1...e4e8ffd.

@kanav99 (Contributor, author) commented Mar 9, 2019

Current status:
The last commit successfully removes one register from the cache. The current naming conventions are a bit misleading (`tmp` is actually similar to `fsal`), and the flag is not done yet. Output is correct for basic problems. It needs a lot more work and testing to verify that all cases are covered.

@kanav99 (Contributor, author) commented Mar 13, 2019

The actual work is almost done. The only problem is that while using broadcasting I am getting a large number of extra allocations. I think this problem is solvable; the intermediate statements just aren't getting optimized somewhere, but I currently don't know how to proceed. I tried many approaches, and this was the best method I came up with. I would need the help of someone who has implemented custom, in-place broadcasting before. Requesting reviews @ChrisRackauckas @YingboMa
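For reference, one way a wrapper like this can hook in-place broadcasting is to extend `Base.Broadcast.materialize!`, so that `w .= expr` evaluates the fused expression element by element through the accumulating `setindex!` without allocating an intermediate array. This is a hedged sketch under illustrative names, not the PR's actual implementation:

```julia
import Base.Broadcast: Broadcasted, materialize!, instantiate

struct AccumWrapper{T,dtType}
    kref::Vector{T}
    dt::dtType
end
Base.setindex!(a::AccumWrapper, x, i) = (a.kref[i] += a.dt * x)

# `w .= expr` lowers to materialize!(w, Broadcasted(...)); index the fused
# expression directly so each result goes through the accumulating
# setindex! with no temporary right-hand-side array.
function materialize!(dest::AccumWrapper, bc::Broadcasted)
    bci = instantiate(bc)
    @inbounds for i in eachindex(dest.kref)
        dest[i] = bci[i]
    end
    return dest
end

u = [1.0, 2.0]
k = [0.0, 0.0]
w = AccumWrapper(k, 0.1)
w .= 5 .* u        # performs k[i] += 0.1 * (5 * u[i])
k  # ≈ [0.5, 1.0]
```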

Benchmarks:
Script:

function f(du,u,p,t)
 du .= u
end

u0 = [1.0, 1.0]
tspan = (0.0,1.0)
prob = ODEProblem(f,u0,tspan)
integ = init(prob, CarpenterKennedy2N54(),save_start=false,save_end=false,
                          save_everystep = false, calck = false, dt = 0.01, alias_u0 = true)
@time solve!(integ)
integ = init(prob, CarpenterKennedy2N54(),save_start=false,save_end=true,
                          save_everystep = false, calck = true, dt = 0.01)
@time solve!(integ)
integ = init(prob, CarpenterKennedy2N54(),save_start=false,save_end=true,
                          save_everystep = false, calck = true, dt = 0.01)
@time solve!(integ)
@show(integ.sol)
@show(u0)


u0 = rand(1000,1000)
prob = ODEProblem(f,u0,tspan)
integ = init(prob, CarpenterKennedy2N54(),save_start=false,save_end=false,
                          save_everystep = false, calck = false, dt = 0.1, alias_u0 = true)

@show Base.summarysize(integ)/Base.summarysize(u0)

Master:

  0.979325 seconds (1.99 M allocations: 97.611 MiB, 4.47% gc time)
  0.000027 seconds (6 allocations: 336 bytes)
  0.000027 seconds (6 allocations: 336 bytes)
integ.sol = retcode: Success
Interpolation: 1st order linear
t: [1.0]
u: Array{Float64,1}[[7.38906, 7.38906]]
u0 = [2.71828, 2.71828]
Base.summarysize(integ) / Base.summarysize(u0) = 3.000164374178129

With flag turned to true:

  0.936926 seconds (1.75 M allocations: 85.362 MiB, 3.95% gc time)
  0.000043 seconds (406 allocations: 12.844 KiB)
  0.000026 seconds (406 allocations: 12.844 KiB)
integ.sol = retcode: Success
Interpolation: 1st order linear
t: [1.0]
u: Array{Float64,1}[[7.38906, 7.38906]]
u0 = [2.71828, 2.71828]
Base.summarysize(integ) / Base.summarysize(u0) = 2.0001653741731293

Flag = false:

  1.466949 seconds (1.84 M allocations: 89.197 MiB, 3.92% gc time)
  0.000030 seconds (9 allocations: 608 bytes)
  0.000029 seconds (9 allocations: 608 bytes)
integ.sol = retcode: Success
Interpolation: 1st order linear
t: [1.0]
u: Array{Float64,1}[[7.38906, 7.38906]]
u0 = [2.71828, 2.71828]
Base.summarysize(integ) / Base.summarysize(u0) = 2.0001653741731293

@kanav99 (Contributor, author) commented Mar 13, 2019

Hopefully (just waiting for the tests to complete) this completes the PR. Maybe some variable renames and some moving of code around (wrappers like this could help us further improve memory usage in other algorithms, so it would be nice to have a separate file for them), and it should be good. By default this optimization is set to true. We finally reach the theoretical memory limit.

@ChrisRackauckas (Member) left a comment

I love this solution. It's great haha. Nicely done.

@. u = u + B2end[i]*tmp
end

f(k, u, p, t+dt)
@. tmp = dt*k
integrator.destats.nf += 1
Member:

These will never Hermite interpolate, so the default interpolation should just be set to linear interpolation for these.

Contributor (author):

For these methods, the interpolation is already set to 1st-order linear.

@ChrisRackauckas (Member)

The tests need to be updated to make sure it's only 2 registers.

@kanav99 (Contributor, author) commented Mar 15, 2019

Added some more tests to cover both with and without the flag. I have started on the documentation for this.

@ChrisRackauckas (Member)

lgtm. Still WIP?

@kanav99 (Contributor, author) commented Mar 16, 2019

No, can be merged now :)

@kanav99 kanav99 changed the title [WIP] Reducing memory requirements of 2N methods for special functions Reducing memory requirements of 2N methods for special functions Mar 16, 2019
@ChrisRackauckas ChrisRackauckas merged commit 0a87018 into SciML:master Mar 16, 2019