
Add implicit tendency for canopy temperature #675

Open · wants to merge 2 commits into main from kd/canopy_temp_implicit

Conversation

@kmdeck (Collaborator) commented Jun 24, 2024

Purpose

Add implicit tendency for canopy temperature
Address #696

There are two things I don't understand about this change:

  • There seem to be some times during the simulation where the error is always large, and a smaller dt doesn't help. For example, the 99th percentile error curve dips below the mean error for small dt, which indicates that the top 1% of errors contains very large outliers.
    • Update 9/18: it's possible this happens during times in the simulation when the temperature is changing a lot, e.g. shortly after a driver update, as opposed to when the temperature is in equilibrium. See the plots in the comments for more information.
  • The derivative of SHF with respect to temperature is correct to within a few percent, but the derivative of LHF with respect to temperature is ~30% wrong. This is partly due to d_qsat/d_T errors (though I've double-checked that I entered the formula correctly), but it also seems to come from d_LHF/d_qsat. I think I have followed CLM exactly; I'm not sure why the error is large (see the sketch below).

These two questions will be investigated in a later PR - see #782
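For reference, here is a minimal, self-contained sketch of the kind of chain-rule check involved. The saturation and flux formulas and all constants below are illustrative stand-ins, not the ClimaLand implementation:

```julia
# Toy check of d(LHF)/dT_c = ∂LHF/∂q_sat * dq_sat/dT_c against a finite difference.
# All formulas and constants here are illustrative, not the model's.
qsat(T; q0 = 0.01, L = 2.5e6, Rv = 461.5, T0 = 290.0) =
    q0 * exp(L / Rv * (1 / T0 - 1 / T))              # Clausius–Clapeyron-like saturation humidity
lhf(T; ρ = 1.2, L = 2.5e6, g_eff = 0.01, q_air = 0.005) =
    ρ * L * g_eff * (qsat(T) - q_air)                # bulk latent heat flux

T, ΔT = 298.0, 0.01
dqsat_dT = qsat(T) * 2.5e6 / (461.5 * T^2)           # analytic dq_sat/dT for the toy formula
dLHF_dqsat = 1.2 * 2.5e6 * 0.01                      # ∂LHF/∂q_sat for the toy formula
chain_rule = dLHF_dqsat * dqsat_dT
finite_diff = (lhf(T + ΔT) - lhf(T)) / ΔT
@show abs(chain_rule - finite_diff) / abs(finite_diff)  # should be ≪ 1 for this smooth toy problem
```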

Results

With an explicit stepper (RK4), a test script at the Ozark site with the canopy alone permits dt = 60 s [ac_canopy = 1e3]; time to solution = 114 seconds (see the kd/canopy_explicit_tests branch). Using dt = 180 s fails.
With an implicit stepper (ARS111), with the Jacobian updated at every Newton iteration and max_iters = 6, the same simulation can be run with dt = 3600 s; time to solution = 16 seconds. Using only a single Newton iteration failed.
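For context, here is a sketch of how such a configuration is typically expressed with ClimaTimeSteppers; the names follow the CTS API as I understand it, so check the experiment script in this PR for the exact setup:

```julia
import ClimaTimeSteppers as CTS

# IMEX Euler (ARS111) with Newton's method, max_iters = 6, and the Jacobian
# updated at every Newton iteration, as described above.
ode_algo = CTS.IMEXAlgorithm(
    CTS.ARS111(),
    CTS.NewtonsMethod(;
        max_iters = 6,
        update_j = CTS.UpdateEvery(CTS.NewNewtonIteration),
    ),
)
# The canopy-only Ozark run can then be stepped with dt = 3600.0 s instead of 60.0 s.
```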

Accuracy as a function of timestep (reference solution: implicit solver with dt = 6 s):
Float32: [figure errors_f32 — error vs. dt]
Float64: [figure errors_f64 — error vs. dt]

In the global long run on the kd/canopy_explicit_global_nans branch, I changed only the value of ac_canopy and left everything else the same as #main, i.e. still using an explicit solver. The resulting map is almost entirely NaN [look at time 86400].

On the implicit branch (this PR), we have no NaNs after one day [look at time 86400].

To-do

  • Merge ClimaCore Branch
  • Test timestep improvement
  • Unit test Jacobian
  • Cleanup canopy timestep_test.jl
  • Add script to show soil/canopy convergence (analogous to canopy timestep_test.jl)
  • Rebase
  • Check longruns
  • Review

Content

  • Adds canopy temperature to the list of implicitly stepped variables
  • Adds the canopy Jacobian
  • Adds the canopy implicit tendency
  • Updates scripts and some timesteps
  • Adds a script testing timestep convergence and adds it to Buildkite
  • Updates the docs Manifest to ClimaCore 0.14.11

  • I have read and checked the items on the review checklist.

Review comment on experiments/long_runs/land.jl (outdated, resolved)
@kmdeck kmdeck force-pushed the kd/canopy_temp_implicit branch 3 times, most recently from 8323271 to f754df6 Compare August 19, 2024 18:14
@kmdeck kmdeck requested a review from Sbozzolo August 19, 2024 23:37
@dennisYatunin (Member) left a comment

These changes look good overall, but I have some concerns about the reliability of the tests. The current test results indicate that there are significant errors in the Jacobian approximation, but these errors might appear bigger than they actually are if there is a problem with the reference solutions. This issue can be dealt with in future PRs, though; the new infrastructure added here is fine to merge in.

finitediff_LHF = (p_2.canopy.energy.lhf .- p.canopy.energy.lhf) ./ ΔT
estimated_LHF = p.canopy.energy.∂LHF∂qc .* p.canopy.energy.∂qc∂Tc
@test parent(abs.(finitediff_LHF .- estimated_LHF) ./ finitediff_LHF)[1] < 0.5
Inline review comment (Member):

These Jacobian errors seem rather large to me. Although they're most likely caused by issues with your approximation, they could also be coming from the reference finite difference solution if your system is sufficiently nonlinear. I don't have a good intuition for whether this is the case here, but you can check by using autodiff for the reference instead of finite differences. I can help you set that up using ForwardDiff.jl if you think it's worth looking into.
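For what it's worth, here is a small self-contained illustration of the comparison being suggested, with a toy flux function standing in for LHF (nothing here is ClimaLand code):

```julia
import ForwardDiff

toy_lhf(T) = 3.0e4 * exp(-5000.0 / T)          # toy nonlinear flux, not the ClimaLand LHF
T, ΔT = 298.0, 0.1
fd_ref = (toy_lhf(T + ΔT) - toy_lhf(T)) / ΔT   # one-sided finite-difference reference
ad_ref = ForwardDiff.derivative(toy_lhf, T)    # autodiff reference, exact up to roundoff
@show abs(fd_ref - ad_ref) / abs(ad_ref)       # FD error grows with nonlinearity and ΔT
```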

Review comment on src/shared_utilities/implicit_timestepping.jl (resolved)
available_explicit_vars =
    MatrixFields.unrolled_filter(is_in_Y, explicit_vars)

get_jac_type(
Inline review comment (Member):

It seems a little risky to determine the Jacobian block type based on the space rather than the variable name. Unless you are certain that your 3D variable derivatives will always be tridiagonal (which is not the case in ClimaAtmos), I would recommend dispatching on variable names here instead.
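A rough sketch of that suggestion, with hypothetical variable names (the block types are ClimaCore MatrixFields ones; the dispatch pattern itself is the point):

```julia
import ClimaCore.MatrixFields

# Hypothetical: select the Jacobian block type per variable name, so a 3D
# variable is not forced to be tridiagonal just because of its space.
get_jac_type(::Val{Symbol("canopy.energy.T")}, FT) = MatrixFields.DiagonalMatrixRow{FT}
get_jac_type(::Val{Symbol("soil.ϑ_l")}, FT) = MatrixFields.TridiagonalMatrixRow{FT}
```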


# Set up timestepper
- timestepper = CTS.ARS343();
+ timestepper = CTS.ARS111();
Inline review comment (Member):

Did you encounter some sort of issue with the ARS343 timestepper here (and in other experiments), or did you just switch to IMEX Euler for convenience?

@juliasloan25 (Member) commented Sep 17, 2024

Some insights we gained during explicitly-stepped experiments in #771:

In testing the canopy temperature implicit solver implemented in #675, we saw unexpected convergence behavior. To investigate whether this was caused by inaccuracy of the reference solution or by flaws in the simulation setup, we take a look here at the convergence behavior of the same setup using an explicit solver.

Key point 1: ac_canopy affects time to steady state

Here we can see the evolution of canopy temperature over time using two different ac_canopy values. Atmospheric and radiative drivers are updated every 3 hours, so we see the temperature change sharply every 3 hours in both cases. The temperature then tends towards steady state, but only reaches it with the smaller ac_canopy = 1e3, creating the flat sections of the plot.

We want to use the smaller ac_canopy = 1e3 moving forward, since it's thought to be more physically accurate. With this value, we will see the temperature change following a driver update and then remain constant once steady state is reached, producing two distinct regimes.
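A back-of-the-envelope way to see why ac_canopy sets the time to steady state (the values below are illustrative, not taken from these runs): linearizing the canopy energy balance around equilibrium, ac_canopy * dT/dt ≈ -(∂F/∂T) * (T - T_eq), gives a relaxation timescale τ ≈ ac_canopy / (∂F/∂T).

```julia
# Illustrative relaxation-timescale estimate, not model output.
dF_dT = 30.0                       # W m⁻² K⁻¹, assumed net flux sensitivity to canopy T
for ac_canopy in (1.0e3, 1.0e5)    # J m⁻² K⁻¹, illustrative small vs large heat capacities
    τ = ac_canopy / dF_dT          # seconds to relax toward steady state
    println("ac_canopy = $ac_canopy  =>  τ ≈ $(round(τ / 60; digits = 1)) min")
end
```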

Key point 2: convergence behavior should be assessed outside of the steady state regime

Here we have two plots showing convergence using ARS111 with only an explicit tendency (nothing stepped implicitly). The left plot shows convergence for simulations run for 12 hours + 80 seconds, while the right was run for 12 hours. The plotted line has slope 1, so we see the expected convergence behavior in the first case but not in the second.

In the first case, the simulations end shortly after a driver update, so outside of the steady state regime. All simulations are converging towards the steady state, but are not yet there, so we see the expected convergence behavior.

In the second case, the simulations end during the steady state regime. Simulations with all timesteps have converged to the same steady state, so the error for each of them is very small and doesn't converge as we would expect. This doesn't mean the convergence behavior of the solver/tendency is wrong; we've just stopped the simulation at an uninformative time.
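As a reminder of how these plots are read, here is a toy example of estimating the observed order of convergence from errors measured at a final time outside the steady-state regime (the numbers are synthetic, for illustration only):

```julia
dts    = [12.0, 24.0, 48.0, 100.0, 225.0]
errors = 2.0e-4 .* dts                 # synthetic first-order errors, for illustration only
slope = (log(errors[end]) - log(errors[1])) / (log(dts[end]) - log(dts[1]))
@show slope                            # ≈ 1 for a first-order method like ARS111 (IMEX Euler)
```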

Conclusion

When testing the convergence of a solver for canopy temperature (or similar equations that reach steady state), we need to make sure we check convergence at a time outside of the steady state regime. Also note that this could have effects on errors throughout the simulation - e.g. error peaking sharply after a driver update then returning to a small value as steady state is reached.

Note also that when running for a simulation time that is not a multiple of the interval used in saveat, we need to manually add tf to the saveat array.
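For example (a minimal sketch; the variable names are illustrative):

```julia
t0, tf, Δt_save = 0.0, 12 * 3600.0 + 80.0, 3600.0   # final time not a multiple of the save interval
saveat = collect(t0:Δt_save:tf)
saveat[end] == tf || push!(saveat, tf)               # make sure the final state is saved
```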

@juliasloan25 (Member) commented Sep 18, 2024

Here is the convergence of the implicit solver when running the simulation for 3 hours and updating the drivers at every timestep. This shows the expected convergence behavior for timesteps up to ~1800 s (all dts shown are [12.0, 24.0, 48.0, 100.0, 225.0, 450.0, 900.0, 1800.0, 3600.0]).

@kmdeck (Collaborator, Author) commented Sep 18, 2024

> Here is the convergence of the implicit solver when running the simulation for 3 hours and updating the drivers at every timestep. This shows the expected convergence behavior for timesteps up to ~1800 s (all dts shown are [12.0, 24.0, 48.0, 100.0, 225.0, 450.0, 900.0, 1800.0, 3600.0]).

Which convergence checker / max_iters combo are we using? Is this with ARS111? Just curious because, as this already looks quite good, I am wondering if we can improve the error.

@juliasloan25 (Member)

> Here is the convergence of the implicit solver when running the simulation for 3 hours and updating the drivers at every timestep. This shows the expected convergence behavior for timesteps up to ~1800 s (all dts shown are [12.0, 24.0, 48.0, 100.0, 225.0, 450.0, 900.0, 1800.0, 3600.0]).

> Which convergence checker / max_iters combo are we using? Is this with ARS111? Just curious because, as this already looks quite good, I am wondering if we can improve the error.

This is using ARS111 with max_iters = 6 and no convergence checker.

@juliasloan25 (Member) commented Sep 19, 2024

Here are plots showing the difference in T at the end of a 100 day simulation between runs with dt=10s and dt=1800s. The plot on the left uses max_iters = 6, and the plot on the right uses max_iters = 10. Both have the same features and look nearly identical, except the mean error is slightly larger in the case with max_iters = 10.

[figures maxiters6 and maxiters10 — ΔT maps at the end of the 100 day run for max_iters = 6 and max_iters = 10]

@kmdeck (Collaborator, Author) commented Sep 19, 2024

> Here are plots showing the difference in T at the end of a 100 day simulation between runs with dt=10s and dt=1800s. The plot on the left uses max_iters = 6, and the plot on the right uses max_iters = 10. Both have the same features and look nearly identical, except the mean error is slightly larger in the case with max_iters = 10.

Thanks for checking this! I think this is consistent with there being certain times when the Newton iterations do not converge (not because we don't carry out enough iterations, but because J is so far off).
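To illustrate that point with a toy scalar problem (nothing ClimaLand-specific here): Newton's method with an inexact Jacobian converges only linearly, at a rate set by the relative Jacobian error, so extra iterations help little once J is far off.

```julia
f(x) = x^3 - 2.0                 # toy residual with root 2^(1/3)
fp(x) = 3x^2                     # exact derivative
for jac_err in (0.05, 0.5)       # 5% vs 50% relative error in the Jacobian
    x = 2.0
    for _ in 1:6                 # same iteration budget as max_iters = 6
        x -= f(x) / (fp(x) * (1 + jac_err))
    end
    println("jac_err = $jac_err:  |f(x)| after 6 iterations = $(abs(f(x)))")
end
```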

@juliasloan25 juliasloan25 force-pushed the kd/canopy_temp_implicit branch 2 times, most recently from c19838d to 6e300b0 Compare September 20, 2024 18:01
Labels: enhancement, LSMv1, performance, Run long runs

Successfully merging this pull request may close this issue: Implicit timestepper for canopy temperature or solve flux balance equation

4 participants