
Add implicit tendency for canopy temperature #675

Open · wants to merge 2 commits into main from kd/canopy_temp_implicit

Conversation

@kmdeck (Collaborator) commented Jun 24, 2024

Purpose

Add implicit tendency for canopy temperature
Address #696

There are two things I don't understand about this change:

  • There seem to be some times during the simulation where the error is always large, and a smaller dt doesn't help. For example, the 99th percentile error curve dips below the mean error for small dt, which indicates that the top 1% of errors contains very large outliers.
    • Update 9/18: it's possible this happens during times in the simulation when the temperature is changing a lot, e.g. shortly after a driver update, as opposed to when the temperature is in equilibrium. See the plots in the comments for more information.
  • The derivative of SHF with respect to temperature is correct to within a few percent, but the derivative of LHF with respect to temperature is ~30% wrong. This is partly due to d_qsat/d_T errors (though I've double-checked that I entered the formula correctly), but it also seems to come from d_LHF/d_qsat. I think I have followed CLM exactly; I'm not sure why the error is large (see the sketch below).

These two questions will be investigated in a later PR - see #782
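For reference, here is a minimal, self-contained sketch of the kind of chain-rule check involved. The saturation and flux formulas and all constants below are illustrative stand-ins, not the ClimaLand implementation:

```julia
# Toy check of d(LHF)/dT_c = ∂LHF/∂q_sat * dq_sat/dT_c against a finite difference.
# All formulas and constants here are illustrative, not the model's.
qsat(T; q0 = 0.01, L = 2.5e6, Rv = 461.5, T0 = 290.0) =
    q0 * exp(L / Rv * (1 / T0 - 1 / T))              # Clausius–Clapeyron-like saturation humidity
lhf(T; ρ = 1.2, L = 2.5e6, g_eff = 0.01, q_air = 0.005) =
    ρ * L * g_eff * (qsat(T) - q_air)                # bulk latent heat flux

T, ΔT = 298.0, 0.01
dqsat_dT = qsat(T) * 2.5e6 / (461.5 * T^2)           # analytic dq_sat/dT for the toy formula
dLHF_dqsat = 1.2 * 2.5e6 * 0.01                      # ∂LHF/∂q_sat for the toy formula
chain_rule = dLHF_dqsat * dqsat_dT
finite_diff = (lhf(T + ΔT) - lhf(T)) / ΔT
@show abs(chain_rule - finite_diff) / abs(finite_diff)  # should be ≪ 1 for this smooth toy problem
```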

Results

With an explicit stepper (RK4), a test script at the Ozark site with the canopy alone permits dt = 60 s [ac_canopy = 1e3]; time to solution = 114 seconds (see the kd/canopy_explicit_tests branch). Using dt = 180 s fails.
With an implicit stepper (ARS111), with the Jacobian updated at every Newton iteration and max_iters = 6, the same simulation can be run with dt = 3600 s; time to solution = 16 seconds. Using only a single Newton iteration failed.
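For context, here is a sketch of how such a configuration is typically expressed with ClimaTimeSteppers; the names follow the CTS API as I understand it, so check the experiment script in this PR for the exact setup:

```julia
import ClimaTimeSteppers as CTS

# IMEX Euler (ARS111) with Newton's method, max_iters = 6, and the Jacobian
# updated at every Newton iteration, as described above.
ode_algo = CTS.IMEXAlgorithm(
    CTS.ARS111(),
    CTS.NewtonsMethod(;
        max_iters = 6,
        update_j = CTS.UpdateEvery(CTS.NewNewtonIteration),
    ),
)
# The canopy-only Ozark run can then be stepped with dt = 3600.0 s instead of 60.0 s.
```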

Accuracy as a function of timestep (reference solution: implicit solver with dt = 6 s):
Float32: [figure errors_f32 — error vs. dt]
Float64: [figure errors_f64 — error vs. dt]

In the global long run on the kd/canopy_explicit_global_nans branch, I changed only the value of ac_canopy and left everything else the same as #main, i.e. still using an explicit solver. The resulting map is almost entirely NaN [look at time 86400].

On the implicit branch (this PR), we have no NaNs after one day [look at time 86400].

To-do

  • Merge ClimaCore Branch
  • Test timestep improvement
  • Unit test Jacobian
  • Cleanup canopy timestep_test.jl
  • Add script to show soil/canopy convergence (analogous to canopy timestep_test.jl)
  • Rebase
  • Check longruns
  • Review

Content

  • Adds canopy temperature to the list of implicitly stepped variables
  • Adds the canopy Jacobian
  • Adds the canopy implicit tendency
  • Updates scripts and some timesteps
  • Adds a script testing timestep convergence and adds it to Buildkite
  • Updates the docs Manifest to ClimaCore 0.14.11

  • I have read and checked the items on the review checklist.

Review comment on experiments/long_runs/land.jl (outdated, resolved)
@kmdeck kmdeck force-pushed the kd/canopy_temp_implicit branch 3 times, most recently from 8323271 to f754df6 Compare August 19, 2024 18:14
@kmdeck kmdeck requested a review from Sbozzolo August 19, 2024 23:37
@dennisYatunin (Member) left a comment

These changes look good overall, but I have some concerns about the reliability of the tests. The current test results indicate that there are significant errors in the Jacobian approximation, but these errors might appear bigger than they actually are if there is a problem with the reference solutions. This issue can be dealt with in future PRs, though; the new infrastructure added here is fine to merge in.

finitediff_LHF = (p_2.canopy.energy.lhf .- p.canopy.energy.lhf) ./ ΔT
estimated_LHF = p.canopy.energy.∂LHF∂qc .* p.canopy.energy.∂qc∂Tc
@test parent(abs.(finitediff_LHF .- estimated_LHF) ./ finitediff_LHF)[1] < 0.5
Inline review comment (Member):

These Jacobian errors seem rather large to me. Although they're most likely caused by issues with your approximation, they could also be coming from the reference finite difference solution if your system is sufficiently nonlinear. I don't have a good intuition for whether this is the case here, but you can check by using autodiff for the reference instead of finite differences. I can help you set that up using ForwardDiff.jl if you think it's worth looking into.
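For what it's worth, here is a small self-contained illustration of the comparison being suggested, with a toy flux function standing in for LHF (nothing here is ClimaLand code):

```julia
import ForwardDiff

toy_lhf(T) = 3.0e4 * exp(-5000.0 / T)          # toy nonlinear flux, not the ClimaLand LHF
T, ΔT = 298.0, 0.1
fd_ref = (toy_lhf(T + ΔT) - toy_lhf(T)) / ΔT   # one-sided finite-difference reference
ad_ref = ForwardDiff.derivative(toy_lhf, T)    # autodiff reference, exact up to roundoff
@show abs(fd_ref - ad_ref) / abs(ad_ref)       # FD error grows with nonlinearity and ΔT
```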

Review comment on src/shared_utilities/implicit_timestepping.jl (resolved)
available_explicit_vars =
    MatrixFields.unrolled_filter(is_in_Y, explicit_vars)

get_jac_type(
Inline review comment (Member):

It seems a little risky to determine the Jacobian block type based on the space rather than the variable name. Unless you are certain that your 3D variable derivatives will always be tridiagonal (which is not the case in ClimaAtmos), I would recommend dispatching on variable names here instead.
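A rough sketch of that suggestion, with hypothetical variable names (the block types are ClimaCore MatrixFields ones; the dispatch pattern itself is the point):

```julia
import ClimaCore.MatrixFields

# Hypothetical: select the Jacobian block type per variable name, so a 3D
# variable is not forced to be tridiagonal just because of its space.
get_jac_type(::Val{Symbol("canopy.energy.T")}, FT) = MatrixFields.DiagonalMatrixRow{FT}
get_jac_type(::Val{Symbol("soil.ϑ_l")}, FT) = MatrixFields.TridiagonalMatrixRow{FT}
```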


# Set up timestepper
- timestepper = CTS.ARS343();
+ timestepper = CTS.ARS111();
Inline review comment (Member):

Did you encounter some sort of issue with the ARS343 timestepper here (and in other experiments), or did you just switch to IMEX Euler for convenience?

@juliasloan25 (Member) commented Sep 17, 2024

Some insights we gained during explicitly-stepped experiments in #771:

In testing the canopy temperature implicit solver implemented in #675, we saw unexpected convergence behavior. To investigate whether this was caused by inaccuracy of the reference solution or by flaws in the simulation setup, we take a look here at the convergence behavior of the same setup using an explicit solver.

Key point 1: ac_canopy affects time to steady state

Here we can see the evolution of canopy temperature over time using two different ac_canopy values. Atmospheric and radiative drivers are updated every 3 hours, so we see the temperature change sharply every 3 hours in both cases. The temperature then tends towards steady state, but only reaches it with the smaller ac_canopy = 1e3, creating the flat sections of the plot.

We want to use the smaller ac_canopy = 1e3 moving forward, since it's thought to be more physically accurate. With this value, we will see the temperature change following a driver update and then remain constant once steady state is reached, producing two distinct regimes.
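A back-of-the-envelope way to see why ac_canopy sets the time to steady state (the values below are illustrative, not taken from these runs): linearizing the canopy energy balance around equilibrium, ac_canopy * dT/dt ≈ -(∂F/∂T) * (T - T_eq), gives a relaxation timescale τ ≈ ac_canopy / (∂F/∂T).

```julia
# Illustrative relaxation-timescale estimate, not model output.
dF_dT = 30.0                       # W m⁻² K⁻¹, assumed net flux sensitivity to canopy T
for ac_canopy in (1.0e3, 1.0e5)    # J m⁻² K⁻¹, illustrative small vs large heat capacities
    τ = ac_canopy / dF_dT          # seconds to relax toward steady state
    println("ac_canopy = $ac_canopy  =>  τ ≈ $(round(τ / 60; digits = 1)) min")
end
```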

Key point 2: convergence behavior should be assessed outside of the steady state regime

Here we have two plots showing convergence using ARS111 with only an explicit tendency (nothing stepped implicitly). The left plot shows convergence for simulations run for 12 hours + 80 seconds, while the right was run for 12 hours. The plotted line has slope 1, so we see the expected convergence behavior in the first case but not in the second.

In the first case, the simulations end shortly after a driver update, so outside of the steady state regime. All simulations are converging towards the steady state, but are not yet there, so we see the expected convergence behavior.

In the second case, the simulations end during the steady state regime. Simulations with all timesteps have converged to the same steady state, so the error for each of them is very small and doesn't converge as we would expect. This doesn't mean the convergence behavior of the solver/tendency is wrong; we've just stopped the simulation at an uninformative time.
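As a reminder of how these plots are read, here is a toy example of estimating the observed order of convergence from errors measured at a final time outside the steady-state regime (the numbers are synthetic, for illustration only):

```julia
dts    = [12.0, 24.0, 48.0, 100.0, 225.0]
errors = 2.0e-4 .* dts                 # synthetic first-order errors, for illustration only
slope = (log(errors[end]) - log(errors[1])) / (log(dts[end]) - log(dts[1]))
@show slope                            # ≈ 1 for a first-order method like ARS111 (IMEX Euler)
```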

Conclusion

When testing the convergence of a solver for canopy temperature (or similar equations that reach steady state), we need to make sure we check convergence at a time outside of the steady state regime. Also note that this could have effects on errors throughout the simulation - e.g. error peaking sharply after a driver update then returning to a small value as steady state is reached.

Note also that when running for a simulation time that is not a multiple of the interval used in saveat, we need to manually add tf to the saveat array.
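For example (a minimal sketch; the variable names are illustrative):

```julia
t0, tf, Δt_save = 0.0, 12 * 3600.0 + 80.0, 3600.0   # final time not a multiple of the save interval
saveat = collect(t0:Δt_save:tf)
saveat[end] == tf || push!(saveat, tf)               # make sure the final state is saved
```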

@juliasloan25 (Member) commented Sep 18, 2024

Here is the convergence of the implicit solver when running the simulation for 3 hours and updating the drivers at every timestep. This shows the expected convergence behavior for timesteps up to ~1800 s (all dts shown are [12.0, 24.0, 48.0, 100.0, 225.0, 450.0, 900.0, 1800.0, 3600.0]).

@kmdeck (Collaborator, Author) commented Sep 18, 2024

> Here is the convergence of the implicit solver when running the simulation for 3 hours and updating the drivers at every timestep. This shows the expected convergence behavior for timesteps up to ~1800 s (all dts shown are [12.0, 24.0, 48.0, 100.0, 225.0, 450.0, 900.0, 1800.0, 3600.0]).

Which convergence checker / max_iters combo are we using? Is this with ARS111? Just curious because, as this already looks quite good, I am wondering if we can improve the error.

@juliasloan25 (Member)

> Here is the convergence of the implicit solver when running the simulation for 3 hours and updating the drivers at every timestep. This shows the expected convergence behavior for timesteps up to ~1800 s (all dts shown are [12.0, 24.0, 48.0, 100.0, 225.0, 450.0, 900.0, 1800.0, 3600.0]).

> Which convergence checker / max_iters combo are we using? Is this with ARS111? Just curious because, as this already looks quite good, I am wondering if we can improve the error.

This is using ARS111 with max_iters = 6 and no convergence checker.

@juliasloan25 (Member) commented Sep 19, 2024

Here are plots showing the difference in T at the end of a 100 day simulation between runs with dt=10s and dt=1800s. The plot on the left uses max_iters = 6, and the plot on the right uses max_iters = 10. Both have the same features and look nearly identical, except the mean error is slightly larger in the case with max_iters = 10.

[figures maxiters6 and maxiters10 — ΔT maps at the end of the 100 day run for max_iters = 6 and max_iters = 10]

@kmdeck (Collaborator, Author) commented Sep 19, 2024

> Here are plots showing the difference in T at the end of a 100 day simulation between runs with dt=10s and dt=1800s. The plot on the left uses max_iters = 6, and the plot on the right uses max_iters = 10. Both have the same features and look nearly identical, except the mean error is slightly larger in the case with max_iters = 10.

Thanks for checking this! I think this is consistent with there being certain times when the Newton iterations do not converge (not because we don't carry out enough iterations, but because J is so far off).
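To illustrate that point with a toy scalar problem (nothing ClimaLand-specific here): Newton's method with an inexact Jacobian converges only linearly, at a rate set by the relative Jacobian error, so extra iterations help little once J is far off.

```julia
f(x) = x^3 - 2.0                 # toy residual with root 2^(1/3)
fp(x) = 3x^2                     # exact derivative
for jac_err in (0.05, 0.5)       # 5% vs 50% relative error in the Jacobian
    x = 2.0
    for _ in 1:6                 # same iteration budget as max_iters = 6
        x -= f(x) / (fp(x) * (1 + jac_err))
    end
    println("jac_err = $jac_err:  |f(x)| after 6 iterations = $(abs(f(x)))")
end
```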

@juliasloan25 juliasloan25 force-pushed the kd/canopy_temp_implicit branch 2 times, most recently from c19838d to 6e300b0 Compare September 20, 2024 18:01
Labels: enhancement, LSMv1, performance, Run long runs

Successfully merging this pull request may close this issue: Implicit timestepper for canopy temperature or solve flux balance equation

4 participants