diff --git a/docs/src/differentiable_programming.md b/docs/src/differentiable_programming.md
index 4f082d3d4e..c2aff0b229 100644
--- a/docs/src/differentiable_programming.md
+++ b/docs/src/differentiable_programming.md
@@ -223,6 +223,145 @@ Here, we used some knowledge about the internal memory layout of Trixi, an array
 with the conserved variables as fastest-varying index in memory.
 
 
+### Differentiating through a complete simulation
+
+It is also possible to differentiate through a complete simulation. As an example, let's differentiate
+the total energy of a simulation using the linear scalar advection equation with respect to the
+wave number (frequency) of the initial data.
+
+```jldoctest advection_differentiate_simulation
+julia> using Trixi, OrdinaryDiffEq, ForwardDiff, Plots
+
+julia> function energy_at_final_time(k) # k is the wave number of the initial condition
+           equations = LinearScalarAdvectionEquation2D(1.0, -0.3)
+           mesh = TreeMesh((-1.0, -1.0), (1.0, 1.0), initial_refinement_level=3, n_cells_max=10^4)
+           solver = DGSEM(3, flux_lax_friedrichs)
+           initial_condition = (x, t, equation) -> begin
+               x_trans = Trixi.x_trans_periodic_2d(x - equation.advectionvelocity * t)
+               return SVector(sinpi(k * sum(x_trans)))
+           end
+           semi = SemidiscretizationHyperbolic(mesh, equations, initial_condition, solver,
+                                               uEltype=typeof(k))
+           ode = semidiscretize(semi, (0.0, 1.0))
+           sol = solve(ode, BS3(), save_everystep=false)
+           Trixi.integrate(energy_total, sol.u[end], semi)
+       end
+energy_at_final_time (generic function with 1 method)
+
+julia> k_values = range(0.9, 1.1, length=101)
+0.9:0.002:1.1
+
+julia> plot(k_values, energy_at_final_time.(k_values), label="Energy");
+```
+
+You should see a plot of a curve that resembles a parabola with local maximum around `k = 1.0`.
+Why's that? Well, the domain is fixed but the wave number changes. Thus, if the wave number is
+not chosen as an integer, the initial condition will not be a smooth periodic function in the
+given domain. Hence, the dissipative surface flux (`flux_lax_friedrichs` in this example)
+will introduce more dissipation. In particular, it will introduce more dissipation for "less smooth"
+initial data, corresponding to wave numbers `k` further away from integers.
+
+We can compute the discrete derivative of the energy at the final time with respect to the wave
+number `k` as follows.
+```jldoctest advection_differentiate_simulation
+julia> round(ForwardDiff.derivative(energy_at_final_time, 1.0), sigdigits=2)
+1.4e-5
+```
+
+This is rather small and we can treat it as zero in comparison to the value of this derivative at
+other wave numbers `k`.
+
+```jldoctest advection_differentiate_simulation
+julia> dk_values = ForwardDiff.derivative.((energy_at_final_time,), k_values);
+
+julia> plot(k_values, dk_values, label="Derivative");
+```
+
+If you remember basic calculus, a sufficient condition for a local maximum is that the first derivative
+vanishes and the second derivative is negative. We can also check this discretely.
+
+```jldoctest advection_differentiate_simulation
+julia> round(ForwardDiff.derivative(
+           k -> Trixi.ForwardDiff.derivative(energy_at_final_time, k),
+       1.0), sigdigits=2)
+-0.9
+```
+
+Having seen this application, let's break down what happens step by step.
+```julia
+julia> function energy_at_final_time(k) # k is the wave number of the initial condition
+           equations = LinearScalarAdvectionEquation2D(1.0, -0.3)
+           mesh = TreeMesh((-1.0, -1.0), (1.0, 1.0), initial_refinement_level=3, n_cells_max=10^4)
+           solver = DGSEM(3, flux_lax_friedrichs)
+           initial_condition = (x, t, equation) -> begin
+               x_trans = Trixi.x_trans_periodic_2d(x - equation.advectionvelocity * t)
+               return SVector(sinpi(k * sum(x_trans)))
+           end
+           semi = SemidiscretizationHyperbolic(mesh, equations, initial_condition, solver,
+                                               uEltype=typeof(k))
+           ode = semidiscretize(semi, (0.0, 1.0))
+           sol = solve(ode, BS3(), save_everystep=false)
+           Trixi.integrate(energy_total, sol.u[end], semi)
+       end
+
+julia> round(ForwardDiff.derivative(energy_at_final_time, 1.0), sigdigits=2)
+1.4e-5
+```
+When calling `ForwardDiff.derivative(energy_at_final_time, 1.0)`, ForwardDiff.jl
+will basically use the chain rule and known derivatives of existing basic functions
+to calculate the derivative of the energy at the final time with respect to the
+wave number `k` at `k0 = 1.0`. To do this, ForwardDiff.jl uses dual numbers, which
+basically store the result and its derivative w.r.t. a specified parameter at the
+same time. Thus, we need to make sure that we can treat these `ForwardDiff.Dual`
+numbers everywhere during the computation. Fortunately, generic Julia code usually
+supports these operations. The most basic problem for a developer is to ensure
+that all types are generic enough, in particular the ones of internal caches.
+
+The first step in this example creates some basic ingredients of our simulation.
+```julia
+equations = LinearScalarAdvectionEquation2D(1.0, -0.3)
+mesh = TreeMesh((-1.0, -1.0), (1.0, 1.0), initial_refinement_level=3, n_cells_max=10^4)
+solver = DGSEM(3, flux_lax_friedrichs)
+```
+These do not have internal caches storing intermediate values of the numerical
+solution, so we do not need to adapt them. In fact, we could also define them
+outside of `energy_at_final_time` (but would need to take care of globals or
+wrap everything in another function).
+
+Next, we define the initial condition
+```julia
+initial_condition = (x, t, equation) -> begin
+    x_trans = Trixi.x_trans_periodic_2d(x - equation.advectionvelocity * t)
+    return SVector(sinpi(k * sum(x_trans)))
+end
+```
+as a closure capturing the wave number `k` passed to `energy_at_final_time`.
+If you call `energy_at_final_time(1.0)`, `k` will be a `Float64`. Thus, the
+return values of `initial_condition` will be `SVector`s of `Float64`s. When
+calculating the `ForwardDiff.derivative`, `k` will be a `ForwardDiff.Dual` number.
+Hence, the `initial_condition` will return `SVector`s of `ForwardDiff.Dual`
+numbers.
+
+The semidiscretization `semi` uses some internal caches to avoid repeated allocations
+and speed up the computations, e.g. for numerical fluxes at interfaces. Thus, we
+need to tell Trixi to allow `ForwardDiff.Dual` numbers in these caches. That's what
+the keyword argument `uEltype=typeof(k)` in
+```julia
+semi = SemidiscretizationHyperbolic(mesh, equations, initial_condition, solver,
+                                    uEltype=typeof(k))
+```
+does. This is basically the only part where you need to modify your standard Trixi
+code to enable automatic differentiation. From there on, the remaining steps
+```julia
+ode = semidiscretize(semi, (0.0, 1.0))
+sol = solve(ode, BS3(), save_everystep=false)
+Trixi.integrate(energy_total, sol.u[end], semi)
+```
+do not need any modifications since they are sufficiently generic (and enough effort
+has been spend to allow general types inside thee calls).
+
+
+
 ## Finite difference approximations
 
 Trixi provides the convenience function [`jacobian_fd`](@ref) to approximate the Jacobian
diff --git a/src/semidiscretization/semidiscretization.jl b/src/semidiscretization/semidiscretization.jl
index 8e68bf0eb7..09d9b12720 100644
--- a/src/semidiscretization/semidiscretization.jl
+++ b/src/semidiscretization/semidiscretization.jl
@@ -216,15 +216,26 @@ function jacobian_ad_forward(semi::AbstractSemidiscretization;
                              t0=zero(real(semi)),
                              u0_ode=compute_coefficients(t0, semi))
 
-  J = ForwardDiff.jacobian((du_ode, u_ode) -> begin
-    new_semi = remake(semi, uEltype=eltype(u_ode))
+  du_ode = similar(u0_ode)
+  config = ForwardDiff.JacobianConfig(nothing, du_ode, u0_ode)
+
+  # Use a function barrier since the generation of the `config` we use above
+  # is not type-stable
+  _jacobian_ad_forward(semi, t0, u0_ode, du_ode, config)
+end
+
+function _jacobian_ad_forward(semi, t0, u0_ode, du_ode, config)
+
+  new_semi = remake(semi, uEltype=eltype(config))
+  J = ForwardDiff.jacobian(du_ode, u0_ode, config) do du_ode, u_ode
     Trixi.rhs!(du_ode, u_ode, new_semi, t0)
-  end, similar(u0_ode), u0_ode)
+  end
 
   return J
 end
 
 
+
 # Sometimes, it can be useful to save some (scalar) variables associated with each element,
 # e.g. AMR indicators or shock indicators. Since these usually have to be re-computed
 # directly before IO and do not necessarily need to be stored in memory before,
diff --git a/src/solvers/dg/containers_1d.jl b/src/solvers/dg/containers_1d.jl
index 64f75f3273..dd7de77cfd 100644
--- a/src/solvers/dg/containers_1d.jl
+++ b/src/solvers/dg/containers_1d.jl
@@ -13,6 +13,7 @@ end
 
 nvariables(::ElementContainer1D{RealT, uEltype, NVARS, POLYDEG}) where {RealT, uEltype, NVARS, POLYDEG} = NVARS
 polydeg(::ElementContainer1D{RealT, uEltype, NVARS, POLYDEG}) where {RealT, uEltype, NVARS, POLYDEG} = POLYDEG
+Base.eltype(::ElementContainer1D{RealT, uEltype}) where {RealT, uEltype} = uEltype
 
 # Only one-dimensional `Array`s are `resize!`able in Julia.
 # Hence, we use `Vector`s as internal storage and `resize!`
@@ -123,6 +124,7 @@ end
 
 nvariables(::InterfaceContainer1D{uEltype, NVARS, POLYDEG}) where {uEltype, NVARS, POLYDEG} = NVARS
 polydeg(::InterfaceContainer1D{uEltype, NVARS, POLYDEG}) where {uEltype, NVARS, POLYDEG} = POLYDEG
+Base.eltype(::InterfaceContainer1D{uEltype}) where {uEltype} = uEltype
 
 # See explanation of Base.resize! for the element container
 function Base.resize!(interfaces::InterfaceContainer1D, capacity)
@@ -277,6 +279,7 @@ end
 
 nvariables(::BoundaryContainer1D{RealT, uEltype, NVARS, POLYDEG}) where {RealT, uEltype, NVARS, POLYDEG} = NVARS
 polydeg(::BoundaryContainer1D{RealT, uEltype, NVARS, POLYDEG}) where {RealT, uEltype, NVARS, POLYDEG} = POLYDEG
+Base.eltype(::BoundaryContainer1D{RealT, uEltype}) where {RealT, uEltype} = uEltype
 
 # See explanation of Base.resize! for the element container
 function Base.resize!(boundaries::BoundaryContainer1D, capacity)
diff --git a/src/solvers/dg/containers_2d.jl b/src/solvers/dg/containers_2d.jl
index db7d509c59..88f0f48470 100644
--- a/src/solvers/dg/containers_2d.jl
+++ b/src/solvers/dg/containers_2d.jl
@@ -13,6 +13,7 @@ end
 
 nvariables(::ElementContainer2D{RealT, uEltype, NVARS, POLYDEG}) where {RealT, uEltype, NVARS, POLYDEG} = NVARS
 polydeg(::ElementContainer2D{RealT, uEltype, NVARS, POLYDEG}) where {RealT, uEltype, NVARS, POLYDEG} = POLYDEG
+Base.eltype(::ElementContainer2D{RealT, uEltype}) where {RealT, uEltype} = uEltype
 
 # Only one-dimensional `Array`s are `resize!`able in Julia.
 # Hence, we use `Vector`s as internal storage and `resize!`
@@ -128,6 +129,7 @@ end
 
 nvariables(::InterfaceContainer2D{uEltype, NVARS, POLYDEG}) where {uEltype, NVARS, POLYDEG} = NVARS
 polydeg(::InterfaceContainer2D{uEltype, NVARS, POLYDEG}) where {uEltype, NVARS, POLYDEG} = POLYDEG
+Base.eltype(::InterfaceContainer2D{uEltype}) where {uEltype} = uEltype
 
 # See explanation of Base.resize! for the element container
 function Base.resize!(interfaces::InterfaceContainer2D, capacity)
@@ -294,6 +296,7 @@ end
 
 nvariables(::BoundaryContainer2D{RealT, uEltype, NVARS, POLYDEG}) where {RealT, uEltype, NVARS, POLYDEG} = NVARS
 polydeg(::BoundaryContainer2D{RealT, uEltype, NVARS, POLYDEG}) where {RealT, uEltype, NVARS, POLYDEG} = POLYDEG
+Base.eltype(::BoundaryContainer2D{RealT, uEltype}) where {RealT, uEltype} = uEltype
 
 # See explanation of Base.resize! for the element container
 function Base.resize!(boundaries::BoundaryContainer2D, capacity)
@@ -493,6 +496,7 @@ end
 
 nvariables(::L2MortarContainer2D{uEltype, NVARS, POLYDEG}) where {uEltype, NVARS, POLYDEG} = NVARS
 polydeg(::L2MortarContainer2D{uEltype, NVARS, POLYDEG}) where {uEltype, NVARS, POLYDEG} = POLYDEG
+Base.eltype(::L2MortarContainer2D{uEltype}) where {uEltype} = uEltype
 
 # See explanation of Base.resize! for the element container
 function Base.resize!(mortars::L2MortarContainer2D, capacity)
@@ -693,6 +697,7 @@ end
 
 nvariables(::MPIInterfaceContainer2D{uEltype, NVARS, POLYDEG}) where {uEltype, NVARS, POLYDEG} = NVARS
 polydeg(::MPIInterfaceContainer2D{uEltype, NVARS, POLYDEG}) where {uEltype, NVARS, POLYDEG} = POLYDEG
+Base.eltype(::MPIInterfaceContainer2D{uEltype}) where {uEltype} = uEltype
 
 # See explanation of Base.resize! for the element container
 function Base.resize!(mpi_interfaces::MPIInterfaceContainer2D, capacity)
diff --git a/src/solvers/dg/containers_3d.jl b/src/solvers/dg/containers_3d.jl
index 6ea22626ea..0fbb81bc2c 100644
--- a/src/solvers/dg/containers_3d.jl
+++ b/src/solvers/dg/containers_3d.jl
@@ -13,6 +13,7 @@ end
 
 nvariables(::ElementContainer3D{RealT, uEltype, NVARS, POLYDEG}) where {RealT, uEltype, NVARS, POLYDEG} = NVARS
 polydeg(::ElementContainer3D{RealT, uEltype, NVARS, POLYDEG}) where {RealT, uEltype, NVARS, POLYDEG} = POLYDEG
+Base.eltype(::ElementContainer3D{RealT, uEltype}) where {RealT, uEltype} = uEltype
 
 # Only one-dimensional `Array`s are `resize!`able in Julia.
 # Hence, we use `Vector`s as internal storage and `resize!`
@@ -130,6 +131,7 @@ end
 
 nvariables(::InterfaceContainer3D{uEltype, NVARS, POLYDEG}) where {uEltype, NVARS, POLYDEG} = NVARS
 polydeg(::InterfaceContainer3D{uEltype, NVARS, POLYDEG}) where {uEltype, NVARS, POLYDEG} = POLYDEG
+Base.eltype(::InterfaceContainer3D{uEltype}) where {uEltype} = uEltype
 
 # See explanation of Base.resize! for the element container
 function Base.resize!(interfaces::InterfaceContainer3D, capacity)
@@ -292,6 +294,7 @@ end
 
 nvariables(::BoundaryContainer3D{RealT, uEltype, NVARS, POLYDEG}) where {RealT, uEltype, NVARS, POLYDEG} = NVARS
 polydeg(::BoundaryContainer3D{RealT, uEltype, NVARS, POLYDEG}) where {RealT, uEltype, NVARS, POLYDEG} = POLYDEG
+Base.eltype(::BoundaryContainer3D{RealT, uEltype}) where {RealT, uEltype} = uEltype
 
 # See explanation of Base.resize! for the element container
 function Base.resize!(boundaries::BoundaryContainer3D, capacity)
@@ -511,6 +514,7 @@ end
 
 nvariables(::L2MortarContainer3D{uEltype, NVARS, POLYDEG}) where {uEltype, NVARS, POLYDEG} = NVARS
 polydeg(::L2MortarContainer3D{uEltype, NVARS, POLYDEG}) where {uEltype, NVARS, POLYDEG} = POLYDEG
+Base.eltype(::L2MortarContainer3D{uEltype}) where {uEltype} = uEltype
 
 # See explanation of Base.resize! for the element container
 function Base.resize!(mortars::L2MortarContainer3D, capacity)
diff --git a/src/solvers/dg/dg.jl b/src/solvers/dg/dg.jl
index d87c079772..8c9cb2502e 100644
--- a/src/solvers/dg/dg.jl
+++ b/src/solvers/dg/dg.jl
@@ -249,7 +249,7 @@ end
 function allocate_coefficients(mesh::TreeMesh, equations, dg::DG, cache)
   # We must allocate a `Vector` in order to be able to `resize!` it (AMR).
   # cf. wrap_array
-  zeros(real(dg), nvariables(equations) * nnodes(dg)^ndims(mesh) * nelements(dg, cache))
+  zeros(eltype(cache.elements), nvariables(equations) * nnodes(dg)^ndims(mesh) * nelements(dg, cache))
 end
 
 
diff --git a/src/solvers/dg/dg_2d.jl b/src/solvers/dg/dg_2d.jl
index 6d13be7436..af8be80795 100644
--- a/src/solvers/dg/dg_2d.jl
+++ b/src/solvers/dg/dg_2d.jl
@@ -99,10 +99,12 @@ end
 function create_cache(mesh::TreeMesh{2}, equations, mortar_l2::LobattoLegendreMortarL2, uEltype)
   # TODO: Taal compare performance of different types
   MA2d = MArray{Tuple{nvariables(equations), nnodes(mortar_l2)}, uEltype}
-  # A2d  = Array{uEltype, 2}
+  fstar_upper_threaded = MA2d[MA2d(undef) for _ in 1:Threads.nthreads()]
+  fstar_lower_threaded = MA2d[MA2d(undef) for _ in 1:Threads.nthreads()]
 
-  fstar_upper_threaded = [MA2d(undef) for _ in 1:Threads.nthreads()]
-  fstar_lower_threaded = [MA2d(undef) for _ in 1:Threads.nthreads()]
+  # A2d = Array{uEltype, 2}
+  # fstar_upper_threaded = [A2d(undef, nvariables(equations), nnodes(mortar_l2)) for _ in 1:Threads.nthreads()]
+  # fstar_lower_threaded = [A2d(undef, nvariables(equations), nnodes(mortar_l2)) for _ in 1:Threads.nthreads()]
 
   (; fstar_upper_threaded, fstar_lower_threaded)
 end