Skip to content

Commit

Permalink
Merge pull request #210 from ArnoStrouwen/LT
Browse files Browse the repository at this point in the history
[skip ci] LanguageTool
  • Loading branch information
ChrisRackauckas authored Jan 12, 2023
2 parents b1ad9b1 + 84ab23c commit 761c32d
Show file tree
Hide file tree
Showing 8 changed files with 60 additions and 59 deletions.
37 changes: 19 additions & 18 deletions docs/src/getting_started.md
Original file line number Diff line number Diff line change
@@ -1,17 +1,17 @@
# Getting Started with Optimization-Based ODE Parameter Estimation

In this tutorial we will showcase how to estimate the parameters of an ordinary
differential equation using DiffEqParamEstim.jl. DiffEqParamEstim.jl is a high
level tool that makes common parameter estimation tasks simple. Here we will show
In this tutorial, we will showcase how to estimate the parameters of an ordinary
differential equation using DiffEqParamEstim.jl. DiffEqParamEstim.jl is a high-level
tool that makes common parameter estimation tasks simple. Here, we will show
how to use its cost function generation to estimate the parameters of the Lotka-Volterra
equation against simulated data.

## Installation

First we will make sure DiffEqParamEstim.jl is installed correctly. To do this, we use the
First, we will make sure DiffEqParamEstim.jl is installed correctly. To do this, we use the
Julia package REPL, opened by typing `]` in the REPL and seeing `pkg>` appear in blue.
Then we type: `add DiffEqParamEstim` and hit enter. This will run the package management
sequence and we will be good to go.
sequence, and we will be good to go.

## Required Dependencies

Expand Down Expand Up @@ -57,7 +57,7 @@ randomized = VectorOfArray([(sol(t[i]) + .01randn(2)) for i in 1:length(t)])
data = convert(Array,randomized)
```

Here we used `VectorOfArray` from [RecursiveArrayTools.jl](https://docs.sciml.ai/RecursiveArrayTools/stable/)
Here, we used `VectorOfArray` from [RecursiveArrayTools.jl](https://docs.sciml.ai/RecursiveArrayTools/stable/)
to turn the result of an ODE into a matrix.

If we plot the solution with the parameter at `a=1.42`, we get the following:
Expand All @@ -69,7 +69,7 @@ plot(sol)
plot!(newsol)
```

Notice that after one period this solution begins to drift very far off: this
Notice that after one period, this solution begins to drift very far off: this
problem is sensitive to the choice of `a`.

To build the objective function for Optim.jl, we simply call the `build_loss_objective`
Expand All @@ -88,7 +88,7 @@ error more quickly when in bad regions of the parameter space, speeding up the
process. If the integrator stops early (due to divergence), then those parameters
are given an infinite loss, and thus this is a quick way to avoid bad parameters.
We set `verbose=false` because this divergence can get noisy. The `Optimization.AutoForwardDiff()`
is a choice of automatic differentiation, i.e. how the gradients are calculated.
is a choice of automatic differentiation, i.e., how the gradients are calculated.
For more information on this choice, see
[the automatic differentiation choice API](https://docs.sciml.ai/Optimization/stable/API/optimization_function/#Automatic-Differentiation-Construction-Choice-Recommendations).

Expand Down Expand Up @@ -127,13 +127,13 @@ plot(sol)
plot!(newsol)
```

Note that some of the algorithms may be sensitive to the initial condition. For more
Note that some algorithms may be sensitive to the initial condition. For more
details on using Optim.jl, see the [documentation for Optim.jl](https://julianlsolvers.github.io/Optim.jl/stable/).

### Adding Bounds Constraints

We can improve our solution by noting that the Lotka-Volterra equation requires that
the parameters are positive. Thus
the parameters are positive. Thus,
[following the Optimization.jl documentation](https://docs.sciml.ai/Optimization/stable/API/optimization_problem/)
we can add box constraints to ensure the optimizer only checks between `0.0` and `3.0`
which improves the efficiency of our algorithm. We pass the `lb` and `ub` keyword
Expand Down Expand Up @@ -176,7 +176,7 @@ result_bfgs = solve(optprob, BFGS())
### Alternative Cost Functions for Increased Robustness

The `build_loss_objective` with `L2Loss` is the most naive approach for parameter estimation.
There are many other
There are many others.

We can also use First-Differences in L2Loss by passing the kwarg `differ_weight` which decides the contribution of the
differencing loss to the total loss.
Expand Down Expand Up @@ -211,8 +211,8 @@ ms_obj = multiple_shooting_objective(ms_prob, Tsit5(), L2Loss(t, data), Optimiza
discontinuity_weight = 1.0, abstol = 1e-12, reltol = 1e-12)
```

This creates the objective function that can be passed to an optimizer from which we can then get the parameter values
and the initial values of the short time periods keeping in mind the indexing. Now we mix this with
This creates the objective function that can be passed to an optimizer, from which we can then get the parameter values
and the initial values of the short time periods, keeping in mind the indexing. Now we mix this with
a global optimization method to improve robustness even more:

```@example ode
Expand All @@ -226,22 +226,22 @@ optsol_ms.u[end-1:end]

Here as our model had 2 parameters, we look at the last 2 indexes of `result` to get our parameter values and
the rest of the values are the initial values of the shorter timespans as described in the reference section.
We can also use a gradient based optimizer with the multiple shooting objective.
We can also use a gradient-based optimizer with the multiple shooting objective.

```@example ode
optsol_ms = solve(optprob, BFGS())
optsol_ms.u[end-1:end]
```

The objective function for Two Stage method can be created and passed to an optimizer as
The objective function for the Two Stage method can be created and passed to an optimizer as

```@example ode
two_stage_obj = two_stage_objective(ms_prob, t, data, Optimization.AutoForwardDiff())
optprob = Optimization.OptimizationProblem(two_stage_obj, [1.3,0.8,2.8,1.2])
result = solve(optprob, Optim.BFGS())
```

The default kernel used in the method is `Epanechnikov` others that are available are `Uniform`, `Triangular`,
The default kernel used in the method is `Epanechnikov`, available others are `Uniform`, `Triangular`,
`Quartic`, `Triweight`, `Tricube`, `Gaussian`, `Cosine`, `Logistic` and `Sigmoid`, this can be passed by the
`kernel` keyword argument. `loss_func` keyword argument can be used to pass the loss function (cost function) you want
to use and passing a valid
Expand All @@ -250,5 +250,6 @@ The default kernel used in the method is `Epanechnikov` others that are availabl
## Conclusion

There are many more choices for how to improve the robustness of a parameter estimation. With all of these tools,
one likely should never do the simple "solve it with `p` and check the L2 loss", but instead use these tricks to
improve the loss landscape and increase the ability for optimizers to find the globally best parameters.
one likely should never do the simple “solve it with `p` and check the L2 loss”.
Instead, we should use these tricks to improve the loss landscape
and increase the ability for optimizers to find globally the best parameters.
2 changes: 1 addition & 1 deletion docs/src/methods/collocation_loss.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@ fit by least squares) and optimizing
the derivative function and the data's timepoints to match the derivatives
of the smoothed trajectory. This method has less accuracy than other methods
but is much faster, and is a good method to try first to get in the general
"good parameter" region, to then finish using one of the other methods.
good parameter region, to then finish using one of the other methods.

```julia
function two_stage_objective(prob::DEProblem, tpoints, data, adtype = SciMLBase.NoAD(),;
Expand Down
16 changes: 8 additions & 8 deletions docs/src/methods/optimization_based_methods.md
Original file line number Diff line number Diff line change
Expand Up @@ -18,12 +18,12 @@ function build_loss_objective(prob::DEProblem, alg, loss,

The first argument is the `DEProblem` to solve, and next is the `alg` to use.
The `alg` must match the problem type, which can be any `DEProblem`
(ODEs, SDEs, DAEs, DDEs, etc.). `regularization` defaults to nothing which has no regularization function.
(ODEs, SDEs, DAEs, DDEs, etc.). `regularization` defaults to nothing, which has no regularization function.
The extra keyword arguments are passed to the differential equation solver.

### Multiple Shooting

Multiple Shooting is generally used in Boundary Value Problems (BVP) and is
Multiple Shooting is often used in Boundary Value Problems (BVP) and is
more robust than the regular objective function used in these problems. It
proceeds as follows:

Expand All @@ -47,7 +47,7 @@ function multiple_shooting_objective(prob::DiffEqBase.DEProblem, alg, loss,

For consistency `multiple_shooting_objective` takes exactly the same arguments
as `build_loss_objective`. It also has the option for `discontinuity_weight` as
a keyword argument which assigns weight to the error occurring due to the
a keyword argument, which assigns weight to the error occurring due to the
discontinuity that arises from the breaking up of the time span.

## Detailed Explanations of Arguments
Expand All @@ -58,7 +58,7 @@ discontinuity that arises from the breaking up of the time span.
loss(sol)
```

is the function which reduces the problem's solution to a scalar which the
is the function, which reduces the problem's solution to a scalar, which the
optimizer will try to minimize. While this is very
flexible, two convenience routines are included for fitting to data with standard
cost functions:
Expand All @@ -68,7 +68,7 @@ L2Loss(t, data; differ_weight=nothing, data_weight=nothing,
colloc_grad=nothing, dudt=nothing)
```

where `t` is the set of timepoints which the data is found at, and
where `t` is the set of timepoints which the data are found at, and
`data` are the values that are known where each column corresponds to measures
of the values of the system. `L2Loss` is an optimized version
of the L2-distance. The `data_weight` is a scalar or vector
Expand Down Expand Up @@ -98,7 +98,7 @@ is the likelihood distributions from a `UnivariateDistribution` from
corresponds to the likelihood at `t[i]` for component `j`. The second case is
where `distributions[i]` is a `MultivariateDistribution` which corresponds to
the likelihood at `t[i]` over the vector of components. This likelihood function
then calculates the negative of the total loglikelihood over time as its objective
then calculates the negative of the total log-likelihood over time as its objective
value (negative since optimizers generally find minimums, and thus this corresponds
to maximum likelihood estimation). The third term, `diff_distributions`, acts
similarly but allows putting a distribution on the first difference terms
Expand Down Expand Up @@ -138,7 +138,7 @@ L2Loss(t, data, differ_weight=0.3, data_weight=0.7)
First differencing incorporates the differences of data points at consecutive
time points which adds more information about the trajectory in the loss
function. Adding first differencing is helpful in cases where the `L2Loss`
alone leads to non-identifiable parameters but adding a first differencing
alone leads to non-identifiable parameters, but adding a first differencing
term makes it more identifiable. This can be noted on stochastic differential
equation models, where this aims to capture the autocorrelation and therefore
helps us avoid getting the same stationary distribution despite different
Expand Down Expand Up @@ -205,7 +205,7 @@ which simply uses `p` as the initial condition in the initial value problem.
## Using the Objectives for MAP estimates

You can also add a prior option to `build_loss_objective` and `multiple_shooting_objective` that
essentially turns it into MAP by multiplying the loglikelihood (the cost) by the prior. The option is available
essentially turns it into MAP by multiplying the log-likelihood (the cost) by the prior. The option is available
as the keyword argument `priors`, it can take in either an array of univariate distributions for each of
the parameters or a multivariate distribution.

Expand Down
4 changes: 2 additions & 2 deletions docs/src/methods/recommended_methods.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,9 +5,9 @@ of your choice. This method can thus be paired with global optimizers
from packages like BlackBoxOptim.jl or NLopt.jl which can be much less prone to
finding local minima than local optimization methods. Also, it allows the user
to define the cost function in the way they choose as a function
`loss(sol)`, and thus can fit using any cost function on the solution,
`loss(sol)`. This package can thus fit using any cost function on the solution,
making it applicable to fitting non-temporal data and other types of
problems. Also, `build_loss_objective` works for all of the `DEProblem`
problems. Also, `build_loss_objective` works for all the `DEProblem`
types, allowing it to optimize parameters on ODEs, SDEs, DDEs, DAEs,
etc.

Expand Down
20 changes: 10 additions & 10 deletions docs/src/tutorials/ensemble.md
Original file line number Diff line number Diff line change
@@ -1,11 +1,11 @@
# Fitting Ensembles of ODE Models to Data

In this tutoiral we will showcase how to fit multiple models simultaniously to respective
In this tutorial, we will showcase how to fit multiple models simultaneously to respective
data sources. Let's dive right in!

## Formulating the Ensemble Model

First you want to create a problem which solves multiple problems at the same time. This is
First, you want to create a problem which solves multiple problems at the same time. This is
the `EnsembleProblem`. When the parameter estimation tools say it will take any DEProblem,
it really means ANY DEProblem, which includes `EnsembleProblem`.

Expand Down Expand Up @@ -48,7 +48,7 @@ sim = solve(enprob,Tsit5(),trajectories=N)
plot(sim)
```

`trajectories=N` means "run N times", and each time it runs the problem returned by the `prob_func`, which is always the same problem but with the `i`th initial condition.
`trajectories=N` means run N times, and each time it runs the problem returned by the `prob_func`, which is always the same problem but with the `i`th initial condition.

Now let's generate a dataset from that. Let's get data points at every t=0.1 using `saveat`,
and then convert the solution into an array.
Expand All @@ -65,8 +65,8 @@ is the `k`th solution). So `data[i,j,k]` is the `j`th timepoint of the `i`th var
trajectory.

Now let's build a loss function. A loss function is some `loss(sol)` that spits out a scalar
for how far from optimal we are. In the documentation I show that we normally do
`loss = L2Loss(t,data)`, but we can bootstrap off of this. Instead lets build an array of `N` loss
for how far from optimal we are. In the documentation, I show that we normally do
`loss = L2Loss(t,data)`, but we can bootstrap off of this. Instead, let's build an array of `N` loss
functions, each one with the correct piece of data.

```@example ensemble
Expand All @@ -92,8 +92,8 @@ sim = solve(enprob,Tsit5(),trajectories=N,saveat=data_times)
loss(sim)
```

and get a non-zero loss. So we now have our problem, our data, and our loss function... we
have what we need.
and get a non-zero loss. So, we now have our problem, our data, and our loss function
we have what we need.

Put this into build_loss_objective.

Expand All @@ -105,9 +105,9 @@ obj = build_loss_objective(enprob,Tsit5(),loss,Optimization.AutoForwardDiff(),tr
Notice that we added the kwargs for `solve` of the `EnsembleProblem` into this. They get passed to the internal `solve`
command, so then the loss is computed on `N` trajectories at `data_times`.

Thus we take this objective function over to any optimization package. Here, since the Lotka-Volterra equation requires positive parameters,
we use Fminbox to make sure the parameters stay within passed bounds. Let's start the optimization with
[1.3,0.9], Optim spits out that the true parameters are:
Thus, we take this objective function over to any optimization package. Here, since the Lotka-Volterra equation requires positive parameters,
we use Fminbox to make sure the parameters stay within the passed bounds.
Let's start the optimization with [1.3,0.9], Optim spits out that the true parameters are:

```@example ensemble
lower = zeros(2)
Expand Down
26 changes: 13 additions & 13 deletions docs/src/tutorials/generalized_likelihood.md
Original file line number Diff line number Diff line change
@@ -1,8 +1,8 @@
# Generalized Likelihood Inference

In this example we will demo the likelihood-based approach to parameter fitting.
First let's generate a dataset to fit. We will re-use the Lotka-Volterra equation
but in this case fit just two parameters.
In this example, we will demo the likelihood-based approach to parameter fitting.
First, let's generate a dataset to fit. We will re-use the Lotka-Volterra equation,
but in this case, fit just two parameters.

```@example likelihood
using DifferentialEquations, DiffEqParamEstim, Optimization, OptimizationBBO
Expand Down Expand Up @@ -31,11 +31,11 @@ end
aggregate_data = convert(Array,VectorOfArray([generate_data(sol,t) for i in 1:100]))
```

here with `t` we measure the solution at 200 evenly spaced points. Thus `aggregate_data`
here, with `t` we measure the solution at 200 evenly spaced points. Thus, `aggregate_data`
is a 2x200x100 matrix where `aggregate_data[i,j,k]` is the `i`th component at time
`j` of the `k`th dataset. What we first want to do is get a matrix of distributions
where `distributions[i,j]` is the likelihood of component `i` at take `j`. We
can do this via `fit_mle` on a chosen distributional form. For simplicity we
can do this via `fit_mle` on a chosen distributional form. For simplicity, we
choose the `Normal` distribution. `aggregate_data[i,j,:]` is the array of points
at the given component and time, and thus we find the distribution parameters
which fits best at each time point via:
Expand All @@ -51,13 +51,13 @@ Notice for example that we have:
distributions[1,1]
```

that is, it fit the distribution to have its mean just about where our original
solution was and the variance is about how much noise we added to the dataset.
This this is a good check to see that the distributions we are trying to fit
that is, it fits the distribution to have its mean just about where our original
solution was, and the variance is about how much noise we added to the dataset.
This is a good check to see that the distributions we are trying to fit
our parameters to makes sense.

Note that in this case the `Normal` distribution was a good choice, and in many
cases it's a nice go-to choice, but one should experiment with other choices
Note that in this case the `Normal` distribution was a good choice, and often
it's a nice go-to choice, but one should experiment with other choices
of distributions as well. For example, a `TDist` can be an interesting way to
incorporate robustness to outliers since low degrees of free T-distributions
act like Normal distributions but with longer tails (though `fit_mle` does not
Expand All @@ -72,7 +72,7 @@ obj = build_loss_objective(prob1,Tsit5(),LogLikeLoss(t,distributions),
maxiters=10000,verbose=false)
```

First let's use the objective function to plot the likelihood landscape:
First, let's use the objective function to plot the likelihood landscape:

```julia
using Plots; plotly()
Expand All @@ -84,7 +84,7 @@ heatmap(prange,prange,[obj([j,i]) for i in prange, j in prange],

![2 Parameter Likelihood](../assets/2paramlike.png)

Recall that this is the negative loglikelihood and thus the minimum is the
Recall that this is the negative log-likelihood, and thus the minimum is the
maximum of the likelihood. There is a clear valley where the first parameter
is 1.5, while the second parameter's likelihood is more muddled. By taking a
one-dimensional slice:
Expand All @@ -97,7 +97,7 @@ plot(prange,[obj([1.5,i]) for i in prange],lw=3,

![1 Parameter Likelihood](../assets/1paramlike.png)

we can see that there's still a clear minimum at the true value. Thus we will
we can see that there's still a clear minimum at the true value. Thus, we will
use the global optimizers from BlackBoxOptim.jl to find the values. We set our
search range to be from `0.5` to `5.0` for both of the parameters and let it
optimize:
Expand Down
Loading

0 comments on commit 761c32d

Please sign in to comment.