From 84ab23cfcebadd28b2d3d2d9cbbd18e410e97c90 Mon Sep 17 00:00:00 2001 From: ArnoStrouwen Date: Wed, 11 Jan 2023 21:42:26 +0100 Subject: [PATCH] [skip ci] LanguageTool --- docs/src/getting_started.md | 37 ++++++++++--------- docs/src/methods/collocation_loss.md | 2 +- .../src/methods/optimization_based_methods.md | 16 ++++---- docs/src/methods/recommended_methods.md | 4 +- docs/src/tutorials/ensemble.md | 20 +++++----- docs/src/tutorials/generalized_likelihood.md | 26 ++++++------- docs/src/tutorials/global_optimization.md | 10 ++--- docs/src/tutorials/stochastic_evaluations.md | 4 +- 8 files changed, 60 insertions(+), 59 deletions(-) diff --git a/docs/src/getting_started.md b/docs/src/getting_started.md index 92c62a13..ba0f6e59 100644 --- a/docs/src/getting_started.md +++ b/docs/src/getting_started.md @@ -1,17 +1,17 @@ # Getting Started with Optimization-Based ODE Parameter Estimation -In this tutorial we will showcase how to estimate the parameters of an ordinary -differential equation using DiffEqParamEstim.jl. DiffEqParamEstim.jl is a high -level tool that makes common parameter estimation tasks simple. Here we will show +In this tutorial, we will showcase how to estimate the parameters of an ordinary +differential equation using DiffEqParamEstim.jl. DiffEqParamEstim.jl is a high-level +tool that makes common parameter estimation tasks simple. Here, we will show how to use its cost function generation to estimate the parameters of the Lotka-Volterra equation against simulated data. ## Installation -First we will make sure DiffEqParamEstim.jl is installed correctly. To do this, we use the +First, we will make sure DiffEqParamEstim.jl is installed correctly. To do this, we use the Julia package REPL, opened by typing `]` in the REPL and seeing `pkg>` appear in blue. Then we type: `add DiffEqParamEstim` and hit enter. This will run the package management -sequence and we will be good to go. +sequence, and we will be good to go. ## Required Dependencies @@ -57,7 +57,7 @@ randomized = VectorOfArray([(sol(t[i]) + .01randn(2)) for i in 1:length(t)]) data = convert(Array,randomized) ``` -Here we used `VectorOfArray` from [RecursiveArrayTools.jl](https://docs.sciml.ai/RecursiveArrayTools/stable/) +Here, we used `VectorOfArray` from [RecursiveArrayTools.jl](https://docs.sciml.ai/RecursiveArrayTools/stable/) to turn the result of an ODE into a matrix. If we plot the solution with the parameter at `a=1.42`, we get the following: @@ -69,7 +69,7 @@ plot(sol) plot!(newsol) ``` -Notice that after one period this solution begins to drift very far off: this +Notice that after one period, this solution begins to drift very far off: this problem is sensitive to the choice of `a`. To build the objective function for Optim.jl, we simply call the `build_loss_objective` @@ -88,7 +88,7 @@ error more quickly when in bad regions of the parameter space, speeding up the process. If the integrator stops early (due to divergence), then those parameters are given an infinite loss, and thus this is a quick way to avoid bad parameters. We set `verbose=false` because this divergence can get noisy. The `Optimization.AutoForwardDiff()` -is a choice of automatic differentiation, i.e. how the gradients are calculated. +is a choice of automatic differentiation, i.e., how the gradients are calculated. For more information on this choice, see [the automatic differentiation choice API](https://docs.sciml.ai/Optimization/stable/API/optimization_function/#Automatic-Differentiation-Construction-Choice-Recommendations). 
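+
+Before handing the objective off to an optimizer, it can help to evaluate it directly at a
+few candidate parameters to probe the loss landscape, since the returned objective is
+callable on a parameter vector. A minimal sketch (the name `cost_function` is illustrative,
+and the keyword values mirror the settings described above):
+
+```@example ode
+cost_function = build_loss_objective(prob, Tsit5(), L2Loss(t, data),
+                                     Optimization.AutoForwardDiff(),
+                                     maxiters = 10000, verbose = false)
+# The loss at the drifting value `a = 1.42` plotted above should be noticeably
+# larger than near the parameter that generated the data.
+cost_function([1.42])
+```
+
+If the call returns `Inf`, the integrator diverged at that parameter, and the early-exit
+behavior described above assigned it an infinite loss.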
@@ -127,13 +127,13 @@ plot(sol) plot!(newsol) ``` -Note that some of the algorithms may be sensitive to the initial condition. For more +Note that some algorithms may be sensitive to the initial condition. For more details on using Optim.jl, see the [documentation for Optim.jl](https://julianlsolvers.github.io/Optim.jl/stable/). ### Adding Bounds Constraints We can improve our solution by noting that the Lotka-Volterra equation requires that -the parameters are positive. Thus +the parameters are positive. Thus, [following the Optimization.jl documentation](https://docs.sciml.ai/Optimization/stable/API/optimization_problem/) we can add box constraints to ensure the optimizer only checks between `0.0` and `3.0` which improves the efficiency of our algorithm. We pass the `lb` and `ub` keyword @@ -176,7 +176,7 @@ result_bfgs = solve(optprob, BFGS()) ### Alternative Cost Functions for Increased Robustness The `build_loss_objective` with `L2Loss` is the most naive approach for parameter estimation. -There are many other +There are many others. We can also use First-Differences in L2Loss by passing the kwarg `differ_weight` which decides the contribution of the differencing loss to the total loss. @@ -211,8 +211,8 @@ ms_obj = multiple_shooting_objective(ms_prob, Tsit5(), L2Loss(t, data), Optimiza discontinuity_weight = 1.0, abstol = 1e-12, reltol = 1e-12) ``` -This creates the objective function that can be passed to an optimizer from which we can then get the parameter values -and the initial values of the short time periods keeping in mind the indexing. Now we mix this with +This creates the objective function that can be passed to an optimizer, from which we can then get the parameter values +and the initial values of the short time periods, keeping in mind the indexing. Now we mix this with a global optimization method to improve robustness even more: ```@example ode @@ -226,14 +226,14 @@ optsol_ms.u[end-1:end] Here as our model had 2 parameters, we look at the last 2 indexes of `result` to get our parameter values and the rest of the values are the initial values of the shorter timespans as described in the reference section. -We can also use a gradient based optimizer with the multiple shooting objective. +We can also use a gradient-based optimizer with the multiple shooting objective. ```@example ode optsol_ms = solve(optprob, BFGS()) optsol_ms.u[end-1:end] ``` -The objective function for Two Stage method can be created and passed to an optimizer as +The objective function for the Two Stage method can be created and passed to an optimizer as ```@example ode two_stage_obj = two_stage_objective(ms_prob, t, data, Optimization.AutoForwardDiff()) @@ -241,7 +241,7 @@ optprob = Optimization.OptimizationProblem(two_stage_obj, [1.3,0.8,2.8,1.2]) result = solve(optprob, Optim.BFGS()) ``` -The default kernel used in the method is `Epanechnikov` others that are available are `Uniform`, `Triangular`, +The default kernel used in the method is `Epanechnikov`; the others available are `Uniform`, `Triangular`, `Quartic`, `Triweight`, `Tricube`, `Gaussian`, `Cosine`, `Logistic` and `Sigmoid`, this can be passed by the `kernel` keyword argument. `loss_func` keyword argument can be used to pass the loss function (cost function) you want to use and passing a valid @@ -250,5 +250,6 @@ The default kernel used in the method is `Epanechnikov` others that are availabl ## Conclusion There are many more choices for how to improve the robustness of a parameter estimation. 
With all of these tools, -one likely should never do the simple "solve it with `p` and check the L2 loss", but instead use these tricks to -improve the loss landscape and increase the ability for optimizers to find the globally best parameters. +one likely should never do the simple “solve it with `p` and check the L2 loss”. +Instead, we should use these tricks to improve the loss landscape +and increase the ability for optimizers to find the globally optimal parameters. diff --git a/docs/src/methods/collocation_loss.md b/docs/src/methods/collocation_loss.md index 28218538..93d4bb91 100644 --- a/docs/src/methods/collocation_loss.md +++ b/docs/src/methods/collocation_loss.md @@ -7,7 +7,7 @@ fit by least squares) and optimizing the derivative function and the data's timepoints to match the derivatives of the smoothed trajectory. This method has less accuracy than other methods but is much faster, and is a good method to try first to get in the general -"good parameter" region, to then finish using one of the other methods. +“good parameter” region, to then finish using one of the other methods. ```julia function two_stage_objective(prob::DEProblem, tpoints, data, adtype = SciMLBase.NoAD(),; diff --git a/docs/src/methods/optimization_based_methods.md b/docs/src/methods/optimization_based_methods.md index 8ad4a05a..3ceed4fc 100644 --- a/docs/src/methods/optimization_based_methods.md +++ b/docs/src/methods/optimization_based_methods.md @@ -18,12 +18,12 @@ function build_loss_objective(prob::DEProblem, alg, loss, The first argument is the `DEProblem` to solve, and next is the `alg` to use. The `alg` must match the problem type, which can be any `DEProblem` -(ODEs, SDEs, DAEs, DDEs, etc.). `regularization` defaults to nothing which has no regularization function. +(ODEs, SDEs, DAEs, DDEs, etc.). `regularization` defaults to `nothing`, which applies no regularization. The extra keyword arguments are passed to the differential equation solver. ### Multiple Shooting -Multiple Shooting is generally used in Boundary Value Problems (BVP) and is +Multiple Shooting is often used in Boundary Value Problems (BVP) and is more robust than the regular objective function used in these problems. It proceeds as follows: @@ -47,7 +47,7 @@ function multiple_shooting_objective(prob::DiffEqBase.DEProblem, alg, loss, For consistency `multiple_shooting_objective` takes exactly the same arguments as `build_loss_objective`. It also has the option for `discontinuity_weight` as -a keyword argument which assigns weight to the error occurring due to the +a keyword argument, which assigns weight to the error occurring due to the discontinuity that arises from the breaking up of the time span. ## Detailed Explanations of Arguments @@ -58,7 +58,7 @@ discontinuity that arises from the breaking up of the time span. loss(sol) ``` -is the function which reduces the problem's solution to a scalar which the +is the function that reduces the problem's solution to a scalar, which the optimizer will try to minimize. While this is very flexible, two convenience routines are included for fitting to data with standard cost functions: ```julia L2Loss(t, data; differ_weight=nothing, data_weight=nothing, colloc_grad=nothing, dudt=nothing) ``` -where `t` is the set of timepoints which the data is found at, and +where `t` is the set of timepoints at which the data are found, and `data` are the values that are known where each column corresponds to measures of the values of the system. 
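+
+To make the `loss(sol)` interface concrete, a hand-rolled sum-of-squared-errors loss could
+look like the following minimal sketch. It assumes the solver is run with `saveat = t` so the
+solution aligns with the `data` matrix, and that a recent SciMLBase providing
+`successful_retcode` is available; the name `my_l2_loss` is illustrative:
+
+```julia
+function my_l2_loss(sol)
+    # Guard against diverged integrations with an infinite loss, mirroring the
+    # divergence handling described in the getting-started tutorial.
+    SciMLBase.successful_retcode(sol) || return Inf
+    # Sum of squared differences between the solution and the data at each timepoint.
+    sum(abs2, Array(sol) .- data)
+end
+```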
`L2Loss` is an optimized version of the L2-distance. The `data_weight` is a scalar or vector @@ -98,7 +98,7 @@ is the likelihood distributions from a `UnivariateDistribution` from corresponds to the likelihood at `t[i]` for component `j`. The second case is where `distributions[i]` is a `MultivariateDistribution` which corresponds to the likelihood at `t[i]` over the vector of components. This likelihood function -then calculates the negative of the total loglikelihood over time as its objective +then calculates the negative of the total log-likelihood over time as its objective value (negative since optimizers generally find minimums, and thus this corresponds to maximum likelihood estimation). The third term, `diff_distributions`, acts similarly but allows putting a distribution on the first difference terms @@ -138,7 +138,7 @@ L2Loss(t, data, differ_weight=0.3, data_weight=0.7) First differencing incorporates the differences of data points at consecutive time points which adds more information about the trajectory in the loss function. Adding first differencing is helpful in cases where the `L2Loss` -alone leads to non-identifiable parameters but adding a first differencing +alone leads to non-identifiable parameters, but adding a first differencing term makes it more identifiable. This can be noted on stochastic differential equation models, where this aims to capture the autocorrelation and therefore helps us avoid getting the same stationary distribution despite different @@ -205,7 +205,7 @@ which simply uses `p` as the initial condition in the initial value problem. ## Using the Objectives for MAP estimates You can also add a prior option to `build_loss_objective` and `multiple_shooting_objective` that -essentially turns it into MAP by multiplying the loglikelihood (the cost) by the prior. The option is available +essentially turns it into MAP by multiplying the log-likelihood (the cost) by the prior. The option is available as the keyword argument `priors`, it can take in either an array of univariate distributions for each of the parameters or a multivariate distribution. diff --git a/docs/src/methods/recommended_methods.md b/docs/src/methods/recommended_methods.md index 9224c858..7cef4cd5 100644 --- a/docs/src/methods/recommended_methods.md +++ b/docs/src/methods/recommended_methods.md @@ -5,9 +5,9 @@ of your choice. This method can thus be paired with global optimizers from packages like BlackBoxOptim.jl or NLopt.jl which can be much less prone to finding local minima than local optimization methods. Also, it allows the user to define the cost function in the way they choose as a function -`loss(sol)`, and thus can fit using any cost function on the solution, +`loss(sol)`. This package can thus fit using any cost function on the solution, making it applicable to fitting non-temporal data and other types of -problems. Also, `build_loss_objective` works for all of the `DEProblem` +problems. Also, `build_loss_objective` works for all the `DEProblem` types, allowing it to optimize parameters on ODEs, SDEs, DDEs, DAEs, etc. diff --git a/docs/src/tutorials/ensemble.md b/docs/src/tutorials/ensemble.md index f42fe52e..a90f2ce4 100644 --- a/docs/src/tutorials/ensemble.md +++ b/docs/src/tutorials/ensemble.md @@ -1,11 +1,11 @@ # Fitting Ensembles of ODE Models to Data -In this tutoiral we will showcase how to fit multiple models simultaniously to respective +In this tutorial, we will showcase how to fit multiple models simultaneously to respective data sources. Let's dive right in! 
## Formulating the Ensemble Model -First you want to create a problem which solves multiple problems at the same time. This is +First, you want to create a problem which solves multiple problems at the same time. This is the `EnsembleProblem`. When the parameter estimation tools say it will take any DEProblem, it really means ANY DEProblem, which includes `EnsembleProblem`. @@ -48,7 +48,7 @@ sim = solve(enprob,Tsit5(),trajectories=N) plot(sim) ``` -`trajectories=N` means "run N times", and each time it runs the problem returned by the `prob_func`, which is always the same problem but with the `i`th initial condition. +`trajectories=N` means “run N times”, and each time it runs the problem returned by the `prob_func`, which is always the same problem but with the `i`th initial condition. Now let's generate a dataset from that. Let's get data points at every t=0.1 using `saveat`, and then convert the solution into an array. @@ -65,8 +65,8 @@ is the `k`th solution). So `data[i,j,k]` is the `j`th timepoint of the `i`th var trajectory. Now let's build a loss function. A loss function is some `loss(sol)` that spits out a scalar -for how far from optimal we are. In the documentation I show that we normally do -`loss = L2Loss(t,data)`, but we can bootstrap off of this. Instead lets build an array of `N` loss +for how far from optimal we are. In the documentation, I show that we normally do +`loss = L2Loss(t,data)`, but we can bootstrap off of this. Instead, let's build an array of `N` loss functions, each one with the correct piece of data. ```@example ensemble @@ -92,8 +92,8 @@ sim = solve(enprob,Tsit5(),trajectories=N,saveat=data_times) loss(sim) ``` -and get a non-zero loss. So we now have our problem, our data, and our loss function... we -have what we need. +and get a non-zero loss. So, we now have our problem, our data, and our loss function… +we have what we need. Put this into build_loss_objective. @@ -105,9 +105,9 @@ obj = build_loss_objective(enprob,Tsit5(),loss,Optimization.AutoForwardDiff(),tr Notice that we added the kwargs for `solve` of the `EnsembleProblem` into this. They get passed to the internal `solve` command, so then the loss is computed on `N` trajectories at `data_times`. -Thus we take this objective function over to any optimization package. Here, since the Lotka-Volterra equation requires positive parameters, -we use Fminbox to make sure the parameters stay within passed bounds. Let's start the optimization with -[1.3,0.9], Optim spits out that the true parameters are: +Thus, we take this objective function over to any optimization package. Here, since the Lotka-Volterra equation requires positive parameters, +we use Fminbox to make sure the parameters stay within the passed bounds. +Let's start the optimization with [1.3,0.9]; Optim spits out that the true parameters are: ```@example ensemble lower = zeros(2) diff --git a/docs/src/tutorials/generalized_likelihood.md b/docs/src/tutorials/generalized_likelihood.md index 23637bfe..09f8d72e 100644 --- a/docs/src/tutorials/generalized_likelihood.md +++ b/docs/src/tutorials/generalized_likelihood.md @@ -1,8 +1,8 @@ # Generalized Likelihood Inference -In this example we will demo the likelihood-based approach to parameter fitting. -First let's generate a dataset to fit. We will re-use the Lotka-Volterra equation -but in this case fit just two parameters. +In this example, we will demo the likelihood-based approach to parameter fitting. +First, let's generate a dataset to fit. 
We will re-use the Lotka-Volterra equation, +but in this case, fit just two parameters. ```@example likelihood using DifferentialEquations, DiffEqParamEstim, Optimization, OptimizationBBO @@ -31,11 +31,11 @@ end aggregate_data = convert(Array,VectorOfArray([generate_data(sol,t) for i in 1:100])) ``` -here with `t` we measure the solution at 200 evenly spaced points. Thus `aggregate_data` +here, with `t` we measure the solution at 200 evenly spaced points. Thus, `aggregate_data` is a 2x200x100 matrix where `aggregate_data[i,j,k]` is the `i`th component at time `j` of the `k`th dataset. What we first want to do is get a matrix of distributions where `distributions[i,j]` is the likelihood of component `i` at take `j`. We -can do this via `fit_mle` on a chosen distributional form. For simplicity we +can do this via `fit_mle` on a chosen distributional form. For simplicity, we choose the `Normal` distribution. `aggregate_data[i,j,:]` is the array of points at the given component and time, and thus we find the distribution parameters which fits best at each time point via: @@ -51,13 +51,13 @@ Notice for example that we have: distributions[1,1] ``` -that is, it fit the distribution to have its mean just about where our original -solution was and the variance is about how much noise we added to the dataset. -This this is a good check to see that the distributions we are trying to fit +that is, it fits the distribution to have its mean just about where our original +solution was, and the variance is about how much noise we added to the dataset. +This is a good check to see that the distributions we are trying to fit our parameters to makes sense. -Note that in this case the `Normal` distribution was a good choice, and in many -cases it's a nice go-to choice, but one should experiment with other choices +Note that in this case the `Normal` distribution was a good choice, and often +it's a nice go-to choice, but one should experiment with other choices of distributions as well. For example, a `TDist` can be an interesting way to incorporate robustness to outliers since low degrees of free T-distributions act like Normal distributions but with longer tails (though `fit_mle` does not @@ -72,7 +72,7 @@ obj = build_loss_objective(prob1,Tsit5(),LogLikeLoss(t,distributions), maxiters=10000,verbose=false) ``` -First let's use the objective function to plot the likelihood landscape: +First, let's use the objective function to plot the likelihood landscape: ```julia using Plots; plotly() @@ -84,7 +84,7 @@ heatmap(prange,prange,[obj([j,i]) for i in prange, j in prange], ![2 Parameter Likelihood](../assets/2paramlike.png) -Recall that this is the negative loglikelihood and thus the minimum is the +Recall that this is the negative log-likelihood, and thus the minimum is the maximum of the likelihood. There is a clear valley where the first parameter is 1.5, while the second parameter's likelihood is more muddled. By taking a one-dimensional slice: @@ -97,7 +97,7 @@ plot(prange,[obj([1.5,i]) for i in prange],lw=3, ![1 Parameter Likelihood](../assets/1paramlike.png) -we can see that there's still a clear minimum at the true value. Thus we will +we can see that there's still a clear minimum at the true value. Thus, we will use the global optimizers from BlackBoxOptim.jl to find the values. 
We set our search range to be from `0.5` to `5.0` for both of the parameters and let it optimize: diff --git a/docs/src/tutorials/global_optimization.md b/docs/src/tutorials/global_optimization.md index 076a536b..07ada193 100644 --- a/docs/src/tutorials/global_optimization.md +++ b/docs/src/tutorials/global_optimization.md @@ -1,7 +1,7 @@ # Global Optimization via NLopt -The `build_loss_objective` function builds an objective function which is able -to be used with MathOptInterface-associated solvers. This includes packages like +The `build_loss_objective` function builds an objective function compatible +with MathOptInterface-associated solvers. This includes packages like IPOPT, NLopt, MOSEK, etc. Building off of the previous example, we can build a cost function for the single parameter optimization problem like: @@ -26,7 +26,7 @@ data = convert(Array,randomized) obj = build_loss_objective(prob,Tsit5(),L2Loss(t,data),Optimization.AutoForwardDiff()) ``` -You can either use the NLopt package directly or through either the OptimizationNLopt or OptimizationMOI which provides interface for all MathOptInterface compatible non-linear solvers. +You can use the NLopt package directly, or through either OptimizationNLopt or OptimizationMOI, the latter of which provides an interface to all MathOptInterface-compatible non-linear solvers. We can now use this `obj` as the objective function with MathProgBase solvers. For our example, we will use NLopt. To use the local derivative-free @@ -66,5 +66,5 @@ min_objective!(opt, obj) (minf,minx,ret) = NLopt.optimize(opt,[1.3]) ``` -For more information, see the NLopt documentation for more details. And give IPOPT -or MOSEK a try! +For more details, see the NLopt documentation. +And give IPOPT or MOSEK a try! diff --git a/docs/src/tutorials/stochastic_evaluations.md b/docs/src/tutorials/stochastic_evaluations.md index aad97bce..5c50601f 100644 --- a/docs/src/tutorials/stochastic_evaluations.md +++ b/docs/src/tutorials/stochastic_evaluations.md @@ -25,7 +25,7 @@ prob = SDEProblem(pf_func,pg_func,u0,tspan,p) sol = solve(prob,SRIW1()) ``` -Now lets generate a dataset from 10,000 solutions of the SDE +Now let's generate a dataset from 10,000 solutions of the SDE ```@example sde using RecursiveArrayTools # for VectorOfArray @@ -62,7 +62,7 @@ Parameter Estimation in case of SDE's with a regular `L2Loss` can have poor accu result.original ``` -Instead when we use `L2Loss` with first differencing enabled we get much more accurate estimates. +Instead, when we use `L2Loss` with first differencing enabled, we get much more accurate estimates. ```@example sde obj = build_loss_objective(monte_prob,SRIW1(),L2Loss(t,aggregate_data,differ_weight=1.0,data_weight=0.5), Optimization.AutoForwardDiff(),