Merge pull request #210 from ArnoStrouwen/LT

[skip ci] LanguageTool
SciML · Jan 12, 2023 · 761c32d · 761c32d
2 parents b1ad9b1 + 84ab23c
commit 761c32d
Show file tree

Hide file tree

Showing 8 changed files with 60 additions and 59 deletions.
diff --git a/docs/src/getting_started.md b/docs/src/getting_started.md
@@ -1,17 +1,17 @@
 # Getting Started with Optimization-Based ODE Parameter Estimation
 
-In this tutorial we will showcase how to estimate the parameters of an ordinary
-differential equation using DiffEqParamEstim.jl. DiffEqParamEstim.jl is a high
-level tool that makes common parameter estimation tasks simple. Here we will show
+In this tutorial, we will showcase how to estimate the parameters of an ordinary
+differential equation using DiffEqParamEstim.jl. DiffEqParamEstim.jl is a high-level
+tool that makes common parameter estimation tasks simple. Here, we will show
 how to use its cost function generation to estimate the parameters of the Lotka-Volterra
 equation against simulated data.
 
 ## Installation
 
-First we will make sure DiffEqParamEstim.jl is installed correctly. To do this, we use the
+First, we will make sure DiffEqParamEstim.jl is installed correctly. To do this, we use the
 Julia package REPL, opened by typing `]` in the REPL and seeing `pkg>` appear in blue.
 Then we type: `add DiffEqParamEstim` and hit enter. This will run the package management
-sequence and we will be good to go.
+sequence, and we will be good to go.
 
 ## Required Dependencies
 
@@ -57,7 +57,7 @@ randomized = VectorOfArray([(sol(t[i]) + .01randn(2)) for i in 1:length(t)])
 data = convert(Array,randomized)
 ```
 
-Here we used `VectorOfArray` from [RecursiveArrayTools.jl](https://docs.sciml.ai/RecursiveArrayTools/stable/)
+Here, we used `VectorOfArray` from [RecursiveArrayTools.jl](https://docs.sciml.ai/RecursiveArrayTools/stable/)
 to turn the result of an ODE into a matrix.
 
 If we plot the solution with the parameter at `a=1.42`, we get the following:
@@ -69,7 +69,7 @@ plot(sol)
 plot!(newsol)
 ```
 
-Notice that after one period this solution begins to drift very far off: this
+Notice that after one period, this solution begins to drift very far off: this
 problem is sensitive to the choice of `a`.
 
 To build the objective function for Optim.jl, we simply call the `build_loss_objective`
@@ -88,7 +88,7 @@ error more quickly when in bad regions of the parameter space, speeding up the
 process. If the integrator stops early (due to divergence), then those parameters
 are given an infinite loss, and thus this is a quick way to avoid bad parameters.
 We set `verbose=false` because this divergence can get noisy. The `Optimization.AutoForwardDiff()`
-is a choice of automatic differentiation, i.e. how the gradients are calculated.
+is a choice of automatic differentiation, i.e., how the gradients are calculated.
 For more information on this choice, see
 [the automatic differentiation choice API](https://docs.sciml.ai/Optimization/stable/API/optimization_function/#Automatic-Differentiation-Construction-Choice-Recommendations).
 
@@ -127,13 +127,13 @@ plot(sol)
 plot!(newsol)
 ```
 
-Note that some of the algorithms may be sensitive to the initial condition. For more
+Note that some algorithms may be sensitive to the initial condition. For more
 details on using Optim.jl, see the [documentation for Optim.jl](https://julianlsolvers.github.io/Optim.jl/stable/).
 
 ### Adding Bounds Constraints
 
 We can improve our solution by noting that the Lotka-Volterra equation requires that
-the parameters are positive. Thus
+the parameters are positive. Thus,
 [following the Optimization.jl documentation](https://docs.sciml.ai/Optimization/stable/API/optimization_problem/)
 we can add box constraints to ensure the optimizer only checks between `0.0` and `3.0`
 which improves the efficiency of our algorithm. We pass the `lb` and `ub` keyword
@@ -176,7 +176,7 @@ result_bfgs = solve(optprob, BFGS())
 ### Alternative Cost Functions for Increased Robustness
 
 The `build_loss_objective` with `L2Loss` is the most naive approach for parameter estimation.
-There are many other
+There are many others.
 
 We can also use First-Differences in L2Loss by passing the kwarg `differ_weight` which decides the contribution of the
 differencing loss to the total loss.
@@ -211,8 +211,8 @@ ms_obj = multiple_shooting_objective(ms_prob, Tsit5(), L2Loss(t, data), Optimiza
                                      discontinuity_weight = 1.0, abstol = 1e-12, reltol = 1e-12)
 ```
 
-This creates the objective function that can be passed to an optimizer from which we can then get the parameter values
-and the initial values of the short time periods keeping in mind the indexing. Now we mix this with
+This creates the objective function that can be passed to an optimizer, from which we can then get the parameter values
+and the initial values of the short time periods, keeping in mind the indexing. Now we mix this with
 a global optimization method to improve robustness even more:
 
 ```@example ode
@@ -226,22 +226,22 @@ optsol_ms.u[end-1:end]
 
 Here as our model had 2 parameters, we look at the last 2 indexes of `result` to get our parameter values and
 the rest of the values are the initial values of the shorter timespans as described in the reference section.
-We can also use a gradient based optimizer with the multiple shooting objective.
+We can also use a gradient-based optimizer with the multiple shooting objective.
 
 ```@example ode
 optsol_ms = solve(optprob, BFGS())
 optsol_ms.u[end-1:end]
 ```
 
-The objective function for Two Stage method can be created and passed to an optimizer as
+The objective function for the Two Stage method can be created and passed to an optimizer as
 
 ```@example ode
 two_stage_obj = two_stage_objective(ms_prob, t, data, Optimization.AutoForwardDiff())
 optprob = Optimization.OptimizationProblem(two_stage_obj, [1.3,0.8,2.8,1.2])
 result = solve(optprob, Optim.BFGS())
 ```
 
-The default kernel used in the method is `Epanechnikov` others that are available are `Uniform`,  `Triangular`,
+The default kernel used in the method is `Epanechnikov`, available others are `Uniform`,  `Triangular`,
 `Quartic`, `Triweight`, `Tricube`, `Gaussian`, `Cosine`, `Logistic` and `Sigmoid`, this can be passed by the
 `kernel` keyword argument. `loss_func` keyword argument can be used to pass the loss function (cost function) you want
  to use and passing a valid
@@ -250,5 +250,6 @@ The default kernel used in the method is `Epanechnikov` others that are availabl
 ## Conclusion
 
 There are many more choices for how to improve the robustness of a parameter estimation. With all of these tools,
-one likely should never do the simple "solve it with `p` and check the L2 loss", but instead use these tricks to
-improve the loss landscape and increase the ability for optimizers to find the globally best parameters.
+one likely should never do the simple “solve it with `p` and check the L2 loss”.
+Instead, we should use these tricks to improve the loss landscape
+and increase the ability for optimizers to find globally the best parameters.
diff --git a/docs/src/methods/collocation_loss.md b/docs/src/methods/collocation_loss.md
@@ -7,7 +7,7 @@ fit by least squares) and optimizing
 the derivative function and the data's timepoints to match the derivatives
 of the smoothed trajectory. This method has less accuracy than other methods
 but is much faster, and is a good method to try first to get in the general
-"good parameter" region, to then finish using one of the other methods.
+“good parameter” region, to then finish using one of the other methods.
 
 ```julia
 function two_stage_objective(prob::DEProblem, tpoints, data, adtype = SciMLBase.NoAD(),;

diff --git a/docs/src/methods/optimization_based_methods.md b/docs/src/methods/optimization_based_methods.md
@@ -18,12 +18,12 @@ function build_loss_objective(prob::DEProblem, alg, loss,
 
 The first argument is the `DEProblem` to solve, and next is the `alg` to use.
 The `alg` must match the problem type, which can be any `DEProblem`
-(ODEs, SDEs, DAEs, DDEs, etc.). `regularization` defaults to nothing which has no regularization function.
+(ODEs, SDEs, DAEs, DDEs, etc.). `regularization` defaults to nothing, which has no regularization function.
 The extra keyword arguments are passed to the differential equation solver.
 
 ### Multiple Shooting
 
-Multiple Shooting is generally used in Boundary Value Problems (BVP) and is
+Multiple Shooting is often used in Boundary Value Problems (BVP) and is
 more robust than the regular objective function used in these problems. It
 proceeds as follows:
 
@@ -47,7 +47,7 @@ function multiple_shooting_objective(prob::DiffEqBase.DEProblem, alg, loss,
 
 For consistency `multiple_shooting_objective` takes exactly the same arguments
 as `build_loss_objective`. It also has the option for `discontinuity_weight` as
-a keyword argument which assigns weight to the error occurring due to the
+a keyword argument, which assigns weight to the error occurring due to the
 discontinuity that arises from the breaking up of the time span.
 
 ## Detailed Explanations of Arguments
@@ -58,7 +58,7 @@ discontinuity that arises from the breaking up of the time span.
 loss(sol)
 ```
 
-is the function which reduces the problem's solution to a scalar which the
+is the function, which reduces the problem's solution to a scalar, which the
 optimizer will try to minimize. While this is very
 flexible, two convenience routines are included for fitting to data with standard
 cost functions:
@@ -68,7 +68,7 @@ L2Loss(t, data; differ_weight=nothing, data_weight=nothing,
               colloc_grad=nothing, dudt=nothing)
 ```
 
-where `t` is the set of timepoints which the data is found at, and
+where `t` is the set of timepoints which the data are found at, and
 `data` are the values that are known where each column corresponds to measures
 of the values of the system. `L2Loss` is an optimized version
 of the L2-distance. The `data_weight` is a scalar or vector
@@ -98,7 +98,7 @@ is the likelihood distributions from a `UnivariateDistribution` from
 corresponds to the likelihood at `t[i]` for component `j`. The second case is
 where `distributions[i]` is a `MultivariateDistribution` which corresponds to
 the likelihood at `t[i]` over the vector of components. This likelihood function
-then calculates the negative of the total loglikelihood over time as its objective
+then calculates the negative of the total log-likelihood over time as its objective
 value (negative since optimizers generally find minimums, and thus this corresponds
 to maximum likelihood estimation). The third term, `diff_distributions`, acts
 similarly but allows putting a distribution on the first difference terms
@@ -138,7 +138,7 @@ L2Loss(t, data, differ_weight=0.3, data_weight=0.7)
 First differencing incorporates the differences of data points at consecutive
 time points which adds more information about the trajectory in the loss
 function. Adding first differencing is helpful in cases where the `L2Loss`
-alone leads to non-identifiable parameters but adding a first differencing
+alone leads to non-identifiable parameters, but adding a first differencing
 term makes it more identifiable. This can be noted on stochastic differential
 equation models, where this aims to capture the autocorrelation and therefore
 helps us avoid getting the same stationary distribution despite different
@@ -205,7 +205,7 @@ which simply uses `p` as the initial condition in the initial value problem.
 ## Using the Objectives for MAP estimates
 
 You can also add a prior option to `build_loss_objective` and `multiple_shooting_objective` that
-essentially turns it into MAP by multiplying the loglikelihood (the cost) by the prior. The option is available
+essentially turns it into MAP by multiplying the log-likelihood (the cost) by the prior. The option is available
 as the keyword argument `priors`, it can take in either an array of univariate distributions for each of
 the parameters or a multivariate distribution.
 

diff --git a/docs/src/methods/recommended_methods.md b/docs/src/methods/recommended_methods.md
@@ -5,9 +5,9 @@ of your choice. This method can thus be paired with global optimizers
 from packages like BlackBoxOptim.jl or NLopt.jl which can be much less prone to
 finding local minima than local optimization methods. Also, it allows the user
 to define the cost function in the way they choose as a function
-`loss(sol)`, and thus can fit using any cost function on the solution,
+`loss(sol)`. This package can thus fit using any cost function on the solution,
 making it applicable to fitting non-temporal data and other types of
-problems. Also, `build_loss_objective` works for all of the `DEProblem`
+problems. Also, `build_loss_objective` works for all the `DEProblem`
 types, allowing it to optimize parameters on ODEs, SDEs, DDEs, DAEs,
 etc.
 

diff --git a/docs/src/tutorials/ensemble.md b/docs/src/tutorials/ensemble.md
@@ -1,11 +1,11 @@
 # Fitting Ensembles of ODE Models to Data
 
-In this tutoiral we will showcase how to fit multiple models simultaniously to respective
+In this tutorial, we will showcase how to fit multiple models simultaneously to respective
 data sources. Let's dive right in!
 
 ## Formulating the Ensemble Model
 
-First you want to create a problem which solves multiple problems at the same time. This is
+First, you want to create a problem which solves multiple problems at the same time. This is
 the `EnsembleProblem`. When the parameter estimation tools say it will take any DEProblem,
 it really means ANY DEProblem, which includes `EnsembleProblem`.
 
@@ -48,7 +48,7 @@ sim = solve(enprob,Tsit5(),trajectories=N)
 plot(sim)
 ```
 
-`trajectories=N` means "run N times", and each time it runs the problem returned by the `prob_func`, which is always the same problem but with the `i`th initial condition.
+`trajectories=N` means “run N times”, and each time it runs the problem returned by the `prob_func`, which is always the same problem but with the `i`th initial condition.
 
 Now let's generate a dataset from that. Let's get data points at every t=0.1 using `saveat`,
 and then convert the solution into an array.
@@ -65,8 +65,8 @@ is the `k`th solution). So `data[i,j,k]` is the `j`th timepoint of the `i`th var
 trajectory.
 
 Now let's build a loss function. A loss function is some `loss(sol)` that spits out a scalar
-for how far from optimal we are. In the documentation I show that we normally do
-`loss = L2Loss(t,data)`, but we can bootstrap off of this. Instead lets build an array of `N` loss
+for how far from optimal we are. In the documentation, I show that we normally do
+`loss = L2Loss(t,data)`, but we can bootstrap off of this. Instead, let's build an array of `N` loss
 functions, each one with the correct piece of data.
 
 ```@example ensemble
@@ -92,8 +92,8 @@ sim = solve(enprob,Tsit5(),trajectories=N,saveat=data_times)
 loss(sim)
 ```
 
-and get a non-zero loss. So we now have our problem, our data, and our loss function... we
-have what we need.
+and get a non-zero loss. So, we now have our problem, our data, and our loss function…
+we have what we need.
 
 Put this into build_loss_objective.
 
@@ -105,9 +105,9 @@ obj = build_loss_objective(enprob,Tsit5(),loss,Optimization.AutoForwardDiff(),tr
 Notice that we added the kwargs for `solve` of the `EnsembleProblem` into this. They get passed to the internal `solve`
 command, so then the loss is computed on `N` trajectories at `data_times`.
 
-Thus we take this objective function over to any optimization package. Here, since the Lotka-Volterra equation requires positive parameters,
-we use Fminbox to make sure the parameters stay within passed bounds. Let's start the optimization with
-[1.3,0.9], Optim spits out that the true parameters are:
+Thus, we take this objective function over to any optimization package. Here, since the Lotka-Volterra equation requires positive parameters,
+we use Fminbox to make sure the parameters stay within the passed bounds.
+Let's start the optimization with [1.3,0.9], Optim spits out that the true parameters are:
 
 ```@example ensemble
 lower = zeros(2)

diff --git a/docs/src/tutorials/generalized_likelihood.md b/docs/src/tutorials/generalized_likelihood.md
@@ -1,8 +1,8 @@
 # Generalized Likelihood Inference
 
-In this example we will demo the likelihood-based approach to parameter fitting.
-First let's generate a dataset to fit. We will re-use the Lotka-Volterra equation
-but in this case fit just two parameters.
+In this example, we will demo the likelihood-based approach to parameter fitting.
+First, let's generate a dataset to fit. We will re-use the Lotka-Volterra equation,
+but in this case, fit just two parameters.
 
 ```@example likelihood
 using DifferentialEquations, DiffEqParamEstim, Optimization, OptimizationBBO
@@ -31,11 +31,11 @@ end
 aggregate_data = convert(Array,VectorOfArray([generate_data(sol,t) for i in 1:100]))
 ```
 
-here with `t` we measure the solution at 200 evenly spaced points. Thus `aggregate_data`
+here, with `t` we measure the solution at 200 evenly spaced points. Thus, `aggregate_data`
 is a 2x200x100 matrix where `aggregate_data[i,j,k]` is the `i`th component at time
 `j` of the `k`th dataset. What we first want to do is get a matrix of distributions
 where `distributions[i,j]` is the likelihood of component `i` at take `j`. We
-can do this via `fit_mle` on a chosen distributional form. For simplicity we
+can do this via `fit_mle` on a chosen distributional form. For simplicity, we
 choose the `Normal` distribution. `aggregate_data[i,j,:]` is the array of points
 at the given component and time, and thus we find the distribution parameters
 which fits best at each time point via:
@@ -51,13 +51,13 @@ Notice for example that we have:
 distributions[1,1]
 ```
 
-that is, it fit the distribution to have its mean just about where our original
-solution was and the variance is about how much noise we added to the dataset.
-This this is a good check to see that the distributions we are trying to fit
+that is, it fits the distribution to have its mean just about where our original
+solution was, and the variance is about how much noise we added to the dataset.
+This is a good check to see that the distributions we are trying to fit
 our parameters to makes sense.
 
-Note that in this case the `Normal` distribution was a good choice, and in many
-cases it's a nice go-to choice, but one should experiment with other choices
+Note that in this case the `Normal` distribution was a good choice, and often
+it's a nice go-to choice, but one should experiment with other choices
 of distributions as well. For example, a `TDist` can be an interesting way to
 incorporate robustness to outliers since low degrees of free T-distributions
 act like Normal distributions but with longer tails (though `fit_mle` does not
@@ -72,7 +72,7 @@ obj = build_loss_objective(prob1,Tsit5(),LogLikeLoss(t,distributions),
                                      maxiters=10000,verbose=false)
 ```
 
-First let's use the objective function to plot the likelihood landscape:
+First, let's use the objective function to plot the likelihood landscape:
 
 ```julia
 using Plots; plotly()
@@ -84,7 +84,7 @@ heatmap(prange,prange,[obj([j,i]) for i in prange, j in prange],
 
 ![2 Parameter Likelihood](../assets/2paramlike.png)
 
-Recall that this is the negative loglikelihood and thus the minimum is the
+Recall that this is the negative log-likelihood, and thus the minimum is the
 maximum of the likelihood. There is a clear valley where the first parameter
 is 1.5, while the second parameter's likelihood is more muddled. By taking a
 one-dimensional slice:
@@ -97,7 +97,7 @@ plot(prange,[obj([1.5,i]) for i in prange],lw=3,
 
 ![1 Parameter Likelihood](../assets/1paramlike.png)
 
-we can see that there's still a clear minimum at the true value. Thus we will
+we can see that there's still a clear minimum at the true value. Thus, we will
 use the global optimizers from BlackBoxOptim.jl to find the values. We set our
 search range to be from `0.5` to `5.0` for both of the parameters and let it
 optimize: