diff --git a/dev/.documenter-siteinfo.json b/dev/.documenter-siteinfo.json index 5e829c48..7bfef948 100644 --- a/dev/.documenter-siteinfo.json +++ b/dev/.documenter-siteinfo.json @@ -1 +1 @@ -{"documenter":{"julia_version":"1.11.1","generation_timestamp":"2024-10-30T13:28:26","documenter_version":"1.7.0"}} \ No newline at end of file +{"documenter":{"julia_version":"1.11.1","generation_timestamp":"2024-11-12T07:52:12","documenter_version":"1.7.0"}} \ No newline at end of file diff --git a/dev/LICENSE/index.html b/dev/LICENSE/index.html index 47037b0f..76a9083e 100644 --- a/dev/LICENSE/index.html +++ b/dev/LICENSE/index.html @@ -1,2 +1,2 @@ -License · Optim

Optim.jl is licensed under the MIT License:

Copyright (c) 2012: John Myles White, Tim Holy, and other contributors.
Copyright (c) 2016: Patrick Kofod Mogensen, John Myles White, Tim Holy, and other contributors.
Copyright (c) 2017: Patrick Kofod Mogensen, Asbjørn Nilsen Riseth, John Myles White, Tim Holy, and other contributors.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

diff --git a/dev/algo/adam_adamax/index.html b/dev/algo/adam_adamax/index.html index d33b8d1f..9ead9418 100644 --- a/dev/algo/adam_adamax/index.html +++ b/dev/algo/adam_adamax/index.html @@ -5,4 +5,4 @@ epsilon=1e-8)

where alpha is the step length or learning parameter. beta_mean and beta_var are exponential decay parameters for the first and second moment estimates. Setting these closer to 0 causes past iterates to matter less for the current step, and setting them closer to 1 emphasizes past iterates more. epsilon should rarely be changed; it exists only to avoid division by 0.

AdaMax(; alpha=0.002,
          beta_mean=0.9,
          beta_var=0.999,
-         epsilon=1e-8)

where alpha is the step length or learning parameter. beta_mean and beta_var are exponential decay parameters for the first and second moment estimates. Setting these closer to 0 causes past iterates to matter less for the current step, and setting them closer to 1 emphasizes past iterates more.
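
As a quick illustration (not taken from the manual), the sketch below minimizes the Rosenbrock function with Adam; the step length and iteration budget are arbitrary choices, and Adam typically needs many more iterations than the quasi-Newton solvers.

using Optim
f(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2   # Rosenbrock test function
# alpha and the iteration budget below are illustrative, not recommendations
res = optimize(f, zeros(2), Adam(alpha = 0.01), Optim.Options(iterations = 50_000))
Optim.minimizer(res)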

References

Kingma, Diederik P., and Jimmy Ba. "Adam: A method for stochastic optimization." arXiv preprint arXiv:1412.6980 (2014).

diff --git a/dev/algo/brent/index.html b/dev/algo/brent/index.html index f19050fd..5edc3df9 100644 --- a/dev/algo/brent/index.html +++ b/dev/algo/brent/index.html @@ -1,2 +1,2 @@ -Brent's Method · Optim
+Brent's Method · Optim
diff --git a/dev/algo/cg/index.html b/dev/algo/cg/index.html index 203d549e..59ad259c 100644 --- a/dev/algo/cg/index.html +++ b/dev/algo/cg/index.html @@ -42,4 +42,4 @@ * stopped by an increasing objective: false * Reached Maximum Number of Iterations: false * Objective Calls: 53 - * Gradient Calls: 53

We see that for this objective and starting point, ConjugateGradient() requires fewer gradient evaluations to reach convergence.

References

diff --git a/dev/algo/complex/index.html b/dev/algo/complex/index.html index 2c16f22c..1d141578 100644 --- a/dev/algo/complex/index.html +++ b/dev/algo/complex/index.html @@ -62,4 +62,4 @@ * Stopped by an increasing objective: false * Reached Maximum Number of Iterations: false * Objective Calls: 48 - * Gradient Calls: 48

Automatic differentiation support for complex inputs may come when Cassette.jl is ready.

References

diff --git a/dev/algo/goldensection/index.html b/dev/algo/goldensection/index.html index 0538a53c..1f43118d 100644 --- a/dev/algo/goldensection/index.html +++ b/dev/algo/goldensection/index.html @@ -1,2 +1,2 @@ -Golden Section · Optim
+Golden Section · Optim
diff --git a/dev/algo/gradientdescent/index.html b/dev/algo/gradientdescent/index.html index da62d563..ae5351ff 100644 --- a/dev/algo/gradientdescent/index.html +++ b/dev/algo/gradientdescent/index.html @@ -2,4 +2,4 @@ Gradient Descent · Optim

Gradient Descent

Constructor

GradientDescent(; alphaguess = LineSearches.InitialPrevious(),
                   linesearch = LineSearches.HagerZhang(),
                   P = nothing,
-                  precondprep = (P, x) -> nothing)

Description

Gradient Descent is a common name for a quasi-Newton solver. This means that it takes steps according to

\[x_{n+1} = x_n - P^{-1}\nabla f(x_n)\]

where $P$ is a positive definite matrix. If $P$ is the Hessian, we get Newton's method. In Gradient Descent, $P$ is simply an appropriately dimensioned identity matrix, such that we go in the exact opposite direction of the gradient. This means that we do not use the curvature information from the Hessian, or an approximation of it. While it does seem quite logical to go in the opposite direction of the fastest increase in objective value, the procedure can be very slow if the problem is ill-conditioned. See the section on preconditioners for ways to remedy this when using Gradient Descent.

As with the other quasi-Newton solvers in this package, a scalar $\alpha$ is introduced as follows

\[x_{n+1} = x_n - \alpha P^{-1}\nabla f(x_n)\]

and is chosen by a linesearch algorithm such that each step gives sufficient descent.

Example
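
A minimal sketch, assuming finite-difference gradients are acceptable; as noted above, Gradient Descent converges slowly on ill-conditioned problems such as the Rosenbrock function used here.

using Optim
f(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
res = optimize(f, zeros(2), GradientDescent(), Optim.Options(iterations = 10_000))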

References

diff --git a/dev/algo/index.html b/dev/algo/index.html index d7c1129f..0c67475a 100644 --- a/dev/algo/index.html +++ b/dev/algo/index.html @@ -1,2 +1,2 @@ -Solvers · Optim
+Solvers · Optim
diff --git a/dev/algo/ipnewton/index.html b/dev/algo/ipnewton/index.html index d2ae0b68..56e88126 100644 --- a/dev/algo/ipnewton/index.html +++ b/dev/algo/ipnewton/index.html @@ -1,4 +1,4 @@ Interior point Newton · Optim

Interior point Newton method

Optim.IPNewton — Type

Interior-point Newton

Constructor

IPNewton(; linesearch::Function = Optim.backtrack_constrained_grad,
          μ0::Union{Symbol,Number} = :auto,
-         show_linesearch::Bool = false)

The initial barrier penalty coefficient μ0 can be chosen as a number, or set to :auto to let the algorithm decide its value, see initialize_μ_λ!.

Note: For constrained optimization problems, we recommend always enabling allow_f_increases and successive_f_tol in the options passed to optimize. The default is set to Optim.Options(allow_f_increases = true, successive_f_tol = 2).

As of February 2018, the line search algorithm is specialised for constrained interior-point methods. In the future we hope to support more algorithms from LineSearches.jl.

Description

The IPNewton method implements an interior-point primal-dual Newton algorithm for solving nonlinear, constrained optimization problems. See Nocedal and Wright (Ch. 19, 2006) for a discussion of interior-point methods for constrained optimization.

References

The algorithm was originally written by Tim Holy (@timholy, tim.holy@gmail.com).

  • J Nocedal, SJ Wright (2006), Numerical optimization, second edition. Springer.
  • A Wächter, LT Biegler (2006), On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming. Mathematical Programming 106 (1), 25-57.
source

Examples
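
A minimal sketch with box constraints only; the bounds below are illustrative, and the full constrained setup (including general nonlinear constraints) is shown in the ipnewton_basics example.

using Optim
fun(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
x0 = [0.0, 0.0]
df  = TwiceDifferentiable(fun, x0; autodiff = :forward)
dfc = TwiceDifferentiableConstraints([-0.5, -0.5], [1.5, 1.5])  # lower and upper bounds on x
res = optimize(df, dfc, x0, IPNewton())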

diff --git a/dev/algo/lbfgs/index.html b/dev/algo/lbfgs/index.html index 253a5cc0..aa56d8a2 100644 --- a/dev/algo/lbfgs/index.html +++ b/dev/algo/lbfgs/index.html @@ -9,4 +9,4 @@ P = nothing, precondprep = (P, x) -> nothing, manifold = Flat(), - scaleinvH0::Bool = true && (typeof(P) <: Nothing))

Description

This means that it takes steps according to

\[x_{n+1} = x_n - P^{-1}\nabla f(x_n)\]

where $P$ is a positive definite matrix. If $P$ is the Hessian, we get Newton's method. In (L-)BFGS, the matrix is an approximation to the Hessian built using differences in the gradient across iterations. As long as the initial matrix is positive definite, it is possible to show that all the following matrices will be as well. The starting matrix could simply be the identity matrix, such that the first step is identical to the Gradient Descent algorithm, or even the actual Hessian.

There are two versions of BFGS in the package: BFGS, and L-BFGS. The latter is different from the former because it doesn't use a complete history of the iterative procedure to construct $P$, but rather only the latest $m$ steps. It doesn't actually build the Hessian approximation matrix either, but computes the direction directly. This makes it more suitable for large-scale problems, as the memory requirement to store the relevant vectors grows quickly in large problems.

As with the other quasi-Newton solvers in this package, a scalar $\alpha$ is introduced as follows

\[x_{n+1} = x_n - \alpha P^{-1}\nabla f(x_n)\]

and is chosen by a linesearch algorithm such that each step gives sufficient descent.

Example
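
A minimal sketch; the memory length m = 10 is illustrative, and autodiff = :forward asks ForwardDiff to supply the gradient.

using Optim
f(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
res = optimize(f, zeros(2), LBFGS(m = 10); autodiff = :forward)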

References

Wright, Stephen, and Jorge Nocedal (2006) "Numerical optimization." Springer

diff --git a/dev/algo/linesearch/index.html b/dev/algo/linesearch/index.html index 45b58f46..ac716682 100644 --- a/dev/algo/linesearch/index.html +++ b/dev/algo/linesearch/index.html @@ -37,4 +37,4 @@ * Reached Maximum Number of Iterations: false * Objective Calls: 17 * Gradient Calls: 17 - * Hessian Calls: 14

References

diff --git a/dev/algo/manifolds/index.html b/dev/algo/manifolds/index.html index b1fe63ed..f6b07ee8 100644 --- a/dev/algo/manifolds/index.html +++ b/dev/algo/manifolds/index.html @@ -7,4 +7,4 @@ x0 = randn(n) manif = Optim.Sphere() -Optim.optimize(f, g!, x0, Optim.ConjugateGradient(manifold=manif))

Supported solvers and manifolds

All first-order optimization methods are supported.

The following manifolds are currently supported:

The following meta-manifolds construct manifolds out of pre-existing ones:

See test/multivariate/manifolds.jl for usage examples.

Implementing new manifolds is as simple as adding methods project_tangent!(M::YourManifold,g,x) and retract!(M::YourManifold,x). If you implement another manifold or optimization method, please contribute a PR!
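
To make this concrete, here is an illustrative re-implementation of a unit-sphere manifold, under the assumption that Optim.Manifold is the abstract type used for dispatch; the built-in Optim.Sphere already provides this functionality, so the code is only a sketch of the interface.

using LinearAlgebra
import Optim

struct MySphere <: Optim.Manifold end   # hypothetical name

# Remove the radial component so g lies in the tangent space at x (in place).
function Optim.project_tangent!(::MySphere, g, x)
    g .-= dot(x, g) .* x
    return g
end

# Map x back onto the unit sphere (in place).
function Optim.retract!(::MySphere, x)
    x ./= norm(x)
    return x
end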

References

The Geometry of Algorithms with Orthogonality Constraints, Alan Edelman, Tomás A. Arias, Steven T. Smith, SIAM. J. Matrix Anal. & Appl., 20(2), 303–353

Optimization Algorithms on Matrix Manifolds, P.-A. Absil, R. Mahony, R. Sepulchre, Princeton University Press, 2008

diff --git a/dev/algo/nelder_mead/index.html b/dev/algo/nelder_mead/index.html index 1b18dc87..f2e15593 100644 --- a/dev/algo/nelder_mead/index.html +++ b/dev/algo/nelder_mead/index.html @@ -17,4 +17,4 @@ initial_simplex[j+1][j] += initial_simplex[j+1][j] != zero(T) ? S.b * initial_simplex[j+1][j] : S.a end initial_simplex -end

The parameters of Nelder-Mead

The different types of steps in the algorithm are governed by four parameters: $\alpha$ for the reflection, $\beta$ for the expansion, $\gamma$ for the contraction, and $\delta$ for the shrink step. We default to the adaptive parameters scheme in Gao and Han (2010). These are based on the dimensionality of the problem, and are given by

\[\alpha = 1, \quad \beta = 1 + 2/n, \quad \gamma = 0.75 - \frac{1}{2n}, \quad \delta = 1 - 1/n\]

It is also possible to specify the original parameters from Nelder and Mead (1965)

\[\alpha = 1,\quad \beta = 2, \quad\gamma = 1/2, \quad\delta = 1/2\]

by specifying parameters = Optim.FixedParameters(). For specifying custom values, parameters = Optim.FixedParameters(α = a, β = b, γ = g, δ = d) is used, where a, b, g, d are the chosen values. If another parameter specification is wanted, it is possible to create a custom sub-type of Optim.NMParameters and add a method to the parameters function. It should take the new type as the first positional argument and the dimensionality of x as the second positional argument, and return a 4-tuple of parameters. However, it will often be easier to simply supply the wanted parameters to FixedParameters.
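
For example, the calls below use the original 1965 parameters and then an arbitrary fixed set (the numerical values in the second call are purely illustrative):

using Optim
f(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
# Original Nelder-Mead (1965) parameters
res = optimize(f, zeros(2), NelderMead(parameters = Optim.FixedParameters()))
# Custom fixed parameters
res2 = optimize(f, zeros(2),
                NelderMead(parameters = Optim.FixedParameters(α = 1.0, β = 1.5, γ = 0.6, δ = 0.4)))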

References

Nelder, John A. and R. Mead (1965). "A simplex method for function minimization". Computer Journal 7: 308–313. doi:10.1093/comjnl/7.4.308.

Lagarias, Jeffrey C., et al. "Convergence properties of the Nelder–Mead simplex method in low dimensions." SIAM Journal on optimization 9.1 (1998): 112-147.

Gao, Fuchang and Lixing Han (2010). "Implementing the Nelder-Mead simplex algorithm with adaptive parameters". Computational Optimization and Applications [DOI 10.1007/s10589-010-9329-3]

diff --git a/dev/algo/newton/index.html b/dev/algo/newton/index.html index 22f8cbfa..e98c7cae 100644 --- a/dev/algo/newton/index.html +++ b/dev/algo/newton/index.html @@ -1,3 +1,3 @@ Newton · Optim

Newton's Method

Constructor

Newton(; alphaguess = LineSearches.InitialStatic(),
-         linesearch = LineSearches.HagerZhang())

The constructor takes two keywords:

  • linesearch = a(d, x, p, x_new, g_new, phi0, dphi0, c), a function performing line search, see the line search section.
  • alphaguess = a(state, dphi0, d), a function for setting the initial guess for the line search algorithm, see the line search section.

Description

Newton's method for optimization has a long history, and is in some sense the gold standard in unconstrained optimization of smooth functions, at least from a theoretical viewpoint. The main benefit is that it has a quadratic rate of convergence near a local optimum. The main disadvantage is that the user has to provide a Hessian. This can be difficult, complicated, or simply annoying. It can also be computationally expensive to calculate it.

Newton's method for optimization consists of applying Newton's method for solving systems of equations, where the equations are the first order conditions, saying that the gradient should equal the zero vector.

\[\nabla f(x) = 0\]

A second order Taylor expansion of the left-hand side leads to the iterative scheme

\[x_{n+1} = x_n - H(x_n)^{-1}\nabla f(x_n)\]

where the inverse is not calculated directly; instead, the step $\textbf{s}$ is obtained by solving

\[H(x_n) \textbf{s} = \nabla f(x_n).\]

This is equivalent to minimizing a quadratic model, $m_k$ around the current $x_n$

\[m_k(s) = f(x_n) + \nabla f(x_n)^\top \textbf{s} + \frac{1}{2} \textbf{s}^\top H(x_n) \textbf{s}\]

For functions where $H(x_n)$ is difficult, or computationally expensive to obtain, we might replace the Hessian with another positive definite matrix that approximates it. Such methods are called Quasi-Newton methods; see (L-)BFGS and Gradient Descent.

In a sufficiently small neighborhood around the minimizer, Newton's method has quadratic convergence, but globally it might have slower convergence, or it might even diverge. To ensure convergence, a line search is performed for each $\textbf{s}$. This amounts to replacing the step formula above with

\[x_{n+1} = x_n - \alpha \textbf{s}\]

and finding a scalar $\alpha$ such that we get sufficient descent; see the line search section for more information.

Additionally, if the function is locally concave, the step taken in the formulas above will go in a direction of ascent, as the Hessian will not be positive (semi)definite. To avoid this, we use a specialized method to calculate the step direction. If the Hessian is positive semidefinite then the method used is standard, but if it is not, a correction is made using the functionality in PositiveFactorizations.jl.

Example

show the example from the issue
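
In the meantime, a minimal sketch (not the example referenced above): minimizing the Rosenbrock function with Newton's method, letting autodiff = :forward supply both the gradient and the Hessian.

using Optim
f(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
res = optimize(f, [0.0, 0.0], Newton(); autodiff = :forward)
Optim.minimizer(res)   # ≈ [1.0, 1.0]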

References

diff --git a/dev/algo/newton_trust_region/index.html b/dev/algo/newton_trust_region/index.html index 931d569e..466dbfb5 100644 --- a/dev/algo/newton_trust_region/index.html +++ b/dev/algo/newton_trust_region/index.html @@ -5,4 +5,4 @@ rho_lower = 0.25, rho_upper = 0.75)

The constructor takes keywords that determine the initial and maximal size of the trust region, when to grow and shrink the region, and how close the function should be to the quadratic approximation. The notation follows chapter four of Numerical Optimization. Below, rho $=\rho$ refers to the ratio of the actual function change to the change in the quadratic approximation for a given step.

Description

Newton's method with a trust region is designed to take advantage of the second-order information in a function's Hessian, but with more stability than Newton's method when functions are not globally well-approximated by a quadratic. This is achieved by repeatedly minimizing quadratic approximations within a dynamically-sized "trust region" in which the function is assumed to be locally quadratic [1].

Newton's method optimizes a quadratic approximation to a function. When a function is well approximated by a quadratic (for example, near an optimum), Newton's method converges very quickly by exploiting the second-order information in the Hessian matrix. However, when the function is not well-approximated by a quadratic, either because the starting point is far from the optimum or the function has a more irregular shape, Newton steps can be erratically large, leading to distant, irrelevant areas of the space.

Trust region methods use second-order information but restrict the steps to be within a "trust region" where the function is believed to be approximately quadratic. At iteration $k$, a trust region method chooses a step $p$ to minimize a quadratic approximation to the objective such that the step size is no larger than a given trust region size, $\Delta_k$.

\[\underset{p\in\mathbb{R}^n}\min m_k(p) = f_k + g_k^T p + \frac{1}{2}p^T B_k p \quad\textrm{such that } ||p||\le \Delta_k\]

Here, $p$ is the step to take at iteration $k$, so that $x_{k+1} = x_k + p$. In the definition of $m_k(p)$, $f_k = f(x_k)$ is the value at the previous location, $g_k=\nabla f(x_k)$ is the gradient at the previous location, $B_k = \nabla^2 f(x_k)$ is the Hessian matrix at the previous iterate, and $||\cdot||$ is the Euclidean norm.

If the trust region size, $\Delta_k$, is large enough that the minimizer of the quadratic approximation $m_k(p)$ has $||p|| \le \Delta_k$, then the step is the same as an ordinary Newton step. However, if the unconstrained quadratic minimizer lies outside the trust region, then the minimizer to the constrained problem will occur on the boundary, i.e. we will have $||p|| = \Delta_k$. It turns out that when the Cholesky decomposition of $B_k$ can be computed, the optimal $p$ can be found numerically with relative ease. ([1], section 4.3) This is the method currently used in Optim.

It makes sense to adapt the trust region size, $\Delta_k$, as one moves through the space and assesses the quality of the quadratic fit. This adaptation is controlled by the parameters $\eta$, $\rho_{lower}$, and $\rho_{upper}$, which are parameters to the NewtonTrustRegion optimization method. For each step, we calculate

\[\rho_k := \frac{f(x_{k+1}) - f(x_k)}{m_k(p) - m_k(0)}\]

Intuitively, $\rho_k$ measures the quality of the quadratic approximation: if $\rho_k \approx 1$, then our quadratic approximation is reasonable. If $p$ was on the boundary and $\rho_k > \rho_{upper}$, then perhaps we can benefit from larger steps. In this case, for the next iteration we grow the trust region geometrically up to a maximum of $\hat\Delta$:

\[\rho_k > \rho_{upper} \Rightarrow \Delta_{k+1} = \min(2 \Delta_k, \hat\Delta).\]

Conversely, if $\rho_k < \rho_{lower}$, then we shrink the trust region geometrically:

$\rho_k < \rho_{lower} \Rightarrow \Delta_{k+1} = 0.25 \Delta_k$. Finally, we only accept a point if its decrease is appreciable compared to the quadratic approximation. Specifically, a step is only accepted if $\rho_k > \eta$. As long as we choose $\eta$ to be less than $\rho_{lower}$, we will shrink the trust region whenever we reject a step. Eventually, if the objective function is locally quadratic, $\Delta_k$ will become small enough that a quadratic approximation will be accurate enough to make progress again.

Example

using Optim, OptimTestProblems
 prob = UnconstrainedProblems.examples["Rosenbrock"];
-res = Optim.optimize(prob.f, prob.g!, prob.h!, prob.initial_x, NewtonTrustRegion())

References

[1] Nocedal, Jorge, and Stephen Wright. Numerical optimization. Springer Science & Business Media, 2006.

diff --git a/dev/algo/ngmres/index.html b/dev/algo/ngmres/index.html index 3a6ab7d5..85db7ec6 100644 --- a/dev/algo/ngmres/index.html +++ b/dev/algo/ngmres/index.html @@ -93,4 +93,4 @@ * Stopped by an increasing objective: false * Reached Maximum Number of Iterations: false * Objective Calls: 222 - * Gradient Calls: 222

References

[1] De Sterck. Steepest descent preconditioning for nonlinear GMRES optimization. NLAA, 2013.
[2] Washio and Oosterlee. Krylov subspace acceleration for nonlinear multigrid schemes. ETNA, 1997.
[3] Riseth. Objective acceleration for unconstrained optimization. 2018.

diff --git a/dev/algo/particle_swarm/index.html b/dev/algo/particle_swarm/index.html index c311b9a7..5bfe3079 100644 --- a/dev/algo/particle_swarm/index.html +++ b/dev/algo/particle_swarm/index.html @@ -1,4 +1,4 @@ Particle Swarm · Optim

Particle Swarm

Constructor

ParticleSwarm(; lower = [],
                 upper = [],
-                n_particles = 0)

The constructor takes three keywords:

  • lower = [], a vector of lower bounds, unbounded below if empty or Inf's
  • upper = [], a vector of upper bounds, unbounded above if empty or Inf's
  • n_particles = 0, number of particles in the swarm; defaults to at least three

Description

The Particle Swarm implementation in Optim.jl is the so-called Adaptive Particle Swarm algorithm in [1]. It attempts to improve global coverage and convergence by switching between four evolutionary states: exploration, exploitation, convergence, and jumping out. In the jumping-out state it intentionally takes the best particle and moves it away from its (potentially, and probably, local) optimum, to improve the ability to find a global optimum. Of course, this comes at the cost of slower convergence, but hopefully the algorithm converges to the global optimum as a result.
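
A minimal sketch of the constructor keywords described above; the bounds, swarm size, and iteration budget are illustrative only, and results vary between runs since the method is stochastic.

using Optim
f(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
res = optimize(f, [0.0, 0.0],
               ParticleSwarm(lower = [-1.0, -1.0], upper = [2.0, 2.0], n_particles = 10),
               Optim.Options(iterations = 1000))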

References

[1] Zhan, Zhang, and Chung. Adaptive particle swarm optimization, IEEE Transactions on Systems, Man, and Cybernetics, Part B: CyberneticsVolume 39, Issue 6, 2009, Pages 1362-1381 (2009)

diff --git a/dev/algo/precondition/index.html b/dev/algo/precondition/index.html index 55999098..7fe632be 100644 --- a/dev/algo/precondition/index.html +++ b/dev/algo/precondition/index.html @@ -7,4 +7,4 @@ f(x) = plap([0; x; 0]) g!(G, x) = copyto!(G, (plap1([0; x; 0]))[2:end-1]) result = Optim.optimize(f, g!, initial_x, method = ConjugateGradient(P = nothing)) -result = Optim.optimize(f, g!, initial_x, method = ConjugateGradient(P = precond(100)))

The former optimize call converges at a slower rate than the latter. Looking at a plot of the 2D version of the function shows the problem.

[Figure: contours of the two-dimensional plap objective]

The contours are shaped like ellipsoids, but we would rather want them to be circles. Using the preconditioner effectively changes the coordinates such that the contours become less ellipsoid-like. Benchmarking shows that using preconditioning provides an approximate speed-up factor of 15 in this 100-dimensional case.
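
As a self-contained sketch of the same idea (not the plap example above), consider a badly scaled quadratic where a diagonal matrix, assumed to work as a preconditioner through the usual ldiv! and dot operations, is simply the Hessian diagonal:

using Optim, LinearAlgebra
f(x)     = (x[1]^2 + 1000.0 * x[2]^2) / 2
g!(G, x) = (G[1] = x[1]; G[2] = 1000.0 * x[2]; G)
P = Diagonal([1.0, 1000.0])        # here equal to the Hessian diagonal
res_plain = optimize(f, g!, [1.0, 1.0], ConjugateGradient(P = nothing))
res_prec  = optimize(f, g!, [1.0, 1.0], ConjugateGradient(P = P))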

References

diff --git a/dev/algo/samin/index.html b/dev/algo/samin/index.html index 4f21e05e..1631e0b7 100644 --- a/dev/algo/samin/index.html +++ b/dev/algo/samin/index.html @@ -72,4 +72,4 @@ * Reached Maximum Number of Iterations: false * Objective Calls: 12051 * Gradient Calls: 0 -

References

diff --git a/dev/algo/simulated_annealing/index.html b/dev/algo/simulated_annealing/index.html index 75dc522d..e5f4efbf 100644 --- a/dev/algo/simulated_annealing/index.html +++ b/dev/algo/simulated_annealing/index.html @@ -5,4 +5,4 @@ for i in eachindex(x) x_proposal[i] = x[i]+randn() end -end

As we see, it is not really possible to disentangle the roles of the different components of the algorithm. For example, the functional form of the acceptance function, the temperature, and (indirectly) the neighbor function all determine whether the next draw of x is accepted or not.

The current implementation of Simulated Annealing is very rough. It lacks quite a few features which are normally part of a proper SA implementation. A better implementation is under way, see this issue.

Example
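
A minimal sketch with the default neighbor, temperature, and acceptance functions; the iteration budget is illustrative, and results vary from run to run since the algorithm is stochastic.

using Optim
f(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
res = optimize(f, zeros(2), SimulatedAnnealing(), Optim.Options(iterations = 10^6))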

References

diff --git a/dev/dev/contributing/index.html b/dev/dev/contributing/index.html index 093b95c7..0b920320 100644 --- a/dev/dev/contributing/index.html +++ b/dev/dev/contributing/index.html @@ -24,4 +24,4 @@ function update!{T}(d, state::MinimState{T}, method::Minim) # code for Minim here false # should the procedure force quit? -end +end diff --git a/dev/dev/index.html b/dev/dev/index.html index 924b7bde..dbfdc632 100644 --- a/dev/dev/index.html +++ b/dev/dev/index.html @@ -1,2 +1,2 @@ -- · Optim
+- · Optim
diff --git a/dev/examples/generated/ipnewton_basics/index.html b/dev/examples/generated/ipnewton_basics/index.html index b16dc56e..63989ad1 100644 --- a/dev/examples/generated/ipnewton_basics/index.html +++ b/dev/examples/generated/ipnewton_basics/index.html @@ -294,4 +294,4 @@ lx, ux, lc, uc) res = optimize(df, dfc, x0, IPNewton()) -# This file was generated using Literate.jl, https://github.com/fredrikekre/Literate.jl

This page was generated using Literate.jl.

diff --git a/dev/examples/generated/maxlikenlm/index.html b/dev/examples/generated/maxlikenlm/index.html index e9c4d3d5..3768c2ba 100644 --- a/dev/examples/generated/maxlikenlm/index.html +++ b/dev/examples/generated/maxlikenlm/index.html @@ -254,4 +254,4 @@ println("parameter estimates:", parameters) println("t-statsitics: ", t_stats) -# This file was generated using Literate.jl, https://github.com/fredrikekre/Literate.jl

This page was generated using Literate.jl.

diff --git a/dev/examples/generated/rasch/index.html b/dev/examples/generated/rasch/index.html index 32b3bb9e..4576f005 100644 --- a/dev/examples/generated/rasch/index.html +++ b/dev/examples/generated/rasch/index.html @@ -132,4 +132,4 @@ -0.242781 0.242732 1.22279 1.66615 -3.05756 -2.62454 - 0.667647 1.10274

This page was generated using Literate.jl.

diff --git a/dev/index.html b/dev/index.html index eb1ed00f..97879e6e 100644 --- a/dev/index.html +++ b/dev/index.html @@ -1,2 +1,2 @@ -Home · Optim

Optim.jl

Univariate and multivariate optimization in Julia.

Optim.jl is part of the JuliaNLSolvers family.

[Badge table: Source, Build Status (CI, Codecov), Social, References to cite (JOSS paper, DOI)]

What

Optim is a Julia package for optimizing functions of various kinds. While there is some support for box constrained and Riemannian optimization, most of the solvers try to find an $x$ that minimizes a function $f(x)$ without any constraints. Thus, the main focus is on unconstrained optimization. The provided solvers, under certain conditions, will converge to a local minimum. In the case where a global minimum is desired we supply some methods such as (bounded) simulated annealing and particle swarm. For a dedicated package for global optimization techniques, see e.g. BlackBoxOptim.

Why

There are many solvers available from both free and commercial sources, and many of them are accessible from Julia. Few of them are written in Julia. Performance-wise this is rarely a problem, as they are often written in either Fortran or C. However, solvers written directly in Julia do come with some advantages.

When writing Julia software (packages) that require something to be optimized, the programmer can either choose to write their own optimization routine or use one of the many available solvers. For example, this could be something from the NLopt suite. This means adding a dependency which is not written in Julia, and more assumptions have to be made as to the environment the user is in. Does the user have the proper compilers? Is it possible to use GPL'ed code in the project? Optim is released under the MIT license, and installation is a simple Pkg.add, so it really doesn't get much freer, easier, or more lightweight than that.

It is also true that using a solver written in C or Fortran makes it impossible to leverage one of the main benefits of Julia: multiple dispatch. Since Optim is entirely written in Julia, we can currently use the dispatch system to ease the use of custom preconditioners. A planned feature along these lines is to allow for user-controlled choice of solvers for various steps in the algorithm, entirely based on dispatch rather than on predefined possibilities chosen by the developers of Optim.

Being a Julia package also means that Optim has access to the automatic differentiation features through the packages in JuliaDiff.

How

The package is a registered package, and can be installed with Pkg.add.

julia> using Pkg; Pkg.add("Optim")

or through the pkg REPL mode by typing

] add Optim
diff --git a/dev/user/algochoice/index.html b/dev/user/algochoice/index.html index 6d6453a6..6f473894 100644 --- a/dev/user/algochoice/index.html +++ b/dev/user/algochoice/index.html @@ -1,2 +1,2 @@ -Algorithm choice · Optim

Algorithm choice

There are two main settings you must choose in Optim: the algorithm and the linesearch.

Algorithms

The first choice to be made is that of the order of the method. Zeroth-order methods do not have gradient information, and are very slow to converge, especially in high dimension. First-order methods do not have access to curvature information and can take a large number of iterations to converge for badly conditioned problems. Second-order methods can converge very quickly once in the vicinity of a minimizer. Of course, this enhanced performance comes at a cost: the objective function has to be differentiable, you have to supply gradients and Hessians, and, for second order methods, a linear system has to be solved at each step.

If you can provide analytic gradients and Hessians, and the dimension of the problem is not too large, then second order methods are very efficient. The Newton method with trust region is the method of choice.

When you do not have an explicit Hessian or when the dimension becomes large enough that the linear solve in the Newton method becomes the bottleneck, first order methods should be preferred. BFGS is a very efficient method, but also requires a linear system solve. LBFGS usually has a performance very close to that of BFGS, and avoids linear system solves (the parameter m can be tweaked: increasing it can improve the convergence, at the expense of memory and time spent in linear algebra operations). The conjugate gradient method usually converges less quickly than LBFGS, but requires less memory. Gradient descent should only be used for testing. Acceleration methods are experimental.

When the objective function is non-differentiable or you do not want to use gradients, use zeroth-order methods. Nelder-Mead is currently the most robust.

Linesearches

Linesearches are used in every first- and second-order method except for the trust-region Newton method. Linesearch routines attempt to locate quickly an approximate minimizer of the univariate function $\alpha \to f(x+ \alpha d)$, where $d$ is the descent direction computed by the algorithm. They vary in how accurate this minimization is. Two good linesearches are BackTracking and HagerZhang, the former being less stringent than the latter. For well-conditioned objective functions and methods where the step is usually well-scaled (such as LBFGS or Newton), a rough linesearch such as BackTracking is usually the most performant. For badly behaved problems or when extreme accuracy is needed (gradients below the square root of the machine epsilon, about $10^{-8}$ with Float64), the HagerZhang method proves more robust. An exception is the conjugate gradient method which requires an accurate linesearch to be efficient, and should be used with the HagerZhang linesearch.

Summary

As a very crude heuristic:

For a low-dimensional problem with analytic gradients and Hessians, use the Newton method with trust region. For larger problems or when there is no analytic Hessian, use LBFGS, and tweak the parameter m if needed. If the function is non-differentiable, use Nelder-Mead. Use the HagerZhang linesearch for robustness and BackTracking for speed.
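
The sketch below translates this heuristic into code for a small test problem; the choice of m and the objective are illustrative only.

using Optim
f(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
x0 = zeros(2)
# Low-dimensional problem with cheap derivatives: second-order trust region
optimize(f, x0, NewtonTrustRegion(); autodiff = :forward)
# Larger problems or no Hessian: L-BFGS, tweaking the memory parameter m if needed
optimize(f, x0, LBFGS(m = 20); autodiff = :forward)
# Non-differentiable objective: Nelder-Mead
optimize(f, x0, NelderMead())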

diff --git a/dev/user/config/index.html b/dev/user/config/index.html index a2ac4b8f..00b32be0 100644 --- a/dev/user/config/index.html +++ b/dev/user/config/index.html @@ -13,4 +13,4 @@ iterations = 10, store_trace = true, show_trace = false, - show_warnings = true)

Notice the need to specify the method using a keyword if this syntax is used. This approach might be deprecated in the future, and as a result we recommend writing code that has to be maintained using the Optim.Options approach.
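
A minimal sketch of the recommended Optim.Options style, using option values similar to those above:

using Optim
f(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
opts = Optim.Options(g_tol = 1e-12, iterations = 10, store_trace = true, show_trace = false)
res = optimize(f, zeros(2), LBFGS(), opts)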

diff --git a/dev/user/gradientsandhessians/index.html b/dev/user/gradientsandhessians/index.html index eeebe6d8..c03c5391 100644 --- a/dev/user/gradientsandhessians/index.html +++ b/dev/user/gradientsandhessians/index.html @@ -35,4 +35,4 @@ julia> Optim.minimizer(optimize(f, initial_x, Newton(); autodiff = :forward)) 2-element Array{Float64,1}: 1.0 - 1.0

Indeed, the minimizer was found, without providing any gradients or Hessians.

diff --git a/dev/user/minimization/index.html b/dev/user/minimization/index.html index 9e7bf45d..c42c9a84 100644 --- a/dev/user/minimization/index.html +++ b/dev/user/minimization/index.html @@ -39,4 +39,4 @@ -1.49994 julia> Optim.minimum(res) - -2.8333333205768865

Complete list of functions

A complete list of functions can be found below.

Defined for all methods:

Defined for univariate optimization:

Defined for multivariate optimization:

Defined for NelderMead with the option trace_simplex=true:

Input types

Most users will input Vectors as their initial_x and get an Optim.minimizer(res) that is also a vector. For zeroth and first order methods, it is also possible to pass in matrices, or even higher-dimensional arrays. The only restriction imposed by leaving the Vector case is that it is no longer possible to use finite difference approximations or automatic differentiation. Second order methods (variants of Newton's method) do not support this more general input type.
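
For instance, a first-order method can optimize over a matrix-shaped variable as long as the gradient is supplied explicitly (a small sketch; the objective is illustrative):

using Optim
f(X)     = sum(abs2, X .- 1.0)
g!(G, X) = (G .= 2 .* (X .- 1.0); G)
res = optimize(f, g!, zeros(2, 3), LBFGS())
Optim.minimizer(res)    # a 2×3 matrix, approximately all ones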

Notes on convergence flags and checks

Currently, it is possible to access a minimizer using Optim.minimizer(result) even if all convergence flags are false. This means that the user has to be a bit careful when using the output from the solvers. It is advised to include checks for convergence if the minimizer or minimum is used to carry out further calculations.
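
For example, one simple pattern (a sketch) is to check convergence before using the result in further calculations:

using Optim
f(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
res = optimize(f, zeros(2), LBFGS(); autodiff = :forward)
if Optim.converged(res)
    xmin = Optim.minimizer(res)   # safe to use downstream
else
    @warn "Optimization did not converge; treat the minimizer with care"
end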

A related note is that first and second order methods make a convergence check on the gradient before entering the optimization loop. This is done to prevent line search errors if initial_x is a stationary point. Notice that this is only a first order check. If initial_x is any type of stationary point, g_converged will be true. This includes local minima, saddle points, and local maxima. If iterations is 0 and g_converged is true, the user needs to keep this point in mind.

diff --git a/dev/user/tipsandtricks/index.html b/dev/user/tipsandtricks/index.html index a9e5b91c..75fea797 100644 --- a/dev/user/tipsandtricks/index.html +++ b/dev/user/tipsandtricks/index.html @@ -192,4 +192,4 @@ * Convergence: false * √(Σ(yᵢ-ȳ)²)/n < 1.0e-08: false * Reached Maximum Number of Iterations: false - * Objective Function Calls: 24 + * Objective Function Calls: 24