Finding the minimizer to high precision #1120

Open
kbarros opened this issue Nov 14, 2024 · 4 comments

kbarros commented Nov 14, 2024

Convergence of optimize typically finds the cost function minimum to full numerical precision $\epsilon$ (about 16 digits), but the minimizer, and the gradient of the cost function, will only be converged to a much reduced precision of $\sqrt{\epsilon}$ (about 8 digits). This makes sense when considering a quadratic cost function of the form $f(x) = c x^2$ with $c > 0$: near the minimum, convergence in $x$ is much slower than convergence in $f$. Optim will by default terminate when the value of $f$ converges, even if g_tol is set to a very tight tolerance.
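
For concreteness, here is a minimal sketch of that scaling on a plain quadratic (not tied to Optim itself):

# Near the minimum of f(x) = c*x^2, an objective error of eps() (~2.2e-16)
# corresponds to an x-error of about sqrt(eps()/c) (~1.5e-8 for c = 1).
c = 1.0
dx = sqrt(eps() / c)   # ≈ 1.5e-8: the smallest change in x still visible in f
c * dx^2               # ≈ 2.2e-16, i.e. at the eps() noise floor of f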

For some applications, it's important to have the true minimizer $x$ to high precision. I discovered that this is possible by setting two fields in Options, namely f_reltol=NaN and f_abstol=NaN. With both set, the ConjugateGradient method ignores convergence of $f$ in its termination condition and then respects a very tight g_tol. These options also seem to be correctly propagated to the internal line search that conjugate gradient uses!

Is setting f_reltol=NaN, f_abstol=NaN an "officially supported" use of Optim?

If so, it seems like f_reltol and f_abstol should both be mentioned in the documentation (finding them currently requires reading the source code). Note that setting f_tol=NaN alone does not seem to work. It would also be great to have test cases covering this usage.
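
Concretely, the pattern I have in mind looks roughly like this (a minimal sketch on a toy quadratic, not the exact code from our application):

import Optim

# Toy quadratic with known minimizer (1, -3)
f(x) = (x[1] - 1)^2 + 2 * (x[2] + 3)^2
function g!(g, x)
    g[1] = 2 * (x[1] - 1)
    g[2] = 4 * (x[2] + 3)
end

# Disable both f-based convergence checks so that the tight g_tol controls termination
options = Optim.Options(g_tol=1e-12, f_reltol=NaN, f_abstol=NaN)
result = Optim.optimize(f, g!, zeros(2), Optim.ConjugateGradient(), options)
Optim.minimizer(result)  # should now agree with (1, -3) to much better than sqrt(eps())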


pkofod commented Nov 14, 2024

Yes, it should be mentioned. The whole section on options and tolerances needs to be updated; see also #1102. Currently, Optim's defaults for the f_tols are 0, and if you only set one of them to NaN, the other will still "converge" (or rather, be satisfied). In your case it seems like you have run out of precision in the objective, but you can still take steps that make the gradient elements smaller (there is a short Optim sketch of this after the example below).

So yes, you can set the two tolerances to NaN or to negative values (I believe). Do you need this level of precision? I am curious, because if you do, have you considered using another number type? In my other package, NLSolvers.jl, I support DoubleFloats:

julia> using NLSolvers

julia> using DoubleFloats

julia> function fdouble(x)
               fx = (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2
               return fx
           end
fdouble (generic function with 1 method)



julia>     function fgdouble(G, x)
               fx = (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2

               G1 = -2 * (1 - x[1]) - 400 * (x[2] - x[1]^2) * x[1]
               G2 = 200 * (x[2] - x[1]^2)
               G = [G1, G2]
               return fx, G
           end
fgdouble (generic function with 1 method)

julia>     f_obj = OptimizationProblem(
               ScalarObjective(
                   f=fdouble,
                   fg=fgdouble,
               ),
           )
OptimizationProblem{ScalarObjective{typeof(fdouble), Nothing, typeof(fgdouble), Nothing, Nothing, Nothing, Nothing, Nothing}, Nothing, NLSolvers.Euclidean{Tuple{0}}, Nothing, NLSolvers.InPlace, Nothing}(ScalarObjective{typeof(fdouble), Nothing, typeof(fgdouble), Nothing, Nothing, Nothing, Nothing, Nothing}(fdouble, nothing, fgdouble, nothing, nothing, nothing, nothing, nothing), nothing, NLSolvers.Euclidean{Tuple{0}}(), nothing, NLSolvers.InPlace(), nothing)

julia> res = solve(
                   f_obj,
                   Double64.([1, 2]),
                   LineSearch(GradientDescent(Inverse())),
                   OptimizationOptions(; g_abstol = 1e-32, maxiter = 100000),
               )
Results of minimization

* Algorithm:
  Gradient Descent with backtracking (no interp)

* Candidate solution:
  Final objective value:    6.94e-65
  Final gradient norm:      9.91e-33

  Initial objective value:  1.00e+02
  Initial gradient norm:    4.00e+02

* Stopping criteria
  |x - x'|              = 2.09e-35 <= 0.00e+00 (false)
  |x - x'|/|x|          = 1.48e-35 <= 0.00e+00 (false)
  |f(x) - f(x')|        = 1.11e-67 <= -Inf (false)
  |f(x) - f(x')|/|f(x)| = 1.60e-03 <= -Inf (false)
  |g(x)|                = 9.91e-33 <= 1.00e-32 (true)
  |g(x)|/|g(x₀)|        = 2.48e-35 <= 0.00e+00 (false)

* Work counters
  Seconds run:   4.82e-01
  Iterations:    88677


julia> res = solve(
                   f_obj,
                   Double64.([1, 2]),
                   LineSearch(BFGS(Inverse())),
                   OptimizationOptions(; g_abstol = 1e-32, maxiter = 100000),
               )
Results of minimization

* Algorithm:
  Inverse BFGS with backtracking (no interp)

* Candidate solution:
  Final objective value:    1.47e-70
  Final gradient norm:      1.47e-34

  Initial objective value:  1.00e+02
  Initial gradient norm:    4.00e+02

* Stopping criteria
  |x - x'|              = 1.98e-30 <= 0.00e+00 (false)
  |x - x'|/|x|          = 1.40e-30 <= 0.00e+00 (false)
  |f(x) - f(x')|        = 9.86e-60 <= -Inf (false)
  |f(x) - f(x')|/|f(x)| = 1.00e+00 <= -Inf (false)
  |g(x)|                = 1.47e-34 <= 1.00e-32 (true)
  |g(x)|/|g(x₀)|        = 3.67e-37 <= 0.00e+00 (false)

* Work counters
  Seconds run:   1.19e-04
  Iterations:    20
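
Back in Optim, a rough sketch of the point above about the f_tol defaults (using the f_reltol/f_abstol keywords discussed in this thread):

import Optim

rosenbrock(x) = (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2
function g_rb!(g, x)
    g[1] = -2 * (1 - x[1]) - 400 * (x[2] - x[1]^2) * x[1]
    g[2] = 200 * (x[2] - x[1]^2)
end

# Only one f-tolerance disabled: the other (still at its default of 0) can be
# satisfied once the objective hits the Float64 noise floor, so the tight g_tol
# may never come into play.
opts_one = Optim.Options(g_tol=1e-12, f_abstol=NaN)
res_one = Optim.optimize(rosenbrock, g_rb!, zeros(2), Optim.ConjugateGradient(), opts_one)

# Both f-tolerances disabled: termination is driven by g_tol.
opts_both = Optim.Options(g_tol=1e-12, f_reltol=NaN, f_abstol=NaN)
res_both = Optim.optimize(rosenbrock, g_rb!, zeros(2), Optim.ConjugateGradient(), opts_both)

(Optim.g_residual(res_one), Optim.g_residual(res_both))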


kbarros commented Nov 15, 2024

Hi @pkofod, thanks for your reply. It's great to know that setting f_reltol=NaN, f_abstol=NaN together can be officially supported. I've incorporated this approach into our package Sunny, and it works great:
SunnySuite/Sunny.jl#333

Do you need this level of precision? I am curious, because if you do, have you considered using another number type?

Our application is quantum magnetism. The physical observables (excitation energies associated with scattering of neutron or x-ray beams) can depend somewhat sensitively on the accuracy of the minimizer "x" (the magnetic configuration), not just on the accuracy of the "f" function (the ground-state energy). Now we can easily get "x" to 12 digits of accuracy, which is enough. We're also speed sensitive, so it's nice to be able to use Float64.


pkofod commented Nov 15, 2024

Great. The actionable item here is to improve the documentation, then.


kbarros commented Nov 15, 2024

Yes, correct.

Also, could you please consider including a test like this to make sure that this feature continues to work in future releases?

import Optim
import Test: @test

rosenbrock(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2

# Need high-precision gradient to impose tight tolerances below
function g_rosenbrock!(g, x)
    g[1] = 2 * (1.0 - x[1]) * (-1) + 2 * 100 * (x[2] - x[1]^2) * (-2 * x[1])
    g[2] = 2 * 100 * (x[2] - x[1]^2)
end

# To set tight tolerance on gradient g, need to disable any check on f
options = Optim.Options(g_tol=1e-10, f_reltol=NaN, f_abstol=NaN)
result = Optim.optimize(rosenbrock, g_rosenbrock!, zeros(2), Optim.ConjugateGradient(), options)
@test Optim.g_residual(result) < 1e-10

# To set tight tolerance on x, need to also disable default gradient tolerance, g_tol=1e-8
options = Optim.Options(x_tol=1e-10, g_tol=NaN, f_reltol=NaN, f_abstol=NaN)
result = Optim.optimize(rosenbrock, g_rosenbrock!, zeros(2), Optim.ConjugateGradient(), options)
@test Optim.x_abschange(result) < 1e-10
