exit early when nan in grad or hessian #1046

Conversation

@mohamed82008 (Contributor)

This PR checks that the gradient and Hessian are all finite, otherwise it breaks early with a warning.
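A minimal sketch of the kind of check being described (illustrative only, not the merged code; the helper name and its placement in the iteration loop are assumptions):

    # Illustrative sketch: a finiteness check over the gradient `g` and,
    # when available, the Hessian `H`.
    function gradient_and_hessian_finite(g, H = nothing)
        all(isfinite, g) || return false
        H === nothing && return true
        return all(isfinite, H)
    end

    g = [1.0, NaN]                      # a gradient containing a NaN
    if !gradient_and_hessian_finite(g)
        @warn "Non-finite value in gradient/Hessian; terminating optimization early."
    end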

@pkofod (Member) commented Aug 10, 2023

I agree that this is a good idea, because the current behavior will keep iterating until the iterations limit is hit. I am wondering if it would make sense to simply restart the Hessian approximation if the method is a quasi-Newton instead of failing, but that's for another PR.

@pkofod (Member) commented Aug 11, 2023

Tests fail because TwiceDifferentiableHV does not have an H field, but you could check for NaNs in the Hessian-Vector product. I know it should be a different counter, but that can be another PR.

@mohamed82008 (Contributor, Author)

I excluded TwiceDifferentiableHV from the check.

codecov bot commented Aug 15, 2023

Codecov Report

Merging #1046 (8757ec9) into master (fd93033) will increase coverage by 0.02%.
The diff coverage is 100.00%.

❗ Current head 8757ec9 differs from the pull request's most recent head 141aa1c. Consider uploading reports for commit 141aa1c to get more accurate results.

@@            Coverage Diff             @@
##           master    #1046      +/-   ##
==========================================
+ Coverage   85.26%   85.29%   +0.02%     
==========================================
  Files          43       43              
  Lines        3210     3216       +6     
==========================================
+ Hits         2737     2743       +6     
  Misses        473      473              
Files Changed                           Coverage Δ
src/multivariate/optimize/optimize.jl   91.42% <100.00%> (+0.80%) ⬆️

pkofod merged commit 2a44548 into JuliaNLSolvers:master on Aug 16, 2023
@MilesCranmer (Contributor)

Is there a way to hide this warning? PySR and SymbolicRegression.jl have started generating TONS of these warnings during the expression search (which is fine; they will just skip such an expression).

@MilesCranmer (Contributor) commented Aug 17, 2023

Does this interfere with the behavior of Backtracking line searches? If so, I think this is a breaking change and should be turned off by default, right? It seems to be messing with PySR & SymbolicRegression.jl's optimization loop.

@MilesCranmer (Contributor)

Yeah I think this is a breaking change. See LineSearches.BackTracking: https://github.com/JuliaNLSolvers/LineSearches.jl/blob/ded667a80f47886c77d67e8890f6adb127679ab4/src/backtracking.jl#L64

    # Hard-coded backtrack until we find a finite function value
    iterfinite = 0
    while !isfinite(ϕx_1) && iterfinite < iterfinitemax
        iterfinite += 1
        α_1 = α_2
        α_2 = α_1/2

        ϕx_1 = ϕ(α_2)
    end

^ i.e., it backtracks when it hits a NaN, whereas this PR exits when it hits a NaN, which is fundamentally different behavior.

Could you please roll this change back and make it optional? It has changed the algorithmic behavior of PySR and SymbolicRegression.jl.

@mohamed82008 (Contributor, Author)

Can you share an example that is broken now but used to work?

@mohamed82008 (Contributor, Author)

Gradient-based optimisers in Optim work as follows:

  1. Evaluate the objective, gradient and optionally Hessian at the current point x
  2. Find a search direction based on the gradient (and Hessian if available)
  3. Do a line search along the search direction and find the best solution on the line by objective value f(x)
  4. Move to the best solution on the line and go back to step 1

The check here is in step 1, where it tests whether the evaluation produced a NaN gradient/Hessian. For the current point in step 1 to have a NaN gradient, it cannot have had a NaN objective, because it was the output of the line search in step 3. None of the line search algorithms in step 3 will terminate at a point with a NaN objective if the initial point had a finite objective value, but they can terminate at a point with a NaN gradient and/or Hessian. So if the gradient/Hessian in step 1 has NaNs, the search direction will have NaNs and the line search will fail, because every step along a NaN search direction leads to a NaN x and hence a NaN objective value. The rest of the iterations would then all be spent failing line searches and doing unnecessary function evaluations.
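A tiny worked example of that propagation (made-up numbers, plain Julia, not Optim internals):

    g = [1.0, NaN]            # gradient with a NaN component (step 1)
    d = -g                    # e.g. a steepest-descent direction (step 2): [-1.0, NaN]
    x = [0.5, 0.5]
    α = 0.1
    x_trial = x .+ α .* d     # any trial point in the line search (step 3): [0.4, NaN]
    f(y) = sum(abs2, y)
    f(x_trial)                # NaN, so every line-search trial fails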

Given the above reasoning, I am curious: what exactly did this PR break?

@MilesCranmer (Contributor)

Hm, I wonder if it was due to the special handling of NaNs in my package; I treat NaNs as a signal. I will have a look this weekend and see what the issue is. I have pinned the version to 1.7.6 in SymbolicRegression.jl for now.

In any case, the warnings would be nice to turn off; is that possible?

@mohamed82008 (Contributor, Author)

It should be possible to turn the warnings off with a flag, but that requires a new option in Optim, so the decision will be Patrick's. It is also possible to turn the warnings off from your end as a user.
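For example, assuming the message goes through Julia's standard logging macros (e.g. @warn), a caller can filter it with the Logging stdlib. A minimal sketch, where `noisy` just stands in for a call that warns:

    using Logging

    noisy() = (@warn "non-finite value encountered"; 42)   # stand-in for a warning call

    # Records below Error level are filtered inside this block, so the warning
    # is suppressed while the return value still comes through.
    result = with_logger(ConsoleLogger(stderr, Logging.Error)) do
        noisy()
    end
    @show result   # 42, with no warning printed

Wrapping the Optim.optimize call the same way would silence the new warning without any change to Optim itself.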

@MilesCranmer (Contributor) commented Aug 29, 2023

Thanks @mohamed82008. I opened #1049 to turn off the warning messages when show_trace = false; i.e., it avoids the need for a new option by reusing the verbosity option that is already present.
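For reference, a sketch of what that looks like from the caller's side (the objective and starting point below are placeholders; show_trace is an existing keyword of Optim.Options):

    using Optim

    rosenbrock(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
    # show_trace already defaults to false; with #1049 the early-exit warnings
    # would follow this same verbosity setting.
    result = optimize(rosenbrock, [-1.2, 1.0], BFGS(), Optim.Options(show_trace = false))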

@pkofod (Member) commented Oct 7, 2023

> Yeah I think this is a breaking change.

I think that is false: this check happens after the loop that backtracks out of non-finite values.

@MilesCranmer (Contributor)

Thanks. Could you please see #1049? That is blocking me from updating to the latest Optim.
