Request for comment: (breaking) change gradient and Jacobian calculation algorithms? #168
Comments
The changes you are proposing aim at improving accuracy and consistency throughout the package, so I am inclined to give it a go. I don't know if that should be seen as a breaking change, though, as it is likely to make unit tests of dependent packages fail but not core functionality, in the sense that we are not, e.g., removing mandatory or optional arguments of current functions but rather improving the accuracy of the results.
Hi, @astamm. I say it's breaking in the semantic-versioning sense, as rerunning the exact same code will not necessarily return the same results after the change. It's a change for the better, I believe, but a change nonetheless.
I am out of my numerical-analysis depth here, but the five-point stencil sounds reasonable. Is it (much) more expensive? Also, what about AD? Can we possibly fold that in one day?
I guess we could use torch to implement AD whenever an analytical gradient is not provided by the user. Even better in terms of accuracy. Less lightweight in terms of dependencies, though, and it might be less trivial to make it easy to use.
I don't know that much about AD, but I wrapped one simple little library in a package to test (RcppFastAD). I know there is more in terms of libraries, so maybe we can just tag it as possible future work.
I know next to nothing about AD, other than that Stan uses it. My assumption is that it requires parsing the passed formulas/functions and that it would be a significant change from what we have now. What I was suggesting is just a more accurate estimate of finite differences using a few more points, which is orders of magnitude easier to implement.
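Concretely, the two stencils being compared are the standard central-difference rules (textbook formulas, restated here for reference):

f'(x) ≈ (f(x + h) - f(x - h)) / (2 h),                                  error O(h^2)  (three-point)
f'(x) ≈ (f(x - 2h) - 8 f(x - h) + 8 f(x + h) - f(x + 2h)) / (12 h),     error O(h^4)  (five-point)

Per coordinate, the five-point rule needs four function evaluations instead of two, so the extra cost is roughly a factor of two in function calls.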
As a Google query shows, there is more on CRAN. I think TMB was one of the earlier ones.
I followed a tutorial on automatic differentiation via torch. I have no experience with any other package, but I feel like using torch could be an easy way to set up AD in nloptr. On the con side, it comes with a bunch of dependencies itself. But broadly speaking, I agree that, now that AD is available, if we want to spend time on improving differentiation and induce a breaking change, it might be more appropriate to switch directly to AD rather than improve the numerical scheme. I can try to set up an example with torch and then we can discuss whether it seems like a good idea.
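As a rough illustration of the idea (a minimal sketch assuming only the torch package is installed, not an actual nloptr integration), reverse-mode AD on the exp(a*b*c*d*e) example used later in this thread would look like this:

library(torch)

# Toy objective: exp(prod(x)), the gradient example from the test script below
fn <- function(x) torch_exp(torch_prod(x))

x0 <- torch_tensor(c(-2, 2, 2, -1, -1), requires_grad = TRUE)
y  <- fn(x0)
y$backward()        # reverse-mode automatic differentiation
as_array(x0$grad)   # gradient of fn at x0, no finite-difference step involved

The catch, as noted above, is that the user-supplied objective has to be written (or rewritten) in terms of torch tensor operations rather than plain base-R arithmetic.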
And maybe we can try to set up the same example via RcppFastAD, which is very lightweight.
Incremental change is good, and for a suggested package the dependency tail is less critical. (Fewer is of course still better.)
Personally, I think that while adding AD is a worthwhile endeavor, it is enough of a major change to the package that it should be branched off and slowly developed on its own. In the interim, here is a brief test script I cobbled together to demonstrate the effects of moving to a more refined estimate. Executive summary: for these two examples, the five-point central difference is about 50% slower (a median of 18 microseconds vs. 12 for the 5-variable gradient example and 21 vs. 14 for the three-input/two-output Jacobian example) but is anywhere from one to three orders of magnitude more accurate. However, testing it against the HS100 problem, there isn't necessarily much advantage: at least on my computer, both 3- and 5-point fail at xtol_rel = 1e-6 or coarser and pass at xtol_rel = 1e-7 or finer. If we decide to update the grad/jac functions, they will be written more efficiently (no switches, etc.). Test script below for your enjoyment; my results are posted below the raw script :)

Test Script

# Necessary Functions
library(nloptr)  # attach nloptr for the nloptr() calls at the end of the script
# Central finite-difference kernels: hh is the perturbation vector (zero except
# in the coordinate being differenced) and heps is the scalar step size.
grad.3pc <- function(x0, fn, hh, heps, ...) {  # three-point central, O(h^2)
  (fn(x0 + hh) - fn(x0 - hh)) / (2 * heps)
}
grad.5pc <- function(x0, fn, hh, heps, ...) {  # five-point central, O(h^4)
  (fn(x0 - 2 * hh) - 8 * fn(x0 - hh) + 8 * fn(x0 + hh) - fn(x0 + 2 * hh)) /
    (12 * heps)
}
nl.grad <- function(x0, fn, points = 3, ...) {
if (!is.numeric(x0)) stop("Argument 'x0' must be a numeric value.")
fun <- match.fun(fn)
fn <- function(x) fun(x, ...)
if (length(fn(x0)) != 1)
stop("Function 'f' must be a scalar function (return a single value).")
points <- as.character(points)
gradCalc <- switch(points, "3" = grad.3pc, grad.5pc)
heps <- .Machine$double.eps ^ (1 / switch(points, "3" = 3, 5))
n <- length(x0)
hh <- gr <- rep(0, n)
for (i in seq_len(n)) {
hh[i] <- heps
gr[i] <- gradCalc(x0, fn, hh, heps)
hh[i] <- 0
}
gr
}
nl.jacobian <- function(x0, fn, points = 3, ...) {
n <- length(x0)
if (!is.numeric(x0) || n == 0)
stop("Argument 'x' must be a non-empty numeric vector.")
fun <- match.fun(fn)
fn <- function(x) fun(x, ...)
points <- as.character(points)
gradCalc <- switch(points, "3" = grad.3pc, grad.5pc)
heps <- .Machine$double.eps ^ (1 / switch(points, "3" = 3, 5))
jacob <- matrix(NA_real_, length(fn(x0)), n)
hh <- rep(0, n)
for (i in seq_len(n)) {
hh[i] <- heps
jacob[, i] <- gradCalc(x0, fn, hh, heps)
hh[i] <- 0
}
jacob
}
# Test Examples
## Gradient of e^(abcde)
x0 <- c(-2, 2, 2, -1, -1)
fnE <- function(x) exp(x[1] * x[2] * x[3] * x[4] * x[5])
grE <- function(x) {
c(x[2] * x[3] * x[4] * x[5],
x[1] * x[3] * x[4] * x[5],
x[1] * x[2] * x[4] * x[5],
x[1] * x[2] * x[3] * x[5],
x[1] * x[2] * x[3] * x[4]) * exp(prod(x))
}
print(grE(x0) - nl.grad(x0, fnE, 3L), digits = 17L)
print(grE(x0) - nl.grad(x0, fnE, 5L), digits = 17L)
# Difference Order
print(abs((grE(x0) - nl.grad(x0, fnE, 3L)) /
(grE(x0) - nl.grad(x0, fnE, 5L))), digits = 17L)
# Timings
microbenchmark::microbenchmark(
Three = nl.grad(x0, fnE, 3L),
Five = nl.grad(x0, fnE, 5L),
check = "equal",
times = 1000L,
control = list(order = "block")
)
## Jacobian test
x0 <- 1:3
fn1 <- function(x) {
c(3 * x[1L] ^ 2 * x[2L] * log(x[3L]),
x[3] ^ 3 - 2 * x[1L] * x[2L])
}
jac1 <- function(x) {
matrix(c(6 * x[1L] * x[2L] * log(x[3L]),
3 * x[1L] ^ 2 * log(x[3L]),
3 * x[1L] ^ 2 * x[2L] / x[3L],
-2 * x[2L], -2 * x[1L], 3 * x[3L] ^ 2),
nrow = 2L, byrow = TRUE)
}
print(jac1(x0) - nl.jacobian(x0, fn1, 3L), digits = 17L)
print(jac1(x0) - nl.jacobian(x0, fn1, 5L), digits = 17L)
# Difference Order
print(abs((jac1(x0) - nl.jacobian(x0, fn1, 3L)) /
(jac1(x0) - nl.jacobian(x0, fn1, 5L))), digits = 17L)
# Timings
microbenchmark::microbenchmark(
Three = nl.jacobian(x0, fn1, 3L),
Five = nl.jacobian(x0, fn1, 5L),
check = "equal",
times = 1000L,
control = list(order = "block")
)
## Problem Example: HS-100
## https://apmonitor.com/wiki/uploads/Apps/hs100.apm
## best known objective = 680.6300573
x0.hs100 <- c(1, 2, 0, 4, 0, 1, 1)
fn.hs100 <- function(x) {(x[1] - 10) ^ 2 + 5 * (x[2] - 12) ^ 2 + x[3] ^ 4 +
3 * (x[4] - 11) ^ 2 + 10 * x[5] ^ 6 + 7 * x[6] ^ 2 +
x[7] ^ 4 - 4 * x[6] * x[7] - 10 * x[6] - 8 * x[7]}
gr.hs100 <- function(x) {
c( 2 * x[1] - 20,
10 * x[2] - 120,
4 * x[3] ^ 3,
6 * x[4] - 66,
60 * x[5] ^ 5,
14 * x[6] - 4 * x[7] - 10,
4 * x[7] ^ 3 - 4 * x[6] - 8)
}
hin.hs100 <- function(x) {c(
2 * x[1] ^ 2 + 3 * x[2] ^ 4 + x[3] + 4 * x[4] ^ 2 + 5 * x[5] - 127,
7 * x[1] + 3 * x[2] + 10 * x[3] ^ 2 + x[4] - x[5] - 282,
23 * x[1] + x[2] ^ 2 + 6 * x[6] ^ 2 - 8 * x[7] - 196,
4 * x[1] ^ 2 + x[2] ^ 2 - 3 * x[1] * x[2] + 2 * x[3] ^ 2 + 5 * x[6] -
11 * x[7])
}
hinjac.hs100 <- function(x) {
matrix(c(4 * x[1], 12 * x[2] ^ 3, 1, 8 * x[4], 5, 0, 0,
7, 3, 20 * x[3], 1, -1, 0, 0,
23, 2 * x[2], 0, 0, 0, 12 * x[6], -8,
8 * x[1] - 3 * x[2], 2 * x[2] - 3 * x[1], 4 * x[3], 0, 0, 5, -11),
nrow = 4, byrow = TRUE)
}
gr.hs100.computed.3 <- function(x) {
nl.grad(x, fn.hs100, points = 3L)
}
hinjac.hs100.computed.3 <- function(x) {
nl.jacobian(x, hin.hs100, points = 3L)
}
gr.hs100.computed.5 <- function(x) {
nl.grad(x, fn.hs100, points = 5L)
}
hinjac.hs100.computed.5 <- function(x) {
nl.jacobian(x, hin.hs100, points = 5L)
}
nloptr(x0 = x0.hs100,
eval_f = fn.hs100,
eval_grad_f = gr.hs100,
eval_g_ineq = hin.hs100,
eval_jac_g_ineq = hinjac.hs100,
opts = list(algorithm = "NLOPT_LD_SLSQP", xtol_rel = 1e-7,
maxeval = 1000L))
# No significant difference between 3 and 5 point on my computer. Both fail at
# xtol_rel of 1e-6 and both pass at 1e-7.
nloptr(x0 = x0.hs100,
eval_f = fn.hs100,
eval_grad_f = gr.hs100.computed.3,
eval_g_ineq = hin.hs100,
eval_jac_g_ineq = hinjac.hs100.computed.3,
opts = list(algorithm = "NLOPT_LD_SLSQP", xtol_rel = 1e-7,
maxeval = 1000L))
nloptr(x0 = x0.hs100,
eval_f = fn.hs100,
eval_grad_f = gr.hs100.computed.5,
eval_g_ineq = hin.hs100,
eval_jac_g_ineq = hinjac.hs100.computed.5,
opts = list(algorithm = "NLOPT_LD_SLSQP", xtol_rel = 1e-7,
               maxeval = 1000L))

@aadler results

> # Necessary Functions
>
> grad.3pc <- function(x0, fn, hh, heps, ...) {
+ (fn(x0 + hh) - fn(x0 - hh)) / (2 * heps)
+ }
> grad.5pc <- function(x0, fn, hh, heps, ...) {
+ (fn(x0 - 2 * hh) - 8 * fn(x0 - hh) + 8 * fn(x0 + hh) - fn(x0 + 2 * hh)) /
+ (12 * heps)
+ }
> nl.grad <- function(x0, fn, points = 3, ...) {
+
+ if (!is.numeric(x0)) stop("Argument 'x0' must be a numeric value.")
+
+ fun <- match.fu .... [TRUNCATED]
> nl.jacobian <- function(x0, fn, points = 3, ...) {
+
+ n <- length(x0)
+ if (!is.numeric(x0) || n == 0)
+ stop("Argument 'x' must be a no ..." ... [TRUNCATED]
> # Test Examples
> ## Gradient of e^(abcde)
> x0 <- c(-2, 2, 2, -1, -1)
> fnE <- function(x) exp(x[1] * x[2] * x[3] * x[4] * x[5])
> grE <- function(x) {
+ c(x[2] * x[3] * x[4] * x[5],
+ x[1] * x[3] * x[4] * x[5],
+ x[1] * x[2] * x[4] * x[5],
+ x[1] * x[2] * x[3] * x .... [TRUNCATED]
> print(grE(x0) - nl.grad(x0, fnE, 3L), digits = 17L)
[1] -1.3422509631544344e-13 1.3422509631544344e-13 1.3422509631544344e-13 -1.0383460584406645e-12 -1.0383460584406645e-12
> print(grE(x0) - nl.grad(x0, fnE, 5L), digits = 17L)
[1] 3.3586414899255956e-15 -3.3404268934278392e-15 -3.3404268934278392e-15 1.0970955047207021e-13 1.0970955047207021e-13
> # Difference Order
> print(abs((grE(x0) - nl.grad(x0, fnE, 3L)) /
+ (grE(x0) - nl.grad(x0, fnE, 5L))), digits = 17L)
[1] 39.9641035573632877 40.1820188250568009 40.1820188250568009 9.4645001640491273 9.4645001640491273
> # Timings
> microbenchmark::microbenchmark(
+ Three = nl.grad(x0, fnE, 3L),
+ Five = nl.grad(x0, fnE, 5L),
+ check = "equal",
+ times = 1000 .... [TRUNCATED]
Unit: microseconds
expr min lq mean median uq max neval cld
Three 13.901 14.201 14.79719 14.401 14.701 64.502 1000 a
Five 20.700 21.002 22.58859 21.201 21.701 96.201 1000 b
> ## Jacobian test
> x0 <- 1:3
> fn1 <- function(x) {
+ c(3 * x[1L] ^ 2 * x[2L] * log(x[3L]),
+ x[3] ^ 3 - 2 * x[1L] * x[2L])
+ }
> jac1 <- function(x) {
+ matrix(c(6 * x[1L] * x[2L] * log(x[3L]),
+ 3 * x[1L] ^ 2 * log(x[3L]),
+ 3 * x[1L] ^ 2 * x[2L] / x[3 .... [TRUNCATED]
> print(jac1(x0) - nl.jacobian(x0, fn1, 3L), digits = 17L)
[,1] [,2] [,3]
[1,] 6.7961636318614183e-11 -1.3438139490062895e-12 -4.7106762934845392e-11
[2,] 9.4213525869690784e-11 4.7106762934845392e-11 -9.2928686967752583e-10
> print(jac1(x0) - nl.jacobian(x0, fn1, 5L), digits = 17L)
[,1] [,2] [,3]
[1,] -1.2168044349891716e-12 1.2962964035523328e-12 2.7666757773658901e-13
[2,] 4.6469494918710552e-12 5.2358117841322382e-13 -1.8651746813702630e-12
> # Difference Order
> print(abs((jac1(x0) - nl.jacobian(x0, fn1, 3L)) /
+ (jac1(x0) - nl.jacobian(x0, fn1, 5L))), digits = 17L)
[,1] [,2] [,3]
[1,] 55.852554744525548 1.0366563891743747 170.26484751203853
[2,] 20.274273700305809 89.9703138252756531 498.23047619047617
> # Timings
> microbenchmark::microbenchmark(
+ Three = nl.jacobian(x0, fn1, 3L),
+ Five = nl.jacobian(x0, fn1, 5L),
+ check = "equal",
+ time .... [TRUNCATED]
Unit: microseconds
expr min lq mean median uq max neval cld
Three 14.300 14.601 16.23849 14.900 15.401 83.001 1000 a
Five 20.101 20.501 23.62570 21.001 22.600 91.101 1000 b
> ## Problem Example: HS-100
> ## https://apmonitor.com/wiki/uploads/Apps/hs100.apm
> ## best known objective = 680.6300573
>
> x0.hs100 <- c(1, 2, 0 .... [TRUNCATED]
> fn.hs100 <- function(x) {(x[1] - 10) ^ 2 + 5 * (x[2] - 12) ^ 2 + x[3] ^ 4 +
+ 3 * (x[4] - 11) ^ 2 + 10 * x[5] ^ 6 + 7 * x[6] ^ 2 +
+ x[7] ^ .... [TRUNCATED]
> gr.hs100 <- function(x) {
+ c( 2 * x[1] - 20,
+ 10 * x[2] - 120,
+ 4 * x[3] ^ 3,
+ 6 * x[4] - 66,
+ 60 * x[5] ^ 5,
+ 14 * .... [TRUNCATED]
> hin.hs100 <- function(x) {c(
+ 2 * x[1] ^ 2 + 3 * x[2] ^ 4 + x[3] + 4 * x[4] ^ 2 + 5 * x[5] - 127,
+ 7 * x[1] + 3 * x[2] + 10 * x[3] ^ 2 + x[4] .... [TRUNCATED]
> hinjac.hs100 <- function(x) {
+ matrix(c(4 * x[1], 12 * x[2] ^ 3, 1, 8 * x[4], 5, 0, 0,
+ 7, 3, 20 * x[3], 1, -1, 0, 0,
+ 23 .... [TRUNCATED]
> gr.hs100.computed.3 <- function(x) {
+ nl.grad(x, fn.hs100, points = 3L)
+ }
> hinjac.hs100.computed.3 <- function(x) {
+ nl.jacobian(x, hin.hs100, points = 3L)
+ }
> gr.hs100.computed.5 <- function(x) {
+ nl.grad(x, fn.hs100, points = 5L)
+ }
> hinjac.hs100.computed.5 <- function(x) {
+ nl.jacobian(x, hin.hs100, points = 5L)
+ }
> nloptr(x0 = x0.hs100,
+ eval_f = fn.hs100,
+ eval_grad_f = gr.hs100,
+ eval_g_ineq = hin.hs100,
+ eval_jac_g_ineq = hinj .... [TRUNCATED]
Call:
nloptr(x0 = x0.hs100, eval_f = fn.hs100, eval_grad_f = gr.hs100, eval_g_ineq = hin.hs100, eval_jac_g_ineq = hinjac.hs100,
opts = list(algorithm = "NLOPT_LD_SLSQP", xtol_rel = 1e-07, maxeval = 1000L))
Minimization using NLopt version 2.7.1
NLopt solver status: 4 ( NLOPT_XTOL_REACHED: Optimization stopped because xtol_rel or xtol_abs (above) was reached. )
Number of Iterations....: 39
Termination conditions: xtol_rel: 1e-07 maxeval: 1000
Number of inequality constraints: 4
Number of equality constraints: 0
Optimal value of objective function: 680.630057367193
Optimal value of controls: 2.330499 1.951372 -0.4775409 4.365727 -0.6244865 1.038131 1.594226
> # No significant difference between 3 and 5 point on my computer. Both fail at
> # xtol_rel of 1e-6 and both pass at 1e-7.
>
> nloptr(x0 = x0.hs100 .... [TRUNCATED]
Call:
nloptr(x0 = x0.hs100, eval_f = fn.hs100, eval_grad_f = gr.hs100.computed.3, eval_g_ineq = hin.hs100, eval_jac_g_ineq = hinjac.hs100.computed.3,
opts = list(algorithm = "NLOPT_LD_SLSQP", xtol_rel = 1e-07, maxeval = 1000L))
Minimization using NLopt version 2.7.1
NLopt solver status: 4 ( NLOPT_XTOL_REACHED: Optimization stopped because xtol_rel or xtol_abs (above) was reached. )
Number of Iterations....: 60
Termination conditions: xtol_rel: 1e-07 maxeval: 1000
Number of inequality constraints: 4
Number of equality constraints: 0
Optimal value of objective function: 680.630057364075
Optimal value of controls: 2.330497 1.951371 -0.4775421 4.36573 -0.6244872 1.038139 1.594228
> nloptr(x0 = x0.hs100,
+ eval_f = fn.hs100,
+ eval_grad_f = gr.hs100.computed.5,
+ eval_g_ineq = hin.hs100,
+ eval_jac_g_ .... [TRUNCATED]
Call:
nloptr(x0 = x0.hs100, eval_f = fn.hs100, eval_grad_f = gr.hs100.computed.5, eval_g_ineq = hin.hs100, eval_jac_g_ineq = hinjac.hs100.computed.5,
opts = list(algorithm = "NLOPT_LD_SLSQP", xtol_rel = 1e-07, maxeval = 1000L))
Minimization using NLopt version 2.7.1
NLopt solver status: 4 ( NLOPT_XTOL_REACHED: Optimization stopped because xtol_rel or xtol_abs (above) was reached. )
Number of Iterations....: 39
Termination conditions: xtol_rel: 1e-07 maxeval: 1000
Number of inequality constraints: 4
Number of equality constraints: 0
Optimal value of objective function: 680.630057364095
Optimal value of controls: 2.330499 1.951372 -0.4775413 4.365726 -0.624487 1.038131 1.594227
Great, thanks. I guess one thing that has to be done is keeping only one function for computing the derivative and using that same function everywhere, be it the 3-point or the 5-point central finite difference. We could then opt for the 3-point method by default but offer the user the possibility to switch to the 5-point method should they want to improve accuracy at the expense of computation time.
@astamm. Adding the option for 3 vs. 5 without breaking existing code is possible if we add the choice after
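For illustration only (a hypothetical signature, not current package code, since the intended placement is not spelled out above): one backward-compatible way to expose the choice is to put the new argument after the dots, so existing positional calls and anything forwarded through the dots keep their current meaning, and points can only be supplied by name.

# Hypothetical sketch: stencil choice as a name-only argument after the dots
nl.grad <- function(x0, fn, heps = NULL, ..., points = 3L) {
  stopifnot(points %in% c(3, 5))
  fun <- match.fun(fn)
  fn  <- function(x) fun(x, ...)
  if (is.null(heps)) {
    # cube root of machine epsilon for the 3-point rule, fifth root for 5-point
    heps <- .Machine$double.eps ^ (1 / points)
  }
  n  <- length(x0)
  hh <- gr <- rep(0, n)
  for (i in seq_len(n)) {
    hh[i] <- heps
    gr[i] <- if (points == 3) {
      (fn(x0 + hh) - fn(x0 - hh)) / (2 * heps)
    } else {
      (fn(x0 - 2 * hh) - 8 * fn(x0 - hh) + 8 * fn(x0 + hh) - fn(x0 + 2 * hh)) /
        (12 * heps)
    }
    hh[i] <- 0
  }
  gr
}
# Existing calls such as nl.grad(x0, fnE) behave as before; opting in is
# explicit: nl.grad(x0, fnE, points = 5L)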
@astamm, I'm going to have to change
This isn't an issue per se, but more of a request for comment.

First, we have two separate algorithms for calculating numerical gradients. The first is a classic two-point (well, technically three-point with a middle coefficient of 0) centered estimate implemented by @hwborchers in gradients.R. This is used in both nl.grad and nl.jacobian. The second is a two-point forward difference implemented by @jyypma in finite.diff.R. Clearly, these will return different results even for the same input, since they use different estimation algorithms. Interestingly, Jelmer's code is used only for checking derivatives, as seen in check.derivatives.R, while Hans's code is used for all cases where a gradient or Jacobian is needed and not passed by the user, as found in the many wrapper functions he wrote.

When I was redoing a lot of the examples, I found that some of the issues Hans faced with Hock & Schittkowski problem 100, among others, stemmed from the gradient not being quite precise enough (see #152 for an example). Calculating the analytic gradients and Jacobians obviated the problem in the examples, but the basic issue remains.
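As a toy illustration of why this matters (not code from the package, and not its actual step sizes), a forward and a central difference already disagree in their trailing digits on a smooth one-dimensional function:

# Illustrative only: forward vs. central difference on f(x) = sin(x) at x = 1
f  <- function(x) sin(x)
x0 <- 1
h  <- 1e-6
c(forward = (f(x0 + h) - f(x0)) / h,
  central = (f(x0 + h) - f(x0 - h)) / (2 * h),
  exact   = cos(x0))

The forward estimate is off around the seventh digit, while the central one is several orders of magnitude closer, so any code that switches from one scheme to the other will see small numerical changes in its results.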
I think the second issue is easier to address. I would like to change the calculations in nl.grad and nl.jacobian to the five-point stencil, which is also the result of applying one iteration of Richardson extrapolation to the centered difference (see Sauer, Timothy (2012), Numerical Analysis, p. 250). Following the analysis on page 248 of Sauer (2012), this will change the (near-)optimal heps to the fifth root of eps instead of the cube root, and should result in a reduction of the error by about two orders of magnitude. However, this will be a breaking change, since the results of the numeric differentiation will be different. I think moving to an even higher order of accuracy would run into issues with floating-point representation, but I think the jump from central to second-order/Richardson is worth it.
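To spell out the step-size argument (standard truncation-versus-rounding analysis, summarized here with the O(1) constants dropped; eps denotes machine epsilon, about 2.2e-16):

  3-point: error(h) ≈ (h^2 / 6) |f'''| + (eps / h) |f|, minimized near h ~ eps^(1/3), giving an error ~ eps^(2/3) ≈ 4e-11
  5-point: error(h) ≈ (h^4 / 30) |f^(5)| + (eps / h) |f|, minimized near h ~ eps^(1/5), giving an error ~ eps^(4/5) ≈ 3e-13

That ratio is the roughly two orders of magnitude mentioned above, and it is broadly consistent with the error ratios seen in the test output posted in this thread.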
Eventually, I think we should also look into having only one numerical differentiation algorithm and replace finite.diff with nl.grad, especially if nl.grad is made more precise, but that is of lower priority as finite.diff is only used for checking.

Thoughts about the changes and if/when to implement them? I am especially looking for feedback from @astamm, @eddelbuettel, @jyypma, and @hwborchers, but would appreciate any and all feedback. Thank you.