vignettes/trustOptim-quick.Rmd
trustOptim-quick.Rmd
This is a quick demo of how to use the trustOptim package. For this example, the objective function is the Rosenbrock function. \[ f(x_{1:N},y_{1:N})=\sum_{i=1}^N \left[100\left(x^2_i-y_i\right)^2+\left(x_i-1\right)^2\right] \]
The parameter vector contains \(2N\) variables ordered as \(x_1, y_1, x_2, y_2, ... x_n, y_n\). The optimum of the function is a vector of ones, and the value at the minimum is zero.
The following functions return the objective, gradient, and Hessian (in sparse format) of this function.
require(trustOptim)
require(Matrix)
f <- function(V) {
N <- length(V)/2
x <- V[seq(1,2*N-1,by=2)]
y <- V[seq(2,2*N,by=2)]
return(sum(100*(x^2-y)^2+(x-1)^2))
}
df <- function(V) {
N <- length(V)/2
x <- V[seq(1,2*N-1,by=2)]
y <- V[seq(2,2*N,by=2)]
t <- x^2-y
dxi <- 400*t*x+2*(x-1)
dyi <- -200*t
return(as.vector(rbind(dxi,dyi)))
}
hess <- function(V) {
N <- length(V)/2
x <- V[seq(1,2*N-1,by=2)]
y <- V[seq(2,2*N,by=2)]
d0 <- rep(200,N*2)
d0[seq(1,(2*N-1),by=2)] <- 1200*x^2-400*y+2
d1 <- rep(0,2*N-1)
d1[seq(1,(2*N-1),by=2)] <- -400*x
H <- bandSparse(2*N,
k=c(-1,0,1),
diagonals=list(d1,d0,d1),
symmetric=FALSE,
repr='C')
return(drop0(H))
}
For this demo, we start at a random vector.
Next, we call trust.optim
, with all default arguments.
opt <- trust.optim(start, fn=f, gr=df, hs=hess, method="Sparse")
# Beginning optimization
#
# iter f nrm_gr status
# 1 10015.031437 10987.613876 Continuing - TR expand
# 2 10015.031437 10987.613876 Continuing - TR contract
# 3 219.817975 1588.461373 Continuing - TR expand
# 4 219.817975 1588.461373 Continuing - TR contract
# 5 219.817975 1588.461373 Continuing - TR contract
# 6 219.817975 1588.461373 Continuing - TR contract
# 7 82.628848 703.991158 Continuing
# 8 17.092094 196.558106 Continuing - TR expand
# 9 17.092094 196.558106 Continuing - TR contract
# 10 17.092094 196.558106 Continuing - TR contract
# 11 17.092094 196.558106 Continuing - TR contract
# 12 15.903946 87.156878 Continuing
# 13 10.985480 39.443766 Continuing - TR expand
# 14 10.985480 39.443766 Continuing - TR contract
# 15 9.991010 96.253961 Continuing
# 16 7.829903 26.847358 Continuing
# 17 7.829903 26.847358 Continuing - TR contract
# 18 6.689434 33.066636 Continuing - TR expand
# 19 6.689434 33.066636 Continuing - TR contract
# 20 6.221074 58.822745 Continuing
# 21 4.451638 16.401178 Continuing
# 22 4.451638 16.401178 Continuing - TR contract
# 23 3.834185 30.511940 Continuing - TR expand
# 24 2.924576 8.993870 Continuing
# 25 2.924576 8.993870 Continuing - TR contract
#
# iter f nrm_gr status
# 26 2.924576 8.993870 Continuing - TR contract
# 27 2.532653 20.239445 Continuing
# 28 1.786237 5.608208 Continuing
# 29 1.786237 5.608208 Continuing - TR contract
# 30 1.380482 7.023595 Continuing - TR expand
# 31 1.019554 5.780867 Continuing
# 32 0.725310 4.438917 Continuing
# 33 0.502544 4.865616 Continuing
# 34 0.324202 3.268848 Continuing
# 35 0.207341 5.565837 Continuing
# 36 0.111685 1.780744 Continuing
# 37 0.072992 7.082851 Continuing
# 38 0.022568 0.361310 Continuing
# 39 0.022568 0.361310 Continuing - TR contract
# 40 0.022568 0.361310 Continuing - TR contract
# 41 0.009448 2.928907 Continuing - TR expand
# 42 0.001544 0.247652 Continuing
# 43 0.000153 0.498492 Continuing
# 44 0.000001 0.005509 Continuing
# 45 0.000000 0.000359 Continuing
# 46 0.000000 0.000000 Continuing
#
# Iteration has terminated
# 46 0.000000 0.000000 Success
In the above output, f
is the objective function, and nrm_gr
is the norm of the gradient. The status
messages illustrate how the underlying trust region algorithm is progressing, and are useful mainly for debugging purposes. Note that the objective value is non-increasing at each iteration, but the norm of the gradient is not. The algorithm will continue until either the norm of the gradient is less than the control parameter prec
, the trust region radius is less than stop.trust.radius
, or the iteration count exceeds maxit
. See the package manual for details of the control parameters. We use the default control parameters for this demo (hence, there is no control list here.
The result contains the objective value, the minimum, the gradient at the minimum (should be numerically zero for all elements), and the Hessian at the minimum.
opt
# $fval
# [1] 2.233e-19
#
# $solution
# [1] 1 1 1 1 1 1
#
# $gradient
# [1] 2.330e-09 -1.631e-09 0.000e+00 0.000e+00 0.000e+00 0.000e+00
#
# $hessian
# 6 x 6 sparse Matrix of class "dsCMatrix"
#
# [1,] 802 -400 . . . .
# [2,] -400 200 . . . .
# [3,] . . 802 -400 . .
# [4,] . . -400 200 . .
# [5,] . . . . 802 -400
# [6,] . . . . -400 200
#
# $iterations
# [1] 46
#
# $status
# [1] "Success"
#
# $trust.radius
# [1] 0.5006
#
# $nnz
# [1] 9
#
# $method
# [1] "Sparse"
Note that opt$fval
, and all elements of opt$gradient
are zero, within machine precision. The solution is correct, and the Hessian is returned as a compressed sparse Matrix object (refer to the Matrix package for details).
One way to potentially speed up convergence (but not necessarily compute time) is to apply a preconditioner to the algorithm. Other than the identity matrix (the default), the package current supports only a modified Cholesky preconditioner. This is implemented with a control parameter preconditioner=1
. To save space, we report the optimizer status only ever 10 iterations.
opt1 <- trust.optim(start, fn=f, gr=df, hs=hess, method="Sparse",
control=list(preconditioner=1, report.freq=10))
# Beginning optimization
#
# iter f nrm_gr status
# 10 13.648174 7.496606 Continuing - TR contract
# 20 7.151990 37.094469 Continuing - TR expand
# 30 3.408752 18.596836 Continuing - TR expand
# 40 0.836712 12.994715 Continuing
# 50 0.127632 6.011876 Continuing
#
# Iteration has terminated
# 59 0.000000 0.000000 Success
Here, we see that adding the preconditioner actually increases the number of iterations. Sometimes preconditioners help a lot, and sometimes not at all.
##Quasi-Newton Methods
The trust.optim
function also supports quasi-Newton approximations to the Hessian. The two options are BFGS and SR1 updates. See Nocedal and Wright (2006) for details. You do not need to provide the Hessian for these methods, and they are often preferred when the Hessian is dense. However, they may take longer to converge, which is why we need to change the maxit
control parameter. To save space, we report the status of the optimizer only every 10 iterations.
opt.bfgs <- trust.optim(start, fn=f, gr=df, method="BFGS", control=list(maxit=5000, report.freq=10))
# Beginning optimization
#
# iter f nrm_gr status
# 10 88.806530 354.018026 Continuing
# 20 1.823163 6.328353 Continuing - TR contract
# 30 1.389553 4.246382 Continuing
# 40 0.802565 3.855513 Continuing - TR expand
# 50 0.571448 2.816390 Continuing - TR expand
# 60 0.208713 10.847588 Continuing - TR contract
# 70 0.007267 1.328119 Continuing - TR contract
# 80 0.005292 0.670023 Continuing - TR expand
# 90 0.000001 0.004485 Continuing
# 100 0.000000 0.000000 Continuing
#
# Iteration has terminated
# 101 0.000000 0.000000 Success
opt.bfgs
# $fval
# [1] 8.831e-23
#
# $solution
# [1] 1 1 1 1 1 1
#
# $gradient
# [1] -1.149e-10 6.506e-11 8.304e-12 -6.639e-12 -7.876e-11 3.619e-11
#
# $iterations
# [1] 101
#
# $status
# [1] "Success"
#
# $trust.radius
# [1] 0.3383
#
# $method
# [1] "BFGS"
#
# $hessian.update.method
# [1] 2
And we can do the same thing with SR1 updates.
opt.sr1 <- trust.optim(start, fn=f, gr=df, method="SR1", control=list(maxit=5000, report.freq=10))
# Beginning optimization
#
# iter f nrm_gr status
# 10 175.256780 287.816322 Continuing - TR contract
# 20 2.931556 2.996846 Continuing
# 30 2.131656 6.411590 Continuing
# 40 1.127632 3.784477 Continuing - TR expand
# 50 0.315880 6.964198 Continuing
# 60 0.208635 1.302632 Continuing - TR expand
# 70 0.132457 5.187551 Continuing - TR contract
# 80 0.108929 2.243055 Continuing - TR contract
# 90 0.103658 0.417444 Continuing - TR expand
# 100 0.039978 2.964973 Continuing - TR contract
# 110 0.007480 2.932051 Continuing - TR contract
# 120 0.006168 1.885221 Continuing - TR contract
# 130 0.000005 0.033647 Continuing - TR contract
# 140 0.000000 0.002637 Continuing
#
# Iteration has terminated
# 144 0.000000 0.000000 Success
opt.sr1
# $fval
# [1] 1.742e-19
#
# $solution
# [1] 1 1 1 1 1 1
#
# $gradient
# [1] -1.423e-09 7.846e-10 4.564e-11 1.104e-10 -1.383e-08 6.725e-09
#
# $iterations
# [1] 144
#
# $status
# [1] "Success"
#
# $trust.radius
# [1] 0.005085
#
# $method
# [1] "SR1"
#
# $hessian.update.method
# [1] 1
Note that the quasi_Newton updates do not return a Hessian. We do not think that the final approximations from BFGS or SR1 updates are particularly reliable. If you need the Hessian, you can use the sparseHessianFD package.