Suppose you have a scalar-valued objective function of N variables. You have an R function that returns the function value, and another that returns the gradient. You still need the Hessian, which would be hard to derive analytically, and translate into an R function. If N is very large, it can take a long time to estimate the Hessian numerically, and a lot of memory to store all N^2 elements.
This scalability problem comes up in at least two areas of statistics:
The sparseHessianFD package solves this problem when the Hessian is sparse, and the sparsity pattern is known in advance. The sparsity pattern is just the row and column indices of the non-zero elements in the lower triangle of the Hessian. Sparse Hessians arise, for example, in hierarchical models in which parameters are conditionally independent across heterogeneous units. In that case, we know which cross-partial derivatives are zero.
See the vignette for more about sparsity patterns in Hessians.
The package is available on CRAN, so a simple
install.packages("sparseHessianFD")should suffice. The package is being developed on Github and you can get development branches there.
To use the package, you need at least the following:
The sparseHessianFD function returns an object of class sparseHessianFD. This class contains methods to return the function value, gradient and Hessian. The Hessian is returned in a sparse, compressed format as a dgCMatrix object (defined in the Matrix package).
A typical usage might look like this.
P <- rnorm(N) ## random "starting value" for initialization
obj <- sparseHessianFD(P, f, df, rows, cols)
X <- rnorm(N) ## another set of variables at which the function is evaluated
val <- obj$fn(X) ## function value
gr <- obj$gr(X) ## gradient
hess <- obj$hessian(X)  ## Hessian, as a sparse dgCMatrix objectSee the vignette and reference pages for more details, options, convenience functions, etc.