The sparseMVN package provides standard multivariate normal (MVN) sampling and density algorithms that are optimized for sparse covariance and precision matrices.
The package can be installed from CRAN:
or from GitHub
rmvn.sparse() function samples from an MVN distribution, and dmvn.sparse() computes an MVN density. The user must first precompute a sparse Cholesky factorization of either the covariance or the precision matrix.
CH <- Matrix::Cholesky(as(S, 'dsCMatrix')) x <- rmvn.sparse(10, mu, CH, prec=FALSE) ## 10 random draws of x d <- dmvn.sparse(x, mu, CH, prec=FALSE) ## densities of the 10 draws
This package relies heavily on classes and algorithms from the Matrix package. In the preceding code,
S is a covariance stored as a dsCMatrix object , which is a sparse, column-compressed symmetric matrix.
CH is a Cholesky factor class (either dCHMsuper or dCHMsimpl) generated by Cholesky() that defines a sparse Cholesky factor, but is not itself a matrix. It is not possible to supply a triangular Cholesky factor using chol() or Matrix::chol() instead.
S were a precision matrix instead, the Cholesky step is unchanged, but the
prec argument in
dmvn.sparse() would be TRUE. By default, dmvn.sparse() returns a log density, but that can be overridden with a
The counterpart MVN functions for base “dense” R matrices are
mvtnorm::dmvnorm(). The key distinction between those functions and
dmvn.sparse() are that the mvtnorm counterparts require a covariance stored as a garden-variety, base R matrix (providing the precision instead is not an option). sparseMVN has the following advantages when either the covariance or precision is sparse, and especially when the dimension of the MVN distribution is large.
In a dense symmetric matrix, all of the elements in the upper triangle are duplicated in the lower triangle. By expressing the covariance/precision as an explicitly symmetric matrix, we avoid storing redundant or duplicate data in memory.
By definition, most elements in a sparse matrix are zeros. It is more efficient to store only the nonzero values, along with pointers to where in the matrix those values belong, rather than store the zeros explicitly. The sparse linear algebra routines defined in Matrix avoid the many wasteful multiply-by-zero that would be done using standard dense matrix algorithms.
Under the hood, every call to
mvtnorm::dmvnorm() involves factoring a dense covariance matrix, which is repetitive if that matrix does not change. These steps are avoided by computing the Cholesky factor just once outside of
dmvn.sparse(), and reusing it for subsequent calls.
In many applications, the precision is more readily available than the covariance (e.g., the Hessian at the maximum log posterior density). Converting that precision to a covariance requires a matrix inversion that can be costly, and possibly numerically unstable.
A more detailed
vignette("sparseMVN2") explains the theory behind the package, and shows evidence of scalability.