The sparseMVN package provides standard multivariate normal (MVN) sampling and density algorithms that are optimized for sparse covariance and precision matrices.

Installation

The package can be installed from CRAN:

install.packages("sparseMVN")

or from GitHub

install_github("braunm/sparseMVN")

Using the package

The rmvn.sparse() function samples from an MVN distribution, and dmvn.sparse() computes an MVN density. The user must first precompute a sparse Cholesky factorization of either the covariance or the precision matrix.

CH <- Matrix::Cholesky(as(S, 'dsCMatrix'))
x <- rmvn.sparse(10, mu, CH, prec=FALSE) ## 10 random draws of x
d <- dmvn.sparse(x, mu, CH, prec=FALSE) ## densities of the 10 draws

This package relies heavily on classes and algorithms from the Matrix package. In the preceding code, S is a covariance stored as a dsCMatrix object , which is a sparse, column-compressed symmetric matrix. CH is a Cholesky factor class (either dCHMsuper or dCHMsimpl) generated by Cholesky() that defines a sparse Cholesky factor, but is not itself a matrix. It is not possible to supply a triangular Cholesky factor using chol() or Matrix::chol() instead.

If S were a precision matrix instead, the Cholesky step is unchanged, but the prec argument in rmvn.sparse() and dmvn.sparse() would be TRUE. By default, dmvn.sparse() returns a log density, but that can be overridden with a log=FALSE argument.

Why use this package?

The counterpart MVN functions for base “dense” R matrices are mvtnorm::rmvnorm() and mvtnorm::dmvnorm(). The key distinction between those functions and rmvn.sparse() or dmvn.sparse() are that the mvtnorm counterparts require a covariance stored as a garden-variety, base R matrix (providing the precision instead is not an option). sparseMVN has the following advantages when either the covariance or precision is sparse, and especially when the dimension of the MVN distribution is large.

  1. In a dense symmetric matrix, all of the elements in the upper triangle are duplicated in the lower triangle. By expressing the covariance/precision as an explicitly symmetric matrix, we avoid storing redundant or duplicate data in memory.

  2. By definition, most elements in a sparse matrix are zeros. It is more efficient to store only the nonzero values, along with pointers to where in the matrix those values belong, rather than store the zeros explicitly. The sparse linear algebra routines defined in Matrix avoid the many wasteful multiply-by-zero that would be done using standard dense matrix algorithms.

  3. Under the hood, every call to mvtnorm::rmvnorm() or mvtnorm::dmvnorm() involves factoring a dense covariance matrix, which is repetitive if that matrix does not change. These steps are avoided by computing the Cholesky factor just once outside of rmvn.sparse() and dmvn.sparse(), and reusing it for subsequent calls.

  4. In many applications, the precision is more readily available than the covariance (e.g., the Hessian at the maximum log posterior density). Converting that precision to a covariance requires a matrix inversion that can be costly, and possibly numerically unstable.

Learn more

A more detailed vignette("sparseMVN2") explains the theory behind the package, and shows evidence of scalability.