Estimate covariance/correlation with low rank and shrinkage

Estimate the covariance/correlation between columns as the weighted sum of a low rank matrix and a scaled identity matrix. The weight acts to shrink the sample correlation matrix towards the identity matrix or the sample covariance matrix towards a scaled identity matrix with constant variance. An estimate of this form is useful because it is fast, and enables fast operations downstream. The method is based on the Gaussian Inverse Wishart Empirical Bayes (GIW-EB) model.

Usage

eclairs(
  X,
  k,
  lambda = NULL,
  compute = c("covariance", "correlation"),
  n.samples = nrow(X)
)

Arguments

X: data matrix with n samples as rows and p features as columns
k: the rank of the low rank component
lambda: shrinkage parameter. If not specified, it is estimated from the data.
compute: evaluate either the "covariance" or "correlation" of X
n.samples: number of samples data is from. Usually nrow(X), but can be other values in special cases.

Value

eclairs object storing:

U:: orthonormal matrix with k columns representing the low rank component
dSq:: eigen-values so that \(U diag(d^2) U^T\) is the low rank component
lambda:: shrinkage parameter \(\lambda\) for the scaled diagonal component
sigma:: standard deviations of input columns
nu:: diagonal value, \(\nu\), of target matrix in shrinkage
n:: number of samples (i.e. rows) in the original data
p:: number of features (i.e. columns) in the original data
k:: rank of low rank component
rownames:: sample names from the original matrix
colnames:: features names from the original matrix
method:: method used for decomposition
call:: the function call

Details

Compute \(U\), \(d^2\) to approximate the correlation matrix between columns of data matrix X by \(U diag(d^2 (1-\lambda)) U^T + I\nu\lambda\). When computing the covariance matrix, scale by the standard deviation of each feature.

Examples

library(Rfast)

n <- 800 # number of samples
p <- 200 # number of features

# create correlation matrix
Sigma <- autocorr.mat(p, .9)

# draw data from correlation matrix Sigma
Y <- rmvnorm(n, rep(0, p), sigma = Sigma * 5.1, seed = 1)
rownames(Y) <- paste0("sample_", seq(n))
colnames(Y) <- paste0("gene_", seq(p))

# eclairs decomposition: covariance
ecl <- eclairs(Y, compute = "covariance")

ecl
#>        Estimate covariance with low rank and shrinkage
#> 
#>   Original data dims: 800 x 200 
#>   Low rank component: 200 
#>   lambda:             0.0265 
#>   nu:                 1 
#>   sd(200):            2.21 2.22 ... 2.18
#>   logLik:             -117651 

# eclairs decomposition: correlation
ecl <- eclairs(Y, compute = "correlation")

ecl
#>        Estimate correlation with low rank and shrinkage
#> 
#>   Original data dims: 800 x 200 
#>   Low rank component: 200 
#>   lambda:             0.0265 
#>   nu:                 1 
#>   Avg corr (EB):      0.0832 
#>   Avg corrSq (EB):    0.0402 
#>   logLik:             -117651