Skip to contents

Estimate the covariance/correlation between columns as the weighted sum of a low rank matrix and a scaled identity matrix. The weight acts to shrink the sample correlation matrix towards the identity matrix or the sample covariance matrix towards a scaled identity matrix with constant variance. An estimate of this form is useful because it is fast, and enables fast operations downstream. The method is based on the Gaussian Inverse Wishart Empirical Bayes (GIW-EB) model.

Usage

eclairs(
  X,
  k,
  lambda = NULL,
  compute = c("covariance", "correlation"),
  n.samples = nrow(X)
)

Arguments

X

data matrix with n samples as rows and p features as columns

k

the rank of the low rank component

lambda

shrinkage parameter. If not specified, it is estimated from the data.

compute

evaluate either the "covariance" or "correlation" of X

n.samples

number of samples data is from. Usually nrow(X), but can be other values in special cases.

Value

eclairs object storing:

U:

orthonormal matrix with k columns representing the low rank component

dSq:

eigen-values so that \(U diag(d^2) U^T\) is the low rank component

lambda:

shrinkage parameter \(\lambda\) for the scaled diagonal component

sigma:

standard deviations of input columns

nu:

diagonal value, \(\nu\), of target matrix in shrinkage

n:

number of samples (i.e. rows) in the original data

p:

number of features (i.e. columns) in the original data

k:

rank of low rank component

rownames:

sample names from the original matrix

colnames:

features names from the original matrix

method:

method used for decomposition

call:

the function call

Details

Compute \(U\), \(d^2\) to approximate the correlation matrix between columns of data matrix X by \(U diag(d^2 (1-\lambda)) U^T + I\nu\lambda\). When computing the covariance matrix, scale by the standard deviation of each feature.

Examples

library(Rfast)

n <- 800 # number of samples
p <- 200 # number of features

# create correlation matrix
Sigma <- autocorr.mat(p, .9)

# draw data from correlation matrix Sigma
Y <- rmvnorm(n, rep(0, p), sigma = Sigma * 5.1, seed = 1)
rownames(Y) <- paste0("sample_", seq(n))
colnames(Y) <- paste0("gene_", seq(p))

# eclairs decomposition: covariance
ecl <- eclairs(Y, compute = "covariance")

ecl
#>        Estimate covariance with low rank and shrinkage
#> 
#>   Original data dims: 800 x 200 
#>   Low rank component: 200 
#>   lambda:             0.0265 
#>   nu:                 1 
#>   sd(200):            2.21 2.22 ... 2.18
#>   logLik:             -117651 

# eclairs decomposition: correlation
ecl <- eclairs(Y, compute = "correlation")

ecl
#>        Estimate correlation with low rank and shrinkage
#> 
#>   Original data dims: 800 x 200 
#>   Low rank component: 200 
#>   lambda:             0.0265 
#>   nu:                 1 
#>   Avg corr (EB):      0.0832 
#>   Avg corrSq (EB):    0.0402 
#>   logLik:             -117651