Estimate the covariance/correlation between columns as the weighted sum of a low rank matrix and a scaled identity matrix. The weight acts to shrink the sample correlation matrix towards the identity matrix or the sample covariance matrix towards a scaled identity matrix with constant variance. An estimate of this form is useful because it is fast, and enables fast operations downstream. The method is based on the Gaussian Inverse Wishart Empirical Bayes (GIW-EB) model.
Arguments
- X
data matrix with n samples as rows and p features as columns
- k
the rank of the low rank component
- lambda
shrinkage parameter. If not specified, it is estimated from the data.
- compute
evaluate either the
"covariance"
or"correlation"
ofX
- n.samples
number of samples data is from. Usually
nrow(X)
, but can be other values in special cases.
Value
eclairs object storing:
- U:
orthonormal matrix with k columns representing the low rank component
- dSq:
eigen-values so that \(U diag(d^2) U^T\) is the low rank component
- lambda:
shrinkage parameter \(\lambda\) for the scaled diagonal component
- sigma:
standard deviations of input columns
- nu:
diagonal value, \(\nu\), of target matrix in shrinkage
- n:
number of samples (i.e. rows) in the original data
- p:
number of features (i.e. columns) in the original data
- k:
rank of low rank component
- rownames:
sample names from the original matrix
- colnames:
features names from the original matrix
- method:
method used for decomposition
- call:
the function call
Details
Compute \(U\), \(d^2\) to approximate the correlation matrix between columns of data matrix X by \(U diag(d^2 (1-\lambda)) U^T + I\nu\lambda\). When computing the covariance matrix, scale by the standard deviation of each feature.
Examples
library(Rfast)
n <- 800 # number of samples
p <- 200 # number of features
# create correlation matrix
Sigma <- autocorr.mat(p, .9)
# draw data from correlation matrix Sigma
Y <- rmvnorm(n, rep(0, p), sigma = Sigma * 5.1, seed = 1)
rownames(Y) <- paste0("sample_", seq(n))
colnames(Y) <- paste0("gene_", seq(p))
# eclairs decomposition: covariance
ecl <- eclairs(Y, compute = "covariance")
ecl
#> Estimate covariance with low rank and shrinkage
#>
#> Original data dims: 800 x 200
#> Low rank component: 200
#> lambda: 0.0265
#> nu: 1
#> sd(200): 2.21 2.22 ... 2.18
#> logLik: -117651
# eclairs decomposition: correlation
ecl <- eclairs(Y, compute = "correlation")
ecl
#> Estimate correlation with low rank and shrinkage
#>
#> Original data dims: 800 x 200
#> Low rank component: 200
#> lambda: 0.0265
#> nu: 1
#> Avg corr (EB): 0.0832
#> Avg corrSq (EB): 0.0402
#> logLik: -117651