Summarize correlation matrix — averageCorr • decorrelate

Summarize a correlation matrix as a scalar value, given its SVD and shrinkage parameter. averageCorr() computes the average correlation, and averageCorrSq() computes the average squared correlation; both exclude the diagonal terms. sumInverseCorr() computes the sum of entries in the inverse correlation matrix to give the 'effective number of independent features'. effVariance() evaluates the effective variance of the correlation (or covariance) matrix. These values can be computed from the correlation matrix estimated by standard MLE or by EB shrinkage.

Usage

averageCorr(ecl, method = c("EB", "MLE"))

averageCorrSq(ecl, method = c("EB", "MLE"))

sumInverseCorr(ecl, method = c("EB", "MLE"), absolute = TRUE)

effVariance(ecl, method = c("EB", "MLE"))

tr(ecl, method = c("EB", "MLE"))

Arguments

ecl: estimate of correlation matrix from eclairs() storing \(U\), \(d_1^2\), \(\lambda\) and \(\nu\)
method: compute average correlation for either the empirical Bayes (EB) shrunken correlation matrix or the MLE correlation matrix
absolute: if TRUE (default) evaluate on absolute correlation matrix

Value

value of summary statistic

Details

tr(): trace of the matrix. Sum of diagonals is the same as the sum of the eigen-values.

averageCorr(): The average correlation is computed by summing the off-diagonal values in the correlation matrix. The sum of all elements in a matrix is \(g = \sum_{i,j} C_{i,j} = 1^T C 1 \), where \(1\) is a vector of \(p\) elements with all entries 1. This last term is a quadratic form of the correlation matrix that can be computed efficiently using the SVD and shrinkage parameter from eclairs(). Given the value of \(g\), the average is computed by subtracting the diagonal values and dividing by the number of off-diagonal values: \((g - p) / (p(p-1))\).
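As a minimal sketch of the identity above, the following base R code works on a small dense matrix rather than the eclairs() decomposition. The matrix `C` and its values are hypothetical, and this is not the package's internal implementation. It checks that the quadratic form \(1^T C 1\) can be obtained from the eigen-decomposition and that \((g - p) / (p(p-1))\) matches the mean of the off-diagonal entries.

```r
# Small illustrative correlation matrix (hypothetical values)
p <- 4
C <- matrix(0.3, p, p)
diag(C) <- 1

# Sum of all entries as the quadratic form 1^T C 1
ones <- rep(1, p)
g <- drop(t(ones) %*% C %*% ones)

# Same quantity from the eigen-decomposition C = U diag(lambda) U^T:
# 1^T C 1 = sum_i lambda_i * (u_i^T 1)^2
eig <- eigen(C, symmetric = TRUE)
g_eig <- sum(eig$values * drop(crossprod(eig$vectors, ones))^2)

# Average off-diagonal correlation: remove the p diagonal ones,
# then divide by the number of off-diagonal entries
avg <- (g - p) / (p * (p - 1))

# Direct computation for comparison
avg_direct <- mean(C[row(C) != col(C)])

all.equal(g, g_eig)
all.equal(avg, avg_direct)
```

Because every off-diagonal entry here equals 0.3, the average is 0.3. The eigen-decomposition form matters when \(p\) is large and only a low-rank decomposition is stored.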

averageCorrSq(): The average squared correlation is computed using only the eigen-values. Surprisingly, this is a function of the variance of the eigen-values. This is reviewed by Watanabe (2022) and Durand and Le Roux (2017). Letting \(\lambda_i\) be the \(i^{th}\) sample or shrunk eigen-value, and \(\tilde{\lambda}\) be the mean eigen-value, the average squared correlation is \(\sum_i (\lambda_i - \tilde{\lambda})^2 / \left(p(p-1)\tilde{\lambda}^2\right)\).
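The relationship between eigen-value dispersion and squared correlation can be checked numerically. This sketch uses base R and a hypothetical AR(1)-style correlation matrix. It computes eigen-values from a dense matrix for illustration only and does not reproduce how eclairs() stores them.

```r
# Hypothetical AR(1)-style correlation matrix: C[i, j] = 0.5^|i - j|
p <- 5
C <- 0.5^abs(outer(seq_len(p), seq_len(p), "-"))

# Eigen-values and their mean (equal to 1 for a correlation matrix)
lambda <- eigen(C, symmetric = TRUE, only.values = TRUE)$values
lambda_mean <- mean(lambda)

# Average squared correlation from eigen-value dispersion
avg_sq <- sum((lambda - lambda_mean)^2) / (p * (p - 1) * lambda_mean^2)

# Direct computation from the off-diagonal entries
avg_sq_direct <- mean(C[row(C) != col(C)]^2)

all.equal(avg_sq, avg_sq_direct)
```

The identity holds because \(\sum_i \lambda_i^2\) equals the sum of all squared entries of \(C\), and subtracting the \(p\) diagonal ones leaves the off-diagonal sum of squares.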

sumInverseCorr(): The 'effective number of independent features' is computed by summing the entries of the inverse correlation matrix. This has the form \(\sum_{i,j} C^{-1}_{i,j} = 1^T C^{-1} 1\). This last term is a quadratic form of the inverse correlation matrix that can be computed efficiently using the SVD and shrinkage parameter from eclairs(), as described above.
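A small base R sketch of this quadratic form follows, assuming a hypothetical equicorrelated matrix `C`. It ignores the `absolute` argument, which makes no difference here because all entries are positive, and it inverts a dense matrix rather than using the eclairs() decomposition.

```r
# Hypothetical equicorrelated matrix with correlation 0.3
p <- 4
C <- matrix(0.3, p, p)
diag(C) <- 1
ones <- rep(1, p)

# 1^T C^{-1} 1 without forming the inverse: solve C x = 1, then sum x
s <- sum(solve(C, ones))

# Same quantity from the eigen-decomposition:
# 1^T C^{-1} 1 = sum_i (u_i^T 1)^2 / lambda_i
eig <- eigen(C, symmetric = TRUE)
s_eig <- sum(drop(crossprod(eig$vectors, ones))^2 / eig$values)

# Direct computation by summing every entry of the explicit inverse
s_direct <- sum(solve(C))

all.equal(s, s_eig)
all.equal(s, s_direct)
```

For this matrix the result is \(p / (1 + (p-1) \times 0.3) = 4 / 1.9\), which is below \(p = 4\). With uncorrelated features (\(C = I\)) the sum equals \(p\).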

effVariance(): Compute a metric of the amount of variation represented by a correlation (or covariance) matrix that is comparable across matrices of different sizes. Proposed by Peña and Rodriguez (2003), the 'effective variance' is \(|C|^\frac{1}{p}\) where \(C\) is a correlation (or covariance) matrix between \(p\) variables. Equivalently, the effective variance is the exponential of the mean of the log eigen-values, i.e. the geometric mean of the eigen-values.
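The determinant form \(|C|^{1/p}\) can be compared with an eigen-value form in a few lines of base R. This is a sketch on a hypothetical dense matrix, not the package's implementation. Since the determinant is the product of the eigen-values, \(|C|^{1/p} = \exp\left(\frac{1}{p}\sum_i \log \lambda_i\right)\).

```r
# Hypothetical equicorrelated matrix with correlation 0.3
p <- 4
C <- matrix(0.3, p, p)
diag(C) <- 1

# Effective variance from the determinant
ev_det <- det(C)^(1 / p)

# Effective variance from the eigen-values: exp of the mean log eigen-value.
# Working on the log scale avoids overflow or underflow of the determinant
# when p is large.
lambda <- eigen(C, symmetric = TRUE, only.values = TRUE)$values
ev_log <- exp(mean(log(lambda)))

all.equal(ev_det, ev_log)
```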

References

Durand, J. L., & Le Roux, B. (2017). Linkage index of variables and its relationship with variance of eigenvalues in PCA and MCA. Statistica Applicata-Italian Journal of Applied Statistics, (2-3), 123-135.

Peña, D., & Rodriguez, J. (2003). Descriptive measures of multivariate scatter and linear dependence. Journal of Multivariate Analysis, 85(2), 361-374.

Watanabe, J. (2022). Statistics of eigenvalue dispersion indices: Quantifying the magnitude of phenotypic integration. Evolution, 76(1), 4-28.

Examples

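library(decorrelate) # provides eclairs() and the summary functions
library(Rfast) # provides rmvnorm() with a seed argument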
n <- 200 # number of samples
p <- 800 # number of features

# create correlation matrix
Sigma <- matrix(.2, p, p)
diag(Sigma) <- 1

# draw data from correlation matrix Sigma
Y <- rmvnorm(n, rep(0, p), sigma = Sigma, seed = 1)
rownames(Y) <- paste0("sample_", seq(n))
colnames(Y) <- paste0("gene_", seq(p))

# eclairs decomposition
ecl <- eclairs(Y, compute = "cor")

# Average correlation value
averageCorr(ecl)
#> [1] 0.03586329

# Average squared correlation value
averageCorrSq(ecl)
#> [1] 0.001437541

# Sum elements in inverse correlation matrix
# Gives the effective number of independent features
sumInverseCorr(ecl)
#> [1] 39.78366

# Effective variance
effVariance(ecl)
#> [1] 0.9336296