Given the eclairs decomp of C, compute the eclairs decomp of C^2
Arguments
- ecl
estimate of correlation matrix from
eclairs()
storing \(U\), \(d_1^2\), \(\lambda\) and \(\nu\)- rank1
use the first 'rank' singular vectors from the SVD. Using increasing rank1 will increase the accuracy of the estimation. But note that the computationaly complexity is O(P choose(rank, 2)), where P is the number of features in the dataset
- rank2
rank of
eclairs()
decomposition returned- varianceFraction
fraction of variance to retain after truncating trailing eigen values
Details
Consider a data matrix \(X_{N x P}\) of \(P\) features and \(N\) samples where \(N << P\). Let the columns of X be scaled so that \(C_{P x P} = XX^T\). C is often too big to compute directly since it is O(P^2) and O(P^3) to invert. But we can compute the SVD of X in O(PN^2). The goal is to compute the SVD of the matrix C^2, given only the SVD of C in less than \(O(P^2)\) time. Here we compute this SVD of C^2 in \(O(PN^4)\) time, which is tractible for small N. Moreover, if we use an SVD X = UDV^T with of rank R, we can approximate the SVD of C^2 in O(PR^4) using only D and V
Examples
# Compute correlations directly and using eclairs decomp
n <- 600 # number of samples
p <- 100 # number of features
# create correlation matrix
Sigma <- autocorr.mat(p, .9)
# draw data from correlation matrix Sigma
Y <- Rfast::rmvnorm(n, rep(0, p), sigma = Sigma, seed = 1)
rownames(Y) <- paste0("sample_", seq(n))
colnames(Y) <- paste0("gene_", seq(p))
# correlation computed directly
C <- cor(Y)
# correlation from eclairs decomposition
ecl <- eclairs(Y, compute = "cor")
C.eclairs <- getCor(ecl, lambda = 0)
all.equal(C, C.eclairs)
#> [1] TRUE
# Correlation of Y^2
#-------------------
# exact quadratic way
C <- 2 * cor(Y)^2
# faster low rank
ecl2 <- eclairs_sq(ecl)
C.eclairs <- 2 * getCor(ecl2, lambda = 0)
all.equal(C.eclairs, C)
#> [1] TRUE