Canonical correlation analysis (CCA) that scales to high-dimensional data. Uses covariance shrinkage and algorithmic speed-ups to run in time linear in p when p > n, where p is the number of features and n is the number of samples.
Details
Results from standard CCA are based on the SVD of \(\Sigma_{xx}^{-\frac{1}{2}} \Sigma_{xy} \Sigma_{yy}^{-\frac{1}{2}}\).
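As a point of reference, the standard (unregularized) computation can be sketched in base R. This is a minimal illustration, not the package's implementation: `inv_sqrt` is a hypothetical helper written for this example, and the result is compared against `stats::cancor`.

```r
# Standard CCA via the SVD of Sxx^{-1/2} Sxy Syy^{-1/2} (no shrinkage)
X <- as.matrix(LifeCycleSavings[, 2:3])
Y <- as.matrix(LifeCycleSavings[, -(2:3)])

# Hypothetical helper: inverse symmetric square root of a
# positive definite matrix, via its eigendecomposition
inv_sqrt <- function(S) {
  e <- eigen(S, symmetric = TRUE)
  e$vectors %*% diag(1 / sqrt(e$values), nrow = length(e$values)) %*% t(e$vectors)
}

Sxx <- cov(X)
Syy <- cov(Y)
Sxy <- cov(X, Y)

# Whitened cross-covariance; its singular values are the canonical correlations
K <- inv_sqrt(Sxx) %*% Sxy %*% inv_sqrt(Syy)
rho_svd <- svd(K)$d

# Reference values from base R
rho_ref <- cancor(X, Y)$cor
```

The two vectors should agree up to numerical tolerance. This explicit route forms p x p covariance matrices and their inverse square roots, which is the cost that becomes prohibitive when p is large.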
This implementation avoids computing \(\Sigma_{xx}^{-\frac{1}{2}}\) explicitly by using eclairs, and avoids forming cov(X,Y) directly by framing it as a matrix product that can be distributed. It also uses a low-rank SVD. Other regularized CCA methods add lambda to the covariance, as in ridge regression. Here the regularized covariance is instead a mixture.
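To illustrate why an explicit \(\Sigma_{xx}^{-\frac{1}{2}}\) is not needed, the sketch below applies the inverse square root of a shrunken covariance to data using only a thin SVD of X. It rests on assumptions that are not stated on this page: the mixture is taken to be (1 - lambda) * S + lambda * nu * I, and the values of `lambda` and `nu` and the helper `whiten_lowrank` are hypothetical. The package's internals may differ.

```r
set.seed(1)
n <- 20
p <- 100
X <- matrix(rnorm(n * p), n, p)
X <- sweep(X, 2, colMeans(X))   # center columns; p > n

lambda <- 0.3                   # assumed shrinkage weight
nu <- 1                         # assumed target variance (target = nu * I)

# Thin SVD of X costs O(n^2 p); no p x p matrix is formed
dcmp <- svd(X)
keep <- dcmp$d > 1e-8 * dcmp$d[1]
V <- dcmp$v[, keep, drop = FALSE]

# Eigenvalues of (1 - lambda) * S + lambda * nu * I within the span of V
ev <- (1 - lambda) * dcmp$d[keep]^2 / (n - 1) + lambda * nu

# Compute Z %*% Sigma^{-1/2} using only n x p and p x k products
whiten_lowrank <- function(Z) {
  a <- 1 / sqrt(ev) - 1 / sqrt(lambda * nu)
  (Z %*% V) %*% (t(V) * a) + Z / sqrt(lambda * nu)
}
W_fast <- whiten_lowrank(X)

# Explicit p x p reference, for checking only
Sig <- (1 - lambda) * cov(X) + lambda * nu * diag(p)
e <- eigen(Sig, symmetric = TRUE)
W_ref <- X %*% (e$vectors %*% diag(1 / sqrt(e$values)) %*% t(e$vectors))

max_diff <- max(abs(W_fast - W_ref))
```

Under these assumptions `max_diff` should be at the level of floating-point error. For contrast, a ridge-style estimate would be S + lambda * I, which inflates every eigenvalue by lambda rather than shrinking toward the target.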
Examples
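# Split the built-in LifeCycleSavings data into two variable sets
pop <- LifeCycleSavings[, 2:3]    # pop15, pop75
oec <- LifeCycleSavings[, -(2:3)] # sr, dpi, ddpi
# Run regularized CCA between the two sets
decorrelate:::cca(pop, oec)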
#> Fast regularized canonical correlation analysis
#>
#> Original data rows: 50
#> Original data cols: 2, 3
#> Num components: 2
#> Cor: 0.741 -0.34 ...
#> rho.mod: 13579.5 1.047 ...
#> Cramer's V: 13579.5
#> lambda.x: 0.0123
#> lambda.y: 0.665