Canonical correlation analysis — cca • decorrelate

Canonical correlation analysis (CCA) that scales to high-dimensional data. Uses covariance shrinkage and algorithmic speedups to run in time linear in p when p > n.

Usage

cca(X, Y, k = min(dim(X), dim(Y)), lambda.x = NULL, lambda.y = NULL)

Arguments

X: first matrix (n x p1)
Y: second matrix (n x p2)
k: number of canonical components to return
lambda.x: optional shrinkage parameter for estimating covariance of X. If NULL, estimate from data.
lambda.y: optional shrinkage parameter for estimating covariance of Y. If NULL, estimate from data.

Value

statistics summarizing CCA

Details

Results from standard CCA are based on the SVD of \(\Sigma_{xx}^{-\frac{1}{2}} \Sigma_{xy} \Sigma_{yy}^{-\frac{1}{2}}\).
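As a point of reference, the standard (unregularized) computation can be sketched in base R. This is only an illustration of the SVD formulation above, not the code used by cca(): it forms the inverse square roots explicitly, so it assumes n is larger than p1 and p2 and that both covariance matrices are positive definite. The helper inv_sqrt is a hypothetical name introduced here.

```r
# Classical CCA via SVD of the whitened cross-covariance (base R only)
X <- as.matrix(LifeCycleSavings[, 2:3])
Y <- as.matrix(LifeCycleSavings[, -(2:3)])

# Hypothetical helper: inverse symmetric square root of a
# positive definite matrix, computed from its eigendecomposition
inv_sqrt <- function(S) {
  e <- eigen(S, symmetric = TRUE)
  e$vectors %*% diag(1 / sqrt(e$values), nrow = length(e$values)) %*% t(e$vectors)
}

Sxx <- cov(X)     # p1 x p1
Syy <- cov(Y)     # p2 x p2
Sxy <- cov(X, Y)  # p1 x p2

# Singular values of Sxx^{-1/2} Sxy Syy^{-1/2} are the canonical correlations
K <- inv_sqrt(Sxx) %*% Sxy %*% inv_sqrt(Syy)
rho <- svd(K)$d

# Cross-check against stats::cancor, which uses a QR-based algorithm
rho_ref <- cancor(X, Y)$cor
all.equal(rho, rho_ref)
```

The eigendecomposition of each covariance matrix costs time cubic in the number of features, which is the step that becomes prohibitive when p is large.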

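This implementation avoids computing \(\Sigma_{xx}^{-\frac{1}{2}}\) explicitly by using eclairs, avoids forming cov(X,Y) by framing it as a matrix product that can be distributed, and uses a low-rank SVD. Other regularized CCA methods add lambda to the covariance, as in ridge regression. Here the shrinkage estimate is instead a mixture, weighted by lambda, of the sample covariance and a shrinkage target.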
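The difference between ridge-style regularization and a mixture can be sketched in a few lines of base R. The mixture form used here, (1 - lambda) * S + lambda * nu * I with nu set to the mean of the diagonal of S, is an assumption made for illustration. The source does not spell out the exact estimator, and the variable names are hypothetical.

```r
# Ridge-style vs. mixture-style shrinkage of a sample covariance matrix
Y <- as.matrix(LifeCycleSavings[, -(2:3)])
S <- cov(Y)
p <- ncol(S)
lambda <- 0.3

# Ridge-style: add lambda to the diagonal
S_ridge <- S + lambda * diag(p)

# Mixture-style (assumed form): convex combination of S and a scaled identity
nu <- mean(diag(S))  # assumed scale of the identity target
S_mix <- (1 - lambda) * S + lambda * nu * diag(p)

# The mixture preserves total variance; ridge inflates it by lambda * p
c(sample = sum(diag(S)), ridge = sum(diag(S_ridge)), mixture = sum(diag(S_mix)))

# Eigenvalues of the mixture are bounded below by lambda * nu
min(eigen(S_mix, symmetric = TRUE)$values) >= lambda * nu
```

Under this assumed form, lambda lies in [0, 1] and interpolates between the sample covariance (lambda = 0) and the target (lambda = 1), whereas a ridge penalty has no upper bound and its effect depends on the scale of the data.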

Examples

pop <- LifeCycleSavings[, 2:3]
oec <- LifeCycleSavings[, -(2:3)]

decorrelate:::cca(pop, oec)
#>        Fast regularized canonical correlation analysis
#> 
#>   Original data rows: 50 
#>   Original data cols: 2, 3
#>   Num components:     2 
#>   Cor:                0.741 -0.34 ...
#>   rho.mod:            13579.5 1.047 ...
#>   Cramer's V:         13579.5 
#>   lambda.x:           0.0123 
#>   lambda.y:           0.665 