Canonical correlation analysis that scales to high-dimensional data. Uses covariance shrinkage and algorithmic speed-ups to run in time linear in p when p > n.

Usage

cca(X, Y, k = min(dim(X), dim(Y)), lambda.x = NULL, lambda.y = NULL)

Arguments

X

first matrix (n x p1)

Y

second matrix (n x p2)

k

number of canonical components to return

lambda.x

optional shrinkage parameter for estimating covariance of X. If NULL, estimate from data.

lambda.y

optional shrinkage parameter for estimating covariance of Y. If NULL, estimate from data.

Value

statistics summarizing CCA

Details

Results from standard CCA are based on the SVD of \(\Sigma_{xx}^{-\frac{1}{2}} \Sigma_{xy} \Sigma_{yy}^{-\frac{1}{2}}\).
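As a point of reference, the standard (unregularized) computation can be sketched in base R. This is a minimal illustration of the SVD formulation above, not the method used by this function: it forms the inverse square roots explicitly, which is only feasible when both covariance matrices are well conditioned. The helper `inv_sqrt` is a hypothetical name introduced for this sketch.

```r
# Standard CCA via the SVD of the whitened cross-covariance (base R only).
X <- as.matrix(LifeCycleSavings[, 2:3])
Y <- as.matrix(LifeCycleSavings[, -(2:3)])

# Hypothetical helper: symmetric inverse square root of a
# positive definite matrix, computed from its eigendecomposition
inv_sqrt <- function(S) {
  e <- eigen(S, symmetric = TRUE)
  e$vectors %*% diag(1 / sqrt(e$values), nrow = length(e$values)) %*% t(e$vectors)
}

Sxx <- cov(X)
Syy <- cov(Y)
Sxy <- cov(X, Y)

# Singular values of Sxx^{-1/2} Sxy Syy^{-1/2} are the canonical correlations
M <- inv_sqrt(Sxx) %*% Sxy %*% inv_sqrt(Syy)
rho <- svd(M)$d

# Cross-check against the reference implementation in the stats package
rho_ref <- cancor(X, Y)$cor
all.equal(rho, rho_ref, tolerance = 1e-6)
```

When p > n the sample covariance is singular, so `inv_sqrt` as written would fail; that is the case the regularized approach is meant to handle.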

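This implementation avoids computing \(\Sigma_{xx}^{-\frac{1}{2}}\) explicitly by using eclairs, and avoids computing cov(X,Y) directly by framing it as a matrix product that can be distributed. It also uses a low-rank SVD. Other regularized CCA methods add lambda to the covariance, as in ridge regression; here the covariance estimate is instead a shrinkage mixture governed by lambda.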
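To illustrate the contrast between ridge-style regularization and a mixture, here is a minimal base R sketch. It assumes the shrinkage target is a scaled identity matrix with scale `nu` equal to the average variance; that choice is an assumption made for illustration and may differ from what eclairs does internally.

```r
# Ridge-style regularization versus a shrinkage mixture (illustrative only).
set.seed(1)
n <- 20
p <- 50                               # p > n, so the sample covariance is singular
X <- matrix(rnorm(n * p), n, p)
S <- cov(X)

lambda <- 0.2
nu <- mean(diag(S))                   # assumed target scale: the average variance

# Ridge: add lambda to the diagonal
S_ridge <- S + lambda * diag(p)

# Mixture: convex combination of the sample covariance and the target nu * I
S_mixture <- (1 - lambda) * S + lambda * nu * diag(p)

# Smallest eigenvalues: S is rank deficient, both regularized versions are not
min(eigen(S, symmetric = TRUE)$values)
min(eigen(S_ridge, symmetric = TRUE)$values)
min(eigen(S_mixture, symmetric = TRUE)$values)

# Change in total variance (trace) relative to S
sum(diag(S_mixture)) - sum(diag(S))   # mixture leaves the trace unchanged
sum(diag(S_ridge)) - sum(diag(S))     # ridge adds lambda * p
```

Under this assumed target, the mixture keeps lambda on a 0 to 1 scale and preserves total variance, whereas the ridge form inflates the diagonal by an amount that depends on the units of the data.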

Examples

pop <- LifeCycleSavings[, 2:3]
oec <- LifeCycleSavings[, -(2:3)]

decorrelate:::cca(pop, oec)
#>        Fast regularized canonical correlation analysis
#> 
#>   Original data rows: 50 
#>   Original data cols: 2, 3
#>   Num components:     2 
#>   Cor:                0.741 -0.34 ...
#>   rho.mod:            13579.5 1.047 ...
#>   Cramer's V:         13579.5 
#>   lambda.x:           0.0123 
#>   lambda.y:           0.665 