Fast canonical correlation analysis (CCA) that scales to high-dimensional data. Uses covariance shrinkage and algorithmic speed-ups to achieve run time linear in p when p > n.
Details
Returns summary statistics of the CCA fit.
Results from standard CCA are based on the SVD of \(\Sigma_{xx}^{-\frac{1}{2}} \Sigma_{xy} \Sigma_{yy}^{-\frac{1}{2}}\).
Uses eclairs() for empirical Bayes covariance regularization, and applies the speed-up of RCCA (Tuzhilina et al., 2023) to perform CCA on n principal components instead of on p features. Memory usage is \(\mathcal{O}(np)\) instead of \(\mathcal{O}(p^2)\), and computation is \(\mathcal{O}(n^2p)\) instead of \(\mathcal{O}(p^3)\) or \(\mathcal{O}(np^2)\).
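To make the formula above concrete, here is a minimal unregularized sketch (not the package implementation): it plugs plain sample covariances into the SVD from the Details, using an illustrative helper inv_sqrt() for the inverse matrix square root. It is only practical for small p, since it forms the p-by-p covariance matrices that fastcca() avoids.

# Illustrative sketch only: unregularized CCA via the SVD in the Details.
# Uses plain sample covariances; practical only for small p.
X <- scale(as.matrix(LifeCycleSavings[, 2:3]), scale = FALSE)
Y <- scale(as.matrix(LifeCycleSavings[, -(2:3)]), scale = FALSE)
n <- nrow(X)
Sxx <- crossprod(X) / (n - 1)
Syy <- crossprod(Y) / (n - 1)
Sxy <- crossprod(X, Y) / (n - 1)
# inverse matrix square root via eigendecomposition (illustrative helper)
inv_sqrt <- function(S) {
  e <- eigen(S, symmetric = TRUE)
  e$vectors %*% diag(1 / sqrt(e$values)) %*% t(e$vectors)
}
# canonical correlations are the singular values of Sxx^{-1/2} Sxy Syy^{-1/2}
svd(inv_sqrt(Sxx) %*% Sxy %*% inv_sqrt(Syy))$d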
References
Tuzhilina, E., Tozzi, L., & Hastie, T. (2023). Canonical correlation analysis in high dimensions with structured regularization. Statistical Modelling, 23(3), 203-227.
Examples
# population demographics (pop15, pop75)
pop <- LifeCycleSavings[, 2:3]
# savings rate and income variables (sr, dpi, ddpi)
oec <- LifeCycleSavings[, -(2:3)]
decorrelate:::fastcca(pop, oec)
#> Fast regularized canonical correlation analysis
#>
#> Original data rows: 50
#> Original data cols: 2, 3
#> Num components: 2
#> Cor: -0.825 0.365 ...
#> rho.mod: 0.821 0.338 ...
#> Cramer's V: 0.821
#> lambda.x: 0.0123
#> lambda.y: 0.665
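As a rough sanity check (using the pop and oec objects defined above; not part of the package output), the unregularized canonical correlations from stats::cancor() should be close in magnitude to the Cor values reported above; the signs of canonical variates are arbitrary, so only magnitudes are comparable.

# Unregularized comparison; expect magnitudes near the Cor values above
round(cancor(pop, oec)$cor, 3)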