Fast canonical correlation analysis (CCA) that scales to high-dimensional data. Uses covariance shrinkage and algorithmic speed-ups to achieve run time linear in p when p > n.
Details
Returns summary statistics of the CCA fit.
Results from standard CCA are based on the SVD of \(\Sigma_{xx}^{-\frac{1}{2}} \Sigma_{xy} \Sigma_{yy}^{-\frac{1}{2}}\).
Uses eclairs() for empirical Bayes covariance regularization, and applies the speed-up of RCCA (Tuzhilina et al., 2023) to perform CCA on n principal components instead of on p features. Memory usage is \(\mathcal{O}(np)\) instead of \(\mathcal{O}(p^2)\), and computation is \(\mathcal{O}(n^2p)\) instead of \(\mathcal{O}(p^3)\) or \(\mathcal{O}(np^2)\).
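To make the formula above concrete, here is a minimal unregularized sketch (not the package implementation): it plugs plain sample covariances into the SVD from the Details, using an illustrative helper inv_sqrt() for the inverse matrix square root. It is only practical for small p, since it forms the p-by-p covariance matrices that fastcca() avoids.

# Illustrative sketch only: unregularized CCA via the SVD in the Details.
# Uses plain sample covariances; practical only for small p.
X <- scale(as.matrix(LifeCycleSavings[, 2:3]), scale = FALSE)
Y <- scale(as.matrix(LifeCycleSavings[, -(2:3)]), scale = FALSE)
n <- nrow(X)
Sxx <- crossprod(X) / (n - 1)
Syy <- crossprod(Y) / (n - 1)
Sxy <- crossprod(X, Y) / (n - 1)
# inverse matrix square root via eigendecomposition (illustrative helper)
inv_sqrt <- function(S) {
  e <- eigen(S, symmetric = TRUE)
  e$vectors %*% diag(1 / sqrt(e$values)) %*% t(e$vectors)
}
# canonical correlations are the singular values of Sxx^{-1/2} Sxy Syy^{-1/2}
svd(inv_sqrt(Sxx) %*% Sxy %*% inv_sqrt(Syy))$d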
References
Tuzhilina, E., Tozzi, L., & Hastie, T. (2023). Canonical correlation analysis in high dimensions with structured regularization. Statistical Modelling, 23(3), 203-227.
Examples
# population demographics (pop15, pop75)
pop <- LifeCycleSavings[, 2:3]
# savings rate and income variables (sr, dpi, ddpi)
oec <- LifeCycleSavings[, -(2:3)]
decorrelate:::fastcca(pop, oec)
#> Fast regularized canonical correlation analysis
#>
#> Original data rows: 50
#> Original data cols: 2, 3
#> Num components: 2
#> Cor: -0.825 0.365 ...
#> rho.mod: 0.821 0.338 ...
#> Cramer's V: 0.821
#> lambda.x: 0.0123
#> lambda.y: 0.665
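As a rough sanity check (using the pop and oec objects defined above; not part of the package output), the unregularized canonical correlations from stats::cancor() should be close in magnitude to the Cor values reported above; the signs of canonical variates are arbitrary, so only magnitudes are comparable.

# Unregularized comparison; expect magnitudes near the Cor values above
round(cancor(pop, oec)$cor, 3)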