Impute z-score based on correlations

Impute the z-score for a missing test based on the z-scores for other tests and the genotype matrix

Usage

imputezDecorr(z, X, i, k = min(nrow(X), ncol(X) - length(i)), lambda = NULL)

Arguments

z: vector of observed z-scores
X: matrix of genotype data from a reference panel
i: index of z-scores to impute
k: rank of SVD
lambda: (default: NULL) value used to shrink correlation matrix

Value

data.frame storing:

ID: variant identifier
z.stat: imputed z-statistic
se: standard error of imputed z-statistic
r2.pred: metric of accuracy of the imputed z-statistic based on its variance
lambda: shrinkage parameter
averageCorrSq: average correlation squared in X

Details

Uses implicit covariance and empirical Bayes shrinkage of the eigenvalues to accelerate computations. imputez() is cubic in the number of features, p, whereas the cost of imputezDecorr() is the minimum of O(n p^2) and O(n^2 p), where n is the number of rows of X. For a large number of features, this can be a dramatic speedup.
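
The idea of an implicit covariance can be illustrated with a minimal base-R sketch. This is not the package's implementation: the simulated data, the fixed shrinkage value lambda = 0.1, and the shrinkage form (1 - lambda) * C + lambda * I are assumptions made for illustration, and a fixed lambda stands in for the empirical Bayes estimate. The sketch imputes one z-score using the conditional mean C[i, o] %*% solve(C[o, o]) %*% z[o], first by forming the correlation matrix explicitly and then using only the SVD of the scaled genotype matrix.

```r
# Illustrative sketch only; all names and values here are hypothetical
set.seed(1)
n <- 50      # samples in the reference panel
p <- 8       # features (variants)
X <- matrix(rnorm(n * p), n, p)
X[, 2] <- X[, 1] + 0.5 * rnorm(n)   # induce correlation between columns 1 and 2
z <- rnorm(p)                       # stand-in z-scores
i <- 2                              # index of the z-score to impute
lambda <- 0.1                       # assumed fixed shrinkage value

# Explicit route: form the p x p shrunk correlation matrix and solve
C <- (1 - lambda) * cor(X) + lambda * diag(p)
z_explicit <- drop(C[i, -i, drop = FALSE] %*% solve(C[-i, -i], z[-i]))

# Implicit route: never form the correlation matrix
# Scale so that crossprod(Xs) equals cor(X)
Xs <- scale(X) / sqrt(n - 1)
Xo <- Xs[, -i, drop = FALSE]        # columns with observed z-scores
dcmp <- svd(Xo)                     # Xo = U D V^T
V <- dcmp$v

# Eigenvalues of the shrunk correlation matrix among observed features
ev <- (1 - lambda) * dcmp$d^2 + lambda

# Apply the inverse to z[-i]: the component inside the span of V is
# rescaled by 1 / ev, and the component outside it by 1 / lambda
Vtz <- crossprod(V, z[-i])
w <- V %*% (Vtz / ev) + (z[-i] - V %*% Vtz) / lambda

# Cross-correlation term computed as matrix-vector products
z_implicit <- (1 - lambda) * drop(crossprod(Xs[, i, drop = FALSE], Xo %*% w))

all.equal(z_explicit, z_implicit)
```

The implicit route only needs the SVD of an n x (p - 1) matrix plus matrix-vector products, which is where the min(O(n p^2), O(n^2 p)) cost comes from, while the explicit route pays for solving a dense system of size p - 1.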

References

Pasaniuc, B., Zaitlen, N., Shi, H., Bhatia, G., Gusev, A., Pickrell, J., ... & Price, A. L. (2014). Fast and accurate imputation of summary statistics enhances evidence of functional enrichment. Bioinformatics, 30(20), 2906-2914.

Examples

library(GenomicDataStream)
library(mvtnorm)

# VCF file for reference
file <- system.file("extdata", "test.vcf.gz", package = "GenomicDataStream")

# initialize data stream
gds <- GenomicDataStream(file, "DS", initialize=TRUE)

# read genotype data from reference
dat <- getNextChunk(gds)

# simulate z-statistics with correlation structure
# from the LD of the reference panel
C <- cor(dat$X)
set.seed(1)
z <- c(rmvnorm(1, rep(0, ncol(dat$X)), C))
names(z) <- colnames(dat$X)

# Impute z-statistics for variants 2 and 3
# using the observed z-statistics for the other variants
# and the LD from the reference panel.
# Pass dat$X directly instead of computing cor(dat$X)
imputezDecorr(z, dat$X, 2:3)
#>            ID        z.stat           se      r2.pred lambda averageCorrSq
#> 1 1:11000:T:C -1.689124e-05 2.890456e-05 8.354737e-10 0.9999  2.012952e-10
#> 2 1:12000:T:C -2.216518e-05 3.287110e-05 1.080509e-09 0.9999  2.012952e-10