Impute the z-score for a missing test from the z-scores of other tests and the genotype matrix
Value
data.frame storing:
- ID: variant identifier
- z.stat: imputed z-statistic
- se: standard error of imputed z-statistic
- r2.pred: metric of accuracy of the imputed z-statistic based on its variance
- lambda: shrinkage parameter
- averageCorrSq: average correlation squared in X
Details
Uses implicit covariance and empirical Bayes shrinkage of the eigenvalues to accelerate computations. imputez() is cubic in the number of features, p, whereas the cost of imputezDecorr() is the minimum of O(n p^2) and O(n^2 p), where n is the number of samples in the genotype matrix. For a large number of features, this can be a dramatic speedup.
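The idea of working with an implicit covariance can be illustrated with a minimal sketch. It is not the package's implementation: it assumes a fixed shrinkage value lambda (the function estimates it by empirical Bayes) and a simple shrinkage form (1 - lambda) * C + lambda * I, and all variable names are hypothetical. It shows that the imputed z-statistics from the explicit p x p correlation matrix can be reproduced from a thin SVD of the standardized genotype matrix, without ever forming cor(X).

```r
# Illustrative sketch only: fixed lambda and the shrinkage form are assumptions
set.seed(1)
n <- 20                        # samples
p <- 30                        # features (more features than samples)
X <- matrix(rnorm(n * p), n, p)
z <- rnorm(p)                  # z-statistics; entries in `i` are treated as missing
i <- 2:3                       # features to impute
obs <- setdiff(seq_len(p), i)  # observed features
lambda <- 0.1                  # fixed shrinkage value (assumed, not estimated)

# Explicit route: build the p x p correlation matrix, shrink, then solve
C <- cor(X)
C_shrunk <- (1 - lambda) * C + lambda * diag(p)
w <- solve(C_shrunk[obs, obs], C_shrunk[obs, i])
z_explicit <- crossprod(w, z[obs])

# Implicit route: standardize so that crossprod(Xs) equals cor(X)
Xs <- scale(X) / sqrt(n - 1)
Xo <- Xs[, obs, drop = FALSE]
Xi <- Xs[, i, drop = FALSE]

# Thin SVD of the observed columns: Xo = U diag(d) V^T
s <- svd(Xo)
zo <- z[obs]
proj <- crossprod(s$v, zo)     # coordinates of zo in the span of V

# Apply the inverse of (1 - lambda) * V diag(d^2) V^T + lambda * I to zo:
# shrunken eigenvalues inside span(V), plain 1 / lambda outside it
a <- s$v %*% (proj / ((1 - lambda) * s$d^2 + lambda)) +
  (zo - s$v %*% proj) / lambda

# Cross-correlation between imputed and observed features, applied implicitly
z_implicit <- (1 - lambda) * crossprod(Xi, Xo %*% a)

# Both routes agree up to numerical error
all.equal(c(z_explicit), c(z_implicit))
```

The explicit route solves a linear system in the p x p matrix, while the implicit route is dominated by the thin SVD of an n x p matrix, which is where the min(O(n p^2), O(n^2 p)) cost comes from.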
References
Pasaniuc, B., Zaitlen, N., Shi, H., Bhatia, G., Gusev, A., Pickrell, J., ... & Price, A. L. (2014). Fast and accurate imputation of summary statistics enhances evidence of functional enrichment. Bioinformatics, 30(20), 2906-2914.
Examples
library(GenomicDataStream)
library(mvtnorm)
# VCF file for reference
file <- system.file("extdata", "test.vcf.gz", package = "GenomicDataStream")
# initialize data stream
gds <- GenomicDataStream(file, "DS", initialize=TRUE)
# read genotype data from reference
dat <- getNextChunk(gds)
# simulate z-statistics with correlation structure
# from the LD of the reference panel
C <- cor(dat$X)
set.seed(1)
z <- c(rmvnorm(1, rep(0, ncol(C)), C))
names(z) <- colnames(dat$X)
# Impute z-statistics for variants 2 and 3
# from the observed z-statistics of the other variants,
# using the LD structure of the reference panel
# Pass dat$X directly instead of computing cor(dat$X)
imputezDecorr(z, dat$X, 2:3)
#>            ID        z.stat           se      r2.pred lambda averageCorrSq
#> 1 1:11000:T:C -1.689124e-05 2.890456e-05 8.354737e-10 0.9999  2.012952e-10
#> 2 1:12000:T:C -2.216518e-05 3.287110e-05 1.080509e-09 0.9999  2.012952e-10