Impute the z-score of a missing test from the z-scores of other tests and the correlation matrix between the z-scores
Value
data.frame storing:
- ID: variant identifier
- z.stat: imputed z-statistic
- se: standard error of imputed z-statistic
- r2.pred: metric of accuracy of the imputed z-statistic based on its variance
- lambda: shrinkage parameter
References
Pasaniuc, B., Zaitlen, N., Shi, H., Bhatia, G., Gusev, A., Pickrell, J., ... & Price, A. L. (2014). Fast and accurate imputation of summary statistics enhances evidence of functional enrichment. Bioinformatics, 30(20), 2906-2914.
Examples
library(GenomicDataStream)
library(mvtnorm)
# VCF file for reference
file <- system.file("extdata", "test.vcf.gz", package = "GenomicDataStream")
# initialize data stream
gds <- GenomicDataStream(file, "DS", initialize=TRUE)
# read genotype data from reference
dat <- getNextChunk(gds)
# simulate z-statistics with correlation structure
# from the LD of the reference panel
C <- cor(dat$X)
set.seed(1)
z <- c(rmvnorm(1, rep(0, ncol(C)), C))
names(z) <- colnames(dat$X)
# Impute z-statistics for variants 2 and 3
# using the z-statistics observed at the other variants
# and the LD matrix from the reference panel
imputez(z, C, 2:3)
#> ID z.stat se r2.pred lambda
#> 1 1:11000:T:C -0.2455986 0.2965194 0.08792378 0.1
#> 2 1:12000:T:C -0.2762561 0.3640927 0.13256353 0.1
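
The conditional-Gaussian imputation described by Pasaniuc et al. (2014) can be sketched in base R. This is a minimal illustration under stated assumptions, not the implementation behind imputez(): the helper name impute_z_sketch and the simulated data are hypothetical, and the default lambda = 0.1 is chosen only to match the value printed above. The sketch assumes the weights are W = Sigma_it (Sigma_tt + lambda * I)^-1, the imputed statistic is W z_t, and the accuracy metric is the variance W Sigma_tt W'.

```r
# Hypothetical helper for illustration; not part of any package API.
# z:      named vector of z-statistics (values at positions `i` are ignored)
# Sigma:  correlation matrix between the z-statistics
# i:      indices of the tests to impute
# lambda: ridge term added to the diagonal of the observed block
impute_z_sketch <- function(z, Sigma, i, lambda = 0.1) {
  # indices of the observed tests
  obs <- setdiff(seq_along(z), i)

  # blocks of the correlation matrix
  Sigma_tt <- Sigma[obs, obs, drop = FALSE]
  Sigma_it <- Sigma[i, obs, drop = FALSE]

  # regularized observed block (assumed form: Sigma_tt + lambda * I)
  A <- Sigma_tt + lambda * diag(length(obs))

  # weights W = Sigma_it %*% solve(A), computed without forming the inverse
  W <- t(solve(A, t(Sigma_it)))

  # imputed z-statistics and their variance under the model
  z_hat <- drop(W %*% z[obs])
  v <- diag(W %*% Sigma_tt %*% t(W))

  data.frame(ID = names(z)[i], z.stat = z_hat, r2.pred = v, lambda = lambda)
}

# simulated correlation structure (no reference panel needed)
set.seed(1)
X <- matrix(rnorm(200 * 5), ncol = 5)
X[, 2] <- X[, 1] + rnorm(200, sd = 0.5) # correlate columns 1 and 2
Sigma <- cor(X)

# draw z ~ N(0, Sigma) using the Cholesky factor
z <- drop(t(chol(Sigma)) %*% rnorm(5))
names(z) <- paste0("v", 1:5)

# impute test 2 from tests 1, 3, 4 and 5
res <- impute_z_sketch(z, Sigma, 2)
res
```

Setting lambda = 0 in this sketch gives the unregularized conditional mean, which requires the correlation matrix among the observed tests to be invertible.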