Impute z-scores based on correlations — imputez

Impute the z-score for a missing test from the z-scores of other tests and the correlation matrix between z-scores.

Usage

imputez(z, Sigma, i, lambda = 0.1, useginv = FALSE)

Arguments

z: vector of observed z-scores
Sigma: correlation matrix between z-scores
i: indices of the z-scores to impute
lambda: value used to shrink the correlation matrix
useginv: if TRUE, use the pseudoinverse
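
As a rough illustration of why lambda and useginv matter: a correlation matrix estimated from a reference panel can be singular, for example when two variants are in perfect LD, and then it cannot be inverted directly. The base-R sketch below shows two common remedies on a toy matrix. The shrinkage form (1 - lambda) * Sigma + lambda * I and the helper pinv_sketch are assumptions made for illustration; they are not taken from the imputez source, which may shrink or invert differently.

```r
# Toy correlation matrix: tests 1 and 2 are perfectly correlated,
# so the matrix has rank 2 and no ordinary inverse.
Sigma <- matrix(c(1.0, 1.0, 0.3,
                  1.0, 1.0, 0.3,
                  0.3, 0.3, 1.0), nrow = 3, byrow = TRUE)
rank_raw <- qr(Sigma)$rank

# Remedy 1 (assumed form): shrink toward the identity matrix.
# Every eigenvalue becomes at least lambda, so the result is invertible.
lambda <- 0.1
Sigma_shrunk <- (1 - lambda) * Sigma + lambda * diag(nrow(Sigma))
rank_shrunk <- qr(Sigma_shrunk)$rank
Sigma_shrunk_inv <- solve(Sigma_shrunk)

# Remedy 2: Moore-Penrose pseudoinverse of a symmetric matrix,
# built from the eigenvectors with non-negligible eigenvalues.
# pinv_sketch is a hypothetical helper, not part of any package.
pinv_sketch <- function(S, tol = 1e-8) {
  e <- eigen(S, symmetric = TRUE)
  keep <- e$values > tol * max(e$values)
  V <- e$vectors[, keep, drop = FALSE]
  # Row r of t(V) is divided by the r-th retained eigenvalue
  V %*% (t(V) / e$values[keep])
}
Sigma_pinv <- pinv_sketch(Sigma)
```

Shrinkage changes every entry of the matrix slightly, while the pseudoinverse leaves the matrix untouched and inverts only along the directions it actually spans.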

Value

data.frame storing:

ID: variant identifier
z.stat: imputed z-statistic
se: standard error of imputed z-statistic
r2.pred: metric of accuracy of the imputed z-statistic based on its variance
lambda: shrinkage parameter

Details

Implements the method of Pasaniuc et al. (2014).
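
The approach of Pasaniuc et al. (2014) models the z-scores as jointly Gaussian with covariance given by their correlation matrix, so a missing z-score can be predicted by its conditional mean given the observed ones. A minimal base-R sketch of that conditional-Gaussian step follows. The function name impute_z_sketch, its output columns, and the shrinkage form are hypothetical and are not the package's implementation; imputez() additionally reports se and r2.pred, which this sketch does not attempt to reproduce.

```r
# Hypothetical sketch of conditional-Gaussian imputation.
# z:      vector of z-scores (entries at positions i are ignored)
# Sigma:  correlation matrix between z-scores
# i:      indices to impute
# lambda: shrinkage toward the identity (assumed form)
impute_z_sketch <- function(z, Sigma, i, lambda = 0.1) {
  Sigma_s <- (1 - lambda) * Sigma + lambda * diag(nrow(Sigma))
  obs <- setdiff(seq_along(z), i)

  S_oo <- Sigma_s[obs, obs, drop = FALSE]  # observed vs observed
  S_oi <- Sigma_s[obs, i, drop = FALSE]    # observed vs imputed
  S_ii <- Sigma_s[i, i, drop = FALSE]      # imputed vs imputed

  # A = S_oo^{-1} S_oi, so t(A) holds the imputation weights
  A <- solve(S_oo, S_oi)

  data.frame(
    index = i,
    # conditional mean: S_io S_oo^{-1} z_obs
    z.cond.mean = drop(crossprod(A, z[obs])),
    # conditional variance: diag(S_ii - S_io S_oo^{-1} S_oi)
    cond.var = diag(S_ii - crossprod(S_oi, A))
  )
}

# Toy example: impute test 2 from tests 1 and 3, without shrinkage.
Sigma <- matrix(c(1.00, 0.50, 0.25,
                  0.50, 1.00, 0.50,
                  0.25, 0.50, 1.00), nrow = 3, byrow = TRUE)
z <- c(1, 0, 2)
out <- impute_z_sketch(z, Sigma, i = 2, lambda = 0)
# With lambda = 0 the weights on tests 1 and 3 are both 0.4, so the
# conditional mean is 0.4 * (1 + 2) = 1.2 and the conditional variance is 0.6.
```

Solving the linear system with solve(S_oo, S_oi) avoids forming the explicit inverse, which is generally the more numerically stable choice.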

References

Pasaniuc, B., Zaitlen, N., Shi, H., Bhatia, G., Gusev, A., Pickrell, J., ... & Price, A. L. (2014). Fast and accurate imputation of summary statistics enhances evidence of functional enrichment. Bioinformatics, 30(20), 2906-2914.

Examples

library(GenomicDataStream)
library(mvtnorm)

# VCF file for reference
file <- system.file("extdata", "test.vcf.gz", package = "GenomicDataStream")

# initialize data stream
gds <- GenomicDataStream(file, "DS", initialize = TRUE)

# read genotype data from reference
dat <- getNextChunk(gds)

# simulate z-statistics with correlation structure
# from the LD of the reference panel
C <- cor(dat$X)
set.seed(1)
z <- c(rmvnorm(1, rep(0, 10), C))
names(z) <- colnames(dat$X)

# Impute z-statistics for variants 2 and 3
# from the observed z-statistics of the other variants
# and the correlation matrix from the reference panel
imputez(z, C, 2:3)
#>            ID     z.stat        se    r2.pred lambda
#> 1 1:11000:T:C -0.2455986 0.2965194 0.08792378    0.1
#> 2 1:12000:T:C -0.2762561 0.3640927 0.13256353    0.1