Impute many z-statistics given observed z-statistics and reference panel
Usage
run_imputez(
df,
gds,
window,
flankWidth,
method = c("decorrelate", "Ledoit-Wolf", "OAS", "Touloumis", "Schafer-Strimmer"),
lambda = NULL,
quiet = FALSE,
...
)
Arguments
- df
data.frame with columns ID, z, GWAS_A1, GWAS_A2, CHROM, POS, A1, A2.
- gds
GenomicDataStream of reference panel
- window
size of window in bp
- flankWidth
additional window added to region
- method
method used to estimate shrinkage parameter lambda. Default is "decorrelate"
- lambda
(default: NULL) value used to shrink correlation matrix. Only used if method is "decorrelate"
- quiet
suppress messages
- ...
additional arguments passed to impute_region()
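To illustrate what shrinking a correlation matrix with lambda can look like, here is a minimal base R sketch. It assumes linear shrinkage toward the identity matrix, which is a common form but is not confirmed by this page to be what the package does. The helper shrink_cor() and the toy matrix are hypothetical and not part of the package.

```r
# Hypothetical helper (not part of the package): linear shrinkage of a
# correlation matrix toward the identity. ASSUMPTION: this form is used
# here for illustration only.
shrink_cor <- function(C, lambda) {
  stopifnot(lambda >= 0, lambda <= 1)
  (1 - lambda) * C + lambda * diag(nrow(C))
}

# Toy 3 x 3 correlation (LD) matrix
C <- matrix(c(1.0, 0.9, 0.8,
              0.9, 1.0, 0.9,
              0.8, 0.9, 1.0), nrow = 3, byrow = TRUE)

C_shrunk <- shrink_cor(C, lambda = 0.1)

# Off-diagonal entries are scaled by (1 - lambda); the diagonal stays at 1
C_shrunk[1, 2]
diag(C_shrunk)

# Compare condition numbers before and after shrinkage
kappa(C, exact = TRUE)
kappa(C_shrunk, exact = TRUE)
```

Under this assumption, lambda = 0 leaves the matrix unchanged and lambda = 1 replaces it with the identity. Intermediate values lower the condition number, which makes the matrix inverse used in imputation more stable.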
Value
tibble storing imputed results:
- ID
variant identifier
- z.stat
imputed z-statistic
- sigSq
variance of imputed z-statistic
- r2.pred
metric of accuracy of the imputed z-statistic based on its variance
- lambda
shrinkage parameter
- maf
minor allele frequency in reference panel
- nVariants
number of variants used in imputation
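The relationship between the imputed z-statistic and its variance can be sketched with the Gaussian conditional-mean approach described in Pasaniuc et al. (2014). The base R code below is a simplified illustration, not the package's implementation. The helper impute_one(), its argument names, the output names z_imp and var_imp, and the toy inputs are all hypothetical, and the shrinkage form is an assumption.

```r
# Hypothetical helper (not the package's implementation): impute the
# z-statistic of one untyped variant from the typed variants.
#   C      : correlation (LD) matrix estimated from the reference panel
#   z_obs  : observed z-statistics of the typed variants (all rows except i)
#   i      : index of the untyped variant in C
#   lambda : ASSUMED linear shrinkage toward the identity before inversion
impute_one <- function(C, z_obs, i, lambda = 0.1) {
  C_s <- (1 - lambda) * C + lambda * diag(nrow(C))
  C_tt <- C_s[-i, -i, drop = FALSE]   # typed vs typed
  C_ti <- C_s[-i, i]                  # typed vs untyped
  w <- solve(C_tt, C_ti)              # imputation weights
  c(z_imp   = sum(w * z_obs),         # conditional mean of the untyped z
    var_imp = sum(C_ti * w))          # variance of the imputed statistic
}

# Toy LD matrix for three variants; variant 2 is untyped
C <- matrix(c(1.0, 0.9, 0.8,
              0.9, 1.0, 0.9,
              0.8, 0.9, 1.0), nrow = 3, byrow = TRUE)

out <- impute_one(C, z_obs = c(2.0, 1.5), i = 2, lambda = 0.1)
out
```

In this sketch the variance lies between 0 and 1. Values near 1 mean the typed variants tag the untyped variant well, and values near 0 mean the imputed statistic carries little information, which is why a variance-based quantity can serve as an accuracy metric.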
References
Pasaniuc, B., Zaitlen, N., Shi, H., Bhatia, G., Gusev, A., Pickrell, J., ... & Price, A. L. (2014). Fast and accurate imputation of summary statistics enhances evidence of functional enrichment. Bioinformatics, 30(20), 2906-2914.
Examples
library(GenomicDataStream)
library(mvtnorm)
library(dplyr)
# VCF file for reference
file <- system.file("extdata", "test.vcf.gz", package = "GenomicDataStream")
# initialize data stream
gds <- GenomicDataStream(file, "DS", initialize=TRUE)
# read genotype data from reference
dat <- getNextChunk(gds)
# simulate z-statistics with correlation structure
# from the LD of the reference panel
set.seed(1)
z <- c(rmvnorm(1, rep(0, 10), cor(dat$X)))
# Combine z-statistics with variant ID, position, etc.
df <- dat$info %>%
  mutate(z = z, GWAS_A1 = A1, GWAS_A2 = A2) %>%
  rename(REF_A1 = A1, REF_A2 = A2)
# Given observed z-statistics and a
# GenomicDataStream of the reference panel,
# impute z-statistics for variants that lack them.
# Here, drop variant 2 and then impute its z-statistic
res <- run_imputez(df[-2, ], gds, window = 10000, flankWidth = 1000)
# Results of imputed z-statistics
res
#> # A tibble: 1 × 9
#>   ID          A1    A2         z.stat        se  r2.pred lambda    maf nVariants
#>   <chr>       <chr> <chr>       <dbl>     <dbl>    <dbl>  <dbl>  <dbl>     <int>
#> 1 1:11000:T:C T     C     -0.00000550 0.0000291 8.45e-10  1.000 0.0159         9