Impute many z-statistics given observed z-statistics and reference panel

Usage

run_imputez(
  df,
  gds,
  window,
  flankWidth,
  method = c("decorrelate", "Ledoit-Wolf", "OAS", "Touloumis", "Schafer-Strimmer"),
  lambda = NULL,
  quiet = FALSE,
  ...
)

Arguments

df

data.frame with columns ID, z, GWAS_A1, GWAS_A2, CHROM, POS, A1, A2.

gds

GenomicDataStream of reference panel

window

size of window in bp

flankWidth

width of the flanking region added to each window

method

method used to estimate the shrinkage parameter lambda. Default is "decorrelate"

lambda

(default: NULL) value used to shrink correlation matrix. Only used if method is "decorrelate"

quiet

suppress messages

...

additional arguments passed to impute_region()

Value

tibble storing imputed results:

ID

variant identifier

z.stat

imputed z-statistic

sigSq

variance of imputed z-statistic

r2.pred

metric of accuracy of the imputed z-statistic based on its variance

lambda

shrinkage parameter

maf

minor allele frequency in reference panel

nVariants

number of variants used in imputation
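
As a usage sketch, imputed variants can be filtered on r2.pred before downstream analysis. The data frame below is a toy stand-in for the returned tibble: the values and the second and third IDs are made up for illustration, and the cutoff r2_cutoff is a hypothetical choice, not a package default.

```r
# Toy stand-in for the output of run_imputez(); values are illustrative only
res_toy <- data.frame(
  ID = c("1:11000:T:C", "1:12000:A:G", "1:13000:G:T"),
  z.stat = c(-0.2, 2.9, 4.1),
  r2.pred = c(0.05, 0.72, 0.91)
)

# Hypothetical accuracy cutoff; choose a value suited to the analysis
r2_cutoff <- 0.6

# Keep only variants whose predicted accuracy meets the cutoff
res_keep <- res_toy[res_toy$r2.pred >= r2_cutoff, ]
```

Base R subsetting is used here so the sketch has no dependencies; the same filter could be written with dplyr::filter() on the tibble.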

Details

Implements the method of Pasaniuc et al. (2014).
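
The core idea of this class of methods is Gaussian conditional imputation: the z-statistic of an untyped variant is predicted from observed z-statistics using the LD (correlation) matrix of the reference panel. The following is a minimal base R sketch of that idea, not the package's implementation. The function impute_one() and its arguments are hypothetical, and shrinking the LD matrix toward the identity with weight lambda is an assumed form of shrinkage. How the values map onto the z.stat and r2.pred columns returned by run_imputez() is likewise an assumption.

```r
# Illustrative sketch of Gaussian conditional imputation (hypothetical helper,
# not part of the package API)
impute_one <- function(z_obs, Sigma, idx_obs, idx_miss, lambda = 0.1) {
  # Assumed shrinkage: blend the LD matrix with the identity matrix
  Sigma_s <- (1 - lambda) * Sigma + lambda * diag(nrow(Sigma))

  # LD among observed variants, and between missing and observed variants
  S_tt <- Sigma_s[idx_obs, idx_obs, drop = FALSE]
  S_it <- Sigma_s[idx_miss, idx_obs, drop = FALSE]

  # Weights W = S_it %*% solve(S_tt), computed without forming the inverse
  W <- t(solve(S_tt, t(S_it)))

  # Conditional mean of the missing z-statistics given the observed ones
  z_hat <- drop(W %*% z_obs)

  # Variance of the prediction, usable as an accuracy metric
  r2_pred <- rowSums(W * S_it)

  list(z.stat = z_hat, r2.pred = r2_pred)
}

# Toy LD matrix with correlation decaying with distance between 5 variants
Sigma <- 0.8^abs(outer(1:5, 1:5, "-"))

# Observed z-statistics for variants 1, 2, 4, 5; variant 3 is imputed
out <- impute_one(c(1.2, 2.0, 1.5, 0.3), Sigma,
                  idx_obs = c(1, 2, 4, 5), idx_miss = 3)
```

Larger values of lambda pull the weights toward zero, which stabilizes the solve when the reference panel is small relative to the number of variants in the window.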

References

Pasaniuc, B., Zaitlen, N., Shi, H., Bhatia, G., Gusev, A., Pickrell, J., ... & Price, A. L. (2014). Fast and accurate imputation of summary statistics enhances evidence of functional enrichment. Bioinformatics, 30(20), 2906-2914.

Examples

library(GenomicDataStream)
library(mvtnorm)
library(dplyr)

# VCF file for reference
file <- system.file("extdata", "test.vcf.gz", package = "GenomicDataStream")

# initialize data stream
gds <- GenomicDataStream(file, "DS", initialize=TRUE)

# read genotype data from reference
dat <- getNextChunk(gds)

# simulate z-statistics with correlation structure
# from the LD of the reference panel
set.seed(1)
z <- c(rmvnorm(1, rep(0, 10), cor(dat$X)))

# Combine z-statistics with variant ID, position, etc
df <- dat$info %>%
    mutate(z = z, GWAS_A1 = A1, GWAS_A2 = A2) %>%
    rename(REF_A1 = A1, REF_A2 = A2)

# Given observed z-statistics and a
# GenomicDataStream of the reference panel,
# impute z-statistics for variants missing z-statistics.
# Here, drop variant 2 and then impute its z-statistic
res <- run_imputez(df[-2,], gds, 10000, 1000)

# Results of imputed z-statistics
res
#> # A tibble: 1 × 9
#>   ID          A1    A2         z.stat        se  r2.pred lambda    maf nVariants
#>   <chr>       <chr> <chr>       <dbl>     <dbl>    <dbl>  <dbl>  <dbl>     <int>
#> 1 1:11000:T:C T     C     -0.00000550 0.0000291 8.45e-10  1.000 0.0159         9