
Impute many z-statistics given observed z-statistics and reference panel

Usage

impute_region(
  df,
  gds,
  region,
  flankWidth,
  method = c("decorrelate", "Ledoit-Wolf", "OAS", "Touloumis", "Schafer-Strimmer"),
  lambda = NULL,
  ...
)

Arguments

df

data.frame with columns ID, z, GWAS_A1, GWAS_A2, CHROM, POS, REF_A1, REF_A2.

gds

GenomicDataStream of reference panel

region

genomic region to impute

flankWidth

additional window added to region

method

method used to estimate the shrinkage parameter lambda. Default is "decorrelate"

lambda

(default: NULL) value used to shrink correlation matrix. Only used if method is "decorrelate"

...

additional arguments passed to imputez() and imputezDecorr()

Value

tibble storing imputed results:

ID

variant identifier

z.stat

imputed z-statistic

se

standard error of imputed z-statistic

r2.pred

metric of accuracy of the imputed z-statistic based on its variance

lambda

shrinkage parameter

maf

minor allele frequency in reference panel

nVariants

number of variants used in imputation

Examples

library(GenomicDataStream)
library(mvtnorm)
library(dplyr)
#> 
#> Attaching package: ‘dplyr’
#> The following objects are masked from ‘package:stats’:
#> 
#>     filter, lag
#> The following objects are masked from ‘package:base’:
#> 
#>     intersect, setdiff, setequal, union

# VCF file for reference
file <- system.file("extdata", "test.vcf.gz", package = "GenomicDataStream")

# initialize data stream
gds <- GenomicDataStream(file, "DS", initialize=TRUE)

# read genotype data from reference
dat <- getNextChunk(gds)

# simulate z-statistics with correlation structure
# from the LD of the reference panel
set.seed(1)
z <- c(rmvnorm(1, rep(0, 10), cor(dat$X)))

# Combine z-statistics with variant ID, position, etc
df <- dat$info %>%
    mutate(z = z, GWAS_A1 = A1, GWAS_A2 = A2) %>%
    rename(REF_A1 = A1, REF_A2 = A2)

# Given observed z-statistics and a GenomicDataStream
# of the reference panel, impute z-statistics
# for variants that are missing them.
# Here, drop variant 2 from the observed data,
# then impute its z-statistic within the given region
region <- "1:1000-100000"
impute_region(df[-2,], gds, region, 1000)
#> # A tibble: 1 × 9
#>   ID          A1    A2         z.stat        se  r2.pred lambda    maf nVariants
#>   <chr>       <chr> <chr>       <dbl>     <dbl>    <dbl>  <dbl>  <dbl>     <int>
#> 1 1:11000:T:C T     C     -0.00000550 0.0000291 8.45e-10  1.000 0.0159         9
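
The effect of the shrinkage parameter lambda described under Arguments can be illustrated with a small base R sketch. This is not the package's implementation: the helper `impute_one()`, the toy correlation matrix, and the convex shrinkage form `(1 - lambda) * C + lambda * I` are all assumptions made for illustration. The sketch applies the standard conditional-normal formula to impute one z-statistic from its neighbors, and shows that full shrinkage (lambda = 1) removes all LD information, so the imputed value collapses to zero. That behavior may explain the near-zero z.stat in the output above, where lambda is reported as 1.000.

```r
# Hypothetical helper (not part of the package API): impute the z-statistic
# of variant i from the remaining variants, using a correlation matrix C
# that is shrunk toward the identity. The shrinkage form is an assumption.
impute_one <- function(z_obs, C, i, lambda) {
  # Shrink the reference correlation matrix toward the identity
  C_shrunk <- (1 - lambda) * C + lambda * diag(nrow(C))

  # Indices of variants with observed z-statistics
  obs <- setdiff(seq_len(nrow(C)), i)

  # Imputation weights: solve C[obs, obs] %*% w = C[obs, i]
  w <- solve(C_shrunk[obs, obs], C_shrunk[obs, i])

  # Conditional mean, and the fraction of variance explained by
  # the observed variants
  list(z = sum(w * z_obs), r2 = sum(w * C_shrunk[obs, i]))
}

# Toy 3-variant correlation matrix (made-up values)
C <- matrix(c(1.0, 0.8, 0.5,
              0.8, 1.0, 0.6,
              0.5, 0.6, 1.0), nrow = 3, byrow = TRUE)

# Observed z-statistics for variants 1 and 3; variant 2 is imputed
z_obs <- c(2.0, 1.0)

res_none <- impute_one(z_obs, C, i = 2, lambda = 0)  # no shrinkage
res_full <- impute_one(z_obs, C, i = 2, lambda = 1)  # full shrinkage: z is 0
```

Intermediate values of lambda trade off between these two extremes: smaller values trust the reference-panel LD more, while larger values pull the imputed statistic toward zero.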