Evaluate
Usage
run_imputability(
  df,
  gds,
  window,
  flankWidth,
  ids,
  method = c("decorrelate", "Ledoit-Wolf", "OAS", "Touloumis", "Schafer-Strimmer"),
  lambda = NULL,
  quiet = FALSE,
  ...
)
Arguments
- df
data.frame with columns ID, CHROM, and POS
- gds
GenomicDataStream of reference panel
- window
size of window in bp
- flankWidth
additional window added to the region
- ids
variant IDs for which to evaluate the imputation r2 score
- method
method used to estimate the shrinkage parameter lambda. Default is "decorrelate"
- lambda
(default: NULL) value used to shrink the correlation matrix. Only used if method is "decorrelate"
- quiet
suppress messages
- ...
additional arguments passed to impute_region()
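As a minimal sketch of the inputs described above, the following base R snippet builds a variant table with the three required columns and selects the IDs to evaluate. The values are hypothetical, and storing CHROM as character and POS as integer is an assumption, not something stated in this page.

```r
# Hypothetical variant table; values are made up for illustration.
# Column types (character CHROM, integer POS) are an assumption.
df <- data.frame(
  ID    = c("1:10000:C:A", "1:11000:T:C", "1:18000:C:G"),
  CHROM = c("1", "1", "1"),
  POS   = c(10000L, 11000L, 18000L),
  stringsAsFactors = FALSE
)

# Check that the columns named in the Arguments section are present
required <- c("ID", "CHROM", "POS")
missing_cols <- setdiff(required, colnames(df))
stopifnot(length(missing_cols) == 0)

# Select the variants to evaluate, taken from df$ID as in the Examples
ids <- df$ID[c(1, 3)]
```

Checking the columns up front gives a clearer error than one raised deep inside the call.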
Value
tibble storing imputed results:
- ID
variant identifier
- r2.pred
metric of accuracy of the imputed z-statistic based on its variance
- lambda
shrinkage parameter
- maf
minor allele frequency in reference panel
- nVariants
number of variants used in imputation
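To illustrate how a variance-based r2 and a shrinkage parameter can interact, here is a minimal sketch following the Gaussian imputation model of Pasaniuc et al. (2014), listed under References. The toy correlation matrix, the helper names `shrink` and `r2_pred`, and the linear shrinkage toward the identity are assumptions for illustration. They are not taken from this package's implementation, and the listed methods may estimate or apply lambda differently.

```r
# Toy correlation (LD) matrix for 4 variants: AR(1) structure, rho = 0.8.
# Hypothetical values, not from any reference panel.
rho <- 0.8
C <- rho^abs(outer(1:4, 1:4, "-"))

# Assumed form of shrinkage: linear shrinkage toward the identity matrix.
shrink <- function(C, lambda) {
  (1 - lambda) * C + lambda * diag(nrow(C))
}

# Variance of the imputed z-statistic for target variant i,
# given the observed variants obs: S[i, obs] %*% solve(S[obs, obs]) %*% S[obs, i]
r2_pred <- function(C, i, obs, lambda) {
  S <- shrink(C, lambda)
  s_io <- S[i, obs, drop = FALSE]
  drop(s_io %*% solve(S[obs, obs], t(s_io)))
}

r2_none <- r2_pred(C, i = 2, obs = c(1, 3, 4), lambda = 0)
r2_half <- r2_pred(C, i = 2, obs = c(1, 3, 4), lambda = 0.5)
r2_full <- r2_pred(C, i = 2, obs = c(1, 3, 4), lambda = 1)
```

In this sketch, heavier shrinkage lowers the predicted r2, and lambda = 1 removes all correlation with the observed variants, so the predicted r2 is 0.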
References
Pasaniuc, B., Zaitlen, N., Shi, H., Bhatia, G., Gusev, A., Pickrell, J., ... & Price, A. L. (2014). Fast and accurate imputation of summary statistics enhances evidence of functional enrichment. Bioinformatics, 30(20), 2906-2914.
Examples
library(GenomicDataStream)
library(tidyverse)
# VCF file for reference
file <- system.file("extdata", "test.vcf.gz", package = "GenomicDataStream")
# initialize data stream
gds <- GenomicDataStream(file, "DS", initialize=TRUE)
# read file of variant locations
file <- system.file("extdata", "test.map", package = "GenomicDataStream")
df <- read_tsv(file, show_col_types = FALSE)
# evaluate imputation r2 for these variants
ids <- df$ID[c(1, 9)]
# Given GenomicDataStream of reference panel,
# compute imputation r2 for variants in ids
run_imputability(df, gds, 10000, 1000, ids)
#> # A tibble: 2 × 5
#> # Groups: ID [2]
#> ID r2.pred lambda maf nVariants
#> <chr> <dbl> <dbl> <dbl> <int>
#> 1 1:10000:C:A 0.00000000238 1.000 0.347 8
#> 2 1:18000:C:G 0.00000000137 1.000 0.265 8