Imputing z-statistics
An example analysis
Developed by Gabriel Hoffman
Run on 2025-07-28 10:48:40.287666
Source: vignettes/imputez.Rmd
The imputez package uses GenomicDataStream to read genetic data from a reference panel. Here, we simulate z-statistics and use run_imputez() to perform the imputation.
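For orientation, the Gaussian model commonly used for z-statistic imputation can be written as below. This is a sketch of the general approach, not a description of the package internals: it assumes the z-statistics are multivariate normal under the null with covariance equal to the LD matrix. The symbols $\Sigma$ (LD matrix), $t$ (variants with observed z-statistics), $i$ (the variant to impute), and $\lambda$ (a shrinkage weight) are notation introduced here for illustration.

$$
z_t \sim N(0, \Sigma_{tt}), \qquad
\hat{z}_i = \Sigma_{it}\,\Sigma_{tt}^{-1}\, z_t, \qquad
\mathrm{Var}(\hat{z}_i) = \Sigma_{it}\,\Sigma_{tt}^{-1}\,\Sigma_{ti}
$$

In practice $\Sigma$ is estimated from the reference panel and usually regularized, for example by replacing the sample estimate $\hat{\Sigma}$ with $(1-\lambda)\hat{\Sigma} + \lambda I$. That particular form is one common choice and is an assumption here, not necessarily the estimator imputez uses.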
library(imputez)
library(GenomicDataStream)
library(mvtnorm)
library(dplyr)
# VCF file for reference
file <- system.file("extdata", "test.vcf.gz", package = "GenomicDataStream")
# initialize data stream
gds = GenomicDataStream(file, "DS", initialize=TRUE)
# read genotype data from reference
dat = getNextChunk(gds)
# simulate z-statistics with correlation structure
# from the LD of the reference panel
# (one mean entry per variant, rather than a hard-coded count)
z = c(rmvnorm(1, rep(0, ncol(dat$X)), cor(dat$X)))
# Combine z-statistics with variant ID, position, etc
df = dat$info %>%
  mutate(z = z, GWAS_A1 = A1, GWAS_A2 = A2) %>%
  rename(REF_A1 = A1, REF_A2 = A2)
# Given observed z-statistics and a
# GenomicDataStream of the reference panel,
# impute z-statistics for variants missing them.
# Here, drop variant 2 and then impute its z-statistic.
# By default, this runs imputezDecorr / decorrelate in the backend
res = run_imputez(df[-2,], gds, 10000, 1000)
# Results of imputed z-statistics
res
## # A tibble: 1 × 9
##   ID          A1    A2       z.stat        se  r2.pred lambda    maf nVariants
##   <chr>       <chr> <chr>     <dbl>     <dbl>    <dbl>  <dbl>  <dbl>     <int>
## 1 1:11000:T:C T     C     0.0000602 0.0000291 8.45e-10   1.00 0.0159         9
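To make the columns above more concrete, the following toy example imputes one held-out z-statistic in base R using a conditional-Gaussian rule with shrinkage toward the identity. It is a minimal sketch under stated assumptions, not the imputez implementation: the simulated panel, the helper `impute_one()`, and its `lambda` argument are hypothetical names introduced here, and no imputez or GenomicDataStream functions are called.

```r
# Toy illustration only: base R, no imputez or GenomicDataStream calls.
set.seed(1)

# Hypothetical reference panel: 200 samples x 6 variants with AR(1)-like LD
n <- 200
p <- 6
rho <- 0.8
Sigma_true <- rho^abs(outer(seq_len(p), seq_len(p), "-"))
X <- matrix(rnorm(n * p), n, p) %*% chol(Sigma_true)

# LD (correlation) matrix estimated from the toy reference panel
R <- cor(X)

# Simulate null z-statistics with covariance R:
# if u ~ N(0, I), then t(chol(R)) %*% u ~ N(0, R)
z <- drop(t(chol(R)) %*% rnorm(p))

# Impute the z-statistic of one target variant from the remaining variants.
# 'lambda' shrinks the LD matrix toward the identity (assumed form).
impute_one <- function(R, z_obs, target, lambda = 0.1) {
  obs <- setdiff(seq_len(ncol(R)), target)
  # Shrunken LD among observed variants (diagonal stays 1)
  R_tt <- (1 - lambda) * R[obs, obs] + lambda * diag(length(obs))
  # Shrunken LD between the target and the observed variants
  R_it <- (1 - lambda) * R[target, obs, drop = FALSE]
  w <- solve(R_tt, t(R_it))   # imputation weights
  r2 <- drop(R_it %*% w)      # variance of the imputed statistic
  c(z.stat = drop(crossprod(w, z_obs)), r2.pred = r2, se = sqrt(r2))
}

# Hold out variant 2 and impute it from the other five
target <- 2
res_toy <- impute_one(R, z[-target], target)
print(res_toy)
```

In this sketch `se` is defined as `sqrt(r2.pred)`. The tibble above shows the same numerical relationship (0.0000291 is approximately the square root of 8.45e-10), although whether imputez defines `se` exactly this way is an assumption. Setting `lambda = 1` in the toy function removes all LD information and returns an imputed value of exactly 0.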
Session Info
## R version 4.4.2 (2024-10-31)
## Platform: aarch64-apple-darwin23.6.0
## Running under: macOS Sonoma 14.7.1
##
## Matrix products: default
## BLAS: /Users/gabrielhoffman/prog/R-4.4.2/lib/libRblas.dylib
## LAPACK: /opt/homebrew/Cellar/r/4.4.3_1/lib/R/lib/libRlapack.dylib; LAPACK version 3.12.0
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/New_York
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] dplyr_1.1.4 mvtnorm_1.3-3 GenomicDataStream_0.0.17
## [4] Rdpack_2.6.2 progress_1.2.3 Rcpp_1.0.14
## [7] imputez_1.2.2 BiocStyle_2.34.0
##
## loaded via a namespace (and not attached):
## [1] xfun_0.51 bslib_0.9.0 htmlwidgets_1.6.4
## [4] lattice_0.22-6 vctrs_0.6.5 tools_4.4.2
## [7] generics_0.1.3 parallel_4.4.2 rgl_1.3.17
## [10] tibble_3.2.1 pkgconfig_2.0.3 CovTools_0.5.4
## [13] Matrix_1.7-2 desc_1.4.3 RcppParallel_5.1.10.9000
## [16] scatterplot3d_0.3-44 lifecycle_1.0.4 compiler_4.4.2
## [19] textshaping_1.0.0 minpack.lm_1.2-4 codetools_0.2-20
## [22] decorrelate_0.1.4 htmltools_0.5.8.1 sass_0.4.9
## [25] yaml_2.3.10 pracma_2.4.4 pkgdown_2.0.9
## [28] pillar_1.10.1 crayon_1.5.3 jquerylib_0.1.4
## [31] MASS_7.3-65 cachem_1.1.0 iterators_1.0.14
## [34] foreach_1.5.2 tidyselect_1.2.1 digest_0.6.37
## [37] purrr_1.0.4 bookdown_0.42 ShrinkCovMat_1.4.0
## [40] fastmap_1.2.0 grid_4.4.2 expm_1.0-0
## [43] cli_3.6.4 magrittr_2.0.3 base64enc_0.1-3
## [46] Rfast_2.1.5 corpcor_1.6.10 prettyunits_1.2.0
## [49] rmarkdown_2.29 shapes_1.2.7 igraph_2.1.4
## [52] ragg_1.3.3 hms_1.1.3 CholWishart_1.1.4
## [55] memoise_2.0.1 evaluate_1.0.3 knitr_1.49
## [58] rbibutils_2.3 SHT_0.1.9 doParallel_1.0.17
## [61] irlba_2.3.5.1 rlang_1.1.5 zigg_0.0.2
## [64] glue_1.8.0 flare_1.7.0.2 geigen_2.3
## [67] BiocManager_1.30.25 jsonlite_1.9.1 R6_2.6.1
## [70] systemfonts_1.2.1 fs_1.6.5