GenomicDataStreamRegression
Regression analyses on streaming chunks of data
Developed by Gabriel Hoffman
Run on 2026-05-01 10:55:37
Source: vignettes/GenomicDataStreamRegression.Rmd
GenomicDataStream is designed to stream chunks of
features rather than chunks of samples. Features are
stored as columns in the matrix returned by R/C++, independent of
the underlying data storage format.
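To make the "chunks of features" idea concrete, here is a minimal base R sketch that walks over a simulated matrix in column chunks. It does not use the GenomicDataStream API; the matrix `X`, its dimension names, and the variables `chunkId`, `chunks`, and `Xchunk` are illustrative assumptions, and `chunkSize` only mirrors the argument name used in the examples in this vignette.

```r
# Simulated data: 6 samples (rows) by 12 features (columns).
# All names here are illustrative and not part of the package API.
set.seed(1)
X <- matrix(rnorm(6 * 12), nrow = 6, ncol = 12,
            dimnames = list(paste0("I", 1:6), paste0("feature", 1:12)))

chunkSize <- 5

# Assign each column index to a chunk: columns 1-5, 6-10, 11-12
chunkId <- ceiling(seq_len(ncol(X)) / chunkSize)
chunks <- split(seq_len(ncol(X)), chunkId)

for (idx in chunks) {
  # Every chunk keeps all samples and only a subset of the features
  Xchunk <- X[, idx, drop = FALSE]
  cat("chunk with", nrow(Xchunk), "samples and", ncol(Xchunk), "features\n")
}
```

Each chunk holds every sample, so a per-feature analysis can run on one chunk at a time without the full feature matrix in memory.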
Usage
GenomicDataStreamRegression implements regression models
(linear models and GLMs) that stream chunks of features using the
GenomicDataStream interface. In general, variants from
genetic data are used as covariates in lmFitFeatures(), and
genes from single cell data are used as responses in
lmFitResponses().
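The two orientations can be sketched with base R's `lm()` on simulated data. This is only a conceptual illustration under stated assumptions, not the package's implementation: it assumes the per-feature model is `y ~ design + X[,j]` when features are covariates and `Y[,j] ~ design` when features are responses. The objects `X`, `Y`, `betaFeatures`, `fitResponses`, and `betaResponses` are hypothetical names introduced here.

```r
set.seed(1)
n <- 60
design <- cbind(Intercept = rep(1, n))

# Features as covariates: one model per column of X, y ~ design + X[, j]
# (simulated 0/1/2 genotype counts stand in for variants)
y <- rnorm(n)
X <- matrix(rbinom(n * 3, size = 2, prob = 0.3), n, 3,
            dimnames = list(NULL, paste0("variant", 1:3)))
betaFeatures <- apply(X, 2, function(x) {
  fit <- lm(y ~ 0 + design + x)
  coef(fit)[["x"]]  # effect estimate for this feature
})

# Features as responses: one model per column of Y, Y[, j] ~ design
# (simulated values stand in for gene expression)
Y <- matrix(rnorm(n * 4), n, 4, dimnames = list(NULL, paste0("gene", 1:4)))
fitResponses <- lm(Y ~ 0 + design)
betaResponses <- coef(fitResponses)  # one column of coefficients per gene
```

With an intercept-only design, the coefficients in the second case reduce to the column means of `Y`, which gives a quick sanity check of the orientation.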
Example code with R
Read genotype data into R
library(GenomicDataStream)
library(GenomicDataStreamRegression)
# VCF file
file <- system.file("extdata", "test.vcf.gz", package = "GenomicDataStream")
# initialize
gds <- GenomicDataStream(file, "DS", chunkSize=5, initialize=TRUE)
n <- 60
y <- rnorm(n)
design <- matrix(1, n, 1)
rownames(design) <- paste0("I", seq(n))
# loop until break
while( 1 ){
  # get data chunk
  # dat$X matrix with features as columns
  # dat$info information about each feature as rows
  dat <- getNextChunk(gds)
  # check if end of stream
  if( atEndOfStream(gds) ) break
  # do analysis on this chunk of data
  # (fit is overwritten on each iteration; store results as needed)
  fit <- lmFitFeatures(y, design, dat$X)
}
Use R to run analysis at C++ level
library(GenomicDataStream)
library(GenomicDataStreamRegression)
# VCF file
file <- system.file("extdata", "test.vcf.gz", package = "GenomicDataStream")
# create object, but don't read yet
# Read DS field storing dosage
gds <- GenomicDataStream(file, "DS", chunkSize=5)
n <- 60
y <- rnorm(n)
design <- matrix(1, n, 1)
rownames(design) <- paste0("I", seq(n))
# regression of y ~ design + X[,j]
# where X[,j] is the jth variant in the GenomicDataStream
# data in GenomicDataStream is only accessed at C++ level
fit <- lmFitFeatures(y, design, gds)
## preprojection: 1
Session info
## R version 4.5.1 (2025-06-13)
## Platform: aarch64-apple-darwin23.6.0
## Running under: macOS Sonoma 14.7.1
##
## Matrix products: default
## BLAS/LAPACK: /opt/homebrew/Cellar/openblas/0.3.33/lib/libopenblasp-r0.3.33.dylib; LAPACK version 3.12.0
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/New_York
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] GenomicDataStreamRegression_0.99.0 GenomicDataStream_0.99.0
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.2.1 dplyr_1.2.1 farver_2.1.2
## [4] S7_0.2.2 fastmap_1.2.0 SingleCellExperiment_1.32.0
## [7] digest_0.6.39 lifecycle_1.0.5 statmod_1.5.1
## [10] magrittr_2.0.5 compiler_4.5.1 progress_1.2.3
## [13] rlang_1.2.0 sass_0.4.10 tools_4.5.1
## [16] yaml_2.3.12 knitr_1.51 prettyunits_1.2.0
## [19] S4Arrays_1.10.1 htmlwidgets_1.6.4 reticulate_1.46.0
## [22] DelayedArray_0.36.1 RColorBrewer_1.1-3 abind_1.4-8
## [25] HDF5Array_1.38.0 withr_3.0.2 purrr_1.2.2
## [28] BiocGenerics_0.56.0 desc_1.4.3 grid_4.5.1
## [31] stats4_4.5.1 beachmat_2.26.0 Rhdf5lib_1.32.0
## [34] ggplot2_4.0.3 scales_1.4.0 MASS_7.3-65
## [37] dichromat_2.0-0.1 SummarizedExperiment_1.40.0 cli_3.6.6
## [40] crayon_1.5.3 rmarkdown_2.31 reformulas_0.4.4
## [43] ragg_1.5.2 generics_0.1.4 otel_0.2.0
## [46] RcppParallel_5.1.11-2 fastglmm_0.4.6 minqa_1.2.8
## [49] cachem_1.1.0 rhdf5_2.54.1 stringr_1.6.0
## [52] splines_4.5.1 parallel_4.5.1 XVector_0.50.0
## [55] matrixStats_1.5.0 vctrs_0.7.3 boot_1.3-32
## [58] Matrix_1.7-5 jsonlite_2.0.0 carData_3.0-6
## [61] car_3.1-5 hms_1.1.4 IRanges_2.44.0
## [64] S4Vectors_0.48.1 pbmcapply_1.5.1 Formula_1.2-5
## [67] systemfonts_1.3.2 h5mread_1.2.1 limma_3.66.0
## [70] beachmat.hdf5_1.8.0 jquerylib_0.1.4 glue_1.8.1
## [73] nloptr_2.2.1 pkgdown_2.2.0 codetools_0.2-20
## [76] stringi_1.8.7 gtable_0.3.6 GenomicRanges_1.62.1
## [79] lme4_2.0-1 tibble_3.3.1 pillar_1.11.1
## [82] BatchRegression_0.0.21 htmltools_0.5.9 Seqinfo_1.0.0
## [85] rhdf5filters_1.22.0 R6_2.6.1 Rdpack_2.6.6
## [88] textshaping_1.0.5 evaluate_1.0.5 lattice_0.22-9
## [91] Biobase_2.70.0 rbibutils_2.4.1 png_0.1-9
## [94] bslib_0.10.0 Rcpp_1.1.1-1.1 nlme_3.1-169
## [97] SparseArray_1.10.10 anndataR_1.1.3 xfun_0.57
## [100] fs_2.1.0 MatrixGenerics_1.22.0 pkgconfig_2.0.3