Fit LMM For Many Features — lmmFitFeatures,ANY,ANY,GenomicDataStream-method • GenomicDataStreamRegression

Fit regression model y ~ design + X_features[,j] for each feature j

Usage

# S4 method for class 'ANY,ANY,GenomicDataStream'
lmmFitFeatures(
  y,
  design,
  data,
  U,
  s,
  weights = NULL,
  dcmpMethod = c("general", "categorical"),
  REML = TRUE,
  delta = NULL,
  delta.range = c(-10, 10),
  tol = 1e-06,
  detail = 1,
  lambda = 0,
  nthreads = 1,
  verbose = TRUE,
  ...
)

Arguments

y: response vector
design: design matrix shared across all models
data: feature matrix with model j using feature j
U: eigen-vectors of random effect
s: eigen-values of random effect
weights: sample-level weights
dcmpMethod: use a "general" method (default) for SVD of Z. If Z is a categorical design matrix, used faster method "categorical"
REML: logical scalar - Should the estimates be chosen to optimize the REML criterion vs ML?
delta: if NULL estimate delta, if value is given use this fixed value
delta.range: min and max values (in log space), of the search space for delta to fit the random effect
tol: convergence criterion for the 1D search of the delta space
detail: level of model detail returned, with LEAST = 0, LOW = 1, MEDIUM = 2, HIGH = 3, MOST = 4, MAX = 5. LEAST (beta), LOW (beta, se, sigSq, rdf), MEDIUM (vcov), HIGH (residuals), MOST (hatvalues), MAX (deviance residuals)
lambda: ridge shrinkage parameter
nthreads: number of threads. Each model is fit in serial, analysis is parallelized across features
verbose: show progress
...: other args

Value

List of parameter estimates with entries coef, se, dispersion, rdf and other depending on detail

Examples

library(GenomicDataStream)

# create response, design and weights
y <- rnorm(60)
names(y) <- paste0("I", seq(60))
info <- data.frame(Age = rpois(60, 40))
rownames(info) <- names(y)

design <- model.matrix(~ Age, info)
w <- rep(1, 60)

# VCF file
file <- system.file("extdata", "test.vcf.gz", package = "GenomicDataStream")

# Read data into R
# then run lmFitFeatures()
gds <- GenomicDataStream(file, "DS", initialize = TRUE)
dat <- getNextChunk(gds)

res1 <- lmFitFeatures(y, design, dat$X, w)

res1
#> 		 lmFitFeatures 
#> 
#> coefs(1): x
#> features(10): 1:10000:C:A, 1:11000:T:C, ..., 1:18000:C:G, 1:19000:T:G
#> family: gaussian/identity 
#> Estimated: se, dispersion, rdf 
#> 

# Data stays at C++ level
# then run lmFitFeatures()
gds <- GenomicDataStream(file, "DS")

res2 <- lmFitFeatures(y, design, gds, w)
#> preprojection: 1

res2
#> 		 lmFitFeatures 
#> 
#> coefs(1): x
#> features(10): 1:10000:C:A, 1:11000:T:C, ..., 1:18000:C:G, 1:19000:T:G
#> family: gaussian/identity 
#> Estimated: se, dispersion, rdf 
#>