Fit regression model y ~ design + X_features[,j] for each feature j
Usage
# S4 method for class 'ANY,ANY,GenomicDataStream'
lmmFitFeatures(
y,
design,
data,
U,
s,
weights = NULL,
dcmpMethod = c("general", "categorical"),
REML = TRUE,
delta = NULL,
delta.range = c(-10, 10),
tol = 1e-06,
detail = 1,
lambda = 0,
nthreads = 1,
verbose = TRUE,
...
)Arguments
- y
response vector
- design
design matrix shared across all models
- data
feature matrix with model j using feature j
- U
eigen-vectors of random effect
- s
eigen-values of random effect
- weights
sample-level weights
- dcmpMethod
use a
"general"method (default) for SVD ofZ. IfZis a categorical design matrix, used faster method"categorical"- REML
logical scalar - Should the estimates be chosen to optimize the REML criterion vs ML?
- delta
if
NULLestimate delta, if value is given use this fixed value- delta.range
min and max values (in log space), of the search space for delta to fit the random effect
- tol
convergence criterion for the 1D search of the delta space
- detail
level of model detail returned, with LEAST = 0, LOW = 1, MEDIUM = 2, HIGH = 3, MOST = 4, MAX = 5. LEAST (beta), LOW (beta, se, sigSq, rdf), MEDIUM (vcov), HIGH (residuals), MOST (hatvalues), MAX (deviance residuals)
- lambda
ridge shrinkage parameter
- nthreads
number of threads. Each model is fit in serial, analysis is parallelized across features
- verbose
show progress
- ...
other args
Value
List of parameter estimates with entries coef, se, dispersion, rdf and other depending on detail
Examples
library(GenomicDataStream)
# create response, design and weights
y <- rnorm(60)
names(y) <- paste0("I", seq(60))
info <- data.frame(Age = rpois(60, 40))
rownames(info) <- names(y)
design <- model.matrix(~ Age, info)
w <- rep(1, 60)
# VCF file
file <- system.file("extdata", "test.vcf.gz", package = "GenomicDataStream")
# Read data into R
# then run lmFitFeatures()
gds <- GenomicDataStream(file, "DS", initialize = TRUE)
dat <- getNextChunk(gds)
res1 <- lmFitFeatures(y, design, dat$X, w)
res1
#> lmFitFeatures
#>
#> coefs(1): x
#> features(10): 1:10000:C:A, 1:11000:T:C, ..., 1:18000:C:G, 1:19000:T:G
#> family: gaussian/identity
#> Estimated: se, dispersion, rdf
#>
# Data stays at C++ level
# then run lmFitFeatures()
gds <- GenomicDataStream(file, "DS")
res2 <- lmFitFeatures(y, design, gds, w)
#> preprojection: 1
res2
#> lmFitFeatures
#>
#> coefs(1): x
#> features(10): 1:10000:C:A, 1:11000:T:C, ..., 1:18000:C:G, 1:19000:T:G
#> family: gaussian/identity
#> Estimated: se, dispersion, rdf
#>