Fit regression model y ~ design + X_features[,j] for each feature j
Usage
# S4 method for class 'ANY,ANY,GenomicDataStream'
glmFitFeatures(
y,
design,
data,
family,
weights,
offset,
detail = 1,
doCoxReid = length(y) < 100,
shareTheta = FALSE,
fastApprox = FALSE,
nthreads = 1,
epsilon = 1e-08,
maxit = 25,
epsilon_nb = 1e-04,
maxit_nb = 5,
lambda = 0,
...
)
Arguments
- y
response vector
- design
design matrix shared across all models
- data
matrix or GenomicDataStream with additional features to be fit one at a time
- family
a description of the error distribution and link function to be used in the model, as in glm(). Also supports negative binomial as the string "nb:theta"; see Details below
- weights
vector of sample-level weights
- offset
vector of sample-level offset values
- detail
level of model detail returned: LEAST = 0 (beta), LOW = 1 (beta, se, dispersion, rdf), MEDIUM = 2 (vcov), HIGH = 3 (Pearson residuals), MOST = 4 (hatvalues)
- doCoxReid
use Cox-Reid adjustment when estimating overdispersion for negative binomial models. Defaults to TRUE when there are fewer than 100 samples
- shareTheta
estimate theta from the design matrix only and share it across all features, instead of re-estimating it for each feature
- fastApprox
default FALSE. If TRUE, use pre-projection on the working response from an initial regression fit on the design only. Under the null hypothesis that the features in data have no effect, this is a very good approximation and much faster
- nthreads
number of threads. Each model is fit serially; the analysis is parallelized across features
- epsilon
tolerance for GLM IRLS
- maxit
max iterations for GLM IRLS
- epsilon_nb
tolerance for negative binomial
- maxit_nb
max iterations for negative binomial
- lambda
ridge shrinkage parameter
- ...
other args
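The fastApprox option is described only briefly above. The sketch below illustrates one way pre-projection on the working response can work, assuming the package does something along these lines (this is an assumption, not its actual code): fit the design-only GLM once by IRLS, then, for each feature, remove the design from the feature with a weighted projection and take a single weighted least-squares step on the working residuals. The function name fast_approx_sketch() is hypothetical. For a Gaussian model with identity link this reproduces the exact coefficient (Frisch-Waugh-Lovell); for other families it is an approximation that is best near the null.

```r
# Hypothetical sketch of pre-projection (NOT the GenomicDataStream implementation).
fast_approx_sketch <- function(y, design, X, family = gaussian(),
                               weights = rep(1, length(y))) {
  # 1) Fit the design-only model once with IRLS
  null_fit <- glm.fit(design, y, weights = weights, family = family)
  w <- null_fit$weights    # working weights from the IRLS fit
  r <- null_fit$residuals  # working residuals, z - eta
  sw <- sqrt(w)

  # 2) Weighted projection: remove the design from every feature at once
  qr_d <- qr(design * sw)          # rows of the design scaled by sqrt(w)
  Xt <- qr.resid(qr_d, X * sw)     # design-residualized, scaled features
  rt <- r * sw                     # scaled working residuals

  # 3) One weighted least-squares step per feature on the working response
  denom <- colSums(Xt^2)
  coef <- colSums(Xt * rt) / denom

  # Dispersion taken from the null fit (fixed at 1 for poisson/binomial)
  phi <- if (family$family %in% c("poisson", "binomial")) 1 else
    sum(w * r^2) / null_fit$df.residual
  data.frame(coef = unname(coef), se = unname(sqrt(phi / denom)))
}

# Usage on simulated data
set.seed(2)
n <- 100
design <- cbind(Intercept = 1, age = rnorm(n))
X <- matrix(rnorm(n * 4), n, 4)
y <- drop(design %*% c(1, 0.5)) + rnorm(n)
fa <- fast_approx_sketch(y, design, X)

# For a Gaussian model the coefficient matches the full per-feature fit
exact <- unname(coef(lm(y ~ design[, 2] + X[, 2]))[3])

# Poisson example: an approximation rather than an exact refit
counts <- rpois(n, exp(0.2 + 0.3 * design[, 2]))
fa_pois <- fast_approx_sketch(counts, design, X, family = poisson())
```

The speedup comes from doing IRLS only once: each feature then costs a single projection and two inner products, instead of a full iterative fit.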
Value
List of parameter estimates with entries coef, se, dispersion, rdf, and others depending on detail
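To make the model concrete, the loop below fits y ~ design + X[, j] for each column j with base R's glm() and collects LOW-level quantities for the feature term (coef, se, dispersion, rdf). This is a slow reference sketch for intuition only. fit_features_ref() is a hypothetical name, and its data-frame output is not the structure returned by glmFitFeatures(), which fits the models in C++.

```r
# Hypothetical base-R reference for the per-feature model y ~ design + X[, j].
fit_features_ref <- function(y, design, X, family = gaussian(),
                             weights = rep(1, length(y))) {
  if (is.null(colnames(X))) colnames(X) <- paste0("f", seq_len(ncol(X)))
  rows <- lapply(seq_len(ncol(X)), function(j) {
    # Shared design plus feature j appended as the last column
    M <- cbind(design, feature = X[, j])
    fit <- glm(y ~ 0 + M, family = family, weights = weights)
    s <- summary(fit)
    est <- s$coefficients
    k <- nrow(est)  # the feature is the last (non-aliased) coefficient
    data.frame(feature = colnames(X)[j],
               coef = est[k, 1], se = est[k, 2],
               dispersion = s$dispersion, rdf = fit$df.residual)
  })
  do.call(rbind, rows)
}

# Usage with data shaped like the Examples section
set.seed(1)
n <- 60
info <- data.frame(Age = rpois(n, 40))
design <- model.matrix(~ Age, info)
X <- matrix(rnorm(n * 3), n, 3, dimnames = list(NULL, paste0("snp", 1:3)))
y <- rnorm(n)
res <- fit_features_ref(y, design, X)
```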
Details
Generalized linear models can be fit with family as in glm(), using gaussian(), poisson(), binomial(), binomial("probit"), quasibinomial(), quasipoisson(), negative.binomial(theta), "nb", or "nb:theta", or with a vector of entries of the form "nb:theta", where theta is the parameter of the negative binomial distribution
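The "nb:theta" string encodes a negative binomial family with a fixed theta. The helper below is a hypothetical sketch, not part of GenomicDataStream, that maps such strings to the equivalent base-R family object via MASS::negative.binomial(theta) (MASS is a recommended package shipped with R). A plain "nb", where theta must be estimated, is not handled here; in base R that case corresponds to MASS::glm.nb().

```r
# Hypothetical helper: convert a family specification into a glm() family object.
as_family <- function(spec) {
  if (inherits(spec, "family")) return(spec)  # e.g. binomial("probit")
  if (is.character(spec) && startsWith(spec, "nb:")) {
    theta <- as.numeric(sub("^nb:", "", spec))
    stopifnot(is.finite(theta), theta > 0)
    return(MASS::negative.binomial(theta))    # log link by default
  }
  # Otherwise treat the string as a family function name, e.g. "poisson"
  get(spec, mode = "function")()
}

# A vector of "nb:theta" entries gives one family object per entry
fams <- lapply(c("nb:5", "nb:20"), as_family)
fam_nb <- as_family("nb:5")

# Fit a negative binomial GLM with theta fixed at 5
set.seed(3)
x <- rnorm(200)
counts <- rnbinom(200, mu = exp(1 + 0.4 * x), size = 5)
fit <- glm(counts ~ x, family = fam_nb)
```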
Examples
library(GenomicDataStream)
# create response, design and weights
y <- rnorm(60)
names(y) <- paste0("I", seq(60))
info <- data.frame(Age = rpois(60, 40))
rownames(info) <- names(y)
design <- model.matrix(~ Age, info)
w <- rep(1, 60)
# VCF file
file <- system.file("extdata", "test.vcf.gz", package = "GenomicDataStream")
# Read data into R
# then run glmFitFeatures()
gds <- GenomicDataStream(file, "DS", initialize = TRUE)
dat <- getNextChunk(gds)
res1 <- glmFitFeatures(y, design, dat$X, family = "gaussian", weights = w)
# Data stays at C++ level
# then run glmFitFeatures()
gds <- GenomicDataStream(file, "DS")
res2 <- glmFitFeatures(y, design, gds, family = "gaussian", weights = w)