Fit GLM For Many Features — glmFitFeatures-class • BatchRegression

Fit the regression model y ~ design + data[,j] separately for each feature j, where data[,j] is column j of the feature matrix
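To make the model concrete, here is a minimal base-R sketch that fits one glm() per feature and keeps the estimates for the feature coefficient. It is not how BatchRegression computes these models; it only illustrates what is being estimated. It assumes, as the example output's coefficient names (V1, V2, V3, x) suggest, that no intercept is added beyond the columns of design. The helper name fit_features_reference is hypothetical.

```r
# Hypothetical base-R reference: one glm() per feature.
# Not BatchRegression's implementation; it only illustrates the model.
fit_features_reference <- function(y, design, data, family = poisson()) {
  rows <- lapply(seq_len(ncol(data)), function(j) {
    x <- data[, j]
    # Model j: y ~ design + data[, j]; "0 +" drops R's implicit intercept,
    # so include a column of ones in design if you want one
    fit <- glm(y ~ 0 + design + x, family = family)
    s <- summary(fit)
    data.frame(
      feature    = j,
      coef       = coef(s)["x", "Estimate"],
      se         = coef(s)["x", "Std. Error"],
      dispersion = s$dispersion,
      rdf        = fit$df.residual
    )
  })
  do.call(rbind, rows)
}

set.seed(1)
n <- 100
y <- rpois(n, 10)
design <- cbind(1, matrix(rnorm(n * 2), n, 2))  # intercept + 2 covariates
data <- matrix(rnorm(n * 5), n, 5)               # 5 candidate features
ref <- fit_features_reference(y, design, data, poisson())
print(ref)
```

Looping over glm() like this refits the full IRLS problem for every feature, which is exactly the cost that a batched implementation is meant to avoid.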

Usage

glmFitFeatures(
  y,
  design,
  data,
  family,
  weights,
  offset,
  detail = 1,
  doCoxReid = length(y) < 100,
  shareTheta = FALSE,
  fastApprox = FALSE,
  nthreads = 1,
  epsilon = 1e-08,
  maxit = 25,
  epsilon_nb = 1e-04,
  maxit_nb = 5,
  lambda = 0,
  ...
)

# S4 method for class 'ANY,ANY,matrix'
glmFitFeatures(
  y,
  design,
  data,
  family,
  weights,
  offset,
  detail = 1,
  doCoxReid = length(y) < 100,
  shareTheta = FALSE,
  fastApprox = FALSE,
  nthreads = 1,
  epsilon = 1e-08,
  maxit = 25,
  epsilon_nb = 1e-04,
  maxit_nb = 5,
  lambda = 0,
  ...
)

Arguments

y: response vector
design: design matrix shared across all models
data: feature matrix; model j includes column j as a covariate
family: a description of the error distribution and link function to be used in the model, just like for glm(). Also supports negative binomial as the string "nb:theta"; see Details below
weights: vector of sample-level weights
offset: vector of sample-level offset values
detail: level of model detail returned, with LEAST = 0, LOW = 1, MEDIUM = 2, HIGH = 3, MOST = 4, MAX = 5. Each level returns everything from the level below it plus additional values: LEAST (beta), LOW (adds se, sigSq, rdf), MEDIUM (adds vcov), HIGH (adds residuals), MOST (adds hatvalues), MAX (adds deviance residuals)
doCoxReid: use the Cox-Reid adjustment when estimating overdispersion for negative binomial models. Defaults to TRUE when there are fewer than 100 samples
shareTheta: estimate theta once from a model fit on the design matrix only, and share it across all features instead of re-estimating it for each feature
fastApprox: default FALSE. If TRUE, use pre-projection on the working response from an initial regression fit on only the design. When the null hypothesis holds for data (the feature has no effect), this is a very good approximation and much faster
nthreads: number of threads. Each individual model is fit serially; the analysis is parallelized across features
epsilon: convergence tolerance for GLM IRLS
maxit: maximum number of iterations for GLM IRLS
epsilon_nb: convergence tolerance for negative binomial fitting
maxit_nb: maximum number of iterations for negative binomial fitting
lambda: ridge shrinkage parameter
...: other arguments
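The fastApprox description can be illustrated with a generic one-step approximation: fit the design-only model once, then reuse its IRLS working weights and working residuals for every feature, projecting the design out of all feature columns at once (Frisch-Waugh-Lovell). This is a minimal sketch under that assumption, not necessarily the exact algorithm BatchRegression uses. It also assumes a dispersion fixed at 1 (as for poisson), and the helper name approx_fit_features is hypothetical.

```r
# Hypothetical sketch of a "fit the design once, then project" approximation.
# Assumes a fixed dispersion of 1 (e.g. poisson); not BatchRegression's code.
approx_fit_features <- function(y, design, data, family = poisson()) {
  # 1) Fit the design-only (null) model once with IRLS
  null_fit <- glm.fit(design, y, family = family)
  w   <- null_fit$weights    # IRLS working weights at convergence
  r_z <- null_fit$residuals  # working residuals: working response minus eta

  # 2) Weighted QR of the design, computed once and shared by all features
  sw <- sqrt(w)
  Q  <- qr.Q(qr(design * sw))

  # 3) Remove the design from every weighted feature column in one step
  Xw <- data * sw
  Rx <- Xw - Q %*% crossprod(Q, Xw)

  # 4) One weighted least-squares step per feature (Frisch-Waugh-Lovell);
  #    r_z is already weighted-orthogonal to the design at convergence
  denom <- colSums(Rx^2)
  beta  <- drop(crossprod(Rx, r_z * sw)) / denom
  data.frame(feature = seq_len(ncol(data)), beta = beta, se = sqrt(1 / denom))
}

set.seed(1)
n <- 100
y <- rpois(n, 10)
design <- cbind(1, matrix(rnorm(n * 2), n, 2))
data <- matrix(rnorm(n * 5), n, 5)
fast <- approx_fit_features(y, design, data, poisson())

# Exact per-feature fits for comparison (coefficient of the feature column)
exact <- sapply(seq_len(ncol(data)), function(j)
  glm.fit(cbind(design, data[, j]), y, family = poisson())$coefficients[4])
print(cbind(fast = fast$beta, exact = exact))
```

Only one IRLS fit is run; everything per feature is a matrix product, which is why this kind of approximation is cheap. Because it is a single scoring step starting from a feature coefficient of zero, it is most accurate when the true feature effect is small, consistent with the "under the null" caveat above.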

Value

List of parameter estimates with entries coef, se, dispersion, rdf, and others, depending on detail

Details

Generalized linear models can be fit by passing family as in glm(), for example gaussian(), poisson(), binomial(), binomial("probit"), quasibinomial(), quasipoisson(), or negative.binomial(theta), or as the strings "nb" and "nb:theta". family can also be an array of strings of the form "nb:theta", where theta is the parameter of the negative binomial distribution
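For intuition about fixed-theta negative binomial models and the shareTheta option, the sketch below uses the MASS package rather than BatchRegression: it estimates theta once from the design-only model with MASS::glm.nb(), then reuses it for every feature through the fixed-theta family MASS::negative.binomial(theta), which Details lists alongside "nb:theta". Note that glm.nb() estimates theta by plain maximum likelihood, without the Cox-Reid adjustment that doCoxReid controls, so values may differ from the package's. The variable names are illustrative.

```r
library(MASS)

# Sketch of the shareTheta idea with MASS (an assumption, not BatchRegression code)
set.seed(1)
n <- 100
y <- rnbinom(n, mu = 10, size = 2)               # overdispersed counts
design <- cbind(1, matrix(rnorm(n * 2), n, 2))   # intercept + 2 covariates
data <- matrix(rnorm(n * 5), n, 5)               # 5 candidate features

# 1) Estimate theta once from the design-only model (plain ML, no Cox-Reid)
null_nb <- glm.nb(y ~ 0 + design)
theta <- null_nb$theta

# 2) Reuse theta for every feature: with theta fixed, the negative
#    binomial model is an ordinary GLM fit by IRLS
coefs <- sapply(seq_len(ncol(data)), function(j) {
  x <- data[, j]
  fit <- glm(y ~ 0 + design + x, family = negative.binomial(theta))
  coef(fit)[["x"]]
})
print(c(theta = theta))
print(coefs)
```

Sharing theta trades a little accuracy per feature for skipping the inner theta-estimation loop on every fit.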

Examples

n <- 100 # number of samples
p <- 10 # number of features
nc <- 3 # number of shared covariates
set.seed(1)
y <- rpois(n, 10)
X <- cbind(1, matrix(rnorm(n * p), n, p))
colnames(X) <- seq(ncol(X))
design <- matrix(rnorm(n * nc), n, nc)

# fit regressions with model j including X[,j]
fit <- glmFitFeatures(y, design, X, "poisson")

fit
#> 		 glmFitFeatures 
#> 
#> coefs(4): V1, V2, V3, x
#> features(11): 1, 2, ..., 10, 11
#> family: poisson/log 
#> Estimated: se, dispersion, rdf, varFitted, mu_mean, y_mean 
#>