Fit regression model y ~ design + X_features[,j] for each feature j

Usage

glmFitFeatures(
  y,
  design,
  data,
  family,
  weights,
  offset,
  detail = 1,
  doCoxReid = length(y) < 100,
  shareTheta = FALSE,
  fastApprox = FALSE,
  nthreads = 1,
  epsilon = 1e-08,
  maxit = 25,
  epsilon_nb = 1e-04,
  maxit_nb = 5,
  lambda = 0,
  ...
)

# S4 method for class 'ANY,ANY,matrix'
glmFitFeatures(
  y,
  design,
  data,
  family,
  weights,
  offset,
  detail = 1,
  doCoxReid = length(y) < 100,
  shareTheta = FALSE,
  fastApprox = FALSE,
  nthreads = 1,
  epsilon = 1e-08,
  maxit = 25,
  epsilon_nb = 1e-04,
  maxit_nb = 5,
  lambda = 0,
  ...
)

Arguments

y

response vector

design

design matrix shared across all models

data

feature matrix with model j using feature j

family

a description of the error distribution and link function to be used in the model, just like for glm(). Also supports the negative binomial as a string of the form "nb:theta"; see Details below

weights

vector of sample-level weights

offset

vector of sample-level offset values

detail

level of model detail returned: LEAST = 0 (beta), LOW = 1 (beta, se, sigSq, rdf), MEDIUM = 2 (vcov), HIGH = 3 (residuals), MOST = 4 (hatvalues), MAX = 5 (deviance residuals)

doCoxReid

use the Cox-Reid adjustment when estimating overdispersion for negative binomial models. Defaults to TRUE when there are fewer than 100 samples

shareTheta

estimate theta from the design matrix and share it across all features, instead of re-estimating it for each feature

fastApprox

Default FALSE. If TRUE, use pre-projection on the working response from an initial regression fit on the design only. Under the null for data, this is a very good approximation and much faster

nthreads

number of threads. Each model is fit serially; the analysis is parallelized across features

epsilon

tolerance for GLM IRLS

maxit

max iterations for GLM IRLS

epsilon_nb

tolerance for negative binomial

maxit_nb

max iterations for negative binomial

lambda

ridge shrinkage parameter

...

other arguments
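
The fastApprox description can be made concrete with a small base R sketch. This is only one plausible reading of "pre-projection on the working response", written for a Poisson model with log link; it is not taken from the package source, and all object names (fit0, z_res, X_res, beta_approx) are illustrative. The idea is to fit the design-only model once, form the IRLS working response and weights, project the design out of both the working response and every feature, and then obtain each feature's coefficient from a univariate regression.

```r
# Sketch only: one possible reading of fastApprox, not the package's code
set.seed(1)
n <- 100
p <- 10
y <- rpois(n, 10)
design <- cbind(1, rnorm(n))        # shared covariates, including an intercept column
X <- matrix(rnorm(n * p), n, p)     # one candidate feature per column

# 1) Fit the null model on the design only (done once)
fit0 <- glm(y ~ 0 + design, family = poisson())
mu <- fitted(fit0)
eta <- fit0$linear.predictors

# 2) IRLS working response and working weights for Poisson with log link
z <- eta + (y - mu) / mu
w <- mu

# 3) Project the weighted design out of z and of every feature (done once)
qrD <- qr(sqrt(w) * design)
z_res <- qr.resid(qrD, sqrt(w) * z)
X_res <- qr.resid(qrD, sqrt(w) * X)

# 4) One cheap univariate regression per feature
beta_approx <- colSums(X_res * z_res) / colSums(X_res^2)

# Full IRLS fit per feature, for comparison
beta_full <- apply(X, 2, function(x) {
  unname(coef(glm(y ~ 0 + design + x, family = poisson()))["x"])
})
```

Comparing beta_approx with beta_full on simulated null data like this gives a feel for how close a single projected step stays to the full fit; the expensive QR decomposition of the design is shared by all features, which is where the speedup in such a scheme would come from.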

Value

List of parameter estimates with entries coef, se, dispersion, rdf, and others depending on detail
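
For orientation, the per-feature model and the main returned quantities can be reproduced with a plain loop over glm() in base R. The sketch below is a reference implementation for checking results on small inputs, not the package's algorithm; the helper fit_one and the layout of coef_mat are choices made for this illustration, and the design is assumed to carry its own intercept column.

```r
# Reference loop in base R; illustrative only
set.seed(1)
n <- 100
p <- 5
y <- rpois(n, 10)
design <- cbind(Intercept = 1, cov1 = rnorm(n))
X <- matrix(rnorm(n * p), n, p,
            dimnames = list(NULL, paste0("feature", seq_len(p))))

# Fit y ~ design + X[, j] for a single feature j and keep the key summaries
fit_one <- function(j) {
  x <- X[, j]
  fit <- glm(y ~ 0 + design + x, family = poisson())
  s <- summary(fit)
  list(
    coef = coef(fit),
    se = s$coefficients[, "Std. Error"],
    dispersion = s$dispersion,
    rdf = df.residual(fit)
  )
}

ref <- lapply(seq_len(p), fit_one)
names(ref) <- colnames(X)

# One row per feature, one column per model term
coef_mat <- t(sapply(ref, `[[`, "coef"))
```

Each of the p models shares the same design columns and differs only in the final term x, so the loop refits the shared part every time; it is useful as a correctness check rather than as a fast alternative.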

Details

Generalized linear models can be fit with family specified as in glm(), using gaussian(), poisson(), binomial(), binomial("probit"), quasibinomial(), quasipoisson(), negative.binomial(theta), "nb", or "nb:theta". family can also be an array of entries of the form "nb:theta", where theta is the parameter of the negative binomial distribution
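
The family forms listed above can be written out as follows. This sketch uses only base R and MASS; it assumes that theta in "nb:theta" is written as a plain number after the colon (for example "nb:2"), and the helper parse_nb_theta is hypothetical, shown only to make the string format explicit.

```r
library(MASS)  # provides negative.binomial()

# Family objects, as accepted by glm()
fam_list <- list(
  gaussian(), poisson(), binomial(), binomial("probit"),
  quasibinomial(), quasipoisson(), negative.binomial(theta = 2)
)

# String form with a fixed theta, one entry per feature
theta <- c(0.5, 2, 10)                # hypothetical per-feature values
fam_strings <- paste0("nb:", theta)   # "nb:0.5" "nb:2" "nb:10"

# Hypothetical helper: recover theta from a string of the form "nb:theta"
parse_nb_theta <- function(s) as.numeric(sub("^nb:", "", s))

# Base R equivalent of a fixed-theta negative binomial fit for one feature
set.seed(1)
y <- rnbinom(100, size = 2, mu = 10)
x <- rnorm(100)
th <- parse_nb_theta("nb:2")
fit <- glm(y ~ x, family = negative.binomial(theta = th))
```

Passing a fixed theta this way skips overdispersion estimation entirely, which is the practical difference between "nb:theta" and plain "nb".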

Examples

n <- 100 # number of samples
p <- 10 # number of features
nc <- 3 # number shared covariates
set.seed(1)
y <- rpois(n, 10)
X <- cbind(1, matrix(rnorm(n * p), n, p))
colnames(X) <- seq(ncol(X))
design <- matrix(rnorm(n * nc), n, nc)

# fit regressions with model j including X[,j]
fit <- glmFitFeatures(y, design, X, "poisson")

fit
#> 		 glmFitFeatures 
#> 
#> coefs(4): V1, V2, V3, x
#> features(11): 1, 2, ..., 10, 11
#> family: poisson/log 
#> Estimated: se, dispersion, rdf, varFitted, mu_mean, y_mean 
#>