Fit GLM For Many Responses — glmFitResponses,sparseMatrix-method • GenomicDataStreamRegression

Fit regression model Y[j,] ~ design for each feature j

Usage

# S4 method for class 'sparseMatrix'
glmFitResponses(
  Y,
  design,
  family,
  weights = NULL,
  offset = NULL,
  detail = 1,
  doCoxReid = nrow(design) < 100,
  nthreads = 1,
  nReaders = 1,
  chunkSize = 1000,
  epsilon = 1e-08,
  maxit = 25,
  epsilon_nb = 1e-04,
  maxit_nb = 5,
  lambda = 0,
  verbose = TRUE,
  ...
)

# S4 method for class 'ANY'
glmFitResponses(
  Y,
  design,
  family,
  weights = NULL,
  offset = NULL,
  detail = 1,
  doCoxReid = nrow(design) < 100,
  nthreads = 1,
  nReaders = 1,
  chunkSize = 1000,
  epsilon = 1e-08,
  maxit = 25,
  epsilon_nb = 1e-04,
  maxit_nb = 5,
  lambda = 0,
  verbose = TRUE,
  ...
)

# S4 method for class 'DelayedArray'
glmFitResponses(
  Y,
  design,
  family,
  weights = NULL,
  offset = NULL,
  detail = 1,
  doCoxReid = nrow(design) < 100,
  nthreads = 1,
  nReaders = 1,
  chunkSize = 1000,
  epsilon = 1e-08,
  maxit = 25,
  epsilon_nb = 1e-04,
  maxit_nb = 5,
  lambda = 0,
  verbose = TRUE,
  ...
)

Arguments

Y: matrix of responses as __rows__
design: design matrix
family: a description of the error distribution and link function to be used in the model, just like for glm(). Also supports negative binomial as string "nb:theta", see details below
weights: vector of sample-level weights
offset: vector of sample-level offset values
detail: level of model detail returned, with LEAST = 0, LOW = 1, MEDIUM = 2, HIGH = 3, MOST = 4. LEAST (beta), LOW (beta, se, dispersion, rdf), MEDIUM (vcov), HIGH (pearson residuals), MOST (hatvalues)
doCoxReid: use Cox-Reid adjustment when estimating overdispersion for negative binomial models. Default TRUE for less than 100 samples
nthreads: number of threads. Each model is fit in serial, analysis is parallelized across responses.
nReaders: number of threads used for reading data from disk
chunkSize: number of features to read per chunk
epsilon: tolerance for GLM IRLS
maxit: max iterations for GLM IRLS
epsilon_nb: tolerance for negative binomial
maxit_nb: max iterations for negative binomial
lambda: ridge shrinkage parameter
verbose: show progress
...: other args

Value

List of parameter estimates with entries coef, se, dispersion, rdf and other depending on detail

Details

Generalized linear models can be fit with family like in glm() using gaussian(), poisson(), binomial(), binomial("probit"), quasibinomial(), quasipoisson(), negative.binomial(theta), "nb", "nb:theta". Or array of entries of form "nb:theta", where theta is the parameter for the negative binomial distribution

Examples

library(DelayedArray)
#> Loading required package: stats4
#> Loading required package: Matrix
#> Loading required package: BiocGenerics
#> Loading required package: generics
#> 
#> Attaching package: ‘generics’
#> The following objects are masked from ‘package:base’:
#> 
#>     as.difftime, as.factor, as.ordered, intersect, is.element, setdiff,
#>     setequal, union
#> 
#> Attaching package: ‘BiocGenerics’
#> The following object is masked from ‘package:GenomicDataStream’:
#> 
#>     rownames
#> The following object is masked from ‘package:BatchRegression’:
#> 
#>     rownames
#> The following objects are masked from ‘package:stats’:
#> 
#>     IQR, mad, sd, var, xtabs
#> The following objects are masked from ‘package:base’:
#> 
#>     Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
#>     as.data.frame, basename, cbind, colnames, dirname, do.call,
#>     duplicated, eval, evalq, get, grep, grepl, is.unsorted, lapply,
#>     mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
#>     rank, rbind, rownames, sapply, saveRDS, table, tapply, unique,
#>     unsplit, which.max, which.min
#> Loading required package: MatrixGenerics
#> Loading required package: matrixStats
#> 
#> Attaching package: ‘MatrixGenerics’
#> The following objects are masked from ‘package:matrixStats’:
#> 
#>     colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
#>     colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
#>     colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
#>     colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
#>     colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
#>     colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
#>     colWeightedMeans, colWeightedMedians, colWeightedSds,
#>     colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
#>     rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
#>     rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
#>     rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
#>     rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
#>     rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
#>     rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
#>     rowWeightedSds, rowWeightedVars
#> Loading required package: S4Vectors
#> 
#> Attaching package: ‘S4Vectors’
#> The following objects are masked from ‘package:Matrix’:
#> 
#>     expand, unname
#> The following object is masked from ‘package:utils’:
#> 
#>     findMatches
#> The following objects are masked from ‘package:base’:
#> 
#>     I, expand.grid, unname
#> Loading required package: IRanges
#> 
#> Attaching package: ‘IRanges’
#> The following object is masked from ‘package:nlme’:
#> 
#>     collapse
#> Loading required package: S4Arrays
#> Loading required package: abind
#> 
#> Attaching package: ‘S4Arrays’
#> The following object is masked from ‘package:abind’:
#> 
#>     abind
#> The following object is masked from ‘package:base’:
#> 
#>     rowsum
#> Loading required package: SparseArray
#> 
#> Attaching package: ‘DelayedArray’
#> The following objects are masked from ‘package:base’:
#> 
#>     apply, scale, sweep
n <- 100
m <- 5
nc <- 2
set.seed(1)
Y <- matrix(rnorm(n * m), m, n)
Y <- DelayedArray(Y)
X <- matrix(rnorm(n * nc), n, nc)
rownames(Y) <- seq(m)

# fit regressions with model j using Y[j,] as a response
fit <- glmFitResponses(Y, X, family=gaussian())

fit
#> 		 glmFitResponses 
#> 
#> coefs(2): V1, V2
#> responses(5): 1, 2, 3, 4, 5
#> family: gaussian/identity 
#> Estimated: se, dispersion, rdf, varFitted 
#>