Skip to contents

Fit regression model y ~ design + X_features[,j] for each feature j

Usage

# S4 method for class 'ANY,ANY,GenomicDataStream'
lmmFitFeatures(
  y,
  design,
  data,
  U,
  s,
  weights = NULL,
  dcmpMethod = c("general", "categorical"),
  REML = TRUE,
  delta = NULL,
  delta.range = c(-10, 10),
  tol = 1e-06,
  detail = 1,
  lambda = 0,
  nthreads = 1,
  verbose = TRUE,
  ...
)

Arguments

y

response vector

design

design matrix shared across all models

data

feature matrix with model j using feature j

U

eigen-vectors of random effect

s

eigen-values of random effect

weights

sample-level weights

dcmpMethod

use a "general" method (default) for SVD of Z. If Z is a categorical design matrix, used faster method "categorical"

REML

logical scalar - Should the estimates be chosen to optimize the REML criterion vs ML?

delta

if NULL estimate delta, if value is given use this fixed value

delta.range

min and max values (in log space), of the search space for delta to fit the random effect

tol

convergence criterion for the 1D search of the delta space

detail

level of model detail returned, with LEAST = 0, LOW = 1, MEDIUM = 2, HIGH = 3, MOST = 4, MAX = 5. LEAST (beta), LOW (beta, se, sigSq, rdf), MEDIUM (vcov), HIGH (residuals), MOST (hatvalues), MAX (deviance residuals)

lambda

ridge shrinkage parameter

nthreads

number of threads. Each model is fit in serial, analysis is parallelized across features

verbose

show progress

...

other args

Value

List of parameter estimates with entries coef, se, dispersion, rdf and other depending on detail

Examples

library(GenomicDataStream)

# create response, design and weights
y <- rnorm(60)
names(y) <- paste0("I", seq(60))
info <- data.frame(Age = rpois(60, 40))
rownames(info) <- names(y)

design <- model.matrix(~ Age, info)
w <- rep(1, 60)

# VCF file
file <- system.file("extdata", "test.vcf.gz", package = "GenomicDataStream")

# Read data into R
# then run lmFitFeatures()
gds <- GenomicDataStream(file, "DS", initialize = TRUE)
dat <- getNextChunk(gds)

res1 <- lmFitFeatures(y, design, dat$X, w)

res1
#> 		 lmFitFeatures 
#> 
#> coefs(1): x
#> features(10): 1:10000:C:A, 1:11000:T:C, ..., 1:18000:C:G, 1:19000:T:G
#> family: gaussian/identity 
#> Estimated: se, dispersion, rdf 
#> 

# Data stays at C++ level
# then run lmFitFeatures()
gds <- GenomicDataStream(file, "DS")

res2 <- lmFitFeatures(y, design, gds, w)
#> preprojection: 1

res2
#> 		 lmFitFeatures 
#> 
#> coefs(1): x
#> features(10): 1:10000:C:A, 1:11000:T:C, ..., 1:18000:C:G, 1:19000:T:G
#> family: gaussian/identity 
#> Estimated: se, dispersion, rdf 
#>