Fit a linear mixed model for differential expression and perform hypothesis tests on fixed effects as specified in the contrast matrix L.

Usage

dream(
  exprObj,
  formula,
  data,
  L,
  ddf = c("adaptive", "Satterthwaite", "Kenward-Roger"),
  useWeights = TRUE,
  weightsMatrix = NULL,
  control = vpcontrol,
  suppressWarnings = FALSE,
  quiet = FALSE,
  BPPARAM = SerialParam(),
  computeResiduals = TRUE,
  REML = TRUE,
  ...
)

Arguments

exprObj

matrix of expression data (g genes x n samples), or ExpressionSet, or EList returned by voom() from the limma package

formula

specifies variables for the linear (mixed) model. Must only specify covariates, since the rows of exprObj are automatically used as the response, e.g. ~ a + b + (1|c). Formulas with only fixed effects also work; in that case lmFit() followed by contrasts.fit() is run.

data

data.frame with columns corresponding to formula

L

contrast matrix specifying a linear combination of fixed effects to test

ddf

Specify the "Satterthwaite" or "Kenward-Roger" method to estimate the effective degrees of freedom for hypothesis testing in the linear mixed model. Kenward-Roger is more accurate, but is *much* slower. Satterthwaite is a good enough approximation for most datasets. "adaptive" (the default) uses Kenward-Roger for <= 10 samples and Satterthwaite otherwise.

useWeights

if TRUE, analysis uses heteroskedastic error estimates from voom(). Value is ignored unless exprObj is an EList from voom() or weightsMatrix is specified

weightsMatrix

matrix the same dimension as exprObj with observation-level weights from voom(). Used only if useWeights is TRUE

control

control settings for lmer()

suppressWarnings

if TRUE, do not stop because of warnings or errors in model fit

quiet

if TRUE, suppress messages. Default is FALSE

BPPARAM

parameters for parallel evaluation

computeResiduals

if TRUE, compute residuals and extract with residuals(fit). Setting to FALSE saves memory

REML

use restricted maximum likelihood to fit linear mixed model. default is TRUE. See Details.

...

Additional arguments for lmer() or lm()
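
As a rough illustration of what the contrast matrix L encodes, the sketch below builds a one-column contrast by hand for a fixed-effects-only model using base R. The data frame `toy`, the simulated response `y`, and the name `L_manual` are made up for this illustration, and the layout (one row per fixed-effect coefficient, one column per contrast) is an assumption about the expected shape. In practice makeContrastsDream() (see Examples) constructs L from the formula.

```r
# Hypothetical toy metadata: 4 batches with 3 samples each
set.seed(1)
toy <- data.frame(Batch = factor(rep(1:4, each = 3)))
y <- rnorm(nrow(toy))  # simulated expression values for a single gene

# Names of the fixed-effect coefficients implied by the formula
coef_names <- colnames(model.matrix(~ Batch, toy))

# One row per coefficient, one column per contrast
L_manual <- matrix(0,
  nrow = length(coef_names), ncol = 1,
  dimnames = list(coef_names, "Batch3 - Batch2")
)
L_manual["Batch3", 1] <- 1
L_manual["Batch2", 1] <- -1

# The tested quantity is the linear combination t(L) %*% beta
beta <- coef(lm(y ~ Batch, data = toy))
contrast_estimate <- drop(crossprod(L_manual, beta))
```

Here `contrast_estimate` equals the Batch3 coefficient minus the Batch2 coefficient, which is the quantity a contrast named "Batch3 - Batch2" tests against zero.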

Value

MArrayLM2 object (just like MArrayLM from limma) that also contains the directly estimated p-values (without eBayes)

Details

A linear (mixed) model is fit for each gene in exprObj, using formula to specify variables in the regression (Hoffman and Roussos, 2021). If categorical variables are modeled as random effects (as is recommended), then a linear mixed model is used. For example, if formula is ~ a + b + (1|c), then the model is

fit <- lmer( exprObj[j,] ~ a + b + (1|c), data=data)

useWeights=TRUE causes weightsMatrix[j,] to be included as weights in the regression model.
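
To make the role of the weights concrete, here is a minimal sketch of the fixed-effects analogue in base R, where row j of a weights matrix supplies the observation-level weights for gene j. The objects `toy`, `expr`, and `w` are simulated for illustration, and lm() stands in for lmer(); this is not dream's internal code.

```r
set.seed(1)
n_genes <- 3
n_samples <- 12

# Hypothetical metadata, expression matrix and weights matrix
toy <- data.frame(Batch = factor(rep(1:4, each = 3)))
expr <- matrix(rnorm(n_genes * n_samples), n_genes, n_samples,
  dimnames = list(paste0("gene", seq_len(n_genes)), NULL)
)
w <- matrix(runif(n_genes * n_samples, 0.5, 2), n_genes, n_samples)

# Fit one weighted regression per gene: row j of expr is the response,
# row j of w gives the weights for that gene's samples
fits <- lapply(seq_len(n_genes), function(j) {
  d <- cbind(toy, y = expr[j, ], wt = w[j, ])
  lm(y ~ Batch, data = d, weights = wt)
})
names(fits) <- rownames(expr)

# Genes x coefficients matrix of estimates
coefs <- t(sapply(fits, coef))
```

Each gene gets its own weights, so samples measured less precisely for one gene can still carry full weight for another.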

Note: Fitting the model for 20,000 genes can be computationally intensive. To accelerate computation, models can be fit in parallel using BiocParallel. The parallel backend must be set up before calling this function and passed via BPPARAM, as shown in the Examples.

The regression model is fit for each gene separately. Samples with missing values in either gene expression or metadata are omitted by the underlying call to lmer.

Hypothesis tests and degrees of freedom are produced by the lmerTest and pbkrtest packages.

While REML=TRUE is required by lmerTest when ddf='Kenward-Roger', ddf='Satterthwaite' can be used with REML as TRUE or FALSE. Since the Kenward-Roger method gave the best power with accurate control of the false positive rate in our simulations, and since the Satterthwaite method with REML=TRUE gives p-values that are slightly closer to the Kenward-Roger p-values, REML=TRUE is the default. See the vignette "3) Theory and practice of random effects and REML".
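
The combinations of ddf and REML described above can be sketched as follows, using the simulated varPartData dataset from the Examples. The object names `fit_sat`, `fit_kr`, `fit_ml`, and `pvals` are illustrative, and reading p-values from a `p.value` matrix on the returned object is an assumption based on the MArrayLM layout from limma.

```r
library(variancePartition)

# Simulated data shipped with the package: geneExpr and info
data(varPartData)
form <- ~ Batch + (1 | Individual) + (1 | Tissue)

# Satterthwaite approximation with the default REML = TRUE
fit_sat <- dream(geneExpr[1:10, ], form, info, ddf = "Satterthwaite")

# Kenward-Roger: slower, and requires REML = TRUE
fit_kr <- dream(geneExpr[1:10, ], form, info,
  ddf = "Kenward-Roger", REML = TRUE
)

# Satterthwaite can also be combined with maximum likelihood fits
fit_ml <- dream(geneExpr[1:10, ], form, info,
  ddf = "Satterthwaite", REML = FALSE
)

# Side-by-side p-values for one fixed-effect coefficient
# (assumes a p.value matrix with one column per coefficient)
pvals <- cbind(
  Satterthwaite = fit_sat$p.value[, "Batch2"],
  KenwardRoger = fit_kr$p.value[, "Batch2"],
  Satterthwaite_ML = fit_ml$p.value[, "Batch2"]
)
```

Comparing the columns of `pvals` on a handful of genes is a quick way to judge whether the faster Satterthwaite approximation is adequate for a given dataset before fitting all genes.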

References

Hoffman GE, Roussos P (2021). “dream: Powerful differential expression analysis for repeated measures designs.” Bioinformatics, 37(2), 192-201.

Examples

# library(variancePartition)

library(BiocParallel)

# load simulated data:
# geneExpr: matrix of gene expression values
# info: information/metadata about each sample
data(varPartData)

form <- ~ Batch + (1|Individual) + (1|Tissue) 

# Fit linear mixed model for each gene
# run on just 10 genes for time
fit = dream( geneExpr[1:10,], form, info)
#> Dividing work into 1 chunks...
#> 
#> Total:0.7 s
fit = eBayes(fit)

# view top genes
topTable( fit )
#> Removing intercept from test coefficients
#>             Batch2       Batch3      Batch4     AveExpr          F    P.Value
#> gene6  -0.67049689 -1.034776107 -1.04729128  -3.1554790 2.26933190 0.08279052
#> gene8   0.02567263 -0.129563777  0.67499421   0.9171386 2.00798682 0.11524431
#> gene3   0.08200976 -0.172582199 -0.49673001   0.1702122 1.25744320 0.29114590
#> gene1  -0.46092425 -0.453624935 -0.05771812 -10.4664549 0.96788135 0.40960507
#> gene7  -0.31606553  0.149452306 -0.03562447  -4.3799381 0.74684626 0.52578552
#> gene2  -0.49420745 -0.382210977 -0.40013018  -1.1281610 0.60798371 0.61080712
#> gene5   0.26795357  0.006060832  0.27368297   4.7187640 0.35414851 0.78620137
#> gene9  -0.35301416 -0.096570091 -0.12646906  -2.3079042 0.33697299 0.79862047
#> gene10  0.11578714 -0.098856729  0.14565869  -2.3673775 0.23126959 0.87449591
#> gene4   0.11531023  0.122855760  0.13721184  -4.5359748 0.04827941 0.98590284
#>        adj.P.Val      F.std
#> gene6  0.5762215 2.22700623
#> gene8  0.5762215 1.97548090
#> gene3  0.9704863 1.24609722
#> gene1  0.9716621 0.96185982
#> gene7  0.9716621 0.74380775
#> gene2  0.9716621 0.60633634
#> gene5  0.9716621 0.35407317
#> gene9  0.9716621 0.33695844
#> gene10 0.9716621 0.23150131
#> gene4  0.9859028 0.04841547

# get contrast matrix testing if the coefficient for Batch3 is 
# different from coefficient for Batch2
# The variable of interest must be a fixed effect
L = makeContrastsDream(form, info, contrasts=c("Batch3 - Batch2"))

# plot contrasts
plotContrasts( L )


# Fit linear mixed model for each gene
# run on just 10 genes for time
fit2 = dream( geneExpr[1:10,], form, info, L)
#> Dividing work into 1 chunks...
#> 
#> Total:0.7 s
fit2 = eBayes(fit2)

# view top genes
topTable( fit2, coef="Batch3 - Batch2" )
#>               logFC     AveExpr           t   P.Value adj.P.Val       z.std
#> gene7   0.465517838  -4.3799381  1.39780930 0.1665203 0.8766243  1.38347152
#> gene6  -0.364279218  -3.1554790 -0.93401216 0.3534470 0.8766243 -0.92792405
#> gene3  -0.254591961   0.1702122 -0.87057841 0.3869226 0.8766243 -0.86521177
#> gene10 -0.214643873  -2.3673775 -0.75005983 0.4556964 0.8766243 -0.74595194
#> gene5  -0.261892737   4.7187640 -0.73742122 0.4632915 0.8766243 -0.73343765
#> gene9   0.256444065  -2.3079042  0.63729350 0.5259746 0.8766243  0.63416279
#> gene8  -0.155236404   0.9171386 -0.45917626 0.6475090 0.9106411 -0.45722551
#> gene2   0.111996470  -1.1281610  0.34847719 0.7285129 0.9106411  0.34710440
#> gene4   0.007545528  -4.5359748  0.02528454 0.9798987 0.9813892  0.02519588
#> gene1   0.007299316 -10.4664549  0.02340914 0.9813892 0.9813892  0.02332734

# Parallel processing using multiple cores with reduced memory usage
param = SnowParam(4, "SOCK", progressbar=TRUE)
fit3 = dream( geneExpr[1:10,], form, info, L, BPPARAM = param)
#> Dividing work into 1 chunks...
#> iteration: 
#> 1
#> 
#> 
#> Total:4 s
fit3 = eBayes(fit3)

# Fit fixed effect model for each gene
# Use lmFit in the backend
form <- ~ Batch 
fit4 = dream( geneExpr[1:10,], form, info, L)
#> Fixed effect model, using limma directly...
#> User can apply eBayes() afterwards...
fit4 = eBayes( fit4 )

# view top genes
topTable( fit4, coef="Batch3 - Batch2" )
#>              logFC     AveExpr           t    P.Value adj.P.Val         B
#> gene8  -1.83567724   0.9171386 -1.87030676 0.06312128 0.6312128 -4.581611
#> gene4  -1.01697419  -4.5359748 -1.17835890 0.24026308 0.9496566 -4.593019
#> gene9  -0.71148095  -2.3079042 -0.69240222 0.48960809 0.9496566 -4.598021
#> gene6  -0.66356866  -3.1554790 -0.64362644 0.52066540 0.9496566 -4.598381
#> gene7   0.59150199  -4.3799381  0.62693912 0.53152116 0.9496566 -4.598498
#> gene5   0.55374399   4.7187640  0.56590215 0.57218963 0.9496566 -4.598902
#> gene3   0.32421597   0.1702122  0.34561266 0.73005146 0.9496566 -4.600015
#> gene10  0.18733363  -2.3673775  0.21296925 0.83160034 0.9496566 -4.600427
#> gene2   0.10045251  -1.1281610  0.10985054 0.91265454 0.9496566 -4.600612
#> gene1  -0.05984159 -10.4664549 -0.06322915 0.94965664 0.9496566 -4.600657

# Compute residuals using dream
residuals(fit4)[1:4, 1:4]
#>             s1         s2         s3        s4
#> gene1 2.196588 -7.2102826  1.3618545 0.8055865
#> gene2 1.270341  0.6885371 -0.8045642 2.0594469
#> gene3 1.939465  0.8003329 -1.7723606 3.0120196
#> gene4 2.766271 -2.4954702 -3.5198043 5.2286010