Skip to contents

Here we perform analysis of PBMCs from 8 individuals stimulated with interferon-β (Kang, et al, 2018, Nature Biotech). This is a small dataset, but it gives us an opportunity to walk through a standard lucida workflow modelling repeated measures in single cell study.

Load single cell count data

Here, single cell RNA-seq data is downloaded from ExperimentHub.

library(lucida)
library(SingleCellExperiment)
library(ExperimentHub)
library(scater)
 
# Download data, specifying EH2259 for the Kang, et al study
eh <- ExperimentHub()
sce <- eh[["EH2259"]]

# Set variables to have clear names
sce$Individual <- as.character(sce$ind)
sce$Status <- factor(sce$stim, c("ctrl", "stim"))
sce$CellType <- factor(sce$cell)

# only keep singlet cells with sufficient reads
sce <- sce[rowSums(counts(sce) > 0) > 0, ]
sce <- sce[, colData(sce)$multiplets == "singlet"]

# compute QC metrics
qc <- perCellQCMetrics(sce)

# remove cells with few or many detected genes
ol <- isOutlier(metric = qc$detected, nmads = 2, log = TRUE)
sce <- sce[, !ol]

# Compute library size for each cell
sce$libSize = colSums(counts(sce))

Run lucida

lucida fits regression models for each gene and cell type in order to identify genes showing a mean change in expression between stimulated and unstimulated conditions. The variable Status has levels stim for simulated and ctrl for controls. Since many cells are observed for each individual, we include Individual as a random effect to acount for the repeated measures. We only included Status and Individual in this model, but analyses of larger datasets can include additional biological or technical covariates.

The formula ~ Status + (1|Individual) fits an intercept term, a coefficient Statusstim for the simulated condition compared to the baseline ctrl, and models Individual as a random effect. "CellType" indicates the variable storing the cell type identifer, so analysis is peformed for each cell type. These are variables are all stored in as columns in colData(sce).

fit <- lucida(sce, ~ Status + (1|Individual), "CellType")

Printing out the model fit, we see that 2 coefficients (i.e. (Intercept), and Statusstim) were estimated across 8 cell clusters using a negative binomial mixed model (NBMM) with dispersion shrinkage. A total of 9440 models were fit successfully across the cell types, with the remaining genes having insufficient coverage.

fit
##      lucida analysis
## 
## Formula: ~ Status + (1 | Individual) 
## coefs(2): (Intercept), Statusstim
## Cell clusters(8): CD14+ Monocytes, CD4 T cells, ..., FCGR3A+ Monocytes,
##   Dendritic cells
## Method: NBMM with dispersion shrinkage
## Models fit: 9440

Extract results

The results of the model fits for each cell type and gene can be extacted by specifying the coefficient "Statusstim". The results are stored as a tibble to enable filtering downstream. The default call uses method = "classic" to return the maximum likelihood estimates of the log fold changes (logFC) and uses the Benjamini-Hochberg (BH) procedure for multiple testing correction.

results(fit, coef = "Statusstim")
## # A tibble: 9,440 × 7
##    cluster_id      ID       AveExpr logFC     se P.Value   FDR
##    <chr>           <chr>      <dbl> <dbl>  <dbl>   <dbl> <dbl>
##  1 CD14+ Monocytes ISG15      55.3   4.62 0.0775       0     0
##  2 CD14+ Monocytes IFI6        3.55  2.68 0.0433       0     0
##  3 CD14+ Monocytes MARCKSL1    2.11 -1.46 0.0365       0     0
##  4 CD14+ Monocytes GBP1        1.72  1.93 0.0479       0     0
##  5 CD14+ Monocytes RSAD2       4.47  4.79 0.123        0     0
##  6 CD14+ Monocytes TMSB10      7.61  1.22 0.0182       0     0
##  7 CD14+ Monocytes RPL15       3.78 -1.01 0.0236       0     0
##  8 CD14+ Monocytes PLSCR1      2.11  1.68 0.0338       0     0
##  9 CD14+ Monocytes TNFSF10     4.51  4.90 0.123        0     0
## 10 CD14+ Monocytes FAM26F      1.71  3.28 0.0790       0     0
## # ℹ 9,430 more rows

Plots

lucida has a number of built-in plots for diagnostics and visualization.

Dispersion estimates

Plot the relationship between the dispersion and the mean of normalizd counts like from DESeq2 (Love, et al. 2014). Here, the y-axis plots 1/θ1/\theta where θ\theta is the overdispersion parameter from the negative binomial model. Black points indicate the maximum likelihood estimate for each gene, red indicates the smoothed trend line, and blue points indicate the values after shrinkage. Grey points indicate dispersion outliers excluded from the trend.

Effect size vs average expression

Plot the relationship between the estimated effect size and mean normalized counts. Genes with low expression have more stochastic variation due to sampling low counts, so their effect size estimates show high variance. The effect sizes should be centered arount zero and the variance should decrease for genes with high counts. Every point is a gene and, genes passing the multiple testing correction are shown in red.

plotMA(fit, coef='Statusstim', ymax=6)

Volcano plot

The volcano plot can indicate the strength of the differential expression signal with each cell type. Genes passing the multiple testing correction are shown in red.

plotVolcano(fit, coef='Statusstim')

Variance partitioning

Variance partitioning analysis uses linear or linear mixed models to quanify the contribution of multiple sources of expression variation at the gene-level. For each gene it fits a linear (mixed) model and evalutes the fraction of expression variation explained by each variable. Variance fractions can be visualized at the gene-level for each cell type using a bar plot, or genome-wide using a violin plot.

In the case of the NBMM, estimating the variance fractions requires fitting the model again using only an intercept term and the random effect.

Violin plots

fit.null <- lucida(sce, ~ (1|Individual), "CellType")

vp <- fitVarPart(fit, fit.null)

plotVarPart(sortCols(vp))

Bar plots

Show the fraction of variance explained by each variable for a subset of genes.

# Bar plots of a subset of genes
library(dplyr)
vp %>%
  sortCols %>%
  filter(Gene %in% c('ISG15', 'C19orf10')) %>%
  plotPercentBars

PCA on full dataset

The lucida workflow considers raw counts using a negative bininomial model. lucida can also transform the counts for use in downstream analysis like clustering and PCA. Like sctransform (Choudhary and Satija, 2022), lucida can fit a negative binomial model for each gene across all cell types. A variance stabilizing transform extract residuals from the model fit to be used for downstream analysis.

Here, we fit lucida(), run vst(), perform PCA using irlba() and plot the results. We see the separatioin of cell types and stimulation status.

# run lucida with no formula
fit.all <- lucida(sce, ~ 1)

# Variance Stabilizing Transform 
# using residuals from model fit
res.vst <- vst( fit.all )

library(irlba)
pca <- irlba(res.vst)

pca$v[,1:3] %>%
  data.frame(., CellType = sce$cell, Status = sce$Status) %>%  
  ggplot(aes(X1, X2, color=CellType, shape = Status)) +
    geom_point() +
    theme_classic() +
    theme(aspect.ratio=1,
          plot.title = element_text(hjust = 0.5)) +
    xlab("PC 1") + 
    ylab("PC 2")

Bayesian post-processing

Setting method = "Bayesian" performs effect size shrinkage and multiple testing correction using the empirical Bayes approach of Stephens (2017) within the ashr package. This reports the local false discovery rate (lFDR) and the local false sign rate (lFSR). lFDR is the probability that the true effect size is zero. lFSR is more conservative and is the estimated probability that the true effect size is has the opposite sign or is zero.

Extract results

results(fit, coef = "Statusstim", method = "Bayesian")
## # A tibble: 9,440 × 7
##    cluster_id      ID       AveExpr logFC     se  lFDR  lFSR
##    <chr>           <chr>      <dbl> <dbl>  <dbl> <dbl> <dbl>
##  1 CD14+ Monocytes ISG15      55.3   4.62 0.0775     0     0
##  2 CD14+ Monocytes IFI6        3.55  2.68 0.0433     0     0
##  3 CD14+ Monocytes MARCKSL1    2.11 -1.46 0.0365     0     0
##  4 CD14+ Monocytes GBP1        1.72  1.93 0.0479     0     0
##  5 CD14+ Monocytes RSAD2       4.47  4.76 0.101      0     0
##  6 CD14+ Monocytes TMSB10      7.61  1.22 0.0167     0     0
##  7 CD14+ Monocytes RPL15       3.78 -1.01 0.0236     0     0
##  8 CD14+ Monocytes PLSCR1      2.11  1.68 0.0326     0     0
##  9 CD14+ Monocytes TNFSF10     4.51  4.83 0.0883     0     0
## 10 CD14+ Monocytes FAM26F      1.71  3.28 0.0781     0     0
## # ℹ 9,430 more rows

In addition, plotMA() and plotVolcano() can be run with results from Bayesian post-processing.

Effect size vs average expression

plotMA(fit, coef='Statusstim', method = "Bayesian")

Volcano plot

plotVolcano(fit, coef='Statusstim', method = "Bayesian")

Session info
## R version 4.5.1 (2025-06-13)
## Platform: aarch64-apple-darwin23.6.0
## Running under: macOS Sonoma 14.7.1
## 
## Matrix products: default
## BLAS/LAPACK: /opt/homebrew/Cellar/openblas/0.3.30/lib/libopenblasp-r0.3.30.dylib;  LAPACK version 3.12.0
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: America/New_York
## tzcode source: internal
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] irlba_2.3.5.1               Matrix_1.7-4               
##  [3] dplyr_1.1.4                 muscData_1.24.0            
##  [5] scater_1.38.0               ggplot2_4.0.1              
##  [7] scuttle_1.20.0              ExperimentHub_3.0.0        
##  [9] AnnotationHub_4.0.0         BiocFileCache_3.0.0        
## [11] dbplyr_2.5.1                SingleCellExperiment_1.32.0
## [13] SummarizedExperiment_1.40.0 Biobase_2.70.0             
## [15] GenomicRanges_1.62.1        Seqinfo_1.0.0              
## [17] IRanges_2.44.0              S4Vectors_0.48.0           
## [19] BiocGenerics_0.56.0         generics_0.1.4             
## [21] MatrixGenerics_1.22.0       matrixStats_1.5.0          
## [23] lucida_0.0.2               
## 
## loaded via a namespace (and not attached):
##   [1] RColorBrewer_1.1-3                jsonlite_2.0.0                   
##   [3] magrittr_2.0.4                    ggbeeswarm_0.7.3                 
##   [5] farver_2.1.2                      nloptr_2.2.1                     
##   [7] rmarkdown_2.30                    fs_1.6.6                         
##   [9] ragg_1.5.0                        vctrs_0.6.5                      
##  [11] memoise_2.0.1                     minqa_1.2.8                      
##  [13] SQUAREM_2021.1                    mixsqp_0.3-54                    
##  [15] htmltools_0.5.9                   S4Arrays_1.10.1                  
##  [17] progress_1.2.3                    curl_7.0.0                       
##  [19] BiocNeighbors_2.4.0               truncnorm_1.0-9                  
##  [21] Rhdf5lib_1.32.0                   SparseArray_1.10.8               
##  [23] Formula_1.2-5                     rhdf5_2.54.1                     
##  [25] sass_0.4.10                       parallelly_1.46.0                
##  [27] bslib_0.9.0                       htmlwidgets_1.6.4                
##  [29] desc_1.4.3                        httr2_1.2.2                      
##  [31] cachem_1.1.0                      lifecycle_1.0.4                  
##  [33] pkgconfig_2.0.3                   rsvd_1.0.5                       
##  [35] R6_2.6.1                          fastmap_1.2.0                    
##  [37] anndataR_1.1.0                    rbibutils_2.4                    
##  [39] digest_0.6.39                     AnnotationDbi_1.72.0             
##  [41] textshaping_1.0.4                 RSQLite_2.4.5                    
##  [43] beachmat_2.26.0                   labeling_0.4.3                   
##  [45] invgamma_1.2                      filelock_1.0.3                   
##  [47] httr_1.4.7                        abind_1.4-8                      
##  [49] compiler_4.5.1                    withr_3.0.2                      
##  [51] bit64_4.6.0-1                     S7_0.2.1                         
##  [53] BiocParallel_1.44.0               viridis_0.6.5                    
##  [55] carData_3.0-5                     DBI_1.2.3                        
##  [57] HDF5Array_1.38.0                  MASS_7.3-65                      
##  [59] rappdirs_0.3.3                    DelayedArray_0.36.0              
##  [61] tools_4.5.1                       vipor_0.4.7                      
##  [63] otel_0.2.0                        beeswarm_0.4.0                   
##  [65] glmGamPoi_1.22.0                  glue_1.8.0                       
##  [67] h5mread_1.2.1                     nlme_3.1-168                     
##  [69] rhdf5filters_1.22.0               grid_4.5.1                       
##  [71] gtable_0.3.6                      tidyr_1.3.2                      
##  [73] hms_1.1.4                         utf8_1.2.6                       
##  [75] ScaledMatrix_1.18.0               BiocSingular_1.26.1              
##  [77] car_3.1-3                         XVector_0.50.0                   
##  [79] ggrepel_0.9.6                     BiocVersion_3.22.0               
##  [81] pillar_1.11.1                     stringr_1.6.0                    
##  [83] limma_3.66.0                      GenomicDataStreamRegression_0.0.4
##  [85] GenomicDataStream_0.0.67          splines_4.5.1                    
##  [87] lattice_0.22-7                    bit_4.6.0                        
##  [89] tidyselect_1.2.1                  locfit_1.5-9.12                  
##  [91] Biostrings_2.78.0                 knitr_1.51                       
##  [93] gridExtra_2.3                     reformulas_0.4.3                 
##  [95] edgeR_4.8.2                       xfun_0.55                        
##  [97] statmod_1.5.1                     stringi_1.8.7                    
##  [99] yaml_2.3.12                       boot_1.3-32                      
## [101] codetools_0.2-20                  evaluate_1.0.5                   
## [103] tibble_3.3.0                      BiocManager_1.30.27              
## [105] cli_3.6.5                         RcppParallel_5.1.11-1            
## [107] reticulate_1.44.1                 systemfonts_1.3.1                
## [109] Rdpack_2.6.4                      jquerylib_0.1.4                  
## [111] BatchRegression_0.0.14            dichromat_2.0-0.1                
## [113] Rcpp_1.1.0                        png_0.1-8                        
## [115] parallel_4.5.1                    pkgdown_2.2.0                    
## [117] blob_1.2.4                        fastglmm_0.3.8                   
## [119] prettyunits_1.2.0                 beachmat.hdf5_1.8.0              
## [121] lme4_2.0-0                        viridisLite_0.4.2                
## [123] scales_1.4.0                      purrr_1.2.0                      
## [125] crayon_1.5.3                      fANCOVA_0.6-1                    
## [127] rlang_1.1.6                       ashr_2.2-63                      
## [129] KEGGREST_1.50.0