Compute precision weights for pseudobulk

Compute precision weights for pseudobulk. The delta method approximates the variance of the log2 counts per million, accounting for variation in the number of cells and for gene expression variance across cells within each sample. By default, the number of cells is used; if specified, the delta method is used. Note that processAssays() uses the number of cells as weights when no weights are specified.
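As a rough illustration of the delta method mentioned above: for a positive quantity with mean mu and variance v, a first-order expansion gives Var(log2(X)) ≈ v / (mu * log(2))^2. The sketch below applies this to the mean of per-cell values for one gene in one sample. The helper name `deltaVarLog2` and the toy numbers are assumptions made for illustration; this is not the package's internal implementation, which works on counts per million and may differ in detail.

```r
# Hypothetical helper (not part of the package): first-order delta method
# approximation to the variance of log2 of the mean of per-cell values.
deltaVarLog2 <- function(x) {
  n <- length(x)
  mu <- mean(x)
  # variance of the sample mean shrinks with the number of cells
  v_mean <- var(x) / n
  # d/dmu log2(mu) = 1 / (mu * log(2)), so
  # Var(log2(mean)) is approximately v_mean / (mu * log(2))^2
  v_mean / (mu * log(2))^2
}

# Toy per-cell values for one gene in one sample (made-up numbers)
x <- c(4, 6, 5, 7, 3, 5)
v <- deltaVarLog2(x)

# A precision weight is the reciprocal of the approximate variance
w <- 1 / v
```

Under this approximation, a sample with more cells or lower cell-to-cell variance gets a smaller variance and therefore a larger weight.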

Usage

pbWeights(
  sce,
  sample_id,
  cluster_id,
  geneList = NULL,
  method = c("delta", "ncells"),
  shrink = TRUE,
  prior.count = 0.5,
  maxRatio = 20,
  h5adBlockSizes = 1e+09,
  details = FALSE,
  verbose = TRUE
)

Arguments

sce: SingleCellExperiment where counts(sce) stores the raw count data at the single cell level
sample_id: character string specifying which variable to use as sample id
cluster_id: character string specifying which variable to use as cluster id
geneList: list of genes to be included for each cell type
method: select the method used to compute precision weights. 'delta' uses the delta method based on a normal approximation to a negative binomial model; it is slower but can increase power. 'ncells' uses the number of cells, which is faster; subsequent arguments are ignored. Included for testing
shrink: Defaults to TRUE. Use empirical Bayes variance shrinkage from limma to shrink estimates of expression variance across cells within each sample
prior.count: Defaults to 0.5. Count added to each observation at the pseudobulk level. This is scaled by the number of cells before being added at the cell level
maxRatio: When computing precision as the reciprocal of variance, 1/(x+tau), select tau so that the ratio between the largest and smallest precision does not exceed this maximum
h5adBlockSizes: set the automatic block size (in bytes) for DelayedArray to read an H5AD file. Larger values use more memory but are faster.
details: include data.frame of cell-level statistics as attr(., "details")
verbose: Show messages, defaults to TRUE
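To make the role of maxRatio more concrete, here is a minimal sketch of one way tau could be chosen. With precision 1/(x+tau), the ratio of the largest to the smallest precision is (max(x)+tau)/(min(x)+tau); setting this equal to maxRatio and solving gives tau = (max(x) - maxRatio*min(x)) / (maxRatio - 1), truncated at zero. The helper name `chooseTau` and the example variances are hypothetical, and the package's own selection of tau may differ.

```r
# Hypothetical helper (not the package's code): pick the smallest tau >= 0
# such that max(1/(x+tau)) / min(1/(x+tau)) <= maxRatio.
chooseTau <- function(x, maxRatio = 20) {
  stopifnot(all(x > 0), maxRatio > 1)
  # the ratio of precisions is (max(x) + tau) / (min(x) + tau);
  # set it equal to maxRatio and solve for tau
  tau <- (max(x) - maxRatio * min(x)) / (maxRatio - 1)
  # no offset is needed if the ratio is already within bounds
  max(tau, 0)
}

x <- c(0.01, 0.5, 2, 10) # made-up variances
tau <- chooseTau(x, maxRatio = 20)
prec <- 1 / (x + tau)

# ratio between the largest and smallest precision after adding tau
ratio <- max(prec) / min(prec)
```

Bounding the ratio this way keeps a few very low-variance observations from dominating the weighted fit.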

Examples

library(muscat)
library(dreamlet)

data(example_sce)

# create pseudobulk for each sample and cell cluster
pb <- aggregateToPseudoBulk(example_sce,
  assay = "counts",
  sample_id = "sample_id",
  cluster_id = "cluster_id",
  verbose = FALSE
) 

# Get expressed genes for each cell type
geneList <- getExprGeneNames(pb)

# Create precision weights for pseudobulk
# By default, weights are set to cell count,
# which is also what processAssays() uses
# when no weights are specified
weightsList <- pbWeights(example_sce,
  sample_id = "sample_id",
  cluster_id = "cluster_id",
  geneList = geneList
)
#> Processing: B cells
#>   Computing library sizes...
#>   Processing samples...
#> Processing: CD14+ Monocytes
#>   Computing library sizes...
#>   Processing samples...
#> Processing: CD4 T cells
#>   Computing library sizes...
#>   Processing samples...
#> Processing: CD8 T cells
#>   Computing library sizes...
#>   Processing samples...
#> Processing: FCGR3A+ Monocytes
#>   Computing library sizes...
#>   Processing samples...

# voom-style normalization using initial weights
res.proc <- processAssays(pb, ~group_id, weightsList = weightsList)
#>   B cells...
#> 0.28 secs
#>   CD14+ Monocytes...
#> 0.38 secs
#>   CD4 T cells...
#> 0.3 secs
#>   CD8 T cells...
#> 0.2 secs
#>   FCGR3A+ Monocytes...
#> 0.4 secs