The sccomp_estimate function fits a linear model to a table of cell counts or proportions. The table must include a cell-group identifier, a sample identifier, an abundance column (counts or proportions), and factor columns (continuous or discrete). The user defines the linear model with an R formula, in which the first factor is the factor of interest. Alternatively, sccomp accepts single-cell data containers (e.g., Seurat, SingleCellExperiment, cell metadata, or group-size) and derives the count data from the cell metadata.
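
To make the expected input shape concrete, the following base-R sketch tallies per-cell metadata into one row per sample and cell group. It only illustrates the kind of table described above; it is not sccomp's internal code, and the data frame `cell_metadata` and its values are made up for illustration.

```r
# Hypothetical per-cell metadata: one row per cell (values are made up).
cell_metadata <- data.frame(
  sample     = c("s1", "s1", "s1", "s2", "s2", "s2", "s2"),
  cell_group = c("B", "T", "T", "B", "B", "T", "NK"),
  stringsAsFactors = FALSE
)

# Tally cells per sample and cell group.
# table() also emits the sample/cell-group pairs that have zero cells.
counts_long <- as.data.frame(
  table(sample = cell_metadata$sample, cell_group = cell_metadata$cell_group),
  responseName = "count",
  stringsAsFactors = FALSE
)

counts_long
```

The result has the columns sample, cell_group, and count, which correspond to the .sample, .cell_group, and .abundance arguments.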

sccomp_estimate(
  .data,
  formula_composition = ~1,
  formula_variability = ~1,
  .sample,
  .cell_group,
  .abundance = NULL,
  cores = detectCores(),
  bimodal_mean_variability_association = FALSE,
  percent_false_positive = 5,
  inference_method = "pathfinder",
  prior_mean = list(intercept = c(0, 1), coefficients = c(0, 1)),
  prior_overdispersion_mean_association = list(intercept = c(5, 2), slope = c(0, 0.6),
    standard_deviation = c(10, 20)),
  .sample_cell_group_pairs_to_exclude = NULL,
  output_directory = "sccomp_draws_files",
  verbose = TRUE,
  enable_loo = FALSE,
  noise_model = "multi_beta_binomial",
  exclude_priors = FALSE,
  use_data = TRUE,
  mcmc_seed = sample(1e+05, 1),
  max_sampling_iterations = 20000,
  pass_fit = TRUE,
  ...,
  .count = NULL,
  approximate_posterior_inference = NULL,
  variational_inference = NULL
)

Arguments

.data

A tibble including cell_group name column, sample name column, abundance column (counts or proportions), and factor columns.

formula_composition

A formula describing the model for differential abundance.

formula_variability

A formula describing the model for differential variability.
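
As a rough illustration of how a one-sided R formula expands into design-matrix columns, the sketch below uses base R's model.matrix() on a made-up sample table. The data frame `sample_info` is hypothetical, and the parameter names that sccomp reports may differ from these column names.

```r
# Hypothetical sample-level annotation (values are made up).
sample_info <- data.frame(
  sample = c("s1", "s2", "s3", "s4"),
  type   = factor(c("healthy", "healthy", "cancer", "cancer"),
                  levels = c("healthy", "cancer"))
)

# Expand the formula into a design matrix: an intercept column plus
# one indicator column for the non-reference level of `type`.
design <- model.matrix(~ type, data = sample_info)

colnames(design)
# "(Intercept)" "typecancer"
```

The first level of the factor ("healthy" here) acts as the reference, so the `typecancer` column encodes the contrast of interest.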

.sample

A column name as a symbol for the sample identifier.

.cell_group

A column name as a symbol for the cell-group identifier.

.abundance

A column name as a symbol for the cell-group abundance, which can be counts (> 0) or proportions (between 0 and 1, summing to 1 across .cell_group within each sample).
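
The sketch below shows, in base R, how counts relate to proportions that sum to 1 within each sample. The table `counts` and its values are made up for illustration. The two samples end up with identical proportions even though one has five times as many cells, which is the information that is lost when only proportions are supplied.

```r
# Hypothetical counts: s1 has 50 cells in total, s2 has only 10.
counts <- data.frame(
  sample     = c("s1", "s1", "s1", "s2", "s2", "s2"),
  cell_group = c("B", "NK", "T", "B", "NK", "T"),
  count      = c(10, 5, 35, 2, 1, 7)
)

# ave() computes the per-sample total and returns it aligned to the rows,
# so the division yields each cell group's share of its own sample.
counts$proportion <- counts$count / ave(counts$count, counts$sample, FUN = sum)

# Check: proportions sum to 1 within each sample.
tapply(counts$proportion, counts$sample, sum)
```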

cores

Number of cores to use for parallel calculations.

bimodal_mean_variability_association

Logical, whether to model mean-variability as bimodal.

percent_false_positive

A real number between 0 and 100 for outlier identification.

inference_method

Character string specifying the inference method to use ('pathfinder', 'hmc', or 'variational').

prior_mean

A list specifying prior knowledge about the mean distribution, including intercept and coefficients.

prior_overdispersion_mean_association

A list specifying prior knowledge about mean/variability association.

.sample_cell_group_pairs_to_exclude

A column name indicating sample/cell-group pairs to exclude.

output_directory

A character string specifying the output directory for Stan draws.

verbose

Logical, whether to print progression details.

enable_loo

Logical, whether to enable model comparison using the LOO package.

noise_model

A character string specifying the noise model (e.g., 'multi_beta_binomial').

exclude_priors

Logical, whether to run a prior-free model.

use_data

Logical, whether to use the data when fitting. If FALSE, the model is run data-free.

mcmc_seed

An integer seed for MCMC reproducibility.
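
The default shown in the usage, sample(1e+05, 1), draws a random integer between 1 and 100000, so the seed changes from call to call unless it is fixed. The base-R sketch below illustrates only that behavior of the default expression; the helper `draw_default_seed` and the value 42 are hypothetical.

```r
# Hypothetical helper that mimics the default expression for mcmc_seed.
draw_default_seed <- function() sample(1e+05, 1)

# With the same RNG state, the drawn seed is the same.
set.seed(123)
seed_a <- draw_default_seed()
set.seed(123)
seed_b <- draw_default_seed()
identical(seed_a, seed_b)

# Choosing a fixed integer up front avoids relying on the RNG state at all;
# it could then be supplied as the mcmc_seed argument.
my_seed <- 42L
```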

max_sampling_iterations

Integer to limit the maximum number of iterations for large datasets.

pass_fit

Logical, whether to include the Stan fit as an attribute in the output.

...

Additional arguments passed to the cmdstanr::sample function.

.count

DEPRECATED. Use .abundance instead.

approximate_posterior_inference

DEPRECATED. Use inference_method instead.

variational_inference

DEPRECATED. Use inference_method instead.

Value

A tibble (tbl) with the following columns:

  • cell_group - The cell groups being tested.

  • parameter - The parameter being estimated from the design matrix described by the input formula_composition and formula_variability.

  • factor - The covariate factor in the formula, if applicable (e.g., not present for Intercept or contrasts).

  • c_lower - Lower (2.5%) quantile of the posterior distribution for a composition (c) parameter.

  • c_effect - Mean of the posterior distribution for a composition (c) parameter.

  • c_upper - Upper (97.5%) quantile of the posterior distribution for a composition (c) parameter.

  • c_pH0 - Probability of the null hypothesis (no difference) for a composition (c). This is not a p-value.

  • c_FDR - False-discovery rate of the null hypothesis for a composition (c).

  • c_n_eff - Effective sample size for a composition (c) parameter.

  • c_R_k_hat - R statistic for a composition (c) parameter. It should be within 0.05 of 1.0.

  • v_lower - Lower (2.5%) quantile of the posterior distribution for a variability (v) parameter.

  • v_effect - Mean of the posterior distribution for a variability (v) parameter.

  • v_upper - Upper (97.5%) quantile of the posterior distribution for a variability (v) parameter.

  • v_pH0 - Probability of the null hypothesis for a variability (v).

  • v_FDR - False-discovery rate of the null hypothesis for a variability (v).

  • v_n_eff - Effective sample size for a variability (v) parameter.

  • v_R_k_hat - R statistic for a variability (v) parameter.

  • count_data - Nested input count data.
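
As a sketch of how the columns above might be used together, the following base-R snippet keeps rows whose composition FDR is below a threshold and whose R statistic is within 0.05 of 1.0. The table `toy_result` is a made-up stand-in that contains only a few of the documented columns, its values are invented for illustration, and the 0.05 FDR threshold is an assumption rather than a recommendation from the package.

```r
# Made-up stand-in for a result table (NOT real sccomp output).
toy_result <- data.frame(
  cell_group = c("B", "T", "NK"),
  parameter  = c("typecancer", "typecancer", "typecancer"),
  c_effect   = c(0.9, -0.1, 0.4),
  c_FDR      = c(0.01, 0.60, 0.04),
  c_R_k_hat  = c(1.00, 1.01, 1.08)
)

fdr_threshold <- 0.05  # hypothetical cut-off chosen for this example

# R statistic within 0.05 of 1.0, as the column description suggests.
converged <- abs(toy_result$c_R_k_hat - 1) <= 0.05

# Composition effect passes the FDR cut-off.
significant <- toy_result$c_FDR < fdr_threshold

hits <- toy_result[converged & significant, ]
hits
```

In this toy table only the "B" row is kept: "T" fails the FDR cut-off, and "NK" passes it but has an R statistic too far from 1.0.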

Examples


message("Use the following example after having installed cmdstanr with install.packages(\"cmdstanr\", repos = c(\"https://stan-dev.r-universe.dev/\", getOption(\"repos\")))")
#> Use the following example after having installed cmdstanr with install.packages("cmdstanr", repos = c("https://stan-dev.r-universe.dev/", getOption("repos")))

# \donttest{
  if (instantiate::stan_cmdstan_exists()) {
    data("counts_obj")

    # Fit the model on counts, with `type` as the factor of interest
    estimate <- sccomp_estimate(
      counts_obj,
      ~ type,
      ~1,
      sample,
      cell_group,
      count,
      cores = 1
    )

    # Note: if counts are available, do not use proportions.
    # Proportions ignore the high uncertainty of low counts.

    estimate_proportion <- sccomp_estimate(
      counts_obj,
      ~ type,
      ~1,
      sample,
      cell_group,
      proportion,
      cores = 1
    )

  }
# }