The sccomp_estimate function performs linear modeling on a table of cell counts or proportions. The table includes a cell-group identifier, a sample identifier, an abundance column (counts or proportions), and factors (continuous or discrete). The user defines the linear model with an R formula, in which the first factor is the factor of interest. Alternatively, sccomp accepts single-cell data containers (e.g., Seurat, SingleCellExperiment, cell metadata, or group-size) and derives the count data from the cell metadata.
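As a minimal sketch of the long-format input table described above, the base R code below builds a toy table with one row per sample and cell group. The column names (sample, cell_group, type, count, proportion) and all values are hypothetical. It also shows how proportions relate to counts: each count is divided by its sample's total, so proportions sum to 1 across cell groups within each sample.

```r
# Toy long-format table: one row per sample / cell-group pair.
# All names and values are hypothetical, for illustration only.
counts_tbl <- data.frame(
  sample     = rep(c("s1", "s2", "s3", "s4"), each = 3),
  cell_group = rep(c("B", "T", "NK"), times = 4),
  type       = rep(c("benign", "benign", "cancer", "cancer"), each = 3),
  count      = c(120, 300, 80, 90, 350, 60, 200, 150, 50, 180, 210, 40)
)

# Total count of each row's sample, repeated for every row of that sample.
sample_totals <- ave(counts_tbl$count, counts_tbl$sample, FUN = sum)

# Proportions sum to 1 across cell groups within each sample.
counts_tbl$proportion <- counts_tbl$count / sample_totals

print(counts_tbl)
```

When counts are available, they are the better abundance input, because converting them to proportions discards the higher uncertainty carried by low counts.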
sccomp_estimate(
  .data,
  formula_composition = ~1,
  formula_variability = ~1,
  .sample,
  .cell_group,
  .abundance = NULL,
  cores = detectCores(),
  bimodal_mean_variability_association = FALSE,
  percent_false_positive = 5,
  inference_method = "pathfinder",
  prior_mean = list(intercept = c(0, 1), coefficients = c(0, 1)),
  prior_overdispersion_mean_association = list(
    intercept = c(5, 2),
    slope = c(0, 0.6),
    standard_deviation = c(10, 20)
  ),
  .sample_cell_group_pairs_to_exclude = NULL,
  output_directory = "sccomp_draws_files",
  verbose = TRUE,
  enable_loo = FALSE,
  noise_model = "multi_beta_binomial",
  exclude_priors = FALSE,
  use_data = TRUE,
  mcmc_seed = sample(1e+05, 1),
  max_sampling_iterations = 20000,
  pass_fit = TRUE,
  ...,
  .count = NULL,
  approximate_posterior_inference = NULL,
  variational_inference = NULL
)
.data - A tibble with a cell-group name column, a sample name column, an abundance column (counts or proportions), and factor columns.
formula_composition - A formula describing the model for differential abundance (composition).
formula_variability - A formula describing the model for differential variability.
.sample - A column name, as a symbol, for the sample identifier.
.cell_group - A column name, as a symbol, for the cell-group identifier.
.abundance - A column name, as a symbol, for the cell-group abundance, which can be counts (> 0) or proportions (between 0 and 1, summing to 1 across .cell_group within each sample).
cores - Number of cores to use for parallel calculations.
bimodal_mean_variability_association - Logical; whether to model the mean-variability association as bimodal.
percent_false_positive - A real number between 0 and 100 giving the false-positive rate (in percent) used for outlier identification.
inference_method - Character string specifying the inference method: "pathfinder", "hmc", or "variational".
prior_mean - A list specifying prior knowledge about the mean distribution, with intercept and coefficients elements.
prior_overdispersion_mean_association - A list specifying prior knowledge about the mean/variability association, with intercept, slope, and standard_deviation elements.
.sample_cell_group_pairs_to_exclude - A column name indicating the sample/cell-group pairs to exclude from the fit.
output_directory - A character string specifying the output directory for the Stan draws.
verbose - Logical; whether to print progress details.
enable_loo - Logical; whether to enable model comparison with the loo package.
noise_model - A character string specifying the noise model (e.g., "multi_beta_binomial").
exclude_priors - Logical; whether to run the model without priors.
use_data - Logical; whether to fit the model to the data. Set to FALSE to run the model data-free.
mcmc_seed - An integer seed for reproducible MCMC sampling.
max_sampling_iterations - Integer limiting the maximum number of sampling iterations, useful for large datasets.
pass_fit - Logical; whether to include the Stan fit as an attribute of the output.
... - Additional arguments passed to cmdstanr's $sample() method.
.count - DEPRECATED. Use .abundance instead.
approximate_posterior_inference - DEPRECATED. Use inference_method instead.
variational_inference - DEPRECATED. Use inference_method instead.
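The formula_composition and formula_variability arguments are ordinary R formulas. As a rough illustration of how a formula such as ~ type maps to model parameters, base R's model.matrix() builds the corresponding design matrix. This sketch assumes R's default treatment contrasts, which we have not verified against sccomp internals, and the sample table is hypothetical.

```r
# Hypothetical sample-level covariates (one row per sample).
samples <- data.frame(
  sample = c("s1", "s2", "s3", "s4"),
  type   = factor(c("benign", "benign", "cancer", "cancer"))
)

# Design matrix for a formula of the form ~ type.
# With treatment contrasts, the first factor level ("benign") is the
# baseline absorbed into the intercept; "typecancer" is the difference.
X <- model.matrix(~ type, data = samples)
print(X)

# One parameter per design-matrix column.
print(colnames(X))
```

With the default intercept-only formula ~1, the design matrix has a single (Intercept) column, so only a baseline is estimated for each cell group.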
The function returns a tibble (tbl) with the following columns:
cell_group - The cell groups being tested.
parameter - The parameter being estimated from the design matrix described by the input formula_composition and formula_variability.
factor - The covariate factor in the formula, if applicable (e.g., not present for Intercept or contrasts).
c_lower - Lower (2.5%) quantile of the posterior distribution for a composition (c) parameter.
c_effect - Mean of the posterior distribution for a composition (c) parameter.
c_upper - Upper (97.5%) quantile of the posterior distribution for a composition (c) parameter.
c_pH0 - Probability of the null hypothesis (no difference) for a composition (c). This is not a p-value.
c_FDR - False-discovery rate of the null hypothesis for a composition (c).
c_n_eff - Effective sample size for a composition (c) parameter.
c_R_k_hat - R-hat convergence statistic for a composition (c) parameter; it should be within 0.05 of 1.0.
v_lower - Lower (2.5%) quantile of the posterior distribution for a variability (v) parameter.
v_effect - Mean of the posterior distribution for a variability (v) parameter.
v_upper - Upper (97.5%) quantile of the posterior distribution for a variability (v) parameter.
v_pH0 - Probability of the null hypothesis for a variability (v).
v_FDR - False-discovery rate of the null hypothesis for a variability (v).
v_n_eff - Effective sample size for a variability (v) parameter.
v_R_k_hat - R-hat convergence statistic for a variability (v) parameter; it should be within 0.05 of 1.0.
count_data - Nested input count data.
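c_pH0 and v_pH0 are posterior probabilities of the null hypothesis, not p-values. One common way to turn such probabilities into a false-discovery rate is to sort them in increasing order and take the running mean: for the k cell groups with the smallest pH0, the mean of those k probabilities is the expected fraction of false discoveries among them. This is a sketch of the general idea only; it is an assumption that c_FDR and v_FDR are computed exactly this way, and the pH0 values below are made up.

```r
# Bayesian FDR from posterior null probabilities (pH0).
bayesian_fdr <- function(pH0) {
  ord <- order(pH0)                                # most significant first
  fdr_sorted <- cumsum(pH0[ord]) / seq_along(pH0)  # running mean of sorted pH0
  fdr <- numeric(length(pH0))
  fdr[ord] <- fdr_sorted                           # back to the input order
  fdr
}

pH0 <- c(0.001, 0.20, 0.01, 0.60)  # hypothetical values
fdr <- bayesian_fdr(pH0)
print(fdr)
```

Selecting the cell groups whose FDR is below a threshold such as 0.05 then bounds the expected proportion of false discoveries among the selected groups.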
message("Use the following example after having installed cmdstanr with install.packages(\"cmdstanr\", repos = c(\"https://stan-dev.r-universe.dev/\", getOption(\"repos\")))")

if (instantiate::stan_cmdstan_exists()) {
  data("counts_obj")

  estimate <- sccomp_estimate(
    counts_obj,
    formula_composition = ~ type,
    formula_variability = ~ 1,
    .sample = sample,
    .cell_group = cell_group,
    .abundance = count,
    cores = 1
  )

  # Note: if counts are available, do not use proportions.
  # Proportions ignore the high uncertainty associated with low counts.
  estimate_proportion <- sccomp_estimate(
    counts_obj,
    formula_composition = ~ type,
    formula_variability = ~ 1,
    .sample = sample,
    .cell_group = cell_group,
    .abundance = proportion,
    cores = 1
  )
}