The sccomp_remove_outliers function takes as input a table of cell counts with columns for cell-group identifier, sample identifier, integer count, and factors (continuous or discrete). The user can define a linear model using an input R formula, where the first factor is the factor of interest. Alternatively, sccomp accepts single-cell data containers (e.g., Seurat, SingleCellExperiment, cell metadata, or group-size) and derives the count data from cell metadata.
sccomp_remove_outliers(
  .estimate,
  percent_false_positive = 5,
  cores = detectCores(),
  inference_method = "pathfinder",
  output_directory = "sccomp_draws_files",
  verbose = TRUE,
  mcmc_seed = sample(1e+05, 1),
  max_sampling_iterations = 20000,
  enable_loo = FALSE,
  approximate_posterior_inference = NULL,
  variational_inference = NULL,
  ...
)
.estimate - A tibble including a cell_group name column, sample name column, read counts column (optional depending on the input class), and factor columns.
percent_false_positive - A real number between 0 and 100 (exclusive), used to identify outliers at a specific false-positive rate.
cores - Integer, the number of cores to use for parallel calculations.
inference_method - Character string specifying the inference method to use ('pathfinder', 'hmc', or 'variational').
output_directory - A character string specifying the output directory for Stan draws.
verbose - Logical, whether to print progress details.
mcmc_seed - Integer, used for Markov-chain Monte Carlo reproducibility. By default, a random number is sampled from 1 to 100000.
max_sampling_iterations - Integer, caps the number of sampling iterations to limit computation time on large datasets.
enable_loo - Logical, whether to enable model comparison using the R package loo. This is useful for comparing fits between models, similar to ANOVA.
approximate_posterior_inference - DEPRECATED, use the variational_inference argument.
variational_inference - Logical, whether to use variational Bayes for posterior inference, which is faster. Setting this argument to FALSE runs full Bayesian (Hamiltonian Monte Carlo) inference, which is slower but the gold standard.
... - Additional arguments passed to the cmdstanr::sample function.
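The default mcmc_seed = sample(1e+05, 1) shown in the usage draws a fresh seed on every call, so repeated runs may differ. The following is a minimal base-R sketch of what that default expression evaluates to, and of choosing a fixed seed instead. The names default_seed, in_range, and fixed_seed, and the value 42, are illustrative assumptions and not part of sccomp.

```r
# The default expression draws one integer uniformly from 1 to 100000.
default_seed <- sample(1e+05, 1)
in_range <- default_seed >= 1 && default_seed <= 1e+05

# A fixed seed (arbitrary illustrative value); it could be supplied as
# mcmc_seed = fixed_seed so that the same seed is used on every run.
fixed_seed <- 42L
```

Passing a fixed value trades the convenience of the default for runs that start from the same seed each time.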
A tibble (tbl), with the following columns:
cell_group - The cell groups being tested.
parameter - The parameter being estimated from the design matrix described by the input formula_composition and formula_variability.
factor - The covariate factor in the formula, if applicable (e.g., not present for Intercept or contrasts).
c_lower - Lower (2.5%) quantile of the posterior distribution for a composition (c) parameter.
c_effect - Mean of the posterior distribution for a composition (c) parameter.
c_upper - Upper (97.5%) quantile of the posterior distribution for a composition (c) parameter.
c_n_eff - Effective sample size, the number of independent draws in the sample. The higher, the better.
c_R_k_hat - R-hat statistic, a measure of chain convergence; it should be within 0.05 of 1.0.
v_lower - Lower (2.5%) quantile of the posterior distribution for a variability (v) parameter.
v_effect - Mean of the posterior distribution for a variability (v) parameter.
v_upper - Upper (97.5%) quantile of the posterior distribution for a variability (v) parameter.
v_n_eff - Effective sample size for a variability (v) parameter.
v_R_k_hat - R-hat statistic for a variability (v) parameter, a measure of chain convergence.
count_data - Nested input count data.
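To make the column descriptions concrete, here is a small base-R sketch of how the diagnostic and interval columns might be inspected. The data frame res holds made-up values shaped like a few of the columns listed above; it is not real sccomp output, and the interval check is only an illustration of reading c_lower and c_upper, not sccomp's own testing procedure.

```r
# Hypothetical values mimicking a few of the returned columns.
res <- data.frame(
  cell_group = c("B1", "CD4", "NK"),
  c_lower    = c(-0.9, 0.2, -0.3),
  c_effect   = c(-0.5, 0.6, 0.1),
  c_upper    = c(-0.1, 1.0, 0.5),
  c_R_k_hat  = c(1.01, 1.00, 1.08),
  stringsAsFactors = FALSE
)

# Cell groups whose c_R_k_hat is more than 0.05 away from 1.0.
poorly_mixed <- res[abs(res$c_R_k_hat - 1) > 0.05, "cell_group"]

# Cell groups whose 2.5%-97.5% interval lies entirely on one side of zero.
excludes_zero <- res[res$c_lower > 0 | res$c_upper < 0, "cell_group"]
```

The same filters could be applied to the v_ columns when the variability parameters are of interest.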
message("Use the following example after having installed install.packages(\"cmdstanr\", repos = c(\"https://stan-dev.r-universe.dev/\", getOption(\"repos\")))")
#> Use the following example after having installed install.packages("cmdstanr", repos = c("https://stan-dev.r-universe.dev/", getOption("repos")))
# \donttest{
if (instantiate::stan_cmdstan_exists()) {
  data("counts_obj")

  estimate = sccomp_estimate(
    counts_obj,
    ~ type,
    ~1,
    sample,
    cell_group,
    count,
    cores = 1
  ) |>
    sccomp_remove_outliers(cores = 1)
}
# }
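As a variation on the example above, the sketch below sets several of the documented arguments explicitly rather than relying on their defaults. It assumes that sccomp, instantiate, and CmdStan are installed and otherwise does nothing. The names can_run and cleaned and the seed value 42 are illustrative choices; the sccomp_estimate call is copied from the example above, and only arguments listed in the usage are passed to sccomp_remove_outliers.

```r
# Run only when sccomp, instantiate, and CmdStan are all available.
can_run <- requireNamespace("sccomp", quietly = TRUE) &&
  requireNamespace("instantiate", quietly = TRUE) &&
  instantiate::stan_cmdstan_exists()

if (can_run) {
  library(sccomp)
  data("counts_obj")

  cleaned <- sccomp_estimate(
    counts_obj,
    ~ type,
    ~1,
    sample,
    cell_group,
    count,
    cores = 1
  ) |>
    sccomp_remove_outliers(
      percent_false_positive = 5,  # same as the documented default
      inference_method = "hmc",    # one of the documented options
      mcmc_seed = 42,              # arbitrary fixed seed
      cores = 1,
      verbose = FALSE
    )
}
```

Choosing "hmc" here follows the argument description, which calls Hamiltonian Monte Carlo slower but the gold standard; "pathfinder" remains the default in the usage.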