The sccomp_remove_outliers function takes as input a table of cell counts with columns for cell-group identifier, sample identifier, integer count, and factors (continuous or discrete). The user can define a linear model using an input R formula, where the first factor is the factor of interest. Alternatively, sccomp accepts single-cell data containers (e.g., Seurat, SingleCellExperiment, cell metadata, or group-size) and derives the count data from cell metadata.

sccomp_remove_outliers(
  .estimate,
  percent_false_positive = 5,
  cores = detectCores(),
  inference_method = "pathfinder",
  output_directory = "sccomp_draws_files",
  verbose = TRUE,
  mcmc_seed = sample(1e+05, 1),
  max_sampling_iterations = 20000,
  enable_loo = FALSE,
  approximate_posterior_inference = NULL,
  variational_inference = NULL,
  ...
)

Arguments

.estimate

A tibble including a cell_group name column, sample name column, read counts column (optional depending on the input class), and factor columns.

percent_false_positive

A real number between 0 and 100 (not inclusive), used to identify outliers with a specific false positive rate.

cores

Integer, the number of cores to be used for parallel calculations.

inference_method

Character string specifying the inference method to use ('pathfinder', 'hmc', or 'variational').

output_directory

A character string specifying the output directory for Stan draws.

verbose

Logical, whether to print progression details.

mcmc_seed

Integer, used for Markov-chain Monte Carlo reproducibility. By default, a random number is sampled from 1 to 999999.

max_sampling_iterations

Integer, limits the maximum number of iterations in case a large dataset is used, to limit computation time.

enable_loo

Logical, whether to enable model comparison using the R package LOO. This is useful for comparing fits between models, similar to ANOVA.

approximate_posterior_inference

DEPRECATED, use the variational_inference argument.

variational_inference

Logical, whether to use variational Bayes for posterior inference. It is faster and convenient. Setting this argument to FALSE runs full Bayesian (Hamiltonian Monte Carlo) inference, which is slower but the gold standard.

...

Additional arguments passed to the cmdstanr::sample function.

Value

A tibble (tbl), with the following columns:

  • cell_group - The cell groups being tested.

  • parameter - The parameter being estimated from the design matrix described by the input formula_composition and formula_variability.

  • factor - The covariate factor in the formula, if applicable (e.g., not present for Intercept or contrasts).

  • c_lower - Lower (2.5%) quantile of the posterior distribution for a composition (c) parameter.

  • c_effect - Mean of the posterior distribution for a composition (c) parameter.

  • c_upper - Upper (97.5%) quantile of the posterior distribution for a composition (c) parameter.

  • c_n_eff - Effective sample size, the number of independent draws in the sample. The higher, the better.

  • c_R_k_hat - R statistic, a measure of chain equilibrium, should be within 0.05 of 1.0.

  • v_lower - Lower (2.5%) quantile of the posterior distribution for a variability (v) parameter.

  • v_effect - Mean of the posterior distribution for a variability (v) parameter.

  • v_upper - Upper (97.5%) quantile of the posterior distribution for a variability (v) parameter.

  • v_n_eff - Effective sample size for a variability (v) parameter.

  • v_R_k_hat - R statistic for a variability (v) parameter, a measure of chain equilibrium.

  • count_data - Nested input count data.

Examples


message("Use the following example after having installed install.packages(\"cmdstanr\", repos = c(\"https://stan-dev.r-universe.dev/\", getOption(\"repos\")))")
#> Use the following example after having installed install.packages("cmdstanr", repos = c("https://stan-dev.r-universe.dev/", getOption("repos")))

# \donttest{
  if (instantiate::stan_cmdstan_exists()) {
    data("counts_obj")
    
    estimate = sccomp_estimate(
      counts_obj,
      ~ type,
      ~1,
      sample,
      cell_group,
      count,
      cores = 1
    ) |>
    sccomp_remove_outliers(cores = 1)
  }
# }