This function simulates counts from a linear model.
simulate_data(
  .data,
  .estimate_object,
  formula_composition,
  formula_variability = NULL,
  .sample = NULL,
  .cell_group = NULL,
  .coefficients = NULL,
  variability_multiplier = 5,
  number_of_draws = 1,
  mcmc_seed = sample(1e+05, 1),
  cores = detectCores()
)
A tibble including a cell_group name column | sample name column | read counts column | factor columns | Pvalue column | a significance column
The result of sccomp_estimate(). Its estimates are used so that the simulation reflects properties of real data.
A formula. The formula used to perform the differential cell_group abundance analysis.
A formula. The formula describing the model for differential variability, for example ~treatment. In most cases, if differential variability is of interest, the formula should include only the factor of interest, because a large amount of data is needed to estimate variability for each factor.
A column name as symbol. The sample identifier
A column name as symbol. The cell_group identifier
The column names for coefficients, for example, c(b_0, b_1)
A real scalar. This can be used for artificially increasing the variability of the simulation for benchmarking purposes.
An integer. The number of copies of the data to draw from the model's joint posterior distribution.
An integer. Used for Markov-chain Monte Carlo reproducibility. By default a random number is sampled from 1 to 100000. This default can itself be controlled by set.seed().
Integer, the number of cores to be used for parallel calculations.
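Because the default for mcmc_seed is the expression sample(1e+05, 1), fixing R's random number state beforehand should make that default repeatable. The following is a minimal base R sketch of this behaviour, independent of sccomp; draw_default_seed is a hypothetical helper introduced only for illustration.

```r
# Hypothetical helper that mirrors the default expression `sample(1e+05, 1)`
draw_default_seed <- function() sample(1e+05, 1)

# Fix the RNG state, then draw the value that would be used as the MCMC seed
set.seed(42)
seed_a <- draw_default_seed()

# Resetting the RNG state to the same value reproduces the same draw
set.seed(42)
seed_b <- draw_default_seed()

identical(seed_a, seed_b)
```

Passing an explicit integer to mcmc_seed avoids relying on the global random number state altogether.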
A tibble (tbl) with the following columns:
sample - A character column representing the sample name.
type - A factor column representing the type of the sample.
phenotype - A factor column representing the phenotype in the data.
count - An integer column representing the original cell counts.
cell_group - A character column representing the cell group identifier.
b_0 - A numeric column representing the first coefficient used for simulation.
b_1 - A numeric column representing the second coefficient used for simulation.
generated_proportions - A numeric column representing the generated proportions from the simulation.
generated_counts - An integer column representing the generated cell counts from the simulation.
replicate - An integer column representing the replicate number for each draw from the posterior distribution.
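As an illustration of working with these columns, the sketch below recomputes per-sample proportions from generated_counts within each replicate. The tibble here is a toy stand-in with made-up values, not actual simulate_data() output, and proportion_from_counts is a hypothetical column name.

```r
library(dplyr)

# Toy stand-in for the returned tibble (hypothetical values, not sccomp output)
simulated <- tibble::tibble(
  sample = rep(c("s1", "s2"), each = 2),
  cell_group = rep(c("B", "T"), times = 2),
  generated_counts = c(40L, 60L, 25L, 75L),
  replicate = 1L
)

# Within each sample and replicate, divide each count by the sample total
result <- simulated |>
  group_by(sample, replicate) |>
  mutate(proportion_from_counts = generated_counts / sum(generated_counts)) |>
  ungroup()

result
```

Grouping by both sample and replicate keeps separate posterior draws from being pooled when number_of_draws is greater than 1.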
message("Use the following example after having installed install.packages(\"cmdstanr\", repos = c(\"https://stan-dev.r-universe.dev/\", getOption(\"repos\")))")
#> Use the following example after having installed install.packages("cmdstanr", repos = c("https://stan-dev.r-universe.dev/", getOption("repos")))
# \donttest{
if (instantiate::stan_cmdstan_exists()) {
  data("counts_obj")
  library(dplyr)

  estimate = sccomp_estimate(
    counts_obj,
    ~ type, ~1, sample, cell_group, count,
    cores = 1
  )

  # Set coefficients for cell_groups. In this case all coefficients are 0 for simplicity.
  counts_obj = counts_obj |> mutate(b_0 = 0, b_1 = 0)

  # Simulate data
  simulate_data(counts_obj, estimate, ~type, ~1, sample, cell_group, c(b_0, b_1))
}
# }