Here we provide tips for working with the epidist
package, and answers to “frequently” asked questions.
If you have a question about using the package, please create an issue and we will endeavour to get back to you soon!
The code block below is provided to facilitate reproduction of this script, if required!
library(epidist)
library(brms)
library(dplyr)
library(ggplot2)
library(scales)
library(tidyr)
set.seed(1)
meanlog <- 1.8
sdlog <- 0.5
obs_time <- 25
sample_size <- 200
obs_cens_trunc <- simulate_gillespie(seed = 101) |>
simulate_secondary(
meanlog = meanlog,
sdlog = sdlog
) |>
observe_process() |>
filter_obs_by_obs_time(obs_time = obs_time)
obs_cens_trunc_samp <-
obs_cens_trunc[sample(seq_len(.N), sample_size, replace = FALSE)]
data <- as_latent_individual(obs_cens_trunc_samp)
fit <- epidist(
data,
formula = bf(mu ~ 1, sigma ~ 1),
seed = 1
)
I would like to work with the samples output
The output of a call to epidist
is compatible with typical Stan workflows.
We recommend use of the posterior
package for working with samples from MCMC or other sampling algorithms.
For example, the function posterior::as_draws_df()
may be used to obtain a dataframe of MCMC draws for specified parameters.
library(posterior)
draws <- as_draws_df(fit, variable = c("Intercept", "Intercept_sigma"))
head(draws)
## # A draws_df: 6 iterations, 1 chains, and 2 variables
## Intercept Intercept_sigma
## 1 1.7 0.46
## 2 1.8 0.43
## 3 1.8 0.48
## 4 1.7 0.44
## 5 1.7 0.47
## 6 1.7 0.46
## # ... hidden reserved variables {'.chain', '.iteration', '.draw'}
How can I assess if sampling has converged?
The output of a call to epidist
is compatible with typical Stan workflows.
We recommend use of the bayesplot
package for sampling diagnostic plots.
For example, the function bayesplot::mcmc_trace()
can be used to produce traceplots for specified parameters.
library(bayesplot)
mcmc_trace(fit, pars = c("Intercept", "Intercept_sigma"))
I’d like to run a simulation study
We recommend use of the purrr
package for running many epidist
models, for example as a part of a simulation study.
We particularly highlight two functions which might be useful:
-
purrr::map()
(and other similar functions) for iterating over a list of inputs. -
purrr::safely()
which ensures that the function called “always succeeds”. In other words, if there is an error it will be captured and output, rather than ending computation (and potentially disrupting a call topurrr::map()
).
For an example use of these functions, have a look at the epidist-paper
repository containing the code for Park et al. (2024).
(Note that in that codebase, we use map
as a part of a targets
pipeline.)
How did you choose the default priors for epidist
?
brms
provides default priors for all parameters.
However, some of those priors do not make sense in the context of our application.
Instead, we used prior predictive checking to set epidist
-specific default priors which produce epidemiological delay distribution mean and standard deviation parameters in a reasonable range.
For example, for the brms::lognormal()
latent individual model, we suggest the following prior distributions for the brms
mu
and sigma
intercept parameters:
family <- "lognormal"
epidist_family <- epidist_family(data, family)
epidist_formula <- epidist_formula(
data, family = epidist_family, formula = bf(mu ~ 1, sigma ~ 1)
)
# NULL here means no replacing priors from the user!
epidist_prior <- epidist_prior(
data = data,
family = family,
formula = epidist_formula,
prior = NULL
)
epidist_prior
## prior class coef group resp dpar nlpar lb ub source
## normal(1, 1) Intercept family
## normal(-0.7, 0.4) Intercept sigma family
(Note that the functions epidist_family
and epidist_prior
are mostly for internal use!)
Here are the distributions on the delay distribution mean and standard deviation parameters that these prior distributions imply:
set.seed(1)
fit_ppc <- epidist(
data = data,
formula = bf(mu ~ 1, sigma ~ 1),
family = "lognormal",
sample_prior = "only",
seed = 1
)
pred <- predict_delay_parameters(fit_ppc)
pred |>
as.data.frame() |>
pivot_longer(
cols = c("mu", "sigma", "mean", "sd"),
names_to = "parameter",
values_to = "value"
) |>
filter(parameter %in% c("mean", "sd")) |>
ggplot(aes(x = value, y = after_stat(density))) +
geom_histogram() +
facet_wrap(. ~ parameter, scales = "free") +
labs(x = "", y = "Density") +
theme_minimal() +
scale_x_log10(labels = comma)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## 1% 10% 25% 50% 75% 90% 99%
## 0.3558084 1.0303217 1.9577108 3.7682459 7.5819517 14.5496488 46.0359735
## 1% 10% 25% 50% 75% 90%
## 0.03582676 0.37495328 1.01121384 2.72009640 7.13444186 18.28852812
## 99%
## 94.99318781
What do the parameters in my model output correspond to?
The epidist
package uses brms
to fit models.
This means that the model output will include brms
-style names for parameters.
Here, we provide a table giving the correspondence between the distributional parameter names used in brms
and those used in standard R functions for some common likelihood families.
Family |
brms parameter |
R parameter |
---|---|---|
lognormal() |
mu |
meanlog |
lognormal() |
sigma |
sdlog |
Gamma() |
mu |
shape / scale |
Gamma() |
shape |
shape |
Note that all families in brms
are parameterised with some measure of centrality mu
as their first parameter.
This parameter does not necessarily correspond to the mean: hence the provision of a function add_mean_sd
within epidist
to add columns containing the natural scale mean and standard deviation to a data.frame
of draws.