Skip to contents

Here we provide tips for working with the epidist package, and answers to “frequently” asked questions. If you have a question about using the package, please create an issue and we will endeavour to get back to you soon!

The code block below is provided to facilitate reproduction of this script, if required!

library(epidist)
library(brms)
library(dplyr)
library(ggplot2)
library(scales)
library(tidyr)

set.seed(1)

meanlog <- 1.8
sdlog <- 0.5
obs_time <- 25
sample_size <- 200

obs_cens_trunc_samp <- simulate_gillespie(seed = 101) |>
  simulate_secondary(
    meanlog = meanlog,
    sdlog = sdlog
  ) |>
  observe_process() |>
  filter_obs_by_obs_time(obs_time = obs_time) |>
  slice_sample(n = sample_size, replace = FALSE)

data <- as_latent_individual(obs_cens_trunc_samp)
fit <- epidist(
  data,
  formula = bf(mu ~ 1, sigma ~ 1),
  seed = 1
)

I would like to work with the samples output

The output of a call to epidist is compatible with typical Stan workflows. We recommend use of the posterior package for working with samples from MCMC or other sampling algorithms. For example, the function posterior::as_draws_df() may be used to obtain a dataframe of MCMC draws for specified parameters.

library(posterior)
draws <- as_draws_df(fit, variable = c("Intercept", "Intercept_sigma"))
head(draws)
## # A draws_df: 6 iterations, 1 chains, and 2 variables
##   Intercept Intercept_sigma
## 1       1.8            0.53
## 2       1.8            0.56
## 3       1.8            0.46
## 4       1.7            0.54
## 5       1.8            0.44
## 6       1.8            0.53
## # ... hidden reserved variables {'.chain', '.iteration', '.draw'}

How can I assess if sampling has converged?

The output of a call to epidist is compatible with typical Stan workflows. We recommend use of the bayesplot package for sampling diagnostic plots. For example, the function bayesplot::mcmc_trace() can be used to produce traceplots for specified parameters.

library(bayesplot)
mcmc_trace(fit, pars = c("Intercept", "Intercept_sigma"))

I’d like to run a simulation study

We recommend use of the purrr package for running many epidist models, for example as a part of a simulation study. We particularly highlight two functions which might be useful:

  1. purrr::map() (and other similar functions) for iterating over a list of inputs.
  2. purrr::safely() which ensures that the function called “always succeeds”. In other words, if there is an error it will be captured and output, rather than ending computation (and potentially disrupting a call to purrr::map()).

For an example use of these functions, have a look at the epidist-paper repository containing the code for Park et al. (2024). (Note that in that codebase, we use map as a part of a targets pipeline.)

How did you choose the default priors for epidist?

brms provides default priors for all parameters. However, some of those priors do not make sense in the context of our application. Instead, we used prior predictive checking to set epidist-specific default priors which produce epidemiological delay distribution mean and standard deviation parameters in a reasonable range.

For example, for the brms::lognormal() latent individual model, we suggest the following prior distributions for the brms mu and sigma intercept parameters:

family <- "lognormal"

epidist_family <- epidist_family(data, family)
epidist_formula <- epidist_formula(
  data, family = epidist_family, formula = bf(mu ~ 1, sigma ~ 1)
)

# NULL here means no replacing priors from the user!
epidist_prior <- epidist_prior(
  data = data,
  family = family,
  formula = epidist_formula,
  prior = NULL
)

epidist_prior
##              prior     class coef group resp  dpar nlpar lb ub source
##       normal(1, 1) Intercept                                   family
##  normal(-0.7, 0.4) Intercept                 sigma             family

(Note that the functions epidist_family() and epidist_prior() are mostly for internal use!)

Here are the distributions on the delay distribution mean and standard deviation parameters that these prior distributions imply:

set.seed(1)
fit_ppc <- epidist(
  data = data,
  formula = bf(mu ~ 1, sigma ~ 1),
  family = "lognormal",
  sample_prior = "only",
  seed = 1
)
pred <- predict_delay_parameters(fit_ppc)

pred |>
  as.data.frame() |>
  pivot_longer(
    cols = c("mu", "sigma", "mean", "sd"),
    names_to = "parameter",
    values_to = "value"
  ) |>
  filter(parameter %in% c("mean", "sd")) |>
  ggplot(aes(x = value, y = after_stat(density))) +
  geom_histogram() +
  facet_wrap(. ~ parameter, scales = "free") +
  labs(x = "", y = "Density") +
  theme_minimal() +
  scale_x_log10(labels = comma)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

quantile(pred$mean, c(0.01, 0.1, 0.25, 0.5, 0.75, 0.9, 0.99))
##         1%        10%        25%        50%        75%        90%        99% 
##  0.3558084  1.0303217  1.9577108  3.7682459  7.5819517 14.5496488 46.0359735
quantile(pred$sd, c(0.01, 0.1, 0.25, 0.5, 0.75, 0.9, 0.99))
##          1%         10%         25%         50%         75%         90% 
##  0.03582676  0.37495328  1.01121384  2.72009640  7.13444186 18.28852812 
##         99% 
## 94.99318781

What do the parameters in my model output correspond to?

The epidist package uses brms to fit models. This means that the model output will include brms-style names for parameters. Here, we provide a table giving the correspondence between the distributional parameter names used in brms and those used in standard R functions for some common likelihood families.

Family brms parameter R parameter
lognormal() mu meanlog
lognormal() sigma sdlog
Gamma() mu shape / scale
Gamma() shape shape

Note that all families in brms are parameterised with some measure of centrality mu as their first parameter. This parameter does not necessarily correspond to the mean: hence the provision of a function add_mean_sd() within epidist to add columns containing the natural scale mean and standard deviation to a data.frame of draws.

Bibliography

Park, Sang Woo, Andrei R. Akhmetzhanov, Kelly Charniga, Anne Cori, Nicholas G. Davies, Jonathan Dushoff, Sebastian Funk, et al. 2024. “Estimating Epidemiological Delay Distributions for Infectious Diseases.” medRxiv. https://doi.org/10.1101/2024.01.12.24301247.