Skip to contents

In this vignette, we use the epidist package to analyze line list data from the 2014-2016 outbreak of Ebola in Sierra Leone (World Health Organization 2016). These data were collated by Fang et al. (2016). We provide the data in the epidist package via ?sierra_leone_ebola_data. In analyzing this data, we demonstrate the following features of epidist:

  1. Fitting district-sex stratified partially pooled delay distribution estimates with a lognormal delay distribution.
  2. Post-processing and plotting functionality using the integration of brms functionality with the tidybayes package.

The packages used in this article are:

For users new to epidist, before reading this article we recommend beginning with the “Getting started with epidist” vignette.

1 Data preparation

We begin by loading the Ebola line list data:

data("sierra_leone_ebola_data")

The data has 8358 rows, each corresponding to a unique case report id. The columns of the data are the individuals name (retracted, and hence can be removed), age, sex, the dates of Ebola symptom onset and positive sample, and their district and chiefdom.

sierra_leone_ebola_data <- sierra_leone_ebola_data |>
  select(-name) |>
  mutate(
    case = as.integer(id), .keep = "unused",
    sex = case_when(
      sex == "F" ~ "Female",
      sex == "M" ~ "Male"
    )
  )

head(sierra_leone_ebola_data)
#> # A tibble: 6 × 7
#>     age sex    date_of_symptom_onset date_of_sample_tested district chiefdom   
#>   <dbl> <chr>  <dttm>                <dttm>                <chr>    <chr>      
#> 1    20 Female 2014-05-18 00:00:00   2014-05-23 00:00:00   Kailahun Kissi Teng 
#> 2    42 Female 2014-05-20 00:00:00   2014-05-25 00:00:00   Kailahun Kissi Teng 
#> 3    45 Female 2014-05-20 00:00:00   2014-05-25 00:00:00   Kailahun Kissi Tonge
#> 4    15 Female 2014-05-21 00:00:00   2014-05-26 00:00:00   Kailahun Kissi Teng 
#> 5    19 Female 2014-05-21 00:00:00   2014-05-26 00:00:00   Kailahun Kissi Teng 
#> 6    55 Female 2014-05-21 00:00:00   2014-05-26 00:00:00   Kailahun Kissi Teng 
#> # ℹ 1 more variable: case <int>

fraction <- 5
ndistrict <- length(unique(sierra_leone_ebola_data$district))

Figure 1.1 shows the dates of symptom onset and sample testing for cases across in each district. (In this figure, we filter down to every 5th case in order to avoid overplotting.) We can see that the start time and course of the epidemic varies across districts.

sierra_leone_ebola_data |>
  filter(case %% fraction == 0) |>
  ggplot() +
  geom_segment(
    aes(
      x = date_of_symptom_onset, xend = date_of_sample_tested,
      y = case, yend = case
    ),
    col = "grey"
  ) +
  geom_point(aes(x = date_of_symptom_onset, y = case), col = "#56B4E9") +
  geom_point(aes(x = date_of_sample_tested, y = case), col = "#009E73") +
  facet_wrap(district ~ ., ncol = 2) +
  labs(x = "", y = "Case ID") +
  theme_minimal()
Primary and secondary event times for every 5th case, over the 14 districts of Sierra Leone.

Figure 1.1: Primary and secondary event times for every 5th case, over the 14 districts of Sierra Leone.

We can use a map to help visualise the outbreak across space. To create a map, we first join the Ebola data to shapefiles for the districts of Sierra Leone. These shapefiles were obtained from the Database of Global Administrative Areas (GADM).

sf <- st_read(
  system.file("extdata/gadm41_SLE_shp/gadm41_SLE_2.shp", package = "epidist"),
  quiet = TRUE
)

sierra_leone_ebola_data_sf <- select(sf, district = NAME_2, geometry) |>
  left_join(
    sierra_leone_ebola_data |>
      group_by(district) |>
      summarise(cases = n())
  )

Figure 1.2 shows that the majority of cases were concentrated in the Western Urban district. This district contains the capital of the country, and has the highest population (as such, one may also be interested in the prevalence of Ebola, that is the number of cases divided by the population size).

ggplot(sierra_leone_ebola_data_sf, aes(fill = cases, geometry = geometry)) +
  geom_sf() +
  scale_fill_viridis_c(na.value = "lightgrey") +
  theme_minimal() +
  labs(fill = "Cases")
A cloropleth showing the total number of Ebola cases in each district of Sierra Leone.

Figure 1.2: A cloropleth showing the total number of Ebola cases in each district of Sierra Leone.

2 Fitting sex-district stratified delay distributions

To understand the delay between time of symptom onset and time of sample testing, we fit a range of statistical models using the epidist package. In some models, we vary the parameters of the delay distribution by sex or by district. For the lognormal delay distribution these parameters are the mean and standard deviation of the underlying normal distribution. That is, \(\mu\) and \(\sigma\) such that when \(x \sim \mathcal{N}(\mu, \sigma)\) then \(\exp(x)\) has a lognormal distribution.

2.1 Data preparation

To prepare the data, we begin by transforming the date columns to ptime and stime columns for the times of the primary and secondary events respectively. Both of these columns are relative to the first date of symptom onset in the data:

sierra_leone_ebola_data <- sierra_leone_ebola_data |>
  mutate(
    # use lubridate::ymd() to drop any sub-date time info
    date_of_symptom_onset = ymd(date_of_symptom_onset),
    date_of_sample_tested = ymd(date_of_sample_tested),
    # ptime and stime represent the number of days elapsed since the earliest
    # date of symptom onset in the data
    ptime = as.numeric(date_of_symptom_onset - min(date_of_symptom_onset)),
    stime = as.numeric(date_of_sample_tested - min(date_of_symptom_onset))
  ) |>
  select(case, ptime, stime, age, sex, district)

head(sierra_leone_ebola_data)
#> # A tibble: 6 × 6
#>    case ptime stime   age sex    district
#>   <int> <dbl> <dbl> <dbl> <chr>  <chr>   
#> 1     1     0     5    20 Female Kailahun
#> 2     2     2     7    42 Female Kailahun
#> 3     3     2     7    45 Female Kailahun
#> 4     4     3     8    15 Female Kailahun
#> 5     5     3     8    19 Female Kailahun
#> 6     6     3     8    55 Female Kailahun

Next, we use observe_process() to add interval censoring columns giving the lower and upper bounds on the primary and secondary event times:

obs_cens <- observe_process(sierra_leone_ebola_data)
head(obs_cens)
#> # A tibble: 6 × 16
#>    case ptime stime   age sex    district ptime_daily ptime_lwr ptime_upr
#>   <int> <dbl> <dbl> <dbl> <chr>  <chr>          <dbl>     <dbl>     <dbl>
#> 1     1     0     5    20 Female Kailahun           0         0         1
#> 2     2     2     7    42 Female Kailahun           2         2         3
#> 3     3     2     7    45 Female Kailahun           2         2         3
#> 4     4     3     8    15 Female Kailahun           3         3         4
#> 5     5     3     8    19 Female Kailahun           3         3         4
#> 6     6     3     8    55 Female Kailahun           3         3         4
#> # ℹ 7 more variables: stime_daily <dbl>, stime_lwr <dbl>, stime_upr <dbl>,
#> #   delay_daily <dbl>, delay_lwr <dbl>, delay_upr <dbl>, obs_time <dbl>

For the time being, we filter the data to only complete cases (i.e. rows of the data which have no missing values1).

n <- nrow(obs_cens)
obs_cens <- obs_cens[complete.cases(obs_cens), ]
n_complete <- nrow(obs_cens)

Here, 83% of the cases were complete.

subsample <- 0.2

Additionally, to speed up computation, we take a random 20% subsample of the complete data. (In a real analysis, we’d recommend using all the available data).

obs_cens <- obs_cens |>
  slice_sample(n = round(n_complete * subsample), replace = FALSE)

2.2 Model fitting

To prepare the data for use with the latent individual model, we use the function as_latent_individual():

obs_prep <- as_latent_individual(obs_cens)
head(obs_prep)
#> # A tibble: 6 × 22
#>    case ptime stime   age sex   district      ptime_daily ptime_lwr ptime_upr
#>   <int> <dbl> <dbl> <dbl> <chr> <chr>               <dbl>     <dbl>     <dbl>
#> 1  3073   148   155    38 Male  Bombali               148       148       149
#> 2  3125   149   155    22 Male  Western Urban         149       149       150
#> 3  2774   143   157    28 Male  Kenema                143       143       144
#> 4   547    71    75     7 Male  Kenema                 71        71        72
#> 5  5394   186   191    11 Male  Bombali               186       186       187
#> 6  3692   159   168    29 Male  Bombali               159       159       160
#> # ℹ 13 more variables: stime_daily <dbl>, stime_lwr <dbl>, stime_upr <dbl>,
#> #   delay_daily <dbl>, delay_lwr <dbl>, delay_upr <dbl>, obs_time <dbl>,
#> #   relative_obs_time <dbl>, pwindow <dbl>, woverlap <dbl>, swindow <dbl>,
#> #   delay <dbl>, row_id <fct>

Now we are ready to fit the latent individual model.

2.2.1 Intercept-only model

We start by fitting a single lognormal distribution to the data. This model assumes that a single distribution describes all delays in the data, regardless of the case’s location, sex, or any other covariates. To do this, we set formula = bf(mu ~ 1, sigma ~ 1) to place an model with only and intercept parameter (i.e. ~ 1 in R formula syntax) on the mu and sigma parameters of the lognormal distribution. This distribution is specified using family = lognormal().

fit <- epidist(
  data = obs_prep,
  formula = bf(mu ~ 1, sigma ~ 1),
  family = lognormal(),
  algorithm = "sampling",
  refresh = 0,
  silent = 2,
  seed = 1
)

The fit object is a brmsfit object, and has the associated range of methods. See methods(class = "brmsfit") for more details. For example, we may use summary() to view information about the fitted model, including posterior estimates for the regression coefficients:

summary(fit)
#>  Family: latent_lognormal 
#>   Links: mu = identity; sigma = log 
#> Formula: delay | vreal(relative_obs_time, pwindow, swindow) ~ 1 
#>          sigma ~ 1
#>    Data: data (Number of observations: 1390) 
#>   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
#>          total post-warmup draws = 4000
#> 
#> Regression Coefficients:
#>                 Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> Intercept           1.61      0.02     1.58     1.64 1.00     8684     2357
#> sigma_Intercept    -0.53      0.02    -0.57    -0.49 1.00     8236     2538
#> 
#> Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
#> and Tail_ESS are effective sample size measures, and Rhat is the potential
#> scale reduction factor on split chains (at convergence, Rhat = 1).

2.2.2 Sex-stratified model

To fit a model which varies the parameters of the fitted lognormal distribution, mu and sigma, by sex we alter the formula specification to include fixed effects for sex ~ 1 + sex as follows:

fit_sex <- epidist(
  data = obs_prep,
  formula = bf(mu ~ 1 + sex, sigma ~ 1 + sex),
  family = lognormal(),
  algorithm = "sampling",
  refresh = 0,
  silent = 2,
  seed = 1
)

A summary of the model shows that males tend to have longer delays (the posterior mean of sexMale is 0.06) and greater delay variation (the posterior mean of sigma_sexMale is -0.01). For the sexMale effect, the 95% credible interval is greater than zero, whereas for the sigma_sexMale effect the 95% credible interval includes zero. It is important to note that the estimates represent an average of the observed data, and individual delays between men and women vary significantly.

summary(fit_sex)
#>  Family: latent_lognormal 
#>   Links: mu = identity; sigma = log 
#> Formula: delay | vreal(relative_obs_time, pwindow, swindow) ~ sex 
#>          sigma ~ 1 + sex
#>    Data: data (Number of observations: 1390) 
#>   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
#>          total post-warmup draws = 4000
#> 
#> Regression Coefficients:
#>                 Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> Intercept           1.58      0.02     1.53     1.63 1.00    11131     2905
#> sigma_Intercept    -0.52      0.03    -0.58    -0.47 1.00     7580     3070
#> sexMale             0.06      0.03    -0.00     0.13 1.00     9137     2883
#> sigma_sexMale      -0.01      0.04    -0.09     0.07 1.00     7746     3049
#> 
#> Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
#> and Tail_ESS are effective sample size measures, and Rhat is the potential
#> scale reduction factor on split chains (at convergence, Rhat = 1).

2.2.3 Sex-district stratified model

Finally, we will fit a model which also varies by district. To do this, we will use district level random effects, assumed to be drawn from a shared normal distribution, within the model for both the mu and sigma parameters. These random effects are specified by including (1 | district) in the formulas:

fit_sex_district <- epidist(
  data = obs_prep,
  formula = bf(
    mu ~ 1 + sex + (1 | district),
    sigma ~ 1 + sex + (1 | district)
  ),
  family = lognormal(),
  algorithm = "sampling",
  refresh = 0,
  silent = 2,
  seed = 1
)

For this model, along with looking at the summary(), we may also use the brms::ranef() function to look at the estimates of the random effects:

summary(fit_sex_district)
#>  Family: latent_lognormal 
#>   Links: mu = identity; sigma = log 
#> Formula: delay | vreal(relative_obs_time, pwindow, swindow) ~ sex + (1 | district) 
#>          sigma ~ 1 + sex + (1 | district)
#>    Data: data (Number of observations: 1390) 
#>   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
#>          total post-warmup draws = 4000
#> 
#> Multilevel Hyperparameters:
#> ~district (Number of levels: 14) 
#>                     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> sd(Intercept)           0.13      0.04     0.07     0.22 1.00     1981     2500
#> sd(sigma_Intercept)     0.23      0.07     0.13     0.40 1.00     1800     2234
#> 
#> Regression Coefficients:
#>                 Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> Intercept           1.61      0.05     1.52     1.71 1.00     2423     2615
#> sigma_Intercept    -0.61      0.07    -0.76    -0.46 1.00     1991     2516
#> sexMale             0.05      0.03    -0.00     0.11 1.00     9025     2829
#> sigma_sexMale       0.01      0.04    -0.07     0.09 1.00     7318     2851
#> 
#> Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
#> and Tail_ESS are effective sample size measures, and Rhat is the potential
#> scale reduction factor on split chains (at convergence, Rhat = 1).
ranef(fit_sex_district)
#> $district
#> , , Intercept
#> 
#>                   Estimate  Est.Error        Q2.5        Q97.5
#> Bo             0.049659543 0.07648929 -0.09711603  0.198415850
#> Bombali        0.174525747 0.05101936  0.07642013  0.280136825
#> Bonthe         0.008627425 0.12711363 -0.25858180  0.272735075
#> Kailahun      -0.033957270 0.05989824 -0.15194590  0.081417483
#> Kambia         0.014035709 0.08728953 -0.15802057  0.184208650
#> Kenema        -0.125336198 0.07186232 -0.27571403  0.008911684
#> Koinadugu      0.131315706 0.10223466 -0.05158371  0.352779725
#> Kono          -0.048832098 0.07882006 -0.20807180  0.101906450
#> Moyamba        0.022377977 0.08328607 -0.14132422  0.187826300
#> Port Loko     -0.001518065 0.05623126 -0.11547050  0.108663925
#> Pujehun       -0.036766564 0.11669134 -0.27599690  0.186260275
#> Tonkolili      0.047941057 0.06016027 -0.07005451  0.168607925
#> Western Rural -0.017555863 0.05578999 -0.13222013  0.088493313
#> Western Urban -0.172633700 0.05531442 -0.28503890 -0.065921998
#> 
#> , , sigma_Intercept
#> 
#>                   Estimate  Est.Error        Q2.5       Q97.5
#> Bo            -0.075404985 0.12119463 -0.31379113  0.16745023
#> Bombali       -0.275437891 0.08655618 -0.44781052 -0.10861032
#> Bonthe        -0.051516361 0.23732468 -0.53411608  0.40638035
#> Kailahun      -0.209597067 0.09857676 -0.40725572 -0.01650580
#> Kambia         0.259667263 0.12118200  0.03643989  0.50790048
#> Kenema         0.195191620 0.09983437  0.01150421  0.39580870
#> Koinadugu      0.006092283 0.15098508 -0.28488527  0.31472935
#> Kono           0.101216931 0.11122112 -0.10887437  0.32718263
#> Moyamba        0.142895624 0.11660600 -0.07898224  0.37835145
#> Port Loko      0.085841299 0.08613680 -0.08135099  0.25841027
#> Pujehun       -0.139828100 0.22803379 -0.61120480  0.27803248
#> Tonkolili     -0.235290473 0.10141871 -0.43479545 -0.03705476
#> Western Rural  0.028077793 0.08536926 -0.13896268  0.19957480
#> Western Urban  0.192813792 0.08029768  0.03438413  0.35998105

2.3 Posterior expectations

To go further than summaries of the fitted model, we recommend using the tidybayes package. For example, to obtain the posterior expectation of the delay distribution, under no censoring or truncation, we may use the modelr::data_grid() function in combination with the tidybayes::add_epred_draws() function. The tidybayes::add_epred_draws() function uses the (internal) posterior_epred_latent_lognormal() function implemented in epidist for the latent_lognormal custom brms family.

In Figure 2.1 we show the posterior expectation of the delay distribution for each of the three fitted models. Figure 2.1B illustrates the higher mean of men as compared with women.

epred_draws <- obs_prep |>
  as.data.frame() |>
  data_grid(NA) |>
  mutate(relative_obs_time = NA, pwindow = NA, swindow = NA) |>
  add_epred_draws(fit, dpar = TRUE)

epred_base_figure <- epred_draws |>
  ggplot(aes(x = .epred)) +
  stat_halfeye() +
  labs(x = "", y = "", title = "Intercept-only", tag = "A") +
  theme_minimal()

epred_draws_sex <- obs_prep |>
  as.data.frame() |>
  data_grid(sex) |>
  mutate(relative_obs_time = NA, pwindow = NA, swindow = NA) |>
  add_epred_draws(fit_sex, dpar = TRUE)

epred_sex_figure <- epred_draws_sex |>
  ggplot(aes(x = .epred, y = sex)) +
  stat_halfeye() +
  labs(x = "", y = "", title = "Sex-stratified", tag = "B") +
  theme_minimal()

epred_draws_sex_district <- obs_prep |>
  as.data.frame() |>
  data_grid(sex, district) |>
  mutate(relative_obs_time = NA, pwindow = NA, swindow = NA) |>
  add_epred_draws(fit_sex_district, dpar = TRUE)

epred_sex_district_figure <- epred_draws_sex_district |>
  ggplot(aes(x = .epred, y = district)) +
  stat_pointinterval() +
  facet_grid(. ~ sex) +
  labs(
    x = "Posterior expectation of the delay", y = "",
    title = "Sex-district-stratified", tag = "C"
  ) +
  scale_y_discrete(limits = rev) +
  theme_minimal()
epred_base_figure / epred_sex_figure / epred_sex_district_figure +
  plot_layout(heights = c(1, 1.5, 2.5))
The fitted posterior expectations of the delay distribution for each model.

Figure 2.1: The fitted posterior expectations of the delay distribution for each model.

2.4 Linear predictor posteriors

The tidybayes package also allows users to generate draws of the linear predictors for all distributional parameters using tidybayes::add_linpred_draws(). For example, for the mu parameter in the sex-district stratified model (Figure 2.2):

linpred_draws_sex_district <- obs_prep |>
  as.data.frame() |>
  data_grid(sex, district) |>
  mutate(relative_obs_time = NA, pwindow = NA, swindow = NA) |>
  add_linpred_draws(fit_sex_district, dpar = TRUE)
linpred_draws_sex_district |>
  ggplot(aes(x = mu, y = district)) +
  stat_pointinterval() +
  facet_grid(. ~ sex) +
  labs(x = "Posterior of the mu linear predictor", y = "") +
  scale_y_discrete(limits = rev) +
  theme_minimal()
The posterior distribution of the linear predictor of mu parameter within the sex-district stratified model. The posterior expectations in Section 2.3 are a function of both the mu linear predictor posterior distribution and sigma linear predictor posterior distribution.

Figure 2.2: The posterior distribution of the linear predictor of mu parameter within the sex-district stratified model. The posterior expectations in Section 2.3 are a function of both the mu linear predictor posterior distribution and sigma linear predictor posterior distribution.

2.5 Delay posterior distributions

Posterior predictions of the delay distribution are an important output of an analysis with the epidist package. In this section, we demonstrate how to produce either a discrete probability mass function representation, or continuous probability density function representation of the delay distribution.

2.5.1 Discrete probability mass function

To generate a discrete probability mass function (PMF) we predict the delay distribution that would be observed with daily censoring and no right truncation. To do this, we set each of pwindow and swindow to 1 for daily censoring, and relative_obs_time to 1000 for no censoring. Figure 2.3 shows the result, where the few delays greater than 30 are omitted from the figure.

draws_pmf <- obs_prep |>
  as.data.frame() |>
  mutate(relative_obs_time = 1000, pwindow = 1, swindow = 1) |>
  add_predicted_draws(fit, ndraws = 1000)

pmf_base_figure <- ggplot(draws_pmf, aes(x = .prediction)) +
  geom_bar(aes(y = after_stat(count / sum(count)))) +
  labs(x = "", y = "", title = "Intercept-only", tag = "A") +
  scale_x_continuous(limits = c(0, 30)) +
  theme_minimal()

draws_sex_pmf <- obs_prep |>
  as.data.frame() |>
  data_grid(sex) |>
  mutate(relative_obs_time = 1000, pwindow = 1, swindow = 1) |>
  add_predicted_draws(fit_sex, ndraws = 1000)

pmf_sex_figure <- draws_sex_pmf |>
  ggplot(aes(x = .prediction)) +
  geom_bar(aes(y = after_stat(count / ave(count, PANEL, FUN = sum)))) +
  labs(x = "", y = "", title = "Sex-stratified", tag = "B") +
  facet_grid(. ~ sex) +
  scale_x_continuous(limits = c(0, 30)) +
  theme_minimal()

draws_sex_district_pmf <- obs_prep |>
  as.data.frame() |>
  data_grid(sex, district) |>
  mutate(relative_obs_time = 1000, pwindow = 1, swindow = 1) |>
  add_predicted_draws(fit_sex_district, ndraws = 1000)

pmf_sex_district_figure <- draws_sex_district_pmf |>
  mutate(
    district = case_when(
      district == "Port Loko" ~ "Port\nLoko",
      district == "Western Rural" ~ "Western\nRural",
      district == "Western Urban" ~ "Western\nUrban",
      .default = district
    )
  ) |>
  ggplot(aes(x = .prediction)) +
  geom_bar(aes(y = after_stat(count / ave(count, PANEL, FUN = sum)))) +
  labs(
    x = "PMF with daily censoring and no truncation", y = "",
    title = "Sex-district-stratified", tag = "C"
  ) +
  facet_grid(district ~ sex) +
  scale_x_continuous(limits = c(0, 30)) +
  theme_minimal()
pmf_base_figure / pmf_sex_figure / pmf_sex_district_figure +
  plot_layout(heights = c(1, 1.5, 5.5))
Posterior predictions of the discrete probability mass function for each of the fitted models.

Figure 2.3: Posterior predictions of the discrete probability mass function for each of the fitted models.

2.5.2 Continuous probability density function

The posterior predictive distribution under no truncation and no censoring. That is to produce continuous delay times (Figure 2.4):

draws_pdf <- obs_prep |>
  as.data.frame() |>
  mutate(relative_obs_time = 1000, pwindow = 0, swindow = 0) |>
  add_predicted_draws(fit, ndraws = 1000)

pdf_base_figure <- ggplot(draws_pdf, aes(x = .prediction)) +
  geom_density() +
  labs(x = "", y = "", title = "Intercept-only", tag = "A") +
  scale_x_continuous(limits = c(0, 30)) +
  theme_minimal()

draws_sex_pdf <- obs_prep |>
  as.data.frame() |>
  data_grid(sex) |>
  mutate(relative_obs_time = 1000, pwindow = 0, swindow = 0) |>
  add_predicted_draws(fit_sex, ndraws = 1000)

pdf_sex_figure <- draws_sex_pdf |>
  ggplot(aes(x = .prediction)) +
  geom_density() +
  labs(x = "", y = "", title = "Sex-stratified", tag = "B") +
  facet_grid(. ~ sex) +
  scale_x_continuous(limits = c(0, 30)) +
  theme_minimal()

draws_sex_district_pdf <- obs_prep |>
  as.data.frame() |>
  data_grid(sex, district) |>
  mutate(relative_obs_time = 1000, pwindow = 0, swindow = 0) |>
  add_predicted_draws(fit_sex_district, ndraws = 1000)

pdf_sex_district_figure <- draws_sex_district_pdf |>
  mutate(
    district = case_when(
      district == "Port Loko" ~ "Port\nLoko",
      district == "Western Rural" ~ "Western\nRural",
      district == "Western Urban" ~ "Western\nUrban",
      .default = district
    )
  ) |>
  ggplot(aes(x = .prediction)) +
  geom_density() +
  labs(
    x = "PDF with no censoring and no truncation", y = "",
    title = "Sex-district-stratified", tag = "C"
  ) +
  facet_grid(district ~ sex) +
  scale_x_continuous(limits = c(0, 30)) +
  theme_minimal()
pdf_base_figure / pdf_sex_figure / pdf_sex_district_figure +
  plot_layout(heights = c(1, 1.5, 5.5))
Posterior predictions of the continuous probability density function for each of the fitted models.

Figure 2.4: Posterior predictions of the continuous probability density function for each of the fitted models.

3 Conclusion

In this vignette, we demonstrate how the epidist package can be used to fit delay distribution models. These models can be stratified by covariates such as sex and district using fixed and random effects. Post-processing and prediction with fitted models is possible using the tidybayes package. We illustrate generating posterior expectations, the posteriors of linear predictors, as well as discrete and continuous representations of the delay distribution.

Bibliography

Fang, Li-Qun, Yang Yang, Jia-Fu Jiang, Hong-Wu Yao, David Kargbo, Xin-Lou Li, Bao-Gui Jiang, et al. 2016. “Transmission Dynamics of Ebola Virus Disease and Intervention Effectiveness in Sierra Leone.” Proceedings of the National Academy of Sciences 113 (16): 4488–93.
World Health Organization. 2016. “Ebola Outbreak 2014-2016 - West Africa.” https://www.who.int/emergencies/situations/ebola-outbreak-2014-2016-West-Africa.