Using epidist to estimate delay between symptom onset and positive test for an Ebola outbreak in Sierra Leone
Source:vignettes/ebola.Rmd
ebola.Rmd
In this vignette, we use the epidist
package to analyze line list data from the 2014-2016 outbreak of Ebola in Sierra Leone (World Health Organization 2016).
These data were collated by Fang et al. (2016).
We provide the data in the epidist
package via ?sierra_leone_ebola_data
.
In analyzing this data, we demonstrate the following features of epidist
:
- Fitting district-sex stratified partially pooled delay distribution estimates with a lognormal delay distribution.
- Post-processing and plotting functionality using the integration of
brms
functionality with thetidybayes
package.
The packages used in this article are:
set.seed(123)
library(epidist)
library(brms)
library(tibble)
library(dplyr)
library(ggplot2)
library(sf)
library(tidybayes)
library(modelr)
library(patchwork)
library(lubridate)
library(cmdstanr) # nolint
For users new to epidist
, before reading this article we recommend beginning with the “Getting started with epidist
” vignette.
1 Using the cmdstanr backend
As the models explored in this vignette are relatively complex, we recommend using the cmdstanr
backend for fitting models as it is typically more performant than the default rstan
backend.
To use the cmdstanr
backend, we first need to install CmdStan (see the README for more details).
We can check we have everything we need as follows:
cmdstanr::cmdstan_version()
#> [1] "2.35.0"
2 Data preparation
We begin by loading the Ebola line list data:
data("sierra_leone_ebola_data")
The data has 8358 rows, each corresponding to a unique case report id
.
The columns of the data are the individuals name (retracted, and hence can be removed), age, sex, the dates of Ebola symptom onset and positive sample, and their district and chiefdom.
sierra_leone_ebola_data <- sierra_leone_ebola_data |>
select(-name) |>
mutate(
case = as.integer(id), .keep = "unused",
sex = case_when(
sex == "F" ~ "Female",
sex == "M" ~ "Male"
)
)
head(sierra_leone_ebola_data)
#> # A tibble: 6 × 7
#> age sex date_of_symptom_onset date_of_sample_tested district chiefdom
#> <dbl> <chr> <dttm> <dttm> <chr> <chr>
#> 1 20 Female 2014-05-18 00:00:00 2014-05-23 00:00:00 Kailahun Kissi Teng
#> 2 42 Female 2014-05-20 00:00:00 2014-05-25 00:00:00 Kailahun Kissi Teng
#> 3 45 Female 2014-05-20 00:00:00 2014-05-25 00:00:00 Kailahun Kissi Tonge
#> 4 15 Female 2014-05-21 00:00:00 2014-05-26 00:00:00 Kailahun Kissi Teng
#> 5 19 Female 2014-05-21 00:00:00 2014-05-26 00:00:00 Kailahun Kissi Teng
#> 6 55 Female 2014-05-21 00:00:00 2014-05-26 00:00:00 Kailahun Kissi Teng
#> # ℹ 1 more variable: case <int>
fraction <- 5
ndistrict <- length(unique(sierra_leone_ebola_data$district))
Figure 2.1 shows the dates of symptom onset and sample testing for cases across in each district. (In this figure, we filter down to every 5th case in order to avoid overplotting.) We can see that the start time and course of the epidemic varies across districts.
sierra_leone_ebola_data |>
filter(case %% fraction == 0) |>
ggplot() +
geom_segment(
aes(
x = date_of_symptom_onset, xend = date_of_sample_tested,
y = case, yend = case
),
col = "grey"
) +
geom_point(aes(x = date_of_symptom_onset, y = case), col = "#56B4E9") +
geom_point(aes(x = date_of_sample_tested, y = case), col = "#009E73") +
facet_wrap(district ~ ., ncol = 2) +
labs(x = "", y = "Case ID") +
theme_minimal()
We can use a map to help visualise the outbreak across space. To create a map, we first join the Ebola data to shapefiles for the districts of Sierra Leone. These shapefiles were obtained from the Database of Global Administrative Areas (GADM).
sf <- st_read(
system.file(
"extdata", "gadm41_SLE_shp", "gadm41_SLE_2.shp",
package = "epidist"
),
quiet = TRUE
)
sierra_leone_ebola_data_sf <- select(sf, district = NAME_2, geometry) |>
left_join(
sierra_leone_ebola_data |>
group_by(district) |>
summarise(cases = n())
)
Figure 2.2 shows that the majority of cases were concentrated in the Western Urban district. This district contains the capital of the country, and has the highest population (as such, one may also be interested in the prevalence of Ebola, that is the number of cases divided by the population size).
ggplot(sierra_leone_ebola_data_sf, aes(fill = cases, geometry = geometry)) +
geom_sf() +
scale_fill_viridis_c(na.value = "lightgrey") +
theme_minimal() +
labs(fill = "Cases")
3 Fitting sex-district stratified delay distributions
To understand the delay between time of symptom onset and time of sample testing, we fit a range of statistical models using the epidist
package.
In some models, we vary the parameters of the delay distribution by sex or by district.
For the lognormal delay distribution these parameters are the mean and standard deviation of the underlying normal distribution.
That is, \(\mu\) and \(\sigma\) such that when \(x \sim \mathcal{N}(\mu, \sigma)\) then \(\exp(x)\) has a lognormal distribution.
3.1 Data preparation
To prepare the data, we begin by converting the date columns to Date
objects and selecting the relevant columns:
obs_cens <- sierra_leone_ebola_data |>
tibble() |>
mutate(
# use lubridate::ymd() to drop any sub-date time info
date_of_symptom_onset = ymd(date_of_symptom_onset),
date_of_sample_tested = ymd(date_of_sample_tested)
) |>
select(case, date_of_symptom_onset, date_of_sample_tested, age, sex, district)
head(obs_cens)
#> # A tibble: 6 × 6
#> case date_of_symptom_onset date_of_sample_tested age sex district
#> <int> <date> <date> <dbl> <chr> <chr>
#> 1 1 2014-05-18 2014-05-23 20 Female Kailahun
#> 2 2 2014-05-20 2014-05-25 42 Female Kailahun
#> 3 3 2014-05-20 2014-05-25 45 Female Kailahun
#> 4 4 2014-05-21 2014-05-26 15 Female Kailahun
#> 5 5 2014-05-21 2014-05-26 19 Female Kailahun
#> 6 6 2014-05-21 2014-05-26 55 Female Kailahun
For the time being, we filter the data to only complete cases (i.e. rows of the data which have no missing values1).
n <- nrow(obs_cens)
obs_cens <- obs_cens[complete.cases(obs_cens), ]
n_complete <- nrow(obs_cens)
Here, 83% of the cases were complete.
subsample <- 0.2
Additionally, to speed up computation, we take a random 20% subsample of the complete data. (In a real analysis, we’d recommend using all the available data.)
obs_cens <- obs_cens |>
slice_sample(n = round(n_complete * subsample), replace = FALSE)
Finally, we prepare the data for use with the epidist
package by converting the data to an epidist_linelist_data
object:
linelist_data <- obs_cens |>
as_epidist_linelist_data(
pdate_lwr = "date_of_symptom_onset",
sdate_lwr = "date_of_sample_tested"
)
In this call to [as_epidist_linelist_data()] it has made some assumptions about the data.
First, because we did not supply upper bounds for the primary and secondary events (pdate_upr
and sdate_upr
), it has assumed that the upper bounds are one day after the lower bounds.
Second, because we also did not supply an observation time column (obs_date
), it has assumed that the observation time is the maximum of the secondary event upper bounds.
3.2 Model fitting
To prepare the data for use with the latent individual model, we define the data as being a epidist_latent_model
model object:
obs_prep <- as_epidist_latent_model(linelist_data)
head(obs_prep)
#> # A tibble: 6 × 20
#> ptime_lwr ptime_upr stime_lwr stime_upr obs_time case pdate_lwr sdate_lwr
#> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <date> <date>
#> 1 108 109 111 112 480 1101 2014-09-03 2014-09-06
#> 2 189 190 197 198 480 5584 2014-11-23 2014-12-01
#> 3 111 112 117 118 480 1187 2014-09-06 2014-09-12
#> 4 138 139 142 143 480 2483 2014-10-03 2014-10-07
#> 5 217 218 221 222 480 6796 2014-12-21 2014-12-25
#> 6 157 158 163 164 480 3576 2014-10-22 2014-10-28
#> # ℹ 12 more variables: age <dbl>, sex <chr>, district <chr>, pdate_upr <date>,
#> # sdate_upr <date>, obs_date <date>, relative_obs_time <dbl>, pwindow <dbl>,
#> # woverlap <dbl>, swindow <dbl>, delay <dbl>, .row_id <int>
Now we are ready to fit the latent individual model.
3.2.1 Intercept-only model
We start by fitting a single lognormal distribution to the data.
This model assumes that a single distribution describes all delays in the data, regardless of the case’s location, sex, or any other covariates.
To do this, we set formula = mu ~ 1
to place an model with only an intercept parameter (i.e. ~ 1
in R formula syntax) on the mu
parameter of the lognormal distribution specified using family = lognormal()
.
(Note that the lognormal distribution has two distributional parameters mu
and sigma
.
As a model is not explicitly placed on sigma
, a constant model sigma ~ 1
is assumed.)
fit <- epidist(
data = obs_prep,
formula = mu ~ 1,
family = lognormal(),
algorithm = "sampling",
chains = 2,
cores = 2,
refresh = as.integer(interactive()),
seed = 1,
backend = "cmdstanr"
)
#> Running MCMC with 2 parallel chains...
#>
#> Chain 2 finished in 38.6 seconds.
#> Chain 1 finished in 39.0 seconds.
#>
#> Both chains finished successfully.
#> Mean chain execution time: 38.8 seconds.
#> Total execution time: 39.1 seconds.
The fit
object is a brmsfit
object, and has the associated range of methods.
See methods(class = "brmsfit")
for more details.
For example, we may use summary()
to view information about the fitted model, including posterior estimates for the regression coefficients:
summary(fit)
#> Family: latent_lognormal
#> Links: mu = identity; sigma = log
#> Formula: delay | vreal(relative_obs_time, pwindow, swindow) ~ 1
#> sigma ~ 1
#> Data: data (Number of observations: 1390)
#> Draws: 2 chains, each with iter = 2000; warmup = 1000; thin = 1;
#> total post-warmup draws = 2000
#>
#> Regression Coefficients:
#> Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> Intercept 1.62 0.02 1.59 1.65 1.00 5455 1420
#> sigma_Intercept -0.54 0.02 -0.58 -0.50 1.00 5122 1549
#>
#> Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
#> and Tail_ESS are effective sample size measures, and Rhat is the potential
#> scale reduction factor on split chains (at convergence, Rhat = 1).
3.2.2 Sex-stratified model
To fit a model which varies the parameters of the fitted lognormal distribution, mu
and sigma
, by sex we alter the formula
specification to include fixed effects for sex ~ 1 + sex
as follows:
fit_sex <- epidist(
data = obs_prep,
formula = bf(mu ~ 1 + sex, sigma ~ 1 + sex),
family = lognormal(),
algorithm = "sampling",
chains = 2,
cores = 2,
refresh = as.integer(interactive()),
seed = 1,
backend = "cmdstanr"
)
#> Running MCMC with 2 parallel chains...
#>
#> Chain 1 finished in 42.9 seconds.
#> Chain 2 finished in 44.2 seconds.
#>
#> Both chains finished successfully.
#> Mean chain execution time: 43.5 seconds.
#> Total execution time: 44.3 seconds.
A summary of the model shows that males tend to have longer delays (the posterior mean of sexMale
is 0.03) and greater delay variation (the posterior mean of sigma_sexMale
is 0.05).
For the sexMale
effect, the 95% credible interval is greater than zero, whereas for the sigma_sexMale
effect the 95% credible interval includes zero.
It is important to note that the estimates represent an average of the observed data, and individual delays between men and women vary significantly.
summary(fit_sex)
#> Family: latent_lognormal
#> Links: mu = identity; sigma = log
#> Formula: delay | vreal(relative_obs_time, pwindow, swindow) ~ sex
#> sigma ~ 1 + sex
#> Data: data (Number of observations: 1390)
#> Draws: 2 chains, each with iter = 2000; warmup = 1000; thin = 1;
#> total post-warmup draws = 2000
#>
#> Regression Coefficients:
#> Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> Intercept 1.60 0.02 1.56 1.65 1.00 4380 1414
#> sigma_Intercept -0.57 0.03 -0.63 -0.51 1.00 4326 1636
#> sexMale 0.03 0.03 -0.03 0.09 1.00 4873 1528
#> sigma_sexMale 0.05 0.04 -0.03 0.13 1.00 3482 1689
#>
#> Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
#> and Tail_ESS are effective sample size measures, and Rhat is the potential
#> scale reduction factor on split chains (at convergence, Rhat = 1).
3.2.3 Sex-district stratified model
Finally, we will fit a model which also varies by district.
To do this, we will use district level random effects, assumed to be drawn from a shared normal distribution, within the model for both the mu
and sigma
parameters.
These random effects are specified by including (1 | district)
in the formulas:
fit_sex_district <- epidist(
data = obs_prep,
formula = bf(
mu ~ 1 + sex + (1 | district),
sigma ~ 1 + sex + (1 | district)
),
family = lognormal(),
algorithm = "sampling",
chains = 2,
cores = 2,
refresh = as.integer(interactive()),
seed = 1,
backend = "cmdstanr"
)
#> Running MCMC with 2 parallel chains...
#>
#> Chain 1 finished in 49.1 seconds.
#> Chain 2 finished in 49.1 seconds.
#>
#> Both chains finished successfully.
#> Mean chain execution time: 49.1 seconds.
#> Total execution time: 49.2 seconds.
For this model, along with looking at the summary()
, we may also use the brms::ranef()
function to look at the estimates of the random effects:
summary(fit_sex_district)
#> Family: latent_lognormal
#> Links: mu = identity; sigma = log
#> Formula: delay | vreal(relative_obs_time, pwindow, swindow) ~ sex + (1 | district)
#> sigma ~ 1 + sex + (1 | district)
#> Data: data (Number of observations: 1390)
#> Draws: 2 chains, each with iter = 2000; warmup = 1000; thin = 1;
#> total post-warmup draws = 2000
#>
#> Multilevel Hyperparameters:
#> ~district (Number of levels: 14)
#> Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> sd(Intercept) 0.13 0.04 0.07 0.22 1.00 1004 1539
#> sd(sigma_Intercept) 0.23 0.06 0.14 0.39 1.00 783 1116
#>
#> Regression Coefficients:
#> Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> Intercept 1.63 0.05 1.54 1.72 1.00 918 1268
#> sigma_Intercept -0.64 0.08 -0.80 -0.50 1.00 735 993
#> sexMale 0.04 0.03 -0.02 0.10 1.00 4733 1410
#> sigma_sexMale 0.04 0.04 -0.04 0.12 1.00 3847 1697
#>
#> Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
#> and Tail_ESS are effective sample size measures, and Rhat is the potential
#> scale reduction factor on split chains (at convergence, Rhat = 1).
ranef(fit_sex_district)
#> $district
#> , , Intercept
#>
#> Estimate Est.Error Q2.5 Q97.5
#> Bo 0.061370570 0.07805850 -0.09434620 0.217150750
#> Bombali 0.161739832 0.04852453 0.06956229 0.263049750
#> Bonthe 0.006184697 0.12696442 -0.25235730 0.260503950
#> Kailahun -0.043140439 0.05660398 -0.15724975 0.068502460
#> Kambia 0.048517520 0.08891895 -0.13123840 0.222670750
#> Kenema -0.140988285 0.07328490 -0.29453505 -0.005173668
#> Koinadugu 0.114949950 0.10069287 -0.06990953 0.322390550
#> Kono -0.034245903 0.07768097 -0.19590210 0.114399625
#> Moyamba -0.003899371 0.08491923 -0.17035983 0.159339625
#> Port Loko 0.012269693 0.05442444 -0.09154477 0.121246250
#> Pujehun -0.040962106 0.11806307 -0.28337432 0.182547750
#> Tonkolili 0.034770215 0.05831936 -0.07535285 0.155927325
#> Western Rural 0.007038704 0.05507077 -0.10220350 0.116271600
#> Western Urban -0.186967015 0.05366162 -0.29733588 -0.081107455
#>
#> , , sigma_Intercept
#>
#> Estimate Est.Error Q2.5 Q97.5
#> Bo -0.03993660 0.11777504 -0.26971368 0.186271475
#> Bombali -0.28611810 0.08927144 -0.45377777 -0.112220900
#> Bonthe -0.05263371 0.24958489 -0.56160083 0.416555200
#> Kailahun -0.20056102 0.10346414 -0.41225108 -0.002420407
#> Kambia 0.28971246 0.12194324 0.05987674 0.545790875
#> Kenema 0.21252846 0.10264686 0.02605080 0.418920750
#> Koinadugu 0.01504241 0.14716395 -0.26681377 0.312746900
#> Kono 0.10537514 0.11439856 -0.10130943 0.357216475
#> Moyamba 0.08499545 0.11743754 -0.14130670 0.330522600
#> Port Loko 0.01460764 0.08522727 -0.15130213 0.183499425
#> Pujehun -0.12948202 0.23548079 -0.62976475 0.299674950
#> Tonkolili -0.23271106 0.10183638 -0.43356765 -0.032404575
#> Western Rural 0.03038971 0.08725862 -0.13591113 0.197685850
#> Western Urban 0.24764473 0.08243182 0.09715344 0.413708450
3.3 Posterior expectations
To go further than summaries of the fitted model, we recommend using the tidybayes
package.
For example, to obtain the posterior expectation of the delay distribution, under no censoring or truncation, we may use the modelr::data_grid()
function in combination with the tidybayes::add_epred_draws()
function.
The tidybayes::add_epred_draws()
function uses the epidist_gen_posterior_predict()
function to generate a posterior prediction function for the lognormal()
distribution.
In Figure 3.1 we show the posterior expectation of the delay distribution for each of the three fitted models. Figure 3.1B illustrates the higher mean of men as compared with women.
epred_draws <- obs_prep |>
data_grid(NA) |>
mutate(relative_obs_time = NA, pwindow = NA, swindow = NA) |>
add_epred_draws(fit, dpar = TRUE)
epred_base_figure <- epred_draws |>
ggplot(aes(x = .epred)) +
stat_halfeye() +
labs(x = "", y = "", title = "Intercept-only", tag = "A") +
theme_minimal()
epred_draws_sex <- obs_prep |>
data_grid(sex) |>
mutate(relative_obs_time = NA, pwindow = NA, swindow = NA) |>
add_epred_draws(fit_sex, dpar = TRUE)
epred_sex_figure <- epred_draws_sex |>
ggplot(aes(x = .epred, y = sex)) +
stat_halfeye() +
labs(x = "", y = "", title = "Sex-stratified", tag = "B") +
theme_minimal()
epred_draws_sex_district <- obs_prep |>
data_grid(sex, district) |>
mutate(relative_obs_time = NA, pwindow = NA, swindow = NA) |>
add_epred_draws(fit_sex_district, dpar = TRUE)
epred_sex_district_figure <- epred_draws_sex_district |>
ggplot(aes(x = .epred, y = district)) +
stat_pointinterval() +
facet_grid(. ~ sex) +
labs(
x = "Posterior expectation of the delay", y = "",
title = "Sex-district-stratified", tag = "C"
) +
scale_y_discrete(limits = rev) +
theme_minimal()
epred_base_figure / epred_sex_figure / epred_sex_district_figure +
plot_layout(heights = c(1, 1.5, 2.5))
3.4 Linear predictor posteriors
The tidybayes
package also allows users to generate draws of the linear predictors for all distributional parameters using tidybayes::add_linpred_draws()
.
For example, for the mu
parameter in the sex-district stratified model (Figure 3.2):
linpred_draws_sex_district <- obs_prep |>
as.data.frame() |>
data_grid(sex, district) |>
mutate(relative_obs_time = NA, pwindow = NA, swindow = NA) |>
add_linpred_draws(fit_sex_district, dpar = TRUE)
linpred_draws_sex_district |>
ggplot(aes(x = mu, y = district)) +
stat_pointinterval() +
facet_grid(. ~ sex) +
labs(x = "Posterior of the mu linear predictor", y = "") +
scale_y_discrete(limits = rev) +
theme_minimal()
3.5 Delay posterior distributions
Posterior predictions of the delay distribution are an important output of an analysis with the epidist
package.
In this section, we demonstrate how to produce either a discrete probability mass function representation, or continuous probability density function representation of the delay distribution.
3.5.1 Discrete probability mass function
To generate a discrete probability mass function (PMF) we predict the delay distribution that would be observed with daily censoring and no right truncation.
To do this, we set each of pwindow
and swindow
to 1 for daily censoring, and relative_obs_time
to 1000 for no censoring.
Figure 3.3 shows the result, where the few delays greater than 30 are omitted from the figure.
draws_pmf <- obs_prep |>
as.data.frame() |>
mutate(relative_obs_time = 1000, pwindow = 1, swindow = 1) |>
add_predicted_draws(fit, ndraws = 1000)
pmf_base_figure <- ggplot(draws_pmf, aes(x = .prediction)) +
geom_bar(aes(y = after_stat(count / sum(count)))) +
labs(x = "", y = "", title = "Intercept-only", tag = "A") +
scale_x_continuous(limits = c(0, 30)) +
theme_minimal()
draws_sex_pmf <- obs_prep |>
as.data.frame() |>
data_grid(sex) |>
mutate(relative_obs_time = 1000, pwindow = 1, swindow = 1) |>
add_predicted_draws(fit_sex, ndraws = 1000)
pmf_sex_figure <- draws_sex_pmf |>
ggplot(aes(x = .prediction)) +
geom_bar(aes(y = after_stat(count / ave(count, PANEL, FUN = sum)))) +
labs(x = "", y = "", title = "Sex-stratified", tag = "B") +
facet_grid(. ~ sex) +
scale_x_continuous(limits = c(0, 30)) +
theme_minimal()
draws_sex_district_pmf <- obs_prep |>
as.data.frame() |>
data_grid(sex, district) |>
mutate(relative_obs_time = 1000, pwindow = 1, swindow = 1) |>
add_predicted_draws(fit_sex_district, ndraws = 1000)
pmf_sex_district_figure <- draws_sex_district_pmf |>
mutate(
district = case_when(
district == "Port Loko" ~ "Port\nLoko",
district == "Western Rural" ~ "Western\nRural",
district == "Western Urban" ~ "Western\nUrban",
.default = district
)
) |>
ggplot(aes(x = .prediction)) +
geom_bar(aes(y = after_stat(count / ave(count, PANEL, FUN = sum)))) +
labs(
x = "PMF with daily censoring and no truncation", y = "",
title = "Sex-district-stratified", tag = "C"
) +
facet_grid(district ~ sex) +
scale_x_continuous(limits = c(0, 30)) +
theme_minimal()
pmf_base_figure / pmf_sex_figure / pmf_sex_district_figure +
plot_layout(heights = c(1, 1.5, 5.5))
3.5.2 Continuous probability density function
The posterior predictive distribution under no truncation and no censoring. That is to produce continuous delay times (Figure 3.4):
draws_pdf <- obs_prep |>
as.data.frame() |>
mutate(relative_obs_time = 1000, pwindow = 0, swindow = 0) |>
add_predicted_draws(fit, ndraws = 1000)
pdf_base_figure <- ggplot(draws_pdf, aes(x = .prediction)) +
geom_density() +
labs(x = "", y = "", title = "Intercept-only", tag = "A") +
scale_x_continuous(limits = c(0, 30)) +
theme_minimal()
draws_sex_pdf <- obs_prep |>
as.data.frame() |>
data_grid(sex) |>
mutate(relative_obs_time = 1000, pwindow = 0, swindow = 0) |>
add_predicted_draws(fit_sex, ndraws = 1000)
pdf_sex_figure <- draws_sex_pdf |>
ggplot(aes(x = .prediction)) +
geom_density() +
labs(x = "", y = "", title = "Sex-stratified", tag = "B") +
facet_grid(. ~ sex) +
scale_x_continuous(limits = c(0, 30)) +
theme_minimal()
draws_sex_district_pdf <- obs_prep |>
as.data.frame() |>
data_grid(sex, district) |>
mutate(relative_obs_time = 1000, pwindow = 0, swindow = 0) |>
add_predicted_draws(fit_sex_district, ndraws = 1000)
pdf_sex_district_figure <- draws_sex_district_pdf |>
mutate(
district = case_when(
district == "Port Loko" ~ "Port\nLoko",
district == "Western Rural" ~ "Western\nRural",
district == "Western Urban" ~ "Western\nUrban",
.default = district
)
) |>
ggplot(aes(x = .prediction)) +
geom_density() +
labs(
x = "PDF with no censoring and no truncation", y = "",
title = "Sex-district-stratified", tag = "C"
) +
facet_grid(district ~ sex) +
scale_x_continuous(limits = c(0, 30)) +
theme_minimal()
pdf_base_figure / pdf_sex_figure / pdf_sex_district_figure +
plot_layout(heights = c(1, 1.5, 5.5))
4 Conclusion
In this vignette, we demonstrate how the epidist
package can be used to fit delay distribution models.
These models can be stratified by covariates such as sex and district using fixed and random effects.
Post-processing and prediction with fitted models is possible using the tidybayes
package.
We illustrate generating posterior expectations, the posteriors of linear predictors, as well as discrete and continuous representations of the delay distribution.