estimate_boosted_rt.Rd
estimate_boosted_rt() applies the approach of Cori et al. (2013) to estimate
a gamma-distributed reproduction number. It implements additional
pre-processing to handle outliers and seasonality, and it smooths the data
before computing Rt rather than during estimation. It also corrects
the portion of the smooth that is conditional on future data, using rolling
cross-validation and geometrically weighted bootstraps based on the residuals
for each future-conditional time point. Bootstrap weights are calculated
using a geometric sequence with the first-order autoregressive coefficient
as the discount factor.
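The weighting scheme described above can be sketched roughly as follows. This is an illustration only, not the package's internal code: the helper name geometric_weights() is hypothetical, the AR(1) coefficient is approximated by the lag-1 sample autocorrelation, and it is assumed that the most recent residual receives the largest weight and that the weights are normalized to sum to one.

```r
# Hypothetical helper (not part of the package): geometric bootstrap weights
# discounted by an estimate of the first-order autoregressive coefficient.
geometric_weights <- function(residuals) {
  # Lag-1 sample autocorrelation as a simple stand-in for the AR(1) coefficient
  phi <- stats::acf(residuals, lag.max = 1, plot = FALSE)$acf[2]
  # Exponents n - 1, ..., 1, 0: the most recent residual gets exponent 0
  k <- rev(seq_along(residuals)) - 1
  # abs() keeps the weights non-negative (an assumption made for this sketch)
  w <- abs(phi)^k
  w / sum(w)
}

set.seed(1)
res <- as.numeric(stats::arima.sim(list(ar = 0.6), n = 50))
w <- geometric_weights(res)

# Weighted bootstrap: recent residuals are resampled more often
boot <- sample(res, size = 1000, replace = TRUE, prob = w)
```

Using the autoregressive coefficient as the discount factor means that strongly autocorrelated residuals decay slowly (older residuals stay relevant), while weakly autocorrelated residuals concentrate almost all weight on the latest points.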
estimate_boosted_rt(
  .data,
  .collection_date = "collection_date",
  .report_date = "report_date",
  serial_interval_mean = 6,
  serial_interval_sd = 4.17,
  start_date = "2020-03-12",
  trend = "30 days",
  period = "7 days",
  delay_period = "14 days",
  pct_reported = 0.9,
  cutoff = 0.05,
  plot_anomalies = FALSE
)
A data frame containing one incident observation per row
<tidy-select> A Date column to use as the collection date of the observed case
<tidy-select> A Date column to use as the report date of the observed case
The average number of days between infection of a primary case and a secondary case
The standard deviation of the number of days between infection of a primary case and a secondary case
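As a rough illustration of how a mean and standard deviation pin down a serial interval distribution, the sketch below converts the defaults into gamma shape and rate parameters by moment matching. A plain (unshifted) gamma distribution is assumed here for simplicity; the package and Cori et al. may discretize or shift the distribution differently.

```r
serial_interval_mean <- 6
serial_interval_sd <- 4.17

# Moment matching for a gamma distribution:
# mean = shape / rate, sd = sqrt(shape) / rate
shape <- (serial_interval_mean / serial_interval_sd)^2
rate <- serial_interval_mean / serial_interval_sd^2

# Recover the moments as a sanity check
c(mean = shape / rate, sd = sqrt(shape) / rate)

# Probability mass in each one-day bin over the first 14 days
# (a simple discretization chosen for this sketch)
w <- diff(stats::pgamma(0:14, shape = shape, rate = rate))
```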
The start date of the epidemic; defaults to "2020-03-12", which is the beginning of the contiguous
part of Shelby County's observed cases (at least one case observed per
day since that date).
The length of time to use in trend decomposition; can be a
time-based definition (e.g. "1 month") or an integer number of days. If
NULL or "auto", trend is set automatically using the tunable
heuristics in the timetk package.
The length of time to use in seasonal decomposition; can be a
time-based definition (e.g. "1 week") or an integer number of days. If
NULL or "auto", period is set automatically using the tunable
heuristics in the timetk package.
The length of time to use in calculating reporting
delay; can be a time-based definition (e.g. "2 weeks") or an integer number
of days. If NULL, delay_period is set to "14 days".
The proportion of total cases that must be reported before
a collection date is considered fully observed. Setting this
to 1 is not recommended, because reporting delays typically contain very large outliers that
will skew the results. The default is 0.9, which strikes a balance
between sensitivity and robustness in Shelby County data.
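The effect of a single extreme reporting delay can be seen in a small sketch. The helper delay_cutoff() and the toy data are hypothetical, and the rule used here (the smallest delay by which at least the requested share of cases has been reported) is an assumption about how such a threshold might be computed, not the package's implementation.

```r
# Toy data: eleven cases, one of which was reported 60 days after collection
delay_days <- c(1, 1, 1, 2, 2, 2, 3, 3, 4, 5, 60)
collection_date <- as.Date("2021-01-01") + 0:10
report_date <- collection_date + delay_days

# Hypothetical helper: smallest delay (in days) by which at least
# `pct_reported` of cases have been reported
delay_cutoff <- function(collection_date, report_date, pct_reported = 0.9) {
  delay <- as.integer(report_date - collection_date)
  candidates <- sort(unique(delay))
  reported <- vapply(candidates, function(d) mean(delay <= d), numeric(1))
  candidates[which(reported >= pct_reported)[1]]
}

delay_cutoff(collection_date, report_date, pct_reported = 0.9)  # 5 days
delay_cutoff(collection_date, report_date, pct_reported = 1)    # 60 days
```

With a threshold of 1, the single 60-day outlier determines the cutoff, so every collection date in the last two months would be treated as incomplete.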
The cutoff value for anomaly detection; controls both the maximum proportion of data points that may be considered anomalies and the critical value for the Generalized Extreme Studentized Deviate (GESD) test used to detect them. Can be interpreted as the desired maximum probability that an individual data point is labeled an anomaly.
Should anomalies be plotted for visual inspection? If TRUE, the plot will be on the log-scale.
A tibble with columns .t, .pred (the median), .pred_lower
(the lower bound of the 95% credible interval), .pred_upper
(the upper bound of the 95% credible interval), .mean (the average), and
.cv (the coefficient of variation).
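Because the reproduction number is described as gamma-distributed, summaries like those in the returned columns can be derived from a gamma shape and rate. The sketch below shows one way to do this; the helper summarize_gamma_rt() and its inputs are hypothetical, and an equal-tailed credible interval is assumed.

```r
# Hypothetical helper: summarize a gamma-distributed Rt for one time point
summarize_gamma_rt <- function(shape, rate) {
  data.frame(
    .pred = stats::qgamma(0.5, shape = shape, rate = rate),          # median
    .pred_lower = stats::qgamma(0.025, shape = shape, rate = rate),  # 2.5% quantile
    .pred_upper = stats::qgamma(0.975, shape = shape, rate = rate),  # 97.5% quantile
    .mean = shape / rate,                                            # gamma mean
    .cv = 1 / sqrt(shape)                                            # sd / mean
  )
}

rt <- summarize_gamma_rt(shape = 100, rate = 90)
rt
```

For a gamma distribution the coefficient of variation depends only on the shape parameter, so .cv can be read as a measure of how much information supports the estimate at a given time point.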