Estimate Reporting Delay Using a Simple Moving Average

estimate_delay() estimates the time it takes for a given percentage of samples collected on a certain date to be reported.

estimate_delay(
  .data,
  .collection_date = "collection_date",
  .report_date = "report_date",
  pct = 0.9,
  period = 14L,
  today = Sys.Date(),
  rtn = c("last_complete", "incomplete_only", "all"),
  min_dt = as.Date("2020-04-12"),
  quiet = FALSE
)

Arguments

.data: A data frame containing one incident observation per row
.collection_date: <tidy-select> A Date column to use as the collection date of the observed case
.report_date: <tidy-select> A Date column to use as the report date of the observed case
pct: The quantile of the delay distribution to use when computing the delay, given as a proportion between 0 and 1
period: The number of days to average over for the rolling comparison
today: The date to consider "today"
rtn: What to return. By default, this is a single-row tibble containing the last complete .collection_date; it can also return either incomplete dates only or all dates. All return values are tibbles with the same columns; see Value for details.
min_dt: The minimum date to consider; set to the first reporting date in SCHD data by default
quiet: Should messages about observations excluded from the estimation be suppressed?

Value

A tibble containing one row per date and columns for .collection_date, prior_delay, delay, and incomplete status

Details

To estimate reporting delay, estimate_delay() calculates the quantile of the delay distribution corresponding to pct for each .collection_date in the data. If reporting is complete, these quantiles can be interpreted as the time needed for a proportion pct of the samples collected on a given date to be reported. If reporting is incomplete, they are biased towards the portion of the delay distribution that is prioritized in the reporting process. In SCHD data, cases have mostly been processed in temporal order, so this bias is upwards (towards longer delays).
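
The per-date quantile step can be illustrated with a minimal base R sketch. This is not the package's implementation: the toy data, the object names samples and daily, and the use of quantile() with its default interpolation are all assumptions made for illustration.

```r
# Hypothetical toy data: one row per reported sample.
samples <- data.frame(
  collection_date = as.Date(c(
    "2021-01-01", "2021-01-01", "2021-01-01", "2021-01-01",
    "2021-01-02", "2021-01-02", "2021-01-02"
  )),
  report_date = as.Date(c(
    "2021-01-02", "2021-01-03", "2021-01-03", "2021-01-06",
    "2021-01-03", "2021-01-04", "2021-01-08"
  ))
)

# Delay in days between collection and report.
samples$delay <- as.numeric(samples$report_date - samples$collection_date)

pct <- 0.9

# Quantile of the delay and number of samples for each collection date.
# Assumption: quantile()'s default (type 7) interpolation is used here.
delay_q <- tapply(
  samples$delay, samples$collection_date,
  quantile, probs = pct, names = FALSE
)
n <- tapply(samples$delay, samples$collection_date, length)

daily <- data.frame(
  collection_date = as.Date(names(delay_q)),
  delay = as.vector(delay_q),
  n = as.vector(n)
)
```

Each row of daily then holds one collection date, the pct quantile of its observed delays, and the sample count that could later serve as a weight.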

Next, the quantiles are weighted by the sample size on each date, and a rolling average is calculated over a window of period days. This is the continuous-domain equivalent of calculating the quantile over period days.
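
A sample-size-weighted rolling average of this kind might look like the following base R sketch. The helper rolling_weighted_mean() is hypothetical, as are the toy values. The sketch also assumes one row per consecutive date, a trailing window that ends at the current row, and shorter windows for the earliest rows; the function's actual window alignment may differ.

```r
# Hypothetical daily summary: a delay quantile and a sample count per date.
daily <- data.frame(
  collection_date = as.Date("2021-01-01") + 0:4,
  delay = c(4, 6, 5, 3, 2),
  n = c(10, 30, 20, 40, 10)
)

period <- 3L

# Weighted mean of x with weights w over a trailing window of `period` rows
# ending at row i. Early rows use however many rows are available.
rolling_weighted_mean <- function(x, w, period) {
  vapply(seq_along(x), function(i) {
    idx <- seq(max(1L, i - period + 1L), i)
    sum(x[idx] * w[idx]) / sum(w[idx])
  }, numeric(1))
}

daily$avg_delay <- rolling_weighted_mean(daily$delay, daily$n, period)
```

Weighting by n means that dates with many samples contribute more to the average than dates with few, so a sparse day with an unusual quantile has limited influence.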

Finally, the averages for t - period to t - 1 are compared to the time between today and each date t. If the average is larger than this time difference, reporting for date t is considered incomplete; otherwise, it is considered complete.
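
The comparison can be sketched in base R as follows. The data frame est and its values are hypothetical, and treating prior_delay as the rolling average over the period days before each date is an assumption based on the column names listed under Value.

```r
# Hypothetical inputs: prior_delay is assumed to be the rolling average of
# the delay over the `period` days before each collection date.
today <- as.Date("2021-01-10")
est <- data.frame(
  collection_date = as.Date("2021-01-04") + 0:5,
  prior_delay = c(4.0, 4.5, 3.5, 3.0, 3.2, 3.1)
)

# Days elapsed between each collection date and "today".
est$elapsed <- as.numeric(today - est$collection_date)

# Reporting is flagged incomplete when the expected delay exceeds the time
# that has elapsed so far.
est$incomplete <- est$prior_delay > est$elapsed

# Most recent collection date that is considered complete.
last_complete <- max(est$collection_date[!est$incomplete])
```

In this toy example the two most recent dates are flagged incomplete, because fewer days have passed than the expected delay, and last_complete is 2021-01-07.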