Remove Incomplete, Illogical, and Missing Data from an Incidence Linelist

clean_linelist() creates a cleaned version of an incidence linelist that has collection and report dates. It first removes observations whose collection dates are not yet fully reported (as determined by pct_reported), whose collection dates are missing, or whose collection dates fall before start_date. It then checks that every report date is on or after its corresponding collection date and removes observations where it is not.

clean_linelist(
  .data,
  .collection_date = "collection_date",
  .report_date = "report_date",
  start_date = "2020-03-12",
  delay_period = 14L,
  pct_reported = 0.9
)

Arguments

.data: A data frame containing one incident observation per row
.collection_date: <tidy-select> A Date column to use as the collection date of the observed case
.report_date: <tidy-select> A Date column to use as the report date of the observed case
start_date: The start date of the epidemic; defaults to "2020-03-12", which is the beginning of the contiguous part of Shelby County's observed cases (at least one case observed per day since that date).
delay_period: The length of time to use in calculating reporting delay; can be a time-based definition (e.g. "2 weeks") or an integer number of days. If NULL, delay_period is set to "14 days".
pct_reported: The proportion of total cases (between 0 and 1) that must be reported before a collection date is considered fully observed. Setting this to 1 is not recommended, because reporting delays typically contain very large outliers that will skew the results. The default is 0.9, which strikes a balance between sensitivity and robustness in Shelby County data.
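
The warning about setting pct_reported to 1 can be illustrated with a small base R sketch. It assumes the cutoff behaves like a quantile of observed reporting delays, which this page does not confirm; the delays vector is made-up illustration data, not Shelby County data.

```r
# Hypothetical reporting delays in days: 19 typical values and one large outlier
delays <- c(rep(1, 5), rep(2, 8), rep(3, 4), 4, 5, 60)

# Delay within which 90% of cases are reported (assumed analogue of pct_reported = 0.9)
q90 <- stats::quantile(delays, probs = 0.9, names = FALSE)

# Delay within which 100% of cases are reported (analogue of pct_reported = 1)
q100 <- stats::quantile(delays, probs = 1, names = FALSE)

print(c(q90 = q90, q100 = q100))
```

Here the 0.9 quantile stays within the range of typical delays, while the 1.0 quantile is set entirely by the single 60-day outlier, so a cutoff based on it would discard far more recent collection dates than necessary.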

Value

A tibble containing the cleaned linelist
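
The cleaning steps described above can be approximated in base R as follows. This is a minimal sketch under stated assumptions, not the package implementation: sketch_clean_linelist() is a hypothetical name, the "fully reported" rule is assumed to be a quantile of reporting delays subtracted from the latest report date, delay_period is ignored (all rows are used to estimate the delay), rows with missing report dates are dropped, and a plain data frame is returned rather than a tibble.

```r
# Hypothetical stand-in for clean_linelist(); column names are fixed for brevity
sketch_clean_linelist <- function(data, start_date = "2020-03-12", pct_reported = 0.9) {
  start_date <- as.Date(start_date)

  # Reporting delay in days for each observation
  delay <- as.numeric(difftime(data$report_date, data$collection_date, units = "days"))

  # Assumed rule: the delay within which pct_reported of cases are reported
  cutoff_delay <- ceiling(
    stats::quantile(delay, probs = pct_reported, na.rm = TRUE, names = FALSE)
  )

  # Collection dates later than this are treated as not yet fully reported
  last_complete <- max(data$report_date, na.rm = TRUE) - cutoff_delay

  # Step 1: drop missing, pre-start, and incompletely reported collection dates
  keep <- !is.na(data$collection_date) &
    data$collection_date >= start_date &
    data$collection_date <= last_complete
  data <- data[keep, , drop = FALSE]

  # Step 2: drop rows whose report date is missing or precedes the collection date
  ordered <- !is.na(data$report_date) & data$report_date >= data$collection_date
  data[ordered, , drop = FALSE]
}

# Made-up linelist covering each removal reason
linelist <- data.frame(
  collection_date = as.Date(c("2020-03-01", "2020-03-15", "2020-03-16",
                              NA, "2020-03-20", "2020-04-10")),
  report_date = as.Date(c("2020-03-03", "2020-03-17", "2020-03-14",
                          "2020-03-20", "2020-03-23", "2020-04-12"))
)

cleaned <- sketch_clean_linelist(linelist)
print(cleaned)
```

In this toy data, the first row falls before start_date, the third has a report date earlier than its collection date, the fourth has a missing collection date, and the last is too recent to count as fully reported, so only the rows collected on 2020-03-15 and 2020-03-20 remain.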
