clean_linelist.Rd
clean_linelist() creates a cleaned version of an incidence linelist with
collection and report dates. It removes observations with collection dates
that have not been fully reported (as determined by pct_reported), as well
as observations with missing collection dates or collection dates prior to
start_date. It then checks that all report dates are on or after the
corresponding collection dates and removes observations where this is untrue.
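The date checks described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the helper sketch_clean() and the toy data are hypothetical, the column names simply mirror the defaults, and the "fully reported" filter is left out.

```r
# Hypothetical helper mirroring the described date checks; not the package's code.
sketch_clean <- function(data, start_date = as.Date("2020-03-12")) {
  # Drop rows with a missing collection date
  keep <- !is.na(data$collection_date)
  # Drop rows collected before the start date
  keep <- keep & data$collection_date >= start_date
  # Drop rows whose report date precedes the collection date
  keep <- keep & data$report_date >= data$collection_date
  # which() also drops rows where the comparison is NA, e.g. a missing
  # report date (an assumption; the documentation does not cover that case)
  data[which(keep), , drop = FALSE]
}

# Toy linelist: only the last row passes every check
linelist <- data.frame(
  collection_date = as.Date(c("2020-03-01", NA, "2020-03-15", "2020-03-20")),
  report_date     = as.Date(c("2020-03-03", "2020-03-16", "2020-03-14", "2020-03-22"))
)

cleaned <- sketch_clean(linelist)
```

In the toy data, the first row is collected before the start date, the second has no collection date, and the third is reported before it was collected, so only the fourth row remains in cleaned.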
clean_linelist(
  .data,
  .collection_date = "collection_date",
  .report_date = "report_date",
  start_date = "2020-03-12",
  delay_period = 14L,
  pct_reported = 0.9
)
.data
  A data frame containing one incident observation per row.
.collection_date
  <tidy-select> A Date column to use as the collection date of the observed
  case.
.report_date
  <tidy-select> A Date column to use as the report date of the observed case.
start_date
  The start date of the epidemic; defaults to "2020-03-12", which is the
  beginning of the contiguous part of Shelby County's observed cases (at
  least one case observed per day since that date).
delay_period
  The length of time to use in calculating reporting delay; can be a
  time-based definition (e.g. "2 weeks") or an integer number of days. If
  NULL, delay_period is set to "14 days".
pct_reported
  The proportion of total cases that must be reported before a collection
  date is considered fully observed. Setting this to 1 is not recommended,
  as reporting delays typically contain very large outliers that will skew
  the results. The default is 0.9, which strikes a balance between
  sensitivity and robustness in Shelby County data.
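To see why a pct_reported of 1 is sensitive to outliers, the toy example below compares the maximum reporting delay with the 0.9 quantile. The delay values are made up, and the final cutoff step is an assumption about how such a delay might be applied; the exact rule clean_linelist() uses is not described here.

```r
# Made-up reporting delays in days: mostly short, with one extreme outlier
delays <- c(1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 60)

# A proportion of 1 corresponds to the maximum delay, which the outlier sets
delay_all <- unname(quantile(delays, probs = 1))

# A proportion of 0.9 ignores the extreme tail
delay_90 <- unname(quantile(delays, probs = 0.9))

# Assumed use of the delay: collection dates later than this cutoff are
# treated as not yet fully reported (illustrative only)
latest_report <- as.Date("2020-06-30")
cutoff <- latest_report - ceiling(delay_90)
```

With these values the single 60-day delay determines delay_all, while delay_90 stays within the range of typical delays, so far fewer recent collection dates would be discarded.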
A tibble containing the cleaned linelist
Higher-level functions: prep_linelist_decomposition(), prep_linelist()