std_intvl() standardizes the various representations of numeric intervals found in the ML in HCT dataset. The intervals are assumed to be percentages and therefore to lie between 0 and 100. Both explicit intervals with lower and upper bounds and implicit intervals written with < or > are handled; <= and >= are not currently supported as input. Where possible, the return value is simplified to a </>/<=/>= expression or a single numeric value; otherwise it uses standard interval notation.

Usage

std_intvl(
  x,
  less_than = c("LESS THAN",
    "[A-Z ]*NOTHING TO SUGGEST[A-Z ]*SENSITIVITY[A-Z ]*(?=[0-9])"),
  greater_than = c("GREATER THAN"),
  na = na_patterns,
  std_chr = TRUE,
  warn = TRUE,
  ...
)
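
The default less_than and greater_than patterns above can be tried directly with stringr::str_replace(). The following is a minimal sketch, not the function's actual implementation: the input strings are hypothetical, and applying the patterns one at a time in a loop is an assumption about how a vector of patterns might be handled.

```r
library(stringr)

# Default patterns copied from the usage signature above
less_than <- c(
  "LESS THAN",
  "[A-Z ]*NOTHING TO SUGGEST[A-Z ]*SENSITIVITY[A-Z ]*(?=[0-9])"
)
greater_than <- c("GREATER THAN")

# Hypothetical input strings, for illustration only
x <- c(
  "LESS THAN 5",
  "NOTHING TO SUGGEST SENSITIVITY ABOVE 10",
  "GREATER THAN 95"
)

# Assumption: each pattern is applied in turn to the whole vector
for (p in less_than) x <- str_replace(x, p, "<")
for (p in greater_than) x <- str_replace(x, p, ">")

x
# "< 5" "<10" "> 95"
```

The lookahead (?=[0-9]) in the second less_than pattern requires a digit to follow but does not consume it, so the number itself survives the replacement.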

Arguments

x

A character vector

less_than

Regex patterns to treat as "<". Passed to stringr::str_replace(). Can be a vector of patterns.

greater_than

Regex patterns to treat as ">". Passed to stringr::str_replace(). Can be a vector of patterns.

na

Regex patterns to treat as NA. Passed to stringr::str_detect(). Can be a vector of patterns.

std_chr

Whether to standardize the strings before parsing

warn

Whether to emit a warning when potential numeric values cannot be converted to an interval

...

Arguments passed on to chr_to_num

std

Whether to standardize the vector before cleaning and converting

convert

Whether to actually convert to numeric

replace

A data.frame of regular expressions and the strings to replace them with; regular expressions should be in a column named pattern, and replacements should be in a column named replacement. Each row is passed to stringr::str_replace().

per_action

How to treat labels such as %, percent, or per million. "drop" removes the labels, "divide" divides the value by the appropriate denominator, and "ignore" leaves the labels untouched.

multiple_decimals

How to handle multiple decimals within a number

donor_host

Which value to use when values for both a donor and a host are given
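
To make the three per_action options described above concrete, here is a rough sketch in base R. The helper apply_per_action() is hypothetical and not part of the package; it handles only a trailing % label (denominator 100) and is not meant to reproduce what chr_to_num actually does.

```r
# Hypothetical helper, not part of the package: handles only a trailing "%" label
apply_per_action <- function(x, per_action = c("drop", "divide", "ignore")) {
  per_action <- match.arg(per_action)
  if (per_action == "ignore") {
    return(x)  # leave the strings untouched
  }
  has_pct <- grepl("%\\s*$", x)                  # which entries carry a % label
  value <- as.numeric(sub("\\s*%\\s*$", "", x))  # strip the label, then convert
  if (per_action == "divide") {
    value <- ifelse(has_pct, value / 100, value) # % implies a denominator of 100
  }
  value
}

apply_per_action(c("50%", "12.5"), "drop")    # 50.0 12.5
apply_per_action(c("50%", "12.5"), "divide")  # 0.5 12.5
apply_per_action(c("50%", "12.5"), "ignore")  # "50%" "12.5"
```

In this sketch "drop" and "divide" return numeric values while "ignore" returns the original character vector; that difference is a simplification of the sketch, not a documented property of the package.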

Value

A character vector