read_file_delim() reads delimited files using vroom::vroom(). This allows the use of ALTREP columns, whose values are not loaded into memory until they are accessed.

read_file_delim(
  file,
  col_select = vroom::everything(),
  col_types = vroom::cols(.default = vroom::col_character()),
  na = c("", ".", "NA", "na", "Na", "N/A", "n/a", "N/a", "NULL", "null", "Null"),
  guess_max = .Machine$integer.max %/% 100L,
  delim = NULL,
  ...
)

Arguments

file

Path to a local file.

col_select

One or more selection expressions, like in dplyr::select(). Use c() or list() to use more than one expression. See ?dplyr::select for details on available selection options.
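A minimal sketch of column selection. It calls vroom::vroom() directly, on the assumption that read_file_delim() forwards col_select to vroom() unchanged; the temporary file and its column names are made up for illustration.

```r
library(vroom)

# Hypothetical demo file with three columns.
path <- tempfile(fileext = ".csv")
writeLines(c("id,name,score", "1,a,10", "2,b,20"), path)

# Keep only two of the three columns, read as character.
x <- vroom(
  path,
  delim = ",",
  col_select = c(id, score),
  col_types = cols(.default = col_character())
)

names(x)
```

Only the selected columns appear in the result, in the order given in col_select.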

col_types

One of NULL, a cols() specification, or a string. See vignette("readr") for more details.

If NULL, all column types will be imputed from up to guess_max rows of the input. This is convenient (and fast), but not robust. If the imputation fails, you'll need to supply the correct types yourself.

If a column specification created by cols(), it must contain one column specification for each column. If you only want to read a subset of the columns, use cols_only().

Alternatively, you can use a compact string representation where each character represents one column: c = character, i = integer, n = number, d = double, l = logical, f = factor, D = date, T = date time, t = time, ? = guess, or _/- to skip the column.
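The compact string form and cols_only() can be sketched as follows. This uses vroom::vroom() directly and assumes read_file_delim() passes col_types through as is; the file contents are hypothetical.

```r
library(vroom)

# Hypothetical demo file.
path <- tempfile(fileext = ".csv")
writeLines(c("id,name,score", "1,a,10.5", "2,b,7"), path)

# Compact string: one character per column (integer, character, double).
x <- vroom(path, delim = ",", col_types = "icd")
sapply(x, class)

# cols_only(): read just the named column and drop the rest.
y <- vroom(path, delim = ",", col_types = cols_only(id = col_integer()))
names(y)
```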

na

Character vector of strings to interpret as missing values. Set this option to character() to indicate no missing values.
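As an illustration of the default na vector, the sketch below reads a small made-up file with vroom::vroom() directly (assuming read_file_delim() forwards na unchanged) and checks which values become missing.

```r
library(vroom)

# Hypothetical demo file containing several missing-value spellings.
path <- tempfile(fileext = ".csv")
writeLines(c("id,value", "1,N/A", "2,.", "3,null", "4,ok"), path)

# The default na vector from the usage section above.
na_strings <- c("", ".", "NA", "na", "Na", "N/A", "n/a", "N/a",
                "NULL", "null", "Null")

x <- vroom(
  path,
  delim = ",",
  col_types = cols(.default = col_character()),
  na = na_strings
)

# Rows 1 to 3 match an entry in na_strings; row 4 does not.
is.na(x$value)
```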

guess_max

Maximum number of records to use for guessing column types.

delim

One or more characters used to delimit fields within a file. If NULL, the delimiter is guessed from the set c(",", "\t", " ", "|", ":", ";").
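The difference between a guessed and an explicit delimiter can be sketched like this, using vroom::vroom() directly and a hypothetical pipe-delimited file. It assumes read_file_delim() hands delim to vroom() without modification.

```r
library(vroom)

# Hypothetical pipe-delimited demo file.
path <- tempfile(fileext = ".txt")
writeLines(c("id|name", "1|a", "2|b"), path)

chr <- cols(.default = col_character())

# delim = NULL lets vroom guess the delimiter from its candidate set.
guessed <- vroom(path, delim = NULL, col_types = chr)

# Passing the delimiter explicitly skips the guess.
explicit <- vroom(path, delim = "|", col_types = chr)

identical(names(guessed), names(explicit))
```

Supplying delim explicitly is the safer choice when the file is known to contain several of the candidate characters.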

...

Additional arguments to pass to vroom().

Value

A tibble if reading one file; a list of tibbles if reading multiple files.

Details

By default, read_file_delim() does not guess column types and reads every column as character. To enable guessing, set col_types = vroom::cols(.default = vroom::col_guess()). When types are guessed, the default guess_max (.Machine$integer.max %/% 100L, roughly 21 million rows) means that in practice all rows are used; set guess_max to a smaller value to limit this.
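A minimal sketch contrasting the all-character default with type guessing. It calls vroom::vroom() directly with the same col_types values, assuming read_file_delim() forwards col_types and guess_max to vroom(); the file and the guess_max value of 1000 are illustrative only.

```r
library(vroom)

# Hypothetical demo file.
path <- tempfile(fileext = ".csv")
writeLines(c("id,score", "1,10.5", "2,7.25"), path)

# Default behaviour described above: every column is read as character.
x_chr <- vroom(
  path,
  delim = ",",
  col_types = cols(.default = col_character())
)

# Opt in to guessing, and cap the number of rows inspected.
x_guess <- vroom(
  path,
  delim = ",",
  col_types = cols(.default = col_guess()),
  guess_max = 1000
)

sapply(x_chr, class)
sapply(x_guess, class)

# The default cap on rows used for guessing.
.Machine$integer.max %/% 100L
```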

Reading all columns as character through lazily loaded ALTREP vectors saves a significant amount of time and space when loading data with many rarely used columns.