Efficiently Read Delimited Files — read_file

read_file_delim() reads delimited files using vroom(). This allows the use of ALTREP columns, which don't load data into memory until they are needed.

read_file_delim(
  file,
  col_select = vroom::everything(),
  col_types = vroom::cols(.default = vroom::col_character()),
  na = c("", ".", "NA", "na", "Na", "N/A", "n/a", "N/a", "NULL", "null", "Null"),
  guess_max = .Machine$integer.max %/% 100L,
  delim = NULL,
  ...
)
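As a rough illustration of the defaults above, the sketch below makes a direct `vroom::vroom()` call with the same argument values on a small temporary file. It assumes that read_file_delim() forwards these arguments to `vroom()` unchanged, which this page implies but does not spell out. The file contents are made up for the example, and vroom must be installed.

```r
# Write a small comma-delimited example file to a temporary location
path <- tempfile(fileext = ".csv")
writeLines(c("id,score,comment", "1,3.5,ok", "2,.,NA", "3,4.0,n/a"), path)

# Approximate vroom() call using read_file_delim()'s documented defaults
# (assumption: read_file_delim() passes these straight through to vroom())
df <- vroom::vroom(
  path,
  col_select = vroom::everything(),                            # keep all columns
  col_types = vroom::cols(.default = vroom::col_character()),  # no type guessing
  na = c("", ".", "NA", "na", "Na", "N/A", "n/a", "N/a", "NULL", "null", "Null"),
  guess_max = .Machine$integer.max %/% 100L,
  delim = NULL                                                 # delimiter is guessed
)

print(df)
```

Every column comes back as character, and the placeholders ".", "NA" and "n/a" are all read as missing values.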

Arguments

file	path to a local file.
col_select	One or more selection expressions, like in `dplyr::select()`. Use `c()` or `list()` to use more than one expression. See `?dplyr::select` for details on available selection options.
col_types	One of `NULL`, a `cols()` specification, or a string. See `vignette("readr")` for more details. If `NULL`, all column types will be imputed from up to `guess_max` rows of the input. This is convenient (and fast), but not robust. If the imputation fails, you'll need to supply the correct types yourself. If a column specification is created by `cols()`, it must contain one column specification for each column. If you only want to read a subset of the columns, use `cols_only()`. Alternatively, you can use a compact string representation where each character represents one column: c = character, i = integer, n = number, d = double, l = logical, f = factor, D = date, T = date time, t = time, ? = guess, or `_`/`-` to skip the column.
na	Character vector of strings to interpret as missing values. Set this option to `character()` to indicate no missing values.
guess_max	Maximum number of records to use for guessing column types.
delim	One or more characters used to delimit fields within a file. If `NULL`, the delimiter is guessed from the set of `c(",", "\t", " ", "|", ":", ";")`.
...	Additional arguments to pass to `vroom()`.
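The sketch below illustrates three of these arguments (`col_select`, a compact `col_types` string, and `na = character()`) with direct `vroom::vroom()` calls on a small, made-up tab-delimited file. It assumes vroom is installed; passing the same arguments to read_file_delim() should behave the same way if they are forwarded to `vroom()` as described above.

```r
# Small tab-delimited example file written to a temporary location
path <- tempfile(fileext = ".tsv")
writeLines(c("name\tcount\tnote", "a\t1\tNA", "b\t2\ty"), path)

# col_select: keep only the listed columns (tidyselect syntax)
sel <- vroom::vroom(path, delim = "\t", col_select = c(name, note),
                    col_types = vroom::cols(.default = vroom::col_character()))

# Compact col_types string, one character per column:
# "c" = character, "i" = integer, "_" = skip the column
typed <- vroom::vroom(path, delim = "\t", col_types = "ci_")

# na = character(): no string is treated as missing, so "NA" stays literal text
raw <- vroom::vroom(path, delim = "\t", na = character(),
                    col_types = vroom::cols(.default = vroom::col_character()))
```

Note that `sel` uses vroom's own default `na`, so its first `note` value is missing, whereas `raw` keeps it as the string "NA".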

Value

A tibble if reading one file; a list of tibbles if reading multiple files.

Details

By default, read_file_delim() does not attempt to guess column types and reads all columns as character. To guess types instead, set col_types = vroom::cols(.default = vroom::col_guess()). When columns are guessed, the default guess_max (.Machine$integer.max %/% 100L, about 21.5 million rows) means that every row is used for most files; set guess_max to a smaller value to guess from fewer rows.

Skipping type guessing, together with ALTREP columns that are only loaded into memory when accessed, saves a significant amount of time and memory when loading data with many rarely used columns.
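The contrast between the all-character default and opting in to type guessing can be sketched with direct `vroom::vroom()` calls on a small, made-up file (vroom assumed installed). Through the wrapper, the second call would correspond to read_file_delim(path, col_types = vroom::cols(.default = vroom::col_guess()), guess_max = 1000), assuming the arguments are forwarded unchanged.

```r
path <- tempfile(fileext = ".csv")
writeLines(c("id,value,flag", "1,2.5,TRUE", "2,3.75,FALSE"), path)

# Mirrors read_file_delim()'s default: every column is read as character
chr <- vroom::vroom(path, delim = ",",
                    col_types = vroom::cols(.default = vroom::col_character()))

# Opt in to type guessing, using at most 1000 rows to guess each column's type
guessed <- vroom::vroom(path, delim = ",",
                        col_types = vroom::cols(.default = vroom::col_guess()),
                        guess_max = 1000)
```

A smaller guess_max makes guessing cheaper on large files, at the risk of mis-typing a column whose unusual values only appear in rows that were not inspected.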