| hoist {tidyr} | R Documentation |
hoist(), unnest_longer(), and unnest_wider() provide tools for
rectangling, collapsing deeply nested lists into regular columns.
hoist() allows you to selectively pull components of a list-column out
into their own top-level columns, using the same syntax as purrr::pluck().
unnest_wider() turns each element of a list-column into a column, and
unnest_longer() turns each element of a list-column into a row.
unnest_auto() picks between unnest_wider() and unnest_longer()
based on the heuristics described below.
Learn more in vignette("rectangle").
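As a rough illustration of the pluck-style syntax mentioned above, the following sketch hoists one top-level component and one nested element, then compares the result with calling purrr::pluck() on each element by hand. The data frame and its column names (id, meta, kind, tags) are made up for this example.

```r
library(tidyr)

# Hypothetical data: one list-column whose elements are small named lists
df <- tibble(
  id = 1:2,
  meta = list(
    list(kind = "x", tags = c("p", "q")),
    list(kind = "y", tags = c("r", "s", "t"))
  )
)

# A bare string pulls out a top-level component; a list() describes a path
# into each element, in the same way as purrr::pluck(element, "tags", 1L)
out <- df %>% hoist(meta, "kind", first_tag = list("tags", 1L))

# The same nested extraction done manually, one element at a time
manual <- vapply(
  df$meta,
  function(el) purrr::pluck(el, "tags", 1L),
  character(1)
)
```

Here out$first_tag and manual should hold the same values, which is one way to reason about what a given hoist() path will return.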
hoist(
  .data,
  .col,
  ...,
  .remove = TRUE,
  .simplify = TRUE,
  .ptype = list(),
  .transform = list()
)

unnest_longer(
  data,
  col,
  values_to = NULL,
  indices_to = NULL,
  indices_include = NULL,
  names_repair = "check_unique",
  simplify = TRUE,
  ptype = list(),
  transform = list()
)

unnest_wider(
  data,
  col,
  names_sep = NULL,
  simplify = TRUE,
  names_repair = "check_unique",
  ptype = list(),
  transform = list()
)

unnest_auto(data, col)
.data, data |
A data frame. |
.col, col |
List-column to extract components from. |
... |
Components of The column names must be unique in a call to |
.remove |
If |
.simplify, simplify |
If |
.ptype, ptype |
Optionally, a named list of prototypes declaring the desired output type of each component. Use this argument if you want to check that each element has the type you expect when simplifying. |
.transform, transform |
Optionally, a named list of transformation functions applied to each component. Use this argument if you want to transform or parse individual elements as they are hoisted. |
values_to |
Name of column to store vector values. Defaults to |
indices_to |
A string giving the name of the column that will contain the
inner names or positions (if not named) of the values. Defaults to |
indices_include |
Add an index column? Defaults to |
names_repair |
Used to check that output data frame has valid names. Must be one of the following options:
See |
names_sep |
If |
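To make the .ptype and .transform arguments more concrete, here is a minimal sketch that parses a component stored as text while hoisting it and declares the expected type of another. The data (id, info, name, score) is invented for illustration, and the names in both lists are assumed to match the names of the hoisted columns.

```r
library(tidyr)

# Hypothetical data: scores arrive as strings and need parsing
df <- tibble(
  id = 1:2,
  info = list(
    list(name = "a", score = "1.5"),
    list(name = "b", score = "2.5")
  )
)

out <- df %>%
  hoist(
    info,
    "name",
    "score",
    # Declare that `name` should simplify to a character vector
    .ptype = list(name = character()),
    # Parse each `score` element to a number as it is hoisted
    .transform = list(score = as.numeric)
  )
```

In this sketch out$score ends up as a numeric column rather than a character one, because the transformation runs on each element before the column is simplified.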
The three unnest() functions differ in how they change the shape of the
output data frame:
unnest_wider() preserves the rows, but changes the columns.
unnest_longer() preserves the columns, but changes the rows.
unnest() can change both rows and columns.
These principles guide their behaviour when they are called with a
non-primary data type. For example, if you unnest_wider() a list of data
frames, the number of rows must be preserved, so each column is turned into
a list column of length one. Or if you unnest_longer() a list of data
frames, the number of columns must be preserved so it creates a packed
column. I'm not sure if these behaviours are useful in practice, but
they are theoretically pleasing.
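The shape rules above can be checked on a small case. This sketch uses two made-up data frames (df_named and df_plain are hypothetical names): one whose list-column holds named vectors, for widening, and one whose list-column holds unnamed vectors, for lengthening.

```r
library(tidyr)

# Hypothetical inputs: named inner vectors for widening,
# unnamed inner vectors for lengthening
df_named <- tibble(id = 1:2, y = list(c(a = 1, b = 2), c(a = 3, b = 4)))
df_plain <- tibble(id = 1:2, y = list(1:2, 3:5))

# unnest_wider(): the row count stays the same, the columns change
wide <- unnest_wider(df_named, y)

# unnest_longer(): the columns stay the same, the row count changes
long <- unnest_longer(df_plain, y)

dim(wide)
dim(long)
```

With unnamed inner vectors, unnest_longer() adds no index column, so the set of columns is unchanged and only the number of rows grows.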
unnest_auto() heuristics
unnest_auto() inspects the inner names of the list-col:
If all elements are unnamed, it uses unnest_longer().
If all elements are named, and there's at least one name in
common across all components, it uses unnest_wider().
Otherwise, it falls back to unnest_longer(indices_include = TRUE).
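The three rules above can be sketched in base R. The function below, guess_unnest(), is a hypothetical helper written only to mirror the rules as stated here; it is not tidyr's internal implementation, and the returned labels are made up for the example.

```r
# Hypothetical helper that follows the three stated rules;
# not part of tidyr and not its actual implementation
guess_unnest <- function(x) {
  inner_names <- lapply(x, names)
  unnamed <- vapply(inner_names, is.null, logical(1))

  # Rule 1: all elements are unnamed
  if (all(unnamed)) {
    return("longer")
  }

  # Rule 2: all elements are named and share at least one name
  if (!any(unnamed)) {
    common <- Reduce(intersect, inner_names)
    if (length(common) > 0) {
      return("wider")
    }
  }

  # Rule 3: everything else
  "longer_with_indices"
}

guess_unnest(list(1:3, 4:5))
guess_unnest(list(c(a = 1, b = 2), c(a = 3, c = 4)))
guess_unnest(list(c(a = 1), c(b = 2)))
```

The second call shares the name "a" across both elements, so it falls under the widening rule, while the third has names but none in common.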
library(tidyr)

df <- tibble(
  character = c("Toothless", "Dory"),
  metadata = list(
    list(
      species = "dragon",
      color = "black",
      films = c(
        "How to Train Your Dragon",
        "How to Train Your Dragon 2",
        "How to Train Your Dragon: The Hidden World"
      )
    ),
    list(
      species = "blue tang",
      color = "blue",
      films = c("Finding Nemo", "Finding Dory")
    )
  )
)
df
# Turn all components of metadata into columns
df %>% unnest_wider(metadata)
# Extract only specified components
df %>% hoist(metadata,
  "species",
  first_film = list("films", 1L),
  third_film = list("films", 3L)
)
# Widen metadata into columns, then give each film its own row
df %>%
  unnest_wider(metadata) %>%
  unnest_longer(films)
# unnest_longer() is useful when each component of the list should
# form a row
df <- tibble(
  x = 1:3,
  y = list(NULL, 1:3, 4:5)
)
df %>% unnest_longer(y)
# Automatically creates names if widening
df %>% unnest_wider(y)
# But you'll usually want to provide names_sep:
df %>% unnest_wider(y, names_sep = "_")
# And similarly if the vectors are named
df <- tibble(
  x = 1:2,
  y = list(c(a = 1, b = 2), c(a = 10, b = 11, c = 12))
)
df %>% unnest_wider(y)
df %>% unnest_longer(y)