| stri_split {stringi} | R Documentation |
These functions split each element in str into substrings.
pattern defines the delimiters that separate the inputs into tokens.
The input data between the matches become the fields themselves.
stri_split(str, ..., regex, fixed, coll, charclass)

stri_split_fixed(
  str,
  pattern,
  n = -1L,
  omit_empty = FALSE,
  tokens_only = FALSE,
  simplify = FALSE,
  ...,
  opts_fixed = NULL
)

stri_split_regex(
  str,
  pattern,
  n = -1L,
  omit_empty = FALSE,
  tokens_only = FALSE,
  simplify = FALSE,
  ...,
  opts_regex = NULL
)

stri_split_coll(
  str,
  pattern,
  n = -1L,
  omit_empty = FALSE,
  tokens_only = FALSE,
  simplify = FALSE,
  ...,
  opts_collator = NULL
)

stri_split_charclass(
  str,
  pattern,
  n = -1L,
  omit_empty = FALSE,
  tokens_only = FALSE,
  simplify = FALSE
)
| str | character vector; strings to search in |
| ... | supplementary arguments passed to the underlying functions, including additional settings for opts_collator, opts_regex, opts_fixed, and so on |
| pattern, regex, fixed, coll, charclass | character vector; search patterns; for more details refer to stringi-search |
| n | integer vector; maximal number of strings to return and, at the same time, maximal number of text boundaries to look for |
| omit_empty | logical vector; determines whether empty tokens should be removed from the result (TRUE or FALSE) or replaced with NAs (NA) |
| tokens_only | single logical value; may affect the result if n is positive |
| simplify | single logical value; if FALSE (the default), a list of character vectors is returned; if TRUE or NA, a character matrix is returned |
| opts_collator, opts_fixed, opts_regex | a named list used to tune up the search engine's settings; see stri_opts_collator, stri_opts_fixed, and stri_opts_regex, respectively; NULL for the defaults |
Vectorized over str, pattern, n, and omit_empty
(with recycling of the elements in the shorter vector if necessary).
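The recycling rule can be illustrated with a short sketch. The input strings and variable names below are made up for illustration only.

```r
library(stringi)

# Each string is paired with the pattern at the same position.
paired <- stri_split_fixed(c("a_b", "c-d"), c("_", "-"))

# A single pattern is recycled across all strings.
recycled <- stri_split_fixed(c("a_b", "c_d_e"), "_")

# n is vectorized as well: at most 2 pieces for the first string,
# all pieces for the second one.
limited <- stri_split_fixed(c("a_b_c", "a_b_c"), "_", n = c(2L, -1L))
```

Each call returns a list with one character vector per input string.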
If n is negative, then all pieces are extracted.
Otherwise, if tokens_only is FALSE (this is the default,
for compatibility with the stringr package), then n-1
tokens are extracted (if possible) and the n-th string
gives the remainder (see Examples).
On the other hand, if tokens_only is TRUE,
then only full tokens (up to n pieces) are extracted.
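As a minimal sketch of how n interacts with tokens_only, consider splitting a hypothetical key=value string whose value itself contains the delimiter:

```r
library(stringi)

x <- "key=a=b"  # illustrative input, not taken from the package

# Negative n (the default): every piece is extracted.
all_pieces <- stri_split_fixed(x, "=")[[1]]

# n = 2, tokens_only = FALSE: one token, then the unsplit remainder.
with_remainder <- stri_split_fixed(x, "=", n = 2)[[1]]

# n = 2, tokens_only = TRUE: the first two full tokens only.
tokens <- stri_split_fixed(x, "=", n = 2, tokens_only = TRUE)[[1]]
```

Here with_remainder keeps "a=b" intact as its second element, whereas tokens stops after "a" and drops the rest.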
omit_empty is applied during the split process: if it is set to
TRUE, then tokens of zero length are ignored. Thus, empty strings
will never appear in the resulting vector. On the other hand, if
omit_empty is NA, then empty tokens are substituted with
missing strings.
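The three settings of omit_empty can be compared side by side. This is only a sketch on an illustrative input with two adjacent delimiters:

```r
library(stringi)

x <- "a,,b"  # illustrative input with one empty field

kept    <- stri_split_fixed(x, ",", omit_empty = FALSE)[[1]]  # empty token kept as ""
dropped <- stri_split_fixed(x, ",", omit_empty = TRUE)[[1]]   # empty token removed
as_na   <- stri_split_fixed(x, ",", omit_empty = NA)[[1]]     # empty token becomes NA
```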
Empty search patterns are not supported. If you wish to split a
string into individual characters, use, e.g.,
stri_split_boundaries(str, type='character') for THE Unicode way.
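A minimal sketch of the suggested alternative follows. The second input is an assumption chosen for illustration: it contains a combining accent, which the character break iterator is expected to keep together with its base letter.

```r
library(stringi)

# Split into user-perceived characters rather than by a search pattern.
chars <- stri_split_boundaries("abc", type = "character")[[1]]

# 'e' followed by U+0301 (combining acute accent), then 'a':
# three code points, but two user-perceived characters.
accented <- stri_split_boundaries("e\u0301a", type = "character")[[1]]
n_code_points <- stri_length("e\u0301a")
```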
stri_split is a convenience function. It calls either
stri_split_regex, stri_split_fixed, stri_split_coll,
or stri_split_charclass, depending on the argument used.
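The dispatch can be checked directly: naming the pattern argument selects the search engine. The inputs below are illustrative only.

```r
library(stringi)

# fixed= routes the call to stri_split_fixed.
via_wrapper <- stri_split("a_b_c", fixed = "_")
direct      <- stri_split_fixed("a_b_c", "_")

# regex= routes the call to stri_split_regex.
by_digits <- stri_split("a1b22c", regex = "[0-9]+")[[1]]
```

Both forms should give the same result, so the choice between them is mostly a matter of style.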
If simplify=FALSE (the default),
then the functions return a list of character vectors.
Otherwise, stri_list2matrix with byrow=TRUE
and n_min=n arguments is called on the resulting object.
In such a case, a character matrix with an appropriate number of rows
(according to the length of str, pattern, etc.)
is returned. Note that stri_list2matrix's fill argument
is set to an empty string and NA, for simplify equal to
TRUE and NA, respectively.
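The effect of simplify on the return type can be sketched as follows, using two illustrative strings with a different number of fields so that padding is needed:

```r
library(stringi)

x <- c("ab,c", "d,ef,g")  # 2 and 3 fields, respectively

as_list <- stri_split_fixed(x, ",")                   # list of character vectors
m_empty <- stri_split_fixed(x, ",", simplify = TRUE)  # matrix padded with ""
m_na    <- stri_split_fixed(x, ",", simplify = NA)    # matrix padded with NA
```

Both matrices have one row per input string; only the padding of the shorter row differs.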
Other search_split:
about_search,
stri_split_boundaries(),
stri_split_lines()
stri_split_fixed('a_b_c_d', '_')
stri_split_fixed('a_b_c__d', '_')
stri_split_fixed('a_b_c__d', '_', omit_empty=TRUE)
stri_split_fixed('a_b_c__d', '_', n=2, tokens_only=FALSE) # 'a' & remainder
stri_split_fixed('a_b_c__d', '_', n=2, tokens_only=TRUE) # 'a' & 'b' only
stri_split_fixed('a_b_c__d', '_', n=4, omit_empty=TRUE, tokens_only=TRUE)
stri_split_fixed('a_b_c__d', '_', n=4, omit_empty=FALSE, tokens_only=TRUE)
stri_split_fixed('a_b_c__d', '_', omit_empty=NA)
stri_split_fixed(c('ab_c', 'd_ef_g', 'h', ''), '_', n=1, tokens_only=TRUE, omit_empty=TRUE)
stri_split_fixed(c('ab_c', 'd_ef_g', 'h', ''), '_', n=2, tokens_only=TRUE, omit_empty=TRUE)
stri_split_fixed(c('ab_c', 'd_ef_g', 'h', ''), '_', n=3, tokens_only=TRUE, omit_empty=TRUE)
stri_list2matrix(stri_split_fixed(c('ab,c', 'd,ef,g', ',h', ''), ',', omit_empty=TRUE))
stri_split_fixed(c('ab,c', 'd,ef,g', ',h', ''), ',', omit_empty=FALSE, simplify=TRUE)
stri_split_fixed(c('ab,c', 'd,ef,g', ',h', ''), ',', omit_empty=NA, simplify=TRUE)
stri_split_fixed(c('ab,c', 'd,ef,g', ',h', ''), ',', omit_empty=TRUE, simplify=TRUE)
stri_split_fixed(c('ab,c', 'd,ef,g', ',h', ''), ',', omit_empty=NA, simplify=NA)
stri_split_regex(c('ab,c', 'd,ef , g', ', h', ''),
    '\\p{WHITE_SPACE}*,\\p{WHITE_SPACE}*', omit_empty=NA, simplify=TRUE)
stri_split_charclass('Lorem ipsum dolor sit amet', '\\p{WHITE_SPACE}')
stri_split_charclass(' Lorem ipsum dolor', '\\p{WHITE_SPACE}', n=3,
    omit_empty=c(FALSE, TRUE))
stri_split_regex('Lorem ipsum dolor sit amet',
    '\\p{Z}+') # see also stri_split_charclass