- In str_replace_all(), a replacement function now receives all values in a single vector. This radically improves performance at the cost of breaking some existing uses (#462).
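As a rough illustration of the change above, the sketch below passes a vectorised function as the replacement. The helper name shout is made up for this example. Because toupper() is vectorised, it gives the same result whether it is called once per match or once with every match. A function that assumes a single-length input (for example one that branches with if()) is the kind of existing use that may need updating.

```r
library(stringr)

# Hypothetical helper: upper-cases whatever vector of matches it is given.
shout <- function(matches) toupper(matches)

x <- c("one", "two")

# Every vowel is replaced by the value the function returns for it.
result <- str_replace_all(x, "[aeiou]", shout)
result
```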
- Some minor documentation improvements.
- str_trunc() now correctly truncates strings when side is "left" or "center" (@UchidaMizuki, #512).
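A small sketch of the side argument, assuming a stringr version that includes this fix. The input string is arbitrary, and the default ellipsis "..." counts towards the requested width.

```r
library(stringr)

x <- "abcdefghij"

# Keep the end of the string and put the ellipsis on the left.
left <- str_trunc(x, 7, side = "left")

# Keep both ends and put the ellipsis in the middle.
center <- str_trunc(x, 7, side = "center")

c(left = left, center = center)
```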
- stringr functions now consistently implement the tidyverse recycling rules (#372). There are two main changes:
  - Only vectors of length 1 are recycled. Previously, (e.g.) str_detect(letters, c("x", "y")) worked, but it now errors.
  - str_c() ignores NULLs, rather than treating them as length 0 vectors.

  Additionally, many more arguments now throw errors, rather than warnings, if supplied the wrong type of input.
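The two recycling changes can be checked with a short sketch like the following, assuming stringr 1.5.0 or later. The variable names are illustrative, and tryCatch() is used only so the script keeps running after the expected error.

```r
library(stringr)

# NULL inputs are dropped rather than treated as length-0 vectors.
joined <- str_c("a", NULL, "b")

# A length-1 pattern is still recycled across the whole input.
recycled <- str_detect(c("xa", "yb"), "a")

# A length-2 pattern against 26 strings is no longer recycled; it errors.
mismatch <- tryCatch(
  str_detect(letters, c("x", "y")),
  error = function(e) "error"
)
```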
- regex() and friends now generate class names with a stringr_ prefix (#384).
- str_detect(), str_starts(), str_ends() and str_subset() now error when used with either an empty string ("") or a boundary(). These operations didn't really make sense (str_detect(x, "") returned TRUE for all non-empty strings) and made it easy to make mistakes when programming.
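A minimal check of both changes, assuming stringr 1.5.0 or later. The class name stringr_regex is inferred from the stringr_ prefix described above, and the error is captured with tryCatch() purely for demonstration.

```r
library(stringr)

# Modifier objects carry a class with the stringr_ prefix.
is_stringr_regex <- inherits(regex("a"), "stringr_regex")

# An empty pattern is rejected instead of silently matching.
empty_pattern <- tryCatch(
  str_detect("abc", ""),
  error = function(e) "error"
)
```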
- Many tweaks to the documentation to make it more useful and consistent.
- New vignette("from-base") by @sastoudt provides a comprehensive comparison between base R functions and their stringr equivalents. It's designed to help you move to stringr if you're already familiar with base R string functions (#266).
- New str_escape() escapes regular expression metacharacters, providing an alternative to fixed() if you want to compose a pattern from user supplied strings (#408).
- New str_equal() compares two character vectors using unicode rules, optionally ignoring case (#381).
- str_extract() can now optionally extract a capturing group instead of the complete match (#420).
- New str_flatten_comma() is a special case of str_flatten() designed for comma separated flattening and can correctly apply the Oxford comma when there are only two elements (#444).
- New str_split_1() is tailored for the special case of splitting up a single string (#409).
- New str_split_i() extracts a single piece from a string (#278, @bfgray3).
- New str_like() allows the use of SQL wildcards (#280, @rjpat).
- New str_rank() to complete the set of order/rank/sort functions (#353).
- New str_sub_all() to extract multiple substrings from each string.
- New str_unique() is a wrapper around stri_unique() and returns unique string values in a character vector (#249, @seasmith).
- str_view() uses ANSI colouring rather than an HTML widget (#370). This works in more places and requires fewer dependencies. It includes a number of other small improvements:
  - It no longer requires a pattern so you can use it to display strings with special characters.
  - It highlights unusual whitespace characters.
  - It's vectorised over both string and pattern (#407).
  - It defaults to displaying all matches, making str_view_all() redundant (and hence deprecated) (#455).
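Several of the new helpers listed above can be tried together. This is a sketch with made-up inputs, assuming stringr 1.5.0 or later, and is not an exhaustive tour of each function's arguments.

```r
library(stringr)

# str_escape(): treat user text literally inside a regular expression.
literal_dot <- str_detect(c("a.b", "axb"), str_escape("a.b"))

# str_equal(): comparison that can ignore case.
same <- str_equal("a", "A", ignore_case = TRUE)

# str_extract(): pull out one capturing group rather than the whole match.
value <- str_extract("key=value", "(\\w+)=(\\w+)", group = 2)

# str_flatten_comma(): the Oxford comma is dropped when there are two elements.
three <- str_flatten_comma(c("a", "b", "c"), last = ", and ")
two <- str_flatten_comma(c("a", "b"), last = ", and ")

# str_split_1() returns a character vector; str_split_i() returns one piece.
pieces <- str_split_1("a-b-c", "-")
second <- str_split_i("a-b-c", "-", 2)

# str_like(): SQL-style wildcard, % matches any run of characters.
like <- str_like("apple", "app%")

# str_rank() and str_unique().
ranks <- str_rank(c("b", "a", "c"))
uniq <- str_unique(c("a", "b", "a"))
```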
- New str_width() returns the display width of a string (#380).
- stringr is now licensed as MIT (#351).
- Better error message if you supply a non-string pattern (#378).
- A new data source for sentences has fixed many small errors.
- str_extract() and str_extract_all() now work correctly when pattern is a boundary().
- str_flatten() gains a last argument that optionally overrides the final separator (#377). It gains a na.rm argument to remove missing values (since it's a summary function) (#439).
- str_pad() gains a use_width argument to control whether to use the total code point width or the number of code points as "width" of a string (#190).
- str_replace() and str_replace_all() can use standard tidyverse formula shorthand for the replacement function (#331).
- str_starts() and str_ends() now correctly respect regex operator precedence (@carlganz).
- str_wrap() breaks only at whitespace by default; set whitespace_only = FALSE to return to the previous behaviour (#335, @rjpat).
- word() now returns the whole sentence when using a negative start parameter whose magnitude is greater than or equal to the number of words (@pdelboca, #245).
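A short sketch of the str_flatten() arguments and the formula shorthand described above, assuming stringr 1.5.0 or later. The inputs are arbitrary, and .x is the usual tidyverse placeholder for the function's argument.

```r
library(stringr)

# last overrides only the final separator.
flat <- str_flatten(c("a", "b", "c"), collapse = ", ", last = " and ")

# na.rm drops missing values before flattening.
no_na <- str_flatten(c("a", NA, "b"), na.rm = TRUE)

# Formula shorthand for a replacement function.
upper <- str_replace_all("abc", "[ab]", ~ toupper(.x))
```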
Hot patch release to resolve R CMD check failures.
- str_interp() now renders lists consistently, independent of the presence of additional placeholders (@amhrasmussen).
- New str_starts() and str_ends() functions to detect patterns at the beginning or end of strings (@jonthegeek, #258).
- str_subset(), str_detect(), and str_which() get a negate argument, which is useful when you want the elements that do NOT match (#259, @yutannihilation).
- New str_to_sentence() function to capitalize with sentence case (@jonthegeek, #202).
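The functions above might be used along these lines. The inputs are illustrative, and the sketch assumes a stringr version that has these features.

```r
library(stringr)

fruit_names <- c("apple", "banana")

# Anchored detection at the start or end of each string.
starts_a <- str_starts(fruit_names, "a")
ends_a <- str_ends(fruit_names, "a")

# negate = TRUE keeps the elements that do not match.
no_an <- str_subset(fruit_names, "an", negate = TRUE)

# Sentence case: first letter upper, the rest lower.
sentence <- str_to_sentence("hello world")
```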
- str_replace_all() with a named vector now respects modifier functions (#207).
- str_trunc() is once again vectorised correctly (#203, @austin3dickey).
- str_view() handles NA values more gracefully (#217). I've also tweaked the sizing policy so hopefully it should work better in notebooks, while preserving the existing behaviour in knit documents (#232).
- During package build, you may see Error : object ‘ignore.case’ is not exported by 'namespace:stringr'. This is because the long deprecated str_join(), ignore.case() and perl() have now been removed.
- str_glue() and str_glue_data() provide convenient wrappers around glue() and glue_data() from the glue package (#157).
- str_flatten() is a wrapper around stri_flatten() and clearly conveys flattening a character vector into a single string (#186).
- str_remove() and str_remove_all() functions. These wrap str_replace() and str_replace_all() to remove patterns from strings (@Shians, #178).
- str_squish() removes spaces from both the left and right side of strings, and also converts multiple space (or space-like characters) to a single space within strings (@stephlocke, #197).
- str_sub() gains an omit_na argument for ignoring NA. Accordingly, str_replace() now ignores NAs and keeps the original strings (@yutannihilation, #164).
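A brief sketch of the helpers introduced above, with invented inputs. str_glue() returns a glue object, so it is converted with as.character() here for comparison.

```r
library(stringr)

# Interpolate a variable into a string.
n <- 3
message_text <- as.character(str_glue("found {n} matches"))

# Collapse a character vector into a single string.
flat <- str_flatten(c("a", "b", "c"))

# Remove every match of a pattern.
letters_only <- str_remove_all("a1b2", "[0-9]")

# Trim both ends and collapse internal runs of whitespace.
squished <- str_squish("  a   b  ")
```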
- str_trunc() now preserves NAs (@ClaytonJY, #162).
- str_trunc() now throws an error when width is shorter than ellipsis (@ClaytonJY, #163).
- Long deprecated str_join(), ignore.case() and perl() have now been removed.
- str_match_all() now returns NA if an optional group doesn't match (previously it returned ""). This is more consistent with str_match() and other match failures (#134).
- In str_replace(), replacement can now be a function that is called once for each match and whose return value is used to replace the match.
- New str_which() mimics grep() (#129).
- A new vignette (vignette("regular-expressions")) describes the details of the regular expressions supported by stringr. The main vignette (vignette("stringr")) has been updated to give a high-level overview of the package.
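The behaviours described above can be sketched as follows. The inputs are made up, and the pattern "a(x)?b" is chosen only because its optional group does not take part in the match.

```r
library(stringr)

# A function as replacement: its return value replaces the match.
replaced <- str_replace("abc", "b", toupper)

# str_which() returns the positions of matching elements, like grep().
positions <- str_which(c("apple", "kiwi", "grape"), "p")

# An optional group that does not match is reported as NA, not "".
groups <- str_match_all("ab", "a(x)?b")[[1]]
```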
- str_order() and str_sort() gain an explicit numeric argument for sorting mixed numbers and strings.
- str_replace_all() now throws an error if replacement is not a character vector. If replacement is NA_character_ it replaces the complete string with NA (#124).
- All functions that take a locale (e.g. str_to_lower() and str_sort()) default to "en" (English) to ensure that the default is consistent across platforms.
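For instance, numeric sorting and the NA replacement rule might look like this. It is a small sketch with arbitrary inputs.

```r
library(stringr)

# numeric = TRUE compares embedded digits as numbers, so "a2" sorts before "a10".
sorted <- str_sort(c("a10", "a2", "a1"), numeric = TRUE)

# An NA replacement turns the whole string into NA.
replaced <- str_replace_all("abc", "b", NA_character_)
```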
- Add sample datasets: fruit, words and sentences.
- fixed(), regex(), and coll() now throw an error if you use them with anything other than a plain string (#60). I've clarified that the replacement for perl() is regex() not regexp() (#61). boundary() has improved defaults when splitting on non-word boundaries (#58, @lmullen).
- str_detect() now can detect boundaries (by checking for a str_count() > 0) (#120). str_subset() works similarly.
- str_extract() and str_extract_all() now work with boundary(). This is particularly useful if you want to extract logical constructs like words or sentences. str_extract_all() respects the simplify argument when used with fixed() matches.
- str_subset() now respects custom options for fixed() patterns (#79, @gagolews).
- str_replace() and str_replace_all() now behave correctly when a replacement string contains $s, \\\\1, etc. (#83, #99).
- str_split() gains a simplify argument to match str_extract_all() etc.
- str_view() and str_view_all() create HTML widgets that display regular expression matches (#96).
- word() returns NA for indexes greater than number of words (#112).
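A few of the items above in action, as a sketch with invented inputs. Note that later releases listed earlier in this changelog changed some of this behaviour: for example, str_detect() with a boundary() now errors.

```r
library(stringr)

# Extract words using a boundary() modifier.
found_words <- str_extract_all("The quick fox.", boundary("word"))[[1]]

# simplify = TRUE returns a character matrix instead of a list.
split_matrix <- str_split(c("a-b", "c-d"), "-", simplify = TRUE)

# Asking for a word beyond the end gives NA.
missing_word <- word("one two", 3)
```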
- stringr is now powered by stringi instead of base R regular expressions. This improves Unicode support, and makes most operations considerably faster. If you find stringr inadequate for your string processing needs, I highly recommend looking at stringi in more detail.
- stringr gains a vignette, currently a straightforward update of the article that appeared in the R Journal.
- str_c() now returns a zero length vector if any of its inputs are zero length vectors. This is consistent with all other functions, and standard R recycling rules. Similarly, using str_c("x", NA) now yields NA. If you want "xNA", use str_replace_na() on the inputs.
- str_replace_all() gains a convenient syntax for applying multiple pairs of pattern and replacement to the same vector:

      input <- c("abc", "def")
      str_replace_all(input, c("[ad]" = "!", "[cf]" = "?"))
- str_match() now returns NA if an optional group doesn't match (previously it returned ""). This is more consistent with str_extract() and other match failures.
- New str_subset() keeps values that match a pattern. It's a convenient wrapper for x[str_detect(x, pattern)] (#21, @jiho).
- New str_order() and str_sort() allow you to sort and order strings in a specified locale.
- New str_conv() to convert strings from specified encoding to UTF-8.
- New modifier boundary() allows you to count, locate and split by character, word, line and sentence boundaries.
- The documentation got a lot of love, and very similar functions (e.g. first and all variants) are now documented together. This should hopefully make it easier to locate the function you need.
- ignore.case(x) has been deprecated in favour of fixed|regex|coll(x, ignore.case = TRUE), perl(x) has been deprecated in favour of regex(x).
- str_join() is deprecated, please use str_c() instead.
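The str_c() rules and a few of the new functions from this release can be sketched like this. The inputs are arbitrary, and the str_c() results reflect the behaviour described for this release.

```r
library(stringr)

# Missing values propagate through str_c().
with_na <- str_c("x", NA)

# Convert NA to the string "NA" first if you want it pasted literally.
literal_na <- str_c("x", str_replace_na(NA))

# A zero-length input gives a zero-length result.
empty <- str_c("x", character())

# str_subset() keeps the matching elements.
kept <- str_subset(c("apple", "kiwi"), "pp")

# boundary() lets you count words rather than pattern matches.
n_words <- str_count("one two three", boundary("word"))
```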
- fixed path in str_wrap example so it works for more R installations.
- remove dependency on plyr
- Zero input to str_split_fixed returns 0 row matrix with n columns
- Export str_join
- new modifier perl that switches to Perl regular expressions
- str_match now uses new base function regmatches to extract matches - this should hopefully be faster than my previous pure R algorithm
- new str_wrap function which gives strwrap output in a more convenient format
- new word function extracts words from a string given user defined separator (thanks to suggestion by David Cooper)
- str_locate now returns consistent type when matching empty string (thanks to Stavros Macrakis)
- new str_count counts number of matches in a string.
- str_pad and str_trim receive performance tweaks - for large vectors this should give at least a two orders of magnitude speed up
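As a quick illustration of the functions introduced above, here is a sketch with made-up inputs, run against a current stringr.

```r
library(stringr)

# Count the matches of a pattern in each string.
n_a <- str_count("banana", "a")

# Extract the second word, using the default separator (a space).
second_word <- word("the quick fox", 2)

# Trim whitespace from both ends.
trimmed <- str_trim("  a  ")
```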
- str_length returns NA for invalid multibyte strings
- fix small bug in internal recyclable function
- all functions now vectorised with respect to string, pattern (and where appropriate) replacement parameters
- fixed() function now tells stringr functions to use fixed matching, rather than escaping the regular expression. Should improve performance for large vectors.
- new ignore.case() modifier tells stringr functions to ignore case of pattern.
- str_replace renamed to str_replace_all and new str_replace function added. This makes str_replace consistent with all functions.
- new str_sub<- function (analogous to substring<-) for substring replacement
- str_sub now understands negative positions as a position from the end of the string. -1 replaces Inf as indicator for string end.
- str_pad side argument can be left, right, or both (instead of center)
- str_trim gains side argument to better match str_pad
- stringr now has a namespace and imports plyr (rather than requiring it)
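The substring and padding changes above might be exercised like this. It is a sketch against a current stringr, with arbitrary inputs.

```r
library(stringr)

# Negative positions count from the end of the string.
tail3 <- str_sub("hello", -3, -1)

# str_sub<- replaces a substring in place.
greeting <- "hello"
str_sub(greeting, 1, 1) <- "J"

# Pad on both sides, or trim one side only.
padded <- str_pad("a", 3, side = "both")
left_trimmed <- str_trim("  a  ", side = "left")
```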
- fixed() now also escapes |
- str_join() renamed to str_c()
- all functions more carefully check input and return informative error messages if not as expected.
- add invert_match() function to convert a matrix of location of matches to locations of non-matches
- add fixed() function to allow matching of fixed strings.
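A hedged sketch of fixed() and invert_match() with an invented input. The locations returned by invert_match() are intended to be passed to str_sub(), which is how they are used here.

```r
library(stringr)

# fixed() matches the pattern literally, so "." is just a dot.
dots <- str_detect(c("a.b", "ab"), fixed("."))

# Locate the numbers, then invert to get the text between them.
numbers <- "1 and 2"
num_loc <- str_locate_all(numbers, "[0-9]+")[[1]]
text_loc <- invert_match(num_loc)
between <- str_sub(numbers, text_loc[, 1], text_loc[, 2])
```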
- str_length now returns correct results when used with factors
- str_sub now correctly replaces Inf in end argument with length of string
- new function str_split_fixed returns fixed number of splits in a character matrix
- str_split no longer uses strsplit to preserve trailing breaks
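The splitting behaviour described above can be sketched as follows, with arbitrary inputs and run against a current stringr.

```r
library(stringr)

# A fixed number of pieces, returned as a character matrix.
fixed_pieces <- str_split_fixed("a-b-c", "-", 2)

# A trailing separator produces a trailing empty string.
with_trailing <- str_split("a-b-", "-")[[1]]
```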