Tweak "From base R" vignette: #483

Merged
merged 8 commits into from
Dec 7, 2022
Changes from 4 commits
2 changes: 2 additions & 0 deletions .Rbuildignore
@@ -11,7 +11,9 @@
^codecov\.yml$
^\.httr-oauth$
^_pkgdown\.yml$
^doc$
salim-b marked this conversation as resolved.
^docs$
^Meta$
^README\.Rmd$
^README-.*\.png$
^appveyor\.yml$
2 changes: 2 additions & 0 deletions .gitignore
@@ -11,3 +11,5 @@ revdep/library
revdep/checks.noindex
revdep/library.noindex
revdep/data.sqlite
/doc/
/Meta/
9 changes: 8 additions & 1 deletion DESCRIPTION
@@ -28,11 +28,18 @@ Imports:
vctrs
Suggests:
covr,
dplyr,
downlit,
gt,
htmltools,
htmlwidgets,
knitr,
pkgdown,
purrr,
rmarkdown,
testthat (>= 3.0.0)
testthat (>= 3.0.0),
tibble,
withr
VignetteBuilder:
knitr
Config/Needs/website: tidyverse/tidytemplate
109 changes: 76 additions & 33 deletions vignettes/from-base.Rmd
@@ -8,41 +8,85 @@ vignette: >
%\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
```{r}
#| label: setup
#| include: false

knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)

library(stringr)
library(magrittr)
```

This vignette compares stringr functions to their base R equivalents to help users transitioning from using base R to stringr.

# Overall differences

We'll begin with a lookup table between the most important base string functions and their stringr equivalents.

| base | stringr |
|------|------------|
| `gregexpr(pattern, x)` | `str_locate_all(x, pattern)` |
| `grep(pattern, x, value = TRUE)` | `str_subset(x, pattern)` |
| `grep(pattern, x)` | `str_which(x, pattern)` |
| `grepl(pattern, x)` | `str_detect(x, pattern)` |
| `gsub(pattern, replacement, x)` | `str_replace_all(x, pattern, replacement)`|
| `nchar(x)` | `str_length(x)` |
| `order(x)` | `str_order(x)` |
| `regexec(pattern, x)` + `regmatches()` | `str_match(x, pattern)` |
| `regexpr(pattern, x)` + `regmatches()` | `str_extract(x, pattern)`|
| `regexpr(pattern, x)` | `str_locate(x, pattern)` |
| `sort(x)` | `str_sort(x)` |
| `strrep(x, n)` | `str_dup(x, n)` |
| `strsplit(x, pattern)` | `str_split(x, pattern)`|
| `strwrap(x)` | `str_wrap(x)` |
| `sub(pattern, replacement, x)` | `str_replace(x, pattern, replacement)` |
| `substr(x, start, end)` | `str_sub(x, start, end)` |
| `tolower(x)` | `str_to_lower(x)` |
| `tools::toTitleCase(x)` | `str_to_title(x)` |
| `toupper(x)` | `str_to_upper(x)` |
| `trimws(x)` | `str_trim(x)` |
We'll begin with a lookup table between the most important stringr functions and their base R equivalents.

```{r}
#| label: stringr-base-r-diff
#| echo: false

data_stringr_base_diff <- tibble::tribble(
~stringr, ~`base_r`,
"str_detect(x, pattern)", "grepl(pattern, x)",
"str_dup(x, n)", "strrep(x, n)",
"str_extract(x, pattern)", "regmatches(x, regexpr(pattern, x))",
"str_extract_all(x, pattern)", "regmatches(x, gregexpr(pattern, x))",
"str_length(x)", "nchar(x)",
"str_locate(x, pattern)", "regexpr(pattern, x)",
"str_locate_all(x, pattern)", "gregexpr(pattern, x)",
"str_match(x, pattern)", "regmatches(x, regexec(pattern, x))",
"str_order(x)", "order(x)",
"str_replace(x, pattern, replacement)", "sub(pattern, replacement, x)",
"str_replace_all(x, pattern, replacement)", "gsub(pattern, replacement, x)",
"str_sort(x)", "sort(x)",
"str_split(x, pattern)", "strsplit(x, pattern)",
"str_sub(x, start, end)", "substr(x, start, end)",
"str_subset(x, pattern)", "grep(pattern, x, value = TRUE)",
"str_to_lower(x)", "tolower(x)",
"str_to_title(x)", "tools::toTitleCase(x)",
"str_to_upper(x)", "toupper(x)",
"str_trim(x)", "trimws(x)",
"str_which(x, pattern)", "grep(pattern, x)",
"str_wrap(x)", "strwrap(x)"
)

# since downlit seems to omit tables, we manually link to pkgdown reference for HTML output
Member:

I don't think this is the problem. I think the problem is that `library(stringr)` isn't shown in the output, so downlit can't see it. If you put `library(stringr)` in a separate chunk that's shown in the output, downlit should just work. This should substantially simplify the dependencies.

Contributor Author:

I'm not familiar with how downlit works internally (it seems quite magic!). But the table cell content is still not autolinked even when there's a visible `library(stringr)` in the output:

[Screenshot: rendered "From base R" vignette, 2022-12-06]

I've tested a few combinations, and none of them convinced downlit to autolink table cell contents. On the other hand, all code in text paragraphs is already properly linked without a visible `library(stringr)` chunk, so I think the issue lies elsewhere...

Member:

Ok, in any case, I'd rather fix this upstream than manually add the links here.

Contributor Author:

Would be great if this could be fixed in downlit. r-lib/downlit#67 is about a different issue though, right?

I've only added the links in the stringr column since I think they're more important when consulting this table. But I wouldn't mind if both columns were autolinked. 😄

Do you want me to remove the custom downlit code?

Member:

Yes, please file a downlit issue, and remove the custom downlit code.

Contributor Author:

Removed the code. And filed r-lib/downlit#165.

if (pkgdown::in_pkgdown()) {

withr::with_options(
new = list(
downlit.attached = "stringr",
downlit.local_packages = c("stringr" = "..")
),
code = data_stringr_base_diff %<>%
dplyr::mutate(
path_stringr_ref = purrr::map_chr(
stringr,
~ downlit::autolink_url(paste0(str_extract(.x, "\\w+"), "()"))
),
stringr = paste0("[`", stringr, "`](", path_stringr_ref, ")"),
base_r = paste0("`", base_r, "`")
)
)
} else {

data_stringr_base_diff %<>%
dplyr::mutate(dplyr::across(.fns = ~ paste0("`", .x, "`")))
}

data_stringr_base_diff %>%
dplyr::select(-any_of("path_stringr_ref")) %>%
dplyr::rename(`base R` = base_r) %>%
gt::gt() %>%
gt::fmt_markdown(columns = everything()) %>%
gt::tab_options(column_labels.font.weight = "bold")
```
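Two rows of the lookup table can be checked directly. This is a minimal sketch, assuming stringr is installed; the vector `x` is an illustrative example, not taken from the vignette. It shows the swapped argument order (pattern first in base R, string first in stringr) and one behavioural difference hidden behind the `regmatches(x, regexpr(pattern, x))` row: base R drops elements that have no match, while `str_extract()` keeps one result per input.

```r
library(stringr)

# illustrative input, not from the vignette
x <- c("apple", "banana", "cherry")

# Argument order: base R takes the pattern first, stringr takes the string first
grepl("an", x)       # FALSE  TRUE FALSE
str_detect(x, "an")  # FALSE  TRUE FALSE

# regmatches() + regexpr() silently drops elements without a match ...
regmatches(x, regexpr("an", x))  # "an"
# ... while str_extract() returns one element per input, NA where nothing matched
str_extract(x, "an")             # NA   "an" NA
```

Keeping the output the same length as the input is what lets `str_extract()` be used inside something like `dplyr::mutate()` without length mismatches.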

Overall the main differences between base R and stringr are:

@@ -64,14 +108,10 @@ Overall the main differences between base R and stringr are:
1. Base functions use arguments (like `perl`, `fixed`, and `ignore.case`)
to control how the pattern is interpreted. To avoid dependence between
arguments, stringr instead uses helper functions (like `fixed()`,
`regexp()`, and `coll()`).
`regex()`, and `coll()`).
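The point about argument dependence can be made concrete. This is a minimal sketch, assuming stringr is installed; the vector `x` is a made-up example. In base R the `fixed` and `ignore.case` flags interact (with `fixed = TRUE`, `ignore.case = TRUE` is ignored and R emits a warning), whereas in stringr each pattern carries its own interpretation, so `fixed()` can take `ignore_case = TRUE` directly.

```r
library(stringr)

# made-up input: a literal dot, a regex-dot match, and an upper-case variant
x <- c("a.b", "axb", "A.B")

# base R: flags passed alongside the pattern decide how it is read
grepl(".", x, fixed = TRUE)           # TRUE FALSE  TRUE
grepl("a.b", x, ignore.case = TRUE)   # TRUE  TRUE  TRUE

# the flags depend on each other: ignore.case is dropped when fixed = TRUE
suppressWarnings(grepl("a.b", x, fixed = TRUE, ignore.case = TRUE))  # TRUE FALSE FALSE

# stringr: a helper function wraps the pattern and carries its own options
str_detect(x, fixed("."))                         # TRUE FALSE  TRUE
str_detect(x, regex("a.b", ignore_case = TRUE))   # TRUE  TRUE  TRUE
str_detect(x, fixed("a.b", ignore_case = TRUE))   # TRUE FALSE  TRUE
```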

Next we'll walk through each of the functions, noting the similarities and important differences. These examples are adapted from the stringr documentation and are contrasted here with the analogous base R operations.

```{r setup}
library(stringr)
```

# Detect matches

## `str_detect()`: Detect the presence or absence of a pattern in a string
@@ -275,7 +315,9 @@ str_length(letters)

There are some subtle differences between base and stringr here. `nchar()` requires a character vector, so it will return an error if used on a factor. `str_length()` can handle a factor input.

```{r, error = TRUE}
```{r}
#| error: true

# base
nchar(factor("abc"))
```
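If you stay with base R, the usual workaround for the factor error is to convert to character first. This is a minimal sketch, assuming stringr is installed; the factor `f` is a made-up example.

```r
library(stringr)

# made-up factor input
f <- factor(c("abc", "de"))

# base R: nchar() needs a character vector, so convert the factor explicitly
nchar(as.character(f))  # 3 2

# stringr: str_length() accepts the factor as-is
str_length(f)           # 3 2
```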
@@ -388,7 +430,6 @@ str_replace_all(fruits, "[aeiou]", "-")

Both stringr and base R have functions to convert to upper and lower case. Title case is also provided in stringr.


```{r}
dog <- "The quick brown dog"

@@ -431,7 +472,9 @@ The advantage of `str_flatten()` is that it always returns a vector the same len

To duplicate strings within a character vector use `strrep()` (in R 3.3.0 or greater) or `str_dup()`:

```{r, eval = (getRversion() >= "3.3.0")}
```{r}
#| eval: !expr getRversion() >= "3.3.0"

fruit <- c("apple", "pear", "banana")

# base