-
Notifications
You must be signed in to change notification settings - Fork 54
/
Copy pathimplicit-strategies.qmd
165 lines (118 loc) · 7.57 KB
/
implicit-strategies.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
# Implicit strategies {#sec-implicit-strategies}
```{r}
#| include = FALSE
source("common.R")
```
## What's the pattern?
There are two implicit strategies that are sometimes useful.
I call them implicit because you don't select them explicitly with a single argument, but instead select between them based on the presence and absence of different arguments.
As you might guess, this can make for a confusing interface, but it is occasionaly the best option.
- With **mutually exclusive arguments** you select between two strategies based on whether you supply argument `a` or argument `b`.
- With **compound objects** you select between two strategies based on whether you supply one complex object (e.g. a data frame) or multiple simple objects (e.g. vectors). I think the most compelling reason to use this pattern is when another function might be called directly by a user (who will supply individual arguments) or with the output from another function (which needs to pour into a single argument).
The main challenge with using these pattern is that you can't make them clear from the function signature alone, so you need to carefully document and check inputs yourself.
They are also likely to be surprising to the user as they are relatively rare patterns.
So before using either of these techniques you should try using an explicit strategy via an enum (@sec-enumerate-options), using separate functions (@sec-strategy-functions), or using strategy objects (@sec-strategy-objects).
@sec-cs-rvest explores these options from the perspective of `rvest::read_html()`.
## What are some examples?
### Mutually exclusive arguments
- `cutree()` is an example where I think mutually exclusive arguments shine: it's so simple
- In `ggplot2::scale_x_date()` and friends you can specify the breaks and labels either with `breaks` and `labels` (like all other scale functions) or with `date_breaks` and `date_labels`.
If you set both values in a pair, the `date_` version wins.
- `forcats::fct_other()` allows you to either `keep` or `drop` specified factor values.
If supply neither, or both, you get an error.
- `dplyr::relocate()` has optional `.before` and `.after` arguments.
### Compound objects
- For example, it seems reasonable that you should be able to feed the output of `str_locate()` directly into `str_sub()`:
```{r}
library(stringr)
x <- c("aaaaab", "aaab", "ccccb")
loc <- str_locate(x, "a+b")
str_sub(x, loc)
```
But equally, it's nice to be able to supply individual start and end values when calling it directly:
```{r}
str_sub("Hadley", start = 2, end = 4)
```
So `str_sub()` allows either individual vectors supplied to `start` and `end`, or a two-column matrix supplied to `start`.
- `options(list(a = 1, b = 2))` is equivalent to `options(a = 1, b = 2)`.
This is half of very useful pattern.
The other half of that pattern is that `options()` returns the previous value of any options that you set.
That means you can do `old <- options(…); options(old)` to temporarily set options with in a function.
`withr::local_options()` and `withr::local_envvar()` work similarly: you can either supply a single list of values, or individually named values.
But they do it with different arguments.
- Another place that this pattern crops up is in `dplyr::bind_rows()`.
When binding rows together, it's equally useful to bind a few named data frames as it is to bind a list of data frames that come from map or similar.
In base R you need to know about `do.call(rbind, mylist)` which is a relatively sophisticated pattern.
So in dplyr we tried to make `bind_rows()` automatically figure out if you were in situation one or two.
Unfortunately, it turns out to be really hard to tell which of the situations you are in, so dplyr implemented heuristics that work most of the time, but occasionally it fails in surprising ways.
Now we have generally steered away from interfaces that try to automatically "unsplice" their inputs and instead require that you use `!!!` to explicitly unsplice.
This is has some advantages and disadvantages: it's an interface that's becoming increasingly common in the tidyverse (and we have a good convention for documenting it with the `<dynamic-dots>` tag), but it's still relatively rare and is an advanced technique that we don't expect everyone to learn.
That's why for this important case, we also have `purrr::list_cbind()`.
But it means that functions like `purrr::hoist()`, `forcats::fct_cross()`, and `rvest::html_form()` which are less commonly given lists have a clearly documented escape hatch that doesn't require another different function.
(And of course if you understand the `do.call` pattern you can still use that too).
## How do you use this pattern?
### Mutually exclusive arguments
If a function needs to have mutually exclusive arguments (i.e. you must supply only one of theme) make sure you check that only one is supplied in order to give a clear error message.
Avoid implementing some precedence order where if both `a` and `b` are supplied, `b` silently wins.
The easiest way to do this is to use `rlang::check_exclusive()`.
(In the case of required args, you might want to consider putting them after `…`. This violations @sec-dots-after-required, but forces the user to name the arguments which will make the code easier to read)
If you must pick one of the two mutually exclusive arguments, make their defaults empty.
Otherwise, if they're optional, give them `NULL` arguments.
```{r}
#| error: true
fct_drop <- function(f, drop, keep) {
rlang::check_exclusive(drop, keep)
}
fct_drop(factor())
fct_drop(factor(), keep = "a", drop = "b")
```
(If the arguments are optional, you'll need `.require = FALSE` until <https://github.com/r-lib/rlang/issues/1647>)
::: {.callout-note collapse="true"}
## With base R
If you don't want to use rlang, you implement yourself with `xor()` and `missing()`:
```{r}
#| eval: false
fct_drop <- function(f, drop, keep) {
if (!xor(missing(keep), missing(drop))) {
stop("Exactly one of `keep` and `drop` must be supplied")
}
}
fct_drop(factor())
fct_drop(factor(), keep = "a", drop = "b")
```
:::
In the documentation, document the pair of arguments together, and make it clear that only one of the pair can be supplied:
```{r}
#' @param keep,drop Pick one of `keep` and `drop`:
#' * `keep` will preserve listed levels, replacing all others with
#' `other_level`.
#' * `drop` will replace listed levels with `other_level`, keeping all
#' as is.
```
### Compound arguments
To implement in your own functions, you should branch on the type of the first argument and then check that the others aren't supplied.
```{r}
str_sub <- function(string, start, end) {
if (is.matrix(start)) {
if (!missing(end)) {
abort("`end` must be missing when `start` is a matrix")
}
if (ncol(start) != 2) {
abort("Matrix `start` must have exactly two columns")
}
stri_sub(string, from = start[, 1], to = start[, 2])
} else {
stri_sub(string, from = start, to = end)
}
}
```
And make it clear in the documentation:
```{r}
#' @param start,end Integer vectors giving the `start` (default: first)
#' and `end` (default: last) positions, inclusively.
#'
#' Alternatively, you pass a two-column matrix to `start`, i.e.
#' `str_sub(x, start, end)` is equivalent to
#' `str_sub(x, cbind(start, end))`
```
(If you look at `string::str_sub()` you'll notice that `start` and `end` do have defaults; I think this is a mistake because `start` and `end` are important enough that the user should always be forced to supply them.)