Organizing package documentation #18
One issue I have is that some of these names abstract things a step further, which may make it difficult to "google" your way out of confusion. For example, if I don't quite understand what "appenders" do, I'm going to search "appenders regular expressions", which won't give better results than "anchors regular expressions". The same thing applies to "expression wrappers" vs. "lookarounds". I think the API should be abstract, expressive, and verbose, but the documentation (when possible) should stay a bit truer to regular expression terminology. I think this would be helpful for both novice and advanced users, even if advanced users might not use this package in the first place. Having said that, the main changes I'm thinking of would be:
Anchors
Lookarounds
Quantifiers
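For readers less familiar with the formal terms, here is a minimal base R sketch of what each of the three concepts does in a plain regular expression. It deliberately uses only base R (`grepl()` with `perl = TRUE`), not this package's API, and the example strings are made up for illustration.

```r
x <- c("cat", "concat", "catalog")

# Anchors: tie the match to a position (start or end of the string)
starts_with_cat <- grepl("^cat", x)   # TRUE FALSE TRUE
ends_with_cat   <- grepl("cat$", x)   # TRUE TRUE FALSE

# Lookarounds: require context without consuming it
# (a positive lookahead: "cat" only when followed by "alog")
cat_before_alog <- grepl("cat(?=alog)", x, perl = TRUE)   # FALSE FALSE TRUE

# Quantifiers: say how many times the previous element may repeat
# (here "ab" is optional: zero or one occurrence)
optional_ab <- grepl("^(?:ab)?c$", c("c", "abc", "ababc"), perl = TRUE)   # TRUE TRUE FALSE
```

These are also the terms most likely to return useful results when searching for regular expression help.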
If this isn’t very convincing, maybe we can compromise and go with what you have suggested, but make sure to include the formal terminology, with references, within the individual function docs. Maybe with |
You are absolutely right about the search-friendliness of the terms we use. I guess I was just trying to make an inventory of what we have and how these functions behave differently, more from an architectural point of view than from a user's point of view. I like the terms you picked and agree that the documentation should be logically organized around those concepts. What I am still wondering about is how to guide users regarding the "nesting" of functions (i.e. which functions can be combined together, in particular through
I have a few minor questions about
Maybe we just throw out the quantifiers altogether in favor of the rep argument? I'm not quite sure what the ramifications of that are, but I'm all for trimming down the available functions:
rx() %>% rx_find("abc", rep = "maybe")
#> (?:abc)?
rx() %>% rx_find("abc") %>% rx_zero_or_one()
#> (?:abc)?
Regarding the docs, I'd like to mention that |
I wonder if we might organize the package into 9 types:
# A tibble: 40 x 3
   func             type                args
   <chr>            <chr>               <chr>
 1 rx_end_of_line   anchor              .data
 2 rx_start_of_line anchor              .data
 3 rx_word_edge     anchor              .data, negate
 4 rx_begin_capture capturing group     .data
 5 rx_end_capture   capturing group     .data
 6 rx_alpha         character class     .data, rep, mode, negate
 7 rx_alpha_num     character class     .data, rep, mode, negate
 8 rx_alphanum      character class     .data, rep, mode, negate
 9 rx_br            character class     .data, rep, mode, negate
10 rx_digit         character class     .data, rep, mode, negate
11 rx_line_break    character class     .data, rep, mode, negate
12 rx_lowercase     character class     .data, rep, mode, negate
13 rx_punctuation   character class     .data, rep, mode, negate
14 rx_space         character class     .data, rep, mode, negate
15 rx_tab           character class     .data, rep, mode, negate
16 rx_uppercase     character class     .data, rep, mode, negate
17 rx_whitespace    character class     .data, rep, mode, negate
18 rx_word          character class     .data, mode, negate
19 rx_anything_but  expression          .data, ..., mode
20 rx_either_of     expression          .data, ..., rep, mode
21 rx_find          expression          .data, ..., rep, mode
22 rx_maybe         expression          .data, ..., mode
23 rx_none_of       expression          .data, ..., rep, mode
24 rx_one_of        expression          .data, ..., rep, mode
25 rx_range         expression          .data, ..., rep, mode, negate
26 rx_something_but expression          .data, ..., mode
27 rx_anything      friendly expression .data, mode
28 rx_something     friendly expression .data, mode
29 rx_avoid_prefix  lookaround          .data, ...
30 rx_avoid_suffix  lookaround          .data, ...
31 rx_seek_prefix   lookaround          .data, ...
32 rx_seek_suffix   lookaround          .data, ...
33 rx_with_any_case modifier            .data
34 rx_count         quantifier          .data, n, mode
35 rx_none_or_more  quantifier          .data, mode
36 rx_one_or_more   quantifier          .data, mode
37 %>%              utility             lhs, rhs
38 rx               utility             NA
39 rx_test          utility             x, txt
40 sanitize         utility             x
The types I'm most conflicted about are expressions and friendly expressions. My main motivation for defining those function types is to figure out what can be nested and what can't. I think (though there might be an exception or two) that everything else can be nested and plugged into one another. It's interesting to see some consistency between the types and arguments. There is of course some inconsistency that needs to be changed (enable argument, value vs. ..., etc). In any case, it makes sense for functions belonging to a specific type to have a somewhat consistent argument structure.
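An inventory like the one above does not have to be maintained by hand. The sketch below shows one way the func and args columns might be generated from a package's exports, using only base R. The helper name `summarise_args()` is hypothetical and not part of this package, and the type column would still have to be assigned manually.

```r
# Hypothetical helper: list every exported function of a package
# together with its argument names.
summarise_args <- function(pkg) {
  exports <- sort(getNamespaceExports(pkg))
  rows <- lapply(exports, function(name) {
    obj <- getExportedValue(pkg, name)
    if (!is.function(obj)) return(NULL)      # skip exported data objects
    arg_names <- names(formals(args(obj)))   # args() also handles most primitives
    data.frame(
      func = name,
      args = if (length(arg_names)) paste(arg_names, collapse = ", ") else NA_character_,
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)                       # NULL entries are dropped by rbind
}

# Demonstrated on a package that ships with R; with this package installed,
# the same call could be made with its name instead.
arg_table <- summarise_args("tools")
head(arg_table)
```

Comparing the args column within each type is a quick way to spot the inconsistencies mentioned above.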
I have been thinking about how to organize the package documentation. We basically have a few "groups" of functions that may make sense to introduce together (at least in pkgdown):
Single-character functions
These are functions that return one character and do not require any "wrappers".
rx_alpha_num
rx_br and rx_line_break
rx_digit
rx_something
rx_space
rx_tab
rx_whitespace
rx_word_char and rx_word (with the default rep="some" argument)
Character "sets"
These functions output ranges or "sets" of characters, wrapped into [, for which we don't have a way to express them with a single character. This is important when "nesting" them into the supersets below, when the "outer" set of [ needs to be "peeled off". From the user's standpoint they may not be any different from single-character functions.
rx_alphanum
rx_alpha
rx_lower and rx_upper
rx_punctuation
rx_range
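To make the "peeling off" idea concrete, here is a minimal base R sketch. It assumes two character sets are given as bracketed strings and merges them into one superset by stripping each set's outer brackets first. The helper `peel_brackets()` is hypothetical and only illustrates the idea, not how the package implements it.

```r
# Hypothetical helper: remove one outer pair of square brackets, if present
peel_brackets <- function(x) sub("^\\[(.*)\\]$", "\\1", x)

alpha <- "[[:alpha:]]"
digit <- "[[:digit:]]"

# Nesting the sets as-is would produce brackets inside brackets,
# so the outer brackets are peeled off before building the superset
superset <- paste0("[", peel_brackets(alpha), peel_brackets(digit), "]")
superset
#> [1] "[[:alpha:][:digit:]]"

matches <- grepl(superset, c("a", "1", "-"))   # TRUE TRUE FALSE
```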
"Appenders"
These functions take the .data argument and simply append something to it, thus modifying the behavior of previously appended function(s).
rx_capture_groups
rx_count
rx_end_of_line and rx_start_of_line
rx_one_or_more and rx_none_or_more
rx_with_any_case
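The appending behavior can be pictured as plain string concatenation. The sketch below is an illustration in base R under that assumption. `append_piece()` is a hypothetical stand-in, not a function from the package.

```r
# Hypothetical stand-in for an "appender": add a piece to the expression so far
append_piece <- function(.data = "", piece) paste0(.data, piece)

pattern <- append_piece("", "^")             # start of line
pattern <- append_piece(pattern, "(?:abc)")  # some previously built expression
pattern <- append_piece(pattern, "+")        # quantifier modifies what came before it
pattern <- append_piece(pattern, "$")        # end of line
pattern
#> [1] "^(?:abc)+$"

appended_matches <- grepl(pattern, c("abc", "abcabc", "ab"), perl = TRUE)   # TRUE TRUE FALSE
```

The order of the calls matters here: a quantifier such as `+` only affects the element that was appended immediately before it.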
"Expression-wrappers"
These functions allow the user to specify a sequence of characters, all of which should be matched in the string.
rx_avoid and rx_seek
rx_find (and rx_literal, which I now dropped)
rx_maybe (which is rx_find with the rep argument set to "maybe")
rx_or (which might need a bit of extra work, see Syntax for rx_or() #16, and thus will be out of this category)
"Superset functions"
These functions specify a list of mutually exclusive symbols/expressions, only one of which should be matched in the string.
rx_one_of
rx_anything_but and rx_something_but
Eventually, rx_either_of will be moved here as well, if we decide to keep it.
I find this grouping helpful when reasoning about the functionality our package covers.
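The difference between the last two groups shows up directly in the regular expressions they would produce. As a rough base R illustration, assuming an expression-wrapper yields a non-capturing group (as in the `(?:abc)` output quoted elsewhere in this thread) and a superset function yields a bracketed set:

```r
x <- c("abc", "b", "cab")

# Expression-wrapper style: the whole sequence "abc" must appear
sequence_pattern <- "(?:abc)"
sequence_matches <- grepl(sequence_pattern, x, perl = TRUE)   # TRUE FALSE FALSE

# Superset style: any single one of "a", "b" or "c" is enough
superset_pattern <- "[abc]"
superset_matches <- grepl(superset_pattern, x, perl = TRUE)   # TRUE TRUE TRUE
```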
There are a few functions I dropped:
rx_any_of (duplicate of rx_one_of)
rx_digits (too little advantage compared to rx_digit(rep=n))
rx_literal (duplicate of rx_find)
rx_not (duplicate of rx_avoid_suffix)
rx_new has been moved to utils.R