
Re-Use (and cache?) common XPaths #2350

Closed
AshesITR opened this issue Nov 24, 2023 · 5 comments · Fixed by #2357


@AshesITR
Collaborator

Inspired by a comment by @MichaelChirico which I can't find at the moment.

Some linters share common XPath logic. This is currently not cached in any way, so every linter has to re-evaluate the common XPath logic.
It might be worthwhile to

  1. Figure out which XPaths could be made common.
  2. Figure out a way to cache the result of applying those XPaths to expressions / files so they are only evaluated once.

Related to #963

@AshesITR
Collaborator Author

One way I could imagine doing that would be to add promises to source_expression for some of the XPaths that (lazily) evaluate to xml_find_all(source_expression$(full_)xml_parsed_content, common_xpath).
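
For illustration, a minimal sketch of what such a promise could look like (the helper name make_xpath_promise and the field name xml_function_calls are made up here, not actual lintr API):

```r
library(xml2)

# Wrap a common XPath in a promise that is only forced on first access.
make_xpath_promise <- function(xml, xpath) {
  force(xml)
  force(xpath)
  cache <- new.env(parent = emptyenv())
  delayedAssign("nodes", xml_find_all(xml, xpath), assign.env = cache)
  # The first access runs xml_find_all(); later accesses reuse the cached result.
  function() cache$nodes
}

# e.g. when building the source expression:
# source_expression$xml_function_calls <-
#   make_xpath_promise(source_expression$xml_parsed_content, "//SYMBOL_FUNCTION_CALL")
```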

@AshesITR
Collaborator Author

Here's what lint("R/zzz.R") outputs when run with

```r
trace(xml2::xml_find_all, function() {
  cat(get("xpath", envir = parent.frame()), "\n", sep = "", file = "xpath_trace.log", append = TRUE)
})
```

xpath_trace.log

I see many //SYMBOL_FUNCTION_CALL[{xp_text_in_table(...)}] calls.
It might be worthwhile to build a text()-indexed cache of the //SYMBOL_FUNCTION_CALL results.
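
Rough sketch of what I mean by a text()-indexed cache (illustrative names, assuming we have the parsed XML for one file / expression):

```r
library(xml2)

# Build once per file: a named list mapping each function-call name to its nodes.
index_function_calls <- function(xml) {
  calls <- xml_find_all(xml, "//SYMBOL_FUNCTION_CALL")
  split(calls, xml_text(calls))
}

# A linter interested in e.g. sapply() calls could then do
#   idx <- index_function_calls(xml)
#   idx[["sapply"]]
# instead of running //SYMBOL_FUNCTION_CALL[text() = 'sapply'] itself.
```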

@AshesITR
Collaborator Author

Here's what costs > 1s in lint() during lint_package() currently:
[profiling screenshot: per-function timings of lint() during lint_package()]

We can see that 30% of the time is spent in xml_find_all().
What surprised me was is_lint_level() taking a whole 8s; maybe that's worth converting to an attribute of Linter()?
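
Something like this, maybe (with_linter_level and the linter_level attribute are made-up names to sketch the idea, not the actual Linter() API):

```r
# Record the level a linter needs as an attribute when it is created, so the
# framework can check it once per linter instead of calling is_lint_level()
# for every linter x expression combination.
with_linter_level <- function(linter_fun, level = c("expression", "file")) {
  attr(linter_fun, "linter_level") <- match.arg(level)
  linter_fun
}

# The lint loop could then filter cheaply, e.g.
# expression_linters <- Filter(
#   function(l) identical(attr(l, "linter_level"), "expression"),
#   linters
# )
```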

@MichaelChirico
Collaborator

xpath_trace.log

Nice! More quantitatively:

```r
library(rex)  # for re_matches()

download.file("https://github.com/r-lib/lintr/files/13455583/xpath_trace.log", tmp <- tempfile())
l = readLines(tmp)

# tally which node each global (//) search targets
global_nodes = re_matches(l, "//(?<node>[A-Za-z_-]+)")
as.data.frame(sort(table(node = unlist(global_nodes$node))))
```

```
                   node Freq
1               COMMENT    1
2           LEFT_ASSIGN    1
3      OP-RIGHT-BRACKET    1
4          OP-SEMICOLON    1
5                  PIPE    1
6                   LBB    2
7        SYMBOL_FORMALS    2
8                   AND  129
9                  expr  129
10                   GT  129
11                   LT  129
12                   NE  129
13                OP-AT  129
14             OP-COLON  129
15       OP-RIGHT-BRACE  129
16         RIGHT_ASSIGN  129
17              SPECIAL  129
18              OP-PLUS  130
19       OP-RIGHT-PAREN  130
20      OP-LEFT-BRACKET  131
21        OP-LEFT-PAREN  132
22            STR_CONST  133
23                 ELSE  258
24                   EQ  258
25            EQ_ASSIGN  258
26              forcond  258
27             OP-COMMA  258
28            OP-DOLLAR  258
29                   IF  259
30            NUM_CONST  259
31        OP-LEFT-BRACE  259
32             FUNCTION  387
33               SYMBOL  776
34 SYMBOL_FUNCTION_CALL 5427
```

i.e. more than 50% of all global (//) searches are for some //SYMBOL_FUNCTION_CALL.

@AshesITR
Collaborator Author

So, what I have in mind is to lazily run a //SYMBOL_FUNCTION_CALL XPath query for each expression in get_source_expressions() and to store a helper function $get_function_call_nodes(function_names) that returns a list equivalent to the result of //SYMBOL_FUNCTION_CALL[{ xp_text_in_table(function_names) }].

By "lazily" I mean that, ideally, this cache would only be built once a linter actually tries to use it.
Seeing that >50% of our suite does, maybe it's not worth adding logic for lazy evaluation, though.
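
To make the idea concrete, a sketch of such a helper (same names as above, but not actual lintr code):

```r
library(xml2)

# Lazily build the //SYMBOL_FUNCTION_CALL index on first use, then answer
# lookups by subsetting the cached nodeset.
make_get_function_call_nodes <- function(xml) {
  nodes <- NULL
  node_text <- NULL
  function(function_names) {
    if (is.null(nodes)) {
      nodes <<- xml_find_all(xml, "//SYMBOL_FUNCTION_CALL")
      node_text <<- xml_text(nodes)
    }
    nodes[node_text %in% function_names]
  }
}

# e.g. in get_source_expressions():
# source_expression$get_function_call_nodes <-
#   make_get_function_call_nodes(source_expression$full_xml_parsed_content)
```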
