Speed up map_token #37

Merged
merged 1 commit into from
Sep 19, 2023

MichaelChirico

I found lintr::get_source_expressions() very slow on the file mentioned in the {cyclocomp} performance issue:

gaborcsardi/cyclocomp#25

It looks like a lot of the time is consumed in xml_parse_data() so I'm looking for some easy performance wins.

Avoiding ifelse() seems like a good start. Here's a timing comparison of map_token() on the parse data frame from that file:

microbenchmark(times = 100, map_token(pd$token), map_token2(pd$token), map_token3(pd$token))
# Unit: milliseconds
#                  expr       min        lq     mean   median       uq       max neval cld
#   map_token(pd$token) 63.981255 67.672844 82.52606 82.72643 87.92059 255.91049   100   c
#  map_token2(pd$token) 24.273721 25.416668 33.79148 28.09598 43.07010  54.56177   100  b 
#  map_token3(pd$token)  9.595289  9.883199 16.86150 10.09418 11.46993 198.82841   100 a  
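
For anyone who wants to reproduce the setup, here is a minimal sketch of how a `pd` like the one above could be built. It assumes `pd` is the data frame returned by utils::getParseData(), which this thread does not show, and a short inline snippet stands in for the real file.

```r
# Assumption: `pd` is the parse data frame for the file, as returned by
# utils::getParseData(); a tiny inline snippet stands in for the real file.
code <- "f <- function(x) (x + 1) * 2"
pd <- utils::getParseData(parse(text = code, keep.source = TRUE))

# `token` holds one token type per row: quoted operators such as "'('"
# alongside names such as "SYMBOL"
head(pd[, c("token", "text")])
table(pd$token)
```

The `token` column is the character vector that the map_token() variants are timed on.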

map_token3() is the version implemented in this PR. map_token2() is closer to the original:

map_token2 <- function(token) {
  map <- xml_parse_token_map[token]
  translated <- !is.na(map)
  token[translated] <- map[translated]
  token
}

So I think we've got roughly an 8x speed-up in median time (82.7 ms down to 10.1 ms) without much change to readability (IMO readability has improved).
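
To make the comparison concrete, here is a self-contained sketch of three lookup styles on a toy map. Everything in it is an assumption for illustration: `toy_map` is a hypothetical stand-in for xml_parse_token_map, `map_ifelse()` is a guess at the general shape of an ifelse()-based lookup, and `map_in()` is just one way to skip the NA check, not necessarily what map_token3() does.

```r
# Hypothetical stand-in for xml_parse_token_map (entries are illustrative only)
toy_map <- c("'('" = "OP-LEFT-PAREN", "')'" = "OP-RIGHT-PAREN", "'+'" = "OP-PLUS")

# ifelse()-style lookup: assumed shape of the slower approach
map_ifelse <- function(token, map = toy_map) {
  translated <- map[token]
  # ifelse() carries over the names of its test vector, so drop them
  unname(ifelse(is.na(translated), token, translated))
}

# Subassignment, as in map_token2() above
map_subassign <- function(token, map = toy_map) {
  translated <- map[token]
  hit <- !is.na(translated)
  token[hit] <- translated[hit]
  token
}

# Look up only the tokens known to be in the map, so no NA check is needed
map_in <- function(token, map = toy_map) {
  hit <- token %in% names(map)
  token[hit] <- map[token[hit]]
  token
}

tokens <- c("SYMBOL", "'('", "NUM_CONST", "'+'", "NUM_CONST", "')'")

# All three variants should agree on the translated tokens
stopifnot(
  identical(map_ifelse(tokens), map_subassign(tokens)),
  identical(map_subassign(tokens), map_in(tokens))
)
map_in(tokens)
```

Both subassignment variants modify only the matching positions in place, whereas ifelse() evaluates both branches in full and then assembles a new vector.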

Member

@gaborcsardi gaborcsardi left a comment

Great, thanks!

@gaborcsardi gaborcsardi merged commit c9983ad into r-lib:main Sep 19, 2023
@MichaelChirico MichaelChirico deleted the patch-2 branch September 19, 2023 23:45
@MichaelChirico
Author

Net, #37 + #38 + #39 gave roughly a 20% speed-up with this file as input: https://raw.githubusercontent.com/mwaldstein/edgarWebR/fb9a38e6a57186ffd1c93cc1aa00c4fdf1bc5514/vignettes/intro/0/browse-edgar-3c23fc.R

not bad!

pkgload::load_all()
"https://raw.githubusercontent.com/mwaldstein/edgarWebR/fb9a38e6a57186ffd1c93cc1aa00c4fdf1bc5514/vignettes/intro/0/browse-edgar-3c23fc.R" |>
  parse() |>
  xml_parse_data() |>
  system.time()

# NB: run on a fresh session since caching has a big effect on re-runs

# @ 40b9b577da01e77c7f7048fc0c71407fcc4e5e6e
#    user  system elapsed 
#   4.349   0.083   4.529

# @ c9983ad1bafb1c10a2adc133d6fd06a02623abb4
#    user  system elapsed 
#   3.559   0.103   3.767 
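
For reference, the arithmetic behind the 20% figure can be sketched from the two elapsed times above. The variable names are only for illustration.

```r
# Elapsed times (seconds) copied from the two runs above
elapsed_before <- 4.529  # @ 40b9b577
elapsed_after  <- 3.767  # @ c9983ad1

elapsed_before / elapsed_after      # throughput ratio, about 1.20
1 - elapsed_after / elapsed_before  # share of wall time saved, about 0.17
```

So the same result reads as a 20% speed-up in throughput terms, or about 17% less wall time.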

@MichaelChirico MichaelChirico mentioned this pull request Sep 19, 2023
15 tasks