Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache default_ua on package load #322

Closed
wants to merge 2 commits into from
Closed

Cache default_ua on package load #322

wants to merge 2 commits into from

Conversation

richfitz
Copy link

The default_ua function is called on every request and involves
calls to the slow base function packageDescription. Depending on the
use, this can account for up to 40% of the client-side CPU time which
is a lot when using over a loopback connection (e.g. etcd, RNeo4j).

Because this information is unlikely to change during a session (and is
infomative for the server and likely ignored anyway) computing it once
and reusing it seems safe).


I'm not sure how much you're interested in local performance vs use over network connections (which will dwarf the differences here). Benchmarks vary depending by system; in particular SSD-backed storage suffer much less.

Running HEAD on a locally-running etcd server (which is where I found this issue)

Rprof(interval=0.002)
for (i in 1:1000) {
  httr::HEAD("http://localhost:2379")
}
Rprof(NULL)

Before:

$by.total
                               total.time total.pct self.time self.pct
"<Anonymous>"                       3.404     99.94     0.008     0.23
"request_perform"                   1.930     56.66     0.002     0.06
"structure"                         1.730     50.79     0.026     0.76
"request"                           1.710     50.21     0.008     0.23
"request_prepare"                   1.638     48.09     0.002     0.06
"request_combine"                   1.636     48.03     0.000     0.00
"vapply"                            1.634     47.97     0.008     0.23
"keep_last"                         1.632     47.92     0.010     0.29
"compact"                           1.628     47.80     0.000     0.00
"request_default"                   1.624     47.68     0.002     0.06
"default_ua"                        1.600     46.98     0.004     0.12
"packageVersion"                    1.576     46.27     0.000     0.00
"suppressWarnings"                  1.574     46.21     0.000     0.00
"withCallingHandlers"               1.572     46.15     0.002     0.06
"packageDescription"                1.568     46.04     0.002     0.06
"file.exists"                       1.480     43.45     1.480    43.45
".Call"                             1.406     41.28     1.406    41.28

after:

$by.total
                             total.time total.pct self.time self.pct
"<Anonymous>"                     1.718     99.08     0.008     0.46
"request_perform"                 1.516     87.43     0.004     0.23
"request_fetch"                   1.340     77.28     0.002     0.12
"request_fetch.write_memory"      1.338     77.16     0.000     0.00
".Call"                           1.336     77.05     1.336    77.05

This can be a lot less pronounced; using Python' simple http server and running on an SSD-based system I see

import SimpleHTTPServer
import SocketServer
Handler = SimpleHTTPServer.SimpleHTTPRequestHandler
httpd = SocketServer.TCPServer(("", 8000), Handler)
httpd.serve_forever()
Rprof(interval=0.002)
for (i in 1:1000) {
  r <- httr::HEAD("http://localhost:8000")
}
Rprof(NULL)
summaryRprof()

before

$by.total
                               total.time total.pct self.time self.pct
"<Anonymous>"                       1.030     99.23     0.032     3.08
"request_perform"                   0.734     70.71     0.002     0.19
"structure"                         0.470     45.28     0.046     4.43
"request"                           0.408     39.31     0.010     0.96
"request_prepare"                   0.360     34.68     0.006     0.58
"request_combine"                   0.352     33.91     0.002     0.19
"vapply"                            0.300     28.90     0.020     1.93
"compact"                           0.290     27.94     0.002     0.19
"request_default"                   0.288     27.75     0.000     0.00
"keep_last"                         0.284     27.36     0.012     1.16
"default_ua"                        0.240     23.12     0.006     0.58
"stopifnot"                         0.230     22.16     0.022     2.12

after

$by.total
                             total.time total.pct self.time self.pct
"<Anonymous>"                     0.794     98.51     0.038     4.71
"request_perform"                 0.478     59.31     0.008     0.99
"stopifnot"                       0.270     33.50     0.020     2.48
"handle_url"                      0.236     29.28     0.000     0.00

(sorry, not really sure why the call traces look so different).

The default_ua function is called on every request and involves
calls to the slow base function packageDescription.  Depending on the
use, this can account for up to 40% of the client-side CPU time which
is a lot when using over a loopback connection (e.g. etcd, RNeo4j).

Because this information is unlikely to change during a session (and is
infomative for the server and likely ignored anyway) computing it once
and reusing it seems safe).
@richfitz
Copy link
Author

The CI failure is a hang during package installation -- I have tested the package locally and it should pass fine.

paste0(names(versions), "/", versions, collapse = " ")
}
# This is defined in .onLoad()
default_ua <- NULL
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please keep all the code here? I'd prefer you did something like

cache <- new.env(parent = emptyenv())
cache$default_ua <- NULL

and then returned early in default_ua() if that value was set

This changes `default_ua` from a package variable to a function defined
within a cache.  The default UA will be defined on first use, rather than
on package load.
@hadley hadley closed this in 3e3e3ae May 24, 2016
@hadley
Copy link
Member

hadley commented May 24, 2016

In the future, can you please merge/rebase, and then ping me in the comments with "PTAL" when you're done? Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants