Cache default_ua on package load #322

richfitz · 2016-01-14T10:00:34Z

The default_ua function is called on every request and involves
calls to the slow base function packageDescription. Depending on the
use, this can account for up to 40% of the client-side CPU time which
is a lot when using over a loopback connection (e.g. etcd, RNeo4j).

Because this information is unlikely to change during a session (and is
infomative for the server and likely ignored anyway) computing it once
and reusing it seems safe).

I'm not sure how much you're interested in local performance vs use over network connections (which will dwarf the differences here). Benchmarks vary depending by system; in particular SSD-backed storage suffer much less.

Running HEAD on a locally-running etcd server (which is where I found this issue)

Rprof(interval=0.002)
for (i in 1:1000) {
  httr::HEAD("http://localhost:2379")
}
Rprof(NULL)

Before:

$by.total
                               total.time total.pct self.time self.pct
"<Anonymous>"                       3.404     99.94     0.008     0.23
"request_perform"                   1.930     56.66     0.002     0.06
"structure"                         1.730     50.79     0.026     0.76
"request"                           1.710     50.21     0.008     0.23
"request_prepare"                   1.638     48.09     0.002     0.06
"request_combine"                   1.636     48.03     0.000     0.00
"vapply"                            1.634     47.97     0.008     0.23
"keep_last"                         1.632     47.92     0.010     0.29
"compact"                           1.628     47.80     0.000     0.00
"request_default"                   1.624     47.68     0.002     0.06
"default_ua"                        1.600     46.98     0.004     0.12
"packageVersion"                    1.576     46.27     0.000     0.00
"suppressWarnings"                  1.574     46.21     0.000     0.00
"withCallingHandlers"               1.572     46.15     0.002     0.06
"packageDescription"                1.568     46.04     0.002     0.06
"file.exists"                       1.480     43.45     1.480    43.45
".Call"                             1.406     41.28     1.406    41.28

after:

$by.total
                             total.time total.pct self.time self.pct
"<Anonymous>"                     1.718     99.08     0.008     0.46
"request_perform"                 1.516     87.43     0.004     0.23
"request_fetch"                   1.340     77.28     0.002     0.12
"request_fetch.write_memory"      1.338     77.16     0.000     0.00
".Call"                           1.336     77.05     1.336    77.05

This can be a lot less pronounced; using Python' simple http server and running on an SSD-based system I see

import SimpleHTTPServer
import SocketServer
Handler = SimpleHTTPServer.SimpleHTTPRequestHandler
httpd = SocketServer.TCPServer(("", 8000), Handler)
httpd.serve_forever()

Rprof(interval=0.002)
for (i in 1:1000) {
  r <- httr::HEAD("http://localhost:8000")
}
Rprof(NULL)
summaryRprof()

before

$by.total
                               total.time total.pct self.time self.pct
"<Anonymous>"                       1.030     99.23     0.032     3.08
"request_perform"                   0.734     70.71     0.002     0.19
"structure"                         0.470     45.28     0.046     4.43
"request"                           0.408     39.31     0.010     0.96
"request_prepare"                   0.360     34.68     0.006     0.58
"request_combine"                   0.352     33.91     0.002     0.19
"vapply"                            0.300     28.90     0.020     1.93
"compact"                           0.290     27.94     0.002     0.19
"request_default"                   0.288     27.75     0.000     0.00
"keep_last"                         0.284     27.36     0.012     1.16
"default_ua"                        0.240     23.12     0.006     0.58
"stopifnot"                         0.230     22.16     0.022     2.12

after

$by.total
                             total.time total.pct self.time self.pct
"<Anonymous>"                     0.794     98.51     0.038     4.71
"request_perform"                 0.478     59.31     0.008     0.99
"stopifnot"                       0.270     33.50     0.020     2.48
"handle_url"                      0.236     29.28     0.000     0.00

(sorry, not really sure why the call traces look so different).

The default_ua function is called on every request and involves calls to the slow base function packageDescription. Depending on the use, this can account for up to 40% of the client-side CPU time which is a lot when using over a loopback connection (e.g. etcd, RNeo4j). Because this information is unlikely to change during a session (and is infomative for the server and likely ignored anyway) computing it once and reusing it seems safe).

richfitz · 2016-01-14T10:07:18Z

The CI failure is a hang during package installation -- I have tested the package locally and it should pass fine.

hadley · 2016-05-20T19:12:05Z

R/config.r

-  paste0(names(versions), "/", versions, collapse = " ")
-}
+# This is defined in .onLoad()
+default_ua <- NULL


Could you please keep all the code here? I'd prefer you did something like

cache <- new.env(parent = emptyenv()) cache$default_ua <- NULL

and then returned early in default_ua() if that value was set

This changes `default_ua` from a package variable to a function defined within a cache. The default UA will be defined on first use, rather than on package load.

hadley · 2016-05-24T13:52:53Z

In the future, can you please merge/rebase, and then ping me in the comments with "PTAL" when you're done? Thanks!

hadley reviewed May 20, 2016
View reviewed changes

Group default_ua changes together, as requested.

c0fdc3b

This changes `default_ua` from a package variable to a function defined within a cache. The default UA will be defined on first use, rather than on package load.

hadley closed this in 3e3e3ae May 24, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cache default_ua on package load #322

Cache default_ua on package load #322

richfitz commented Jan 14, 2016

richfitz commented Jan 14, 2016

hadley May 20, 2016

hadley commented May 24, 2016

Cache default_ua on package load #322

Cache default_ua on package load #322

Conversation

richfitz commented Jan 14, 2016

richfitz commented Jan 14, 2016

hadley May 20, 2016

Choose a reason for hiding this comment

hadley commented May 24, 2016