Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improvements to RETRY (#404) #459

Merged
merged 12 commits into from
Jul 27, 2017
Merged

Improvements to RETRY (#404) #459

merged 12 commits into from
Jul 27, 2017

Conversation

asieira
Copy link
Contributor

@asieira asieira commented Jul 25, 2017

  • Add new terminate_on parameter that allows status_codes that
    prevent retries to be specified;
  • Catch error conditions using tryCatch and retry if they occur.

* Add new `terminate_on` parameter that allows status_codes that
prevent retries to be specified;
* Catch error conditions using `tryCatch` and retry if they occur.
Since the example raises an error condition, need to place in a
`contest` block.
R/retry.R Outdated
@@ -16,6 +20,10 @@
#' \code{pause_cap} seconds.
#' @param quiet If \code{FALSE}, will print a message displaying how long
#' until the next request.
#' @param terminate_on Optional vector of numeric or integer HTTP status codes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Integer is a sub-type of numeric, so you don't need to mention both here (or test for both below)

R/retry.R Outdated
@@ -16,6 +20,10 @@
#' \code{pause_cap} seconds.
#' @param quiet If \code{FALSE}, will print a message displaying how long
#' until the next request.
#' @param terminate_on Optional vector of numeric or integer HTTP status codes
#' that if found on the response will immediately stop the retries. If missing
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... that will terminate the retry process.

It's redundant to mention the missing behaviour since there's a default argument.

Copy link
Contributor Author

@asieira asieira Jul 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the default value of NULL is not clear on what status codes will be terminated on the default scenario. I thought about setting the default value of terminate_on to 400:599, but the problem is that this is not exactly the same as running http_error (which will check for >= 400), and thus not strictly backwards compatible.

I guess that even if this is not strictly necessary, I would still rather err on the side of being clearer and more verbose on what the behavior is for the benefit of package users that won't read the source code.

Having said that, httr is your package. If you still think this must go, let me know and I'll remove it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand your comment. I just want you to replace "If missing or null", with "If null".

Copy link
Contributor Author

@asieira asieira Jul 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am so sorry about the confusion, I thought you wanted to remove the entire sentence that describes what the behavior is when the parameter was NULL or missing. My mistake. 🙇

Will replace "if missing or null" for "if null" as requested right now.


hu <- handle_url(handle, url, ...)
req <- request_build(verb, hu$url, body_config(body, match.arg(encode)), config, ...)
resp <- request_perform(req, hu$handle$handle)
resp <- tryCatch(request_perform(req, hu$handle$handle), error = function(e) e)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love having a variable called resp that sometimes contains a response and sometimes contains an error object. However, I can't see an obviously better API.

R/retry.R Outdated
}

resp
if ("error" %in% class(resp)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should use inherits() (and below)

R/retry.R Outdated
resp
if ("error" %in% class(resp)) {
stop(resp)
} else {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can drop the else block, and simply return resp (but without the explicit return)

R/retry.R Outdated

retry_should_stop <- function(i, times, resp, terminate_on) {
if (i >= times) {
return(TRUE)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove the explicit returns

R/retry.R Outdated
length <- ceiling(stats::runif(1, max = min(pause_cap, pause_base * (2 ^ i))))
if (!quiet) {
status <- if ("error" %in% class(resp)) as.character(resp) else status_code(resp)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think coercing the error to a string here is the right approach because it's going to be completely different to a short http status code. Instead it would be be better to just insert "ERROR" or similar.

Copy link
Contributor Author

@asieira asieira Jul 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really think it's useful to output what the encountered error was, otherwise the information on what errors happened during retries will be completely lost. IMHO outputting this information will be invaluable to httr users to debug transient errors they are encountering. i.e., being able to tell a DNS resolution failure from a refused connection:

> RETRY("GET", "http://invalidhostname/")
Request failed [Error in curl::curl_fetch_memory(url, handle = handle): Couldn't resolve host name
]. Retrying in 1 seconds...

Any alternative ways to output this, perhaps?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it would be better to have a completely different print method in that case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, updated the code to print the error condition on a separate line and include ERROR instead of the status code:

> RETRY("GET", "http://invalidhostname/")
Error in curl::curl_fetch_memory(url, handle = handle): Could not resolve host: invalidhostname
Request failed [ERROR]. Retrying in 2 seconds...

R/retry.R Outdated
}
}

retry_should_stop <- function(i, times, resp, terminate_on) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think should be retry_should_terminate to be consistent with the argument name

Copy link
Member

@hadley hadley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small change. Otherwise looks good so can you please add a bullet to NEWS? Please also make sure that a commit message includes Fixes #404

R/retry.R Outdated
length <- ceiling(stats::runif(1, max = min(pause_cap, pause_base * (2 ^ i))))
if (!quiet) {
message("Request failed [", status, "]. Retrying in ", length, " seconds...")
if (inherits(resp, "error")) {
error_description <- gsub("[\n\r]+$", "", as.character(resp))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think resp$message is probably a better way of getting the error message.

Copy link
Contributor Author

@asieira asieira Jul 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That does work and makes the code a bit cleaner. But doing as.character returns more information, in particular where the error occurred:

> library(httr)
> e <- tryCatch(GET("http://invalidhostname"), error = function(e) e)
> e$message
[1] "Could not resolve host: invalidhostname"
> as.character(e)
[1] "Error in curl::curl_fetch_memory(url, handle = handle): Could not resolve host: invalidhostname\n"

I vote for using as.character for the more complete information, even if that means the code is a little uglier. What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot that as.character.error() existed, so I now think using as.character() is fine.

R/retry.R Outdated
#' parameter, which by default means a response for which \code{\link{http_error}()}
#' is \code{FALSE}. Will also retry on error conditions raised by the underlying curl code,
#' but if the last retry still raises one, \code{RETRY} will raise it again with
#' \code{\link{stop}()} to maintain backwards compatibility.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please remove "to maintain backwards compatibility"

@hadley hadley merged commit e17169b into r-lib:master Jul 27, 2017
@hadley
Copy link
Member

hadley commented Jul 27, 2017

Thanks for all your work on this!

@asieira
Copy link
Contributor Author

asieira commented Jul 27, 2017

My pleasure. Thank you for all the wonderful packages! 🙇

@mvkorpel
Copy link

Using the terminate_on parameter seems a bit unwieldy. Let's say I would like to stop retrying on code 403 (Forbidden). In my opinion, the intuitive way to do this is to specify terminate_on = 403. However, this does not have the desired effect, as even code 200 (OK) will then cause retries. To do the right thing, one must use terminate_on = c(200, 403), or possibly something like terminate_on = c(100:399, 403) to account for all imaginable non-error codes.

The parameter works exactly as documented, but I think it would be great to have a user-friendly way to (by default?) include non-error codes (< 400) on top of the set given in terminate_on. Alternatively, a puzzled user might appreciate a note in the ?RETRY document about the non-error codes.

Anyway, terminate_on is already useful, but thanks in advance for considering if a small change to the logic or a note in the manual would make sense!

@jennybc
Copy link
Member

jennybc commented May 16, 2018

@mvkorpel A comment on a merged PR is very low visibility, i.e. no one will necessarily see it when they come back to work on this package. I suggest you create a new issue with these observations about how to make RETRY more useful.

@mvkorpel
Copy link

@jennybc Thanks for the comment. I will do that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants