Add paginators #30

Closed
davidkretch opened this issue Nov 17, 2018 · 6 comments · Fixed by #650
Labels
enhancement 💡 New feature or request

@davidkretch
Member

No description provided.

@davidkretch davidkretch added the enhancement 💡 New feature or request label Aug 7, 2019
@edgBR

edgBR commented Nov 19, 2020

Hi colleagues,

It seems I am facing the paginator challenge myself :)

I was trying to get all of my historical training jobs:

sm_client <- paws::sagemaker(config = list(region = "myregion"))
total_training_jobs <- list()
j <- 1
sequence_var <- seq.POSIXt(from = as.POSIXct("2020-04-01 00:00:00"), to = as.POSIXct("2020-11-20 00:00:00"), by = "hour")
for (i in sequence_var) {
  total_training_jobs[[j]] <- sm_client$list_training_jobs(MaxResults = 100, CreationTimeAfter = i)
  j <- j + 1
}

And I got a nice 400 ThrottlingException.

Has anyone tried a workaround?

BR
/E

@davidkretch
Member Author

Hey, sorry about that. I'll look into this over the weekend. To my knowledge, the usual approach is to add a delay between requests.

@davidkretch
Member Author

davidkretch commented Nov 22, 2020

I put together this attempt at a paginator. You supply it with your AWS API call as the argument to parameter f and it will take care of fetching each page of results and returning them as a list. Below this function is an example call. Let me know if this helps or not.

# Get all pages of a given API call, retrying with exponential backoff.
paginate <- function(f, max_retries = 5) {
  # Capture the unevaluated call and the caller's environment so each
  # following page can be requested again with a new NextToken.
  call <- substitute(f)
  env <- parent.frame()
  resp <- f
  result <- list(resp)
  while ("NextToken" %in% names(resp) && length(resp$NextToken) > 0 && resp$NextToken != "") {
    call$NextToken <- resp$NextToken
    # Retry with exponential backoff.
    # See https://docs.aws.amazon.com/general/latest/gr/api-retries.html.
    # See also https://github.com/paws-r/paws/blob/main/examples/error_handling.R.
    retries <- 0
    repeat {
      resp <- tryCatch(eval(call, env), error = function(e) e)
      if (!inherits(resp, "error")) break
      # Give up and re-raise the error once the retries are used up.
      if (retries >= max_retries) stop(resp)
      Sys.sleep(2^retries / 10)
      retries <- retries + 1
    }
    result <- c(result, list(resp))
  }
  return(result)
}

For an example, see below (using CloudWatch instead of SageMaker in my case). In your case, you'll need to modify the call to use a fixed creation time, e.g. sm_client$list_training_jobs(MaxResults=100, CreationTimeAfter = as.POSIXct("2020-04-01 00:00:00")). With a fixed creation time, the API will split the results into pages and the paginator will fetch each one (hopefully) up to the present.

results <- paginate(
  cw$get_metric_data(
    MetricDataQueries = metric_data_queries,
    StartTime = as.POSIXct("2020-01-01"),
    EndTime = as.POSIXct("2020-11-22")
  )
)
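
Each element of the returned list is one page of the response, so you will probably want to combine the pages afterwards. Here is a minimal sketch of that step. It uses a made-up list of pages in place of a real response, so the job names and tokens are invented; the TrainingJobSummaries field is where list_training_jobs puts its results.

```r
# Stand-in for the list returned by paginate(): two pages of results.
# All values here are invented for illustration.
pages <- list(
  list(
    TrainingJobSummaries = list(
      list(TrainingJobName = "job-a", TrainingJobStatus = "Completed"),
      list(TrainingJobName = "job-b", TrainingJobStatus = "Failed")
    ),
    NextToken = "token-1"
  ),
  list(
    TrainingJobSummaries = list(
      list(TrainingJobName = "job-c", TrainingJobStatus = "Completed")
    ),
    NextToken = character(0)
  )
)

# Pull the summaries out of every page and combine them into one flat list.
all_jobs <- do.call(c, lapply(pages, function(page) page$TrainingJobSummaries))

# Extract a single field from each job.
job_names <- vapply(all_jobs, function(job) job$TrainingJobName, character(1))
```

For other operations, swap TrainingJobSummaries for whichever field holds the results of that call.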

@edgBR

edgBR commented Nov 23, 2020

Of course,

How bad of me to have overlooked the next token workaround.

The solution is working perfectly @davidkretch, thanks for that!

BR

@DyfanJones
Member

@davidkretch @adambanker

For paginators, I am toying with the idea of an apply method:

First, we have the standard paginator, which loops over every token.

library(paws.common)

s3 <- paws::s3()

out <- paginate(
  s3$list_objects_v2(
    Bucket = "my_bucket"
  )
)

Second, we have the apply "family" of paginators, which let users apply a function to each response from the operation.

Basic example:

out <- paginate_lapply(
  s3$list_objects_v2(
    Bucket = "my_bucket"
  ),
  \(resp) {
    resp$Contents
  }
)
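
To make the idea concrete, here is a rough sketch of what an apply-style paginator could do internally. It is not the paws.common implementation: fake_list_objects and paginate_lapply_sketch are hypothetical names, the paged operation is mocked so the example runs without AWS, and for simplicity the sketch takes a function rather than an unevaluated call.

```r
# Hypothetical stand-in for a paged API operation. It returns three pages and
# an empty NextToken on the last one. This is not a real paws function.
fake_list_objects <- function(NextToken = NULL) {
  pages <- list(
    list(Contents = list("a.csv", "b.csv"), NextToken = "2"),
    list(Contents = list("c.csv"), NextToken = "3"),
    list(Contents = list("d.csv"), NextToken = character(0))
  )
  index <- if (is.null(NextToken)) 1L else as.integer(NextToken)
  pages[[index]]
}

# Hypothetical helper: call `operation` page by page and apply `FUN` to each
# response, collecting the results in a list (one element per page).
paginate_lapply_sketch <- function(operation, FUN, ...) {
  FUN <- match.fun(FUN)
  out <- list()
  token <- NULL
  repeat {
    resp <- operation(NextToken = token)
    out <- c(out, list(FUN(resp, ...)))
    token <- resp$NextToken
    # Stop when the service reports no further pages.
    if (length(token) == 0 || !nzchar(token)) break
  }
  out
}

contents <- paginate_lapply_sketch(fake_list_objects, function(resp) resp$Contents)
```

The result has one element per page, so only the part of each response the user cares about is kept in memory.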

What are your thoughts on this? I would like your feedback before I go too far down the rabbit hole 😆

@DyfanJones
Member

paws v0.4.0 has now been released to CRAN. I will close this ticket for now.
