Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Weird query behavior: Successful query with HTTP 403 AccessDenied error #96

Closed
mraess opened this issue Jun 12, 2020 · 10 comments
Closed
Assignees
Labels
documentation Improvements or additions to documentation

Comments

@mraess
Copy link

mraess commented Jun 12, 2020

Issue Description

I don't know if you remember, but not to long ago we discovered an issue in paws.common together and in an issue that had to do with pulling in profiles into the profile parameter into dbConnect(). Everything works fine even with AWS_PROFILE as env variables. I've been working with the updated packages and a new error prevents me from running a script on the server.

When I run a query, the data is pulled and written into an r object (see test_results below). However, I also get this error message:

Info: (Data scanned: 625.16 GB)
additional arguments ignored in warning()
Warning: AccessDenied (HTTP 403). Access Denied

Which is weird since the data are actually pulled and the connection to the database is successfully established. Any idea why this message is popping up?

As always, any insights would be great!

Reproducible Example

Example:

library(RJDBC)
library(glue)
library(tidyverse)
library(magrittr)

library(rlang)
library(noctua)
library(DBI)
library(paws.common)
library(paws)

con_3 <- dbConnect(noctua::athena(),
                 profile_name = "profile",
                 s3_staging_dir = "s3://aws-athena-query-results-XXXXXXXXXX-us-east-1/", 
                 region = "us-east-1")

test_query <- glue_sql("select event_date, count(x) as x_count from logs.x group by 1 limit 10;", .con = con_3)

# make sure all results are returned from athena

setMethod("dbGetQuery", signature(conn="JDBCConnection", statement="character"),  def=function(conn, statement, ...) {
r <- dbSendQuery(con, test_query)
on.exit(.jcall(r@stat, "V", "close"))
if (conn@jc %instanceof% "com.amazonaws.athena.jdbc.AthenaConnection") fetch(r, -1, 999) # Athena can only pull 999 rows at a time
else fetch(r, -1)
})

test_results <- dbGetQuery(conn = con_3, statement = test_query)
Session Info
> devtools::session_info()
─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value                       
 version  R version 3.5.3 (2019-03-11)
 os       macOS  10.15.3              
 system   x86_64, darwin15.6.0        
 ui       RStudio                     
 language (EN)                        
 collate  en_US.UTF-8                 
 ctype    en_US.UTF-8                 
 tz       America/New_York            
 date     2020-06-12Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
 package        * version  date       lib source        
 assertthat       0.2.1    2019-03-21 [2] CRAN (R 3.5.2)
 backports        1.1.7    2020-05-13 [1] CRAN (R 3.5.3)
 bit              1.1-15.1 2020-01-14 [2] CRAN (R 3.5.2)
 bit64            0.9-7    2017-05-08 [2] CRAN (R 3.5.0)
 blob             1.2.1    2020-01-20 [2] CRAN (R 3.5.2)
 broom            0.5.6    2020-04-20 [1] CRAN (R 3.5.3)
 callr            3.4.3    2020-03-28 [1] CRAN (R 3.5.3)
 cellranger       1.1.0    2016-07-27 [2] CRAN (R 3.5.0)
 cli              2.0.2    2020-02-28 [1] CRAN (R 3.5.2)
 colorspace       1.4-1    2019-03-18 [1] CRAN (R 3.5.2)
 crayon           1.3.4    2017-09-16 [2] CRAN (R 3.5.0)
 curl             4.3      2019-12-02 [1] CRAN (R 3.5.2)
 data.table       1.12.8   2019-12-09 [1] CRAN (R 3.5.2)
 DBI            * 1.1.0    2019-12-15 [2] CRAN (R 3.5.2)
 dbplyr           1.4.4    2020-05-27 [1] CRAN (R 3.5.3)
 desc             1.2.0    2018-05-01 [2] CRAN (R 3.5.0)
 devtools         2.3.0    2020-04-10 [1] CRAN (R 3.5.3)
 digest           0.6.25   2020-02-23 [1] CRAN (R 3.5.2)
 dplyr          * 0.8.5    2020-03-07 [1] CRAN (R 3.5.2)
 ellipsis         0.3.1    2020-05-15 [1] CRAN (R 3.5.3)
 fansi            0.4.1    2020-01-08 [2] CRAN (R 3.5.2)
 forcats        * 0.5.0    2020-03-01 [1] CRAN (R 3.5.2)
 fs               1.4.1    2020-04-04 [1] CRAN (R 3.5.3)
 generics         0.0.2    2018-11-29 [1] CRAN (R 3.5.0)
 ggplot2        * 3.3.0    2020-03-05 [1] CRAN (R 3.5.2)
 glue           * 1.4.1    2020-05-13 [1] CRAN (R 3.5.3)
 gt             * 0.2.1    2020-05-23 [1] CRAN (R 3.5.3)
 gtable           0.3.0    2019-03-25 [2] CRAN (R 3.5.3)
 haven            2.3.0    2020-05-24 [1] CRAN (R 3.5.3)
 hms              0.5.3    2020-01-08 [1] CRAN (R 3.5.2)
 httr             1.4.1    2019-08-05 [1] CRAN (R 3.5.2)
 ini              0.3.1    2018-05-20 [2] CRAN (R 3.5.0)
 jsonlite         1.6.1    2020-02-02 [1] CRAN (R 3.5.2)
 knitr            1.28     2020-02-06 [1] CRAN (R 3.5.2)
 lattice          0.20-38  2018-11-04 [2] CRAN (R 3.5.3)
 lifecycle        0.2.0    2020-03-06 [1] CRAN (R 3.5.2)
 lubridate        1.7.8    2020-04-06 [1] CRAN (R 3.5.3)
 magrittr       * 1.5      2014-11-22 [2] CRAN (R 3.5.0)
 memoise          1.1.0    2017-04-21 [2] CRAN (R 3.5.0)
 modelr           0.1.8    2020-05-19 [1] CRAN (R 3.5.3)
 munsell          0.5.0    2018-06-12 [2] CRAN (R 3.5.0)
 nlme             3.1-143  2019-12-10 [2] CRAN (R 3.5.2)
 noctua         * 1.7.0    2020-05-14 [1] CRAN (R 3.5.3)
 packrat          0.5.0    2018-11-14 [2] CRAN (R 3.5.0)
 paws           * 0.1.8    2020-04-28 [1] CRAN (R 3.5.3)
 paws.analytics   0.1.8    2020-04-26 [1] CRAN (R 3.5.3)
 paws.common    * 0.3.1    2020-04-26 [1] CRAN (R 3.5.3)
 paws.storage     0.1.8    2020-04-27 [1] CRAN (R 3.5.3)
 pillar           1.4.3    2019-12-20 [1] CRAN (R 3.5.2)
 pkgbuild         1.0.6    2019-10-09 [2] CRAN (R 3.5.2)
 pkgconfig        2.0.3    2019-09-22 [2] CRAN (R 3.5.2)
 pkgload          1.0.2    2018-10-29 [2] CRAN (R 3.5.0)
 prettyunits      1.1.1    2020-01-24 [2] CRAN (R 3.5.2)
 processx         3.4.2    2020-02-09 [1] CRAN (R 3.5.2)
 ps               1.3.2    2020-02-13 [1] CRAN (R 3.5.2)
 purrr          * 0.3.4    2020-04-17 [1] CRAN (R 3.5.3)
 R6               2.4.1    2019-11-12 [1] CRAN (R 3.5.2)
 Rcpp             1.0.4.6  2020-04-09 [1] CRAN (R 3.5.3)
 readr          * 1.3.1    2018-12-21 [1] CRAN (R 3.5.0)
 readxl           1.3.1    2019-03-13 [1] CRAN (R 3.5.2)
 remotes          2.1.1    2020-02-15 [1] CRAN (R 3.5.2)
 reprex           0.3.0    2019-05-16 [2] CRAN (R 3.5.2)
 rJava          * 0.9-11   2019-03-29 [2] CRAN (R 3.5.2)
 RJDBC          * 0.2-7.1  2018-04-16 [2] CRAN (R 3.5.0)
 rlang          * 0.4.6    2020-05-02 [1] CRAN (R 3.5.3)
 RPostgreSQL      0.6-2    2017-06-24 [2] CRAN (R 3.5.0)
 rprojroot        1.3-2    2018-01-03 [2] CRAN (R 3.5.0)
 rstudioapi       0.11     2020-02-07 [1] CRAN (R 3.5.2)
 rvest            0.3.5    2019-11-08 [1] CRAN (R 3.5.2)
 scales           1.1.0    2019-11-18 [2] CRAN (R 3.5.2)
 sessioninfo      1.1.1    2018-11-05 [2] CRAN (R 3.5.0)
 stringi          1.4.6    2020-02-17 [1] CRAN (R 3.5.2)
 stringr        * 1.4.0    2019-02-10 [1] CRAN (R 3.5.2)
 testthat         2.3.2    2020-03-02 [1] CRAN (R 3.5.2)
 tibble         * 3.0.1    2020-04-20 [1] CRAN (R 3.5.3)
 tidyr          * 1.0.2    2020-01-24 [1] CRAN (R 3.5.2)
 tidyselect       1.0.0    2020-01-27 [2] CRAN (R 3.5.3)
 tidyverse      * 1.3.0    2019-11-21 [1] CRAN (R 3.5.2)
 usethis          1.6.1    2020-04-29 [2] CRAN (R 3.5.3)
 vctrs            0.3.0    2020-05-11 [1] CRAN (R 3.5.3)
 withr            2.2.0    2020-04-20 [1] CRAN (R 3.5.3)
 xfun             0.12     2020-01-13 [1] CRAN (R 3.5.2)
 xml2             1.3.2    2020-04-23 [1] CRAN (R 3.5.3)
@DyfanJones
Copy link
Owner

Hi @mraess yes I know what is happening :) It is nothing to be worries about.

noctua tries to keep the s3_staging_dir "tidy". The way it does this, is it tries to delete the AWS Athena Query output. However if the user doesn't have permission to do this, it returns the error as a warning so that it doesn't prevent the user using AWS Athena.

If you don't like the warning message all the time. Try query caching :D

This won't "tidy" the s3_staging_dir. Plus you have the extra benefit of repeat querying, which is quicker and cheaper as it doesn't need to call AWS Athena again for the same query :D https://dyfanjones.github.io/noctua/articles/aws_athena_query_caching.html

library(DBI)
library(noctua)

con = dbConnect(athena())

noctua_options(cache_size = 10)

dbGetQuery(con, "select * from iris")

Sorry for any confusion it has caused. Side note noctua now supports returning query in chunks (just released I need to update the documentations :S )

library(DBI)
library(noctua)

con = dbConnect(athena())

noctua_options(cache_size = 10)

res <- dbExecuteQuery(con, "select * from iris")

athena_iris <- dbFetch(res, 3000)

Just a quick heads up, the chunk method isn't the fastest if you want to get the entire data frame. I recommend the standard dbGetQuery as that will pull the data straight from AWS S3 :D

@DyfanJones
Copy link
Owner

Please let me know if this resolves you issue.

P.s. thanks of such detailed ticket :D it was a great help understanding what was going on :D

@DyfanJones
Copy link
Owner

Just thinking out loud, this behaviour should really be documented

@DyfanJones DyfanJones added the documentation Improvements or additions to documentation label Jun 12, 2020
@DyfanJones DyfanJones self-assigned this Jun 12, 2020
@mraess
Copy link
Author

mraess commented Jun 15, 2020

Hey @DyfanJones , Thanks for the swift response as always! I'll definitely give this a shot today to see if that resolves the issue and then I'll follow up with you.

@mraess
Copy link
Author

mraess commented Jun 15, 2020

@DyfanJones that did the trick! It's working now locally and on the server. We can close this out. Thanks for putting it into the documentation!

@DyfanJones
Copy link
Owner

@mraess That is perfect. I will leave this open for the time being so I can get some time to add the documentation

DyfanJones pushed a commit that referenced this issue Jun 16, 2020
DyfanJones pushed a commit that referenced this issue Jun 16, 2020
dbClearResult behaviour documentation #96
@DyfanJones
Copy link
Owner

closing ticket as PR #97 has been merged

@ramnathv
Copy link

Thanks for this package @DyfanJones. I ran into this error and turned caching on as per documentation. But now I run into a different error when querying Athena. Any thoughts? I can open a different issue if it is helpful.

Error in .set_ops_arg_check(x, y, all, .seqn = TRUE) : Item 2 of x is 'sql' but the corresponding item of y is 'character'.

@DyfanJones
Copy link
Owner

hi @ramnathv are you able to provide the code that produced the error so i can try to replicate it :D

@DyfanJones
Copy link
Owner

hi @ramnathv I have managed to replicate the error you are coming across I will raise another issue so I can keep track of it, thanks for point this issue out 😄 .