Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: Test for columns in dbQuoteIdentifier() #372

Merged
merged 15 commits into from
Apr 1, 2024

Conversation

krlmlr
Copy link
Member

@krlmlr krlmlr commented Dec 5, 2021

Closes #263.

@krlmlr krlmlr added this to the 1.4.1 milestone Dec 5, 2021
@krlmlr
Copy link
Member Author

krlmlr commented Dec 5, 2021

@dpprdan: Checks are failing now, could you please take a look?

@krlmlr krlmlr modified the milestones: 1.4.1, 1.4.2, 1.4.3 Dec 5, 2021
@dpprdan
Copy link
Contributor

dpprdan commented Dec 24, 2021

👍 I'll look into it over the holidays.

@dpprdan
Copy link
Contributor

dpprdan commented Dec 28, 2021

The checks fail here

With the current CRAN release and an empty database, this looks like this:

library(RPostgres)
con <- postgresDefault()
objects <- dbListObjects(con)

(sql <- lapply(objects$table, dbQuoteIdentifier, conn = con))
#> [[1]]
#> <SQL> "information_schema".
#> 
#> [[2]]
#> <SQL> "pg_catalog".
(unquoted <- vapply(sql, dbUnquoteIdentifier, conn = con, list(1)))
#> [[1]]
#> <Id> schema = information_schema
#> 
#> [[2]]
#> <Id> schema = pg_catalog
testthat::expect_equal(unquoted, unclass(objects$table))

Note however the period (“.”) behind the quoted schemata.

sql[[1]] 
#> <SQL> "information_schema".

This isn’t a valid schema identifier in Postgres.

(qid <- dbQuoteIdentifier(con, Id(schema = "myschema")))
#> <SQL> "myschema".
create_schema_sql <- paste0("CREATE SCHEMA IF NOT EXISTS ", qid)
dbExecute(con, create_schema_sql)
#> Error: Failed to prepare query: ERROR:  syntax error at or near "."
#> LINE 1: CREATE SCHEMA IF NOT EXISTS "myschema".
#>                                               ^

So it seems to me that the test is broken. I don’t think that it can be fixed for schemata either, because the quote-unquote roundtrip cannot be made to work for just schemata. The heuristics employed in dbUnquoteIdentifier() to identify the database objects from a SQL() string can only work for one object class and we chose that to be tables.
The test can probably be fixed for tables if we write a table to the db and then test whether the quote-unquote roundtrip works for that (i.e. only for the objects[!objects$is_prefix] subset). But maybe this test exists already elsewhere?

dbDisconnect(con)
Session info
sessioninfo::session_info()
#> - Session info ---------------------------------------------------------------
#>  setting  value
#>  version  R version 4.1.2 (2021-11-01)
#>  os       Windows 10 x64 (build 19043)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language en
#>  collate  German_Germany.1252
#>  ctype    German_Germany.1252
#>  tz       Europe/Berlin
#>  date     2021-12-28
#>  pandoc   2.16.2 @ C:/PROGRA~1/Pandoc/ (via rmarkdown)
#> 
#> - Packages -------------------------------------------------------------------
#>  package     * version date (UTC) lib source
#>  backports     1.4.1   2021-12-13 [1] CRAN (R 4.1.2)
#>  bit           4.0.4   2020-08-04 [1] CRAN (R 4.1.0)
#>  bit64         4.0.5   2020-08-30 [1] CRAN (R 4.1.0)
#>  blob          1.2.2   2021-07-23 [1] CRAN (R 4.1.0)
#>  cli           3.1.0   2021-10-27 [1] CRAN (R 4.1.1)
#>  crayon        1.4.2   2021-10-29 [1] CRAN (R 4.1.1)
#>  DBI           1.1.2   2021-12-20 [1] CRAN (R 4.1.2)
#>  desc          1.4.0   2021-09-28 [1] CRAN (R 4.1.1)
#>  digest        0.6.29  2021-12-01 [1] CRAN (R 4.1.2)
#>  ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.1.0)
#>  evaluate      0.14    2019-05-28 [1] CRAN (R 4.1.0)
#>  fansi         0.5.0   2021-05-25 [1] CRAN (R 4.1.0)
#>  fastmap       1.1.0   2021-01-25 [1] CRAN (R 4.1.0)
#>  fs            1.5.2   2021-12-08 [1] CRAN (R 4.1.2)
#>  generics      0.1.1   2021-10-25 [1] CRAN (R 4.1.1)
#>  glue          1.6.0   2021-12-17 [1] CRAN (R 4.1.2)
#>  highr         0.9     2021-04-16 [1] CRAN (R 4.1.0)
#>  hms           1.1.1   2021-09-26 [1] CRAN (R 4.1.1)
#>  htmltools     0.5.2   2021-08-25 [1] CRAN (R 4.1.1)
#>  knitr         1.37    2021-12-16 [1] CRAN (R 4.1.2)
#>  lifecycle     1.0.1   2021-09-24 [1] CRAN (R 4.1.1)
#>  lubridate     1.8.0   2021-10-07 [1] CRAN (R 4.1.1)
#>  magrittr      2.0.1   2020-11-17 [1] CRAN (R 4.1.0)
#>  pillar        1.6.4   2021-10-18 [1] CRAN (R 4.1.1)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.1.0)
#>  pkgload       1.2.4   2021-11-30 [1] CRAN (R 4.1.2)
#>  purrr         0.3.4   2020-04-17 [1] CRAN (R 4.1.0)
#>  R.cache       0.15.0  2021-04-30 [1] CRAN (R 4.1.0)
#>  R.methodsS3   1.8.1   2020-08-26 [1] CRAN (R 4.1.0)
#>  R.oo          1.24.0  2020-08-26 [1] CRAN (R 4.1.0)
#>  R.utils       2.11.0  2021-09-26 [1] CRAN (R 4.1.1)
#>  R6            2.5.1   2021-08-19 [1] CRAN (R 4.1.1)
#>  Rcpp          1.0.7   2021-07-07 [1] CRAN (R 4.1.0)
#>  reprex        2.0.1   2021-08-05 [1] CRAN (R 4.1.0)
#>  rlang         0.4.12  2021-10-18 [1] CRAN (R 4.1.1)
#>  rmarkdown     2.11    2021-09-14 [1] CRAN (R 4.1.1)
#>  RPostgres   * 1.4.3   2021-12-20 [1] CRAN (R 4.1.2)
#>  rprojroot     2.0.2   2020-11-15 [1] CRAN (R 4.1.0)
#>  rstudioapi    0.13    2020-11-12 [1] CRAN (R 4.1.0)
#>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.1.2)
#>  stringi       1.7.6   2021-11-29 [1] CRAN (R 4.1.2)
#>  stringr       1.4.0   2019-02-10 [1] CRAN (R 4.1.0)
#>  styler        1.6.2   2021-09-23 [1] CRAN (R 4.1.1)
#>  testthat      3.1.1   2021-12-03 [1] CRAN (R 4.1.2)
#>  tibble        3.1.6   2021-11-07 [1] CRAN (R 4.1.2)
#>  utf8          1.2.2   2021-07-24 [1] CRAN (R 4.1.0)
#>  vctrs         0.3.8   2021-04-29 [1] CRAN (R 4.1.0)
#>  withr         2.4.3   2021-11-30 [1] CRAN (R 4.1.2)
#>  xfun          0.29    2021-12-14 [1] CRAN (R 4.1.2)
#>  yaml          2.2.1   2020-02-01 [1] CRAN (R 4.1.0)
#> 
#>  [1] C:/Users/Daniel.AK-HAMBURG/Documents/R/win-library/4.1
#>  [2] C:/Program Files/R/R-4.1.2/library
#> 
#> ------------------------------------------------------------------------------

@krlmlr
Copy link
Member Author

krlmlr commented Dec 28, 2021

Thanks. I need to take a look at the workflows.

@krlmlr
Copy link
Member Author

krlmlr commented Dec 28, 2021

Can you please bring in the latest changes from main to fix the workflows?

@dpprdan
Copy link
Contributor

dpprdan commented Dec 28, 2021

The test can probably be fixed for tables if we write a table to the db and then test whether the quote-unquote roundtrip works for that (i.e. only for the objects[!objects$is_prefix] subset). But maybe this test exists already elsewhere?

😬 Ha, I guess I should've just read the rest of the file. 🤦🏻‍♂️

BTW From reading the code, I'd say that RMariaDB has the same problem(s) (both the "cannot quote columns" and the "terminal dot" issue).

Current status: trying to get RMariaDB::mariadbDefault() to run with a MariaDB Docker container on Windows.

@dpprdan
Copy link
Contributor

dpprdan commented Dec 30, 2021

Sure enough I found a case where we need to unquote a schema only, namely in dbListObjects() with a prefix.
So this PR unfortunately breaks dbListObjects() with a SQL()-type prefix at the moment.

library(RPostgres)
con <- postgresDefault()
schema_qid <- dbQuoteIdentifier(con, Id(schema = 'Robert'))

The following doesn’t work with v1.4.3, due to the terminal dot

dbExecute(con, paste0('CREATE SCHEMA IF NOT EXISTS ', schema_qid))
#> [1] 0

We need to add a table for dbQuoteIdentifier() to find the schema, see #388

table_qid <- dbQuoteIdentifier(con, Id(schema = 'Robert', table = "mtcars"))
dbWriteTable(con, table_qid, mtcars)

The following works in v1.4.3 (but relies on the terminal dot to unquote the prefix as schema, not table)

dbListObjects(con, schema_qid)
#> Error in mapply(FUN = f, ..., SIMPLIFY = FALSE): zero-length inputs cannot be mixed with those of non-zero length

It works with an Id() object (which just passes through dbUnquoteIdentifier()).

dbListObjects(con, Id(schema = 'Robert'))
#>                                  table is_prefix
#> 1 <Id> schema = Robert, table = mtcars     FALSE

# clean-up
dbExecute(con, paste0('DROP TABLE ', table_qid))
#> [1] 0
dbExecute(con, paste0('DROP SCHEMA "Robert"'))
#> [1] 0
dbDisconnect(con)

I don’t know how to fix this ATM.

@krlmlr
Copy link
Member Author

krlmlr commented Jan 3, 2022

Thanks. It looks to me like dbListObjects() should always be called with an Id(), never with a SQL() . The behavior is misspecified. Should we adapt the list_objects_features test?

@dpprdan
Copy link
Contributor

dpprdan commented Jan 4, 2022

It looks to me like dbListObjects() should always be called with an Id(), never with a SQL().

That would be an easy fix, indeed, and way cleaner than all the hacks I came up with until now. I hope we're not missing something.

Can anything else than a schema be passed to prefix?

unquoted <- dbUnquoteIdentifier(conn, prefix)
is_prefix <- vlapply(unquoted, function(x) {
"schema" %in% names(x@name) && !("table" %in% names(x@name))
})
schemas <- vcapply(unquoted[is_prefix], function(x) x@name[["schema"]])

Because this looks like one could also specify (schema-qualified) table(s) (and then only the schema is used), but I don't understand the use case that prompted this design. Update: I misread the code. This is to only select schemas, not (possibly schema-qualified) tables. So I guess the answer is that schemata and tables can be passed to prefix, but only schemas are used.

It also looks like it is supposed to be vectorized (v*apply()), but DBI::dbQuoteIdentifier() isn't AFAIU (x must be SQL or Id), so dbListObjects() isn't, at least not with a prefix.

dpprdan added a commit to dpprdan/DBItest that referenced this pull request Jan 28, 2022
- we cannot get quote-unquote roundtrip to work for schemata, see r-dbi/RPostgres#372
Copy link
Member Author

@krlmlr krlmlr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. It looks like dbQuoteIdentifier() with incomplete Id() (e.g., only "catalog" and "schema") used to leave a trailing dot in the identifier string, but no longer does. As the tests don't fail, I assume this isn't tested anywhere 😱 . Is this even specified?

We could go for a breaking change, but a DBItest specification and test would be great. We could also explore the use of NA to omit (or include?) the trailing dot. If NA means omit trailing dot, we could be fully backward compatible, with a slightly (but not entirely) awkward interface.

@dpprdan
Copy link
Contributor

dpprdan commented Dec 6, 2022

It looks like dbQuoteIdentifier() with incomplete Id() (e.g., only "catalog" and "schema") used to leave a trailing dot in the identifier string, but no longer does.

It still does in RPostgres:

ret <- paste0(ret, dbQuoteIdentifier(conn, x@name[["schema"]]), ".")

And in RMariaDB.

It's not present in DBI (and never was IIUC, at least not since r-dbi/DBI@6c111b7)

library(DBI)
dbQuoteIdentifier(ANSI(), Id(schema = "myschema"))
#> <SQL> "myschema"

This PR includes a potential fix for RPostgres

https://github.com/r-dbi/RPostgres/pull/372/files#diff-b87396211065ce77f8649cc9f6daadb09f90ee9f7d3cfe4cd4b8dfdc377f616fR8-R9

It is all a bit confusing because the "(can't) quote columns" and "trailing dot" issues and PRs are conflated here (as well as over at RMariaDB). Maybe it's best to separate those from each other?

As the tests don't fail, I assume this isn't tested anywhere 😱 .

My guess would be that the implicit assumption has always been that IDs contain/refer to a table name.

We could go for a breaking change

Not sure I follow. Breaking change with regard to the trailing dot? AFAICT we're not breaking something that used to work before in RPostgres or RMariaDB. So this would merely be a bugfix?

but a DBItest specification and test would be great.

I'll have a look.

@dpprdan
Copy link
Contributor

dpprdan commented Dec 6, 2022

In any case, adding support for columns as quoted identifiers is not a breaking change, but rather in accordance with the DBItest spec 👀

Quoted identifiers can be used as table and column names in SQL queries (source)

but a DBItest specification and test would be great.

There aren't any quote-identifier specs and tests at the moment and I am having a hard time coming up with meaningful ones for this problem.

  • We don't want to do (SQL) string comparisons, because quoting is implementation-specific and therefore belong in the particular backend packages.
  • We cannot use the quote-unquote Roundtrip for Id()s that do not reference tables.
  • The trailing dot is a particular bug, but not spec-relevant per se. Even though I am not aware of a DB that uses something other than the dot as a object name separator (cf. MariaDB, Postgres), but in principle it could be implementation-specific, I guess.

So I'd say that the tests belong in the backend packages rather than in DBItest. Am I missing something?

The MariaDB docs mention many more identifiers (and a similar list applies to Postgres as well - MariaDB is missing schema for example):

Databases, tables, indexes, columns, aliases, views, stored routines, triggers, events, variables, partitions, tablespaces, savepoints, labels, users, roles, are collectively known as identifiers

Shouldn't we support/allow those as well?

@krlmlr
Copy link
Member Author

krlmlr commented Mar 16, 2023

Can you please merge the latest changes here? Looks like I'm not permitted to do this in this PR.

Copy link
Contributor

aviator-app bot commented Nov 9, 2023

Current Aviator status

Aviator will automatically update this comment as the status of the PR changes.
Comment /aviator refresh to force Aviator to re-examine your PR (or learn about other /aviator commands).

This PR was merged manually (without Aviator). Merging manually can negatively impact the performance of the queue. Consider using Aviator next time.


See the real-time status of this PR on the Aviator webapp.
Use the Aviator Chrome Extension to see the status of your PR within GitHub.

@dpprdan
Copy link
Contributor

dpprdan commented Dec 21, 2023

Fixes #453 too. You may want to adjust the title and description, @krlmlr (since you're the PR owner).

@krlmlr krlmlr changed the title dbQuoteIdentifier: allow columns feat: Allow columns in dbQuoteIdentifier() Apr 1, 2024
@krlmlr krlmlr changed the title feat: Allow columns in dbQuoteIdentifier() test: Test for columns in dbQuoteIdentifier() Apr 1, 2024
@krlmlr
Copy link
Member Author

krlmlr commented Apr 1, 2024

Thanks. This now has been implemented independently, but the tests are good.

@aviator-app aviator-app bot added the blocked label Apr 1, 2024
Copy link
Contributor

aviator-app bot commented Apr 1, 2024

This pull request failed to merge: PR cannot be automatically rebased, please rebase manually to continue. Once the issues are resolved, remove the blocked label and re-queue the pull request. Note that the pull request will be automatically re-queued if it has the mergequeue label.

Additional debug info: Failed to rebase this PR onto the latest changes from the base branch. You will probably need to rebase this PR manually and resolve conflicts).

@krlmlr krlmlr merged commit 4c99316 into r-dbi:main Apr 1, 2024
16 of 17 checks passed
@dpprdan dpprdan deleted the fix/col_in_dbQuoteIdentifier branch April 3, 2024 16:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

dbQuoteIdentifier() does not allow columns in Id() object.
2 participants