-
Notifications
You must be signed in to change notification settings - Fork 608
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor(sql): deprecate and remove hierarchical usage of schema #8655
Conversation
ACTION NEEDED Ibis follows the Conventional Commits specification for release automation. The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification. |
ibis/util.py
Outdated
def _warn_schema(): | ||
warn_deprecated( | ||
name="schema", | ||
as_of="9.0", | ||
removed_in="10.0", | ||
instead="Use the `database` kwarg with one of the following patterns:" | ||
'\ndatabase="database"' | ||
'\ndatabase=("catalog", "database")' | ||
'\ndatabase="catalog.database"', | ||
# TODO: add option for namespace object | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to move this into the deprecation class helper mixin, but after I squash down everything else because that rebase might be a bit gnarly.
dbs = "SHOW SCHEMAS" | ||
|
||
with self._safe_raw_sql(dbs) as cur: | ||
databases = list(map(itemgetter(0), cur)) | ||
|
||
return self._filter_with_like(databases, like) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Risingwave doesn't support information_schema.schemata
but this works.
# Include temporary tables only if no database has been explicitly specified | ||
# to avoid temp tables showing up in all calls to `list_tables` | ||
if db == "public": | ||
out += self._fetch_temp_tables() | ||
|
||
return self._filter_with_like(map(itemgetter(0), out), like) | ||
|
||
def list_databases(self, like=None) -> list[str]: | ||
def _fetch_temp_tables(self): | ||
# postgres temporary tables are stored in a separate schema | ||
# so we need to independently grab them and return them along with | ||
# the existing results | ||
|
||
sql = ( | ||
sg.select("table_name") | ||
.from_(sg.table("tables", db="information_schema")) | ||
.distinct() | ||
.where(C.table_type.eq("LOCAL TEMPORARY")) | ||
.sql(self.dialect) | ||
) | ||
|
||
with self._safe_raw_sql(sql) as cur: | ||
out = cur.fetchall() | ||
|
||
return out |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
both risingwave
and postgres
were returning all tables in all databases (formerly schema
) when any list_tables
call was made.
I fixed that to only return tables in self.current_database
but that leaves out temp tables, so I added this helper to also return temp tables (which live in their own set of postgres-schema that have unpredictable names).
raise ValueError( | ||
"Using both the `schema` and `database` kwargs is not supported. " | ||
"`schema` is deprecated and will be removed in Ibis 10.0" | ||
"\nUse the `database` kwarg with one of the following patterns:" | ||
'\ndatabase="database"' | ||
'\ndatabase=("catalog", "database")' | ||
'\ndatabase="catalog.database"', | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the odd duck out, as we had a schema
kwarg in list_tables
but no database
kwarg, so this isn't a breaking change.
b55b64f
to
ab2d621
Compare
ab2d621
to
596e9fc
Compare
9d86b41
to
80e225f
Compare
This is ready for a look (I think). I've left the various backend-specific commits unsquashed for the moment as I think that might make it easier to review, but if folks would rather all the groups of changes were visible together I can squash it down. Upon writing that, I think I'm going to squash it, should make reviewing the blocks of changes a bit easier. |
This is part of a larger refactor where we are removing all usage of the word `schema` in its hierarchical sense. Ibis will adhere to the following convention moving forward: * `schema`: a mapping of column names to datatypes * `database`: a collection of tables * `catalog`: a collection of databases These terms are mapped accordingly onto the specific backend terminology. This change to `list_tables` is not a breaking change. `schema` is deprecated and will warn on usage, but all existing code should remain functional. chore(schema): add mixin to help with schema deprecation warnings refactor(trino): deprecate schema in list_tables refactor(postgres): deprecate schema in list_tables refactor(oracle): deprecate schema in list_tables refactor(bigquery): deprecate schema in list_tables refactor(mysql): deprecate schema in list_tables refactor(duckdb): deprecate schema in list_tables refactor(mssql): deprecate schema in list_tables refactor(snowflake): deprecate schema in list_tables docs(list_tables): add backend-specific docstrings for list_tables fix(postgres): handle temp tables in list_tables
80e225f
to
89e7d7c
Compare
BREAKING CHANGE: We are removing the word `schema` in its hierarchical sense. We use `database` to mean a collection of tables. The behavior of all `*_database` methods now applies only to collections of tables and never to collections of `database` (formerly `schema`) * `CanListDatabases` abstract methods now all refer to collections of tables. * `CanCreateDatabases` abstract methods now all refer to collections of tables. * `list_databases` now takes a kwarg `catalog`. * `create_database` now takes a kwarg `catalog`. * `drop_database` now takes a kwarg `catalog`. * `current_database` now refers to the current collection of tables. * `CanCreateSchema` is deprecated and `create_schema`, `drop_schema`, `list_schemas`, and `current_schema` are deprecated and redirect to the corresponding method/property ending in `database`. * We add a `CanListCatalog` and `CanCreateCatalog` that can list and create collections of `database`, respectively. The new methods are `list_catalogs`, `create_catalog`, `drop_catalog`, * There is a new `current_catalog` property. refactor(duckdb): catalog database switchover refactor(snowflake): catalog database switchover refactor(trino): catalog database switchover refactor(postgres): catalog database switchover refactor(oracle): catalog database switchover refactor(mssql): catalog database switchover refactor(bigquery): catalog database switchover refactor(clickhouse): catalog database switchover refactor(datafusion): catalog database switchover refactor(mysql): catalog database switchover refactor(exasol): catalog database switchover refactor(risingwave): catalog database switchover test(list_databases): add test for list_database contents
This is not a breaking change as `get_schema` was introduced during the-epic-split refactor and has not yet been released.
`schema` kwarg is deprecated in all `Backend.table` usage. Wherever it was present, we now have handling to warn if `schema` is passed, and to encourage users to switch usage to the tuple of `(catalog, database)`.
This is an internals only change, but changes `database` to `catalog` and changes `schema` to `database` for our internal `Namespace` op.
The `schema` kwarg here was introduced during the-epic-split, so not deprecating.
Deprecate the use of `schema` as a hierarchical term in all `*_view` DDL methods. Added a helper mixin to try to avoid continued repetition of the warning code and the repeated parsing of the `database` + `schema` combinations. refactor(pyspark): remove schema kwarg from *_view This would be a breaking change but currently this functionality is broken anyway. In any case, PySpark only has support for a single level of table hierarchy, so it should just be `database` as an option here. refactor(sqlite): deprecate schema in `*_view` ddl refactor(bigquery): deprecate schema in `*_view` ddl methods
`schema` as a hierarchical term is deprecated in `insert`. It will be removed Ibis 10.0. refactor(bigquery): deprecate `schema` kwarg in insert refactor(snowflake): deprecate `schema` kwarg in insert
89e7d7c
to
d605837
Compare
@@ -309,8 +345,31 @@ def list_tables( | |||
with self._safe_raw_sql(sql) as cur: | |||
out = cur.fetchall() | |||
|
|||
# Include temporary tables only if no database has been explicitly specified | |||
# to avoid temp tables showing up in all calls to `list_tables` | |||
if db == "public": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, this is a bit of an edge case.
On the one hand, temporary tables are always visible to the user who created them. On the other, the intent of asking for tables in a specific schema would seem to preclude showing them.
What you've done here seems fine to me.
def _to_catalog_db_tuple(self, table_loc: sge.Table): | ||
if table_loc is None or table_loc == (None, None): | ||
return None, None | ||
|
||
if (sg_cat := table_loc.args["catalog"]) is not None: | ||
sg_cat.args["quoted"] = False | ||
sg_cat = sg_cat.sql(self.name) | ||
if (sg_db := table_loc.args["db"]) is not None: | ||
sg_db.args["quoted"] = False | ||
sg_db = sg_db.sql(self.name) | ||
|
||
return sg_cat, sg_db | ||
|
||
def _to_sqlglot_table(self, database): | ||
if database is None: | ||
return None | ||
elif isinstance(database, tuple): | ||
if len(database) > 2: | ||
raise ValueError( | ||
"Only database hierarchies of two or fewer levels are supported." | ||
"\nYou can specify ('catalog', 'database')." | ||
) | ||
elif len(database) == 2: | ||
catalog, database = database | ||
elif len(database) == 1: | ||
database = database[0] | ||
catalog = None | ||
else: | ||
raise ValueError( | ||
f"Malformed database tuple {database} provided" | ||
"\nPlease specify one of:" | ||
'\n("catalog", "database")' | ||
'\n("database",)' | ||
) | ||
database = sg.exp.Table( | ||
catalog=sg.to_identifier(catalog, quoted=self.compiler.quoted), | ||
db=sg.to_identifier(database, quoted=self.compiler.quoted), | ||
) | ||
elif isinstance(database, str): | ||
# There is no definition of a sqlglot catalog.database hierarchy outside | ||
# of the standard table expression. | ||
# sqlglot parsing of the string will assume that it's a Table | ||
# so we unpack the arguments into a new sqlglot object, switching | ||
# table (this) -> database (db) and database (db) -> catalog | ||
table = sg.parse_one(database, into=sg.exp.Table, dialect=self.dialect) | ||
if table.args["catalog"] is not None: | ||
raise exc.IbisInputError( | ||
f"Overspecified table hierarchy provided: `{table.sql(self.dialect)}`" | ||
) | ||
catalog = table.args["db"] | ||
db = table.args["this"] | ||
database = sg.exp.Table(catalog=catalog, db=db) | ||
else: | ||
raise ValueError("oops") | ||
|
||
return database |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry you had to write this :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, me too, but I think we'll get some mileage out of it. This can also help with future usage patterns like con.table("catalog.database.table")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Epic work, thanks for tackling this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
xref #6331 |
No more
schema
This is part of a larger refactor where we are removing all usage of the
word
schema
in its hierarchical sense.Ibis will adhere to the following convention moving forward:
schema
: a mapping of column names to datatypesdatabase
: a collection of tablescatalog
: a collection of databasesThere are a bunch of commits here that I'm currently keeping separate so I can add fixup changes to the appropriate backend as needed.
I plan to squash all of these into their corresponding top-level commits before we rebase.
I've listed out the commit messages below for those top-level commits (each one corresponding to the changes to, e..g
list_tables
orcreate_schema
, etc.).refactor(list_tables): deprecate schema keyword
This change to
list_tables
is not a breaking change.schema
isdeprecated and will warn on usage, but all existing code should remain
functional.
refactor(ddl): deprecate all *_schema methods
BREAKING CHANGE: We are removing the word
schema
in its hierarchicalsense. We use
database
to mean a collection of tables. The behavior ofall
*_database
methods now applies only to collections of tables andnever to collections of
database
(formerlyschema
)CanListDatabases
abstract methods now all refer tocollections of tables.
CanCreateDatabases
abstract methods now all refer tocollections of tables.
list_databases
now takes a kwargcatalog
.create_database
now takes a kwargcatalog
.drop_database
now takes a kwargcatalog
.current_database
now refers to the current collection of tables.CanCreateSchema
is deprecated andcreate_schema
,drop_schema
,list_schemas
, andcurrent_schema
are deprecated and redirect to thecorresponding method/property ending in
database
.CanListCatalog
andCanCreateCatalog
that can list andcreate collections of
database
, respectively.The new methods are
list_catalogs
,create_catalog
,drop_catalog
,current_catalog
property.refactor(get_schema): remove schema kwarg, add catalog, kw-only
This is not a breaking change as
get_schema
was introduced duringthe-epic-split refactor and has not yet been released.
refactor(table): deprecate schema
schema
kwarg is deprecated in allBackend.table
usage. Wherever itwas present, we now have handling to warn if
schema
is passed, and toencourage users to switch usage to the tuple of
(catalog, database)
.refactor(ops): use catalog and database kwargs for Namespace op
This is an internals only change, but changes
database
tocatalog
and changes
schema
todatabase
for our internalNamespace
op.refactor(table ddl): remove hierarchical schema from *_table methods
The
schema
kwarg here was introduced during the-epic-split, so not deprecating.refactor(sql): deprecate schema in _view ddl methods
Deprecate the use of
schema
as a hierarchical term in all*_view
DDLmethods.
Added a helper mixin to try to avoid continued repetition of the warning
code and the repeated parsing of the
database
+schema
combinations.refactor(sql): deprecate schema kwarg in insert
schema
as a hierarchical term is deprecated ininsert
.It will be removed Ibis 10.0.
refactor(ddl): deprecate schema keyword in truncate_table
A few followups
Not for this PR, which is already too big
I misseddoneinsert
, need to do deprecateschema
there, too.schema
kwarg fromconnect
anddo_connect
(although I'm not sure this is necessary)schema
?Fixes #8477
Fixes #8380
Fixes #8491
xref: #8253
xref: #6821
xref: #6489