refactor(language): proposal to remove all usage of schema to refer to hierarchy #8477
Schema as a word in databases is fraught with dual meanings and is
generally horrible. We do not use it consistently in Ibis, even
sometimes using different meanings for the same method but on different
backends.
I propose a complete elimination of the hierarchical "schema" that Postgres and others have saddled us with.
Glossary
For the purposes of this issue and also moving forward in Ibis:
- `database`: a collection of tables
- `catalog`: a collection of databases
- `schema`: a mapping of column names to dtypes and NOTHING ELSE

The `database` kwarg in Ibis
Post refactor, `database` can always be one of:
- `"database"`
- `"catalog.database"` (although this might error if you pass a `"catalog.database"` to a backend that only has one level of hierarchy)
- `("catalog", "database")`
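The three accepted forms could be normalized into a single `(catalog, database)` pair along these lines. This is a minimal sketch, not Ibis's implementation: the helper name `normalize_database` and the choice to return `None` for a missing catalog are assumptions made for illustration, and quoted identifiers that themselves contain dots are not handled.

```python
from typing import Optional, Sequence, Tuple, Union


def normalize_database(
    database: Union[str, Sequence[str]],
) -> Tuple[Optional[str], str]:
    """Split a ``database`` argument into ``(catalog, database)``.

    Hypothetical helper: accepts "database", "catalog.database",
    or ("catalog", "database").
    """
    if isinstance(database, str):
        parts = tuple(database.split("."))
    else:
        parts = tuple(database)

    if len(parts) == 1:
        # Single level of hierarchy: no catalog was given.
        return None, parts[0]
    if len(parts) == 2:
        return parts[0], parts[1]
    raise ValueError(
        f"expected 'database' or 'catalog.database', got {database!r}"
    )
```

A backend with only one level of hierarchy could then raise whenever the returned catalog is not `None`.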
Places where we currently use hierarchical schema and removal proposal
Backend.list_tables
Current: `list_tables` that only takes `schema` as kwarg
Future: These 3 are relatively easy: we can deprecate `schema`, have it warn, and add `database`.
Current: `list_tables` that takes both `database` and `schema` as kwargs
Current behavior:
- only `schema` passed: we assume the current `database`
- only `database` passed: we assume the current `schema` (duckdb, but this seems like a bug and no one would do this)
- only `database` passed: we assume `information_schema` as `schema` (mssql)
- only `database` passed: we error (trino, bigquery, snowflake)
Future:
- if the user only passes the deprecated `schema`: warn, treat as the new `database`
- if the user only passes `database`: treat as the new `database`. This is technically a hard break that we can't warn users about (in code). It would only impact (if anyone) users of the `mssql` backend and maybe DuckDB (but I doubt it).
- if the user passes both old `database` and old `schema`: warn, treat as `"catalog.database"`
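The three "Future" rules for `list_tables` can be sketched as a small resolution function. This is an illustration under assumptions, not Ibis code: the helper name `resolve_table_location` is hypothetical, and `FutureWarning` is an assumed choice of warning category.

```python
import warnings
from typing import Optional


def resolve_table_location(
    database: Optional[str] = None, schema: Optional[str] = None
) -> Optional[str]:
    """Map old-style (database, schema) arguments to the new ``database``.

    Hypothetical helper illustrating the proposed rules.
    """
    if schema is None:
        # Only `database` (or nothing) was passed: it already carries
        # the new meaning, so no warning can be emitted here.
        return database

    warnings.warn(
        "`schema` is deprecated as a hierarchical term; use `database`",
        FutureWarning,
        stacklevel=2,
    )
    if database is None:
        # Only the deprecated `schema`: it becomes the new `database`.
        return schema
    # Both passed: the old `database` acts as the catalog and the old
    # `schema` as the database.
    return f"{database}.{schema}"
```

The first branch is the hard break described above: a caller who passed only the old `database` gets the new meaning with no warning.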
Backend.table
Current: `Backend.table` that takes `schema` and `database`
Future:
- Deprecate `schema`, warn if `schema` is passed, make kw-only
In ibis 8, we were creating an `ops.Namespace`, passing that to `_get_sqla_table`, and only using the `schema` attribute of `ops.Namespace`, so I don't think anyone was using only `database` in a functional way, even if it wasn't explicitly erroring there. (`sqlalchemy` only accepts a `schema` kwarg when defining a sqlalchemy table; this is one of the reasons for the current mess.)
While this is a breaking change, I don't believe it will break anyone's code without warning.
Current: `Backend.table` that takes `schema` as a mapping
- `_schema` kwarg that is unused
- For `dask` and `pandas` you can use `schema` to override the mapping `schema` used to create a table, sort of similar to using `ibis.table`?
Future:
- Deprecate `schema` keyword, offer no replacement for this weird and inconsistent functionality.
- Possible: add the `database` kwarg for API consistency and no-op if it gets used (or error?)
Backend.get_schema
This is new since TES and we can rename it to `get_database` (or something else)
Backend.list_schemas
We deprecate this, point users at `list_databases`
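One way to deprecate `list_schemas` while pointing users at `list_databases` is to keep the old name as a thin alias that warns. This is a sketch on a toy class: `ExampleBackend` and its in-memory list are hypothetical stand-ins, and the warning category is an assumption.

```python
import warnings


class ExampleBackend:
    """Toy backend used only to illustrate the deprecation path."""

    def __init__(self, databases):
        self._databases = list(databases)

    def list_databases(self):
        return sorted(self._databases)

    def list_schemas(self):
        # Deprecated spelling: warn, then delegate to the new method.
        warnings.warn(
            "`list_schemas` is deprecated; use `list_databases` instead",
            FutureWarning,
            stacklevel=2,
        )
        return self.list_databases()
```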
Backend.list_databases
Current: was undefined or was returning nothing (because backend has no `catalog` support)
Future: This will now return `database`. And we add a `catalog` kwarg so you can list `databases` in a given `catalog`.
Current: returns `catalog`
Future: This will just break. This is unfortunate, but I'm cautiously optimistic that no one is using this programmatically.
Other changes and possibly additional steps in later versions
`Backend.list_catalogs` which behaves like the old version of `list_databases`
After we remove `schema` (next major version after deprecating), we can consider adding an additional `catalog` kwarg to several of the above methods. We will still continue to allow all the `database` behaviors listed above and we add error handling for if someone specifies `catalog` and also provides a dotted path as `database` (we have this now for overspecifying `database.schema`).
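The overspecification check for a possible future `catalog` kwarg might look roughly like the following. This is a sketch under assumptions: the helper name `resolve_catalog_and_database` is hypothetical, and `database` is assumed to be a plain string with at most one dot.

```python
from typing import Optional, Tuple


def resolve_catalog_and_database(
    database: str, catalog: Optional[str] = None
) -> Tuple[Optional[str], str]:
    """Combine a possible future ``catalog`` kwarg with ``database``.

    Hypothetical helper: raises if the catalog is specified twice.
    """
    head, dot, tail = database.rpartition(".")
    if not dot:
        # Plain "database": use the explicit catalog, if any.
        return catalog, database
    if catalog is not None:
        # Dotted path plus explicit catalog: overspecified.
        raise ValueError(
            f"catalog given both as catalog={catalog!r} "
            f"and in database={database!r}"
        )
    return head, tail
```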