Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(backends): enable cross-schema and cross-database table access where the backend supports it #6489

Closed
1 task done
cpcloud opened this issue Jun 20, 2023 · 7 comments · Fixed by #9849
Closed
1 task done
Assignees
Labels
feature Features or general enhancements sql Backends that generate SQL

Comments

@cpcloud
Copy link
Member

cpcloud commented Jun 20, 2023

Is your feature request related to a problem?

I can't perform operations on tables that live in different schemas or databases (catalogs in some backends).

Describe the solution you'd like

I'd like to be able to operate on tables that exist in different catalogs, like doing a cross-catalog join.

What version of ibis are you running?

N/A

What backend(s) are you using, if any?

Trino

Code of Conduct

  • I agree to follow this project's Code of Conduct
@cpcloud cpcloud added feature Features or general enhancements trino The Trino backend labels Jun 20, 2023
@cpcloud cpcloud added this to the 6.1 milestone Jul 5, 2023
@cpcloud cpcloud modified the milestones: 6.1, 7.0 Aug 1, 2023
@cpcloud cpcloud changed the title feat(trino): enable cross-catalog operations with the trino backend feat(backends): enable cross-schema and cross-database table access where the backend supports it Aug 2, 2023
@cpcloud cpcloud added sql Backends that generate SQL and removed trino The Trino backend labels Aug 2, 2023
@contang0
Copy link

contang0 commented Sep 9, 2023

This is also crucial for Impala. In Impala both "SHOW SCHEMAS" and "SHOW DATABASES" return the same result.

In Impala's raw SQL you'd usually do cross-schema joins by simply prepending schema name to the table name:

SELECT t1.*, t2.*
FROM schema1.tableX t1
JOIN schema2.tableY t2 
   ON t1.id=t2.id

My naive approach to resolve this problem was to create two clients, each referring to the schemas from which I want to query.

schema1 = ibis.impala.connect(database="schema1", ...)
schema2 = ibis.impala.connect(database="schema2", ...)

t1 = schema1.tables["table1"]
t2 = schema2.tables["table2"].left_join(t1, _.id==t1.id)

And that gives me: IbisError: Multiple backends found for this expression :(

@cpcloud cpcloud removed this from the 7.0 milestone Sep 28, 2023
@binste
Copy link
Contributor

binste commented Oct 4, 2023

Cross-schema access now works for Snowflake and duckdb using the schema attribute to table as described in #7289 🥳 This is awesome! I also tested cross-database on Snowflake where it fails:

con.table("CLASSES", schema="INFORMATION_SCHEMA", database="SNOWFLAKE")
File [~/analytics-env/lib/python3.11/site-packages/ibis/backends/base/sql/alchemy/__init__.py:623](https://vscode-remote+ssh-002dremote-002bg7eip0-002ddebist01.vscode-resource.vscode-cdn.net/mnt/analytics/Personal%20workspaces/Stefan%20Binder/~/analytics-env/lib/python3.11/site-packages/ibis/backends/base/sql/alchemy/__init__.py:623), in BaseAlchemyBackend.table(self, name, database, schema)
    619         raise com.IbisTypeError(
    620             f"`database` must be a string; got {type(database)}"
    621         )
    622     if database != self.current_database:
--> 623         return self.database(name=database).table(name=name, schema=schema)
    625 sqla_table = self._get_sqla_table(name, schema=schema)
    627 schema = self._schema_from_sqla_table(
    628     sqla_table, schema=self._schemas.get(name)
    629 )

TypeError: Database.table() got an unexpected keyword argument 'schema'

No rush, just wanted to update this issue.

@cpcloud
Copy link
Member Author

cpcloud commented Oct 4, 2023

Thanks @binste!

I think to address this issue fully we're going to need to avoid sqlalchemy here, as it doesn't provide a dialect-agnostic notion of a top-level namespace (database, catalog, etc), only one for schema. Some dialects roll this into their implementations but of course that doesn't work for all of our sqlalchemy-based backends that support two levels of namespacing.

@cpcloud
Copy link
Member Author

cpcloud commented Oct 4, 2023

You should be able to use con.table("t", schema="database.schema") (I know it's a hack 😄)

@binste
Copy link
Contributor

binste commented Oct 4, 2023

:) Nice! Can confirm that I can access another database on Snowflake with this.

A bit off topic and feel free to ignore this question if it takes up too much time: Out of curiosity, why are you moving away from SQLAlchemy to sqlglot?

@cpcloud
Copy link
Member Author

cpcloud commented Oct 4, 2023

Glad you asked! I lay out the primary motivations here: #7213

@contang0
Copy link

contang0 commented May 3, 2024

Even though the issue is still open, I already am able to do cross-schema joins on version 9.0 using Impala. Thanks a lot! Really helpful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Features or general enhancements sql Backends that generate SQL
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

4 participants