fix: Ensure Presto database engine spec correctly handles Trino #20729
Codecov Report
@@ Coverage Diff @@
## master #20729 +/- ##
===========================================
- Coverage 66.33% 54.85% -11.49%
===========================================
Files 1767 1767
Lines 67295 67305 +10
Branches 7144 7144
===========================================
- Hits 44643 36919 -7724
- Misses 20824 28558 +7734
Partials 1828 1828
)

if cols:
    full_table_name = table_name
Same logic as before, just indented under the `if cols:` statement.
Unlike Presto, where `get_indexes` returns `[]` for a non-partitioned table, Trino returns `[{'name': 'partition', 'column_names': [], 'unique': False}]`. Rather than overriding the engine-specific `normalize_indexes` method, I thought it would be more prudent to make this method more robust, given there was already an expectation that there may be no columns associated with the index, i.e., a non-partitioned table.
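The difference between the two return values described above can be sketched as follows. This is a minimal, standalone illustration and not the actual Superset code: the helper names `partition_columns` and `partition_metadata` are hypothetical, and only the index shapes are taken from the comment.

```python
from typing import Any, Dict, List


def partition_columns(indexes: List[Dict[str, Any]]) -> List[str]:
    """Collect column names from every index, tolerating indexes without columns."""
    return [col for index in indexes for col in index.get("column_names", [])]


def partition_metadata(indexes: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Build partition metadata only when at least one index has columns."""
    metadata: Dict[str, Any] = {}
    cols = partition_columns(indexes)
    if cols:  # false for [] (Presto) and for an index with no columns (Trino)
        metadata["partitions"] = {"cols": cols}
    return metadata


# Shapes quoted in the review comment for a non-partitioned table.
presto_indexes: List[Dict[str, Any]] = []
trino_indexes = [{"name": "partition", "column_names": [], "unique": False}]
# Hypothetical partitioned table, for contrast.
partitioned = [{"name": "partition", "column_names": ["ds", "hour"], "unique": False}]
```

Guarding on the flattened column list rather than on the index list itself is what lets one code path serve both engines.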
engine = cls.get_engine(database, schema)
with closing(engine.raw_connection()) as conn:
    cursor = conn.cursor()
    sql = f"SHOW CREATE VIEW {schema}.{table}"
    try:
        cls.execute(cursor, sql)
        return cls.fetch_data(cursor, 1)[0][0]
This should never have been a `pyhive.exc` exception to begin with.
@@ -892,19 +892,6 @@ def test_get_create_view_exception(self):
        )
        schema = "schema"
        table = "table"
        with self.assertRaises(Exception):
I'm not entirely sure when/why a broad exception should be raised.
@john-bodley Trino uses its official driver, not PyHive as Presto does. Tightly coupling the two engine specs will make them hard to extend and will introduce more bugs. Do you think we should revert #20152 to avoid inheriting messy code from …
29247f7 to 93ec699
@dungdm93 I think that makes sense in the future, when Presto and Trino diverge further. At the moment, however, a significant amount of the code is the same, and I sense that adhering to the DRY principle (where possible) makes sense.
93ec699 to 10ece47
LGTM with a couple of non-blocking nits
if cols:
    full_table_name = table_name
    if schema_name and "." not in table_name:
        full_table_name = "{}.{}".format(schema_name, table_name)
f-string?
@ktmud this code is unchanged, i.e., it's now nested under the `if cols:` section, and thus I would prefer not to make changes to said code, at least in this PR.
latest_parts = tuple([None] * len(col_names))
metadata["partitions"] = {
    "cols": cols,
    "latest": dict(zip(col_names, latest_parts)),
I feel we should probably change the signature of `latest_partition()` itself to return this dict; that would be worth another PR.
For this code, it seems it can be simplified as:
- "latest": dict(zip(col_names, latest_parts)),
+ "latest": {
+     col: latest_parts[i] if latest_parts else None
+     for i, col in enumerate(col_names)
+ }
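To compare the two forms in the suggestion above, here is a small standalone sketch. It assumes, which the diff context does not show, that the `[None] * len(col_names)` fallback only runs when no latest partition was found; the function names are hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def latest_by_zip(
    col_names: List[str], latest_parts: Optional[Sequence[Optional[str]]]
) -> Dict[str, Optional[str]]:
    """Existing form: pad with None first, then zip names with values."""
    if not latest_parts:  # assumed guard around the fallback
        latest_parts = tuple([None] * len(col_names))
    return dict(zip(col_names, latest_parts))


def latest_by_comprehension(
    col_names: List[str], latest_parts: Optional[Sequence[Optional[str]]]
) -> Dict[str, Optional[str]]:
    """Suggested form: no padding step, the fallback lives in the comprehension."""
    return {
        col: latest_parts[i] if latest_parts else None
        for i, col in enumerate(col_names)
    }
```

The two agree when `latest_parts` is empty or has one value per column. They differ if `latest_parts` is shorter than `col_names`: `zip` silently truncates, while the comprehension raises `IndexError`.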
@john-bodley The two projects are now different enough to have separate EngineSpecs (at least on the driver side).
The significant amount of code you mention is used to patch the PyHive driver. Trino, on the other hand, strictly follows DB-API 2, so default implementations from …
2d29e64 to c6608a6
superset/db_engine_specs/presto.py
Outdated
except DatabaseError: # not a VIEW
return cls.fetch_data(cursor, 1)[0][0]
except SupersetDBAPIDatabaseError: # not a VIEW
Sadly, because we're using a raw connection, i.e., outside of the SQLAlchemy realm, the error comes from the underlying DBAPI, i.e., it isn't wrapped by SQLAlchemy, and thus we need to define the `get_dbapi_exception_mapping` mapping.
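The idea of mapping driver-level exceptions to a single application-level type can be sketched as below. This is a self-contained illustration under stated assumptions, not Superset's implementation: `DriverDatabaseError`, `AppDatabaseError`, `EXCEPTION_MAPPING`, `execute_mapped`, and `show_create_view` are all hypothetical stand-ins.

```python
from typing import Callable, Dict, Optional, Type


class DriverDatabaseError(Exception):
    """Hypothetical stand-in for a DB-API driver's own DatabaseError."""


class AppDatabaseError(Exception):
    """Hypothetical stand-in for an application-level database error."""


# Hypothetical mapping from driver exception types to application types.
EXCEPTION_MAPPING: Dict[Type[Exception], Type[Exception]] = {
    DriverDatabaseError: AppDatabaseError,
}


def execute_mapped(run: Callable[[], str]) -> str:
    """Run a query callable, translating mapped driver errors."""
    try:
        return run()
    except Exception as ex:
        for driver_exc, app_exc in EXCEPTION_MAPPING.items():
            if isinstance(ex, driver_exc):
                raise app_exc(str(ex)) from ex
        raise  # unmapped errors propagate unchanged


def show_create_view(run: Callable[[], str]) -> Optional[str]:
    """Return the view definition, or None when the object is not a view."""
    try:
        return execute_mapped(run)
    except AppDatabaseError:  # not a VIEW
        return None
```

With a mapping like this, the calling code catches one exception type regardless of which driver raised the original error.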
@ktmud thanks for the review. I've actually had to update the logic since your approval per #20729 (comment).
c6608a6 to 0ea8123
@john-bodley another way to reuse code is the mixin pattern.
@john-bodley I have to agree with @dungdm93 here. Over the years the Presto spec has caused me considerable grief due to the code being IMO messy and difficult to maintain. For that reason I've been very happy to see the work that @dungdm93 has done on the Trino spec, which IMO is of great quality and easy to follow and extend. So rather than double up on using the old Presto spec code, I'd rather see us gradually phase it out.
WRT compatibility of the drivers, I think they're fairly far away from each other right now. For instance, currently clicking on a table in SQL Lab with the recommended … This is due to the …
If we really want to share code between the Presto and Trino specs, I would recommend doing something similar to what we're doing for Postgres and Postgres-like databases, where we have the uncontroversial shared pieces in an abstract …
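A rough sketch of the shared-abstract-base approach described above, as opposed to having the Trino spec inherit from the Presto spec directly. All class and method names here are hypothetical and chosen only for illustration; the driver strings simply reflect the thread's point that Presto goes through PyHive while Trino uses its own driver.

```python
from abc import ABC, abstractmethod


class PrestoLikeBaseSpec(ABC):  # hypothetical name
    """Holds only the behaviour both engines are known to share."""

    @classmethod
    def qualified_name(cls, schema: str, table: str) -> str:
        """Prefix the schema unless the table name is already qualified."""
        if schema and "." not in table:
            return f"{schema}.{table}"
        return table

    @classmethod
    @abstractmethod
    def driver_name(cls) -> str:
        """Each engine declares its own driver instead of inheriting one."""


class PrestoSpec(PrestoLikeBaseSpec):
    @classmethod
    def driver_name(cls) -> str:
        return "pyhive"  # illustrative label only


class TrinoSpec(PrestoLikeBaseSpec):
    @classmethod
    def driver_name(cls) -> str:
        return "trino"  # illustrative label only
```

In this layout, driver-specific workarounds added to one concrete spec do not leak into the other, while the shared helpers still live in one place.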
@villebro perhaps …
Closing in favor of #21066.
SUMMARY
This PR remedies a few issues with Trino, which extends the Presto database engine spec. Specifically, it adds a few more safeguards when fetching extra metadata as part of the SQL Lab schema/table workflow.
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
TESTING INSTRUCTIONS
CI.
ADDITIONAL INFORMATION