fix: Ensure Presto database engine spec correctly handles Trino #20729

john-bodley · 2022-07-16T03:06:11Z

SUMMARY

This PR remedies a few issues with Trino—which extends the Presto database engine spec. Specifically it adds a few more safeguards when fetching extra metadata as part of the SQL Lab schema/table workflow.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

CI.

ADDITIONAL INFORMATION

Has associated issue:
Required feature flags:
Changes UI
Includes DB Migration (follow approval process in SIP-59)
- Migration is atomic, supports rollback & is backwards-compatible
- Confirm DB migration upgrade and downgrade tested
- Runtime estimates and downtime expectations provided
Introduces new feature or API
Removes existing feature or API

codecov · 2022-07-16T03:09:32Z

Codecov Report

Merging #20729 (d5c5da0) into master (f89ba0c) will decrease coverage by 11.48%.
The diff coverage is 45.83%.

❗ Current head d5c5da0 differs from pull request most recent head 0ea8123. Consider uploading reports for the commit 0ea8123 to get more accurate results

@@             Coverage Diff             @@
##           master   #20729       +/-   ##
===========================================
- Coverage   66.33%   54.85%   -11.49%     
===========================================
  Files        1767     1767               
  Lines       67295    67305       +10     
  Branches     7144     7144               
===========================================
- Hits        44643    36919     -7724     
- Misses      20824    28558     +7734     
  Partials     1828     1828

Flag	Coverage Δ
hive	`53.14% <45.83%> (+<0.01%)`	⬆️
mysql	`?`
postgres	`?`
presto	`53.04% <37.50%> (+<0.01%)`	⬆️
python	`57.77% <45.83%> (-23.70%)`	⬇️
sqlite	`?`
unit	`50.47% <37.50%> (+<0.01%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
superset/db_engine_specs/presto.py	`40.87% <38.88%> (-47.11%)`	⬇️
superset/db_engine_specs/trino.py	`39.39% <66.66%> (-34.02%)`	⬇️
superset/utils/dashboard_import_export.py	`0.00% <0.00%> (-100.00%)`	⬇️
superset/key_value/commands/update.py	`0.00% <0.00%> (-88.89%)`	⬇️
superset/key_value/commands/delete.py	`0.00% <0.00%> (-85.30%)`	⬇️
superset/key_value/commands/delete_expired.py	`0.00% <0.00%> (-80.77%)`	⬇️
superset/dashboards/commands/importers/v0.py	`15.62% <0.00%> (-76.25%)`	⬇️
superset/datasets/commands/update.py	`25.00% <0.00%> (-69.05%)`	⬇️
superset/datasets/commands/create.py	`29.41% <0.00%> (-68.63%)`	⬇️
superset/datasets/commands/importers/v0.py	`24.03% <0.00%> (-67.45%)`	⬇️
... and 279 more

📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more

john-bodley · 2022-07-16T03:09:42Z

superset/db_engine_specs/presto.py

-            )
+
+            if cols:
+                full_table_name = table_name


Same logic as previous just indented under the if cols: statement.

Unlike Presto where get_indexes returns [] for a non-partition table, Trino returns [{'name': 'partition', 'column_names': [], 'unique': False}]. Rather than overriding the engine specific normalize_indexes method I though it would be more prudent to make this method more robust given there was already an expectation that there may be no columns associated with the index, i.e., a non-partitioned table.

john-bodley · 2022-07-16T03:10:24Z

superset/db_engine_specs/presto.py


        engine = cls.get_engine(database, schema)
        with closing(engine.raw_connection()) as conn:
            cursor = conn.cursor()
            sql = f"SHOW CREATE VIEW {schema}.{table}"
            try:
                cls.execute(cursor, sql)
+                return cls.fetch_data(cursor, 1)[0][0]


This should never have been a pyhive.exc exception to begin with.

john-bodley · 2022-07-16T04:33:26Z

tests/integration_tests/db_engine_specs/presto_tests.py

@@ -892,19 +892,6 @@ def test_get_create_view_exception(self):
        )
        schema = "schema"
        table = "table"
-        with self.assertRaises(Exception):


I'm not entirely sure when/why a broad exception should be raised.

dungdm93 · 2022-07-22T16:09:03Z

@john-bodley Trino using official driver, not PyHive as Presto.
This mean that a lot of functions like get_table_names, get_view_names, handle_cursor, etc... are very different between 2 drivers. Further more, Trino and Presto are now 2 different projects, and going to different direction. So their features will be more and more different.

Tightly couple though 2 engine specs will make it hard to extends and introduce more bugs. Do u think we should revert #20152 to avoid inherit messy code from PrestoEngineSpec.

john-bodley · 2022-08-05T00:23:17Z

@dungdm93 I think in the future when Presto and Trino further diverge it makes sense, however at the moment there is a significant amount of the code that is the same and I sense adhering to the DRY principle (where possible makes sense).

ktmud

LGTM with a couple of non-blocking nits

ktmud · 2022-08-05T00:30:15Z

superset/db_engine_specs/presto.py

+            if cols:
+                full_table_name = table_name
+                if schema_name and "." not in table_name:
+                    full_table_name = "{}.{}".format(schema_name, table_name)


@ktmud this code is unchanged, i.e., it's now nested under the if cols: section and thus I would prefer not make changes to said code, at least in this PR.

ktmud · 2022-08-05T04:13:06Z

superset/db_engine_specs/presto.py

+                    latest_parts = tuple([None] * len(col_names))
+                metadata["partitions"] = {
+                    "cols": cols,
+                    "latest": dict(zip(col_names, latest_parts)),


I feel we should probably change the signature of latest_partition() itself to return this dict---would worth another PR.

For this code, it seems it can be simplified as:

Suggested change

"latest": dict(zip(col_names, latest_parts)),

"latest": {

col: latest_parts[i] if latest_parts else None

for i, col in enumerate(col_names)

}

dungdm93 · 2022-08-05T08:06:31Z

@john-bodley 2 projects are now different enough to have separated EngineSpec (at least in driver side).
You just save a few line of code by let TrinoEngineSpec inherit messy code from PrestoEngineSpec (and introducing bugs).

at the moment there is a significant amount of the code that is the same

The significant amount of the code you said is used to patch PyHive driver. Trino in the other hand strictly follow DB-API 2, so default implementations from BaseEngineSpec are good enough.
Other utility functions like estimate_statement_cost, query_cost_formatter, etc... may the same. However, those func can be reused by move to an utils.py if u wanna do DRY.

john-bodley · 2022-08-05T18:00:02Z

superset/db_engine_specs/presto.py

-
-            except DatabaseError:  # not a VIEW
+                return cls.fetch_data(cursor, 1)[0][0]
+            except SupersetDBAPIDatabaseError:   # not a VIEW


Sadly because we're using a raw connection, i.e., outside of the SQLAlchemy realm, the error comes from the underlying DBAPI, i.e., it's wrapped by SQLAlchemy, and thus we need to define the get_dbapi_exception_mapping mapping.

john-bodley · 2022-08-05T18:03:14Z

@ktmud thanks for the review. I've actually had to update the logic since your approval per #20729 (comment).

dungdm93 · 2022-08-06T02:55:03Z

@john-bodley an other way to reusing code is the mixing pattern.

villebro · 2022-08-09T12:28:38Z

@john-bodley I have to agree with @dungdm93 here - Over the years the Presto spec has caused me considerable grief due to the code being IMO messy and difficult to maintain. For that reason I've been very happy to see the work that @dungdm93 has done on the Trino spec, which IMO is of a great quality and easy follow and extend. So rather than double up on using the old Presto spec code, I'd rather see us gradually phasing it out.

WRT to compatibility of the drivers, I think they're fairly far away from each other right now. For instance, currently clicking on a table in SQL Lab with the recommended trino driver gives off a metadata fetching error (causes a 500 in the backend):

This is due to the latest_partition method in the Presto spec being incompatible with the trino driver. There are other similar issues; I'm opening a PR soon to fix query cancellation, which also works slightly differently, but there appear to be other differences, too.

If we really want to share code between the Presto and Trino specs I would recommend doing something similar to what we're doing for Postgres and Postgres-like databases, where we have the uncontroversial shared pieces in an abstract PostgresBaseEngineSpec which the actual implementations, like PostgresEngineSpec, HanaEngineSpec, RedshiftEngineSpec etc use. In this case, we could have a PrestoBaseEngineSpec into which we move the high quality shared code, and then leave the messy bits in the old PrestoEngineSpec, while gradually refactoring them into the shared PrestoBaseEngineSpec, all while adding proper and relevant tests for them.

dungdm93 · 2022-08-15T14:32:01Z

@villebro perhaps latest_partition issue is fixed in trinodb/trino-python-client#219. Waiting for next release.

john-bodley · 2022-09-02T03:28:34Z

Closing in favor of #21066.

pull-request-size bot added the size/M label Jul 16, 2022

john-bodley commented Jul 16, 2022

View reviewed changes

john-bodley requested review from betodealmeida and etr2460 July 16, 2022 03:15

john-bodley commented Jul 16, 2022

View reviewed changes

john-bodley mentioned this pull request Jul 19, 2022

DB API raises exceptions which are not derived from the PEP 249 database API specification trinodb/trino-python-client#199

Closed

john-bodley mentioned this pull request Aug 4, 2022

chore(requirements): Bump trino-python-client #20976

Merged

9 tasks

john-bodley force-pushed the john-bodley--trino-presto branch 3 times, most recently from 29247f7 to 93ec699 Compare August 4, 2022 23:19

john-bodley requested a review from ktmud August 5, 2022 00:25

john-bodley force-pushed the john-bodley--trino-presto branch from 93ec699 to 10ece47 Compare August 5, 2022 00:26

ktmud approved these changes Aug 5, 2022

View reviewed changes

john-bodley added 3 commits August 5, 2022 11:04

fix: Ensure Presto database engine spec correctly handles Trino

2537d58

Update presto.py

ad14e8a

Update presto.py

2287a92

john-bodley force-pushed the john-bodley--trino-presto branch 3 times, most recently from 2d29e64 to c6608a6 Compare August 5, 2022 17:58

john-bodley commented Aug 5, 2022

View reviewed changes

Update presto_tests.py

0ea8123

john-bodley force-pushed the john-bodley--trino-presto branch from c6608a6 to 0ea8123 Compare August 5, 2022 18:04

ktmud approved these changes Aug 6, 2022

View reviewed changes

dungdm93 mentioned this pull request Aug 12, 2022

fix(Trino): create PrestoBaseEngineSpec base class to share common code between Trino and Presto #21066

Merged

9 tasks

john-bodley closed this Sep 2, 2022

john-bodley mentioned this pull request Dec 16, 2022

chore: Re-add inheritance of Presto macros for Trino et al. #22435

Merged

9 tasks

mistercrunch deleted the john-bodley--trino-presto branch March 26, 2024 16:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: Ensure Presto database engine spec correctly handles Trino #20729

fix: Ensure Presto database engine spec correctly handles Trino #20729

john-bodley commented Jul 16, 2022 •

edited

Loading

codecov bot commented Jul 16, 2022 •

edited

Loading

john-bodley Jul 16, 2022 •

edited

Loading

john-bodley Jul 16, 2022

john-bodley Jul 16, 2022

dungdm93 commented Jul 22, 2022 •

edited

Loading

john-bodley commented Aug 5, 2022

ktmud left a comment

ktmud Aug 5, 2022

john-bodley Aug 5, 2022 •

edited

Loading

ktmud Aug 5, 2022

dungdm93 commented Aug 5, 2022 •

edited

Loading

john-bodley Aug 5, 2022

john-bodley commented Aug 5, 2022

dungdm93 commented Aug 6, 2022

villebro commented Aug 9, 2022

dungdm93 commented Aug 15, 2022

john-bodley commented Sep 2, 2022

-                    "latest": dict(zip(col_names, latest_parts)),
+                    "latest": {
+                        col: latest_parts[i] if latest_parts else None
+                        for i, col in enumerate(col_names)
+                    }

fix: Ensure Presto database engine spec correctly handles Trino #20729

fix: Ensure Presto database engine spec correctly handles Trino #20729

Conversation

john-bodley commented Jul 16, 2022 • edited Loading

SUMMARY

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

codecov bot commented Jul 16, 2022 • edited Loading

Codecov Report

john-bodley Jul 16, 2022 • edited Loading

Choose a reason for hiding this comment

john-bodley Jul 16, 2022

Choose a reason for hiding this comment

john-bodley Jul 16, 2022

Choose a reason for hiding this comment

dungdm93 commented Jul 22, 2022 • edited Loading

john-bodley commented Aug 5, 2022

ktmud left a comment

Choose a reason for hiding this comment

ktmud Aug 5, 2022

Choose a reason for hiding this comment

john-bodley Aug 5, 2022 • edited Loading

Choose a reason for hiding this comment

ktmud Aug 5, 2022

Choose a reason for hiding this comment

dungdm93 commented Aug 5, 2022 • edited Loading

john-bodley Aug 5, 2022

Choose a reason for hiding this comment

john-bodley commented Aug 5, 2022

dungdm93 commented Aug 6, 2022

villebro commented Aug 9, 2022

dungdm93 commented Aug 15, 2022

john-bodley commented Sep 2, 2022

john-bodley commented Jul 16, 2022 •

edited

Loading

codecov bot commented Jul 16, 2022 •

edited

Loading

john-bodley Jul 16, 2022 •

edited

Loading

dungdm93 commented Jul 22, 2022 •

edited

Loading

john-bodley Aug 5, 2022 •

edited

Loading

dungdm93 commented Aug 5, 2022 •

edited

Loading