Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix prepared statement handling #242

Merged
merged 1 commit into from
Oct 3, 2022

Conversation

hashhar
Copy link
Member

@hashhar hashhar commented Sep 30, 2022

Description

The prepared statement handling code assumed that for each query we'll
always receive some non-empty response even after the initial response
which is not a valid assumption.

This assumption worked because earlier Trino used to send empty fake
results even for queries which don't return results (like PREPARE and
DEALLOCATE) but is now invalid with
trinodb/trino@bc794cd.

The other problem with the code was that it leaked HTTP protocol details
into dbapi.py and worked around it by keeping a deep copy of the request
object from the PREPARE execution and re-using it for the actual query
execution.

The new code fixes both issues by processing the prepared statement
headers as they are received and storing the resulting set of active
prepared statements on the ClientSession object. The ClientSession's set
of prepared statements is then rendered into the prepared statement
request header in TrinoRequest. Since the ClientSession is created and
reused for the entire Connection this also means that we can now
actually implement re-use of prepared statements within a single
Connection.

Non-technical explanation

Fix errors when using prepared statements with Trino >= 398.

Release notes

( ) This is not user-visible and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text:

* Fix errors when using prepared statements with Trino servers newer than 398.

@cla-bot cla-bot bot added the cla-signed label Sep 30, 2022
@hashhar
Copy link
Member Author

hashhar commented Sep 30, 2022

FYI @mdesmet - note that this is incomplete code - the commit for now just cleans up the HTTP portion (client.py).

dbapi.py changes I'm still working on - but created this to receive some early feedback.
In dbapi.py with the new changes I think I don't need to do anything at all other than just executing the PREPARE, EXECUTE and DEALLOCATE since the headers will all be persisted as long as we use same Cursor object.

In a future PR I plan to add additional parameter if user wants prepared statements to be re-used.

@hashhar hashhar force-pushed the hashhar/prepared-statement-fix branch 2 times, most recently from b2ff5ff to b529df9 Compare September 30, 2022 20:35
@@ -881,21 +881,6 @@ def __call__(self, *args, **kwargs):
return http_response


def test_trino_result_response_headers():
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is redundant now since the actual user-facing functionality is that additional_http_headers works by merging passed headers to existing headers which is tested in test_trino_query_response_headers already.

The use of this test was to ensure that headers were travelling between TrinoQuery and TrinoResult since older prepared statement code basically passed-through the prepare headers through all layers - which was very simple solution but unclean and couples things together (and also buggy as Trino 398 shows).

@hashhar hashhar force-pushed the hashhar/prepared-statement-fix branch from b529df9 to cf6af76 Compare September 30, 2022 21:09
trino/dbapi.py Outdated
@@ -295,58 +295,42 @@ def warnings(self):
return self._query.warnings
return None

def _new_request_with_session_from(self, request):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is better than copy.deepcopy but I still hate it.

I think the way we try to make prepared statement work needs improvement.
Currently it's modelled as:

  • A Connection creates a ClientSession - makes total sense - all persistent information for a connection is maintained using the session (roles, prepared statements, session properties, transactions)
  • A Cursor is created from Connection which either gets it's own TrinoRequest or re-uses one from an existing transaction. The TrinoRequest inherits a reference to the Connection's ClientSession. This doesn't make too much sense. It assumes a Cursor will only execute a query at a time (reasonable) but also leaks HTTP protocol information to the Cursor.
  • Cursor.execute(sql, params) is called which creates a TrinoQuery using the Cursor's TrinoRequest.
  • Any operation on the cursor then affects the TrinoRequest - which means we should not share TrinoRequest objects.

These are the current rules, we break them very badly:

  • In Cursor.execute(sql, params) if params are provided we need to execute PREPARE statement FROM sql
  • To execute something we need a TrinoRequest - so we re-use the one from current Cursor - bad bad because any other future query or running query would get affected
  • We workaround this by deepcopy. This copies too much and means that the ClientSession is no longer shared.
  • We then workaround this by creating new request but re-using the ClientSession from old one (to persist headers etc.) What if we miss other things that need to be re-used? Things will break.
  • We then execute our PREPARE with copied request, EXECUTE using original request and then DEALLOCATE using copied request.

I'll create an issue with some pseudocode class outline mentioning what refactoring I want to do to improve this - we can discuss other possible improvements there and improve what is honestly a very organically grown code which does not have enough structure to safeguard us from doing mistakes or introducing changes with confidence that we only affect what we changed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A Cursor is created from Connection which either gets it's own TrinoRequest or re-uses one from an existing transaction

Does it mean that for each cursor object a new TrinoRequest is created? Should it be somehow reused, you mentioned that is re-uses one from an existing transaction but where it's handled?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not simply call self.connection._create_request()?

The ClientSession is already shared and the prepared statement headers are also persisted in the ClientSession.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it mean that for each cursor object a new TrinoRequest is created? Should it be somehow reused, you mentioned that is re-uses one from an existing transaction but where it's handled?

Creating new TrinoRequest per cursor makes sense since the TrinoRequest tracks query state (nextUri, stats etc.) - it must not be re-used across queries where possible. Transactions re-use it here -

if self.transaction is not None:
request = self.transaction.request
else:
request = self._create_request()
(but it's safe since even with transactions a Cursor can only be executing one query at a time).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not simply call self.connection._create_request()?

The ClientSession is already shared and the prepared statement headers are also persisted in the ClientSession.

Good catch. Much simpler. Let me try if that works (it should).

@hashhar hashhar marked this pull request as ready for review September 30, 2022 21:12
@hashhar
Copy link
Member Author

hashhar commented Sep 30, 2022

BTW the existing code was buggy even before the Trino change because it didn't get to inspect the initial response from the coordinator because that response's headers were not being set to TrinoResult because the setting of response headers was handled in fetch which only sees the 2nd response and onward.

@hashhar hashhar force-pushed the hashhar/prepared-statement-fix branch 3 times, most recently from 49e051e to e6d588f Compare September 30, 2022 21:34
trino/client.py Show resolved Hide resolved
trino/client.py Show resolved Hide resolved
trino/dbapi.py Outdated
@@ -295,58 +295,42 @@ def warnings(self):
return self._query.warnings
return None

def _new_request_with_session_from(self, request):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A Cursor is created from Connection which either gets it's own TrinoRequest or re-uses one from an existing transaction

Does it mean that for each cursor object a new TrinoRequest is created? Should it be somehow reused, you mentioned that is re-uses one from an existing transaction but where it's handled?

trino/dbapi.py Outdated Show resolved Hide resolved
trino/dbapi.py Outdated Show resolved Hide resolved
trino/dbapi.py Outdated
@@ -295,58 +295,42 @@ def warnings(self):
return self._query.warnings
return None

def _new_request_with_session_from(self, request):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not simply call self.connection._create_request()?

The ClientSession is already shared and the prepared statement headers are also persisted in the ClientSession.

@hashhar
Copy link
Member Author

hashhar commented Oct 3, 2022

applied comments @mdesmet @hovaesco . Please take a look again.

Copy link
Contributor

@mdesmet mdesmet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice simplifications in this PR.

@hashhar hashhar force-pushed the hashhar/prepared-statement-fix branch from cc9199d to 1b47a3c Compare October 3, 2022 12:32
@hashhar
Copy link
Member Author

hashhar commented Oct 3, 2022

@ebyhr @hovaesco Added an explicit test for prepared statement + also verified that we don't leak prepared statements in the ClientSession.

@hashhar hashhar force-pushed the hashhar/prepared-statement-fix branch from 1b47a3c to fcfb321 Compare October 3, 2022 12:35
@mdesmet
Copy link
Contributor

mdesmet commented Oct 3, 2022

Python: analysis.........................................................Failed

  • hook id: flake8
  • exit code: 1

tests/integration/test_dbapi_integration.py:1071:1: E302 expected 2 blank lines, found 1

Python: types............................................................Passed
Error: Process completed with exit code 1.

That's why I quite like the pre-commit checks to be present as git hooks. Note that if you haven't explicitly installed the pre-commit git hook (running pre-commit install, it wouldn't run on commit. Maybe we should consider activating this? It would save us some CI minutes and a failing build.

cc: @hovaesco, @hashhar

The prepared statement handling code assumed that for each query we'll
always receive some non-empty response even after the initial response
which is not a valid assumption.

This assumption worked because earlier Trino used to send empty fake
results even for queries which don't return results (like PREPARE and
DEALLOCATE) but is now invalid with
trinodb/trino@bc794cd.

The other problem with the code was that it leaked HTTP protocol details
into dbapi.py and worked around it by keeping a deep copy of the request
object from the PREPARE execution and re-using it for the actual query
execution.

The new code fixes both issues by processing the prepared statement
headers as they are received and storing the resulting set of active
prepared statements on the ClientSession object. The ClientSession's set
of prepared statements is then rendered into the prepared statement
request header in TrinoRequest. Since the ClientSession is created and
reused for the entire Connection this also means that we can now
actually implement re-use of prepared statements within a single
Connection.
@hashhar hashhar force-pushed the hashhar/prepared-statement-fix branch from fcfb321 to 7e6edff Compare October 3, 2022 14:27
@hashhar hashhar merged commit d5f779b into trinodb:master Oct 3, 2022
@hashhar hashhar deleted the hashhar/prepared-statement-fix branch October 3, 2022 14:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

4 participants