Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Escape url parameters in sqlalchemy connection strings #235

Merged
merged 1 commit into from
Oct 5, 2022

Conversation

mdesmet
Copy link
Contributor

@mdesmet mdesmet commented Sep 19, 2022

Currently it is not possible to create a Trino url in sqlalchemy urls containing characters like &, ' ' or @. The values should be properly encoded and decoded.

This PR provides a helper method that properly encodes the passed values and ensures that values are properly decoded before passing down as dbapi.connect parameters.

>>> URL(
...                         user="user@test.org/my_role",
...                         password="pass /*&",
...                         host="localhost",
...                         session_properties={"query_max_run_time": "1d"},
...                         http_headers={"trino": 1},
...                         extra_credential=[
...                             ("user1@test.org/my_role", "user2@test.org/my_role"),
...                             ("user3@test.org/my_role", "user36@test.org/my_role")],
...                         experimental_python_types=True,
...                         client_tags=["1 @& /\"", "sql"],
...                     )
'trino://user%40test.org%2Fmy_role:pass %2F*&@localhost:8080?source=trino-sqlalchemy&session_properties=%7B%22query_max_run_time%22%3A+%221d%22%7D&http_headers=%7B%22trino%22%3A+1%7D&extra_credential=%5B%28%27user1%40test.org%2Fmy_role%27%2C+%27user2%40test.org%2Fmy_role%27%29%2C+%28%27user3%40test.org%2Fmy_role%27%2C+%27user36%40test.org%2Fmy_role%27%29%5D&client_tags=%5B%221+%40%26+%2F%5C%22%22%2C+%22sql%22%5D&experimental_python_types=true'

Release notes

( ) This is not user-visible and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text:

* Support sqlalchemy url generation through `trino.sqlalchemy.URL` helper method

@cla-bot cla-bot bot added the cla-signed label Sep 19, 2022
@mdesmet mdesmet force-pushed the bug/sqlalchemy-urlencode branch 2 times, most recently from b934f2d to 3ac3aab Compare September 20, 2022 06:38
@@ -19,17 +20,29 @@ def setup(self):
"url, expected_args, expected_kwargs",
[
(
make_url("trino://user@localhost"),
make_url(trino_url(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's worth to keep existing tests and create new ones with URL method next to them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can simply assert that the string being generated is equal to the above string?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.

key: Optional[str] = None,
) -> str:
"""
Composes a SQLAlchemy connect string from the given database connection
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Composes a SQLAlchemy connect string from the given database connection
Composes a SQLAlchemy connection string from the given database connection

from sqlalchemy.engine.url import _rfc_1738_quote # noqa


def _url(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some parameters are missing like verify or redirect_handler

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OAuth is not yet supported in the url. I think we can leave that for another PR. I have added verify

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, just please create a follow-up issue for it after merging, Kerberos doesn't seem supported as well, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct I supported only the fields that are currently parsed in the sqlalchemy connection string (and added verify).

port=8080,
catalog="system",
user="user@test.org/my_role",
auth=BasicAuthentication("user@test.org/my_role", "pass /*&"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add test for other supported auth methods: JWT and Certificate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added both tests.

from sqlalchemy.engine.url import _rfc_1738_quote # noqa


def _url(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, just please create a follow-up issue for it after merging, Kerberos doesn't seem supported as well, right?

@mdesmet mdesmet force-pushed the bug/sqlalchemy-urlencode branch 2 times, most recently from c81c1db to 54d8603 Compare September 26, 2022 11:51
@mdesmet
Copy link
Contributor Author

mdesmet commented Sep 26, 2022

@hovaesco: PTAL

@mdesmet
Copy link
Contributor Author

mdesmet commented Sep 27, 2022

@hashhar, @ebyhr : Please have a look. This is important for Galaxy users to be able to use sqlalchemy.

README.md Outdated Show resolved Hide resolved
@mdesmet
Copy link
Contributor Author

mdesmet commented Oct 3, 2022

@hashhar : PTAL


if "extra_credential" in url.query:
kwargs["extra_credential"] = literal_eval(url.query["extra_credential"])
kwargs["extra_credential"] = literal_eval(unquote_plus(url.query["extra_credential"]))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unsafe to do literal_eval on user input. Might not be a concern for the client itself but any tool using the client then becomes impacted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that literal_eval only takes into account literals. So it is not actually evaluating arbitrary code only structural and literal types.

def literal_eval(node_or_string):
    """
    Safely evaluate an expression node or a string containing a Python
    expression.  The string or node provided may only consist of the following
    Python literal structures: strings, bytes, numbers, tuples, lists, dicts,
    sets, booleans, and None.
    """

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's safe from executing arbitrary code but I can still crash the Python VM with my input. Try "1 + 100"*1000000 as input.

My point is there are alternative ways to serialize this and we should use them instead of doing the easy thing.

trino_url += f"&http_headers={quote_plus(json.dumps(http_headers))}"

if extra_credential is not None:
trino_url += f"&extra_credential={quote_plus(repr(extra_credential))}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using repr for serialization seems like a shortcut.

Copy link
Contributor Author

@mdesmet mdesmet Oct 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

json.dumps converts tuples into array. that's why we can't use it there. I have added a comment to explain the reasoning.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean we can write our own value class over this list of tuple, serialize the value class, deserialize value class and provide a method on value class to convert back into list of tuple.

@mdesmet
Copy link
Contributor Author

mdesmet commented Oct 5, 2022

@hashhar : PTAL. I have not added another class as it is a simple one liner using the typical json.loads recipe but looping through it and instantiating a tuple based on the array.

kwargs["extra_credential"] = [
    tuple(extra_credential) for extra_credential in json.loads(unquote_plus(url.query["extra_credential"]))
]

@hashhar hashhar merged commit cbda49c into trinodb:master Oct 5, 2022
@hashhar hashhar mentioned this pull request Oct 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

3 participants