-
Notifications
You must be signed in to change notification settings - Fork 14k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(oauth2): add support for trino #30081
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
e08135d
to
a79dbe0
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #30081 +/- ##
===========================================
+ Coverage 60.48% 83.91% +23.42%
===========================================
Files 1931 534 -1397
Lines 76236 38788 -37448
Branches 8568 0 -8568
===========================================
- Hits 46114 32549 -13565
+ Misses 28017 6239 -21778
+ Partials 2105 0 -2105
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Looks like @betodealmeida was working on #30126 at the same time which might be related -- Beto, are you able to review this one? |
tagging @betodealmeida as he has most context to review this |
@@ -192,3 +192,8 @@ class OAuth2ClientConfigSchema(Schema): | |||
) | |||
authorization_request_uri = fields.String(required=True) | |||
token_request_uri = fields.String(required=True) | |||
request_content_type = fields.String( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might be mistaken, but I believe this validation is only applied to the client_info contained within encrypted_extra
when provided by a user. Is the intent that this is where the request_content_type
is going to be provided, or is it going to be set as a default for the Trino engine spec?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I think the idea is that DB engine specs can have a default content type (defined in the oauth2_token_request_type
class attribute), but it can be overridden on a per-database basis by setting it in the encrypted_extra
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome! I left a few comments but nothing blocking.
if config["request_content_type"] == "data": | ||
return requests.post(uri, data=req_body, timeout=timeout).json() | ||
return requests.post(uri, json=req_body, timeout=timeout).json() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
I think that the most common (standard?) workflow is to send form encoded data, and not JSON. It's just that the first OAuth2 implementation done for Superset targeted GSheets, and Google uses JSON instead of form encoded (it's the same for BigQuery, for example). But now changing the default would break existing databases, so we need to leave JSON as the default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that data is likely the more common implementation. As we're still in early days, I feel it's well motivated to introduce a breaking change, as long as we have an UPDATING.md
explaining the required steps to get it to work again. This in the interest of having a clean/idiomatic API in the long term..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@villebro yeah, that sounds good to me.
@@ -192,3 +192,8 @@ class OAuth2ClientConfigSchema(Schema): | |||
) | |||
authorization_request_uri = fields.String(required=True) | |||
token_request_uri = fields.String(required=True) | |||
request_content_type = fields.String( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, I think the idea is that DB engine specs can have a default content type (defined in the oauth2_token_request_type
class attribute), but it can be overridden on a per-database basis by setting it in the encrypted_extra
.
superset/db_engine_specs/trino.py
Outdated
if isinstance(instance, HttpError): | ||
return "error 401: b'Invalid credentials'" in str(instance) | ||
return False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, I like this approach!
You can simplify this to:
return isinstance(instance, HttpError) and "error 401: b'Invalid credentials'" in str(instance)
@pytest.fixture | ||
def oauth2_config() -> OAuth2ClientConfig: | ||
""" | ||
Config for GSheets OAuth2. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Config for GSheets OAuth2. | |
Config for Trino OAuth2. |
tried my luck fixing the merge conflict in the GitHub web editor, didn't pull the branch and run CI, so no pre-commit checks.... let's see if I get lucky, if not feel free to disregard my commit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments, other than that LGTM
if config["request_content_type"] == "data": | ||
return requests.post(uri, data=req_body, timeout=timeout).json() | ||
return requests.post(uri, json=req_body, timeout=timeout).json() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that data is likely the more common implementation. As we're still in early days, I feel it's well motivated to introduce a breaking change, as long as we have an UPDATING.md
explaining the required steps to get it to work again. This in the interest of having a clean/idiomatic API in the long term..
superset/db_engine_specs/base.py
Outdated
oauth2_authorization_request_uri: str | None = "" # pylint: disable=invalid-name | ||
oauth2_token_request_uri: str | None = "" | ||
oauth2_token_request_type: str | None = "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: If we're defaulting to ""
, when might these still be None
?
@villebro @betodealmeida looks like we have a few nits and unresolved questions here, but the thread has gone quiet. If @joaoferrao doesn't respond, would anyone want to block this, or should someone smash that merge button? |
(note that code suggestions can be committed here by the person offering them, btw!) |
Sorry, I will take a look at the comments over the coming days. I’ve been rather busy so I haven’t had time to refocus attention again.
|
…d query execution
1bb9303
to
4011e69
Compare
@betodealmeida @villebro I have addressed the main comments, I believe. One question I still made when opening my PR is still open:
I don't know enough about superset, but should this be addressed in a separate PR? Basically, I had to do a lot of hacking to be able to add the connection without the |
I’ve been working on setting up OAuth2.0 integration for Trino with Okta in Superset, following the guidance from this PR (#30081). However, despite configuring authorization_request_uri, token_request_uri, and handling impersonation, I’m still encountering issues with 401 Unauthorized errors when trying to connect to Trino. trino_oauthDATABASE_OAUTH2_REDIRECT_URI: "http://localhost:8088/api/v1/database/oauth2/" {"connect_args":{"http_scheme":"https"}} Impersonate: true is in place as well and UI shows the correct configs. I've reviewed the setup and ensured that the OAuth tokens are obtained correctly from Okta, and all necessary permissions seem in place. Are there any additional configurations or specifics for Trino that this PR setup might require? Any guidance or insights on further troubleshooting would be greatly appreciated! Thank you! |
A bit more detail would help fine tune the issue. Can I ask if you, even if redundantly:
If you are sure you are getting the access token back from the IdP (after being requested to authenticate), then this is what is sent to trino: if backend_name == "trino" and username is not None:
connect_args["user"] = username
if access_token is not None:
http_session = requests.Session()
http_session.headers.update({"Authorization": f"Bearer {access_token}"})
connect_args["http_session"] = http_session |
@joaoferrao the first part works. I verified that. I am unclear about the username part. Can you shed some more light on that? |
Yeah, this was a chicken-and-egg problem. The token is stored associated with the database ID, so you can't do OAuth2 before the database is created. But for the database to be created, we require a successful connection, which in this case is impossible. I added some logic in #30071 for Snowflake so that if the test connection fails because OAuth2 is needed then the test connection command succeeds: Wasn't this logic enough for the same flow with Trino? |
@betodealmeida that is indeed very similar to what I did for me to test the feature, just that I didn't publish it, because I didn't delve enough to the Thanks for the feedback. I see all checks are passed and my PR is still approved. Do I assume correctly I should merge it? |
Take what I'm saying with a grain of salt, but the point of impersonation is for your superset user to be impersonated upon the request to trino. Whatever the username is for superset, is going to be used in the |
@joaoferrao I did verify the command trino --server='' --external-authentication=true with my okta username and that works on the browser but throws cannot impersonate user error in the end on the cluster. Tried with making that setting false still no luck. |
Does it or doesn't in work when you remove superset from the equation? The previous comment left me a bit confused about the scenario you described.
|
@joaoferrao thanks for all the pointers. Noticed that trino is using the email and superset uses okta token as the username hence the mismatch and 401. I will need to change the setup on either trino or superset side to get the authentication correct. |
Thanks for working on this, @joaoferrao! |
Hello @joaoferrao, do you have a solution to create the connection to Trino with oauth 2 enabled in a production environment ? |
SUMMARY
Under #27631 under #20300
It also fixes an issue not totally resolved here #29981, which is required for OAuth2 to work for trino.
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
Screen.Recording.2024-08-31.at.16.57.22.mov
TESTING INSTRUCTIONS
superset_docker_config.py
:ADDITIONAL INFORMATION
Need feedback with:
We still need to trigger this OAuth2 dance in, at least, 2 contexts (I don't know much about superset, possibility there are more):