Tasks/WP-190: Handle concurrency with Tapis OAuth Token Refresh #932
* Fix and enable shared workspaces unit test * Remove submodule added in a previous PR
initial commit
initial commit
…l into tasks/WP-190-Tapis-Mutex
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #932 +/- ##
==========================================
- Coverage 65.40% 65.25% -0.16%
==========================================
Files 437 438 +1
Lines 12653 12681 +28
Branches 2667 2636 -31
==========================================
- Hits 8276 8275 -1
- Misses 4140 4169 +29
Partials 237 237
Looks good I think
Co-authored-by: Sal Tijerina <r.sal.tijerina@gmail.com>
I haven’t had the chance to test it yet, but I appreciate the detailed PR description, particularly the 'Possible Solutions' section 💯
👍
server/portal/apps/auth/models.py
Outdated
    client.refresh_tokens()
except Exception:
    logger.exception('Tapis Token refresh failed')
    raise
On failure, perhaps we should call logout
Good idea, will test it and share info.
The current logic lives entirely in a model, which cannot issue an HTTP redirect or control view responses; that has to happen in a view. I set up a custom exception and handled it in the Base View to send a 401 back to the client. When I tested by forcing an error, the 401 turned into a redirect to Tapis OAuth, but that redirect failed with a CORS policy error. I still need to check whether this is specific to my local setup or something else.
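To make the idea concrete, here is a minimal, framework-agnostic sketch of the pattern described in this comment: the model-level code raises a dedicated exception, and a wrapper standing in for the base view turns it into a 401. The names `TokenRefreshError`, `refresh_or_raise`, and `handle_token_errors` are hypothetical, and responses are plain dicts rather than Django response objects; this is not the code in the PR.

```python
import functools
import logging

logger = logging.getLogger(__name__)


class TokenRefreshError(Exception):
    """Hypothetical exception raised when the Tapis token refresh fails."""


def refresh_or_raise(client):
    """Model-level helper: refresh the token, surfacing one known error type."""
    try:
        client.refresh_tokens()
    except Exception as exc:
        logger.exception('Tapis Token refresh failed')
        raise TokenRefreshError('Tapis token refresh failed') from exc


def handle_token_errors(view):
    """Stand-in for a base view: map TokenRefreshError to a 401 response."""
    @functools.wraps(view)
    def wrapper(request):
        try:
            return view(request)
        except TokenRefreshError as exc:
            # The model cannot redirect; the view layer decides the response.
            return {'status': 401, 'body': {'message': str(exc)}}
    return wrapper
```

Keeping the exception type narrow means unrelated failures still propagate as server errors instead of being reported as authentication problems.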
I came to that realization as well today. I tried another solution with DesignSafe, which is to put the refresh logic in a middleware. In CEP we originally moved that logic from middleware to the client() method, but I think the solution you propose here might solve the original issue there?
Haven't tested yet, what are your thoughts?
https://github.com/DesignSafe-CI/portal/blob/task/DES-2702--tapis-v3-oauth/designsafe/apps/auth/middleware.py
@rstijerina - sorry for the delay in responding, I missed this note.
I looked at the code in that branch. It looks good. One comment on the overall integration:
- Should you do this also for logout?
logout(request)
return HttpResponseRedirect(reverse('designsafe_auth:login'))
Some testing aspects:
- Behavior on xhr requests when tapis token expiry fails. If xhr does not handle 302 cleanly, some extra check and redirect might be needed.
- Walking through the code, it is protected from an infinite loop, which is good. If the refresh fails, the user goes to login, which has to authenticate with Tapis. If authentication works, the middleware returns immediately on the next request (because the new token has not expired) and processing moves on.
Should you do this also for logout?
logout(request)
return HttpResponseRedirect(reverse('designsafe_auth:login'))
Yes, thanks. Added:
https://github.com/DesignSafe-CI/portal/blob/task/DES-2709--v3-apps-views/designsafe/apps/auth/middleware.py
Behavior on xhr requests when tapis token expiry fails. If xhr does not handle 302 cleanly, some extra check and redirect might be needed.
Can you expand more on this part? Where would tapis token expiry fail?
Behavior on xhr requests when tapis token expiry fails. If xhr does not handle 302 cleanly, some extra check and redirect might be needed.
Can you expand more on this part? Where would tapis token expiry fail?
I meant: behavior on XHR requests when the Tapis token expires and the refresh fails. This will hit the logout code and send a 302 back to the client. If the JavaScript response handling does not handle a 302 cleanly (page rendering after the 302, etc.), extra logic may be needed to check for the 302 status and a specific error type (token expired), and then set location href to logout.
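One possible server-side way to address this concern, sketched under assumptions: when the refresh fails, return a 401 with a machine-readable error for XHR requests and keep the 302 for full page loads. The `X-Requested-With` header check, the `LOGIN_URL` value, and the `token_expired` error code are all hypothetical, and responses are plain dicts rather than Django response objects.

```python
LOGIN_URL = '/auth/login/'  # hypothetical login route


def is_xhr(headers):
    """Guess whether a request came from script rather than page navigation.

    Assumes the client sets X-Requested-With or asks for JSON; not every
    client library does this by default.
    """
    normalized = {key.lower(): value for key, value in headers.items()}
    if normalized.get('x-requested-with') == 'XMLHttpRequest':
        return True
    return 'application/json' in normalized.get('accept', '')


def refresh_failure_response(headers):
    """Choose the response to send after a failed token refresh."""
    if is_xhr(headers):
        # Script callers get a status they can inspect and act on.
        return {
            'status': 401,
            'headers': {},
            'body': {'error': 'token_expired', 'login_url': LOGIN_URL},
        }
    # Page navigations can follow a normal redirect to login.
    return {'status': 302, 'headers': {'Location': LOGIN_URL}, 'body': None}
```

With this split, the client only needs one check (a 401 with `token_expired`) before sending the browser to the login page.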
@rstijerina - regarding this PR: if middleware is the right place for auth and it is working in DesignSafe, should I do the same here and start testing? What is your opinion?
The solution I have in DesignSafe does work, here are example logs from a refresh that just occurred for me:
des_django | [DJANGO] INFO 2024-04-12 14:28:57,764 middleware designsafe.apps.auth.middleware.process_request:49: Tapis OAuth token expired for user sal. Refreshing token
des_django | [DJANGO] INFO 2024-04-12 14:28:57,769 middleware designsafe.apps.auth.middleware.process_request:49: Tapis OAuth token expired for user sal. Refreshing token
des_django | [DJANGO] INFO 2024-04-12 14:28:57,775 middleware designsafe.apps.auth.middleware.process_request:61: Refreshing Tapis OAuth token
des_django | [DJANGO] INFO 2024-04-12 14:29:02,626 middleware designsafe.apps.auth.middleware.process_request:72: Token updated by another request. Refreshing token from DB.
It might not be fool-proof though, and could definitely use more testing
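The log sequence above reads like a check, lock, re-check pattern: several requests see the expired token, one refreshes it, and the others pick up the updated token once they get the lock. A minimal in-process sketch of that pattern follows. `TokenStore`, `fake_refresh`, and `get_token` are hypothetical names, a `threading.Lock` stands in for the database row lock, and this is not the DesignSafe middleware code.

```python
import threading
import time


class TokenStore:
    """In-memory stand-in for the database row holding a user's token."""

    def __init__(self):
        self.access_token = 'old'
        self.expires_at = 0.0  # already expired
        self.refresh_count = 0
        self.row_lock = threading.Lock()  # plays the role of the row lock


def fake_refresh(store):
    """Pretend to call the OAuth server and store the new token."""
    time.sleep(0.05)
    store.refresh_count += 1
    store.access_token = 'new'
    store.expires_at = time.time() + 3600


def get_token(store, log):
    """Return a valid token, refreshing at most once across callers."""
    if store.expires_at > time.time():
        return store.access_token  # fast path: no lock needed
    log.append('token expired')
    with store.row_lock:
        # Re-check after acquiring the lock: another caller may have
        # refreshed the token while this one was waiting.
        if store.expires_at > time.time():
            log.append('token updated by another request')
            return store.access_token
        log.append('refreshing token')
        fake_refresh(store)
        return store.access_token
```

The re-check inside the lock is what keeps waiting callers from issuing their own refresh after the first one finishes.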
Perhaps we could talk about best place for token refresh in the next infra scrum?
Looks great! And tested well
Re-working this PR as middleware similar to https://github.com/DesignSafe-CI/portal/blob/task/DES-2709--v3-apps-views/designsafe/apps/auth/middleware.py
Looks good!
* Handle validation for FORK job type * Prettier fix * Remove unrelated fix * Adjust Yup validation and initialValues
Bumps [sqlparse](https://github.com/andialbrecht/sqlparse) from 0.4.4 to 0.5.0. - [Changelog](https://github.com/andialbrecht/sqlparse/blob/master/CHANGELOG) - [Commits](andialbrecht/sqlparse@0.4.4...0.5.0) --- updated-dependencies: - dependency-name: sqlparse dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Chandra Y <cyemparala@tacc.utexas.edu>
Bumps [idna](https://github.com/kjd/idna) from 3.4 to 3.7. - [Release notes](https://github.com/kjd/idna/releases) - [Changelog](https://github.com/kjd/idna/blob/master/HISTORY.rst) - [Commits](kjd/idna@v3.4...v3.7) --- updated-dependencies: - dependency-name: idna dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Chandra Y <cyemparala@tacc.utexas.edu>
Bumps [werkzeug](https://github.com/pallets/werkzeug) from 3.0.1 to 3.0.3. - [Release notes](https://github.com/pallets/werkzeug/releases) - [Changelog](https://github.com/pallets/werkzeug/blob/main/CHANGES.rst) - [Commits](pallets/werkzeug@3.0.1...3.0.3) --- updated-dependencies: - dependency-name: werkzeug dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [braces](https://github.com/micromatch/braces) from 3.0.2 to 3.0.3. - [Changelog](https://github.com/micromatch/braces/blob/master/CHANGELOG.md) - [Commits](micromatch/braces@3.0.2...3.0.3) --- updated-dependencies: - dependency-name: braces dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Chandra Y <cyemparala@tacc.utexas.edu>
Bumps [urllib3](https://github.com/urllib3/urllib3) from 1.26.18 to 1.26.19. - [Release notes](https://github.com/urllib3/urllib3/releases) - [Changelog](https://github.com/urllib3/urllib3/blob/1.26.19/CHANGES.rst) - [Commits](urllib3/urllib3@1.26.18...1.26.19) --- updated-dependencies: - dependency-name: urllib3 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Chandra Y <cyemparala@tacc.utexas.edu>
Bumps [ws](https://github.com/websockets/ws) from 7.5.7 to 7.5.10. - [Release notes](https://github.com/websockets/ws/releases) - [Commits](websockets/ws@7.5.7...7.5.10) --- updated-dependencies: - dependency-name: ws dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Chandra Y <cyemparala@tacc.utexas.edu>
Bumps [certifi](https://github.com/certifi/python-certifi) from 2023.7.22 to 2024.7.4. - [Commits](certifi/python-certifi@2023.07.22...2024.07.04) --- updated-dependencies: - dependency-name: certifi dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [django](https://github.com/django/django) from 4.2.11 to 4.2.14. - [Commits](django/django@4.2.11...4.2.14) --- updated-dependencies: - dependency-name: django dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [setuptools](https://github.com/pypa/setuptools) from 68.2.2 to 70.0.0. - [Release notes](https://github.com/pypa/setuptools/releases) - [Changelog](https://github.com/pypa/setuptools/blob/main/NEWS.rst) - [Commits](pypa/setuptools@v68.2.2...v70.0.0) --- updated-dependencies: - dependency-name: setuptools dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Chandra Y <cyemparala@tacc.utexas.edu>
* Allow docs to be behind login * Lint fix * Customize url path * Fix unit test * conditionally add urls.py
* TAS: Country is no longer available. * Adjust tests
Bumps [django](https://github.com/django/django) from 4.2.14 to 4.2.15. - [Commits](django/django@4.2.14...4.2.15) --- updated-dependencies: - dependency-name: django dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Chandra Y <cyemparala@tacc.utexas.edu>
Bumps [twisted](https://github.com/twisted/twisted) from 23.10.0 to 24.7.0. - [Release notes](https://github.com/twisted/twisted/releases) - [Changelog](https://github.com/twisted/twisted/blob/trunk/NEWS.rst) - [Commits](twisted/twisted@twisted-23.10.0...twisted-24.7.0) --- updated-dependencies: - dependency-name: twisted dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Working version of execution system changes + get available systems + handle client side scenarios * Add tests, fix issues found from unit tests * Fix job history and also lint * Use one attribute for exec systems instead of two * Fix formatting * Sort system list in UI * Address code review comments * Adjusted exec system label text and fixed jest tests * Working version of execution system changes + get available systems + handle client side scenarios * Add tests, fix issues found from unit tests * Fix job history and also lint * Use one attribute for exec systems instead of two * Fix formatting * Sort system list in UI * Address code review comments * Adjusted exec system label text and fixed jest tests * Fix bug related to job history execution system * Merge fix * Redo the fix on job history * Rework commit for exec and allocation * Prettier, lint and test fix * Fix merge issues * Make exec system dependent on allocation * Fix job history display of system name * Handle max memory on a system * Validation for execSystemId and fix express VM job submission --------- Co-authored-by: Sal Tijerina <r.sal.tijerina@gmail.com>
initial commit
initial commit
👍
Looks good!!
Overview
When the Tapis OAuth token expires for a user and multiple Tapis API calls are made concurrently (for example, on a page refresh), every one of them sends a request to Tapis to refresh the token. These duplicate requests waste resources and slow performance.
The solution is to make only one refresh request per user when the token expires. Any solution should work with requests distributed across multiple processes.
Possible Solutions
django select_for_update
django-db-mutex
This PR uses select_for_update since it is readily available in Django and makes waiting requests block until the row lock is released.
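To illustrate why a database-level lock works across processes, here is a rough stdlib-only sketch. It does not use Django or select_for_update; instead it swaps in SQLite's `BEGIN IMMEDIATE`, which takes a database-wide write lock, as a stand-in for the row lock. The table layout, `init_db`, `read_token`, and `ensure_fresh_token` are all hypothetical, and the sleep stands in for the call to Tapis.

```python
import sqlite3
import time


def init_db(path):
    """Create a one-row token table with an already expired token."""
    conn = sqlite3.connect(path)
    conn.execute(
        'CREATE TABLE token (username TEXT PRIMARY KEY, access_token TEXT, '
        'expires_at REAL, refresh_count INTEGER)'
    )
    conn.execute("INSERT INTO token VALUES ('sal', 'old', 0, 0)")
    conn.commit()
    conn.close()


def read_token(conn, username):
    return conn.execute(
        'SELECT access_token, expires_at FROM token WHERE username = ?',
        (username,),
    ).fetchone()


def ensure_fresh_token(path, username):
    """Refresh the token at most once, even with concurrent callers."""
    # isolation_level=None: transactions are opened and closed explicitly.
    conn = sqlite3.connect(path, timeout=10, isolation_level=None)
    try:
        token, expires_at = read_token(conn, username)
        if expires_at > time.time():
            return token  # fast path: no lock taken
        conn.execute('BEGIN IMMEDIATE')  # blocks until the write lock is free
        try:
            # Re-read under the lock; another caller may have refreshed.
            token, expires_at = read_token(conn, username)
            if expires_at <= time.time():
                time.sleep(0.05)  # stand-in for the OAuth refresh request
                token = 'new'
                conn.execute(
                    'UPDATE token SET access_token = ?, expires_at = ?, '
                    'refresh_count = refresh_count + 1 WHERE username = ?',
                    (token, time.time() + 3600, username),
                )
            conn.execute('COMMIT')
        except Exception:
            conn.execute('ROLLBACK')
            raise
        return token
    finally:
        conn.close()
```

Because the lock lives in the database rather than in process memory, the same guarantee holds whether the callers are threads or separate worker processes, which is the requirement stated above.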
Update
The PR now handles the token refresh in middleware (similar to the solution used in DesignSafe).
Related
Changes
Testing
Expire token
Test cases:
Whole page refresh - triggers multiple concurrent requests with an expired token:
** Only one "Refreshing Tapis OAuth token" message is seen; the rest are all waiting for the row lock
** 2 of the waiting transactions get the updated access token info and process the request, and the log message "Refreshing token from DB" is seen for those 2 waiting transactions
Single request:
** Only one "Refreshing Tapis OAuth token" message is seen; the rest are all waiting for the row lock
** No other request acquired the row lock because they already saw the non-expired token
Basic UI Sanity Tests
Ran through basic UI sanity tests to ensure acquiring client does not fail
UI
Notes