Gracefully handling db failure for /artifacts API #272
services/data/db_utils.py
```python
    Returns:
        `list` of artifacts filtered based on `attempt_id`
    """
```
Eh, you could just use type annotations instead; they are already used pretty widely in the metaflow-service codebase. And I don't think this docstring style is used anywhere else here.
```python
    Returns:
        `list` of filtered artifacts for the latest attempt
    """
def filter_artifacts_for_latest_attempt(artifacts: List[ArtifactRow]) -> List[ArtifactRow]:
```
🙈 I have switched to type hints, but they are not the most accurate. There is some nuance here: `List[ArtifactRow]` is, in fact, a list of dictionary representations of `ArtifactRow`, not a list of `ArtifactRow` objects.
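One way to make that annotation say what the list actually holds is a `TypedDict` describing the dictionary form of a row. This is only a sketch, assuming Python 3.8+: `ArtifactDict` and its keys are hypothetical (a small subset of what an artifact row contains), and the "keep the highest `attempt_id` per `task_id`" logic is an illustrative guess, not the real body of `filter_artifacts_for_latest_attempt`.

```python
from typing import Dict, List, TypedDict


class ArtifactDict(TypedDict):
    # Hypothetical subset of the keys in an artifact row's dictionary form.
    task_id: int
    name: str
    attempt_id: int


def filter_artifacts_for_latest_attempt(
    artifacts: List[ArtifactDict],
) -> List[ArtifactDict]:
    # Find the highest attempt_id seen for each task.
    latest: Dict[int, int] = {}
    for artifact in artifacts:
        task = artifact["task_id"]
        latest[task] = max(latest.get(task, artifact["attempt_id"]), artifact["attempt_id"])
    # Keep only artifacts that belong to that task's latest attempt.
    return [a for a in artifacts if a["attempt_id"] == latest[a["task_id"]]]
```

A `TypedDict` is a plain `dict` at runtime, so the annotation documents the shape without changing how rows are built or passed around.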
Context
After upgrading the service, Metaflow users were getting HTTP 500 errors when trying to access data for tasks belonging to some flows, although they could still access metadata. Slack thread with more context: https://outerboundsco.slack.com/archives/C02116BBNTU/p1636723735291800

The debug logs from the MD service looked like the following:
What Is the Root Cause
We are not handling the exception from the database cleanly here. At the linked line, `artifacts` is a `DBResponse` object, which carries a status code along with the result. We don't validate the response code of the DB's response; instead we pass `artifacts` straight to `filter_artifacts_for_latest_attempt`, and this is where the service fails. Based on the logs shared, we are getting a log line like the following:
```
2021-12-10 08:12:10.755,ERROR:AsyncPostgresDB:global:Exception occured
```
This line is emitted by the `aiopg_exception_handling` function. Ideally, we should check the `status_code` first and only then call `filter_artifacts_for_latest_attempt`.

This PR handles such database errors gracefully.
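A minimal sketch of the check described above. It assumes a `DBResponse`-like tuple with `response_code` and `body` fields (a stand-in defined here; the service's real class and field names may differ) and assumes that on failure `body` is a dict of error details rather than a list of rows. `artifacts_for_latest_attempt` is a hypothetical helper name, and the filter is simplified to keep the highest `attempt_id` per `task_id`.

```python
from collections import namedtuple
from typing import Any, Dict, List, Tuple

# Hypothetical stand-in for the service's DBResponse; the real class may differ.
DBResponse = namedtuple("DBResponse", ["response_code", "body"])


def filter_artifacts_for_latest_attempt(artifacts: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    # Simplified filter: keep the highest attempt_id per task_id.
    latest: Dict[Any, int] = {}
    for a in artifacts:
        latest[a["task_id"]] = max(latest.get(a["task_id"], a["attempt_id"]), a["attempt_id"])
    return [a for a in artifacts if a["attempt_id"] == latest[a["task_id"]]]


def artifacts_for_latest_attempt(db_response: DBResponse) -> Tuple[int, Any]:
    # Check the DB status first. On failure the body (assumed to be a dict of
    # error details) is not a list of rows, and iterating over it inside the
    # filter would raise a TypeError that surfaces as an opaque 500.
    if db_response.response_code != 200:
        return db_response.response_code, db_response.body
    return 200, filter_artifacts_for_latest_attempt(db_response.body)
```

Returning the database's own code and body on failure lets the handler report the real error explicitly, instead of crashing inside the filter with an unrelated exception.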