-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Retry decorator #864
Retry decorator #864
Conversation
MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice.
src/modelgauge/retry_decorator.py
Outdated
MAX_BACKOFF = 60 # 1 minute in seconds | ||
|
||
|
||
def retry(unacceptable_exceptions=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This naming was very confusing to me on first read. Would it still make sense to use a parameter like exceptions_to_retry
or transient_exceptions
or something similar?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like transient_exceptions
! Updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great progress, and I like that you've applied it to some of the clients.
src/modelgauge/retry_decorator.py
Outdated
MAX_BACKOFF = 60 # 1 minute in seconds | ||
|
||
|
||
def retry(unacceptable_exceptions=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, same.
src/modelgauge/retry_decorator.py
Outdated
try: | ||
return func(*args, **kwargs) | ||
except unacceptable_exceptions as e: | ||
# Keep retrying "unacceptable" exceptions for 1 day. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could definitely use some logging. Basic logging now is configured in run.py:cli
. I'd say log anything you think will be interesting in trying to a) figure out what the problem is with a service, and b) verifying that our retry logic is sensible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Much improved. Looks great!
A retry decorator that SUTs can use on their
evaluate
method. It is designed around the idea that there are two types of exceptions: those that should only retried a handful of times and those that should be more persistently retried until success (e.g. rate limits).