Retry decorator #864

Merged
merged 11 commits into from
Feb 17, 2025

bkorycki
Contributor

A retry decorator that SUTs can use on their evaluate method. It is designed around the idea that there are two types of exceptions: those that should only be retried a handful of times and those that should be retried more persistently until success (e.g. rate limits).
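
A minimal sketch of the two-tier idea described above, not the code merged in this PR. `MAX_BACKOFF`, the `transient_exceptions` parameter, and the one-day retry window appear in the review threads; `BASE_RETRY_COUNT`, the name `MAX_RETRY_DURATION`, and the doubling backoff schedule are assumptions made for illustration.

```python
import functools
import time

MAX_BACKOFF = 60  # seconds; cap on any single wait (value from the PR diff)
BASE_RETRY_COUNT = 3  # assumption: "a handful" of attempts for ordinary errors
MAX_RETRY_DURATION = 24 * 60 * 60  # seconds; 1 day, per the comment in the diff


def retry(transient_exceptions=None):
    """Retry transient exceptions for up to a day; retry anything else a few times."""
    # An empty tuple in an `except` clause matches nothing.
    transient = tuple(transient_exceptions or ())

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            attempt = 0  # total failures so far; drives the backoff
            other_failures = 0  # failures that were not transient
            start = time.monotonic()
            while True:
                try:
                    return func(*args, **kwargs)
                except transient:
                    # Persistent retries, bounded only by elapsed time.
                    if time.monotonic() - start >= MAX_RETRY_DURATION:
                        raise
                except Exception:
                    # Everything else gets a small, fixed number of attempts.
                    other_failures += 1
                    if other_failures >= BASE_RETRY_COUNT:
                        raise
                attempt += 1
                # Exponential backoff (2, 4, 8, ... seconds), capped at MAX_BACKOFF.
                time.sleep(min(2 ** attempt, MAX_BACKOFF))

        return wrapper

    return decorator
```

Under these assumptions, a SUT would decorate its evaluate method with something like `@retry(transient_exceptions=[ConnectionError])`, where the exception list is whatever that client raises for rate limits.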


github-actions bot commented Feb 13, 2025

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

Contributor

@rogthefrog rogthefrog left a comment

Nice.

tests/modelgauge_tests/test_retry_decorator.py (outdated)
MAX_BACKOFF = 60 # 1 minute in seconds


def retry(unacceptable_exceptions=None):
Collaborator

This naming was very confusing to me on first read. Would it still make sense to use a parameter like exceptions_to_retry or transient_exceptions or something similar?

Contributor

Yes, same.

Contributor Author

I like transient_exceptions! Updated.

Contributor

@wpietri wpietri left a comment

Great progress, and I like that you've applied it to some of the clients.

src/modelgauge/retry_decorator.py
try:
    return func(*args, **kwargs)
except unacceptable_exceptions as e:
    # Keep retrying "unacceptable" exceptions for 1 day.
Contributor

This could definitely use some logging. Basic logging is now configured in run.py:cli. I'd say log anything you think will be useful for a) figuring out what the problem is with a service, and b) verifying that our retry logic is sensible.

Contributor Author

Ok added.
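
For illustration only, the kind of logging requested in this thread might look like the sketch below. It is not the logging that was actually added in the PR. The helper name `log_retry`, its parameters, and the message format are all hypothetical; the only library calls used are from Python's standard `logging` module.

```python
import logging

# Module-level logger; it inherits whatever handlers and level the application
# configures at startup (the thread says this happens in run.py:cli).
logger = logging.getLogger(__name__)


def log_retry(func_name, exc, attempt, wait_seconds, elapsed_seconds):
    """Emit one warning per failed attempt, just before sleeping.

    Hypothetical helper: records which call failed, with what exception,
    how many attempts have been made, and how long the next wait will be.
    """
    logger.warning(
        "%s failed with %s: %s (attempt %d, %.0fs elapsed); retrying in %ds",
        func_name,
        type(exc).__name__,
        exc,
        attempt,
        elapsed_seconds,
        wait_seconds,
    )
```

Logging the exception type helps diagnose a misbehaving service, while the attempt count and wait time make it possible to check after the fact that the backoff behaved as intended.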

@bkorycki bkorycki temporarily deployed to Scheduled Testing February 14, 2025 18:24 — with GitHub Actions Inactive
Contributor

@wpietri wpietri left a comment

Much improved. Looks great!

@bkorycki bkorycki merged commit 5e4f5b1 into main Feb 17, 2025
4 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Feb 17, 2025
4 participants