Allow for larger scale submissions in Inference when moving from Preview to Available #176
Comments
@psyhtest We have had similar discussions previously, and issues were raised on this (in a different context) when a single model is split across multiple GPUs: the performance per accelerator might be better on a larger-scale system. So a rule change for this might be tricky, but maybe the WG can agree on the "similarity" of the Available and Preview systems?
I agree that Offline may be affected, but could Server latency constraints counterbalance that?
The same issue can happen for the Server scenario too, right? But if the model is not split across multiple GPUs, maybe we can make a rule proposal.
If the Preview submission was on, say, 4 accelerators, and Available submissions are made on, say, 6 accelerators as well as 2 accelerators, and in both cases the performance per accelerator is greater than that of the Preview system, then I think the 4-accelerator submission may not be needed. Another proposal would be to run just the Offline scenario (maybe even in Open) with the same number of accelerators if a larger-scale system is submitted as Available.
Instead of an amendment asking for permission before submission, it might be worth changing the main rules themselves to permit an Available submission if both of these conditions are satisfied:
Another scenario to consider: having made a Preview submission with an old Available server (e.g. v5) equipped with then-Preview accelerators, a submitter may want to make an Available submission with a newer server (e.g. v6) equipped with the now-Available accelerators.
WG notes: @psyhtest to draft PR
@psyhtest @mrmhodak @mrasquinha-g The deadline is close. Can Inference submitters assume that this rule change applies to v4.1, and thus save some effort in their Preview→Available submissions? What is the conclusion?
@ashwin I think it is better if the submitter requests and gets a waiver from the WG for v4.1.
A number of Preview systems in MLPerf Inference v4.0 used fewer cards than would be typical in production, due to the limited availability of cards at the time. Rather than benchmarking these systems with exactly the same, atypical number of cards as in Preview, it would be desirable to benchmark them in a more typical configuration with a higher number of cards. Of course, for Available submissions the performance per accelerator would still need to be demonstrated to be equal to or better than in the Preview submissions.
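The per-accelerator condition above can be sketched as a simple check. This is an illustrative sketch, not part of the MLPerf rules or tooling; the function name and the example throughput numbers are hypothetical.

```python
# Hypothetical helper illustrating the per-accelerator performance check
# discussed above: an Available system qualifies only if its throughput
# per accelerator is equal to or better than the Preview system's.
# All names and numbers here are illustrative, not from the MLPerf rules.

def meets_per_accelerator_rule(preview_qps: float, preview_accels: int,
                               available_qps: float, available_accels: int) -> bool:
    """Return True if the Available system's per-accelerator throughput
    matches or exceeds the Preview system's."""
    return (available_qps / available_accels) >= (preview_qps / preview_accels)

# Example: Preview ran 4 cards at 10,000 QPS total (2,500 QPS/card).
# An 8-card Available system at 21,000 QPS (2,625 QPS/card) qualifies;
# the same system at 19,000 QPS (2,375 QPS/card) does not.
print(meets_per_accelerator_rule(10_000, 4, 21_000, 8))  # True
print(meets_per_accelerator_rule(10_000, 4, 19_000, 8))  # False
```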
We have a similar provision in the submission policies, but at the moment it only covers Training: