Clarify allowed changes in the system scale for Inference #178
Closed
@@ -732,9 +732,18 @@ An _Available_ software component must be well supported for general use. For op
 
 #### Preview Systems
 
-A _Preview_ system is a system which did not qualify as an _Available_ system as of the previous MLPerf submission date, but will qualify in the next submission after 140 days of the current submission date, or by the next MLPerf submission date, whichever is later, and which the submitter commits to submitting as an _Available_ system by that time. If it is not submitted in that submission round with equal or better performance (allowing for noise), the _Preview_ benchmark will be marked as invalid. A _Preview_ submission must include performance on at least one benchmark which will be considered _MLPerf Compatible_ (xref:MLPerf_Compatibility_Table.adoc[see the MLPerf Compatibility Table]) in the upcoming round where the transition to _Available_ is made (consult SWG for the Benchmark Roadmap). On each of the benchmarks that are previewed and are _Compatible_, the _Available_ submission must show equal or better performance (allowing for noise, for any changes to the benchmark definition) on all systems for Inference and across at least the smallest and the largest scale of the systems used for _Preview_ submission on that benchmark for Training (e.g. _Available_ Training submissions can be on scales smaller than the smallest and larger than the largest scale used for _Preview_ submission). For submissions accompanied by power measurements, "equal or better" must use power-normalized performance rather than absolute performance.
-
-* Training: For an _Available_ system that is larger than the _Preview_ system, absolute performance must be better. For an _Available_ system that is smaller than the _Preview_ system, efficiency (time-to-train * number of chips) must be better.
+A _Preview_ system is a system which did not qualify as an _Available_ system as of the previous MLPerf submission date, but will qualify in the next submission after 140 days of the current submission date, or by the next MLPerf submission date, whichever is later, and which the submitter commits to submitting as an _Available_ system by that time. If it is not submitted in that submission round with equal or better performance (allowing for noise), the _Preview_ benchmark will be marked as invalid. A _Preview_ submission must include performance on at least one benchmark which will be considered _MLPerf Compatible_ (xref:MLPerf_Compatibility_Table.adoc[see the MLPerf Compatibility Table]) in the upcoming round where the transition to _Available_ is made (consult SWG for the Benchmark Roadmap).
+
+On each of the benchmarks that are previewed and are _Compatible_, the _Available_ submission must show _equal or better performance_ than the _Preview_ submission, allowing for noise, for changes in the benchmark definition, or for changes in the system scale (defined as the number of system components principally determining performance, e.g. accelerator chips):
+
+* Training: An _Available_ submission can be on a system larger than the largest system used for _Preview_, or smaller than the smallest system used for _Preview_:
+** For an _Available_ system that is larger than the _Preview_ system, absolute performance must be equal or better.
+** For an _Available_ system that is smaller than the _Preview_ system, efficiency (time-to-train * number of accelerators) must be equal or better.
+** Performance must be equal or better at least across the smallest and the largest of the systems used for _Preview_.
+* Inference without power measurements: An _Available_ submission can be on a system larger than the largest system used for _Preview_.
+** For an _Available_ system that is larger than the _Preview_ system, performance per accelerator must be equal or better.
+* Inference with power measurements: An _Available_ submission must be on a system of the same scale as used for _Preview_.
+** Power-normalized performance (not absolute performance) must be equal or better.
+
+Any other changes must be approved by the relevant Working Group prior to submission.
Review comment: The rule for Training submissions with power measurements is missing. I recommend keeping this line under the Training section: "For submissions accompanied by power measurements, 'equal or better' must use power-normalized performance rather than absolute performance."
 
 If none of the _Preview_ benchmarks are MLPerf _Compatible_ in the upcoming round where the transition to _Available_ is made (a rare event), a submitter may get their performance validated in the upcoming round by making a submission on the old/retired benchmark to the Results WG during the review period (such a submission will not show up on the Results table but will only be used by the Results WG to validate a past _Preview_ submission).
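The scale-dependent comparison rules proposed in the diff above can be sketched in code. This is a hedged illustration only: the function name, the dictionary layout, and the 5% noise allowance are assumptions of this sketch, not anything the MLPerf rules define.

```python
# Hypothetical sketch of the proposed Preview -> Available validation rules.
# All names and the 5% noise tolerance below are illustrative assumptions.

NOISE = 0.05  # assumed allowance for run-to-run noise


def available_ok(division, preview, available, power=False):
    """Return True if an Available result may validate a Preview result.

    preview / available are dicts with 'accelerators' (int) plus one of:
      'time_to_train'  (Training, lower is better),
      'throughput'     (Inference, higher is better),
      'perf_per_watt'  (power-normalized, higher is better).
    """
    if power:
        # Inference with power: same scale required, compare power-normalized perf.
        if division == "inference" and available["accelerators"] != preview["accelerators"]:
            return False
        return available["perf_per_watt"] >= preview["perf_per_watt"] * (1 - NOISE)

    if division == "training":
        if available["accelerators"] >= preview["accelerators"]:
            # Larger system: absolute performance (time-to-train) equal or better.
            return available["time_to_train"] <= preview["time_to_train"] * (1 + NOISE)
        # Smaller system: efficiency = time-to-train * number of accelerators.
        eff_preview = preview["time_to_train"] * preview["accelerators"]
        eff_available = available["time_to_train"] * available["accelerators"]
        return eff_available <= eff_preview * (1 + NOISE)

    # Inference without power: performance per accelerator equal or better.
    per_preview = preview["throughput"] / preview["accelerators"]
    per_available = available["throughput"] / available["accelerators"]
    return per_available >= per_preview * (1 - NOISE)
```

For example, a Training Preview on 8 accelerators at 100 minutes validated by an Available run on 16 accelerators would need a time-to-train of roughly 100 minutes or less; an Available run on 4 accelerators would instead need `time_to_train * 4` to stay at or below `100 * 8`, within the assumed noise band.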
Review comment: Here, the original rule implies an AND condition and not an OR, right? I think AND makes sense, and the same can be applied for Inference too: if the Preview submission was on, say, 1, 2, 4 and 8 accelerators, a submitter can make an Available submission on 1 and 8 accelerators, or even 1 and 16 accelerators, but the smallest and the largest scales must be submitted. This ensures that both scaling up and scaling down of the performance are validated.

Review comment: I second this; we should still keep "across at least the smallest and the largest scale of the systems used for Preview submission on that benchmark (e.g. Available Training submissions can be on scales smaller than the smallest and larger than the largest scale used for Preview submission)".
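The coverage requirement the reviewers want retained, that the Available submission span at least the smallest and the largest Preview scales (possibly going beyond them), reduces to a simple range check. A hypothetical sketch, with the function name being an assumption of this illustration:

```python
def covers_preview_scales(preview_scales, available_scales):
    """Check that Available scales span the Preview scales.

    Per the reviewers' reading: with Preview runs on 1, 2, 4 and 8
    accelerators, Available runs on {1, 8} or {1, 16} pass, because
    both scaling down to (at most) the smallest Preview scale and
    scaling up to (at least) the largest are demonstrated.
    """
    return (min(available_scales) <= min(preview_scales)
            and max(available_scales) >= max(preview_scales))
```

Under this reading, `covers_preview_scales([1, 2, 4, 8], [2, 8])` fails because the smallest Preview scale is not covered, matching the reviewers' insistence on both endpoints.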