- 1. Basics
- 2. Review committee
- 3. Operating principles
- 4. Schedule
- 5. Submission
- 5.1. Registration
- 5.2. Github repo
- 5.3. Licensing
- 5.4. Submission content
- 5.5. Directory structure
- 5.6. <system_desc_id>.json metadata
- 5.7. <system_desc_id>_<implementation_id>_<scenario>.json metadata
- 5.8. Logging requirements
- 5.9. Source code requirements for replication
- 5.10. Source code requirements for inference inspection
- 5.11. Source code requirements for training inspection
- 5.12. Automated compliance checking
- 6. Review
- 7. Publication
- 8. After publication
- 9. Appendices
The MLPerf submission, review, and publication process is overseen by a review committee.
The review committee consists of the following people:
- The chair or chairs of the following working groups:
  - [ Inference or training, whichever is being submitted ] results
  - [ Inference or training, whichever is being submitted ] submitters
  - [ Inference or training, whichever is being submitted ] special topics
  - Power
- Until such time as a permanent host organization exists:
  - The general chair
- When a permanent host organization exists:
  - The executive director
  - The chairperson of the board
- For v0.5 only, and only for the review process: one representative of each accelerator vendor (as opposed to an OEM or system vendor) who submitted and is not otherwise represented above, provided that, in the opinion of the majority of the above, said representative is a significant contributor to MLPerf through active participation in one or more working groups.
If representatives of a single organization, its parents, subsidiaries, affiliates, or contractors hold multiple positions on this list, only one such representative is eligible to serve on the review committee.
The review committee is chaired by the relevant results working group chair, e.g. the training results working group chair during a training submission review. For this reason, the results working groups may never have multiple chairs.
The review committee makes decisions during properly scheduled meetings. The base meeting schedule is dictated by the review process. The review committee chair may schedule additional meetings during the review process in the course of any meeting or by emailing all committee members (e.g. mlperf-chairs mailing list) at least 24 hours in advance.
The review committee has a quorum if it includes an eligible chair and at least four other eligible members.
Review committee meetings are typically held at a specific location determined by the review committee chair and in-person attendance is encouraged, but virtual attendance is to be enabled and allowed.
Review committee meetings are open only to the review committee and all submitters in the current submission round.
The review committee agenda is set, and decisions are tracked, using issues filed against the Github submission repo. In general, issues must be filed as dictated by the review process schedule. Exceptions are discouraged but may be allowed during a meeting by a vote of the review committee. In general, issues are decided in the order filed, but the chair may choose a different order if circumstances warrant.
The review committee should attempt to decide issues through discussion and grudging consensus whenever possible. However, if the review committee and all submitters are unable to reach a grudging consensus, the review committee will vote to decide the issue.
Review committee votes are determined by a majority of eligible review committee members attending a meeting. In the event of a tie, the committee chair has the casting vote. Votes are initiated by the chair and cast openly, either verbally or using a shared spreadsheet or other voting software.
The review committee operates on balance of interests rather than by avoiding conflict of interest. Members may cast votes on all matters, including those directly affecting benchmark submissions made by their organization, as a practical response to the fact that competitors are also on the review committee.
Because the submission round is confidential to the submitters, the review committee agenda, deliberations, and specific decisions are confidential and shared only with committee members and submitters for that round. The general nature of decisions may be shared outside the review process because such decisions may expose the need for rules changes.
The private submission repo will be deleted when the next MLPerf submission repo is created, or after 90 days.
Review committee decisions do not create precedents. Instead, the decisions should be explicitly incorporated into the rules through the normal process.
MLPerf’s purpose is to produce fair and useful benchmark results.
The MLPerf review committee reserves the right to amend these rules and/or exclude submissions that conflict with this purpose with a two-thirds (rounded up) vote. For instance, if the schedule is discovered to be untenable in practice, it may be amended. If a submission is judged to be deceptive or not of interest to the community, it may be excluded.
The role of the review process is to ensure fairness of submissions, not to litigate details in an effort to disqualify competitors. For example:
- Reviewing submitters should discuss issues with owning submitters after filing objections, and attempt to resolve the issue if possible.
- If an objection is supported by the review committee, the objecting submitter should communicate with the owning submitter to ensure a satisfactory fix.
- Issues in a submission that are agreed to require correction, but that do not meaningfully impact performance (less than 2% cumulative performance difference) or competitive ordering, may be waived by the review committee at its discretion, with the understanding that the submitter will correct the issue in future submissions.
MLPerf has several submission rounds each year. Each submission round follows a detailed schedule.
The submission schedule is to be set yearly, and must be approved by both the inference and training submitters meetings. The following is the remaining 2019 submission schedule.
| Submission round | Submission date |
|---|---|
| Inference v0.5 | October 11th |
The following is the draft 2020 submission schedule:
| Submission round | Submission date |
|---|---|
| Training v0.7 | February 21st [tentative] |
| Inference v0.7 (numbering adjusted to align, replaces v0.6) | May, first Friday [tentative] |
| Training v0.8 | August, first Friday [tentative] |
| Inference v0.8 | November, first Friday [tentative] |
Each submission round has the following detailed schedule, which has three major phases:
- Submission
- Review
  - Objection filing
  - Objection review
  - Objection revision
- Publication
Each of these phases is described in more detail later in this document.
| Day | Meeting or deadline (all deadlines are 11:59pm San Jose unless otherwise specified) |
|---|---|
| Week -2 | Presubmission |
| Monday | |
| Tuesday | |
| Wednesday | Submitters must sign CLA and provide primary and secondary POCs with Github handles and email addresses |
| Thursday | |
| Friday | Result chair/General chair creates submission repo, gives all submitters access, and sends submitter POCs a test email requesting they make a test submission to confirm access |
| Week -1 | Presubmission |
| Monday | |
| Tuesday | |
| Wednesday | |
| Thursday | |
| Friday | All "due in advance" writeups due (e.g. for inference calibration / weight transformation [link to inference rules, example]), submit as PR to repo (sample quantization writeup) |
| | Receive random seed(s) for inference load generation, if applicable. [TODO remove from section 2.5 of inference doc.] |
| Week 0 | Submission |
| Monday | |
| Tuesday | |
| Wednesday | |
| Thursday | Last opportunity to notify chair that you will not submit |
| Friday [TODO: consider Thursday for v0.6+] | 1:00pm San Jose: Human readable results due by email to designated coordinator |
| | 1:30pm San Jose: Code due, submit as PR to repo |
| | 1:30pm San Jose: Human readable results summary distributed by designated coordinator |
| Week 1 | Review: objection filing |
| Monday | Begin drafting neutral press release [general chair until org, then executive director] |
| Tuesday | Review committee meeting, discuss objections |
| Wednesday | |
| Thursday | Review committee meeting, discuss objections |
| Friday | Objections due in Github |
| Week 2 | Review: objection review |
| Monday | Submitter response to objections [For v0.5, submitters in Israel have two extra days to respond, and their issues are deferred to the second meeting.] |
| Tuesday | Review committee meeting, makes easy decisions and requests information about difficult ones |
| Wednesday | Requested information due |
| | Distribute neutral press release for comment [general chair until org, then executive director] |
| Thursday | Review committee meeting, makes any remaining decisions |
| Friday | |
| Week 3 | Review: objection revision |
| Monday | |
| Tuesday | Review committee meeting, discusses any fixes |
| Wednesday | Final code due |
| Thursday | Review committee meeting, decides to approve/reject fixes if required |
| | Approve final draft of press release |
| Friday | 1:00pm San Jose: Final results in human readable form due, opportunity to withdraw |
| | 1:30pm San Jose: Human readable results summary distributed by chair |
| Week 4 | Publication |
| Monday | Press and analyst pre-briefings allowed under embargo, all briefings to include neutral press release |
| | 1:00pm San Jose: Draft of results page available for comment |
| Tuesday | 1:00pm San Jose: Corrections to results page due |
| | 5:00pm San Jose: Results page and press release live on staging site |
| Wednesday | 10:00am San Jose: Results and PR public, press embargo ends |
The submission process defines how to submit code and results for review and eventual publication.
Submitters must register with the submitters working group and begin attending meetings at least eight weeks before the deadline. In order to register, a submitter or their org must sign the relevant CLA and provide primary and secondary Github handles and primary and secondary POC email addresses.
MLPerf will provide a private Github repository for submissions. Each submitter will submit one or more pull requests containing their submission to the appropriate Github repo before the submission deadline. Pull requests may be amended up until the deadline.
All submissions of code must be made under the MLPerf CLA, which is temporarily the Google open source CLA. Per the CLA, all submissions of code will be Apache 2 compatible. Third party libraries need not be Apache 2 licensed.
A submission must contain the following:
- Metadata for the systems under test
- Code that implements the benchmarks
- Metadata that describes each system-implementation combination tested
- Scripts that set up and execute each system-implementation tested
- Result logs for each system-implementation tested
A submission is for one code base for the benchmarks submitted. An org may make multiple submissions. A submission should take the form of a directory with the following structure. The structure must be followed regardless of the actual location of the code, e.g. in the MLPerf repo or an external code host site.
- <submitting_organization>/
  - systems/
    - <system_desc_id>.json
  - benchmarks/
    - <benchmark_name per reference>/ [TODO: rename the reference directories]
      - implementations/
        - <implementation_id>/
          - <arbitrary stuff>
        - <system_desc_id>/
          - <system_desc_id>_<implementation_id>.json
          - README.md
          - setup.sh (one-time configuration script)
          - init_datasets.sh (one-time dataset init script)
          - run_and_time.sh (run the benchmark and produce a result)
  - results/
    - <system_desc_id>/
      - <benchmark>/
        - result_<i>.txt # log file
System names and implementation names may be arbitrary.
Training benchmark directory names must be one of { resnet, ssd, maskrcnn, transformer, gnmt, ncf, minigo }.
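For orientation, the sketch below shows one way a submitter might sanity-check this training layout before opening a pull request. It is illustrative only: the script and the function name `check_training_submission` are not part of the official tooling, and it checks only the top-level directories and the benchmark names listed above.

```python
import os
import sys

# Benchmark directory names permitted for training submissions (from the rule above).
TRAINING_BENCHMARKS = {"resnet", "ssd", "maskrcnn", "transformer", "gnmt", "ncf", "minigo"}


def check_training_submission(root):
    """Walk <submitting_organization>/ and report obvious layout problems."""
    problems = []
    if not os.path.isdir(os.path.join(root, "systems")):
        problems.append("missing systems/ directory")
    benchmarks_dir = os.path.join(root, "benchmarks")
    if not os.path.isdir(benchmarks_dir):
        problems.append("missing benchmarks/ directory")
    else:
        for name in os.listdir(benchmarks_dir):
            if name not in TRAINING_BENCHMARKS:
                problems.append("unexpected benchmark directory: %s" % name)
    if not os.path.isdir(os.path.join(root, "results")):
        problems.append("missing results/ directory")
    return problems


if __name__ == "__main__":
    # Usage: python check_layout.py <submitting_organization>
    for problem in check_training_submission(sys.argv[1]):
        print(problem)
```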
An inference submission should take the form of a directory with the following structure:

- <submitting_organization>/
  - systems/
    - <system_desc_id>.json # combines hardware and software stack information
  - code/
    - <benchmark_name per reference>/
      - <implementation_id>/
        - <Code interface with loadgen and other arbitrary stuff>
  - measurements/
    - <system_desc_id>/
      - <benchmark>/
        - <scenario>/
          - <system_desc_id>_<implementation_id>_<scenario>.json
          - README.md
          - user.conf
          - mlperf.conf
          - calibration_process.adoc
  - results/
    - <system_desc_id>/
      - <benchmark>/
        - <scenario>/
          - performance/
            - run_x/ # 1 run for single stream and offline, 5 otherwise
              - mlperf_log_summary.txt
              - mlperf_log_detail.txt
              - mlperf_log_trace.json
          - accuracy/
            - mlperf_log_accuracy.json
    - compliance_checker_log.txt
System names and implementation names may be arbitrary.
Inference benchmark directory names must be one of { mobilenet, ssd-small, resnet, ssd-large, gnmt }.
Here is the list of mandatory files for all submissions in any division/category. However, your submission should still include all software and related information needed for results replication.
- mlperf_log_summary.txt
- mlperf_log_detail.txt
- mlperf_log_trace.json
- mlperf_log_accuracy.json [ TODO: handle differently in v0.6?]
- user.conf
- calibration or weight transformation related code if the original MLPerf models are not used
- actual models if the models are not deterministically generated
- READMEs to enable users to replicate performance results
- code which interfaces with the loadgen
- <system_desc_id>_<implementation_id>_<scenario>.json
- <system_desc_id>.json
- compliance_checker_log.txt
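As an illustration only, a submitter could verify that the mandatory loadgen outputs are present for each scenario with a small script along these lines. The helper name `missing_mandatory_files` and the example paths are hypothetical, and the run-count convention follows the directory structure above (1 performance run for single stream and offline, 5 otherwise).

```python
import os

# Mandatory loadgen output files per the list above.
PERFORMANCE_LOGS = ["mlperf_log_summary.txt", "mlperf_log_detail.txt", "mlperf_log_trace.json"]
ACCURACY_LOGS = ["mlperf_log_accuracy.json"]


def missing_mandatory_files(scenario_dir, num_perf_runs=1):
    """Return mandatory files missing under results/<system>/<benchmark>/<scenario>/."""
    missing = []
    for i in range(1, num_perf_runs + 1):
        run_dir = os.path.join(scenario_dir, "performance", "run_%d" % i)
        missing += [os.path.join(run_dir, f) for f in PERFORMANCE_LOGS
                    if not os.path.isfile(os.path.join(run_dir, f))]
    acc_dir = os.path.join(scenario_dir, "accuracy")
    missing += [os.path.join(acc_dir, f) for f in ACCURACY_LOGS
                if not os.path.isfile(os.path.join(acc_dir, f))]
    return missing


# Example with hypothetical paths:
# missing_mandatory_files("AcmeCorp/results/acme-sys1/resnet/Offline", num_perf_runs=1)
```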
The file <system_desc_id>.json should contain the following metadata describing the system:
| Field | Meaningful response required | Cloud example | On-premise example |
|---|---|---|---|
| submitter | Yes | [ TODO David Kanter to add ] | |
| division | Yes | closed | |
| status | Yes | available | |
| system_name | Yes | tpu-v3 | |
| number_of_nodes | Yes | 1 | |
| host_processors_per_node | Yes | 1 | |
| host_processor_model_name | Yes | Intel Skylake | |
| host_processor_core_count | Yes, or vcpu | | |
| host_processor_vcpu_count | Yes, or core | 96 | |
| host_processor_frequency | | | |
| host_processor_caches | | | |
| host_processor_interconnect | | | |
| host_memory_capacity | Yes | 128 GB | |
| host_storage_type | Yes | SSD | |
| host_storage_capacity | Yes | 1x 200 GB + 1x 50 GB | |
| host_networking | | | |
| host_networking_topology | | | |
| host_memory_configuration | | | |
| accelerators_per_node | Yes | 16 | |
| accelerator_model_name | Yes | tpu-v3 | |
| accelerator_host_interconnect | | | |
| accelerator_frequency | | | |
| accelerator_on-chip_memories | | | |
| accelerator_memory_configuration | Yes | HBM | |
| accelerator_memory_capacity | Yes | 32 GB | |
| accelerator_interconnect | | | |
| accelerator_interconnect_topology | | | |
| cooling | | | |
| hw_notes | | | |
| framework | Yes | TensorFlow 1.14 commit hash = faf9db515c4bf550daacc1c3a22fedf3ff5dde63 | |
| other_software_stack | Yes | TPU stack 1.14.1.dev20190518, python 3.6, sacrebleu 1.2.11 | |
| operating_system | Yes | Ubuntu 16.04 | |
| sw_notes | | | |
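For illustration, the cloud example column above could be captured in a <system_desc_id>.json roughly as follows. The field values are copied from the table; the submitter value and output filename are placeholders, and the table itself remains the authoritative field list.

```python
import json

# Field values below come from the cloud example column of the table above;
# "submitter" is a placeholder since the table's example is still TODO.
system_description = {
    "submitter": "<submitting_organization>",
    "division": "closed",
    "status": "available",
    "system_name": "tpu-v3",
    "number_of_nodes": 1,
    "host_processors_per_node": 1,
    "host_processor_model_name": "Intel Skylake",
    "host_processor_vcpu_count": 96,
    "host_memory_capacity": "128 GB",
    "host_storage_type": "SSD",
    "host_storage_capacity": "1x 200 GB + 1x 50 GB",
    "accelerators_per_node": 16,
    "accelerator_model_name": "tpu-v3",
    "accelerator_memory_configuration": "HBM",
    "accelerator_memory_capacity": "32 GB",
    "framework": "TensorFlow 1.14 commit hash = faf9db515c4bf550daacc1c3a22fedf3ff5dde63",
    "other_software_stack": "TPU stack 1.14.1.dev20190518, python 3.6, sacrebleu 1.2.11",
    "operating_system": "Ubuntu 16.04",
    "sw_notes": "",
}

# "tpu-v3.json" stands in for systems/<system_desc_id>.json.
with open("tpu-v3.json", "w") as f:
    json.dump(system_description, f, indent=2)
```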
The file <system_desc_id>_<implementation_id>.json should contain metadata describing the use of the specified implementation on the specified system.
| Field | Meaningful response required | Example |
|---|---|---|
| Starting weights filename? | Yes | [ TODO David Kanter to add ] |
| Weight transformations? | Yes | |
| Weight data type(s) | Yes | |
| Input data type(s) | Yes | |
| Retraining | Yes | |
For Training, the results logs must be verified and stamped by the training log verification script [TODO log]. The easiest way to produce such a log is to use the
For Inference, the results logs must have been produced by the [standard load generator](https://github.com/mlperf/inference/tree/master/loadgen). Power information may be appended using the standard power information appending script [TODO link or remove].
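For context, a bare-bones Python harness that drives the standard load generator, and therefore produces these logs, might look like the sketch below. It assumes the v0.5-era `mlperf_loadgen` bindings (with ConstructSUT taking three callbacks); the do-nothing callbacks are placeholders rather than a valid system under test, so treat this as an outline, not a reference harness.

```python
import mlperf_loadgen as lg


def issue_queries(query_samples):
    # A real harness would run inference here; this placeholder completes
    # every query immediately with an empty response.
    lg.QuerySamplesComplete(
        [lg.QuerySampleResponse(s.id, 0, 0) for s in query_samples])


def flush_queries():
    pass


def process_latencies(latencies_ns):
    pass  # latency post-processing hook (v0.5-era signature)


def load_samples(indices):
    pass  # load the listed samples into memory


def unload_samples(indices):
    pass  # release the listed samples


settings = lg.TestSettings()
settings.scenario = lg.TestScenario.Offline
settings.mode = lg.TestMode.PerformanceOnly

sut = lg.ConstructSUT(issue_queries, flush_queries, process_latencies)
qsl = lg.ConstructQSL(1024, 1024, load_samples, unload_samples)

# StartTest writes mlperf_log_summary.txt, mlperf_log_detail.txt, and the
# other log files required by the results directory structure above.
lg.StartTest(sut, qsl, settings)

lg.DestroyQSL(qsl)
lg.DestroySUT(sut)
```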
The following section applies to all submissions in all divisions.
The source code must be sufficient to reproduce the results of the submission, given all source components specified. Any software component that would be required to substantially reproduce the submission must be uniquely identified using one of the following methods:
| Source | Possible method | Considered "Available" for Category purposes (see later section) |
|---|---|---|
| Included in the submission | Source code | Yes |
| Built from any supported public Github repo | Commit hash | Yes |
| Built from a public Github repo plus a PR | Commit hash, PR number | Yes |
| Available binary (could be free to download or for purchase / customers only) | Name, version, url | Yes |
| Built from an internal source control system | Unique source identifier | No |
| Private binary | Checksum | No |
The following section applies to all submissions in the Closed division. We encourage Open division submissions to be as transparent as possible. We will re-examine in v0.6.
For inference, the source code, pseudo-code, or prose description must be sufficient to determine:
- The connection to the loadgen
- Preprocessing
- The architecture of the model, and the operations performed
- Weights (please notify results chair if > 2 GB combined)
- Weight transformations
- If weight transformations are non-deterministic, then any randomness seeds used must be included in the submission.
- For the inference server scenario, the source code, pseudo-code, or prose must be sufficient to determine:
  - Online batching, meaning how the server batches queries for processing
During the review process, only certain groups are allowed to inspect results and code.
| Group | Can Inspect |
|---|---|
| Review committee | All results, all code |
| Submitters | All results, all code |
| Public | No results, no code |
Submitters must officially file objections to other submitters' code by creating a Github issue prior to the "Filing objections" deadline. The issue must cite the offending lines, the rules section violated, and, if pertinent, the corresponding lines of the reference implementation that are not equivalent.
Each objection must be filed with a "by <org>" tag and an "against <org>" tag. Multiple organizations may append their "by <org>" tag to an existing objection if desired. If an objector comes to believe the objection is in error, they may remove their "by <org>" tag. All objections with no "by <org>" tags at the end of the filing deadline will be closed.
Submitters should file an objection, then discuss with the owning submitter to verify whether the objection is correct. Following the filing of an issue but before resolution, both the objecting submitter and the owning submitter may add comments to help the review committee understand the problem.
If the owning submitter acknowledges the problem, they may append the “fix_required” tag and begin to fix the issue.
The review committee will review each objection, and either establish consensus or vote. If the committee votes to support an objection, it will provide some basic guidance on an acceptable fix and append the “fix_required” tag. If the committee votes against an objection, it will close the issue.
Code should be updated via a pull request prior to the "fixing objections" deadline. Following submission of all fixes, the owning submitter should confirm with the objector(s) that the objection has been addressed and ask them to remove their "by <org>" tags.
If the objector is not satisfied by the fix, then the review committee will decide the issue at its final review meeting. The review committee may vote to accept a fix and close the issue, or reject a fix and request the submission be moved to open or withdrawn.
Hyperparameters may be updated in accordance with the training rules prior to the final code due date.
MLPerf will publish all results simultaneously via an update to the results page. After publication, code and results are public and free for use under the MLPerf Terms of Use.
Each results table will contain the following information:
| Field | Description |
|---|---|
| TBD | TBD |
Results will be divided into categories based on the availability of the hardware and software components:

| Category | Hardware | Software |
|---|---|---|
| Available in cloud | Available for rent in the cloud | Available |
| Available on premise | Available for purchase | Available |
| Preview | Must be available for rent or purchase in time for the next submission or within 180 days, whichever is longer | Available, except for software required to support substantially new hardware |
| Research, Development, or Internal | Does not meet the above requirements | Does not meet the above requirements |
Available cloud systems must (1) have available pricing (either publicly advertised or available by request), (2) have been rented by at least one third party, (3) have public evidence of availability (web page saying product is available, statement by company, etc), and (4) be “reasonably available” for rent by additional third parties by the submission date.
An on-premise system is Available if all of its components that substantially determine ML performance are Available either individually or in aggregate (development boards that meet the substantially determine clause are allowed). An Available component or system must (1) have available pricing (either publicly advertised or available by request), (2) have been shipped to at least one third party, (3) have public evidence of availability (web page saying product is available, statement by company, etc), and (4) be “reasonably available” for purchase by additional third parties by the submission date. In addition, submissions for on-premise systems must describe the system and its components in sufficient detail to enable third parties to build a similar system.
In both cases, “reasonably available” means:
- Supply and lead times are appropriate for system scale, i.e. on-demand and in quantity for the smallest systems, and a few months and with limited supply for the largest systems.
- Access to rent or purchase may be subject to conditions that are common to generally available products (such as financial qualifications, size of customer, support burden, export restrictions, etc.) but is not otherwise restricted (i.e. no "early access" approval requirements). However, it is allowed for the qualifying pre-submission rentals/purchases to have been made with restrictions such as "early access" approval.
Available systems must use an Available software stack. A software stack consists of the set of software components that substantially determine ML performance but are not in the uploaded source code. For instance, for training this includes at a minimum any required ML framework (e.g. TensorFlow, PyTorch) and ML accelerator library (e.g. cuDNN, MKL). An Available software stack consists only of Available software components.
An Available software component must be well supported for general use. For open source software, you must base the software on a commit in an "official" repo plus a PR to support a particular architecture. For binaries, the binary must be made available as a release, or as a "beta" release with the requirement that optimizations will be included in a future "official" release. The beta must be made available to customers as a clear part of the release sequence. The software must be available at the time of submission.
A Preview system is a system which will meet the requirements of an Available system within 180 days of the submission date or by the next MLPerf submission date, whichever is later, and which the submitter commits to submitting as an Available system by that time. If it is not submitted in that submission round with equal or better performance (allowing for noise), the Preview submission will be marked as invalid. Systems are exempt from this requirement if the submitted benchmarks are retired or changed to such a degree that they are no longer reasonably runnable on that system.
If a Preview system contains a newly developed hardware component (e.g. a new ML accelerator) that is a substantial contributor to the determination of ML performance, then for that submission only, the "Available software stack" requirement is waived for software that is necessary to support that component. Otherwise, Preview systems must meet the same Available software stack requirements as an Available system. For example, the first shipping version of a new accelerator need not meet the Available software stack requirements, but subsequent SKUs of that accelerator are not considered newly developed and must meet Available software stack requirements.
A research, development, or internal (RDI) component does not meet the requirements for an Available or Preview component. An RDI system is a system containing one or more RDI components. RDI components may not be submitted as Available components until the submission cycle after next, or 181 days, whichever is longer.
Any use of published results in connection with the MLPerf trademark must follow the [terms of use](https://github.com/mlperf/training_policies/blob/master/TERMS%20OF%20USE.md).
If a substantial issue (>5% cumulative change) with a Closed division result is discovered after publication and confirmed by the review committee, the result may be fixed if possible within a two-week timeframe, otherwise moved to the Open division if possible, or marked non-compliant if necessary.
The appendices contain additional information.
This section is in progress [TODO].
| Day | Meeting or deadline (all deadlines are 11:59pm San Jose unless otherwise specified) |
|---|---|
| Week -2 | Presubmission |
| Monday | |
| Tuesday | |
| Wednesday | Submitters must sign CLA and provide primary and secondary POCs with Github handles and email addresses |
| Thursday | |
| Friday | Result chair/General chair creates submission repo, gives all submitters access, and sends submitter POCs a test email requesting they make a test submission to confirm access |
| Week -1 | Presubmission |
| Monday | |
| Tuesday | |
| Wednesday | |
| Thursday | |
| Friday | All "due in advance" writeups due, submit as PR to repo |
| Week 0 | Submission |
| Monday | |
| Tuesday | |
| Wednesday | |
| Thursday | |
| Friday | 1:00pm San Jose: Human readable results due by email to designated coordinator |
| | 1:30pm San Jose: Code due, submit as PR to repo |
| | 1:30pm San Jose: Human readable results summary distributed by designated coordinator |
| Week 1 | Review: objection filing |
| Monday | Begin drafting neutral press release [general chair until org, then executive director] |
| Tuesday | Review committee meeting, discuss objections |
| Wednesday | |
| Thursday | Review committee meeting, discuss objections |
| Friday | Objections due in Github |
| Week 2 | Review: objection review |
| Monday | Submitter response to objections |
| Tuesday | Review committee meeting, makes easy decisions and requests information about difficult ones |
| Wednesday | Requested information due |
| | Distribute neutral press release for comment [general chair until org, then executive director] |
| Thursday | Review committee meeting, makes any remaining decisions |
| Friday | |
| Week 3 | Review: objection revision |
| Monday | |
| Tuesday | Review committee meeting, discusses any fixes |
| Wednesday | Final code due |
| Thursday | Review committee meeting, decides to approve/reject fixes if required |
| | Approve final draft of press release |
| Friday | 1:00pm San Jose: Final results in human readable form due, opportunity to withdraw |
| | 1:30pm San Jose: Human readable results summary distributed by chair |
| Week 4 | Publication |
| Monday | Press and analyst pre-briefings allowed under embargo, all briefings to include neutral press release |
| | 1:00pm San Jose: Draft of results page available for comment |
| Tuesday | 1:00pm San Jose: Corrections to results page due |
| | 5:00pm San Jose: Results page and press release live on staging site |
| Wednesday | 10:00am San Jose: Results and PR public, press embargo ends |