Update documentation for training to be accurate #425
I agree that this README is not useful at the moment. I think most of the useful information is in the training_policies repo, which raises the question: why does that need to be a separate repo? At minimum, we should point people to the training rules and the contributing guidelines. But maybe we could consider merging the repos as well? Does anyone have history/context on why they need to be separate?
@petermattson Originally set it up so that:

1. Training rules
2. Submission rules
3. Training code

were all separate. The submission rules are used by other benchmarks (e.g., inference, HPC). The training rules are also used by other benchmarks (e.g., HPC). So we have an inheritance scheme that makes things somewhat complicated and hard to understand. Additionally, it is difficult in GitHub to enforce cross-repo checks (e.g., if we wanted a checker that would ensure training code and rules are consistent). I think it is possible to revisit, but this is definitely something that would impact all benchmarks and require a big refactoring. It could also significantly enhance understandability. I understand the idea that having a single place to change things is attractive, but that conceptually favors writes (changing rules) over reads (understanding rules).
The write benefit is less about saving work and more about knowledge sharing: find an issue in one place and propagate the fix everywhere. That said, we're probably over-shared right now. I believe there's a way to do a document "#include", which could potentially let us have one doc per benchmark that pulls a few well-defined pieces from other places. This would let us increase or decrease sharing gradually.
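As a rough sketch of the "#include" idea: AsciiDoc (the format the MLPerf policies docs use) supports an `include::` directive that pulls content from another file, including tagged sub-regions. The file names and tag below are hypothetical, purely for illustration:

```asciidoc
// training_rules.adoc -- a per-benchmark doc assembled from shared pieces
= MLPerf Training Rules

// Pull in the shared submission rules from a policies repo checkout
// (relative path is an assumption about how the repos would be laid out)
include::../policies/submission_rules.adoc[]

// Include only a tagged region, so each benchmark takes just what it needs
include::../policies/common_definitions.adoc[tag=scoring]
```

Each benchmark doc would then own its unique text while shared sections live in one place, letting sharing be dialed up or down by moving text in or out of the included files.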
In an effort to clean up the git repo so we can maintain it better going forward, the MLPerf Training working group is closing out issues older than 2 years, since much has changed in the benchmark suite. If you think this issue is still relevant, please feel free to reopen it. Even better, please come to the working group meeting to discuss your issue.
This needs to be fixed. Please have @johntran-nv or @erichan1 put it on the agenda.
Closing this because the README now lists the current benchmarks and datasets. If there is something specific that is not listed in the README, please create a new issue.
https://github.com/mlperf/training/blob/master/README.md seems stale.
Should we list the current training benchmarks/datasets/accuracy?