Update documentation for training to be accurate #425
I agree that this README is not useful at the moment. I think most of the useful information is in the training_policies repo, which raises the question: why does that need to be a separate repo? At minimum, we should point people to the training rules and the contributing guidelines. But maybe we could consider merging the repos as well? Does anyone have history/context on why they need to be separate?
@petermattson Originally set it up so that:

1. Training rules
2. Submission rules
3. Training code

were all separate. The submission rules are used by other benchmarks (e.g., inference, HPC). The training rules are also used by other benchmarks (e.g., HPC). So we have an inheritance scheme that makes things somewhat complicated and hard to understand. Additionally, it is difficult in GitHub to enforce cross-repo checks (e.g., if we wanted a checker that would ensure training code and rules are consistent). I think it is possible to revisit, but this is definitely something that would impact all benchmarks and require a big refactoring. It could also significantly enhance understandability. I understand the idea that having a single place to change things is attractive, but that conceptually favors writes (changing rules) over reads (understanding rules).
The write benefit is less about saving work and more about knowledge sharing: find an issue in one place and propagate the fix everywhere. That said, we're probably over-shared right now. I believe there's a way to do a document "#include", which could potentially let us have one doc per benchmark that pulls a few well-defined pieces from other places. This would let us increase or decrease sharing gradually.
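As a rough sketch of the "#include" idea: AsciiDoc (the format the MLPerf policies docs use) supports an `include::` directive that pulls content from another file, including tagged sub-regions. The file names and tag below are hypothetical, purely for illustration:

```asciidoc
// training_rules.adoc -- a per-benchmark doc assembled from shared pieces
= MLPerf Training Rules

// Pull in the shared submission rules from a policies repo checkout
// (relative path is an assumption about how the repos would be laid out)
include::../policies/submission_rules.adoc[]

// Include only a tagged region, so each benchmark takes just what it needs
include::../policies/common_definitions.adoc[tag=scoring]
```

Each benchmark doc would then own its unique text while shared sections live in one place, letting sharing be dialed up or down by moving text in or out of the included files.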
In an effort to clean up the git repo so we can maintain it better going forward, the MLPerf Training working group is closing out issues older than 2 years, since much has changed in the benchmark suite. If you think this issue is still relevant, please feel free to reopen it. Even better, please come to the working group meeting to discuss your issue.
This needs to be fixed. Please have @johntran-nv or @erichan1 put it on the agenda.
Closing this because the README now lists the current benchmarks and datasets. If there is something specific that is not listed in the README, please create a new issue.
https://github.com/mlperf/training/blob/master/README.md seems stale.
Should we list the current training benchmarks/datasets/accuracy?