-
Notifications
You must be signed in to change notification settings - Fork 536
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[4.1 postmortem] Unit test module for inference repo #1865
Comments
We also need a deadline for these to be done before every submission round. May be the same as code freeze date? Currently we do have github tests for reference benchmark runs which includes download of models and datasets and even the submission checker - it is completed for small models and we also have GPTJ and SDXL tests running on self hosted github runners. We hope to cover LLAMA2, DLRMv2 and Mixtral this month. The main concern is with upcoming benchmarks - say if the benchmark needs multiple GPUs and Terabytes of memory like say the GNN - we do not have an infrastructure to do the runs. There unit tests for the conf files and log parsing may be the only option. |
@arjunsuresh Thank you for the suggestions, I think we can split into steps and cover the most important parts first (submission checker, config files, loadgen) |
Yes, we do need unit tests for submission checker. Currently we check it by running it on the previous submission repository which can catch most of the issues but not all. It is currently failing because "images" folder was removed before publication for stable diffusion but this was not updated in the submission checker. Similar situation for loadgen - we currently have tests for it and we run it on the previous benchmarks. But if a new benchmark comes and new configs are added - they are not tested. |
@pgmpablo157321 to work on this |
We propose to add some basic unit test framework (likely pytest) and tests to the inference repo. Ideally, it should test:
To get this started, we can aim at some simple tests:
@pgmpablo157321 to help implement
@nv-alicheng for more suggestions.
The text was updated successfully, but these errors were encountered: