Test hook support #263
For the existing prologue we use a real NCCL run. In your examples it seems that we are switching to some predefined commands.
- How are we going to generate it?
- Will that cover our needs? cc @srivatsankrishnan
I do have some code related notes, but let's leave it for later discussion.
Yes, it is one of the main design choices that we need to make.
Can we rely on existing mechanisms? Each plugin will be defined as a regular Test TOML, meaning we can generate a CLI for it for a particular system. This is what we do now and it seems to cover all our needs for this feature.
@amaslenn, I ran verify-configs and got this warning
Let's always add hooks to the lookup, since we always know where they are: ...
err, tomls = expand_file_list(root, glob="**/*.toml")
err, hook_tomls = expand_file_list(HOOKS_DIR, glob="**/*.toml")
tomls += hook_tomls
... Let's also change
As discussed in the call, Taekyung mentioned that he tested with different NCCL tests for both the pre and post scenarios. This will be a continuing feature to cover other use cases.
Summary
This PR introduces hooks to CloudAI. Hooks are tests that run either before or after each test in a test scenario. They are defined globally within a test scenario and are automatically executed for each test. There are two types of hooks: pre-tests and post-tests. Pre-tests run before the tests, while post-tests are executed after the tests. Multiple pre-tests and post-tests can be specified in each scenario.
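To make the ordering described above concrete, here is a minimal sketch of how one scenario could expand into an execution schedule. The `Scenario` dataclass and `expand_schedule` function are hypothetical names used only for illustration; they are not CloudAI's actual classes or API.

```python
from dataclasses import dataclass, field


@dataclass
class Scenario:
    """Hypothetical stand-in for a test scenario (not CloudAI's real class)."""

    tests: list[str]
    pre_tests: list[str] = field(default_factory=list)
    post_tests: list[str] = field(default_factory=list)


def expand_schedule(scenario: Scenario) -> list[tuple[str, str, str]]:
    """Return (main_test, phase, step_name) tuples in execution order.

    Hooks are defined once per scenario and repeated around every test.
    """
    steps: list[tuple[str, str, str]] = []
    for test in scenario.tests:
        # All pre-tests run before the main test, in the order given.
        steps += [(test, "pre_test", hook) for hook in scenario.pre_tests]
        steps.append((test, "test", test))
        # All post-tests run after the main test, in the order given.
        steps += [(test, "post_test", hook) for hook in scenario.post_tests]
    return steps
```

With two tests, one pre-test and one post-test, this yields six steps: each main test is wrapped by its own copy of the hooks.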
An example of how hooks are defined within a test scenario:
You can see the pre_test and post_test fields. These are used to look up the corresponding hook file. A hook file is a separate test scenario file as shown below:
If any pre-test fails, neither the main test nor the post-tests will run. In other words, the main test and the post-tests run only when the pre-tests succeed. The tests in hooks have time limits, just as tests in the main scenario do. Output files should be stored in the output directory, in a subdirectory called "pre_test" or "post_test", following a proper directory hierarchy. Hooks are not supported for NeMo 1.0 (NeMo launcher).
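The conditional behaviour can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: `run_with_hooks` and the `run` callback are hypothetical, the per-hook subdirectory under "pre_test" / "post_test" is an assumed layout, stopping at the first failing pre-test is an assumption, and the sketch runs post-tests regardless of the main test's result because the description does not specify that case.

```python
from pathlib import Path
from typing import Callable

# Hypothetical runner: takes a test name and an output directory, returns success.
Runner = Callable[[str, Path], bool]


def run_with_hooks(
    test: str,
    pre_tests: list[str],
    post_tests: list[str],
    run: Runner,
    output_dir: Path,
) -> dict[str, bool]:
    """Run pre-tests, then the main test, then post-tests; return results by step."""
    results: dict[str, bool] = {}

    for hook in pre_tests:
        ok = run(hook, output_dir / "pre_test" / hook)
        results[f"pre_test/{hook}"] = ok
        if not ok:
            # A failed pre-test skips the main test and all post-tests.
            return results

    results[test] = run(test, output_dir)

    for hook in post_tests:
        # Assumption: post-tests run even if the main test failed.
        results[f"post_test/{hook}"] = run(hook, output_dir / "post_test" / hook)

    return results
```

A caller can tell that the main test was skipped by checking whether its name is absent from the returned dictionary.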
Note
Test Plan
2.1 Success
2.2 Failure