Optionally disable logging in the data sampler to support predict_step #10127
Commits on Aug 21, 2024
Resolve merge conflicts with consumed sample logging
Signed-off-by: John St John <jstjohn@nvidia.com>
Commit 9fa0364
Add test file that captures the predict step error
Signed-off-by: John St John <jstjohn@nvidia.com>
Commit 6374c2e
Add fixme comment around proper checkpoint nemo2 handling
Signed-off-by: John St John <jstjohn@nvidia.com>
Commit c6c93ac
Skip megatron training test on CPU nodes
Signed-off-by: John St John <jstjohn@nvidia.com>
Commit 7058723
Move output_log to last arg for compatibility
Signed-off-by: John St John <jstjohn@nvidia.com>
Commit e391c72
try setting the default root dir in predict to avoid writing artifacts to cwd
Signed-off-by: John St John <jstjohn@nvidia.com>
Commit 9997393
Handle the new check for batch samplers to enable predict_step
Signed-off-by: John St John <jstjohn@nvidia.com>
Commit 3720193
Only reset the global microbatch, not entire parallel state
Signed-off-by: John St John <jstjohn@nvidia.com>
Commit 70fe6fa
Destroy the right sets of state in test of lightning trainer
Signed-off-by: John St John <jstjohn@nvidia.com>
Commit 8c1ea86
Fix typo and rename state resetting functions
Signed-off-by: John St John <jstjohn@nvidia.com>
Commit dfdf426
Run test in a subprocess to avoid contaminating global state
Signed-off-by: John St John <jstjohn@nvidia.com>
Commit a6ff157