Disable logging during precompilation #539
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
LGTM, thanks
The inf2 builders have changed, and for the TGI workflow that uses the Makefile, it is missing the gawk package that was previously there.
What does this PR do?
neuron_parallel_compile reads the logs to extract the HLO graphs to compile. For some reason, strange characters are logged, which makes neuron_parallel_compile fail when it tries to read the file, so only a small portion of the total number of graphs gets compiled. The remaining graphs are then compiled during actual training, which makes it prohibitively slow. To avoid all of this, we simply do not log during precompilation.
dp_rank=tp_rank=0 and pp_rank = pp_size - 1 under the PP setting, so that Wandb and other callbacks can actually track the loss. While this is not ideal, and will be improved in the coming days, it works for now.
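As a rough illustration of the behavior described above, the sketch below combines the two conditions: no logging while precompiling, and otherwise logging only on the rank where dp_rank and tp_rank are 0 and pp_rank is pp_size - 1. It is not the code from this PR. The helper names `is_precompilation` and `should_log` are hypothetical, the environment variable name used to detect precompilation is an assumption, and reading the rank condition as "the rank that logs" is an interpretation of the description.

```python
import os

# Assumed name of the environment variable that marks a precompilation run.
# This is an assumption for illustration, not confirmed by the PR description.
PRECOMPILATION_ENV_VAR = "NEURON_PARALLEL_COMPILE"


def is_precompilation(environ=None):
    """Return True if the process appears to be running under precompilation."""
    env = os.environ if environ is None else environ
    return env.get(PRECOMPILATION_ENV_VAR) == "1"


def should_log(dp_rank, tp_rank, pp_rank, pp_size, environ=None):
    """Decide whether this process should emit training logs.

    Hypothetical helper: logging is skipped entirely during precompilation;
    otherwise only the first data-parallel and tensor-parallel rank on the
    last pipeline stage logs.
    """
    if is_precompilation(environ):
        return False
    return dp_rank == 0 and tp_rank == 0 and pp_rank == pp_size - 1
```

Under these assumptions, a trainer would call `should_log(...)` once per process and skip its logging callbacks when it returns False. Passing `environ` explicitly is only there to make the helper easy to test.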