Skip model saving during precompilation, provide option to skip cache push #365
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thank you very much for the pull request. Tagging @michaelbenayoun, who is the main maintainer for this code.
The change under review in `save_model`:

```python
if not os.environ.get("NEURON_PARALLEL_COMPILE"):  # Avoid unnecessary model saving during precompilation
    if output_dir is None:
        output_dir = self.args.output_dir

    self._save_xla(output_dir)

    # Push to the Hub when `save_model` is called by the user.
    if self.args.push_to_hub and not _internal_call:
        self.push_to_hub(commit_message="Model save")
else:
    logger.info("Skipping trainer.save_model() while running under neuron_parallel_compile")
```
I'm okay with this, but won't it trigger compilation when saving during actual training?
I think there is a very quick incidental compilation related to checkpointing - I will time it to see if it's significant and update here.
This is really meant to avoid the several minutes of waiting and the 38 GB of storage used up by checkpointing during neuron_parallel_compile. We do something similar for neuronx-nemo-megatron.
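If it helps with the timing question above, one rough way to measure the incidental compilation cost is to wrap the save in a timer during a real training run (plain Python sketch, not part of the PR; `trainer` and `output_dir` are assumed to come from the training script):

```python
import time

# Time a single checkpoint save so the cost of any incidental compilation
# triggered by it can be judged against the overall training time.
start = time.perf_counter()
trainer.save_model(output_dir)
print(f"trainer.save_model() took {time.perf_counter() - start:.1f}s")
```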
Alright, then let's do it!
This PR contains 2 changes:

1. Skip model saving during precompilation: when running under neuron_parallel_compile, trainer.save_model() is skipped, avoiding the several minutes of waiting and the roughly 38 GB of storage consumed by checkpoints that are never used at that stage.
2. Add a --skip_cache_push option so we can avoid pushing compilation artifacts to the hub cache when this step isn't required. This saves a few minutes for llama2-7B. (Perhaps skipping the push should be the default behaviour?)

Tested with Llama-2-7b-hf using run_clm.py.