Save output model to output_dir #1430
@@ -416,7 +416,7 @@ def save_output_model(config: Dict, output_model_dir: Union[str, Path]):

     This assumes a single accelerator workflow.
     """
-    run_output_path = Path(config["output_dir"]) / "output_model"
+    run_output_path = Path(config["output_dir"])
Btw, without this `output_model` nesting, the CLI output now also has the footprint and other files copied over, even though they mean nothing to CLI users and are messy and confusing. I think we might need an option to disable saving these files, like you mentioned once before.
Or make saving the extra files opt-in instead of opt-out. Most users only care about the final output model.
Good call! I don't think users need those files. We should only copy model files here.
    for resource_key, resource_path in all_resources.items():
        src_path = Path(resource_path.get_path()).resolve()
        if src_path.is_dir():
            hardlink_copy_dir(src_path, output_model_dir / src_path)
I don't think this is correct. `output_model_dir / src_path` evaluates to `src_path`, since `src_path` is a fully resolved (absolute) path. You probably wanted `src_path.name`, like on line 448?
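The pathlib behavior behind this comment can be checked in isolation. This is a minimal sketch with made-up paths (`/tmp/run/out` and `/tmp/cache/model` are illustrative, not taken from the PR), using `PurePosixPath` so the result does not depend on the OS:

```python
from pathlib import PurePosixPath

# Hypothetical paths, for illustration only.
output_model_dir = PurePosixPath("/tmp/run/out")
src_path = PurePosixPath("/tmp/cache/model")  # absolute, like a resolved path

# Joining with an absolute right-hand side discards the left-hand side.
joined_full = output_model_dir / src_path

# Joining with only the final component nests it under the output directory.
joined_name = output_model_dir / src_path.name

print(joined_full)  # /tmp/cache/model
print(joined_name)  # /tmp/run/out/model
```

With the full path, the destination is the source itself, so the copy target is wrong; with `.name`, the directory lands under the output directory.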
Yeah true! I forgot to update this!
Please also check how the outputs look for a graph capture with `--use_ort_genai` and `--use_model_builder`. The `additional_files` attributes might make this copy the additional files into the output directory, even though they refer to files already in the `model` subdir.
Like I described in the previous comment, I think it will be easier to just make saving the footprint, etc. opt-in in the workflow config. That way, even for the CLI, we can just use the final output dir and not need any of this copying and path updating.
My intention is to copy the additional files to the output folder, as they are also part of the output model.
I think the additional files get copied twice here: once as part of the `model_path` resource copy into the `model` subdir, and then again individually, directly into the `output_dir` itself.
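One way to avoid the double copy would be to skip any additional file that already lives under the model directory. The sketch below is only an illustration of that check: `files_needing_copy` is a hypothetical helper, not an Olive function, and the file names in the usage lines are made up.

```python
from pathlib import Path
from typing import Iterable, List


def files_needing_copy(additional_files: Iterable[str], model_dir: str) -> List[Path]:
    """Return only the additional files that are NOT already under model_dir.

    Hypothetical helper for illustration; not part of Olive.
    """
    model_root = Path(model_dir).resolve()
    remaining = []
    for file_path in additional_files:
        resolved = Path(file_path).resolve()
        try:
            # Succeeds only when the file sits inside the model directory,
            # in which case the directory copy already covers it.
            resolved.relative_to(model_root)
        except ValueError:
            remaining.append(resolved)
    return remaining


# Made-up file names: one inside the model directory, one outside it.
to_copy = files_needing_copy(
    ["out/model/extra_a.json", "out/other/extra_b.json"], "out/model"
)
print([p.name for p in to_copy])  # ['extra_b.json']
```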
Are all additional files stored in the model path? I kinda remember they can be anywhere, depending on the pass that created them. Was this logic updated before?
They are normally stored in the model path. Some passes, like model builder with metadata, save them in a different folder, but that was because we weren't sure if we should copy/hardlink the existing model files. But both the pass carry-forward (https://github.com/microsoft/Olive/blob/main/olive/passes/olive_pass.py#L274) and the cache model save have always saved in `model_path`:
Line 451 in 0db2d72
# we only have additional files for onnx models so saving to "model" is safe
Since the output of a workflow goes through the cache model save, the additional files are already in the `model_path` resources for ONNX models. This is different for composite models, where they are saved in `output_dir`.
So I think it's less hacky to just save the CLI output directly in `output_path` and opt out of saving the other footprints, etc. No need for temp directories or copying.
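The opt-in idea discussed in this thread could look roughly like the sketch below. Note the assumptions: `save_extra_artifacts` is a hypothetical config key invented for this example, `planned_outputs` is a hypothetical helper, and neither is an existing Olive option or function. Only `config["output_dir"]` comes from the diff above.

```python
from pathlib import Path
from typing import Dict, List


def planned_outputs(config: Dict, model_files: List[str], extra_files: List[str]) -> List[Path]:
    """List the destination paths a run would write.

    `save_extra_artifacts` is a hypothetical config key, not an Olive option.
    """
    output_dir = Path(config["output_dir"])
    names = list(model_files)
    # Extra artifacts (footprints, etc.) are written only when explicitly requested.
    if config.get("save_extra_artifacts", False):
        names.extend(extra_files)
    return [output_dir / name for name in names]


# Default: only the model files go directly into output_dir.
default_plan = planned_outputs({"output_dir": "out"}, ["model.onnx"], ["footprints.json"])

# Opt-in: the extra artifacts are saved alongside the model.
opt_in_plan = planned_outputs(
    {"output_dir": "out", "save_extra_artifacts": True},
    ["model.onnx"],
    ["footprints.json"],
)
```

With a default of "off", the CLI could write straight into the final output directory without a temporary directory or a later copy step.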
Describe your changes
Save output model to output_dir
Checklist before requesting a review
lintrunner -a
(Optional) Issue link