-
Notifications
You must be signed in to change notification settings - Fork 530
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rename CLI tools and move to proper entrypoint #396
Conversation
✅ Deploy Preview for torchtune-preview ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
sys.argv = [str(cmd)] + args.recipe_args | ||
runpy.run_path(str(cmd), run_name="__main__") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that the _script
folder is no more, does it still make sense to rely on runpy
and on manually building cmd
to run those files?
E.g. instead of building
cmd = pkg_path / "_cli" / "cli_utils" / "recipe_utils.py"
and then calling it with runpy.run_path
, we should just be able to call the stuff we need in torchtune._cli.cli_utils.recipe_utils
from within Python (i.e. right here)?
I'm not sure why this was done this way originally, but maybe we don't need that anymore?
CC @kartikayk @ebsmothers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't need to be addressed for this PR since this is pre-existing logic. But I suspect that whole logic could deserve a revamp.
It might be worth considering relying on argparse
's subcommands: https://docs.python.org/dev/library/argparse.html#sub-commands
Many programs split up their functionality into a number of sub-commands, for example, the svn program can invoke sub-commands like svn checkout, svn update, and svn commit. Splitting up functionality this way can be a particularly good idea when a program performs several different functions which require different kinds of command-line arguments [...]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I agree with @NicolasHug on both points here. The subparser logic seems like it could be a viable approach for tackling our different tune subcommands. Feel free to file a follow-up issue for this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great points - filed #397 for a follow-up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making these changes! Generally looks good to me. A couple questions on locations of things:
(1) I'm a bit confused about the structure of torchtune/_cli
. For example why is the primary entry point a CLI util but its subcommands (like ls.py
) are not?
(2) What about the tests directory? Now that we are moving _cli
down a level in the directory hierarchy, should we be doing the same for tests/_cli
?
@@ -1,4 +1,8 @@ | |||
#!/usr/bin/env python3 | |||
# Copyright (c) Meta Platforms, Inc. and affiliates. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for renaming this to a .py file, that was honestly driving me insane
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do it for my fans.
return total > script_args | ||
|
||
if __name__ == "__main__": | ||
|
||
def main(): | ||
parser = get_args_parser() | ||
_update_parser_help(parser) | ||
args = parser.parse_args() | ||
|
||
distributed_args = _is_distributed_args(args) | ||
cmd = args.recipe | ||
if not cmd.endswith(".py"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dumb q: what are we doing here if the command does end in .py? (E.g. I wanna run tune my_local_recipe.py
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic will be entirely revamped in a follow-up PR. Just trying to do minimal changes here.
sys.argv = [str(cmd)] + args.recipe_args | ||
runpy.run_path(str(cmd), run_name="__main__") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I agree with @NicolasHug on both points here. The subparser logic seems like it could be a viable approach for tackling our different tune subcommands. Feel free to file a follow-up issue for this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you Joe,
This LGTM as a first step. I have left some comments, each of which would deserve its own follow-up (Evan's as well), but it's best to merge this PR now since it already provides a net improvement.
I can open the follow-up issues if that helps, LMK.
description="Package for finetuning LLMs and diffusion models using native PyTorch", | ||
entry_points={ | ||
"console_scripts": [ | ||
"tune = torchtune._cli.cli_utils.tune:main", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the _cli
namespace was added, _cli.cli_utils
could probably become _cli.utils
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, but I plan on getting rid of that subdirectory ASAP.
torchtune/_cli/hf_upload/LICENSE
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the plan for those files in hf_upload
? They're not available from tune
at the moment. Should they?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
upload
is still an open issue, according to our roadmap it's somewhere between P0.5 and P1. I know it's a bit of a cop-out, but I guess my response is "probably, but I'm going to cross this bridge when we get there".
The way it's currently structured,
Good catch, I'll go ahead and make this change in this PR. |
Sorry to clarify this point: my question is more around the locations of these files. The cli_utils subdirectory contains config_utils.py and recipe_utils.py, which makes sense. Meanwhile, the subcommands ls.py and download.py are just under _cli/. To me, tune.py is logically more similar to the second set of files than to the first set, so why is it grouped with the utilities instead of its own subcommands? |
Oh this will be changed. Waiting until landing 'tune cp' bc then I can just remove the whole subdir. |
Context
As mentioned in #370,
_scripts
was currently being packaged as it's own standalone package. The proper way to do this would be to move into thetorchtune
package and specify it as anentry_point
. This PR is the first in a collection of changes to properly package TorchTune.Why did you rename
_scripts/
to_cli
? Everything currently contained in the_scripts/
dir is related to the cli tool. As such, it makes sense to rename the dir. Once #388 lands, the_cli_utils
dir will also be deleted along w/recipe_utils.py
andconfig_utils.py
. Then thetune.py
command will live at the same level under the_cli
dir as the rest of the sub commands.Do I need to read all 19 files? NO, most of these are just moved files. Be sure to checkout the setup.py file though and some of the changes to
tune.py
.Changelog
_scripts
to_cli
_cli
undertorchtune
pkg dirtune
into a proper python file and modifysetup.py
to run it as an entry_pointTest plan
pytest tests/torchtune/_cli
Below indicates the steps taken in the test, a checkmark indicates a run equivalent to the behavior before these code changes:
tune ls
,tune full_finetune --config alpaca_llama2_full_finetune
tune ls
,tune full_finetune --config alpaca_llama2_full_finetune