
Rename CLI tools and move to proper entrypoint #396

Merged: 6 commits merged into main on Feb 21, 2024

@joecummings (Contributor) commented Feb 20, 2024

Context

As mentioned in #370, _scripts was being packaged as its own standalone package. The proper approach is to move it into the torchtune package and expose it as an entry_point. This PR is the first in a series of changes to package TorchTune properly.

Why did you rename _scripts/ to _cli? Everything currently in the _scripts/ dir is related to the CLI tool, so it makes sense to rename the dir. Once #388 lands, the _cli/cli_utils dir will also be deleted, along with recipe_utils.py and config_utils.py. The tune.py command will then live directly under _cli, at the same level as the rest of the subcommands.

Do I need to read all 19 files? No, most of these are just moved files. Be sure to check out setup.py, though, along with some of the changes to tune.py.

Changelog

  • Rename _scripts to _cli
  • Move _cli under torchtune pkg dir
  • Update all file paths
  • Change tune into a proper Python file (tune.py) and modify setup.py to run it as an entry_point
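For readers unfamiliar with console_scripts: an entry point string of the form `name = package.module:function` tells pip to generate a `name` executable that imports the module and calls the function. The sketch below (Python 3.9+, standard library only) parses the spec string this PR adds to setup.py using `importlib.metadata.EntryPoint`. The helper `parse_console_script` is hypothetical, and a stdlib target (`json:dumps`) stands in for torchtune so the example runs without it installed.

```python
from importlib.metadata import EntryPoint


def parse_console_script(spec: str, group: str = "console_scripts") -> EntryPoint:
    """Hypothetical helper: split 'name = package.module:function' into an EntryPoint."""
    name, _, value = spec.partition("=")
    return EntryPoint(name=name.strip(), value=value.strip(), group=group)


# The spec string registered in setup.py by this PR.
tune_ep = parse_console_script("tune = torchtune._cli.cli_utils.tune:main")
print(tune_ep.name, tune_ep.module, tune_ep.attr)

# load() imports the module and returns the named attribute. A stdlib target is
# used here because loading the tune entry point would require torchtune.
dumps = parse_console_script("pretty = json:dumps").load()
print(dumps({"ok": True}))
```

The executable pip generates does roughly `from torchtune._cli.cli_utils.tune import main; sys.exit(main())`, which is why tune.py needs a module-level `main()` rather than code that only runs under `if __name__ == "__main__":`.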

Test plan

pytest tests/torchtune/_cli
(Screenshot of the test results, 2024-02-20)

The steps taken in testing are listed below; a checkmark indicates a run whose behavior matched that before these code changes:

  • Fresh conda install, tune ls, tune full_finetune --config alpaca_llama2_full_finetune
  • Pip uninstall, re-install, tune ls, tune full_finetune --config alpaca_llama2_full_finetune

@facebook-github-bot added the CLA Signed label Feb 20, 2024

netlify bot commented Feb 20, 2024

Deploy Preview for torchtune-preview ready!

🔨 Latest commit: 48cf03b
🔍 Latest deploy log: https://app.netlify.com/sites/torchtune-preview/deploys/65d6425524e3230008db4c35
😎 Deploy Preview: https://deploy-preview-396--torchtune-preview.netlify.app

@joecummings joecummings marked this pull request as ready for review February 20, 2024 20:31
Comment on lines +125 to +126
sys.argv = [str(cmd)] + args.recipe_args
runpy.run_path(str(cmd), run_name="__main__")
Member

Now that the _scripts folder is gone, does it still make sense to rely on runpy and on manually building cmd to run those files?

E.g. instead of building

cmd = pkg_path / "_cli" / "cli_utils" / "recipe_utils.py"

and then calling it with runpy.run_path, we should just be able to call the stuff we need in torchtune._cli.cli_utils.recipe_utils from within Python (i.e. right here)?

I'm not sure why this was done this way originally, but maybe we don't need that anymore?
CC @kartikayk @ebsmothers
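To make the comparison concrete, here is a minimal, self-contained sketch of the two invocation styles. The recipe source, the `recipe_demo` module name, and both helper functions are hypothetical stand-ins; only the `sys.argv` patching and the `runpy.run_path(..., run_name="__main__")` call come from the PR. The recipe is loaded by path with `importlib.util` purely so the demo runs without torchtune; inside the package, a normal import would do.

```python
import importlib.util
import runpy
import sys
import tempfile
from pathlib import Path
from typing import Callable, List

# Hypothetical stand-in for a recipe file: it exposes main(argv) and also runs
# when executed as a script (argparse falls back to sys.argv[1:] when argv is None).
RECIPE_SOURCE = """
import argparse

def main(argv=None):
    parser = argparse.ArgumentParser(prog="recipe")
    parser.add_argument("--config", required=True)
    return parser.parse_args(argv).config

if __name__ == "__main__":
    result = main()
"""


def run_via_runpy(script: Path, recipe_args: List[str]) -> str:
    """Current approach: patch sys.argv, then execute the file as __main__."""
    saved_argv = sys.argv
    sys.argv = [str(script)] + recipe_args
    try:
        module_globals = runpy.run_path(str(script), run_name="__main__")
    finally:
        sys.argv = saved_argv  # unlike the PR snippet, restore argv afterwards
    return module_globals["result"]


def run_in_process(recipe_main: Callable[[List[str]], str], recipe_args: List[str]) -> str:
    """Suggested alternative: call the recipe's function with explicit arguments."""
    return recipe_main(recipe_args)


with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "recipe_demo.py"
    script.write_text(RECIPE_SOURCE)
    args = ["--config", "alpaca_llama2_full_finetune"]

    via_runpy = run_via_runpy(script, args)

    # Load by path for the demo only; inside the package this is a plain import.
    spec = importlib.util.spec_from_file_location("recipe_demo", str(script))
    recipe_module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(recipe_module)
    in_process = run_in_process(recipe_module.main, args)

print(via_runpy, in_process)
```

Both paths yield the same parsed config; one side benefit of the direct call is that arguments are passed explicitly instead of through the global `sys.argv`.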

Member

This doesn't need to be addressed in this PR, since this is pre-existing logic. But I suspect the whole approach could use a revamp.

It might be worth considering relying on argparse's subcommands: https://docs.python.org/dev/library/argparse.html#sub-commands

Many programs split up their functionality into a number of sub-commands, for example, the svn program can invoke sub-commands like svn checkout, svn update, and svn commit. Splitting up functionality this way can be a particularly good idea when a program performs several different functions which require different kinds of command-line arguments [...]
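A minimal sketch of what a subcommand-based `tune` parser could look like with `add_subparsers`. The `ls` and `download` subcommands mirror the existing ls.py and download.py, but their arguments (`repo_id`) and handler bodies are hypothetical placeholders, not torchtune's actual behavior.

```python
import argparse
from typing import List, Optional


def _ls(args: argparse.Namespace) -> str:
    # Placeholder handler; a real ls subcommand would list recipes and configs.
    return "listing recipes"


def _download(args: argparse.Namespace) -> str:
    # Placeholder handler; `repo_id` is a hypothetical argument name.
    return f"downloading {args.repo_id}"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="tune")
    subparsers = parser.add_subparsers(dest="command", required=True)

    ls_parser = subparsers.add_parser("ls", help="List built-in recipes")
    ls_parser.set_defaults(func=_ls)

    download_parser = subparsers.add_parser("download", help="Download a checkpoint")
    download_parser.add_argument("repo_id")
    download_parser.set_defaults(func=_download)
    return parser


def main(argv: Optional[List[str]] = None) -> str:
    args = build_parser().parse_args(argv)
    # Each subparser stored its handler in `func`, so dispatch is a single call.
    return args.func(args)


print(main(["ls"]))
print(main(["download", "some-org/some-model"]))
```

Each subcommand owns its own arguments, and new subcommands are registered on the parser instead of being located by file path.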

Contributor

Yeah, I agree with @NicolasHug on both points here. The subparser logic seems like a viable approach for handling our different tune subcommands. Feel free to file a follow-up issue for this.

@joecummings (Contributor Author), Feb 21, 2024

Great points - filed #397 for a follow-up.

@ebsmothers (Contributor) left a comment

Thanks for making these changes! Generally looks good to me. A couple questions on locations of things:

(1) I'm a bit confused about the structure of torchtune/_cli. For example why is the primary entry point a CLI util but its subcommands (like ls.py) are not?

(2) What about the tests directory? Now that we are moving _cli down a level in the directory hierarchy, should we be doing the same for tests/_cli?

@@ -1,4 +1,8 @@
#!/usr/bin/env python3
# Copyright (c) Meta Platforms, Inc. and affiliates.
Contributor

Thank you for renaming this to a .py file, that was honestly driving me insane

Contributor Author

I do it for my fans.

return total > script_args

if __name__ == "__main__":

def main():
parser = get_args_parser()
_update_parser_help(parser)
args = parser.parse_args()

distributed_args = _is_distributed_args(args)
cmd = args.recipe
if not cmd.endswith(".py"):
Contributor

Dumb q: what are we doing here if the command does end in .py? (E.g. I wanna run tune my_local_recipe.py)

Contributor Author

This logic will be entirely revamped in a follow-up PR. Just trying to do minimal changes here.
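For the question above, one possible shape for a revamped lookup is sketched below: a command ending in `.py` is treated as a path to the user's own recipe, and anything else as the name of a built-in recipe file in a recipes directory. The function name, the `recipes_dir` layout, and the error behavior are assumptions for illustration only; the PR itself defers this logic to a follow-up.

```python
from pathlib import Path


def resolve_recipe(cmd: str, recipes_dir: Path) -> Path:
    """Hypothetical lookup: '*.py' means a local file; otherwise a built-in recipe name."""
    if cmd.endswith(".py"):
        candidate = Path(cmd)  # e.g. `tune my_local_recipe.py`
    else:
        candidate = recipes_dir / f"{cmd}.py"  # e.g. `tune full_finetune`
    if not candidate.is_file():
        raise FileNotFoundError(f"No recipe found at {candidate}")
    return candidate
```

Failing early with `FileNotFoundError` keeps a typo in a recipe name from reaching `runpy` with a nonexistent path.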

@NicolasHug (Member) left a comment

Thank you Joe,

This LGTM as a first step. I have left some comments, each of which deserves its own follow-up (as do Evan's), but it's best to merge this PR now since it is already a net improvement.

I can open the follow-up issues if that helps, LMK.

description="Package for finetuning LLMs and diffusion models using native PyTorch",
entry_points={
"console_scripts": [
"tune = torchtune._cli.cli_utils.tune:main",
Member

Since the _cli namespace was added, _cli.cli_utils could probably become _cli.utils.

Contributor Author

True, but I plan on getting rid of that subdirectory ASAP.

Member

What is the plan for those files in hf_upload? They're not available from tune at the moment. Should they?

Contributor Author

Upload is still an open issue; according to our roadmap, it's somewhere between P0.5 and P1. I know it's a bit of a cop-out, but my response is "probably, but I'm going to cross this bridge when we get there".

@joecummings (Contributor Author)
@ebsmothers

> (1) I'm a bit confused about the structure of torchtune/_cli. For example why is the primary entry point a CLI util but its subcommands (like ls.py) are not?

The way it's currently structured, tune.py is the entrypoint and it batches out subcommands to things like ls.py, download.py.

> (2) What about the tests directory? Now that we are moving _cli down a level in the directory hierarchy, should we be doing the same for tests/_cli?

Good catch, I'll go ahead and make this change in this PR.

@ebsmothers (Contributor)

> The way it's currently structured, tune.py is the entrypoint and it batches out subcommands to things like ls.py, download.py

Sorry to clarify this point: my question is more around the locations of these files. The cli_utils subdirectory contains config_utils.py and recipe_utils.py, which makes sense. Meanwhile, the subcommands ls.py and download.py are just under _cli/. To me, tune.py is logically more similar to the second set of files than to the first set, so why is it grouped with the utilities instead of its own subcommands?

@joecummings (Contributor Author)

>> The way it's currently structured, tune.py is the entrypoint and it batches out subcommands to things like ls.py, download.py

> Sorry to clarify this point: my question is more around the locations of these files. The cli_utils subdirectory contains config_utils.py and recipe_utils.py, which makes sense. Meanwhile, the subcommands ls.py and download.py are just under _cli/. To me, tune.py is logically more similar to the second set of files than to the first set, so why is it grouped with the utilities instead of its own subcommands?

Oh, this will be changed. I'm waiting until 'tune cp' lands, because then I can just remove the whole subdir.

@joecummings joecummings merged commit 5ae6169 into main Feb 21, 2024
17 checks passed
@joecummings joecummings deleted the move-scripts-to-proper-entrypoint branch February 21, 2024 19:12
Labels: CLA Signed

4 participants