-
Notifications
You must be signed in to change notification settings - Fork 448
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Break BC][RFC] utils directory refactor #1421
Comments
This was referenced Aug 27, 2024
Merged
13 tasks
13 tasks
As a bonus, can this include replace the generic logger everywhere it's cropped up with the utils.logger? |
Sorry, what do you mean exactly? do you have an example? |
Here is one example. The utils version makes sure the logger is setup right for distributed runs. |
This was referenced Aug 28, 2024
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Collecting all the discussions around this here from many private channels. @ebsmothers has already written up an excellent quick RFC on this topic (#1414) but I wanted to formalize the discussion here.
Problem
torchtune/utils/
is a massive directory with a wide spectrum of helpers, from things likeshard_model
for distributed training toget_logger
to retrieve Python's built-in logger... this is not great. It's served as a "i don't know where else to put this function" type of folder but this is leading to tangible problems:Ideally, the utils directory remains a miscellaneous collection of helpers that any other directory in the library can take a dependency on, but this isn't the case with folders like
utils/_checkpointing
andutils/_distributed.py
depending onmodels/
andmodules/
Here is the current dependency graph (A -> B means B depends on A). Notice how easy it is to create a cycle.
If
data
took a dependency onutils
, you would get a cycle.modules
andmodels
can never take a dependency on utils otherwise there's a clear cycle.modules
andmodels
cannot take a dependency onconfig
ordataset
because that would create a cycle. This is not scalable and is currently blocking important features, like #1193, and adding unnecessary tech debt, like deprecate being placed indata
instead ofutils
to avoid a cycle (see #1286)Approach
We need a way to restructure our dependency graph. However, this would induce a massive refactor and would undoubtedly break BC. So there's a couple of options.
utils/_checkpointing
->utils/checkpointing
and removecheckpointing
imports fromutils/__init__.py
. Import fromutils/checkpointing
directly. This removes the utils-models dependency and stops a common cause of cycles. This is the easiest approach, but will still require updating all our configs, will break BC, does not address the utils-modules dependency, nor the core problem of utils being bloatedtraining
. Keep utils as a directory that does not take a dependency on any other directory. This fundamentally restructures our dependency graph and will prevent further cycles.Since both approaches are breaking BC, we may as well only break it a single time and fundamentally fix the problem, so I propose 2. We can debate the actual name of the new folder. Some options:
This new directory will contains all utilities related to training and are used in recipes. So these are the new locations for the files in utils:
training
generation
config
data
utils
Distributed utilities could possibly be in their own folder since these are usually shared across training and inference recipes, and it may be odd to import from training in a generate/inference recipe.
@kartikayk @ebsmothers @felipemello1 @pbontrager
The text was updated successfully, but these errors were encountered: