Replies: 3 comments 5 replies
-
@ananthsub I fully agree here. I think many of those arguments can actually be removed by forcing users to pass in the respective callback for that, since in many cases the argument is just a shortcut for the callback instantiation. This means we can unbloat the trainer interface and also make things more explicit. With the following approach (which is just my opinion) we would be down from 60 trainer init args to ~ 15 core args and some others added to the respective trainer entrypoints (since they are only valid for that specific function) Detailed list of what should happen with each argument IMOFrom the list of the current trainer init args:
On the other hand, I know that one of the main concerns was to have everything easily accessible for the user. However, I strongly think that it's more of a downside now since it becomes overwhelming. Maybe we could have a function that takes all these arguments and provides condensed trainer arguments (including callbacks etc.) for users instead? directly tagging @tchaton @awaelchli @SeanNaren @carmocca @kaushikb11 :) EDIT: |
Beta Was this translation helpful? Give feedback.
-
Thanks for the thoughtful arguments and overview @ananthsub ❤️ To give some more context, the Trainer reached due to the following decisions:
This number of I see this as a trade-off between increased usability for general users vs good engineering design and practices. A side effect of the latter can actually be increased usability for power and experienced users (all of us having this discussion). As a power user myself, I find your approach much more satisfying, but I want for us to see the argument both ways before moving forward. |
Beta Was this translation helpful? Give feedback.
-
@PyTorchLightning/core-contributors we should keep this in mind and learn from these experiences. If any of you made similar observations in past projects, share your learnings <3 |
Beta Was this translation helpful? Give feedback.
-
I wanted to raise this discussion based on observations of going through the trainer constructor, the number of arguments it currently contains, and why this list has continually grown. I think this is representative of a number of other components in the framework as it has grown over time (hooks added, args added, etc).
I've now filed several issues around how we can deprecate arguments in the trainer constructor
stochastic_weight_avg
from Trainer constructor #8971process_position
from the Trainer constructor #8968More often than not:
callbacks
argument instead of creating a new flag). I think we should update the guidelines here that adding a flag to the trainer is not cheap: https://github.com/PyTorchLightning/pytorch-lightning/blob/master/.github/CONTRIBUTING.md#force-user-decisions-to-best-practicesFrom discussions with @awaelchli , some patterns emerge:
The trainer is also partially a command line interface where users can interact with it without code changes. By supporting a flat list of primitive type arguments, it lends well to configuration tools like argparsers for command-line based invocation and script execution. By asking users to instantiate objects, users have to write some additional code.
Psychologically, I think another reason this argument list grows is that it's already so big, so people feel less bad about adding one more line. What's 56 vs 57 arguments anyways?
My thoughts are:
Configuration should never be the determining factor for the API design. If we do, our API is limited by the bounds of the configuration system. We lose out on a lot of flexibility and features if we restrict how we can write code, or how we expect users to instantiate the Trainer. This is fundamentally a problem for configuration tools to solve, such as the LightningCLI/jsonargparse and Hydra to figure out. Ideally, the trainer (or any Lightning component) should not take any dependence on any config system or assume that a user is even using a config system.
Unchecked, this list will continue to grow. The longer this list grows, the harder it is for users to onboard. Who has patience to read through all these arguments anyways? If this is part of people's getting started with lightning journey, how many users do we lose here in the funnel because they decide its easier to write their own training program?
The longer this list grows, the harder it is for us to make changes over time, especially with seemingly redundant flags, or with arguments which nobody has context over anymore. Deprecation cycles are painful for both developers & users. The project slows down in the meanwhile because the framework has to support both old and new paths at the same time, which makes the internal codebase much more complex.
This is based on my prior experience working on PyTorch training frameworks. The codebase got so complex that the decision was made to do a full rewrite, which is the most painful project of all. One of my motivations for switching to lightning was its relative simplicity at the time. I'd really like to focus more on how we can simplify the framework!
@PyTorchLightning/core-contributors I'd love your input here as well
Beta Was this translation helpful? Give feedback.
All reactions