Update docs to be clear on --gpus behaviour. #563
My two cents: I would be against 2 and 3.
BTW, this line makes it so that when ...
@jeffling Let's keep the first option. As mentioned by @mpariente, here are the cases:

- Case 1: `gpus=2`
- Case 2: `gpus=[0, 3]`
- Case 3: `python main.py --gpus "0,1,2,3"`

I do agree that we need to support `-1` for all (I thought we already did). So the resolution seems to be updated docs, maybe a table with examples?
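To make the table of cases concrete, here is a minimal sketch of how the three value shapes could be interpreted. `parse_gpus` and `list_all_gpus` are hypothetical names for illustration only, not Lightning's actual implementation:

```python
def list_all_gpus(n_visible=4):
    # Stand-in for torch.cuda.device_count(); assumes 4 visible GPUs.
    return list(range(n_visible))

def parse_gpus(gpus):
    """Return the list of GPU indices implied by a `gpus` value, or None for CPU."""
    if gpus is None or gpus == 0:
        return None                          # 0 / None -> run on CPU
    if isinstance(gpus, int):
        if gpus == -1:
            return list_all_gpus()           # -1 -> use every visible GPU
        return list(range(gpus))             # 2 -> first two GPUs: [0, 1]
    if isinstance(gpus, str):
        if gpus.strip() == "-1":
            return list_all_gpus()           # "-1" -> every visible GPU
        return [int(x) for x in gpus.split(",") if x.strip()]  # "0,3" -> [0, 3]
    return list(gpus)                        # [0, 3] -> exactly GPUs 0 and 3
```

A docs table could then be generated directly from such a function, keeping the examples and the behavior in sync.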
I said that in case the suggestion was adopted. `-1` is supported as an integer and as a string, but not as a list. That seems consistent with the current behavior, though.
Can the docs be updated on master directly?
I would not use a string for multiple options; argparse supports multiple values directly: `parser.add_argument('--gpus', type=int, nargs='*', help='list of GPUs')`, to be used as `python main.py --gpus 0 1 2 3`.
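For reference, that suggestion in runnable form (the argument list passed to `parse_args` here simulates the command line):

```python
import argparse

parser = argparse.ArgumentParser()
# nargs='*' collects space-separated values into a list, e.g. --gpus 0 1 2 3
parser.add_argument('--gpus', type=int, nargs='*', help='list of GPUs')

# Equivalent to invoking: python main.py --gpus 0 1 2 3
args = parser.parse_args(['--gpus', '0', '1', '2', '3'])
print(args.gpus)  # [0, 1, 2, 3]
```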
@Borda yeah, but this assumes users know argparse super well... again, we need to keep non-expert users in mind. I have, however, thought about providing a default parser with args for each Trainer flag so users don't have to remember them (maybe a new issue)?
Well, you should know it well as a developer, but the users have the help message to guide them... and these default parameters can be written in the docs or readme :]
I'm talking about scientists, physicists, etc... deep learning is not just being done by engineers or developers. That level of usability is critical and at the core of Lightning.
I meant that the people writing the argparser should make it easier, like with these listed parameters. I personally would be very confused passing it as a string with a separator... but I am open to discussion :]
From the command line, an issue happens when using it. Let's say I'm doing two runs on 3 GPUs, one that uses 2 GPUs and one that uses 1.

If our ideal for the framework user is to be able to do straight pass-throughs to Lightning, the string is a better choice. Otherwise everyone will need to deal with the 'list vs int' logic their own way. The original issue: if we're using a string, a user can easily screw it up.

Final resolution: The resolution then should be alternative 1, since we agree that we don't want to get rid of the 'number of GPUs' functionality (which was the original proposed aggressive solution). If we detect `--gpus 0` with int, a warning should suffice alongside updated docs.

I still have a few PRs I want to do before this, so if anybody would like an easy one feel free to take it :)
This is a simple fix, and we shall correct it in ...
Yeah, either way we need to keep support for passing in a string, because this is exactly the friction we need to avoid with users.
How about we make a list in the docs with all accepted inputs and how they are parsed/interpreted, with the edge cases discussed here? Then we write a test for each case to make sure that the parsing works properly. The parsing is already quite complicated, and it takes a bit of time to see the edge cases.
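A table-driven test in the spirit of that proposal might look like the sketch below. The parser here is an illustrative stand-in for the string form of `--gpus`, not Lightning's actual code, and the expected values are assumptions drawn from the cases discussed in this thread:

```python
def parse_gpu_string(value, n_visible=4):
    """Illustrative parser for the string form of --gpus (None means CPU)."""
    value = value.strip()
    if not value:
        return None                        # empty string -> CPU
    if value == "-1":
        return list(range(n_visible))      # "-1" -> all visible GPUs
    return [int(x) for x in value.split(",") if x.strip()]

# One test case per documented input, including easy-to-miss edge cases.
cases = {
    "": None,
    "-1": [0, 1, 2, 3],
    "0": [0],
    "0,2": [0, 2],
    "1,": [1],   # trailing comma is an easy edge case to get wrong
}
for raw, expected in cases.items():
    assert parse_gpu_string(raw) == expected, raw
```

Keeping the docs table and the test table as the same data structure would guarantee they never drift apart.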
@awaelchli yes, mind submitting a PR?
Sure, I will try to deliver it before the next release.
Next release on the 6th of Dec? :)
@Borda yep, today or tomorrow :)
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I have spent quite a while browsing a couple of discussions on this issue, and the latest referenced official doc is missing: https://pytorch-lightning.readthedocs.io/en/latest/multi_gpu.html. I had to write a small script to be sure of the behaviour of `python train.py --gpus 0` when the trainer is created using `.from_argparse_args(args)`. The document is not doing its job well. Basically, I think one needs to be explicit, no matter whether they are an expert or non-expert in coding. So `--gpus` may not be a good parameter name in the first place. If we stick with it, there should be examples of using it.
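Part of the confusion is plain argparse behavior, independent of Lightning: without a `type=`, the value of `--gpus 0` arrives as the *string* `"0"`, and string `"0"` versus int `0` mean different things under the documented semantics. A small sketch of the difference:

```python
import argparse

# Without a type, argparse delivers the raw string; with type=int, an int.
p_str = argparse.ArgumentParser()
p_str.add_argument('--gpus')            # no type -> value stays a string

p_int = argparse.ArgumentParser()
p_int.add_argument('--gpus', type=int)  # type=int -> value becomes an int

a_str = p_str.parse_args(['--gpus', '0'])
a_int = p_int.parse_args(['--gpus', '0'])
print(repr(a_str.gpus), repr(a_int.gpus))  # '0' 0
```

So the same command line can mean "run on GPU 0" or "run on CPU" depending only on how the script author declared the argument, which is exactly why explicit examples in the docs matter.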
Is your feature request related to a problem? Please describe.
`Trainer.gpus` can currently be used to specify a number of GPUs or specific GPUs to run on. This makes values like

- `0` (run on CPU)
- `"0"` (run on GPU 0)
- `[0]` (run on GPU 0)

confusing for newcomers.
Describe the solution you'd like
As an aggressive solution to this issue, we move to have `gpus` always specify specific GPUs, as that is the more encompassing case. Going forward, we can put up a deprecation notice when a single int is passed in. Then, in the next breaking version, we can simplify the behaviour.
Describe alternatives you've considered

- Have `gpus` mean number of GPUs: There are many cases where researchers need to run multiple experiments at the same time on a multi-GPU machine, and being able to specify which GPU easily would be useful. As an argument for this, one could use `CUDA_VISIBLE_DEVICES` to do this.
- Add a `num_gpus` argument: This could make it self-documenting and allow for both workflows. However, it will be an additional argument to maintain.

Additional context