'NoneType' object has no attribute 'test_step' when DDP #577
Comments
Good catch. Mind submitting a PR?
@williamFalcon > I investigated and I think the problem comes from this line: After exiting function this line,
Is there a workaround available in the meantime?
Option A: option B:
Sorry, I assumed that this had already led to a PR covering the fact that both trainer.test() and trainer.test(model) fail with various errors when using ddp. I'll do some work to narrow it down and submit a PR.
Hi, I am using the latest version (pip install https://github.com/PytorchLightning/pytorch-lightning/archive/master.zip --upgrade) and I hit the same error when calling trainer.test(model).
We are working on it now. Does this appear only for multi-GPU, or have you observed it elsewhere?
@pableeto can you post your code here?
I do have a fix for this with DDP: I've added a True/False argument to Trainer() that calls trainer.test() for you if you are running the ddp or ddp2 backend.
It only calls it after training is complete. It can also be used with any other backend, if you like.
I have no way to test this on a cluster, but I have tested it on my single-node, 7-GPU machine running Ubuntu.
I do not believe there is a way to make the current syntax work in any of the modes that call PyTorch spawn(). There is just no way to get the trainer and model back from the spawned processes that is more efficient than simply reloading a saved checkpoint. It is possible to create a system of hooks for saving and restoring pieces of models, a little like current practice with checkpoints. But I think this is fraught with potential user errors, as every little change in the way the model is created will affect it.
I call the new trainer argument 'ddp_run_test_auto', and it defaults to False.
If you would like me to submit a PR, I will move the fix to master, retest, and submit.
seth
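To illustrate the point above about spawn(), here is a minimal sketch of why reloading a saved checkpoint is the practical route. It assumes that a spawned worker only ever receives a pickled copy of the model, and it simulates that process boundary with a pickle round trip instead of an actual spawn() call. TinyModel, worker, load_checkpoint and run_demo are hypothetical names for illustration, not Lightning or PyTorch APIs.

```python
import os
import pickle
import tempfile


class TinyModel:
    """Hypothetical stand-in for a model; holds a single trainable weight."""

    def __init__(self):
        self.weight = 0.0


def worker(pickled_model: bytes, ckpt_path: str) -> None:
    """Simulates a spawned process: it works on a copy of the parent's model."""
    model = pickle.loads(pickled_model)  # child-side copy (assumption: args are pickled)
    model.weight = 1.5                   # "training" mutates the copy only
    with open(ckpt_path, "wb") as f:     # persist the result as a checkpoint
        pickle.dump({"weight": model.weight}, f)


def load_checkpoint(model: TinyModel, ckpt_path: str) -> TinyModel:
    """Restores the saved state into the parent's model object."""
    with open(ckpt_path, "rb") as f:
        state = pickle.load(f)
    model.weight = state["weight"]
    return model


def run_demo():
    """Returns the parent's weight before and after reloading the checkpoint."""
    parent_model = TinyModel()
    with tempfile.TemporaryDirectory() as tmp:
        ckpt_path = os.path.join(tmp, "model.ckpt")
        worker(pickle.dumps(parent_model), ckpt_path)
        weight_before_reload = parent_model.weight  # parent never saw the update
        load_checkpoint(parent_model, ckpt_path)
    return weight_before_reload, parent_model.weight
```

Calling run_demo() shows that the parent's model is unchanged after the worker finishes, and only picks up the trained weight once the checkpoint is loaded. A real DDP run would write a full checkpoint rather than a single number, but the parent would still have to reload it in the same way.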
Describe the bug
When I activate DDP, the test_step function is replaced by None. No problem when I run on one GPU.
To Reproduce
Gives the following error:
Change ddp = True to ddp = False and there is no error.
Version: