Trainer on colab TPU error: process X terminated with signal SIGSEGV #1956
Comments
Hi! Thanks for your contribution! Great first issue!
The problem here is that the learning rate finder doesn't work with anything multi-process... @SkafteNicki
This is weird. It is correct that the learning rate finder does not have multi-process support, but that is because the state of the search is destroyed when @VictorCallejas do
The error occurs when running
Yes, it works on CPU or GPU.
Then I guess the problem is unrelated to the learning rate finder, if standard
That is a spawn issue; it should be fixed in #2632.
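The point above, that the learning rate finder's search state is lost when it runs across processes, is general multiprocessing behavior: each worker mutates its own copy of in-memory state, and nothing flows back to the parent unless it is sent explicitly. The stdlib sketch below illustrates this with a hypothetical `search_state` dict; it is not Lightning's tuner code. It assumes the POSIX-only `fork` start method (Colab runs Linux).

```python
import multiprocessing as mp

# Hypothetical stand-in for a tuner's in-memory search state.
search_state = {"best_lr": None}


def worker(result_queue):
    # Runs in the child process: this only changes the child's copy.
    search_state["best_lr"] = 0.01
    # Sending the value back explicitly is the only way the parent sees it.
    result_queue.put(search_state["best_lr"])


def run_demo():
    ctx = mp.get_context("fork")  # POSIX-only start method
    queue = ctx.Queue()
    proc = ctx.Process(target=worker, args=(queue,))
    proc.start()
    returned = queue.get(timeout=10)  # read before join() to avoid blocking
    proc.join()
    # Parent's copy is untouched; only the queued value made it back.
    return search_state["best_lr"], returned


if __name__ == "__main__":
    print(run_demo())  # (None, 0.01)
```

Any per-process result (such as a suggested learning rate) has to be gathered through an explicit channel like this, or computed in a single process before spawning workers.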
I am trying to train an image encoder with pytorch-lightning on Colab using a TPU (8 cores).
I am following this demo notebook: https://colab.research.google.com/drive/1-_LKx4HwAxl5M6xPJmqAAu444LTDQoa3#scrollTo=dEeUzX_5aLrX
Library versions:
torch: 1.5.0
torchvision: 0.6.0
pytorch-lightning: 0.7.5
pytorch-xla: 1.6
I have also tried nightly and older versions, but I get the same error.
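When reporting versions like these, it can help to read them from the runtime itself rather than from memory. A minimal sketch using the stdlib `importlib.metadata` (Python 3.8+); the distribution names in `PACKAGES` are assumptions and may need adjusting to whatever `pip list` shows in your Colab runtime.

```python
from importlib import metadata

# Assumed pip distribution names; adjust to match your environment.
PACKAGES = ["torch", "torchvision", "pytorch-lightning", "torch-xla"]


def collect_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"{name}: {version or 'not installed'}")
```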
When running:
Error:
Here is my notebook as a Gist: https://colab.research.google.com/gist/VictorCallejas/10e4c39fc25051012ae28a2a7261f814/untitled.ipynb
It seems the exception is raised because the other processes are not joining, but I have no idea why. I have set everything up exactly as in the demo notebook.
Thanks.
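The "terminated with signal SIGSEGV" wording in the title appears to follow the POSIX convention that Python's `multiprocessing` also uses: a child killed by signal N reports exit code -N, so the parent can tell a native crash apart from an ordinary Python exception. The stdlib sketch below shows how that distinction can be made; it is not torch's or torch_xla's implementation, and `crash_with`, `describe_exit`, and `run_child` are hypothetical helpers. It sends SIGTERM rather than SIGSEGV to avoid core dumps; a real segfault would show up as `-signal.SIGSEGV` (-11 on Linux). It assumes the POSIX-only `fork` start method.

```python
import multiprocessing as mp
import os
import signal


def crash_with(sig):
    # Child process: terminate itself with the given signal.
    os.kill(os.getpid(), sig)


def describe_exit(exitcode):
    # multiprocessing reports death by signal N as exit code -N.
    if exitcode is None:
        return "still running"
    if exitcode < 0:
        return f"terminated with signal {signal.Signals(-exitcode).name}"
    return f"exited with code {exitcode}"


def run_child(sig=signal.SIGTERM):
    ctx = mp.get_context("fork")  # POSIX-only start method
    proc = ctx.Process(target=crash_with, args=(sig,))
    proc.start()
    proc.join(timeout=10)
    return describe_exit(proc.exitcode)


if __name__ == "__main__":
    print(run_child())  # terminated with signal SIGTERM
    print(describe_exit(-int(signal.SIGSEGV)))  # terminated with signal SIGSEGV
```

A negative exit code like this means the worker died in native code (or was killed) before it could raise a Python traceback, which is consistent with the other processes never joining.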