Vast.ai instance - **No module named 'upfirdn2d_plugin'** #72
Comments
Please post the full stack trace for the "No module named 'upfirdn2d_plugin'" exception, as requested in the issue template:
|
Just updated the original post with the traceback for generate.py |
Somehow the real reason why the cpp extension build fails is not shown. Can you confirm this is on the latest version from GitHub? Can you also post the git commit ID? See if you get any more information if you apply the suggestion from #39 (comment) |
I have followed the advice to modify those files and what I got is:
Ran it on the machine with gcc5.5 installed and got another error message
PS. The irony is that my Windows machine works happily with this repository while Ubuntu fails. |
Are you sure you can't run Docker on this machine? It's usually an easy way to fix stuff like this. Anyway, your run with GCC 5.5 gets a lot further, so at least there's some progress. This error:
seems to suggest the compilation cannot find some cuda headers. In my containers it's here:
Do you have CUDA installed in the first place? There's another error here that indicates it can't even find the CUDA compiler:
|
Vast.ai support answered that I can't reinstall CUDA, only get a new instance with the CUDA version of my choice, which I did. UPD. I can't run Docker since their instances already run inside Docker. |
Bummer that you can't use Docker. I'm not sure how much more help I can give apart from what I've already given above. I guess you'll have to work through the CUDA compilation issues on these instances. For example, why is nvcc not found when the extension gets built? Look through the file system on the vast.ai instance: does /usr/local/cuda exist, can you find nvcc in the expected location, and likewise the CUDA header files? If the CUDA toolkit is installed in some non-standard location, maybe you can point PyTorch to it by setting CUDA_HOME appropriately? See https://pytorch.org/docs/stable/cpp_extension.html and |
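For anyone who wants to run the checks above in one go, here is a minimal sketch using only the Python standard library. The function names are made up for illustration, and the search order (CUDA_HOME or CUDA_PATH, then nvcc on PATH, then /usr/local/cuda) is an assumption about where a toolkit is likely to be, not a statement of what PyTorch does internally.

```python
import os
import shutil
from pathlib import Path


def candidate_cuda_roots(env=None):
    """Return possible CUDA toolkit roots, most specific first (assumed order)."""
    env = os.environ if env is None else env
    roots = []
    # Explicit environment variables win if they are set.
    for var in ("CUDA_HOME", "CUDA_PATH"):
        if env.get(var):
            roots.append(Path(env[var]))
    # nvcc normally lives in <root>/bin/nvcc, so go two levels up.
    nvcc = shutil.which("nvcc", path=env.get("PATH"))
    if nvcc:
        roots.append(Path(nvcc).resolve().parent.parent)
    # Conventional default location on Linux.
    roots.append(Path("/usr/local/cuda"))
    return roots


def inspect_cuda_root(root):
    """Report whether a directory looks like a usable CUDA toolkit."""
    root = Path(root)
    return {
        "root": str(root),
        "exists": root.is_dir(),
        "nvcc": (root / "bin" / "nvcc").is_file(),
        "cuda_runtime_api.h": (root / "include" / "cuda_runtime_api.h").is_file(),
    }


if __name__ == "__main__":
    for candidate in candidate_cuda_roots():
        print(inspect_cuda_root(candidate))
```

If one of the candidates reports both nvcc and cuda_runtime_api.h as present, exporting CUDA_HOME to that directory before starting training is the thing to try.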
Thank you for your time. I am going to go back to square one and try to do this all over again and hope it works. Or rent an instance somewhere else. |
By the way, I just analyzed my Windows logs and found that upfirdn2d is indeed not building properly there either. Though this is a one-time error and it doesn't spam like in the previous cases:
|
UPD. The vast.ai issue was fixed by choosing a "devel" type Ubuntu installation instead of "runtime", since runtime has neither nvcc nor gcc and it's impossible to install them properly. |
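A quick way to catch the runtime/devel mix-up before renting time is to look at the image tag and at what is actually on PATH. This is only a sketch: `image_flavor` and `missing_build_tools` are hypothetical helper names, the tag parsing assumes the dash-separated naming seen in the image from the original post, and the devel tag in the usage lines is a guessed counterpart that follows the same pattern.

```python
import shutil


def image_flavor(image):
    """Return 'devel', 'runtime' or 'base' if the tag contains it, else None."""
    # Split "repo:tag" and look at the dash-separated pieces of the tag.
    _, _, tag = image.partition(":")
    parts = tag.split("-")
    for flavor in ("devel", "runtime", "base"):
        if flavor in parts:
            return flavor
    return None


def missing_build_tools(tools=("nvcc", "gcc")):
    """List the build tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # Tag from the original post, plus an assumed devel counterpart.
    print(image_flavor("nvidia/cuda:11.2.1-cudnn8-runtime-ubuntu18.04"))
    print(image_flavor("nvidia/cuda:11.2.1-cudnn8-devel-ubuntu18.04"))
    print("missing:", missing_build_tools())
```

Running `missing_build_tools()` inside the instance should return an empty list on an image that can build the plugins.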
@dokluch Hi, could you share how exactly you set the vast.ai instance up for stylegan training? It would be amazing if you could share the exact name of the image you used and the on-start script! Is it as simple as choosing |
That's pretty much it. You choose an nvidia-cuda image with the appropriate CUDA version. You don't have to install gcc, the toolkit, etc. Docker won't let you anyway. Then SSH into the instance and start training. I install miniconda and then run
If you need UI, then start jupyter lab from SSH. Here's a guide on that: https://gist.github.com/hsed/197ded8431bb545dffefb742dab5efb8 |
The solution is cool. |
Banging my head on this issue too... Which miniconda did you install? The StyleGAN docs say we should use python3.7 64 bits, but that installer is missing on the miniconda installers page... https://docs.conda.io/en/latest/miniconda.html#linux-installers it's got 32 bits for python3.7. Also that docker instance comes very bare bones, no man, no vim. But your conda and pip commands should be enough? Thanks a lot for all the pointers! I might finally see this through tonight... |
Later Python versions should work fine too. I regularly run StyleGAN2 pytorch with Python 3.8 and 3.9. |
It is finally working, phewwww. Thank you so much! So indeed, future confused users, just go straight for the docker image and enjoy your training! |
@dokluch Hi, I encountered exactly the same problem as you. My error showed that nvcc could not be found, and that the file cuda_runtime_api.h could not be found either. But there is no problem with other compilation tasks that use nvcc, so I don't know why it fails when compiling here. I am running on my local host, and this is my machine information: Ubuntu 16.04, PyTorch 1.9.0, Python 3.7, CUDA 11.3, gcc 5.4.0, RTX Titan. I have tried all the methods in this issue but the problem is still not solved. I don't know if something is wrong with my Ubuntu system. I haven't tried Docker yet, and I don't know whether moving to Docker for training is my only next step. Any advice or suggestions are welcome. |
I can add that miniconda with Python 3.9 doesn't work (current latest version), while miniconda with Python 3.8 works like a charm. |
Stuck here big time with ImportError: No module named 'upfirdn2d_plugin'
I am using a vast.ai instance nvidia/cuda:11.2.1-cudnn8-runtime-ubuntu18.04
Conda environment is set with
conda install pytorch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2 cudatoolkit=11.0 -c pytorch --yes
(doesn't matter if I try a newer one)
What I've tried
First I made sure my VM has CUDA 11.2 installed. Then I installed a newer torch with CUDA 11.1.1, which did not help, so I rolled back (made a new env).
Removed torch_extensions
Just as described here:
#11
Didn't help
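For reference, removing the cached extension builds can be sketched as below. The default location (`~/.cache/torch_extensions`) is an assumption that may not hold for every PyTorch version; the TORCH_EXTENSIONS_DIR environment variable overrides it when set. The helper names are hypothetical.

```python
import os
import shutil
from pathlib import Path


def extensions_cache_dir(env=None):
    """Guess where compiled extensions are cached (assumed default on Linux)."""
    env = os.environ if env is None else env
    override = env.get("TORCH_EXTENSIONS_DIR")
    if override:
        return Path(override)
    # Assumption: recent PyTorch versions use the user cache directory.
    return Path.home() / ".cache" / "torch_extensions"


def clear_extensions_cache(cache_dir):
    """Delete the cache directory; return True if something was removed."""
    cache_dir = Path(cache_dir)
    if not cache_dir.is_dir():
        return False
    shutil.rmtree(cache_dir)
    return True


if __name__ == "__main__":
    # Only prints the guessed location; call clear_extensions_cache() yourself.
    print(extensions_cache_dir())
```

Clearing the cache only forces a rebuild on the next run. It cannot help when the build itself fails, which matches the "didn't help" result here.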
gcc
I found this thread and
#35
And tried installing gcc7
conda install -c conda-forge/label/gcc7 gcc_linux-64
(didn't help) and even gcc5
conda install -c psi4 gcc-5
The latter sent me in a weird loop and I've abandoned this path.
This does not help either
#2 (comment)
Google Colab works fine and has ubuntu 18.04 with gcc 7.5.0 installed which I am trying to mimic. Hope that is the correct logic.
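To compare the compiler on two machines (for example Colab and the rented instance), the version can be read from `gcc --version`. This is a rough sketch: the parsing is a heuristic that takes the last x.y.z number on the first line of the banner, and the sample banner in the usage lines is what a stock Ubuntu 18.04 gcc 7.5.0 typically prints, not output captured from this issue.

```python
import re
import subprocess


def parse_gcc_version(banner):
    """Extract (major, minor, patch) from a `gcc --version` banner, or None."""
    lines = banner.splitlines()
    if not lines:
        return None
    # Heuristic: the last x.y.z on the first line is the compiler version.
    matches = re.findall(r"(\d+)\.(\d+)\.(\d+)", lines[0])
    if not matches:
        return None
    return tuple(int(part) for part in matches[-1])


def gcc_version(executable="gcc"):
    """Run `<executable> --version` and parse it; None if it cannot be run."""
    try:
        result = subprocess.run(
            [executable, "--version"],
            capture_output=True,
            text=True,
            check=False,
        )
    except OSError:
        # Covers the "gcc is not installed" case seen on runtime images.
        return None
    return parse_gcc_version(result.stdout)


if __name__ == "__main__":
    print(parse_gcc_version("gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0"))
    print(gcc_version())
```

A matching gcc version is not sufficient on its own: the build also needs nvcc and the CUDA headers, so a `None` or a mismatch here is just one thing to rule out.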
UPD:
Another instance with gcc 7.5.0 throws the same error as well
UPD2
Installing gcc 5 as described here: https://askubuntu.com/questions/1087150/install-gcc-5-on-ubuntu-18-04
Did not help either
UPD3
Sorry for not including the traceback originally
Please advise on any possible next steps. I have no idea where to go next.
Originally posted by @dokluch in #2 (comment)