[BUG] Cannot Install NVTabular along PyTorch 1.9 #1175
Comments
Hello @init27, based on the details you provided above, I think you may want to try using pip to install torch instead of conda. Unfortunately, conda will require you to download separate CUDA toolkits, one for nvtabular (11.0) and one for pytorch (10.2). That combination is already unstable and may lead to strange behavior. If you want to use conda to install nvtabular, you may want to try installing pytorch using pip while in your activated conda environment. You may also need to update your path to include the pip package install location. Alternatively, you could use our prepackaged container hosted publicly on NGC. You would probably be interested in our Merlin PyTorch container, which can be found here: https://ngc.nvidia.com/catalog/containers/nvidia:merlin:merlin-pytorch-training. This has everything you need for all the pytorch examples (excluding inference) to execute successfully. |
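After installing torch with pip inside the activated conda environment, it may help to confirm which CUDA version the resulting build targets and whether a GPU is visible. The following is a minimal sketch, not an official NVTabular or Merlin tool; the function name `torch_cuda_report` is made up for illustration. It relies only on `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`.

```python
import importlib.util


def torch_cuda_report():
    """Summarize the PyTorch build visible to this interpreter, if any.

    `torch_cuda_report` is a hypothetical helper name used for illustration.
    """
    report = {
        "installed": False,
        "version": None,
        "built_for_cuda": None,
        "gpu_visible": False,
    }
    if importlib.util.find_spec("torch") is None:
        # torch is not importable from this environment
        return report

    import torch

    report["installed"] = True
    report["version"] = torch.__version__
    # torch.version.cuda is None for CPU-only builds
    report["built_for_cuda"] = torch.version.cuda
    report["gpu_visible"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in torch_cuda_report().items():
        print(f"{key}: {value}")
```

If `built_for_cuda` comes back as `None`, the environment picked up a CPU-only build; if it reports a version other than the cudatoolkit installed for nvtabular, the two toolkits are likely mismatched.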
Hello @jperez999! Thank you for the reply! If it's helpful, I also tried installing pytorch with I will try the pip approach, thank you for suggesting it and for the NGC links. I was skipping the NGC approach since I'm absolutely new to docker, but in case pip is too difficult, I will try the NGC approach! I had another question: as I was trying to run the first few example notebooks, I had to install a few packages and ran into an error that required me to downgrade the sklearn version. If it's helpful, may I raise a PR with a I think there's some value in it for someone who wants to run through the example notebooks and get the hang of the API. Please let me know. TIA! :) |
Also having problems installing NVTabular with PyTorch.
Note: I use NVTabular via the |
We don't have a conda package for python 3.9 - mainly because one of our major dependencies (cudf) only provides conda packages for python 3.7 and 3.8. We do have a docker container with pytorch, nvtabular, transformers4rec, and all their dependencies here: @pintonos : can you post what you tried? How are you installing nvtabular? |
@benfred
|
@pintonos - cudf doesn't support cuda 10.2, and will require cuda 11.0+ https://github.com/rapidsai/cudf#cudagpu-requirements . nvtabular itself is pretty easy to install (you can install without cudf by going Can you try the docker container to see if that works for you ( |
Closing - let me know if you are still having any issues getting this up and running |
Side-point: I'm curious why there's a cuda 10.2 cudf provided by the rapidsai channel when the docs explicitly say it doesn't work? Main-point: Doing a simple/naive install of NVTabular+Transformers4rec+Pytorch is extremely frustrating as it stands now, as it by default resolves to cpu packages for Pytorch. It's not like Pytorch doesn't support cuda 11, so isn't this simply a misspecification in some conda files somewhere? I would think enough of your target audience uses Pytorch that it would be a priority to have a working conda setup? Sorry if my wording seems harsh, but I've just spent a few hours trying to work around this. I cannot use the preinstalled container you've previously linked @benfred, as I am already working in a containerized environment. |
@NegatioN sorry to hear that you are having issues. What's the reason you cannot use the Merlin docker containers? Can you try these steps:
Then open a browser from the host OS to access the jupyter-lab server using Let us know what issue you are facing in launching the container. |
Hi @rnyak. The reason I can't use the container you're linking is that I'm already running from inside a container. I don't want to run the merlin container inside my already running container, which is running on top of Kubernetes. I know it's technically possible, but that would force a lot of complexity on me. So I'm not struggling with getting the container itself to work, it probably works fine. I just don't want to use it. I'm still curious what the issue with installing Pytorch + Cudf (+ Nvtabular + transformers4rec) all from Conda is, though? If it works when we do a lot of contrived steps to install it, there shouldn't be any incompatibilities, except not specifying package requirements properly? |
Bump and tag @rnyak: I guess the major issue is that [conda] Cudf is compatible with Cuda 11.0 & 11.2, while [conda] Pytorch[1.10] is compatible with cuda 11.3. Is there no option for Nvidia to build Cudf against all versions of cuda from 11.0 to 11.4 and release this in the conda repo? That wouldn't help with potential incompatibilities in the NVTabular & Transformers4Rec repos that might arise with each bump of Pytorch, but shouldn't it make the build process a lot smoother for pleb end-users like me? And it seems like a nice feature to have, unless it's utterly impossible. Edit: Seems like there's already a thread for this going on in Cudf rapidsai/cudf#8510 |
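The mismatch described in this comment can be illustrated with a small sketch. This is only a toy model of the idea, assuming each package accepts a fixed set of CUDA versions; the real conda solver works from package metadata and is far more involved. The function name `common_cuda_versions` is hypothetical, and the version figures are the ones claimed in the comment, not checked against the channels.

```python
def common_cuda_versions(requirements):
    """Return the CUDA versions accepted by every package, sorted numerically.

    `requirements` maps a package name to the CUDA versions it supports.
    Hypothetical helper for illustration; this is not how conda resolves packages.
    """
    version_sets = [set(versions) for versions in requirements.values()]
    if not version_sets:
        return []
    shared = set.intersection(*version_sets)
    # Sort "11.0" < "11.2" < "11.10" by numeric components, not as strings
    return sorted(shared, key=lambda v: tuple(int(part) for part in v.split(".")))


# Figures as stated in the comment above, not verified against channel metadata.
claimed = {
    "cudf": ["11.0", "11.2"],
    "pytorch 1.10": ["11.3"],
}

if __name__ == "__main__":
    print(common_cuda_versions(claimed))
```

With the figures from the comment the intersection is empty, which is consistent with the solver being unable to pick a single cudatoolkit that satisfies both GPU builds.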
Hi Team,
I have been trying to run through the examples and wanted to setup my local environment for it:
Steps/Code to reproduce bug
conda install -c nvidia -c rapidsai -c numba -c conda-forge nvtabular python=3.7 cudatoolkit=11.0
conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
Based on my previous encounters, I know PyTorch 1.9 doesn't like Python <=3.8 due to a torchvision dependency which is still being looked into, and it appears NVTabular won't install with Python 3.9?
I tried installing PyTorch 1.7; however, it seems that the required version is 1.9 for making the examples work?
Apologies if I'm missing something straightforward here. I've tried different iterations of making the install work, but I'm still stuck at getting things up and running.
Could you please point me to the right steps of getting everything up and running?
Thanks in advance for your help!
Aha! Link: https://nvaiinfra.aha.io/features/MERLIN-504