[BUG] Cannot Install NVTabular along PyTorch 1.9 #1175

init27 · 2021-10-10T05:02:11Z

Hi Team,
I have been trying to run through the examples and wanted to setup my local environment for it:

Steps/Code to reproduce bug

I installed NVTabular using: conda install -c nvidia -c rapidsai -c numba -c conda-forge nvtabular python=3.7 cudatoolkit=11.0
After that, PyTorch 1.9 wouldn't install. I followed the instructions from the PyTorch website: conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch

Based on my previous encounters, I know PyTorch 1.9 doesn't like Python <=3.8 due to a torchvision dependency which is still being looked into, and it appears NVTabular won't install with Python 3.9?

I tried installing PyTorch 1.7; however, it seems that the required version is 1.9 for making the examples work?

Apologies if I'm missing something straightforward here, I've tried different iterations of making the install work but I'm still stuck at getting things up and running.

Could you please point me to the right steps of getting everything up and running?

Thanks in advance for your help!

Aha! Link: https://nvaiinfra.aha.io/features/MERLIN-504

jperez999 · 2021-10-12T14:02:52Z

Hello @init27,

Based on the details you provided above, I think you may want to try using pip to install torch instead of conda. Unfortunately, conda will require you to download separate CUDA toolkits, one for nvtabular (11.0) and one for pytorch (10.2). That combination is unstable and may lead to strange behavior. If you want to use conda to install nvtabular, you may want to try installing pytorch using pip while in your activated conda environment. You may also need to update your path to include the pip package install location.

Alternatively you could use our prepackaged container hosted publicly on NGC. You would probably be interested in our Merlin Pytorch container, which can be found here: https://ngc.nvidia.com/catalog/containers/nvidia:merlin:merlin-pytorch-training. This has everything you need for all the pytorch examples (excluding inference) to execute successfully.
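
For anyone trying the pip-inside-conda route suggested above, a small check like the following may help confirm what the active interpreter actually sees. It is only a sketch: the helper names describe_environment and torch_cuda_build are made up for illustration, and it assumes Python 3.8 or newer (for importlib.metadata). The only torch call it relies on is the torch.version.cuda attribute, which is None on CPU-only builds.

```python
import importlib.util
from importlib import metadata


def describe_environment(packages=("torch", "nvtabular", "cudf")):
    """Return {package: installed version or None} for the current interpreter."""
    report = {}
    for name in packages:
        if importlib.util.find_spec(name) is None:
            report[name] = None  # not importable from this environment
            continue
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "unknown"  # importable, but no distribution metadata
    return report


def torch_cuda_build():
    """Return the CUDA version torch was built against, or None
    (torch missing, or a CPU-only build)."""
    try:
        import torch
    except ImportError:
        return None
    return torch.version.cuda


if __name__ == "__main__":
    for name, version in describe_environment().items():
        print(f"{name}: {version or 'not installed'}")
    print("torch CUDA build:", torch_cuda_build())
```

If torch shows up as installed but the CUDA build prints None, the environment most likely resolved to a CPU-only PyTorch package.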

init27 · 2021-10-12T17:04:49Z

Hello @jperez999!

Thank you for the reply!

If it's helpful, I also tried installing pytorch with conda install pytorch torchvision torchaudio cudatoolkit=11.1 -c pytorch -c nvidia cuda 11.1, but the same install would fail as well.

I will try the pip approach, thank you for suggesting it and for the NGC links. I was skipping the NGC approach since I'm absolutely new to docker, but in case pip is too difficult, I will try the NGC approach!

I had another question: as I was trying to run the first few example notebooks, I had to install a few packages and ran into an error that required me to downgrade the sklearn version. If it's helpful, may I raise a PR with a requirements.txt file to work with the examples?

I think there's some value for it, for someone who wants to run through the example notebooks and get a hang of the API. Please let me know.

TIA! :)

pintonos · 2021-10-22T11:37:08Z

Also having problems installing NVTabular with PyTorch.
I already tried all possible combinations of conda and pip install, but it's not possible to use NVTabular as a dataloader via PyTorch:

Traceback (most recent call last):
  File "main.py", line 89, in <module>
    trainer.train()
  File "/home/.../miniconda3/envs/nvtabular10.2/lib/python3.8/site-packages/transformers/trainer.py", line 1093, in train
    train_dataloader = self.get_train_dataloader()
  File "/home/.../miniconda3/envs/nvtabular11.0/lib/python3.8/site-packages/transformers4rec/torch/trainer.py", line 128, in get_train_dataloader
    return T4RecDataLoader.parse(self.args.data_loader_engine).from_schema(
  File "/home/.../miniconda3/envs/nvtabular11.0/lib/python3.8/site-packages/transformers4rec/torch/utils/data_utils.py", line 53, in parse
    return dataloader_registry.parse(class_or_str)
  File "/home/.../miniconda3/envs/nvtabular11.0/lib/python3.8/site-packages/merlin_standard_lib/registry.py", line 265, in parse
    return self[class_or_str]
  File "/home/.../miniconda3/envs/nvtabular11.0/lib/python3.8/site-packages/merlin_standard_lib/registry.py", line 232, in __getitem__
    raise KeyError(
KeyError: 'nvtabular never registered with registry torch.dataloader_loader. Available:\n '

Note: I use NVTabular via the transformers4rec lib.
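
The empty "Available:" list in the KeyError above suggests that no dataloader was registered at all, which would be consistent with an optional import failing quietly at registration time. That is a guess from the message, not something read from the merlin_standard_lib source. The toy registry below is a hypothetical illustration of that failure mode; Registry and register_if_importable are invented names, not the library's API.

```python
import importlib


class Registry:
    """Toy name -> class registry (hypothetical, not merlin_standard_lib)."""

    def __init__(self, name):
        self.name = name
        self._entries = {}

    def register(self, key):
        def decorator(cls):
            self._entries[key] = cls
            return cls
        return decorator

    def parse(self, key):
        if key not in self._entries:
            available = ", ".join(sorted(self._entries))
            raise KeyError(
                f"{key} never registered with registry {self.name}. "
                f"Available: {available}"
            )
        return self._entries[key]


def register_if_importable(registry, key, module_name):
    """Register a placeholder loader under `key` only if `module_name` imports."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False  # skipped silently: the registry stays empty for this key
    registry.register(key)(type(f"{key.title()}Loader", (), {}))
    return True


if __name__ == "__main__":
    registry = Registry("torch.dataloader_loader")
    registered = register_if_importable(registry, "nvtabular", "nvtabular")
    print("nvtabular loader registered:", registered)
```

Under this assumption, running `python -c "import nvtabular"` in the same environment would surface the underlying import error directly.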

benfred · 2021-10-22T18:19:50Z

We don't have a conda package for python 3.9 - mainly because one of our major dependencies (cudf) only provides conda packages for python 3.7 and 3.8.

We do have a docker container with pytorch, nvtabular, transformers4rec, and all their dependencies here: docker pull nvcr.io/nvidia/merlin/merlin-pytorch-training:21.09 (see https://ngc.nvidia.com/catalog/containers/nvidia:merlin:merlin-pytorch-training for more info). If you're stuck installing nvtabular yourself, can you try the docker container?

@pintonos : can you post what you tried? How are you installing nvtabular?

pintonos · 2021-10-25T06:46:02Z

@benfred
In this order (including activating the environment):
conda create -n nvtabular102 -c nvidia -c rapidsai -c numba -c conda-forge nvtabular python=3.8 cudatoolkit=10.2

pip3 install torch==1.10.0+cu102 torchvision==0.11.1+cu102 torchaudio===0.10.0+cu102 -f https://download.pytorch.org/whl/cu102/torch_stable.html

pip install transformers4rec[all]

benfred · 2021-11-03T21:29:47Z

@pintonos - cudf doesn't support CUDA 10.2 and requires CUDA 11.0+ (https://github.com/rapidsai/cudf#cudagpu-requirements). nvtabular itself is pretty easy to install: you can install it without cudf by running pip install nvtabular, but that means running nvtabular only on the CPU, without GPU support. The problem is getting cudf installed.

Can you try the docker container to see if that works for you (docker pull nvcr.io/nvidia/merlin/merlin-pytorch-training:21.09)? This container includes CUDA 11.4 and user-mode drivers, so it should work from your host machine even if that only has CUDA 10.2 installed.
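
As a rough illustration of the CUDA 11.0+ requirement mentioned above, the sketch below compares a toolkit version string against a minimum. The function names are hypothetical, and the version strings are passed in by hand rather than detected from the system.

```python
def parse_version(text):
    """Turn '11.0' or '10.2' into a comparable (major, minor) tuple of ints."""
    parts = [int(part) for part in text.strip().split(".")]
    parts += [0] * (2 - len(parts))  # treat '11' as '11.0'
    return tuple(parts)


def meets_minimum(installed, minimum="11.0"):
    """True when the installed CUDA toolkit version is at least `minimum`."""
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    for version in ("10.2", "11.0", "11.4"):
        print(version, "ok for cudf" if meets_minimum(version) else "too old for cudf")
```

With the default minimum, 10.2 is rejected while 11.0 and 11.4 pass, which matches the conda environment above failing at the cudf step.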

benfred · 2021-11-15T17:24:06Z

Closing - let me know if you are still having any issues getting this up and running

NegatioN · 2021-11-24T12:27:39Z

Side-point: I'm curious why there's a cuda 10.2 cudf provided by the rapidsai channel when docs explicitly say it doesn't work? cuda_10.2_py37_gab3b3f653a_0 rapidsai/linux-64. This seems to crash doing a random task I'm trying to run.

Main-point: Doing a simple/naive install of NVTabular+Transformers4rec+Pytorch is extremely frustrating as it stands now, as it by default resolves to cpu packages for Pytorch.
conda install -c pytorch -c fastai -c nvidia -c conda-forge -c anaconda -c rapidsai nvtabular pytorch transformers4rec cudf python=3.7 cudatoolkit=11.*

It's not like Pytorch doesn't support cuda 11, so isn't this simply a misspecification in some conda files somewhere? I would think enough of your target audience uses Pytorch that it would be a priority to have a working conda setup?

Sorry if my wording seems harsh, but I've just spent a few hours trying to work around this. I cannot use the preinstalled container you've previously linked @benfred, as I am already working in a containerized environment.

rnyak · 2021-11-24T14:40:12Z

@NegatioN sorry to hear that you are having issues. What's the reason you cannot use the Merlin docker containers? Can you try these steps:

docker pull nvcr.io/nvidia/merlin/merlin-pytorch-training:21.11   # current latest container
docker run -it --gpus all -p 9999:8888 -p 9797:9998 -p 9796:8777 --ipc=host nvcr.io/nvidia/merlin/merlin-pytorch-training:21.11 
cd /nvtabular
pip install torch==1.10.0
jupyter-lab --allow-root --ip='0.0.0.0' --NotebookApp.token=''

Then open a browser on the host OS to access the jupyter-lab server at http://<MachineIP>:9999.

Let us know what issue you are facing when launching the container.

NegatioN · 2021-11-25T10:18:47Z

Hi @rnyak. The reason I can't use the container you're linking, is that I'm already running from inside a container. I don't want to run the merlin container inside my already running container, that's running on top of Kubernetes. I know it's technically possible, but that would force a lot of complexity on me. So I'm not struggling with getting the container itself to work, it probably works fine. I just want to not use it.

I'm still curious what the issue with installing Pytorch + Cudf (+ Nvtabular + transformers4rec) all from Conda is, though? If it works when we do a lot of contrived steps to install it, there shouldn't be any incompatibilities, apart from package requirements not being specified properly?
If the installation process is so brittle that a Docker image is recommended, surely that means the installation process should be fixed? That way there would surely be more users of these libraries as well. Even if I get this working now in my environment, I'm worried about using it in production if the installation process isn't stable.

NegatioN · 2021-11-29T11:44:04Z

Bump and tag @rnyak: I guess the major issue is that [conda] Cudf is compatible with Cuda 11.0 & 11.2, while [conda] Pytorch[1.10] is compatible with cuda 11.3. Is there no option for Nvidia to build Cudf against all versions of cuda from 11.0 to 11.4 and release this in the conda repo?

That wouldn't help with potential incompatibilities in the NVTabular & Transformers4Rec repos that might arise with each bump of Pytorch, but shouldn't it make the build process a lot smoother for pleb end-users like me? And it seems like a nice feature to have, unless it's utterly impossible.

Edit: Seems like there's already a thread for this going on in Cudf rapidsai/cudf#8510
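
To make the mismatch described above concrete, here is a minimal sketch that intersects the CUDA versions each conda package is said to support. The version sets are copied from the comment above for illustration only and have not been checked against the conda channels; common_cuda_versions is a hypothetical helper.

```python
# CUDA versions per conda package, as stated in the comment above
# (illustrative values, not verified against the channels).
SUPPORTED_CUDA = {
    "cudf": {"11.0", "11.2"},
    "pytorch-1.10": {"11.3"},
}


def common_cuda_versions(supported):
    """Return the sorted CUDA versions that every listed package supports."""
    sets = list(supported.values())
    if not sets:
        return []
    shared = set.intersection(*sets)
    return sorted(shared, key=lambda v: tuple(int(p) for p in v.split(".")))


if __name__ == "__main__":
    shared = common_cuda_versions(SUPPORTED_CUDA)
    print("shared CUDA versions:", shared or "none")
```

With these sets the intersection is empty, which is one way to read why the solver cannot find a single cudatoolkit that satisfies both packages.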

init27 added the bug Something isn't working label Oct 10, 2021

benfred assigned benfred and jperez999 Oct 12, 2021

viswa-nvidia added P0 bug Something isn't working and removed bug Something isn't working P0 labels Oct 18, 2021

benfred added this to 21.12 Release Nov 3, 2021

benfred moved this to In Progress in 21.12 Release Nov 4, 2021

benfred closed this as completed Nov 15, 2021

Repository owner moved this from In Progress to Done in 21.12 Release Nov 15, 2021

benfred added the PyTorch label Nov 16, 2021

SalvadorMartiRoman mentioned this issue Jul 20, 2022

[QST] Running example notebooks without using the preset docker container NVIDIA-Merlin/Transformers4Rec#459

Closed
