Failed to load image extension - Windows CUDA 11.7 #7185

Open
atalman opened this issue Feb 6, 2023 · 9 comments

Comments

@atalman
Contributor

atalman commented Feb 6, 2023

🐛 Describe the bug

I observe the following failures:
https://github.com/pytorch/builder/actions/runs/4104686412/attempts/3
Windows CUDA 11.7, python 3.8-3.10

RuntimeError: Module torchvision FAIL: 1 Output: C:\Jenkins\Miniconda3\envs\conda-env-4104686412\lib\site-packages\torchvision\io\image.py:13: UserWarning: Failed to load image Python extension: 'Could not find module 'C:\Jenkins\Miniconda3\envs\conda-env-4104686412\Lib\site-packages\torchvision\image.pyd' (or one of its dependencies). Try using the full path with constructor syntax.'If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  warn(
torchvision: 0.15.0.dev20230206
Traceback (most recent call last):
  File "C:\Jenkins\Miniconda3\envs\conda-env-4104686412\lib\site-packages\torch\_ops.py", line 562, in __getattr__
    op, overload_names = torch._C._jit_get_operation(qualified_op_name)
RuntimeError: No such operator image::decode_png

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\actions-runner\_work\builder\builder\pytorch\builder\vision\test\smoke_test.py", line 65, in <module>
    main()
  File "C:\actions-runner\_work\builder\builder\pytorch\builder\vision\test\smoke_test.py", line 57, in main
    smoke_test_torchvision()
  File "C:\actions-runner\_work\builder\builder\pytorch\builder\vision\test\smoke_test.py", line 17, in smoke_test_torchvision
    all(x is not None for x in [torch.ops.image.decode_png, torch.ops.torchvision.roi_align]),
  File "C:\Jenkins\Miniconda3\envs\conda-env-4104686412\lib\site-packages\torch\_ops.py", line 566, in __getattr__
    raise AttributeError(
AttributeError: '_OpNamespace' 'image' object has no attribute 'decode_png'

It's the same failure as #7036, but now on Windows.
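The smoke test stops at the first unresolved operator: `torch.ops.image.decode_png` raises `AttributeError` when the image extension did not load, so `roi_align` is never checked. A minimal sketch of a probe that reports every missing operator instead; the helper name `missing_ops` is hypothetical and not part of the builder smoke test. It relies only on plain `getattr`, so it can be exercised with stand-in objects when torch is not installed.

```python
from types import SimpleNamespace
from typing import Iterable, List


def missing_ops(root: object, qualified_names: Iterable[str]) -> List[str]:
    """Return the dotted names (e.g. "image.decode_png") that cannot be resolved on root.

    With torch installed, pass torch.ops as root: an unregistered operator surfaces
    as AttributeError (as in the traceback above), which is what is caught here.
    """
    missing = []
    for name in qualified_names:
        obj = root
        try:
            for part in name.split("."):
                obj = getattr(obj, part)
        except AttributeError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Stand-in for torch.ops where the image extension failed to load.
    fake_ops = SimpleNamespace(
        torchvision=SimpleNamespace(roi_align=object()),
        image=SimpleNamespace(),
    )
    print(missing_ops(fake_ops, ["image.decode_png", "torchvision.roi_align"]))  # ['image.decode_png']
    # With a real install (assumption: torch and torchvision importable):
    #   import torch, torchvision
    #   print(missing_ops(torch.ops, ["image.decode_png", "torchvision.roi_align"]))
```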

cc @pmeier @NicolasHug @malfet

Versions

nightly

@atalman
Contributor Author

atalman commented Feb 6, 2023

This issue seems to be mitigated by:

    conda install libnvjpeg-dev -c nvidia
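The "Could not find module ... (or one of its dependencies)" message means Windows could not load `image.pyd` or a DLL it links against. A rough diagnostic sketch that scans a list of directories for candidate DLLs; it is only an approximation of the real Windows DLL search order. Assumptions: the CUDA 11.x nvJPEG runtime is named `nvjpeg64_11.dll`, and the search set is the active conda env's `Library\bin` followed by `PATH`.

```python
import os
from pathlib import Path
from typing import Dict, Iterable, List, Optional


def find_dlls(names: Iterable[str], dirs: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each DLL name to the first directory entry containing it, or None if absent."""
    dir_list = [Path(d) for d in dirs if d]  # skip empty PATH entries
    found: Dict[str, Optional[str]] = {}
    for name in names:
        found[name] = None
        for d in dir_list:
            candidate = d / name
            if candidate.is_file():
                found[name] = str(candidate)
                break
    return found


def default_search_dirs() -> List[str]:
    """Approximate search set: conda env Library/bin first, then PATH entries."""
    dirs: List[str] = []
    prefix = os.environ.get("CONDA_PREFIX")
    if prefix:
        dirs.append(os.path.join(prefix, "Library", "bin"))
    dirs.extend(os.environ.get("PATH", "").split(os.pathsep))
    return dirs


if __name__ == "__main__":
    # "nvjpeg64_11.dll" is an assumed file name for the CUDA 11.x nvJPEG DLL.
    print(find_dlls(["nvjpeg64_11.dll"], default_search_dirs()))
```

A `None` entry after installing only `libnvjpeg` would be consistent with the DLL living in the `-dev` package, as discussed later in this thread.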

@malfet
Contributor

malfet commented Feb 7, 2023

It's a good old #4894

@malfet
Contributor

malfet commented Feb 7, 2023

Another fun fact: the libnvjpeg conda package on Windows is 4 KB, compared to 1.2 MB on Linux:
https://anaconda.org/nvidia/libnvjpeg/files?version=11.8.0.2

@ptrblck, is this expected?

@ptrblck
Contributor

ptrblck commented Feb 7, 2023

It seems it was available in CUDA 11.7: https://anaconda.org/nvidia/libnvjpeg/files?version=11.7.2.34, so I'll check if this change is expected for Windows.

@ptrblck
Contributor

ptrblck commented Feb 7, 2023

Yes, it seems to be expected based on a response from the nvJPEG team:
The libnvjpeg-dev package should contain the .dll and headers while libnvjpeg contains the .lib.
The same convention is used for other libraries in the CUDA toolkit (on Windows).

@atalman
Contributor Author

atalman commented Feb 7, 2023

Thank you @ptrblck, this means we need to include the dev packages for all libraries that we link with dynamically.
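The packaging rule above can be written down as a small helper. This is a minimal sketch under stated assumptions: the function name `runtime_conda_packages` is hypothetical, the Windows split (`-dev` carries the .dll and headers, the base package carries the .lib) is taken from the nvJPEG team's note in this thread, and on other platforms the base package is assumed to ship the shared library on its own.

```python
from typing import Iterable, List


def runtime_conda_packages(libs: Iterable[str], platform: str) -> List[str]:
    """Conda packages to install for libraries that are linked dynamically.

    On Windows ("win32") both the base package (import .lib) and the -dev
    package (the .dll itself) are listed; elsewhere only the base package.
    """
    packages: List[str] = []
    for lib in libs:
        packages.append(lib)
        if platform == "win32":
            packages.append(f"{lib}-dev")
    return packages


if __name__ == "__main__":
    print(" ".join(runtime_conda_packages(["libnvjpeg"], "win32")))  # libnvjpeg libnvjpeg-dev
```

Passing `sys.platform` as the second argument would select the rule for the current machine.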

@cleebp

cleebp commented Oct 26, 2023

I'm trying to build torchvision 0.15.2 with CUDA 11.7 and PyTorch 1.13.1 on Windows and am hitting this issue.

My environment has the following versions and hits this warning on package import. I tried adding libnvjpeg-dev and libnvjpeg from the nvidia channel, but the warnings are still thrown on import, along with Windows pop-up alerts. If they were just warnings it would be fine, but the Windows pop-ups make this unusable in a CI system.

    cudatoolkit:        11.7.0
    cudnn:              8.5.0.96-0
    jpeg:               9e-0
    libnvjpeg:          11.7.2.34-0
    libnvjpeg-dev:      11.7.2.34-0
    libpng:             1.6.39-h8cc25b3_0
    libtiff:            4.5.1-0
    pillow:             9.5.0-py39_1
    python:             3.9.18-0
    pytorch:            1.13.1-py3.9_cuda11.7_cudnn8.5_4
    torchvision:        0.15.2-py39_torch1131_cuda117_2

@NicolasHug
Member

@cleebp you'll need PyTorch 2.0 if you're using torchvision 0.15; you can refer to our compatibility table.

@cleebp

cleebp commented Oct 27, 2023

Thanks for the quick attention @NicolasHug!

Unfortunately we are stuck on PyTorch 1.13.1, but we are also moving to Python 3.11 this release, so I don't think there is a supported torchvision version we can use from the PyPI wheels. This is similar to this issue with using PyTorch LTS and torchvision: pytorch/pytorch.github.io#828

I think we'd have to revert to torchvision 0.14.x for our version of PyTorch and try to build it from source for Python 3.11, but that probably isn't supported.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants