Build failure in docker because of libnvcuvid #102

Closed
trifle opened this issue Oct 16, 2020 · 14 comments · Fixed by #104

trifle commented Oct 16, 2020

Hi,

this is a bug report because some people get stuck at this point, but I'm not quite convinced it's your fault :)

Issue: cmake fails to find libnvcuvid when you try to build decord in an nvidia cuda container with all necessary libs linked in (-e NVIDIA_DRIVER_CAPABILITIES=all, which should provide the container with pretty much everything that's in CUDA, cuvid, cudnn and so on).

[edit]: Just to add, this is using the official nvidia pre-built cuda dev container nvidia/cuda:10.1-cudnn7-devel

That should obviously not happen. I've also successfully built ffmpeg with all the cuvid accelerations, dlib and other software - all successfully using cmake.

So why is this happening? The container has libnvcuvid here:

lrwxrwxrwx 1 root root   20 Oct 16 15:24 /usr/lib/x86_64-linux-gnu/libnvcuvid.so.1 -> libnvcuvid.so.450.57
-rw-r--r-- 1 root root 3.6M Jul  5 15:12 /usr/lib/x86_64-linux-gnu/libnvcuvid.so.450.57

This is kind of strange. We know that libnvcuvid is not part of CUDA but part of the driver. The host has 450.57 installed, which is why the lib is versioned and provided via a link. But Nvidia itself recommends against using *.so.1 to load dynamic libraries; instead, one apparently should use libnvcuvid.so (no suffix). I guess that's what you guys do.

After linking /usr/lib/x86_64-linux-gnu/libnvcuvid.so.1 to libnvcuvid.so, everything works as expected.

So, maybe you want to add some safety check to your cmake config to catch this behavior? Or document it somewhere? (I've at least written up this report so people can find it via the issues).
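
A minimal Python sketch of the condition described above, for anyone who wants to check a container before running cmake. It only inspects file names in one directory; the function name `classify_library` and the three status strings are made up for this illustration and are not part of decord or its cmake config.

```python
import os
import re
import tempfile


def classify_library(lib_dir, stem="libnvcuvid"):
    """Report whether the unversioned .so name exists next to versioned ones.

    Returns a (status, names) tuple where status is one of:
      "ok"             - <stem>.so is present
      "versioned_only" - only <stem>.so.<version> files are present
      "missing"        - nothing matching <stem>.so* was found
    """
    versioned = re.compile(re.escape(stem) + r"\.so\.[0-9.]+$")
    names = sorted(os.listdir(lib_dir))
    if stem + ".so" in names:
        return "ok", [stem + ".so"]
    matches = [n for n in names if versioned.match(n)]
    if matches:
        return "versioned_only", matches
    return "missing", []


# Demo on a throwaway directory that mimics the container layout above.
with tempfile.TemporaryDirectory() as demo_dir:
    open(os.path.join(demo_dir, "libnvcuvid.so.450.57"), "w").close()
    os.symlink("libnvcuvid.so.450.57", os.path.join(demo_dir, "libnvcuvid.so.1"))
    demo_status, demo_names = classify_library(demo_dir)
    print(demo_status, demo_names)
```

In a real container one would call `classify_library("/usr/lib/x86_64-linux-gnu")`; a `versioned_only` result corresponds to the situation reported here.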

BTW, thanks a lot for decord and your hard work!

bravma commented Oct 16, 2020

I have exactly this problem. Could you post your full Dockerfile?

trifle commented Oct 17, 2020

> I have exactly this problem. Could you post your full Dockerfile?

Good to see. Hope my hints above help you.
I'll try to find the time to extract the non-proprietary parts of the dockerfile, but can't promise. What are you especially interested in? You can work around the build issue by simply linking the library:

ln -s /usr/lib/x86_64-linux-gnu/libnvcuvid.so.1 <YOUR CUDA DIR>/libnvcuvid.so
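
The same workaround can be scripted so that it is safe to run more than once. This is a rough sketch only: `ensure_unversioned_link` is a hypothetical helper, and the demo runs in a temporary directory rather than touching the real driver library or CUDA directory.

```python
import os
import tempfile


def ensure_unversioned_link(versioned_path, link_dir, link_name="libnvcuvid.so"):
    """Create link_dir/link_name -> versioned_path unless it already exists.

    Returns True when a new link was created, False when one was already there.
    """
    link_path = os.path.join(link_dir, link_name)
    if os.path.lexists(link_path):  # also true for dangling symlinks
        return False
    os.symlink(versioned_path, link_path)
    return True


# Demo in a temporary directory; real use would point at the driver library
# and the CUDA lib directory, which usually requires root.
with tempfile.TemporaryDirectory() as demo_dir:
    target = os.path.join(demo_dir, "libnvcuvid.so.1")
    open(target, "w").close()
    first = ensure_unversioned_link(target, demo_dir)
    second = ensure_unversioned_link(target, demo_dir)
    resolved = os.path.realpath(os.path.join(demo_dir, "libnvcuvid.so"))
    expected = os.path.realpath(target)
```

Using `os.path.lexists` instead of `os.path.exists` means an existing but broken link is left alone rather than raising `FileExistsError`.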

bravma commented Oct 17, 2020

Thank you for your response. Much appreciated. I have created the symbolic link. However, it seems like my docker container still does not mount the library; it is missing. Building decord therefore fails.

-- Found CUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda
-- Found CUDA_CUDA_LIBRARY=/usr/local/cuda/targets/x86_64-linux/lib/stubs/libcuda.so
-- Found CUDA_CUDART_LIBRARY=/usr/local/cuda/lib64/libcudart.so
-- Found CUDA_NVRTC_LIBRARY=/usr/local/cuda/lib64/libnvrtc.so
-- Found CUDA_CUDNN_LIBRARY=/usr/lib/x86_64-linux-gnu/libcudnn.so
-- Found CUDA_CUBLAS_LIBRARY=/usr/lib/x86_64-linux-gnu/libcublas.so
-- Found CUDA_NVIDIA_ML_LIBRARY=/usr/local/cuda/targets/x86_64-linux/lib/stubs/libnvidia-ml.so
-- Found CUDA_NVCUVID_LIBRARY=CUDA_NVCUVID_LIBRARY-NOTFOUND

I added ENV NVIDIA_DRIVER_CAPABILITIES video,compute,utility to my Dockerfile.

If I enter ldconfig -p | grep libnvcuvid on the host, I get the following output:

        libnvcuvid.so.1 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libnvcuvid.so.1
        libnvcuvid.so.1 (libc6) => /usr/lib/i386-linux-gnu/libnvcuvid.so.1
        libnvcuvid.so (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libnvcuvid.so
        libnvcuvid.so (libc6) => /usr/lib/i386-linux-gnu/libnvcuvid.so

However, I don't get any output in the container...
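
For comparing host and container, the `ldconfig -p` listing can also be parsed instead of grepped. The sketch below assumes the line layout shown in the output above (`name (flags) => path`); `find_in_ldconfig` is a hypothetical helper and the sample text is copied from that output.

```python
import re

LDCONFIG_LINE = re.compile(r"^\s*(\S+)\s+\(([^)]*)\)\s+=>\s+(\S+)\s*$")


def find_in_ldconfig(output, stem):
    """Return (soname, flags, path) tuples whose soname starts with stem."""
    hits = []
    for line in output.splitlines():
        m = LDCONFIG_LINE.match(line)
        if m and m.group(1).startswith(stem):
            hits.append((m.group(1), m.group(2), m.group(3)))
    return hits


# Sample text copied from the host output quoted above.
sample = """\
        libnvcuvid.so.1 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libnvcuvid.so.1
        libnvcuvid.so.1 (libc6) => /usr/lib/i386-linux-gnu/libnvcuvid.so.1
        libnvcuvid.so (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libnvcuvid.so
        libnvcuvid.so (libc6) => /usr/lib/i386-linux-gnu/libnvcuvid.so
"""
hits = find_in_ldconfig(sample, "libnvcuvid")
has_dev_name = any(name == "libnvcuvid.so" for name, _, _ in hits)
print(len(hits), has_dev_name)
```

On a live system the text would come from something like `subprocess.run(["ldconfig", "-p"], capture_output=True, text=True).stdout`; an empty result inside the container matches the symptom described here.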

trifle commented Oct 17, 2020

@bravma Curious. Try this:

sudo docker run --rm -it --gpus all -e NVIDIA_DRIVER_CAPABILITIES=video,compute,utility nvidia/cuda:10.1-cudnn7-devel ldconfig -p | grep libnvcuvid

It should output

	libnvcuvid.so.1 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libnvcuvid.so.1

If not, then there's something fundamentally broken, perhaps with the nvidia docker runtime?

bravma commented Oct 17, 2020

You're right, I got exactly this output: libnvcuvid.so.1 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libnvcuvid.so.1.

Now if I link the library correctly, cmake finds it. However, during building it throws the following error:

make[2]: *** No rule to make target '/usr/lib/x86_64-linux-gnu/libnvcuvid.so', needed by 'CMakeFiles/decord.dir/cmake_device_link.o'.  Stop.

This is the full output that I receive:

Submodule path '3rdparty/dlpack': checked out '5c792cef3aee54ad8b7000111c9dc1797f327b59'
Submodule path '3rdparty/dmlc-core': checked out 'd07fb7a443b5db8a89d65a15a024af6a425615a5'
-- The C compiler identification is GNU 7.5.0
-- The CXX compiler identification is GNU 7.5.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Could NOT find PkgConfig (missing: PKG_CONFIG_EXECUTABLE)
-- Unable to find libavdevice, device input API will not work!
-- Found FFMPEG or Libav: /usr/lib/x86_64-linux-gnu/libavformat.so;/usr/lib/x86_64-linux-gnu/libavfilter.so;/usr/lib/x86_64-linux-gnu/libavcodec.so;/usr/lib/x86_64-linux-gnu/libavutil.so, /usr/include/x86_64-linux-gnu
-- The CUDA compiler identification is NVIDIA 10.1.243
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc -- works
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Performing Test SUPPORT_CXX11
-- Performing Test SUPPORT_CXX11 - Success
FFMPEG_INCLUDE_DIR = /usr/include/x86_64-linux-gnu
FFMPEG_LIBRARIES = /usr/lib/x86_64-linux-gnu/libavformat.so;/usr/lib/x86_64-linux-gnu/libavfilter.so;/usr/lib/x86_64-linux-gnu/libavcodec.so;/usr/lib/x86_64-linux-gnu/libavutil.so
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Found CUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda
-- Found CUDA_CUDA_LIBRARY=/usr/local/cuda/targets/x86_64-linux/lib/stubs/libcuda.so
-- Found CUDA_CUDART_LIBRARY=/usr/local/cuda/lib64/libcudart.so
-- Found CUDA_NVRTC_LIBRARY=/usr/local/cuda/lib64/libnvrtc.so
-- Found CUDA_CUDNN_LIBRARY=/usr/lib/x86_64-linux-gnu/libcudnn.so
-- Found CUDA_CUBLAS_LIBRARY=/usr/lib/x86_64-linux-gnu/libcublas.so
-- Found CUDA_NVIDIA_ML_LIBRARY=/usr/local/cuda/targets/x86_64-linux/lib/stubs/libnvidia-ml.so
-- Found CUDA_NVCUVID_LIBRARY=/usr/lib/x86_64-linux-gnu/libnvcuvid.so
-- Build with CUDA support
-- Configuring done
-- Generating done
-- Build files have been written to: /decord/build
Scanning dependencies of target decord
[  2%] Building CXX object CMakeFiles/decord.dir/src/runtime/c_runtime_api.cc.o
[  5%] Building CXX object CMakeFiles/decord.dir/src/runtime/cpu_device_api.cc.o
[  8%] Building CXX object CMakeFiles/decord.dir/src/runtime/dso_module.cc.o
[ 11%] Building CXX object CMakeFiles/decord.dir/src/runtime/file_util.cc.o
[ 13%] Building CXX object CMakeFiles/decord.dir/src/runtime/module.cc.o
[ 16%] Building CXX object CMakeFiles/decord.dir/src/runtime/module_util.cc.o
[ 19%] Building CXX object CMakeFiles/decord.dir/src/runtime/ndarray.cc.o
[ 22%] Building CXX object CMakeFiles/decord.dir/src/runtime/registry.cc.o
[ 25%] Building CXX object CMakeFiles/decord.dir/src/runtime/str_util.cc.o
[ 27%] Building CXX object CMakeFiles/decord.dir/src/runtime/system_lib_module.cc.o
[ 30%] Building CXX object CMakeFiles/decord.dir/src/runtime/thread_pool.cc.o
[ 33%] Building CXX object CMakeFiles/decord.dir/src/runtime/threading_backend.cc.o
[ 36%] Building CXX object CMakeFiles/decord.dir/src/runtime/workspace_pool.cc.o
[ 38%] Building CXX object CMakeFiles/decord.dir/src/video/logging.cc.o
[ 41%] Building CXX object CMakeFiles/decord.dir/src/video/storage_pool.cc.o
[ 44%] Building CXX object CMakeFiles/decord.dir/src/video/video_interface.cc.o
[ 47%] Building CXX object CMakeFiles/decord.dir/src/video/video_loader.cc.o
[ 50%] Building CXX object CMakeFiles/decord.dir/src/video/video_reader.cc.o
[ 52%] Building CXX object CMakeFiles/decord.dir/src/sampler/random_file_order_sampler.cc.o
[ 55%] Building CXX object CMakeFiles/decord.dir/src/sampler/random_sampler.cc.o
[ 58%] Building CXX object CMakeFiles/decord.dir/src/sampler/sequential_sampler.cc.o
[ 61%] Building CXX object CMakeFiles/decord.dir/src/sampler/smart_random_sampler.cc.o
[ 63%] Building CXX object CMakeFiles/decord.dir/src/video/ffmpeg/filter_graph.cc.o
[ 66%] Building CXX object CMakeFiles/decord.dir/src/video/ffmpeg/threaded_decoder.cc.o
[ 69%] Building CXX object CMakeFiles/decord.dir/src/video/nvcodec/cuda_context.cc.o
[ 72%] Building CXX object CMakeFiles/decord.dir/src/video/nvcodec/cuda_decoder_impl.cc.o
[ 75%] Building CXX object CMakeFiles/decord.dir/src/video/nvcodec/cuda_mapped_frame.cc.o
[ 77%] Building CXX object CMakeFiles/decord.dir/src/video/nvcodec/cuda_parser.cc.o
[ 80%] Building CXX object CMakeFiles/decord.dir/src/video/nvcodec/cuda_stream.cc.o
[ 83%] Building CXX object CMakeFiles/decord.dir/src/video/nvcodec/cuda_texture.cc.o
[ 86%] Building CXX object CMakeFiles/decord.dir/src/video/nvcodec/cuda_threaded_decoder.cc.o
[ 88%] Building CXX object CMakeFiles/decord.dir/src/runtime/cuda/cuda_device_api.cc.o
[ 91%] Building CXX object CMakeFiles/decord.dir/src/runtime/cuda/cuda_module.cc.o
[ 94%] Building CUDA object CMakeFiles/decord.dir/src/improc/improc.cu.o
make[2]: *** No rule to make target '/usr/lib/x86_64-linux-gnu/libnvcuvid.so', needed by 'CMakeFiles/decord.dir/cmake_device_link.o'.  Stop.
CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/decord.dir/all' failed
make[1]: *** [CMakeFiles/decord.dir/all] Error 2
make: *** [all] Error 2
Makefile:129: recipe for target 'all' failed

bravma commented Oct 18, 2020

I was able to build the library by copying libnvcuvid.so.440.100 to libnvcuvid.so. Of course this is more of a hack than a real solution.

Now if I load a video using the following code, I get this strange exception.

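from decord import VideoReader, gpu

# video_path, width, height and fps are supplied by the surrounding function
reader = VideoReader(video_path, ctx=gpu(0), width=width, height=height)
frames_to_skip = int(reader.get_avg_fps() / fps)  # take every n-th frame
indices = list(range(0, len(reader), frames_to_skip))
frames = reader.get_batch(indices)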
[09:52:48] /usr/lib/x86_64-linux-gnu/decord/src/video/nvcodec/cuda_threaded_decoder.cc:36: Using device: Quadro RTX 8000
[09:52:48] /usr/lib/x86_64-linux-gnu/decord/src/video/nvcodec/cuda_threaded_decoder.cc:56: Kernel module version 440.1, so using our own stream.
Traceback (most recent call last):
  File "decord_loader_test.py", line 67, in <module>
    TikTokVideoGeneratorTest().test_loader()
  File "decord_loader_test.py", line 38, in test_loader
    video = load_video_decord(path, settings.fps, settings.width, settings.height, tf_bridge=False)
  File "../preprocessing/video_loader.py", line 17, in load_video_decord
    frames = reader.get_batch(indices)
  File "/usr/local/lib/python3.6/dist-packages/decord-0.4.1-py3.6.egg/decord/video_reader.py", line 163, in get_batch
    arr = _CAPI_VideoReaderGetBatch(self._handle, indices)
  File "/usr/local/lib/python3.6/dist-packages/decord-0.4.1-py3.6.egg/decord/_ffi/_ctypes/function.py", line 175, in __call__
    ctypes.byref(ret_val), ctypes.byref(ret_tcode)))
  File "/usr/local/lib/python3.6/dist-packages/decord-0.4.1-py3.6.egg/decord/_ffi/base.py", line 63, in check_call
    raise DECORDError(py_str(_LIB.DECORDGetLastError()))
decord._ffi.base.DECORDError: [09:52:48] /usr/lib/x86_64-linux-gnu/decord/src/video/video_reader.cc:559: Error seeking keyframe: 250 with total frames: 451

Stack trace returned 10 entries:
[bt] (0) /usr/local/lib/python3.6/dist-packages/decord-0.4.1-py3.6.egg/decord/libdecord.so(dmlc::StackTrace[abi:cxx11](unsigned long)+0x9d) [0x7f394ac35c9d]
[bt] (1) /usr/local/lib/python3.6/dist-packages/decord-0.4.1-py3.6.egg/decord/libdecord.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x45) [0x7f394ac35fdb]
[bt] (2) /usr/local/lib/python3.6/dist-packages/decord-0.4.1-py3.6.egg/decord/libdecord.so(decord::VideoReader::CheckKeyFrame()+0x1c1) [0x7f394ac8af19]
[bt] (3) /usr/local/lib/python3.6/dist-packages/decord-0.4.1-py3.6.egg/decord/libdecord.so(decord::VideoReader::SeekAccurate(long)+0x120) [0x7f394ac88fc8]
[bt] (4) /usr/local/lib/python3.6/dist-packages/decord-0.4.1-py3.6.egg/decord/libdecord.so(decord::VideoReader::GetBatch(std::vector<long, std::allocator<long> >, decord::runtime::NDArray)+0x7a6) [0x7f394ac8bb44]
[bt] (5) /usr/local/lib/python3.6/dist-packages/decord-0.4.1-py3.6.egg/decord/libdecord.so(+0x17472a) [0x7f394ac7672a]
[bt] (6) /usr/local/lib/python3.6/dist-packages/decord-0.4.1-py3.6.egg/decord/libdecord.so(+0x176f56) [0x7f394ac78f56]
[bt] (7) /usr/local/lib/python3.6/dist-packages/decord-0.4.1-py3.6.egg/decord/libdecord.so(std::function<void (decord::runtime::DECORDArgs, decord::runtime::DECORDRetValue*)>::operator()(decord::runtime::DECORDArgs, decord::runtime::DECORDRetValue*) const+0x5a) [0x7f394ac3a216]
[bt] (8) /usr/local/lib/python3.6/dist-packages/decord-0.4.1-py3.6.egg/decord/libdecord.so(decord::runtime::PackedFunc::CallPacked(decord::runtime::DECORDArgs, decord::runtime::DECORDRetValue*) const+0x30) [0x7f394ac385c0]
[bt] (9) /usr/local/lib/python3.6/dist-packages/decord-0.4.1-py3.6.egg/decord/libdecord.so(DECORDFuncCall+0x95) [0x7f394ac33381]

If I randomly index some frames it works. Accessing key frames on the other hand throws this exception...
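
As a side note on the index computation in the snippet above: if the requested fps is higher than the video's average fps, the integer step becomes 0 and `range` raises a `ValueError`. The sketch below guards against that. `sample_indices` is a hypothetical helper, not a decord function, and it is unrelated to the seek error in the traceback.

```python
def sample_indices(num_frames, avg_fps, target_fps):
    """Pick every k-th frame so the result approximates target_fps."""
    if target_fps <= 0:
        raise ValueError("target_fps must be positive")
    step = max(1, int(avg_fps / target_fps))  # never let the step drop to 0
    return list(range(0, num_frames, step))
```

With a reader in hand this would be called as `sample_indices(len(reader), reader.get_avg_fps(), fps)` and the result passed to `reader.get_batch`.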

zhreshold (Member) commented

First of all, thanks for digging into the issue.

  • for the build/link error: there is no authoritative documentation on how libnvcuvid.so is organized across different cuda distributions, so the logic to find the dylib is fundamentally flawed: https://github.com/dmlc/decord/blob/master/cmake/util/FindCUDA.cmake#L103

  • for the runtime error, if you can provide more details, e.g. the video you are using, I can probably locate the problem

zhreshold (Member) commented

@trifle The runtime error should be fixed by #103. I also improved the readme in #104. It would be great if you could contribute a clean and working dockerfile to help others!

trifle commented Oct 19, 2020

@zhreshold Wonderful, many thanks for your help!
I'm pessimistic about my time budget, but yes, a working dockerfile would be great.
If I ever get to making one, I'll let you know!

bravma commented Oct 20, 2020

Thank you all for your help!

levan92 mentioned this issue Jun 29, 2021

trifle commented Sep 20, 2021

Sorry to resurrect this issue - but this might help some people coming here for assistance.

I've run across the issue that @bravma encountered (No rule to make target '/usr/lib/x86_64-linux-gnu/libnvcuvid.so').
The cause is a lack of access to a GPU during the build phase of the container.

If you configure the docker daemon to use nvidia's runtime by default, the issue does not occur.
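
One way to confirm this from inside a build step is to check whether libnvcuvid.so is a symlink whose target is absent. The sketch assumes (this is not confirmed in the thread) that the link points at a driver library that is only present when the NVIDIA runtime is active; `is_dangling_symlink` is a hypothetical helper.

```python
import os
import tempfile


def is_dangling_symlink(path):
    """True when path is a symlink whose target does not exist."""
    return os.path.islink(path) and not os.path.exists(path)


with tempfile.TemporaryDirectory() as demo_dir:
    link = os.path.join(demo_dir, "libnvcuvid.so")
    # Target is absent, as driver libraries would be without the NVIDIA runtime.
    os.symlink(os.path.join(demo_dir, "libnvcuvid.so.1"), link)
    dangling_before = is_dangling_symlink(link)
    open(os.path.join(demo_dir, "libnvcuvid.so.1"), "w").close()
    dangling_after = is_dangling_symlink(link)
```

`os.path.exists` follows symlinks, so it returns False for a link with a missing target while `os.path.islink` still returns True.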

prithvinambiar commented Oct 23, 2021

@trifle - How do you configure the docker daemon to use nvidia's runtime by default? Thanks for your help.

trifle commented Oct 23, 2021

@prithvinambiar
Add the line "default-runtime": "nvidia" to /etc/docker/daemon.json and restart the docker daemon.
There is some documentation out there that you can find easily.
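
A rough sketch of that edit done programmatically, so existing keys in daemon.json are preserved. `with_default_runtime` is a hypothetical helper; the example input assumes the nvidia runtime is already registered under "runtimes" (typically done by the NVIDIA container packages), which is an assumption about the host and not something stated in this thread.

```python
import json


def with_default_runtime(config_text, runtime="nvidia"):
    """Return daemon.json text with "default-runtime" set, keeping other keys."""
    config = json.loads(config_text) if config_text.strip() else {}
    config["default-runtime"] = runtime
    return json.dumps(config, indent=4) + "\n"


# Example input: an assumed daemon.json that already registers the runtime.
existing = """
{
    "runtimes": {
        "nvidia": {"path": "nvidia-container-runtime", "runtimeArgs": []}
    }
}
"""
updated = with_default_runtime(existing)
print(updated)
```

Writing the result back to /etc/docker/daemon.json needs root, and the docker daemon has to be restarted afterwards, as noted above.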

YuanmingLeee referenced this issue Jan 26, 2022
* dockerfile up
* rename dockerfile to reflectpurpose

zcunyi commented Apr 19, 2022

> > I have exactly this problem. Could you post your full Dockerfile?
>
> Good to see. Hope my hints above help you. I'll try to find the time to extract the non-proprietary parts of the dockerfile, but can't promise. What are you especially interested in? You can work around the build issue by simply linking the library:
>
> ln -s /usr/lib/x86_64-linux-gnu/libnvcuvid.so.1 <YOUR CUDA DIR>/libnvcuvid.so

I used this method, but I also have the problems below:
after cmake: [image]
after make: [image]
