When used from a Python program, freeing dummy_image fails and raises SIGSEGV while the discrete GPU's VulkanDevice is destructed at Python exit #2666

Closed
ArchieMeng opened this issue Feb 6, 2021 · 3 comments

Comments

@ArchieMeng
Contributor

ArchieMeng commented Feb 6, 2021

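Problem description:

The sample program of waifu2x-ncnn-vulkan-python (which wraps waifu2x-ncnn-vulkan and therefore uses ncnn) hits a segmentation fault when it exits: after the Waifu2x object has been destructed successfully, destructing the dummy_image of ncnn::g_default_vkdev crashes. There is no problem on a machine with an integrated GPU (i5 1035G7, Iris Plus), but the crash does occur on another machine with a discrete GPU (1050Ti). (Both machines have a single GPU: one has only the integrated GPU, the other only the discrete GPU.) Running the original waifu2x-ncnn-vulkan program works fine on both machines. Both systems run Arch Linux.

Steps to reproduce:

1. Build waifu2x-ncnn-vulkan-python
2. In the build directory, run waifu2x_ncnn_vulkan.py (fix the image path in the script if it is wrong)

How to obtain the backtrace log: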

cd waifu2x-ncnn-vulkan-python/src/build
gdb python
(gdb) b Waifu2x::~Waifu2x
(gdb) run waifu2x_ncnn_vulkan.py

Run until it crashes.

GDB crash backtrace:

Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007ffff282ab60 in ?? ()
(gdb) backtrace
#0  0x00007ffff282ab60 in ?? ()
#1  0x00007ffff69572ef in ncnn::VkBlobAllocator::fastFree (this=0x555555e2d400, ptr=0x555555e2ebe0)
    at /home/kodi/OpensourceProjects/waifu2x-ncnn-vulkan-python/src/ncnn/src/allocator.cpp:1045
#2  0x00007ffff6830b1d in ncnn::VkImageMat::release (this=0x555555c94d00) at /home/kodi/OpensourceProjects/waifu2x-ncnn-vulkan-python/src/ncnn/src/mat.h:2217
#3  0x00007ffff6843830 in ncnn::VulkanDevicePrivate::destroy_dummy_buffer_image (this=0x555555c94bb0)
    at /home/kodi/OpensourceProjects/waifu2x-ncnn-vulkan-python/src/ncnn/src/gpu.cpp:1633
#4  0x00007ffff67bb4ca in ncnn::VulkanDevice::~VulkanDevice (this=<optimized out>, this=<optimized out>)
    at /home/kodi/OpensourceProjects/waifu2x-ncnn-vulkan-python/src/ncnn/src/gpu.cpp:2007
#5  0x00007ffff684341d in ncnn::destroy_gpu_instance () at /home/kodi/OpensourceProjects/waifu2x-ncnn-vulkan-python/src/ncnn/src/gpu.cpp:1469
#6  0x00007ffff67b9b93 in ncnn::__ncnn_vulkan_instance_holder::~__ncnn_vulkan_instance_holder (this=<optimized out>, this=<optimized out>)
    at /home/kodi/OpensourceProjects/waifu2x-ncnn-vulkan-python/src/ncnn/src/gpu.cpp:50
#7  0x00007ffff7a45db7 in __run_exit_handlers () from /usr/lib/libc.so.6
#8  0x00007ffff7a45f5e in exit () from /usr/lib/libc.so.6
#9  0x00007ffff7a2e159 in __libc_start_main () from /usr/lib/libc.so.6
#10 0x000055555555504e in _start ()
@nihui
Member

nihui commented Feb 6, 2021

So, does calling ncnn::destroy_gpu_instance() before exiting avoid it?

@ArchieMeng
Contributor Author

ArchieMeng commented Feb 7, 2021

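So, does calling ncnn::destroy_gpu_instance() before exiting avoid it?

In that case the integrated-GPU machine crashes as well. Calling ncnn::destroy_gpu_instance() at the end (either at the end of the Python program, or when Waifu2x or its derived class is destructed) causes a crash in every case. The backtrace is different this time, though: the crash now happens while destructing ncnn::Net net, a member variable of Waifu2x. Waifu2xWrapped is a class derived from Waifu2x.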

Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007ffff2841a20 in ?? ()
(gdb) backtrace
#0  0x00007ffff2841a20 in ?? ()
#1  0x00007ffff69604a0 in ncnn::VkWeightAllocator::clear (this=0x55555799f1e0)
    at /home/kodi/OpensourceProjects/waifu2x-ncnn-vulkan-python/src/ncnn/src/allocator.cpp:1115
#2  0x00007ffff68170f4 in ncnn::VkWeightAllocator::~VkWeightAllocator (this=<optimized out>, this=<optimized out>)
    at /home/kodi/OpensourceProjects/waifu2x-ncnn-vulkan-python/src/ncnn/src/allocator.cpp:1087
#3  0x00007ffff696037e in ncnn::VkWeightAllocator::~VkWeightAllocator (this=<optimized out>, this=<optimized out>)
    at /home/kodi/OpensourceProjects/waifu2x-ncnn-vulkan-python/src/ncnn/src/allocator.cpp:1090
#4  0x00007ffff687c48a in ncnn::Net::clear (this=0x55555563a508) at /home/kodi/OpensourceProjects/waifu2x-ncnn-vulkan-python/src/ncnn/src/net.cpp:2504
#5  0x00007ffff67ca516 in ncnn::Net::~Net (this=<optimized out>, this=<optimized out>)
    at /home/kodi/OpensourceProjects/waifu2x-ncnn-vulkan-python/src/ncnn/src/net.cpp:1729
#6  0x00007ffff67c137c in Waifu2x::~Waifu2x (this=<optimized out>, this=<optimized out>)
    at /home/kodi/OpensourceProjects/waifu2x-ncnn-vulkan-python/src/waifu2x.cpp:25
#7  0x00007ffff67c0e44 in Waifu2xWrapped::~Waifu2xWrapped (this=<optimized out>, this=<optimized out>)
    at /home/kodi/OpensourceProjects/waifu2x-ncnn-vulkan-python/src/./waifu2x_wrapped.h:22
#8  0x00007ffff682b49e in _wrap_delete_Waifu2xWrapped (args=0x7ffff6480db0)
    at /home/kodi/OpensourceProjects/waifu2x-ncnn-vulkan-python/src/build/CMakeFiles/waifu2x_ncnn_vulkan_wrapper.dir/waifu2xPYTHON_wrap.cxx:4595
#9  0x00007ffff6825537 in SwigPyObject_dealloc (v=0x7ffff6480db0)
    at /home/kodi/OpensourceProjects/waifu2x-ncnn-vulkan-python/src/build/CMakeFiles/waifu2x_ncnn_vulkan_wrapper.dir/waifu2xPYTHON_wrap.cxx:1573
#10 0x00007ffff7cfc286 in ?? () from /usr/lib/libpython3.9.so.1.0
#11 0x00007ffff7d31e83 in ?? () from /usr/lib/libpython3.9.so.1.0

@ArchieMeng
Contributor Author

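I later managed to build it on Windows, and the problem does not occur there. I am starting to suspect an issue with Nvidia's Linux driver. If I get more information in the future, I will reopen this.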
