cudla import external semaphore FAILED #15

Open
WangFengtu1996 opened this issue Jan 22, 2024 · 10 comments

Comments

@WangFengtu1996

  • I can run the demo in hybrid mode successfully.
  • In standalone mode, I get this error: cudla import external semaphore FAILED (full log below)
(base) orin@orin-root:~/workspace/cuDLA-samples$ make run USE_DLA_STANDALONE_MODE=1 -j
g++ -I /usr/local/cuda/include -I ./src/matx_reformat/ -I /usr/local/include/opencv4/ -I /usr/include/jsoncpp/ -I /usr/include --std=c++14 -Wno-deprecated-declarations -Wall -DUSE_DLA_STANDALONE_MODE -O2 -c -o build/validate_coco.o src/validate_coco.cpp
g++ -I /usr/local/cuda/include -I ./src/matx_reformat/ -I /usr/local/include/opencv4/ -I /usr/include/jsoncpp/ -I /usr/include --std=c++14 -Wno-deprecated-declarations -Wall -DUSE_DLA_STANDALONE_MODE -O2 -c -o build/cudla_context_standalone.o src/cudla_context_standalone.cpp
g++ --std=c++14 -Wno-deprecated-declarations -Wall -DUSE_DLA_STANDALONE_MODE -O2 -I /usr/local/cuda/include -I ./src/matx_reformat/ -I /usr/local/include/opencv4/ -I /usr/include/jsoncpp/ -I /usr/include  -o ./build/cudla_yolov5_app build/decode_nms.o build/validate_coco.o build/yolov5.o build/cudla_context_hybrid.o build/cudla_context_standalone.o -l cudla -L/usr/local/cuda/lib64 -l cuda -l cudart -l nvinfer -L /usr/local/lib/ -l opencv_objdetect -l opencv_highgui -l opencv_imgproc -l opencv_core -l opencv_imgcodecs -L ./src/matx_reformat/build/ -l matx_reformat -l jsoncpp -L /usr/lib/aarch64-linux-gnu/tegra -l nvscibuf -l nvscisync
././build/cudla_yolov5_app --engine ./data/loadable/yolov5.int8.int8hwc4in.fp16chw16out.standalone.bin --image ./data/images/image.jpg --backend cudla_int8
[standalone mode] create CUDLA device SUCCESS
[standalone mode] load cuDLA module from memory SUCCESS
[standalone mode] get number of input tensors SUCCESS
[standalone mode] numInputTensors = 1
[standalone mode] get number of output tensors SUCCESS
[standalone mode] numOutputTensors = 3
[standalone mode] get input tensor descriptor SUCCESS
[standalone mode] get output tensor descriptor SUCCESS
[standalone mode] Printing inputs tensor descriptor
[standalone mode] Printing output tensor descriptor
[standalone mode] open NvSci buffer module SUCCESS
[standalone mode] -------------------------------------------
[standalone mode] TENSOR NAME : images'
[standalone mode] size: 1806336
[standalone mode] dims: [1, 4, 672, 672]
[standalone mode] data fmt: 2
[standalone mode] data type: 4
[standalone mode] data category: 0
[standalone mode] pixel fmt: 12
[standalone mode] pixel mapping: 0
[standalone mode] stride[0]: 1
[standalone mode] stride[1]: 2688
[standalone mode] stride[2]: 0
[standalone mode] stride[3]: 0
[standalone mode] create NvSci buffer attr list SUCCESS
[standalone mode] set NvSci buffer attr list SUCCESS
[standalone mode] reconcile NvSciBuf attribute list SUCCESS
[standalone mode] alloc NvSciBuf Obj SUCCESS
[standalone mode] import memory to cudla SUCCESS
[standalone mode] import external memory to cuda SUCCESS
[standalone mode] map external memory to cuda buffer SUCCESS
[standalone mode] -------------------------------------------
[standalone mode] -------------------------------------------
[standalone mode] TENSOR NAME : s8'
[standalone mode] size: 3612672
[standalone mode] dims: [1, 255, 84, 84]
[standalone mode] data fmt: 3
[standalone mode] data type: 2
[standalone mode] data category: 2
[standalone mode] pixel fmt: 36
[standalone mode] pixel mapping: 0
[standalone mode] stride[0]: 2
[standalone mode] stride[1]: 2688
[standalone mode] stride[2]: 225792
[standalone mode] stride[3]: 0
[standalone mode] create NvSci buffer attr list SUCCESS
[standalone mode] set NvSci buffer attr list SUCCESS
[standalone mode] reconcile NvSciBuf attribute list SUCCESS
[standalone mode] alloc NvSciBuf Obj SUCCESS
[standalone mode] import memory to cudla SUCCESS
[standalone mode] import external memory to cuda SUCCESS
[standalone mode] map external memory to cuda buffer SUCCESS
[standalone mode] -------------------------------------------
[standalone mode] -------------------------------------------
[standalone mode] TENSOR NAME : s16'
[standalone mode] size: 903168
[standalone mode] dims: [1, 255, 42, 42]
[standalone mode] data fmt: 3
[standalone mode] data type: 2
[standalone mode] data category: 2
[standalone mode] pixel fmt: 36
[standalone mode] pixel mapping: 0
[standalone mode] stride[0]: 2
[standalone mode] stride[1]: 1344
[standalone mode] stride[2]: 56448
[standalone mode] stride[3]: 0
[standalone mode] create NvSci buffer attr list SUCCESS
[standalone mode] set NvSci buffer attr list SUCCESS
[standalone mode] reconcile NvSciBuf attribute list SUCCESS
[standalone mode] alloc NvSciBuf Obj SUCCESS
[standalone mode] import memory to cudla SUCCESS
[standalone mode] import external memory to cuda SUCCESS
[standalone mode] map external memory to cuda buffer SUCCESS
[standalone mode] -------------------------------------------
[standalone mode] -------------------------------------------
[standalone mode] TENSOR NAME : s32'
[standalone mode] size: 225792
[standalone mode] dims: [1, 255, 21, 21]
[standalone mode] data fmt: 3
[standalone mode] data type: 2
[standalone mode] data category: 2
[standalone mode] pixel fmt: 36
[standalone mode] pixel mapping: 0
[standalone mode] stride[0]: 2
[standalone mode] stride[1]: 672
[standalone mode] stride[2]: 14112
[standalone mode] stride[3]: 0
[standalone mode] create NvSci buffer attr list SUCCESS
[standalone mode] set NvSci buffer attr list SUCCESS
[standalone mode] reconcile NvSciBuf attribute list SUCCESS
[standalone mode] alloc NvSciBuf Obj SUCCESS
[standalone mode] import memory to cudla SUCCESS
[standalone mode] import external memory to cuda SUCCESS
[standalone mode] map external memory to cuda buffer SUCCESS
[standalone mode] -------------------------------------------
[standalone mode] create NvSci sync module SUCCESS
[standalone mode] create NvSci waiter attr list SUCCESS
[standalone mode] create NvSci signal attr list SUCCESS
[standalone mode] get NvSci waiter sync attributes SUCCESS
[standalone mode] cuda get NvSci signal list SUCCESS
[standalone mode] reconciled NvSci sync attr list SUCCESS
[standalone mode] allocate NvSci sync object SUCCESS
[standalone mode] cudla import external semaphore FAILED in src/cudla_context_standalone.cpp:312, CUDLA ERR: 13
make: *** [Makefile:80: run] Error 1
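
As a sanity check on the tensor descriptors printed above, here is a minimal sketch that recomputes the strides and sizes from the dims. It assumes (this reading is inferred from the loadable file name `int8hwc4in.fp16chw16out`, not from the cuDLA documentation) that the input packs 4 int8 channels per pixel and that the outputs pack fp16 channels in groups of 16, one surface per group. The helper name `packed_layout` is hypothetical.

```python
import math

def packed_layout(dims, bytes_per_elem, pack):
    """Return (line_stride, surface_stride, total_size) in bytes.

    Assumption: channels are packed in groups of `pack`, so a tensor with
    C channels occupies ceil(C / pack) surfaces, each made of H lines of
    W * pack elements.
    """
    _, channels, height, width = dims
    line_stride = width * pack * bytes_per_elem
    surface_stride = line_stride * height
    surfaces = math.ceil(channels / pack)
    return line_stride, surface_stride, surface_stride * surfaces

# Input "images": int8 (1 byte), 4 channels packed per pixel.
images = packed_layout((1, 4, 672, 672), bytes_per_elem=1, pack=4)
# Outputs s8, s16, s32: fp16 (2 bytes), 16 channels packed per group.
s8 = packed_layout((1, 255, 84, 84), bytes_per_elem=2, pack=16)
s16 = packed_layout((1, 255, 42, 42), bytes_per_elem=2, pack=16)
s32 = packed_layout((1, 255, 21, 21), bytes_per_elem=2, pack=16)

for name, layout in [("images", images), ("s8", s8), ("s16", s16), ("s32", s32)]:
    print(name, layout)
```

Under these assumptions the computed line strides and total sizes match the `stride[1]` and `size` values in the log (for the outputs, `stride[2]` matches the surface stride as well), which is consistent with the buffer allocation steps reporting SUCCESS before the semaphore import fails.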
@2yjia

2yjia commented Jan 22, 2024

I can run both modes, but inference takes about 20 ms per image, which differs from the reported results. What inference time do you get in hybrid mode? @WangFengtu1996

@WangFengtu1996
Author

@2yjia I don't understand why standalone mode fails for me. My inference time is about 17 ms to 20 ms, and it gets shorter once warm-up has finished. My platform is the NVIDIA Jetson AGX Orin Developer Kit. Could you give me some guidance on running inference in standalone mode? Thanks.

@WangFengtu1996
Author

@2yjia I followed issue #7, but I have now run into a new problem:

(py310) orin@orin-root:~/workspace/cuDLA-samples$ make validate_cudla_int8 USE_DLA_STANDALONE_MODE=1  USE_DETERMINISTIC_SEMAPHORE=1 -j
/usr/local/cuda/bin/nvcc -I /usr/local/cuda/include -I ./src/matx_reformat/ -I /usr/local/include/opencv4/ -I /usr/include/jsoncpp/ -I /usr/include -gencode arch=compute_87,code=sm_87 -c -o build/decode_nms.o src/decode_nms.cu
g++ -I /usr/local/cuda/include -I ./src/matx_reformat/ -I /usr/local/include/opencv4/ -I /usr/include/jsoncpp/ -I /usr/include --std=c++14 -Wno-deprecated-declarations -Wall -DUSE_DLA_STANDALONE_MODE -DUSE_DETERMINISTIC_SEMAPHORE -O2 -c -o build/validate_coco.o src/validate_coco.cpp
g++ -I /usr/local/cuda/include -I ./src/matx_reformat/ -I /usr/local/include/opencv4/ -I /usr/include/jsoncpp/ -I /usr/include --std=c++14 -Wno-deprecated-declarations -Wall -DUSE_DLA_STANDALONE_MODE -DUSE_DETERMINISTIC_SEMAPHORE -O2 -c -o build/yolov5.o src/yolov5.cpp
g++ -I /usr/local/cuda/include -I ./src/matx_reformat/ -I /usr/local/include/opencv4/ -I /usr/include/jsoncpp/ -I /usr/include --std=c++14 -Wno-deprecated-declarations -Wall -DUSE_DLA_STANDALONE_MODE -DUSE_DETERMINISTIC_SEMAPHORE -O2 -c -o build/cudla_context_hybrid.o src/cudla_context_hybrid.cpp
g++ -I /usr/local/cuda/include -I ./src/matx_reformat/ -I /usr/local/include/opencv4/ -I /usr/include/jsoncpp/ -I /usr/include --std=c++14 -Wno-deprecated-declarations -Wall -DUSE_DLA_STANDALONE_MODE -DUSE_DETERMINISTIC_SEMAPHORE -O2 -c -o build/cudla_context_standalone.o src/cudla_context_standalone.cpp
src/cudla_context_standalone.cpp: In member function ‘void cuDLAContextStandalone::initialize()’:
src/cudla_context_standalone.cpp:324:19: error: ‘NvSciSyncFenceUpdateFence’ was not declared in this scope; did you mean ‘NvSciSyncObjGenerateFence’?
  324 |     m_nvsci_err = NvSciSyncFenceUpdateFence(m_WaitEventContext.sync_obj, m_WaiterID, m_WaiterValue, m_WaitEventContext.nvsci_fence_ptr);
      |                   ^~~~~~~~~~~~~~~~~~~~~~~~~
      |                   NvSciSyncObjGenerateFence
src/cudla_context_standalone.cpp:326:19: error: ‘NvSciSyncFenceExtractFence’ was not declared in this scope; did you mean ‘NvSciSyncIpcExportFence’?
  326 |     m_nvsci_err = NvSciSyncFenceExtractFence(m_WaitEventContext.nvsci_fence_ptr,&m_WaiterID,&m_WaiterValue);
      |                   ^~~~~~~~~~~~~~~~~~~~~~~~~~
      |                   NvSciSyncIpcExportFence
src/cudla_context_standalone.cpp: In member function ‘int cuDLAContextStandalone::submitDLATask(cudaStream_t)’:
src/cudla_context_standalone.cpp:443:19: error: ‘NvSciSyncFenceExtractFence’ was not declared in this scope; did you mean ‘NvSciSyncIpcExportFence’?
  443 |     m_nvsci_err = NvSciSyncFenceExtractFence(m_WaitEventContext.nvsci_fence_ptr ,&m_WaiterID, &m_WaiterValue);
      |                   ^~~~~~~~~~~~~~~~~~~~~~~~~~
      |                   NvSciSyncIpcExportFence
src/cudla_context_standalone.cpp:445:19: error: ‘NvSciSyncFenceUpdateFence’ was not declared in this scope; did you mean ‘NvSciSyncObjGenerateFence’?
  445 |     m_nvsci_err = NvSciSyncFenceUpdateFence(m_WaitEventContext.sync_obj,
      |                   ^~~~~~~~~~~~~~~~~~~~~~~~~
      |                   NvSciSyncObjGenerateFence
make: *** [Makefile:69: build/cudla_context_standalone.o] Error 1
make: *** Waiting for unfinished jobs....

@WangFengtu1996
Author

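@2yjia I tried following the repository README to fine-tune the model and export a new one. Did you get this workflow to work? I ran into a problem at the QAT -> PTQ step: the scale information for the output tensors is missing.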

(py310) orin@orin-root:~/workspace/cuDLA-samples$ python export/qdq_translator/qdq_translator.py --input_onnx_models=yolov5_trimmed_qat.onnx --output_dir=data/model/ --infer_concat_scales --infer_mul_scales 
INFO:root:Parsing yolov5_trimmed_qat.onnx...
INFO:root:No tensor scales for /model.24/m.0/Conv's output tensor s8
INFO:root:No tensor scales for /model.24/m.1/Conv's output tensor s16
INFO:root:No tensor scales for /model.24/m.2/Conv's output tensor s32

@WangFengtu1996
Author

@2yjia Here is my device information. Are our setups the same?

(base) orin@orin-root:/usr/lib/aarch64-linux-gnu/tegra$ jetson_release
Software part of jetson-stats 4.2.4 - (c) 2024, Raffaello Bonghi
Model: Jetson AGX Orin Developer Kit - Jetpack 5.1.2 [L4T 35.4.1]
NV Power Mode[0]: MAXN
Serial Number: [XXX Show with: jetson_release -s XXX]
Hardware:
 - P-Number: p3701-0005
 - Module: NVIDIA Jetson AGX Orin (64GB ram)
Platform:
 - Distribution: Ubuntu 20.04 focal
 - Release: 5.10.120-tegra
jtop:
 - Version: 4.2.4
 - Service: Active
Libraries:
 - CUDA: 11.4.315
 - cuDNN: 8.6.0.166
 - TensorRT: 5.1.2
 - VPI: 2.3.9
 - Vulkan: 1.3.204
 - OpenCV: 4.6.0 - with CUDA: YES

@2yjia

2yjia commented Jan 22, 2024

Software part of jetson-stats 4.2.4 - (c) 2024, Raffaello Bonghi
Model: Jetson AGX Orin - Jetpack 5.1 [L4T 35.2.1]
NV Power Mode[2]: MODE_30W
Serial Number: [XXX Show with: jetson_release -s XXX]
Hardware:
 - P-Number: p3701-0005
 - Module: NVIDIA Jetson AGX Orin (64GB ram)
Platform:
 - Distribution: Ubuntu 20.04 focal
 - Release: 5.10.104-tegra
jtop:
 - Version: 4.2.4
 - Service: Active
Libraries:
 - CUDA: 11.4.315
 - cuDNN: 8.6.0.166
 - TensorRT: 5.1
 - VPI: 2.2.4
 - Vulkan: 1.3.204
 - OpenCV: 4.5.4 - with CUDA: NO

@WangFengtu1996

@2yjia

2yjia commented Jan 22, 2024

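@2yjia I tried following the repository README to fine-tune the model and export a new one. Did you get this workflow to work? I ran into a problem at the QAT -> PTQ step: the scale information for the output tensors is missing.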

(py310) orin@orin-root:~/workspace/cuDLA-samples$ python export/qdq_translator/qdq_translator.py --input_onnx_models=yolov5_trimmed_qat.onnx --output_dir=data/model/ --infer_concat_scales --infer_mul_scales 
INFO:root:Parsing yolov5_trimmed_qat.onnx...
INFO:root:No tensor scales for /model.24/m.0/Conv's output tensor s8
INFO:root:No tensor scales for /model.24/m.1/Conv's output tensor s16
INFO:root:No tensor scales for /model.24/m.2/Conv's output tensor s32

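I have the same problem. Running the script produced a noqdq.onnx, but I ran into problems when using that ONNX file for inference deployment. I don't know how the author's fp16 and int8 ONNX files were generated.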

@WangFengtu1996
Author

@2yjia I followed issue #7, but I have now run into a new problem:

(py310) orin@orin-root:~/workspace/cuDLA-samples$ make validate_cudla_int8 USE_DLA_STANDALONE_MODE=1  USE_DETERMINISTIC_SEMAPHORE=1 -j
/usr/local/cuda/bin/nvcc -I /usr/local/cuda/include -I ./src/matx_reformat/ -I /usr/local/include/opencv4/ -I /usr/include/jsoncpp/ -I /usr/include -gencode arch=compute_87,code=sm_87 -c -o build/decode_nms.o src/decode_nms.cu
g++ -I /usr/local/cuda/include -I ./src/matx_reformat/ -I /usr/local/include/opencv4/ -I /usr/include/jsoncpp/ -I /usr/include --std=c++14 -Wno-deprecated-declarations -Wall -DUSE_DLA_STANDALONE_MODE -DUSE_DETERMINISTIC_SEMAPHORE -O2 -c -o build/validate_coco.o src/validate_coco.cpp
g++ -I /usr/local/cuda/include -I ./src/matx_reformat/ -I /usr/local/include/opencv4/ -I /usr/include/jsoncpp/ -I /usr/include --std=c++14 -Wno-deprecated-declarations -Wall -DUSE_DLA_STANDALONE_MODE -DUSE_DETERMINISTIC_SEMAPHORE -O2 -c -o build/yolov5.o src/yolov5.cpp
g++ -I /usr/local/cuda/include -I ./src/matx_reformat/ -I /usr/local/include/opencv4/ -I /usr/include/jsoncpp/ -I /usr/include --std=c++14 -Wno-deprecated-declarations -Wall -DUSE_DLA_STANDALONE_MODE -DUSE_DETERMINISTIC_SEMAPHORE -O2 -c -o build/cudla_context_hybrid.o src/cudla_context_hybrid.cpp
g++ -I /usr/local/cuda/include -I ./src/matx_reformat/ -I /usr/local/include/opencv4/ -I /usr/include/jsoncpp/ -I /usr/include --std=c++14 -Wno-deprecated-declarations -Wall -DUSE_DLA_STANDALONE_MODE -DUSE_DETERMINISTIC_SEMAPHORE -O2 -c -o build/cudla_context_standalone.o src/cudla_context_standalone.cpp
src/cudla_context_standalone.cpp: In member function ‘void cuDLAContextStandalone::initialize()’:
src/cudla_context_standalone.cpp:324:19: error: ‘NvSciSyncFenceUpdateFence’ was not declared in this scope; did you mean ‘NvSciSyncObjGenerateFence’?
  324 |     m_nvsci_err = NvSciSyncFenceUpdateFence(m_WaitEventContext.sync_obj, m_WaiterID, m_WaiterValue, m_WaitEventContext.nvsci_fence_ptr);
      |                   ^~~~~~~~~~~~~~~~~~~~~~~~~
      |                   NvSciSyncObjGenerateFence
src/cudla_context_standalone.cpp:326:19: error: ‘NvSciSyncFenceExtractFence’ was not declared in this scope; did you mean ‘NvSciSyncIpcExportFence’?
  326 |     m_nvsci_err = NvSciSyncFenceExtractFence(m_WaitEventContext.nvsci_fence_ptr,&m_WaiterID,&m_WaiterValue);
      |                   ^~~~~~~~~~~~~~~~~~~~~~~~~~
      |                   NvSciSyncIpcExportFence
src/cudla_context_standalone.cpp: In member function ‘int cuDLAContextStandalone::submitDLATask(cudaStream_t)’:
src/cudla_context_standalone.cpp:443:19: error: ‘NvSciSyncFenceExtractFence’ was not declared in this scope; did you mean ‘NvSciSyncIpcExportFence’?
  443 |     m_nvsci_err = NvSciSyncFenceExtractFence(m_WaitEventContext.nvsci_fence_ptr ,&m_WaiterID, &m_WaiterValue);
      |                   ^~~~~~~~~~~~~~~~~~~~~~~~~~
      |                   NvSciSyncIpcExportFence
src/cudla_context_standalone.cpp:445:19: error: ‘NvSciSyncFenceUpdateFence’ was not declared in this scope; did you mean ‘NvSciSyncObjGenerateFence’?
  445 |     m_nvsci_err = NvSciSyncFenceUpdateFence(m_WaitEventContext.sync_obj,
      |                   ^~~~~~~~~~~~~~~~~~~~~~~~~
      |                   NvSciSyncObjGenerateFence
make: *** [Makefile:69: build/cudla_context_standalone.o] Error 1
make: *** Waiting for unfinished jobs....

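@2yjia Regarding my problem, could you grep for these two functions in the directory where you extracted nvsci* and share the results? Thank you very much.

# Go to the directory where nvsci_headers.tbz2 was extracted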
grep -nr "NvSciSyncFenceUpdateFence"

grep -nr "NvSciSyncObjGenerateFence"
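
The same check can be scripted. Below is a minimal sketch, assuming the NvSci headers have been extracted somewhere on disk; the helper names `find_symbol` and `report` and the example path are hypothetical. It does a plain substring search of header files, like the grep commands above, and does not parse C declarations.

```python
from pathlib import Path

def find_symbol(root, symbol, suffixes=(".h", ".hpp")):
    """Return (path, line_number, line) for every header line mentioning `symbol`."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if symbol in line:
                hits.append((str(path), lineno, line.strip()))
    return hits

def report(root, symbols):
    """Print, for each symbol, where it appears under `root` (if anywhere)."""
    for name in symbols:
        hits = find_symbol(root, name)
        print(f"{name}: {len(hits)} hit(s)")
        for path, lineno, line in hits:
            print(f"  {path}:{lineno}: {line}")
```

For example, `report("/path/to/extracted/nvsci_headers", ["NvSciSyncFenceUpdateFence", "NvSciSyncFenceExtractFence", "NvSciSyncObjGenerateFence"])` lists which of the functions named in the compiler errors are mentioned in the headers. Zero hits for a name would be consistent with the "was not declared in this scope" errors.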

@ou525

ou525 commented Jan 26, 2024

(two screenshots attached)
I encountered the same problem, has it been solved?

@mchi-zg
Collaborator

mchi-zg commented May 8, 2024

Hi all, could you try this on JetPack 6.0 DP or later? Thanks!
