Run DeepLabv3 (ResNet-101 backbone) from the PyTorch framework on Xilinx at a good performance metric #40
Hi @abdulazizm, it looks like the issue is that the
@jtuyls Hope that's a good catch; I'm not getting an assertion error now. But there seem to be some other naming convention issues too. Let me know if I missed anything. Thanks!
@abdulazizm It looks like the regular expressions
@jtuyls Thanks for the quick reply. Yeah, I missed copying the import into my codebase. Now it builds and exports the library successfully. I will check this on the edge device and share the inference metrics with you shortly. Also, I am trying to benchmark the results; could you please let me know if there are any special APIs to calculate FPS, latency, throughput, or any other metrics (from TVM or PyXIR)? Currently, I am calculating the time taken for each inference from the time difference before and after the inference call. Thanks,
@jtuyls Happy to share that we can now run inference in 0.35 s (earlier it was 3.5 s), which is a drastic improvement from a performance point of view. Thanks for the support. Are there any other suggestions from your side to improve performance further? Fine-tuning suggestions?
Reiterating from my previous comment: "And also I am trying to benchmark the results, could you please let me know if there are any special APIs to calculate FPS, latency, throughput, or any other metrics (from TVM or PyXIR)? Currently, I am calculating the time taken for each inference by the time difference before and after the inferencesession.run() call."
@abdulazizm Great to hear that the inference performance is now 10x better than before! Still, unless there are some quite expensive operations that you expect will need to be executed on the CPU, I think it should be possible to improve further. For this, I would look at which parts of the model are offloaded to the DPU and the CPU respectively. You can check this by inspecting the TVM module after the PartitionGraph transformation:
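The snippet that originally followed here was lost in the thread's formatting. As a rough, stdlib-only sketch of the kind of inspection being described — counting which nn.conv2d calls sit inside the Vitis AI partition versus the main (CPU) function in a printed Relay module — note that the helper name and the sample dump below are made up for illustration, not the thread's original code; in practice you would scan `str(mod)` after partitioning:

```python
# Hypothetical helper: given a Relay module printed as text, count how many
# nn.conv2d calls appear inside the Vitis AI partition vs. the main function.
def conv2d_placement(relay_text):
    """Return (dpu_count, cpu_count) of nn.conv2d calls per function."""
    dpu = cpu = 0
    current = None
    for line in relay_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("def @"):
            # e.g. 'def @vitis_ai_0(...)' vs. 'def @main(...)'
            current = "dpu" if "vitis_ai" in stripped else "cpu"
        if "nn.conv2d" in stripped:
            if current == "dpu":
                dpu += 1
            else:
                cpu += 1
    return dpu, cpu

# Made-up miniature dump, shaped like TVM's textual Relay output:
sample_dump = """
def @main(%data: Tensor[(1, 3, 224, 224), float32]) {
  %0 = nn.conv2d(%data, meta[relay.Constant][0]);
  @vitis_ai_0(%0)
}
def @vitis_ai_0(%x) {
  %1 = nn.conv2d(%x, meta[relay.Constant][1]);
  nn.conv2d(%1, meta[relay.Constant][2])
}
"""

print(conv2d_placement(sample_dump))  # -> (2, 1): two conv2d on DPU, one on CPU
```

Any conv2d counted under @main (rather than a @vitis_ai_* function) is a candidate for the kind of layout or constraint issues discussed later in this thread.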
If this
We usually calculate latency the same way for experimentation. If you want to really benchmark, I would use the TVM time evaluator function. You can use it like this:
For FPS, we calculate it as
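The time-evaluator call and the FPS formula referenced above did not survive this thread's formatting. As a stdlib-only sketch of the arithmetic involved (mean latency over several runs, FPS as batch size divided by mean latency), using made-up latency numbers; in a real TVM flow the per-run timings would come from the graph module's time evaluator instead:

```python
import statistics

# Hypothetical per-inference latencies in seconds, e.g. collected by timing
# inference_session.run(), or taken from TVM's time evaluator results
# (roughly: ftimer = module.module.time_evaluator("run", dev, repeat=5);
#  latencies_s = list(ftimer().results)).
latencies_s = [0.351, 0.348, 0.352, 0.349, 0.350]

mean_latency = statistics.mean(latencies_s)   # average seconds per inference
std_latency = statistics.stdev(latencies_s)   # run-to-run variation
batch_size = 1
fps = batch_size / mean_latency               # throughput in frames per second

print(f"latency: {mean_latency * 1000:.1f} ms +/- {std_latency * 1000:.2f} ms")
print(f"FPS: {fps:.2f}")
```

With the ~0.35 s latency reported earlier in this thread, this works out to roughly 2.9 FPS at batch size 1.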
@jtuyls Thanks for the very detailed reply. I was able to use the TVM time evaluator function; this helps a lot. And I used mod['main'] to find out that some layers are running on the CPU. Updated the model (removed the dropout layer, changed
Still, I can see 3 conv2d layers running on the CPU. I checked against the DPU constraints and they seem to be fine. Not sure why they aren't offloaded to the DPU. Any suggestions? mod['main'] - output
Hope we can merge the changes to the "pyxir/python/pyxir/contrib/target/components/DPUCZDX8G/dnnc_output.py" file into the master branch.
@abdulazizm Getting the remaining 3 conv2d operations into the DPU should indeed get you another performance improvement. I think the
And then you will also have to add
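The exact lines being suggested here were stripped from the thread, but the surrounding discussion concerns TVM's ConvertLayout pass, which is configured with a dict mapping operator names to desired layouts. A minimal sketch of such a configuration — the specific layout choices below are illustrative assumptions consistent with the DPU flow discussed here (NHWC data), not the thread's exact values:

```python
# Sketch of a TVM ConvertLayout configuration: op name -> desired layouts.
# For conv-like ops the list is [data_layout, kernel_layout]; "default"
# keeps the operator's default kernel layout.
desired_layouts = {
    "nn.conv2d": ["NHWC", "default"],
    "image.resize2d": ["NHWC"],
}

# In a real TVM script this dict would be applied to the module, roughly:
#   from tvm import relay
#   mod = relay.transform.ConvertLayout(desired_layouts)(mod)

print(desired_layouts["nn.conv2d"][0])  # -> NHWC
```

Operators missing from this dict keep their original layout, which (as discussed below for image.resize) can leave layout_transform nodes in the graph and break up the DPU partition.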
@jtuyls Thanks for the reply, Jorn. I was using the desired layout for nn.conv2d as 'OIHW' instead of 'default'. I tried the suggestions for image.resize, but I'm still getting the same mod['main'] and inference time (image.resize still appears in mod['main']). Not sure why; are we missing something?
1.
2. FYI: I am not using the current master branch of PyXIR or TVM. I am on the "dev-rf-test-0" PyXIR branch, commit 485b7c1. I hope this is not an issue.
@abdulazizm Using 'OIHW' is fine. With 'default' you will probably have to add this line:
So, for approach 4: before, image.resize looked like this:
If it is not being included in the
If this is not the case, then there is an issue with the NCHW -> NHWC transformation.
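For reference, the NCHW -> NHWC conversion in question is just an axis permutation. A minimal stdlib sketch of what it does to a tensor shape (the example shape is a hypothetical image input, not taken from this model's actual dump):

```python
# NCHW -> NHWC is the axis permutation (0, 2, 3, 1): batch stays first,
# channels move from axis 1 to the last axis.
NCHW_TO_NHWC = (0, 2, 3, 1)

def convert_shape(shape, perm=NCHW_TO_NHWC):
    """Permute a shape tuple according to `perm`."""
    return tuple(shape[axis] for axis in perm)

print(convert_shape((1, 3, 224, 224)))  # -> (1, 224, 224, 3)
```

If an operator (like image.resize here) is not converted along with its neighbours, TVM has to insert explicit layout_transform nodes around it, which is what shows up in mod['main'].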
@jtuyls Yes, mod['main'] seems to be the same before and after the image.resize change. And I can also see that the convert_image_resize() function is not called at all during compilation (is it not registered properly?). Digging into the TVM code, it seems that it has 'NCHW' as the default layout (would rebuilding TVM with the default layout as 'NHWC' help?). I guess you are right: there is an issue with the NCHW -> NHWC transformation.
@abdulazizm Found out that we also needed to add an InferCorrectLayout function and created a TVM PR for this: apache/tvm#8205
@jtuyls Yeah, sure, Jorn. Thanks for creating a PR and pushing the feature. Please let me know once it's merged to the master branch; I will give it a shot and let you know the inference benchmark. I just started exploring the PetaLinux workflow and will update if I get stuck somewhere.
@jtuyls Hi Jorn, this is about the PetaLinux-based support. It seems the DeepLabv3 model has 3 subgraphs. How do I load models with more than one subgraph?
I noticed a workaround for C++ with VART (Xilinx/Vitis-AI#153). Not sure how to achieve the same with Python and TVM.
Hi @jornt-xilinx, I tested your suggestion of changing the image.resize2d layout (NHWC to NCHW) with recent TVM and PyXIR versions. It eliminated most layout_transforms in mod["main"], but I couldn't get those 3 conv2d layers into mod['vitis_ai_0']. Are there any further suggestions w.r.t. these conv2d layers? mod["main"]
@abdulazizm I think those three conv2d operations are not included in the Vitis AI partition because the
Trying to build DeepLabv3 from PyTorch to deploy on a Xilinx edge device (ZCU104). Facing some issues while quantizing. For good metrics, as suggested in #33 (comment), I updated the model with these changes:
- <= bank_depth/2): the ResNet-101 backbone has conv2d layers of size 2048 (not supported on the DPU).
- Edited the final layer to have 1024 max channels, perfectly eligible to run on the DPU.
None of the changes seems to affect quantization/compilation except changing the padding & dilation from (4,4) to (2,2), which is important for good inference metrics.
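The channel constraint mentioned in the list above ("channels <= bank_depth/2") can be sketched as a quick eligibility check. Everything here is an assumption for illustration: the function name is made up, and bank_depth = 2048 is merely inferred from this thread's observation that 2048-channel conv2d layers are rejected while 1024-channel ones pass; check your DPU configuration for the real value.

```python
# Hypothetical eligibility check mirroring the "channels <= bank_depth/2"
# rule from this thread. bank_depth=2048 is an assumption consistent with
# the reported behaviour (2048 channels rejected, 1024 accepted).
def conv2d_fits_dpu(out_channels, bank_depth=2048):
    return out_channels <= bank_depth // 2

print(conv2d_fits_dpu(2048))  # -> False (backbone's 2048-channel conv2d)
print(conv2d_fits_dpu(1024))  # -> True  (edited final layer)
```

A check like this can be run over a model's layer widths before quantization to spot layers that would fall back to the CPU.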
Attached mod['main'] after mergecompiler ->
mod_main_after_mergecompiler.txt
@jornt-xilinx Need your support. Created a new issue here for easy tracking.