-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[pynq_driver] fix device early return #7
Conversation
@tqchen @tmoreau89 Could you kindly help review this PR? Thanks. |
@zhanghaohit, thanks for the contribution, do we know the root cause for this early return issue? a sleep solution may not always work and would impact performance, we better to avoid that. |
Thanks @zhanghaohit for the fix. I'm curious when the early completion occurs as @huajsj pointed out. Ideally the driver shouldn't rely on polling and instead use a more robust mechanism like triggering an interrupt. |
Thanks @huajsj @tmoreau89 for pointing out the potential performance issue for this fix. The root cause of the issue is because on line 124, we instruct the VTA to start calculation by writing to the VTA_START bit of control register. The FPGA will then respond by changing its VTA_COMPUTE_DONE bit of control register from '1' to '0', indicating a "busy" state of the VTA hardware. And VTA will change it back to '1' after the completion of the calculation. The original code created a racing condition here between the processor code and the FPGA hardware, by attempting to check the VTA_COMPUTE_DONE bit immediately after the issuing of "start" command. If we got a more powerful processor, the checking of VTA_COMPUTE_DONE bit will happen before FPGA could ever change it to '0'. As @tmoreau89 mentioned, we are using a pooling approach here. And in my opinion, a short sleep before checking is quite common in the pooling world. Here we proposed a 1us sleep, which translates to around ~250 clock cycles on FPGA, it would give FPGA sufficient time to change VTA_COMPUTE_DONE from '1' to '0'. As 250 cycles are way shorter than any meaningful calculations, I believe there should not be any performance impact. I agree with @tmoreau89 that interrupt is a more robust solution compare to pooling, but it is quite a huge task involving changes from both HW and driver archs. |
Thank you @remotego for this logical explanation. The fix makes sense, and is highly likely not to affect performance negatively. |
@@ -126,6 +127,10 @@ class VTADevice { | |||
VTAWriteMappedReg(vta_compute_handle_, 0x0, VTA_AUTORESTART); | |||
VTAWriteMappedReg(vta_store_handle_, 0x0, VTA_AUTORESTART); | |||
|
|||
// Allow device to respond |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add more explanation here to make sure that this sleep is well understood for posterity (the summarized comment by @remotego should suffice)
@remotego, Thanks for the detailed explain, but for this problem I think you may have a glance on related pynq driver logic, actually there is no "racing condition" between "processor" and "FPGA" , because they are checking on different register. on line 124 the VTA_START get write into register with offset "0", and in line 134 the logic is go to check register with offset VTA_COMPUTE_DONE_RD_OFFSET which is "24", i think the work ground may work in some scenario but if we can not find root cause it may not reliable. |
@huajsj Thank you for the reply. Let me explain more on this issue. The original name of 0x18 (24) register of Compute Module is XCOMPUTE_CONTROL_BUS_ADDR_DONE_O_DATA, it is a output from the FPGA hardware. From the point of view of S/W, it is a read-only register. Thus there is no way for software (i.e. driver) code to change the content of it. And this register is fully controlled by FPGA H/W.
If we trace the controlling of this register in h/w code, we could see the register will be set to '0' at the beginning compute module, and sets to '1' only when a FINISH instruction is encountered.
When we start the VTA hardware by this code block,
The fetch module will dispatch the first compute instruction to compute module, and module will then set the Done register to '0'. However, at the same time, the driver code will attempt to check the value of Done register in a pooling loop. And if the Done register is equal to 1. the driver will break and return.
Thus we have a racing condition here:
|
@remotego , thanks for the detailed explain, the logic make sense, code LGTM. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the detailed response and explanation. This fix is good to go.
Bug
VTADeviceRun may return early before completion of the device execution
Solution
sleep a while before checking