DISCO-L475VG-IOT01A lp_timeout test fails CI on IAR #11545
Comments
cc @ARMmbed/team-st-mcd |
@Tharazi97 Since commit 9858b16, the target automatically enters deep sleep when there is nothing to do. I don't think the lp_timeout test cases take this into account. Entering deep sleep induces latencies of several ms, which the test cases do not consider. By adding some more margin to the test cases, I can get the tests to pass. An alternative way forward would be to disable deep sleep. Snapshot of the quick diffs I made to have the test pass OK:
Team in charge of tests could propose the best way forward ... |
A 3 ms margin is quite a big change relative to a 10 ms delay. I think disabling deep sleep is a good way to go. |
@Tharazi97 Now I don't think that the Low Power ticker or timeout drivers have implemented a similar safety mechanism, which means that they will not cope with the deep sleep latency and risk being imprecise ... but maybe I'm wrong here; @kjbracey-arm can correct me if so .... |
Internal Jira reference: https://jira.arm.com/browse/MBOCUSTRIA-1804 |
@LMESTM what's the deep sleep wake-up time for this target? (I recall some targets have it defined as 1-3ms; we set this to 10ms to be sufficient for a target to wake up, and an app should not expect anything lower than that.) We are testing the lp ticker in this test case; should it test both deep sleep and shallow sleep? If deep sleep is involved, the timeout (default is 10 + 2 = 12ms) might not be enough for some targets. I can't find the time reference in the power management header file at the moment (https://os.mbed.com/docs/mbed-os/v5.13/apis/power-management-sleep.html does not provide this detail). |
@0xc0170 I think that the exit time is about 3ms, as defined in "deep-sleep-latency"; that was the value last time I made measurements. Nevertheless, I need to increase the test delay to TEST_DELAY_MS + 5 to be able to PASS the test. TEST_DELAY_MS + 4 wouldn't work. I don't know how the semaphore timeouts are managed ... |
The HAL specification for sleep permits wakeup time from deep sleep to be up to 10ms. So if testing the If you're only intending to test the A broader test might try checking whether the delay exceeds the specified |
I can see in that If latency > 2, that means a couple of back-to-back programmings happening on the way into the sleep for the semaphore wait. Still, when the timers go off, the semaphore acquire should not return until either the sem_callback has been run (meaning the HAL timer indicates time > start + 10ms) or the RTOS thinks time is up (meaning the HAL timer indicates time > start + 12ms). It's a struggle to see how you could get I've walked through the code, and think I can see a failure mode. Consider the sequence:
So there's an effective race if due to wake latency issues, we see the 10ms IRQ and the 12ms wake simultaneously; the wake makes the semaphore return timeout, despite there being a semaphore available - the semaphore "release" is not fully processed until the next time we return to a thread, and at that point we're already returning "timeout". I'm thinking that could potentially be called an RTX bug, although I'm not sure exactly how it might be resolved. However, why does that situation arise here? What's the declared latency? How long might we be getting stuck inside timer reprogramming? |
I guess that failure I describe above happens reliably if declared latency is 0, so no RTOS compensation, but the hardware actually takes >2ms to wake, so we program wake for "start + 10" (for the |
That could be argued, don't you think? The lpticker driver could be aware of the deep-sleep latency and compensate for it; it could even lock deep sleep when the next ticker event in the queue is too close ... |
Yes, you could certainly put a pile of intelligence into the HAL driver, but I think it would ultimately make the middle layer's job more confusing. Even before considering variance in HAL implementations. I would be open to maybe moving latency compensation down one layer into the In this test though, there's apparently something weird happening with the semaphores and RTX, together with an apparent failure of the RTOS latency compensation - if that was working we should be out of deep sleep before the 12ms is up, so that the race I think I've deduced shouldn't occur. |
I thought the variance would be managed through deep-sleep-latency
That's the idea here, I guess ... If we're not going that way, the impact of deep sleep latency needs to be described in the LP Timer/Ticker/Timeout APIs, as I think it's not enough to document it only in the power management / sleep part. As of now, users may program an LP timeout to fire after 150µs and will end up with a delay of 3ms or so ...
I agree there seems to be something to figure out. I fully trust your analysis as I'm not familiar with RTX :-) |
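The compensation idea discussed above (lock deep sleep when the next event is too close, otherwise program the wake-up early) could look roughly like the sketch below. It is only an illustration in plain C++: `SleepPlan` and `plan_sleep` are hypothetical names, not part of the Mbed OS lp_ticker HAL, and the latency is assumed to be a single worst-case figure in microseconds.

```cpp
#include <cstdint>

// Hypothetical result type: whether deep sleep may be entered, and how long
// from now the wake-up should be programmed.
struct SleepPlan {
    bool allow_deep_sleep;
    std::uint32_t wake_after_us;
};

// Assumption: deep_sleep_latency_us is the worst-case time needed to leave
// deep sleep and reach the interrupt handler.
constexpr SleepPlan plan_sleep(std::uint32_t next_event_us,
                               std::uint32_t deep_sleep_latency_us)
{
    return (next_event_us <= deep_sleep_latency_us)
        // Event too close: stay in shallow sleep and wake exactly on time.
        ? SleepPlan{false, next_event_us}
        // Otherwise wake early by the latency so the handler runs on time.
        : SleepPlan{true, next_event_us - deep_sleep_latency_us};
}
```

With the numbers from this thread, `plan_sleep(150, 3000)` keeps the target out of deep sleep for a 150µs timeout, while `plan_sleep(10000, 3000)` allows deep sleep and programs the wake-up 7ms from now.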
I agree - the 3ms delay actually impacts many IRQs you could register with
What I'd be interested in from your end is why is the latency compensation not sufficient here? If it was, I think we should handle those semaphores fine. Is the JSON underestimating? Or is there more to it? Aside from that, you could experiment - try making the RTX modification I'm proposing in the linked issue. Just add a call to |
@kjbracey-arm ok I take the point, I need to put in place a setup with debug / measurements ... but can't do it this week unfortunately. |
@LMESTM any progress on this? |
@TuomoHautamaki @kjbracey-arm I'm afraid I have not been able to make any measurements yet (and can't in the coming days). I agree this can be helpful, but I don't think it would bring the solution to this issue. |
@TuomoHautamaki I just tried reproducing the issue again on my current working tree, master branch from Oct 23, and the test doesn't fail - are you still seeing the issue ? |
@LMESTM I can see it's still failing but only on IAR |
@TuomoHautamaki my mistake I ran timeout tests instead of lp_timeout tests ... I can still reproduce as well |
@kjbracey-arm @TuomoHautamaki I found some time to make measurements and found out that the total wake-up latency is actually around 3.7ms. On this board, this time is spent restoring the complete clock tree: LSE, MSI clocks and PLLs. It may be possible to optimize this, but for the sake of simplicity and maintenance we go through the whole clock configuration sequence again. I can increase the deep-sleep latency for this board to 4ms, which actually makes the tests PASS as far as I've seen. That doesn't explain everything: the test uses a 2ms DELTA, so it should be okay with the 1ms extra latency, but this is not the case. Nevertheless, the low power ticker fires 2ms later than expected and the point below is still valid:
|
If that 3.7ms is a hardware measurement, maybe it doesn't include extra overhead for reading the timer, etc. (Embedded in It might be worth (temporarily) adding a test assert in |
The 3.7ms is the time it takes to mostly restore the clocks, so this is the SW sequence of the deep sleep exit, from the HW interrupt (end of WFI) to the point where the interrupt handler is called. I'm not sure about your next statement. Anyway, I still think that something is wrong with the LP timeout and the test. The test is supposed to verify that the lp_timeout handler, when programmed to fire after 10ms, will be called within 10 to 12ms max. But in case of deep sleep, the handler will only be called after 13ms to 14ms, as the timeout driver does not consider the deep sleep latency. And the test fails to detect it. The only reason the test passes is that the system is woken up early by the semaphore itself (12ms max - deep sleep latency) before the 10ms has expired. But the semaphore is supposed to be used as part of the test environment, not as the reason for waking up the system. |
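To make the numbers in this comment concrete, here is a minimal sketch of the check the test is meant to perform, assuming the only effect of deep sleep is to add the wake-up latency on top of the programmed delay. The helper names are made up for illustration; the 10ms delay, 2ms delta and 3.7ms latency are the figures quoted in this thread.

```cpp
#include <cstdint>

// Time at which the handler actually runs if the driver programs the
// timeout for delay_us and the wake-up latency is not compensated.
// Assumption: latency simply adds to the programmed delay.
constexpr std::uint32_t uncompensated_fire_us(std::uint32_t delay_us,
                                              std::uint32_t wake_latency_us)
{
    return delay_us + wake_latency_us;
}

// The window the test intends to check: [delay, delay + delta].
constexpr bool within_window(std::uint32_t observed_us,
                             std::uint32_t delay_us,
                             std::uint32_t delta_us)
{
    return observed_us >= delay_us && observed_us <= delay_us + delta_us;
}
```

Under that assumption, `uncompensated_fire_us(10000, 3700)` gives 13700µs, which `within_window(13700, 10000, 2000)` rejects. This simple model does not cover the additional software overhead discussed elsewhere in the thread.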
Closed via #11767 |
You're right, the test is flawed, because it doesn't at all consider latency on the low power timer IRQ. My description above was that when considering application timing correctness for things scheduled by the RTOS (eg So the extra work versus your measurement is:
That latency number is currently used for RTOS scheduling, so notionally should incorporate those steps. I note that the slowest step there is "read actual current time from HW". Conceptually, we could skip it if we see that |
@kjbracey-arm Most of the wake-up time is spent waiting for HW clocks (in particular PLLs) to be ready and stable
So that's up to an extra 0.3ms on a slow target for the full handling back to the main thread. How shall we proceed with the test flaw and the fact that lp_ticker and related drivers do not take the deep-sleep-latency into account? |
Thank you for raising this detailed GitHub issue. I am now notifying our internal issue triagers. |
Description
Nightly CI test fails on DISCO-L475VG-IOT01A on the lp_timeout test, only when compiled with IAR.
It started to fail after this commit: 9858b16
@maciejbocianski @jamesbeyond @LMESTM
Issue request type