Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

npl/riot: Improve timer glue code #753

Merged
merged 2 commits into from
Feb 14, 2020

Conversation

bergzand
Copy link
Contributor

This PR modifies and cleans the RIOT NPL glue code.

The first commit replaces the constants in the milllisecond-to-microsecond and microsecond-to-millisecond conversions to the RIOT defines for this to improve readability.
The second commit mitigates an incorrect remaining_ticks result when the RIOT microsecond timer overflows or when the Nimble thread is interrupted between setting the target_ticks and setting the timer.

@haukepetersen This might be of interest to you.

@@ -215,7 +215,7 @@ ble_npl_callout_is_active(struct ble_npl_callout *c)
static inline ble_npl_time_t
ble_npl_callout_get_ticks(struct ble_npl_callout *co)
{
return co->target_ticks;
return (ble_npl_time_t)co->target_ticks;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks a bit odd, shouldn't ble_npl_time_t in riot port be change to uint64_t ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it is odd and there is a bug here.

I don't think it is necessary to change ble_npl_time_t to uint64_t. The main reason to change from uint32_t to uint64_t is in the places where NimBLE provides a time in milliseconds and RIOT expects it in microseconds. The conversion to microseconds should be paired with a conversion to uint64_t to compensate for the additional range provided by value as millisecond compared to microseconds.

The bug is this line is that target_ticks is not in milliseconds but in microseconds. The RIOT port specifies that 1 tick == 1 millisecond, so this function should also properly return a value in milliseconds and not in microseconds.

I prefer to stick to uint32_t for the ble_npl_time_t and make sure that values converted from microseconds to milliseconds wrap around at 2**32 milliseconds and not at 2**32/1000 milliseconds

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair enough, I'll leave final call to Hauke though :)

@sjanc sjanc requested a review from haukepetersen February 12, 2020 08:42
@haukepetersen
Copy link
Member

@bergzand yes, this is indeed interesting :-)

Changes are valid and the code looks good to me, testing it as we speak.

Copy link
Member

@haukepetersen haukepetersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more overflow situation we should consider.

Apart from that the changes look good, and all test still run as expected.

@@ -100,7 +100,8 @@ ble_npl_eventq_get(struct ble_npl_eventq *evq, ble_npl_time_t tmo)
} else if (tmo == BLE_NPL_TIME_FOREVER) {
return (struct ble_npl_event *)event_wait(&evq->q);
} else {
return (struct ble_npl_event *)event_wait_timeout(&evq->q, (tmo * 1000));
return (struct ble_npl_event *)event_wait_timeout(&evq->q,
(tmo * US_PER_MS));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I am not mistaken, a source for error remains for this call: if tmos value is greater than (UINT32T_MAX / 1000 -> 1,19 hours), the timeout will be set incorrectly. Looking at RIOTs µs-based timer APIs it seems this can not be so easily solved. But how about we a) put in an assert that triggers once this overflow happens, or b) return NULL in that case, which subsequently would also in most cases trigger an assertion by the calling code...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to admit that I intentionally ignored the edge case here, as you mention, there is no easy way to solve the edge case here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not that familiar with Bluetooth timings, do you consider it likely that this assertion will trigger during the lifetime of a node? Are there any timing durations in the Bluetooth spec itself that are longer than this 1.19 hours?

@haukepetersen
Copy link
Member

One more: the overflow issues is also present in ble_npl_time_delay().

Thinking about his a little more, I think that we should actually expose fitting functions in RIOT so that we can cleanly map the effected functions:

  • expose xtimer_usleep64() in RIOT and use it to map ble_npl_time_delay() properly
  • add xtimer_set_timeout_flag64() and event_wait_timeout64 to the corresponding RIOT APIs and use them in ble_npl_eventq_get().

@bergzand what do you think?

@bergzand
Copy link
Contributor Author

One more: the overflow issues is also present in ble_npl_time_delay().

These should show up only when the time in milliseconds causes an overflow in the uint32_t microsecond representation. (right?) Whereas with the ble_npl_callout_remaining_ticks it will always cause issues when the microsecond representation is near enough to UINT32_MAX.

I think adding assertions to ble_npl_time_delay() and ble_npl_eventq_get should be sufficient for now, assuming Nimble doesn't require timeouts longer than 1.19 hours in regular operations.

To mitigate this with a more long-term solution I think we either have to add the functions you've proposed to RIOT or merge and migrate to ztimer. ztimer allows for setting millisecond-based timers and would mitigate most of the issues present here.

@haukepetersen
Copy link
Member

assuming Nimble doesn't require timeouts longer than 1.19 hours in regular operations.

for normal operation I don't think so, but I do not know for sure. I guess the assertions will telll...

I agree, that ztimer shoud be the mid-term goal. However, at least event_wait_timeout() would still prove to be a small challenge.

So how about:

  1. catch overflows using asserts in ble_npl_time_delay() and ble_npl_eventq_get() for now, so we can get this PR merged and benefit from the fixes it provides
  2. add API functions in RIOT to xtimer and event as intermediate
  3. remove asserts from (1.) in favor of new functions
  4. move to ztimer in general, once ztimer has stabilized a little further

@bergzand
Copy link
Contributor Author

bergzand commented Feb 13, 2020

  1. catch overflows using asserts in ble_npl_time_delay() and ble_npl_eventq_get() for now, so we can get this PR merged and benefit from the fixes it provides

Done

@bergzand
Copy link
Contributor Author

And I think I broke something related to the announcements, I'll look into it

@haukepetersen
Copy link
Member

Nice, I am just looking into doing the needed adaptions in RIOT for step 2...

@bergzand
Copy link
Contributor Author

Nice, I am just looking into doing the needed adaptions in RIOT for step 2...

Thanks!

I think my code is fine, but somehow the RTT is broken on my test board. On a nRF52dk everything seems fine I receive the BLE advertisements.

Copy link
Member

@haukepetersen haukepetersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the adaption of RIOT has been quicker than anticipated :-)

uint64_t tmo_us64 = tmo * US_PER_MS;
assert(tmo_us64 <= UINT32_MAX);
return (struct ble_npl_event *)event_wait_timeout(&evq->q,
(tmo_us64));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the current state, tmo_us64 should be explicitly casted to uint32_t I'd say

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or we simply wait for RIOT-OS/RIOT#13373 to be merged :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Waited for RIOT-OS/RIOT#13373 and modified to event_wait_timeout64

xtimer_usleep(ticks * 1000);
uint64_t us64 = ticks * US_PER_MS;
assert(us64 <= UINT32_MAX);
xtimer_usleep(us64);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getting RIOT ready was easier than expected: RIOT-OS/RIOT#13370

So we can drop the assert again and use xtimer_usleep64() directly here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we can drop the assert again and use xtimer_usleep64() directly here.

Done!

Copy link
Member

@haukepetersen haukepetersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice one, code wise just minor details left.

Will give this another set of decent test-runs tomorrow to make sure everything still behaves as expected...

@@ -227,7 +228,7 @@ ble_npl_callout_set_arg(struct ble_npl_callout *co, void *arg)
static inline uint32_t
ble_npl_time_get(void)
{
return xtimer_now_usec() / 1000;
return xtimer_now_usec64() / US_PER_MS;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just notice something that could be fixed while we are at it: ble_npl_time_get() should actually return a value of type ble_npl_time_t (see nimble/include/nimble/nimble_npl.h line 139), so we could just adapt the function declaration in this PR.

Furthre: it would be cleaner looking if the return value would be explicitly casted to ble_npl_time_t

return (ble_npl_time_t)(xtimer_now_usec64() / US_PER_MS);

time = xtimer_now_usec64() + ble_npl_time_ticks_to_ms32(timeout) * 1000;
abs.tv_sec = (time_t)(time / 1000000);
abs.tv_nsec = (long)((time % 1000000) * 1000);
time = xtimer_now_usec64() + ble_npl_time_ticks_to_ms32(timeout) * US_PER_MS;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: parenthesis would help readability here :-)

@bergzand
Copy link
Contributor Author

I have 09b619b running on a device here connected to my phone with periodic BLE traffic between the two. I will leave it running overnight and see if anything pops up.

Copy link
Member

@haukepetersen haukepetersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good now, and all my tests still run as they are supposed to.

Feel free to squash, and lets hear what your experiment results for tonight were :-)

@bergzand
Copy link
Contributor Author

Feel free to squash

Squashed!

and lets hear what your experiment results for tonight were :-)

Still connected to the phone as of now and I haven't detected any issues with the BLE connection.

@haukepetersen
Copy link
Member

Perfect, then lets go as soon as Travis is happy as well.

@bergzand
Copy link
Contributor Author

@haukepetersen All green!

@haukepetersen
Copy link
Member

all green -> go.

@haukepetersen haukepetersen merged commit 310d6ac into apache:master Feb 14, 2020
@bergzand bergzand deleted the pr/npl_riot/timer branch February 14, 2020 15:27
@bergzand
Copy link
Contributor Author

Thanks!

@haukepetersen
Copy link
Member

Thank you, awesome work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants