Various fixes to how timer differences are calculated #8585

purdeaandrei · 2020-03-28T05:05:35Z

Description

This PR makes 3 main changes:

The TIMER_DIFF* macros calculated wrong values after a timer overflow (the result was 1 less than the correct difference).
Improves code generation efficiency for the TIMER_DIFF* macros (smaller, faster code).
The eager_pr and eager_pk debounce algorithms had a serious problem with how they tracked time.

For details please see each commit message in turn.

Types of Changes

Issues Fixed or Closed by This PR

Checklist

My code follows the code style of this project.
My change requires a change to the documentation.
I have updated the documentation accordingly.
I have read the CONTRIBUTING document.
I have added tests to cover my changes.
I have tested the changes and verified that they work and don't break anything (as well as I can manage).

NOTE: I have tested the changes using an atmega32u4 devboard, with some synthetic tests. I have not tested with an actual keyboard.

…ectly after the timer wraps.

Let's go through an example using the TIMER_DIFF_8 macro.

If the first timer read is 0xe4 and the second one is 0x32, the timer wrapped.
If the timer had more bits, its new value would have been 0x132, and the correct difference in time is 0x132 - 0xe4 = 0x4e.

old code: TIMER_DIFF_8(0x32, 0xe4) = 0xff - 0xe4 + 0x32 = 0x4d, which is wrong.
new code: TIMER_DIFF_8(0x32, 0xe4) = 0xff + 1 - 0xe4 + 0x32 = 0x4e, which is correct.

This also gives a smart compiler a chance to optimize the code using normal integer overflow.

For example on AVR, the following C code:

    uint8_t __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer) {
        return TIMER_DIFF_8(current_timer, start_timer);
    }

With the original code, it gets translated to the following list of instructions:

    00004c6e <test>:
        4c6e: 98 2f    mov   r25, r24
        4c70: 86 1b    sub   r24, r22
        4c72: 96 17    cp    r25, r22
        4c74: 08 f4    brcc  .+2        ; 0x4c78 <test+0xa>
        4c76: 81 50    subi  r24, 0x01  ; 1
        4c78: 08 95    ret

But with this commit, it gets translated to a single instruction (plus the return):

    00004c40 <test>:
        4c40: 86 1b    sub   r24, r22
        4c42: 08 95    ret

This unfortunately doesn't always work so nicely, for example the following C code:

    int __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer) {
        return TIMER_DIFF_8(current_timer, start_timer);
    }

(Note: return type changed to int)

With the original code it gets translated to:

    00004c6e <test>:
        4c6e: 28 2f    mov   r18, r24
        4c70: 30 e0    ldi   r19, 0x00  ; 0
        4c72: 46 2f    mov   r20, r22
        4c74: 50 e0    ldi   r21, 0x00  ; 0
        4c76: 86 17    cp    r24, r22
        4c78: 20 f0    brcs  .+8        ; 0x4c82 <test+0x14>
        4c7a: c9 01    movw  r24, r18
        4c7c: 84 1b    sub   r24, r20
        4c7e: 95 0b    sbc   r25, r21
        4c80: 08 95    ret
        4c82: c9 01    movw  r24, r18
        4c84: 84 1b    sub   r24, r20
        4c86: 95 0b    sbc   r25, r21
        4c88: 81 50    subi  r24, 0x01  ; 1
        4c8a: 9f 4f    sbci  r25, 0xFF  ; 255
        4c8c: 08 95    ret

With this commit it gets translated to:

    00004c40 <test>:
        4c40: 28 2f    mov   r18, r24
        4c42: 30 e0    ldi   r19, 0x00  ; 0
        4c44: 46 2f    mov   r20, r22
        4c46: 50 e0    ldi   r21, 0x00  ; 0
        4c48: 86 17    cp    r24, r22
        4c4a: 20 f0    brcs  .+8        ; 0x4c54 <test+0x14>
        4c4c: c9 01    movw  r24, r18
        4c4e: 84 1b    sub   r24, r20
        4c50: 95 0b    sbc   r25, r21
        4c52: 08 95    ret
        4c54: c9 01    movw  r24, r18
        4c56: 84 1b    sub   r24, r20
        4c58: 95 0b    sbc   r25, r21
        4c5a: 93 95    inc   r25
        4c5c: 08 95    ret

There is not much performance improvement in this case, however at least with this commit it functions correctly.

Note: The following commit will improve compiler output for the latter example.
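
To make the off-by-one concrete, here is a minimal C sketch of the worked example. The helper functions are reconstructed only from the arithmetic quoted in this commit message (`0xff - b + a` versus `0xff + 1 - b + a`); their names are hypothetical and they are not copied from the real macro definitions in tmk_core.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, reconstructed from the "old code" arithmetic:
 * on wrap it adds 0xff instead of 0x100, so the result is one too small. */
uint8_t old_timer_diff_8(uint8_t a, uint8_t b) {
    return (a < b) ? (uint8_t)(0xff - b + a) : (uint8_t)(a - b);
}

/* Hypothetical helper, reconstructed from the "new code" arithmetic. */
uint8_t new_timer_diff_8(uint8_t a, uint8_t b) {
    return (a < b) ? (uint8_t)(0xff + 1 - b + a) : (uint8_t)(a - b);
}

/* Plain modulo-256 subtraction: the cast truncates the promoted int result. */
uint8_t wrapping_diff_8(uint8_t a, uint8_t b) {
    return (uint8_t)(a - b);
}

/* Checks the worked example: first read 0xe4, second read 0x32. */
void check_timer_diff_example(void) {
    assert(old_timer_diff_8(0x32, 0xe4) == 0x4d); /* off by one */
    assert(new_timer_diff_8(0x32, 0xe4) == 0x4e); /* correct */
    assert(wrapping_diff_8(0x32, 0xe4) == 0x4e);  /* same as the new formula */
}
```

Since the corrected formula agrees with plain wrapping subtraction modulo 256, a compiler is free to drop the comparison and emit a bare `sub`, which is consistent with the shorter listing shown for the uint8_t case.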

Because of integer promotion, the compiler has a hard time generating efficient code to calculate the TIMER_DIFF* macros in some situations. In the example below, the return value is "int", and this is what causes the trouble.

Example C code:

    int __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer) {
        return TIMER_DIFF_8(current_timer, start_timer);
    }

BEFORE: (with -Os)

    00004c40 <test>:
        4c40: 28 2f    mov   r18, r24
        4c42: 30 e0    ldi   r19, 0x00  ; 0
        4c44: 46 2f    mov   r20, r22
        4c46: 50 e0    ldi   r21, 0x00  ; 0
        4c48: 86 17    cp    r24, r22
        4c4a: 20 f0    brcs  .+8        ; 0x4c54 <test+0x14>
        4c4c: c9 01    movw  r24, r18
        4c4e: 84 1b    sub   r24, r20
        4c50: 95 0b    sbc   r25, r21
        4c52: 08 95    ret
        4c54: c9 01    movw  r24, r18
        4c56: 84 1b    sub   r24, r20
        4c58: 95 0b    sbc   r25, r21
        4c5a: 93 95    inc   r25
        4c5c: 08 95    ret

AFTER: (with -Os)

    00004c40 <test>:
        4c40: 86 1b    sub   r24, r22
        4c42: 90 e0    ldi   r25, 0x00  ; 0
        4c44: 08 95    ret

Note: the example is showing -Os but improvements can be seen at all optimization levels, including -O0. We never use -O0, but I tested it to make sure that no extra code is generated in that case.
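
The integer-promotion effect can be illustrated with a small sketch. It only demonstrates the C rule involved, under the assumption that the fix amounts to truncating the difference back to 8 bits before it is widened to the return type; the thread does not show the actual macro change, and all function names here are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Both operands are promoted to int before the subtraction, so the
 * result can be negative and needs a full-width subtract (16-bit on AVR). */
int diff_promoted(uint8_t current_timer, uint8_t start_timer) {
    return current_timer - start_timer;
}

/* Casting back to uint8_t keeps only the low 8 bits; the value is then
 * zero-extended when it is returned as int. */
int diff_truncated(uint8_t current_timer, uint8_t start_timer) {
    return (uint8_t)(current_timer - start_timer);
}

/* Same inputs as the wrap example: first read 0xe4, second read 0x32. */
void check_promotion_example(void) {
    assert(diff_promoted(0x32, 0xe4) == -178);  /* not a usable time difference */
    assert(diff_truncated(0x32, 0xe4) == 0x4e); /* wrapped 8-bit difference */
}
```

With the explicit 8-bit truncation the high byte of the result is always zero, which would match the `ldi r25, 0x00` in the AFTER listing.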

… debounce algorithms

Please see the below simulated sequence of events:

Column A is the 16-bit value returned by timer_read();
Column B is the value returned by custom_wrap_timer_read();
Column C is the original code: (timer_read() % MAX_DEBOUNCE)

A, B, C
65530, 19, 30
65531, 20, 31
65532, 21, 32
65533, 22, 33
65534, 23, 34
65535, 24, 35
0, 25, 0
1, 26, 1
2, 27, 2
3, 28, 3
4, 29, 4
5, 30, 5

timer_read() counts milliseconds in 16 bits, so it wraps about every 65.5 seconds (1.09 minutes), and debouncing might fail at these times without this commit.
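
The table can be reproduced on a host with a short sketch. It assumes MAX_DEBOUNCE is 250 (the value stated in the discussion on this PR) and passes the timer value in as a parameter, `now`, instead of calling timer_read(); the function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_DEBOUNCE 250 /* assumed value, as stated in the discussion */

static uint16_t wrap_reference  = 0;
static uint8_t  wrap_last_value = 0;

/* Column B: advance a private counter by the real elapsed time, then wrap it.
 * `now` stands in for timer_read() so the sketch can run on a host. */
uint8_t wrapping_read(uint16_t now) {
    uint16_t diff   = (uint16_t)(now - wrap_reference); /* correct across the 16-bit wrap */
    wrap_reference  = now;
    wrap_last_value = (uint8_t)(((uint32_t)wrap_last_value + diff) % (MAX_DEBOUNCE + 1));
    return wrap_last_value;
}

/* Column C: the original expression, which jumps when `now` wraps to 0. */
uint8_t naive_read(uint16_t now) {
    return (uint8_t)(now % MAX_DEBOUNCE);
}

/* Replays four rows of the table, including the wrap from 65535 to 0. */
void check_wrap_table(void) {
    assert(wrapping_read(65530) == 19 && naive_read(65530) == 30);
    assert(wrapping_read(65535) == 24 && naive_read(65535) == 35);
    assert(wrapping_read(0) == 25 && naive_read(0) == 0);
    assert(wrapping_read(5) == 30 && naive_read(5) == 5);
}
```

Column B keeps counting 24, 25 across the wrap, while column C falls from 35 back to 0, so a difference taken between those two reads would be wrong.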

purdeaandrei · 2020-04-01T07:49:33Z

I've sent a subset of these changes to TMK as well: tmk/tmk_keyboard#646

alex-ong · 2020-04-01T08:29:35Z

I have arrived!

I can understand why a +1 might be necessary (fenceposting),
but I'm not understanding the difference between

static uint16_t custom_wrap_timer_reference = 0;
static uint8_t custom_wrap_timer_last_value = 0;

static uint8_t custom_wrap_timer_read(void) {
    uint16_t custom_wrap_timer_new_reference = timer_read();
    uint16_t diff = custom_wrap_timer_new_reference - custom_wrap_timer_reference;
    custom_wrap_timer_reference = custom_wrap_timer_new_reference;
    custom_wrap_timer_last_value = (custom_wrap_timer_last_value + diff) % (MAX_DEBOUNCE + 1);
    return custom_wrap_timer_last_value;
}

and timer_read() % (MAX_DEBOUNCE +1 if necessary)

(timer_read() % MAX_DEBOUNCE) simply makes the timer go from 0->max_debounce-1 and loop instead of 0->65535.

With any timer, TIMER_DIFF is guaranteed to give the time difference between two timestamps, assuming of course that you didn't double overflow.

Am I missing something here?

purdeaandrei · 2020-04-01T08:33:34Z

@alex-ong Have you read my commit message from here? 84eb44d

It explains it pretty clearly.
Please read all of my commit messages, they are intended as clear documentation for the reason of the change.

purdeaandrei · 2020-04-01T08:35:26Z

MAX_DEBOUNCE is 250, but the counter only goes up to 35 when the 16-bit timer value overflows (every 65.5 seconds or so)

alex-ong · 2020-04-01T08:36:09Z

> @alex-ong Have you read my commit message from here? 84eb44d
>
> It explains it pretty clearly.
> Please read all of my commit messages, they are intended as clear documentation for the reason of the change.

That diagram is well written and 100% explains it!

For performance reasons, do you think there's a way of changing MAX_DEBOUNCE to be a friendlier divisor of 65535? That way it will work sans the potentially expensive "finding the real difference and adding it back again"

alex-ong · 2020-04-01T08:40:58Z

divisor 1 numerator 65535
divisor 3 numerator 21845
divisor 5 numerator 13107
divisor 15 numerator 4369
divisor 17 numerator 3855
divisor 51 numerator 1285
divisor 85 numerator 771
divisor 255 numerator 257
divisor 257 numerator 255
divisor 771 numerator 85
divisor 1285 numerator 51
divisor 3855 numerator 17
divisor 4369 numerator 15
divisor 13107 numerator 5
divisor 21845 numerator 3

So (timer_read() % 85(+1?)) could work?
As long as your debounce is sub 85 milliseconds it should be fine.

purdeaandrei · 2020-04-01T08:42:11Z

> For performance reasons, do you think there's a way of changing MAX_DEBOUNCE to be a friendlier divisor of 65535? That way it will work sans the potentially expensive "finding the real difference and adding it back again"

Yes, but that will only work with powers of two.
That is, MAX_DEBOUNCE = 127 (with %128 in the formula) is the biggest that would work, and would leave space for the ELAPSED value.
I don't think it's that much performance loss though. This code only executes once per scan,
so it might not even be measurable.
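
A quick way to see why only powers of two work: the 16-bit timer wraps at 65536, so (timer % m) stays continuous across the wrap only when m divides 65536. The sketch below checks this for a few moduli; `step_across_wrap` is a hypothetical helper written for this comment, not something from the code base.

```c
#include <assert.h>
#include <stdint.h>

/* Returns how far (timer % modulus) appears to advance when the 16-bit
 * timer ticks from 65535 to 0. A value of 1 means the wrap is invisible.
 * modulus must be nonzero. */
uint16_t step_across_wrap(uint16_t modulus) {
    uint16_t before = (uint16_t)(65535u % modulus);
    uint16_t after  = 0; /* 0 % modulus */
    return (uint16_t)((after + modulus - before) % modulus);
}

void check_moduli(void) {
    assert(step_across_wrap(128) == 1);   /* 65536 is a multiple of 128 */
    assert(step_across_wrap(250) == 215); /* counter appears to jump */
    assert(step_across_wrap(85) == 0);    /* 85 divides 65535, not 65536: one tick is lost */
}
```

So %128 behaves correctly at the wrap, %250 jumps, and %85 silently drops a millisecond at each wrap.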

alex-ong · 2020-04-01T08:44:54Z

> > For performance reasons, do you think there's a way of changing MAX_DEBOUNCE to be a friendlier divisor of 65535? That way it will work sans the potentially expensive "finding the real difference and adding it back again"
>
> Yes, but that will only work with powers of two.
> That is, MAX_DEBOUNCE = 127 (with %128 in the formula) is the biggest that would work, and would leave space for the ELAPSED value.
> I don't think it's that much performance loss though. This code only executes once per scan,
> so it might not even be measurable.

Oh right, I was dumb and did 65535 instead of 65536.

I also think that % MAX_DEBOUNCE with a power of 2, plus a comment explaining that it needs to be a power of 2 due to timer overflow alignment, is clearer than getting the timer, calculating the real timer difference and adding it back to the modulo'ed counter.

purdeaandrei · 2020-04-01T08:45:24Z

matrix scan rate on a 10x10 fake atmega32u4 board:
eager PK with fix: 1171 Hz
eager PK with %128: 1190 Hz

purdeaandrei · 2020-04-01T08:58:13Z

Maybe I would do something like

#if DEBOUNCE > (127 - 50)
....my code from current state of this PR....
#else
#define MAX_DEBOUNCE 127
#define custom_wrap_timer_read() (timer_read() % (MAX_DEBOUNCE + 1))
#endif

Note the -50 in there.
MAX_DEBOUNCE doesn't just limit the maximum amount of debounce available, because you also need some padding after the debounce has elapsed (50ms is an arbitrary number I just chose), in case the debounce algorithm doesn't get called in time. Otherwise, you might overflow, and the DIFF will return a value smaller than DEBOUNCE, even if the real elapsed time is larger.

purdeaandrei · 2020-04-01T09:05:33Z

@alex-ong Does that sound okay to you? You get the benefit of both compatibility with high debounce values, and the speed of low debounce values.

alex-ong · 2020-04-01T09:12:14Z

> eager PK with fix: 1171 Hz
> eager PK with %128: 1190 Hz

Fast research!

Yeah, so probably not measurably slower. I can't do the math right now because my math's been failing me all day, the "slow" version takes 0.02ms (or 0.00002s) / cycle?

If I was an approver, I would approve it as is, and also totally put up a PR to change it to %128 for speeeeed meeemes. (note i am not an approver on this)

> Note the -50 in there.

Do you have an example of why the -50 buffer is required?

The case you're talking about is if, between updates, it takes longer than MAX_DEBOUNCE, right? So you're saying if the real time is > 128ms, it will report a number between 0 and 128ms (i.e. a double overflow)? I'm assuming that no keyboard scans slower than 10 Hz, which I think is reasonable.

I think it's also reasonable to assume DEBOUNCE would be < 100. I don't see much reason for having both solutions, since they both work and I doubt there's a use case with DEBOUNCE > ~50. Someone who isn't me can adjudicate which option is better; I'm just putting forward my side.

purdeaandrei · 2020-04-01T09:43:05Z

> Yeah, so probably not measurably slower. I can't do the math right now because my math's been failing me all day, the "slow" version takes 0.02ms (or 0.00002s) / cycle?

It means that the current PR version is this much slower per scan:

>>> (1/1171. - 1/1190.) * 1000 * 1000
13.634830533408863  (microseconds)

> Do you have an example of why the -50 buffer is required?

Let's say you have a keyboard at 10 Hz scan rate, that's 100ms
Let's say you set DEBOUNCE = 120
the times you will read, in order will be:
100
200 % 128 = 72 -- debounce elapsed but to you it looks like time is going backwards
300 % 128 = 44
400 % 128 = 16
500 % 128 = 116
600 % 128 = 88
... 900 % 128 = 4
1000 % 128 = 104
1100 % 128 = 76
Looks like it would never deem debounce to be elapsed, and it would never send keycodes anymore.
I know it's an unlikely situation, but it is support that we would be dropping by just changing the value of MAX_DEBOUNCE
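
The scenario can be played through with a simplified model. This is only a sketch: it is not the real eager_pk logic, it assumes the elapsed time is computed purely from timestamps wrapped with % 128, and the function name and parameters are invented for the illustration.

```c
#include <assert.h>
#include <stdint.h>

#define WRAP 128u /* models timer_read() % 128 */

/* Simplified model (not the real eager_pk logic): the key changes at
 * start_ms, the matrix is scanned every scan_period_ms, and each scan
 * computes the elapsed time from the wrapped timestamps only.
 * Returns the real elapsed time at which the wrapped difference first
 * reaches debounce_ms, or 0 if that does not happen within limit_ms. */
uint32_t first_detected_elapsed_ms(uint32_t start_ms, uint32_t scan_period_ms,
                                   uint32_t debounce_ms, uint32_t limit_ms) {
    if (scan_period_ms == 0) return 0; /* avoid an endless loop */
    uint32_t start_mod = start_ms % WRAP;
    for (uint32_t elapsed = scan_period_ms; elapsed <= limit_ms; elapsed += scan_period_ms) {
        uint32_t now_mod = (start_ms + elapsed) % WRAP;
        uint32_t diff    = (now_mod + WRAP - start_mod) % WRAP;
        if (diff >= debounce_ms) return elapsed;
    }
    return 0;
}

void check_scan_examples(void) {
    /* 10 Hz scan, DEBOUNCE = 120: accepted only after 1400 ms in this model. */
    assert(first_detected_elapsed_ms(100, 100, 120, 5000) == 1400);
    /* Same scan rate with DEBOUNCE = 127 - 50 = 77: accepted on the first scan. */
    assert(first_detected_elapsed_ms(100, 100, 77, 5000) == 100);
}
```

In this model the key is not stuck forever, but it is accepted only after 1400 ms instead of at the 200 ms scan, which is the kind of failure the padding is meant to rule out.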

> (note i am not an approver on this)

Alright, let's see what an official approver would say. I'm on the side of implementing that #if type code above for speed, but I don't wanna implement it then have it be reverted if an official approver thinks it's way too complicated.

I'm also fine with only using %128, as long as it's not determined to be a breaking change.

So any approvers would like to review?

purdeaandrei · 2020-04-01T09:47:33Z

Example in previous comment edited

alex-ong · 2020-04-01T10:22:44Z

Right, I can see how your example works; it only occurs when the scan rate is extremely low / the debounce is high, and a double overflow occurs.

The slowest keyboard in QMK still scans at 300 Hz; here are the same numbers but with 100 Hz scanning + 100ms debounce. I don't really foresee something as pitiful as 10 Hz scanning (remember normal keyboards are in the 300-1000 range; mine is 14000 lmao)

Raw Timer	Modulo timer	TIMER_DIFF(modulo,initial_modulo,128)	Timer expired? (TIMER_DIFF >= 100)
120	120	0 (start value)	no
130	2	10	no
140	12	20	no
...	...	...	...
220	92	100	yes

I understand that the %128 is not "correct", but it is correct in real use, and simple to digest.

quantum/debounce/eager_pk.c

…ility according to code review.

purdeaandrei · 2020-04-04T19:42:37Z

> purdeaandrei removed the request for review from qmk/collaborators 1 minute ago

That's strange, I didn't intend to remove any request to review. I only clicked the re-request review for @vomindoraan, but that for some reason removed qmk/collaborators, and I can't edit the list of review requests at all.

vomindoraan

Looks good! One final change I'd make (sorry for not catching this earlier) would be to rename wrap_timer_read() to wrapping_timer_read(), since “wrap” could be interpreted as an action at a casual glance, whereas the latter name clearly conveys that you're reading something from a timer that wraps (wrapping timer).

…ility according to code review. (2)

purdeaandrei · 2020-04-04T20:14:24Z

@vomindoraan Done

purdeaandrei · 2020-04-11T11:17:00Z

Thanks!

* tmk_core/common: Fixing TIMER_DIFF macro to calculate difference correctly after the timer wraps.
* tmk_core/common: Improve code generation for TIMER_DIFF* macros
* quantum/debounce: Fix custom wrapping timers in eager_pr and eager_pk debounce algorithms
* quantum/debounce/eager_pr and eager_pk: modifications for code readability according to code review.
* quantum/debounce/eager_pr and eager_pk: modifications for code readability according to code review. (2)

purdeaandrei added 3 commits March 28, 2020 06:16

tzarc requested a review from a team March 28, 2020 05:09

tzarc added bug core enhancement good first issue optimization labels Mar 28, 2020

purdeaandrei mentioned this pull request Apr 1, 2020

quantum/debounce: Added sym_pk debounce algorithm #8587

Merged

purdeaandrei mentioned this pull request Apr 2, 2020

TIMER_DIFF* bugfix tmk/tmk_keyboard#646

Merged

vomindoraan reviewed Apr 4, 2020

quantum/debounce/eager_pr and eager_pk: modifications for code readability according to code review. (206ed1b)

purdeaandrei removed the request for review from a team April 4, 2020 19:40

vomindoraan reviewed Apr 4, 2020

quantum/debounce/eager_pr and eager_pk: modifications for code readability according to code review. (2) (1a93c72)

vomindoraan approved these changes Apr 4, 2020

noroadsleft requested a review from a team April 5, 2020 05:11

drashna approved these changes Apr 5, 2020

drashna requested a review from a team April 5, 2020 06:12

tzarc approved these changes Apr 11, 2020

tzarc merged commit 6c2c3c1 into qmk:master Apr 11, 2020

purdeaandrei deleted the timer_diff_fixes branch April 11, 2020 11:17
