Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Various fixes to how timer differences are calculated #8585

Merged
merged 5 commits into from
Apr 11, 2020

Conversation

purdeaandrei
Copy link
Contributor

@purdeaandrei purdeaandrei commented Mar 28, 2020

Description

This PR makes 3 main changes

  1. The TIMER_DIFF* macros calculated wrong values after timer overflow (1 less)
  2. Improves code generation efficiency for TIMER_DIFF* macros (smaller, faster code)
  3. The eager_pr, and eager_pk debounce algorithms had a serious problem with how they tracked time.

For details please see each commit message in turn.

Types of Changes

  • Core
  • Bugfix
  • New feature
  • Enhancement/optimization
  • Keyboard (addition or update)
  • Keymap/layout/userspace (addition or update)
  • Documentation

Issues Fixed or Closed by This PR

Checklist

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • I have tested the changes and verified that they work and don't break anything (as well as I can manage).

NOTE: I have tested the changes using a atmega32u4 devboard, with some synthetic tests. I have not tested with an actual keyboard.

…ectly after the timer wraps.

Let's go through an example, using the following macro:

If the first timer read is 0xe4 and the second one is 0x32, the timer wrapped.
If the timer would have had more bits, it's new value would have been 0x132,
and the correct difference in time is 0x132 - 0xe4 = 0x4e

old code TIMER_DIFF_8(0x32, 0xe4) = 0xff - 0xe4 + 0x32 = 0x4d, which is wrong.
new code TIMER_DIFF_8(0x32, 0xe4) = 0xff + 1 - 0xe4 + 0x32 = 0x4e, which is correct.

This also gives a chance for a smart compiler to optimize the code using normal
integer overflow.

For example on AVR, the following C code:
uint8_t __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer)
{
    return TIMER_DIFF_8(current_timer, start_timer);
}
With the original code, it gets translated to the following list of instructions:
00004c6e <test>:
    4c6e:       98 2f           mov     r25, r24
    4c70:       86 1b           sub     r24, r22
    4c72:       96 17           cp      r25, r22
    4c74:       08 f4           brcc    .+2             ; 0x4c78 <test+0xa>
    4c76:       81 50           subi    r24, 0x01       ; 1
    4c78:       08 95           ret
But with this commit, it gets translated to a single instruction:
00004c40 <test>:
    4c40:       86 1b           sub     r24, r22
    4c42:       08 95           ret

This unfortunately doesn't always work so nicely, for example the following C code:
int __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer)
{
    return TIMER_DIFF_8(current_timer, start_timer);
}
(Note: return type changed to int)
With the original code it gets translated to:
00004c6e <test>:
    4c6e:       28 2f           mov     r18, r24
    4c70:       30 e0           ldi     r19, 0x00       ; 0
    4c72:       46 2f           mov     r20, r22
    4c74:       50 e0           ldi     r21, 0x00       ; 0
    4c76:       86 17           cp      r24, r22
    4c78:       20 f0           brcs    .+8             ; 0x4c82 <test+0x14>
    4c7a:       c9 01           movw    r24, r18
    4c7c:       84 1b           sub     r24, r20
    4c7e:       95 0b           sbc     r25, r21
    4c80:       08 95           ret
    4c82:       c9 01           movw    r24, r18
    4c84:       84 1b           sub     r24, r20
    4c86:       95 0b           sbc     r25, r21
    4c88:       81 50           subi    r24, 0x01       ; 1
    4c8a:       9f 4f           sbci    r25, 0xFF       ; 255
    4c8c:       08 95           ret
Wth this commit it gets translated to:
00004c40 <test>:
    4c40:       28 2f           mov     r18, r24
    4c42:       30 e0           ldi     r19, 0x00       ; 0
    4c44:       46 2f           mov     r20, r22
    4c46:       50 e0           ldi     r21, 0x00       ; 0
    4c48:       86 17           cp      r24, r22
    4c4a:       20 f0           brcs    .+8             ; 0x4c54 <test+0x14>
    4c4c:       c9 01           movw    r24, r18
    4c4e:       84 1b           sub     r24, r20
    4c50:       95 0b           sbc     r25, r21
    4c52:       08 95           ret
    4c54:       c9 01           movw    r24, r18
    4c56:       84 1b           sub     r24, r20
    4c58:       95 0b           sbc     r25, r21
    4c5a:       93 95           inc     r25
    4c5c:       08 95           ret
There is not much performance improvement in this case, however at least with this
commit it functions correctly.

Note: The following commit will improve compiler output for the latter example.
Because of integer promotion the compiler is having a hard time generating
efficient code to calculate TIMER_DIFF* macros in some situations.
In the below example, the return value is "int", and this is causing the
trouble.

Example C code:

int __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer)
{
    return TIMER_DIFF_8(current_timer, start_timer);
}

BEFORE: (with -Os)

00004c40 <test>:
    4c40:       28 2f           mov     r18, r24
    4c42:       30 e0           ldi     r19, 0x00       ; 0
    4c44:       46 2f           mov     r20, r22
    4c46:       50 e0           ldi     r21, 0x00       ; 0
    4c48:       86 17           cp      r24, r22
    4c4a:       20 f0           brcs    .+8             ; 0x4c54 <test+0x14>
    4c4c:       c9 01           movw    r24, r18
    4c4e:       84 1b           sub     r24, r20
    4c50:       95 0b           sbc     r25, r21
    4c52:       08 95           ret
    4c54:       c9 01           movw    r24, r18
    4c56:       84 1b           sub     r24, r20
    4c58:       95 0b           sbc     r25, r21
    4c5a:       93 95           inc     r25
    4c5c:       08 95           ret

AFTER: (with -Os)

00004c40 <test>:
    4c40:       86 1b           sub     r24, r22
    4c42:       90 e0           ldi     r25, 0x00       ; 0
    4c44:       08 95           ret

Note: the example is showing -Os but improvements can be seen at all optimization levels,
including -O0. We never use -O0, but I tested it to make sure that no extra code is
generated in that case.OA
… debounce algorithms

Please see the below simulated sequence of events:
Column A is the 16-bit value returned by read_timer();
Column B is the value returned by custom_wrap_timer_read();
Column C is the original code: (timer_read() % MAX_DEBOUNCE)

    A,     B,     C
65530,    19,    30
65531,    20,    31
65532,    21,    32
65533,    22,    33
65534,    23,    34
65535,    24,    35
    0     25,     0
    1,    26,     1
    2,    27,     2
    3,    28,     3
    4,    29,     4
    5,    30,     5

read_timer() wraps about every 1.09 seconds, and so debouncing might
fail at these times without this commit.
@purdeaandrei
Copy link
Contributor Author

I've sent a subset of these changes to TMK as well: tmk/tmk_keyboard#646

@alex-ong
Copy link
Contributor

alex-ong commented Apr 1, 2020

I have arrived!

I can understand why a +1 might be necessary (fenceposting)
But i'm not understanding the difference between

static uint16_t custom_wrap_timer_reference = 0;
static uint8_t custom_wrap_timer_last_value = 0;

static uint8_t custom_wrap_timer_read(void) {
    uint16_t custom_wrap_timer_new_reference = timer_read();
    uint16_t diff = custom_wrap_timer_new_reference - custom_wrap_timer_reference;
    custom_wrap_timer_reference = custom_wrap_timer_new_reference;
    custom_wrap_timer_last_value = (custom_wrap_timer_last_value + diff) % (MAX_DEBOUNCE + 1);
    return custom_wrap_timer_last_value;
}

and timer_read() % (MAX_DEBOUNCE +1 if necessary)

(timer_read() % MAX_DEBOUNCE) simply makes the timer go from 0->max_debounce-1 and loop instead of 0->65535.

With any timer, TIMER_DIFF is guaranteed to give the time difference between two timestamps, assuming of course that you didn't double overflow.

Am I missing something here?

@purdeaandrei
Copy link
Contributor Author

@alex-ong Have you read my commit message from here? 84eb44d

It explains it pretty clearly.
Please read all of my commit messages, they are intended as clear documentation for the reason of the change.

@purdeaandrei
Copy link
Contributor Author

MAX_DEBOUNCE is 250, but the counter only goes up to 35 when the 16-bit timer value overflows (every second or so)

@alex-ong
Copy link
Contributor

alex-ong commented Apr 1, 2020

@alex-ong Have you read my commit message from here? 84eb44d

It explains it pretty clearly.
Please read all of my commit messages, they are intended as clear documentation for the reason of the change.

That diagram is well written and 100% explains it!

For performance reasons, do you think there's a way of changing max_DEBOUNCE to be a friendlier multiple of 65535? That way it will work sans the potentially expensive "finding the real difference and adding it back again"

@alex-ong
Copy link
Contributor

alex-ong commented Apr 1, 2020

divisor 1 numerator 65535
divisor 3 numerator 21845
divisor 5 numerator 13107
divisor 15 numerator 4369
divisor 17 numerator 3855
divisor 51 numerator 1285
divisor 85 numerator 771
divisor 255 numerator 257
divisor 257 numerator 255
divisor 771 numerator 85
divisor 1285 numerator 51
divisor 3855 numerator 17
divisor 4369 numerator 15
divisor 13107 numerator 5
divisor 21845 numerator 3

So (timer_read() % 85(+1?)) could work?
As long as your debounce is sub 85 milliseconds it should be fine.

@purdeaandrei
Copy link
Contributor Author

For performance reasons, do you think there's a way of changing max_DEBOUNCE to be a friendlier multiple of 65535? That way it will work sans the potentially expensive "finding the real difference and adding it back again"

Yes, but that will only work with powers of two.
That is MAX_DEBOUNCE = 127 (with %128 in the formula) is the biggest that would work, and would leave space for the ELAPSED value.
I don't think it's that much performance loss though. This code only executes once per scan,
so it might not even be measureable.

@alex-ong
Copy link
Contributor

alex-ong commented Apr 1, 2020

For performance reasons, do you think there's a way of changing max_DEBOUNCE to be a friendlier multiple of 65535? That way it will work sans the potentially expensive "finding the real difference and adding it back again"

Yes, but that will only work with powers of two.
That is MAX_DEBOUNCE = 127 (with %128 in the formula) is the biggest that would work, and would leave space for the ELAPSED value.
I don't think it's that much performance loss though. This code only executes once per scan,
so it might not even be measureable.

Oh right, i was dumb and did 65535 instead of 65536.

I also think that % MAX_DEBOUNCE of a po2 with a comment explaining it needing to be a power of 2, due to timer overflow alignment

is clearer than getting the timer, calculating the real timer difference and adding back to the modulo'ed counter

@purdeaandrei
Copy link
Contributor Author

purdeaandrei commented Apr 1, 2020

matrix scan rate on a 10x10 fake atmega32u4 board:
eager PK with fix: 1171 Hz
eager PK with %128: 1190 Hz

@purdeaandrei
Copy link
Contributor Author

purdeaandrei commented Apr 1, 2020

Maybe I would do something like

#if DEBOUNCE > (127 - 50)
....my code from current state of this PR....
#else
#define MAX_DEBOUNCE 127
#define custom_wrap_timer_read() (timer_read() % (MAX_DEBOUNCE + 1))
#end

Note the -50 in there.
MAX_DEBOUNCE doesn't just limit the maximum amount of debounce available, because you also need some padding after the debounce has elapsed (50ms is an arbitrary number I just chose), in case the debounce algorithm doesn't get called in time. Otherwise, you might overflow, and the DIFF will return a value smaller then DEBOUNCE, even if it's larger

@purdeaandrei
Copy link
Contributor Author

@alex-ong Does that sound okay to you? You get the benefit of both compatibility with high debounce values, and the speed of low debounce values.

@alex-ong
Copy link
Contributor

alex-ong commented Apr 1, 2020

eager PK with fix: 1171 Hz
eager PK with %128: 1190 Hz

Fast research!

Yeah, so probably not measurably slower. I can't do the math right now because my math's been failing me all day, the "slow" version takes 0.02ms (or 0.000002s) / cycle?

If i was an approver, I would approve it as is, and also totally put a PR to change it to %128 for speeeeed meeemes. (note i am not an approver on this)

Note the -50 in there.

Do you have an example of why the -50 buffer is required?

The case you're talking about is if between updates, it takes longer than MAX_DEBOUNCE right? so you're saying if the real-time is > 128ms, it will report a number between 0 and 128ms? (i.e double overflow). I'm assuming that no keyboard scans slower than 10hz, which i think is reasonable.

I think it's also reasonable to assume DEBOUNCE would be < 100?, I don't see much reason for having both solutions, since they both work and i doubt there's a use-case with DEBOUNCE > ~50. Someone who isn't me can adjudicate which option is better, just putting forward my side.

@purdeaandrei
Copy link
Contributor Author

purdeaandrei commented Apr 1, 2020

Yeah, so probably not measurably slower. I can't do the math right now because my math's been failing me all day, the "slow" version takes 0.02ms (or 0.000002s) / cycle?

It means, that the current PR version is this much slower per scan:

>>> (1/1171. - 1/1190.) * 1000 * 1000
13.634830533408863  (microseconds)

Do you have an example of why the -50 buffer is required?

Let's say you have a keyboard at 10 Hz scan rate, that's 100ms
Let's say you set DEBOUNCE = 120
the times you will read, in order will be:
100
200 % 128 = 72 -- debounce elapsed but to you it looks like time is going backwards
300 % 128 = 44
400 % 128 = 16
500 % 128 = 116
600 % 128 = 88
... 900 % 128 = 4
1000 % 128 = 104
1100 % 128 = 76
Looks like it would never deem debounce to be elapsed, and it would never send keycodes anymore.
I know it's an unlikely situation, but it is support that we would be dropping by just changing the value of MAX_DEBOUNCE

(note i am not an approver on this)

Alright, let's see what an official approver would say. I'm on the side of implementing that #if type code above for speed, but I don't wanna implement it then have it be reverted if an official approver thinks it's way too complicated.

I'm also fine with only using %128, as long as it's not determined to be a breaking change.

So any approvers would like to review?

@purdeaandrei
Copy link
Contributor Author

Example in previous comment edited

@alex-ong
Copy link
Contributor

alex-ong commented Apr 1, 2020

Right i can see how your example works; it only occurs when the scanrate is extremely low / the debounce is high, and a double overflow occurs.

The slowest keyboard in qmk still scans at 300hz, here's the same numbers but with 100hz scanning + 100ms debounce. I don't really forsee something as pitiful as 10hz scanning (remember normal keyboards are in 300-1000 range; mine is 14000 lmao)

Raw Timer Modulo timer TIMER_DIFF(modulo,initial_modulo,128) Timer expired? (TIMER_DIFF > 100)
120 120 0 (start value) no
130 2 10 no
140 12 20 no
... ... ... ...
220 92 100 yes

I understand that the %128 is not "correct", but it is correct in real use, and simple to digest.

@purdeaandrei purdeaandrei removed the request for review from a team April 4, 2020 19:40
@purdeaandrei
Copy link
Contributor Author

purdeaandrei removed the request for review from qmk/collaborators 1 minute ago

That's strange, I didn't intend to remove any request to review. I only clicked the re-request review for @vomindoraan , but that for some reason removed qmk/collaborators, and I can't edit at all the list of review requests.

Copy link
Contributor

@vomindoraan vomindoraan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! One final change I'd make (sorry for not catching this earlier) would be to rename wrap_timer_read() to wrapping_timer_read(), since “wrap” could be interpreted as an action at a casual glance, whereas the latter name clearly conveys that you're reading something from a timer that wraps (wrapping timer).

@purdeaandrei
Copy link
Contributor Author

@vomindoraan Done

@noroadsleft noroadsleft requested a review from a team April 5, 2020 05:11
@drashna drashna requested a review from a team April 5, 2020 06:12
@tzarc tzarc merged commit 6c2c3c1 into qmk:master Apr 11, 2020
@purdeaandrei
Copy link
Contributor Author

Thanks!

@purdeaandrei purdeaandrei deleted the timer_diff_fixes branch April 11, 2020 11:17
violet-fish pushed a commit to violet-fish/qmk_firmware that referenced this pull request Apr 12, 2020
* tmk_core/common: Fixing TIMER_DIFF macro to calculate difference correctly after the timer wraps.

Let's go through an example, using the following macro:

If the first timer read is 0xe4 and the second one is 0x32, the timer wrapped.
If the timer would have had more bits, it's new value would have been 0x132,
and the correct difference in time is 0x132 - 0xe4 = 0x4e

old code TIMER_DIFF_8(0x32, 0xe4) = 0xff - 0xe4 + 0x32 = 0x4d, which is wrong.
new code TIMER_DIFF_8(0x32, 0xe4) = 0xff + 1 - 0xe4 + 0x32 = 0x4e, which is correct.

This also gives a chance for a smart compiler to optimize the code using normal
integer overflow.

For example on AVR, the following C code:
uint8_t __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer)
{
    return TIMER_DIFF_8(current_timer, start_timer);
}
With the original code, it gets translated to the following list of instructions:
00004c6e <test>:
    4c6e:       98 2f           mov     r25, r24
    4c70:       86 1b           sub     r24, r22
    4c72:       96 17           cp      r25, r22
    4c74:       08 f4           brcc    .+2             ; 0x4c78 <test+0xa>
    4c76:       81 50           subi    r24, 0x01       ; 1
    4c78:       08 95           ret
But with this commit, it gets translated to a single instruction:
00004c40 <test>:
    4c40:       86 1b           sub     r24, r22
    4c42:       08 95           ret

This unfortunately doesn't always work so nicely, for example the following C code:
int __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer)
{
    return TIMER_DIFF_8(current_timer, start_timer);
}
(Note: return type changed to int)
With the original code it gets translated to:
00004c6e <test>:
    4c6e:       28 2f           mov     r18, r24
    4c70:       30 e0           ldi     r19, 0x00       ; 0
    4c72:       46 2f           mov     r20, r22
    4c74:       50 e0           ldi     r21, 0x00       ; 0
    4c76:       86 17           cp      r24, r22
    4c78:       20 f0           brcs    .+8             ; 0x4c82 <test+0x14>
    4c7a:       c9 01           movw    r24, r18
    4c7c:       84 1b           sub     r24, r20
    4c7e:       95 0b           sbc     r25, r21
    4c80:       08 95           ret
    4c82:       c9 01           movw    r24, r18
    4c84:       84 1b           sub     r24, r20
    4c86:       95 0b           sbc     r25, r21
    4c88:       81 50           subi    r24, 0x01       ; 1
    4c8a:       9f 4f           sbci    r25, 0xFF       ; 255
    4c8c:       08 95           ret
Wth this commit it gets translated to:
00004c40 <test>:
    4c40:       28 2f           mov     r18, r24
    4c42:       30 e0           ldi     r19, 0x00       ; 0
    4c44:       46 2f           mov     r20, r22
    4c46:       50 e0           ldi     r21, 0x00       ; 0
    4c48:       86 17           cp      r24, r22
    4c4a:       20 f0           brcs    .+8             ; 0x4c54 <test+0x14>
    4c4c:       c9 01           movw    r24, r18
    4c4e:       84 1b           sub     r24, r20
    4c50:       95 0b           sbc     r25, r21
    4c52:       08 95           ret
    4c54:       c9 01           movw    r24, r18
    4c56:       84 1b           sub     r24, r20
    4c58:       95 0b           sbc     r25, r21
    4c5a:       93 95           inc     r25
    4c5c:       08 95           ret
There is not much performance improvement in this case, however at least with this
commit it functions correctly.

Note: The following commit will improve compiler output for the latter example.

* tmk_core/common: Improve code generation for TIMER_DIFF* macros

Because of integer promotion the compiler is having a hard time generating
efficient code to calculate TIMER_DIFF* macros in some situations.
In the below example, the return value is "int", and this is causing the
trouble.

Example C code:

int __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer)
{
    return TIMER_DIFF_8(current_timer, start_timer);
}

BEFORE: (with -Os)

00004c40 <test>:
    4c40:       28 2f           mov     r18, r24
    4c42:       30 e0           ldi     r19, 0x00       ; 0
    4c44:       46 2f           mov     r20, r22
    4c46:       50 e0           ldi     r21, 0x00       ; 0
    4c48:       86 17           cp      r24, r22
    4c4a:       20 f0           brcs    .+8             ; 0x4c54 <test+0x14>
    4c4c:       c9 01           movw    r24, r18
    4c4e:       84 1b           sub     r24, r20
    4c50:       95 0b           sbc     r25, r21
    4c52:       08 95           ret
    4c54:       c9 01           movw    r24, r18
    4c56:       84 1b           sub     r24, r20
    4c58:       95 0b           sbc     r25, r21
    4c5a:       93 95           inc     r25
    4c5c:       08 95           ret

AFTER: (with -Os)

00004c40 <test>:
    4c40:       86 1b           sub     r24, r22
    4c42:       90 e0           ldi     r25, 0x00       ; 0
    4c44:       08 95           ret

Note: the example is showing -Os but improvements can be seen at all optimization levels,
including -O0. We never use -O0, but I tested it to make sure that no extra code is
generated in that case.OA

* quantum/debounce: Fix custom wrapping timers in eager_pr and eager_pk debounce algorithms

Please see the below simulated sequence of events:
Column A is the 16-bit value returned by read_timer();
Column B is the value returned by custom_wrap_timer_read();
Column C is the original code: (timer_read() % MAX_DEBOUNCE)

    A,     B,     C
65530,    19,    30
65531,    20,    31
65532,    21,    32
65533,    22,    33
65534,    23,    34
65535,    24,    35
    0     25,     0
    1,    26,     1
    2,    27,     2
    3,    28,     3
    4,    29,     4
    5,    30,     5

read_timer() wraps about every 1.09 seconds, and so debouncing might
fail at these times without this commit.

* quantum/debounce/eager_pr and eager_pk: modifications for code readability according to code review.

* quantum/debounce/eager_pr and eager_pk: modifications for code readability according to code review. (2)
Quarren42 pushed a commit to Quarren42/qmk_firmware that referenced this pull request Apr 15, 2020
* tmk_core/common: Fixing TIMER_DIFF macro to calculate difference correctly after the timer wraps.

Let's go through an example, using the following macro:

If the first timer read is 0xe4 and the second one is 0x32, the timer wrapped.
If the timer would have had more bits, it's new value would have been 0x132,
and the correct difference in time is 0x132 - 0xe4 = 0x4e

old code TIMER_DIFF_8(0x32, 0xe4) = 0xff - 0xe4 + 0x32 = 0x4d, which is wrong.
new code TIMER_DIFF_8(0x32, 0xe4) = 0xff + 1 - 0xe4 + 0x32 = 0x4e, which is correct.

This also gives a chance for a smart compiler to optimize the code using normal
integer overflow.

For example on AVR, the following C code:
uint8_t __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer)
{
    return TIMER_DIFF_8(current_timer, start_timer);
}
With the original code, it gets translated to the following list of instructions:
00004c6e <test>:
    4c6e:       98 2f           mov     r25, r24
    4c70:       86 1b           sub     r24, r22
    4c72:       96 17           cp      r25, r22
    4c74:       08 f4           brcc    .+2             ; 0x4c78 <test+0xa>
    4c76:       81 50           subi    r24, 0x01       ; 1
    4c78:       08 95           ret
But with this commit, it gets translated to a single instruction:
00004c40 <test>:
    4c40:       86 1b           sub     r24, r22
    4c42:       08 95           ret

This unfortunately doesn't always work so nicely, for example the following C code:
int __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer)
{
    return TIMER_DIFF_8(current_timer, start_timer);
}
(Note: return type changed to int)
With the original code it gets translated to:
00004c6e <test>:
    4c6e:       28 2f           mov     r18, r24
    4c70:       30 e0           ldi     r19, 0x00       ; 0
    4c72:       46 2f           mov     r20, r22
    4c74:       50 e0           ldi     r21, 0x00       ; 0
    4c76:       86 17           cp      r24, r22
    4c78:       20 f0           brcs    .+8             ; 0x4c82 <test+0x14>
    4c7a:       c9 01           movw    r24, r18
    4c7c:       84 1b           sub     r24, r20
    4c7e:       95 0b           sbc     r25, r21
    4c80:       08 95           ret
    4c82:       c9 01           movw    r24, r18
    4c84:       84 1b           sub     r24, r20
    4c86:       95 0b           sbc     r25, r21
    4c88:       81 50           subi    r24, 0x01       ; 1
    4c8a:       9f 4f           sbci    r25, 0xFF       ; 255
    4c8c:       08 95           ret
Wth this commit it gets translated to:
00004c40 <test>:
    4c40:       28 2f           mov     r18, r24
    4c42:       30 e0           ldi     r19, 0x00       ; 0
    4c44:       46 2f           mov     r20, r22
    4c46:       50 e0           ldi     r21, 0x00       ; 0
    4c48:       86 17           cp      r24, r22
    4c4a:       20 f0           brcs    .+8             ; 0x4c54 <test+0x14>
    4c4c:       c9 01           movw    r24, r18
    4c4e:       84 1b           sub     r24, r20
    4c50:       95 0b           sbc     r25, r21
    4c52:       08 95           ret
    4c54:       c9 01           movw    r24, r18
    4c56:       84 1b           sub     r24, r20
    4c58:       95 0b           sbc     r25, r21
    4c5a:       93 95           inc     r25
    4c5c:       08 95           ret
There is not much performance improvement in this case, however at least with this
commit it functions correctly.

Note: The following commit will improve compiler output for the latter example.

* tmk_core/common: Improve code generation for TIMER_DIFF* macros

Because of integer promotion the compiler is having a hard time generating
efficient code to calculate TIMER_DIFF* macros in some situations.
In the below example, the return value is "int", and this is causing the
trouble.

Example C code:

int __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer)
{
    return TIMER_DIFF_8(current_timer, start_timer);
}

BEFORE: (with -Os)

00004c40 <test>:
    4c40:       28 2f           mov     r18, r24
    4c42:       30 e0           ldi     r19, 0x00       ; 0
    4c44:       46 2f           mov     r20, r22
    4c46:       50 e0           ldi     r21, 0x00       ; 0
    4c48:       86 17           cp      r24, r22
    4c4a:       20 f0           brcs    .+8             ; 0x4c54 <test+0x14>
    4c4c:       c9 01           movw    r24, r18
    4c4e:       84 1b           sub     r24, r20
    4c50:       95 0b           sbc     r25, r21
    4c52:       08 95           ret
    4c54:       c9 01           movw    r24, r18
    4c56:       84 1b           sub     r24, r20
    4c58:       95 0b           sbc     r25, r21
    4c5a:       93 95           inc     r25
    4c5c:       08 95           ret

AFTER: (with -Os)

00004c40 <test>:
    4c40:       86 1b           sub     r24, r22
    4c42:       90 e0           ldi     r25, 0x00       ; 0
    4c44:       08 95           ret

Note: the example is showing -Os but improvements can be seen at all optimization levels,
including -O0. We never use -O0, but I tested it to make sure that no extra code is
generated in that case.OA

* quantum/debounce: Fix custom wrapping timers in eager_pr and eager_pk debounce algorithms

Please see the below simulated sequence of events:
Column A is the 16-bit value returned by read_timer();
Column B is the value returned by custom_wrap_timer_read();
Column C is the original code: (timer_read() % MAX_DEBOUNCE)

    A,     B,     C
65530,    19,    30
65531,    20,    31
65532,    21,    32
65533,    22,    33
65534,    23,    34
65535,    24,    35
    0     25,     0
    1,    26,     1
    2,    27,     2
    3,    28,     3
    4,    29,     4
    5,    30,     5

read_timer() wraps about every 1.09 seconds, and so debouncing might
fail at these times without this commit.

* quantum/debounce/eager_pr and eager_pk: modifications for code readability according to code review.

* quantum/debounce/eager_pr and eager_pk: modifications for code readability according to code review. (2)
kylekuj pushed a commit to kylekuj/qmk_firmware that referenced this pull request Apr 21, 2020
* tmk_core/common: Fixing TIMER_DIFF macro to calculate difference correctly after the timer wraps.

Let's go through an example, using the following macro:

If the first timer read is 0xe4 and the second one is 0x32, the timer wrapped.
If the timer would have had more bits, it's new value would have been 0x132,
and the correct difference in time is 0x132 - 0xe4 = 0x4e

old code TIMER_DIFF_8(0x32, 0xe4) = 0xff - 0xe4 + 0x32 = 0x4d, which is wrong.
new code TIMER_DIFF_8(0x32, 0xe4) = 0xff + 1 - 0xe4 + 0x32 = 0x4e, which is correct.

This also gives a chance for a smart compiler to optimize the code using normal
integer overflow.

For example on AVR, the following C code:
uint8_t __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer)
{
    return TIMER_DIFF_8(current_timer, start_timer);
}
With the original code, it gets translated to the following list of instructions:
00004c6e <test>:
    4c6e:       98 2f           mov     r25, r24
    4c70:       86 1b           sub     r24, r22
    4c72:       96 17           cp      r25, r22
    4c74:       08 f4           brcc    .+2             ; 0x4c78 <test+0xa>
    4c76:       81 50           subi    r24, 0x01       ; 1
    4c78:       08 95           ret
But with this commit, it gets translated to a single instruction:
00004c40 <test>:
    4c40:       86 1b           sub     r24, r22
    4c42:       08 95           ret

This unfortunately doesn't always work so nicely, for example the following C code:
int __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer)
{
    return TIMER_DIFF_8(current_timer, start_timer);
}
(Note: return type changed to int)
With the original code it gets translated to:
00004c6e <test>:
    4c6e:       28 2f           mov     r18, r24
    4c70:       30 e0           ldi     r19, 0x00       ; 0
    4c72:       46 2f           mov     r20, r22
    4c74:       50 e0           ldi     r21, 0x00       ; 0
    4c76:       86 17           cp      r24, r22
    4c78:       20 f0           brcs    .+8             ; 0x4c82 <test+0x14>
    4c7a:       c9 01           movw    r24, r18
    4c7c:       84 1b           sub     r24, r20
    4c7e:       95 0b           sbc     r25, r21
    4c80:       08 95           ret
    4c82:       c9 01           movw    r24, r18
    4c84:       84 1b           sub     r24, r20
    4c86:       95 0b           sbc     r25, r21
    4c88:       81 50           subi    r24, 0x01       ; 1
    4c8a:       9f 4f           sbci    r25, 0xFF       ; 255
    4c8c:       08 95           ret
Wth this commit it gets translated to:
00004c40 <test>:
    4c40:       28 2f           mov     r18, r24
    4c42:       30 e0           ldi     r19, 0x00       ; 0
    4c44:       46 2f           mov     r20, r22
    4c46:       50 e0           ldi     r21, 0x00       ; 0
    4c48:       86 17           cp      r24, r22
    4c4a:       20 f0           brcs    .+8             ; 0x4c54 <test+0x14>
    4c4c:       c9 01           movw    r24, r18
    4c4e:       84 1b           sub     r24, r20
    4c50:       95 0b           sbc     r25, r21
    4c52:       08 95           ret
    4c54:       c9 01           movw    r24, r18
    4c56:       84 1b           sub     r24, r20
    4c58:       95 0b           sbc     r25, r21
    4c5a:       93 95           inc     r25
    4c5c:       08 95           ret
There is not much performance improvement in this case, however at least with this
commit it functions correctly.

Note: The following commit will improve compiler output for the latter example.

* tmk_core/common: Improve code generation for TIMER_DIFF* macros

Because of integer promotion the compiler is having a hard time generating
efficient code to calculate TIMER_DIFF* macros in some situations.
In the below example, the return value is "int", and this is causing the
trouble.

Example C code:

int __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer)
{
    return TIMER_DIFF_8(current_timer, start_timer);
}

BEFORE: (with -Os)

00004c40 <test>:
    4c40:       28 2f           mov     r18, r24
    4c42:       30 e0           ldi     r19, 0x00       ; 0
    4c44:       46 2f           mov     r20, r22
    4c46:       50 e0           ldi     r21, 0x00       ; 0
    4c48:       86 17           cp      r24, r22
    4c4a:       20 f0           brcs    .+8             ; 0x4c54 <test+0x14>
    4c4c:       c9 01           movw    r24, r18
    4c4e:       84 1b           sub     r24, r20
    4c50:       95 0b           sbc     r25, r21
    4c52:       08 95           ret
    4c54:       c9 01           movw    r24, r18
    4c56:       84 1b           sub     r24, r20
    4c58:       95 0b           sbc     r25, r21
    4c5a:       93 95           inc     r25
    4c5c:       08 95           ret

AFTER: (with -Os)

00004c40 <test>:
    4c40:       86 1b           sub     r24, r22
    4c42:       90 e0           ldi     r25, 0x00       ; 0
    4c44:       08 95           ret

Note: the example is showing -Os but improvements can be seen at all optimization levels,
including -O0. We never use -O0, but I tested it to make sure that no extra code is
generated in that case.OA

* quantum/debounce: Fix custom wrapping timers in eager_pr and eager_pk debounce algorithms

Please see the below simulated sequence of events:
Column A is the 16-bit value returned by read_timer();
Column B is the value returned by custom_wrap_timer_read();
Column C is the original code: (timer_read() % MAX_DEBOUNCE)

    A,     B,     C
65530,    19,    30
65531,    20,    31
65532,    21,    32
65533,    22,    33
65534,    23,    34
65535,    24,    35
    0     25,     0
    1,    26,     1
    2,    27,     2
    3,    28,     3
    4,    29,     4
    5,    30,     5

read_timer() wraps about every 1.09 seconds, and so debouncing might
fail at these times without this commit.

* quantum/debounce/eager_pr and eager_pk: modifications for code readability according to code review.

* quantum/debounce/eager_pr and eager_pk: modifications for code readability according to code review. (2)
bitherder pushed a commit to bitherder/qmk_firmware that referenced this pull request May 15, 2020
* tmk_core/common: Fixing TIMER_DIFF macro to calculate difference correctly after the timer wraps.

Let's go through an example, using the following macro:

If the first timer read is 0xe4 and the second one is 0x32, the timer wrapped.
If the timer would have had more bits, it's new value would have been 0x132,
and the correct difference in time is 0x132 - 0xe4 = 0x4e

old code TIMER_DIFF_8(0x32, 0xe4) = 0xff - 0xe4 + 0x32 = 0x4d, which is wrong.
new code TIMER_DIFF_8(0x32, 0xe4) = 0xff + 1 - 0xe4 + 0x32 = 0x4e, which is correct.

This also gives a chance for a smart compiler to optimize the code using normal
integer overflow.

For example on AVR, the following C code:
uint8_t __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer)
{
    return TIMER_DIFF_8(current_timer, start_timer);
}
With the original code, it gets translated to the following list of instructions:
00004c6e <test>:
    4c6e:       98 2f           mov     r25, r24
    4c70:       86 1b           sub     r24, r22
    4c72:       96 17           cp      r25, r22
    4c74:       08 f4           brcc    .+2             ; 0x4c78 <test+0xa>
    4c76:       81 50           subi    r24, 0x01       ; 1
    4c78:       08 95           ret
But with this commit, it gets translated to a single instruction:
00004c40 <test>:
    4c40:       86 1b           sub     r24, r22
    4c42:       08 95           ret

This unfortunately doesn't always work so nicely, for example the following C code:
int __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer)
{
    return TIMER_DIFF_8(current_timer, start_timer);
}
(Note: return type changed to int)
With the original code it gets translated to:
00004c6e <test>:
    4c6e:       28 2f           mov     r18, r24
    4c70:       30 e0           ldi     r19, 0x00       ; 0
    4c72:       46 2f           mov     r20, r22
    4c74:       50 e0           ldi     r21, 0x00       ; 0
    4c76:       86 17           cp      r24, r22
    4c78:       20 f0           brcs    .+8             ; 0x4c82 <test+0x14>
    4c7a:       c9 01           movw    r24, r18
    4c7c:       84 1b           sub     r24, r20
    4c7e:       95 0b           sbc     r25, r21
    4c80:       08 95           ret
    4c82:       c9 01           movw    r24, r18
    4c84:       84 1b           sub     r24, r20
    4c86:       95 0b           sbc     r25, r21
    4c88:       81 50           subi    r24, 0x01       ; 1
    4c8a:       9f 4f           sbci    r25, 0xFF       ; 255
    4c8c:       08 95           ret
Wth this commit it gets translated to:
00004c40 <test>:
    4c40:       28 2f           mov     r18, r24
    4c42:       30 e0           ldi     r19, 0x00       ; 0
    4c44:       46 2f           mov     r20, r22
    4c46:       50 e0           ldi     r21, 0x00       ; 0
    4c48:       86 17           cp      r24, r22
    4c4a:       20 f0           brcs    .+8             ; 0x4c54 <test+0x14>
    4c4c:       c9 01           movw    r24, r18
    4c4e:       84 1b           sub     r24, r20
    4c50:       95 0b           sbc     r25, r21
    4c52:       08 95           ret
    4c54:       c9 01           movw    r24, r18
    4c56:       84 1b           sub     r24, r20
    4c58:       95 0b           sbc     r25, r21
    4c5a:       93 95           inc     r25
    4c5c:       08 95           ret
There is not much performance improvement in this case, however at least with this
commit it functions correctly.

Note: The following commit will improve compiler output for the latter example.

* tmk_core/common: Improve code generation for TIMER_DIFF* macros

Because of integer promotion the compiler is having a hard time generating
efficient code to calculate TIMER_DIFF* macros in some situations.
In the below example, the return value is "int", and this is causing the
trouble.

Example C code:

int __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer)
{
    return TIMER_DIFF_8(current_timer, start_timer);
}

BEFORE: (with -Os)

00004c40 <test>:
    4c40:       28 2f           mov     r18, r24
    4c42:       30 e0           ldi     r19, 0x00       ; 0
    4c44:       46 2f           mov     r20, r22
    4c46:       50 e0           ldi     r21, 0x00       ; 0
    4c48:       86 17           cp      r24, r22
    4c4a:       20 f0           brcs    .+8             ; 0x4c54 <test+0x14>
    4c4c:       c9 01           movw    r24, r18
    4c4e:       84 1b           sub     r24, r20
    4c50:       95 0b           sbc     r25, r21
    4c52:       08 95           ret
    4c54:       c9 01           movw    r24, r18
    4c56:       84 1b           sub     r24, r20
    4c58:       95 0b           sbc     r25, r21
    4c5a:       93 95           inc     r25
    4c5c:       08 95           ret

AFTER: (with -Os)

00004c40 <test>:
    4c40:       86 1b           sub     r24, r22
    4c42:       90 e0           ldi     r25, 0x00       ; 0
    4c44:       08 95           ret

Note: the example is showing -Os but improvements can be seen at all optimization levels,
including -O0. We never use -O0, but I tested it to make sure that no extra code is
generated in that case.OA

* quantum/debounce: Fix custom wrapping timers in eager_pr and eager_pk debounce algorithms

Please see the below simulated sequence of events:
Column A is the 16-bit value returned by read_timer();
Column B is the value returned by custom_wrap_timer_read();
Column C is the original code: (timer_read() % MAX_DEBOUNCE)

    A,     B,     C
65530,    19,    30
65531,    20,    31
65532,    21,    32
65533,    22,    33
65534,    23,    34
65535,    24,    35
    0     25,     0
    1,    26,     1
    2,    27,     2
    3,    28,     3
    4,    29,     4
    5,    30,     5

read_timer() wraps about every 1.09 seconds, and so debouncing might
fail at these times without this commit.

* quantum/debounce/eager_pr and eager_pk: modifications for code readability according to code review.

* quantum/debounce/eager_pr and eager_pk: modifications for code readability according to code review. (2)
drashna pushed a commit to zsa/qmk_firmware that referenced this pull request May 24, 2020
* tmk_core/common: Fixing TIMER_DIFF macro to calculate difference correctly after the timer wraps.

Let's go through an example, using the following macro:

If the first timer read is 0xe4 and the second one is 0x32, the timer wrapped.
If the timer would have had more bits, it's new value would have been 0x132,
and the correct difference in time is 0x132 - 0xe4 = 0x4e

old code TIMER_DIFF_8(0x32, 0xe4) = 0xff - 0xe4 + 0x32 = 0x4d, which is wrong.
new code TIMER_DIFF_8(0x32, 0xe4) = 0xff + 1 - 0xe4 + 0x32 = 0x4e, which is correct.

This also gives a chance for a smart compiler to optimize the code using normal
integer overflow.

For example on AVR, the following C code:
uint8_t __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer)
{
    return TIMER_DIFF_8(current_timer, start_timer);
}
With the original code, it gets translated to the following list of instructions:
00004c6e <test>:
    4c6e:       98 2f           mov     r25, r24
    4c70:       86 1b           sub     r24, r22
    4c72:       96 17           cp      r25, r22
    4c74:       08 f4           brcc    .+2             ; 0x4c78 <test+0xa>
    4c76:       81 50           subi    r24, 0x01       ; 1
    4c78:       08 95           ret
But with this commit, it gets translated to a single instruction:
00004c40 <test>:
    4c40:       86 1b           sub     r24, r22
    4c42:       08 95           ret

This unfortunately doesn't always work so nicely, for example the following C code:
int __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer)
{
    return TIMER_DIFF_8(current_timer, start_timer);
}
(Note: return type changed to int)
With the original code it gets translated to:
00004c6e <test>:
    4c6e:       28 2f           mov     r18, r24
    4c70:       30 e0           ldi     r19, 0x00       ; 0
    4c72:       46 2f           mov     r20, r22
    4c74:       50 e0           ldi     r21, 0x00       ; 0
    4c76:       86 17           cp      r24, r22
    4c78:       20 f0           brcs    .+8             ; 0x4c82 <test+0x14>
    4c7a:       c9 01           movw    r24, r18
    4c7c:       84 1b           sub     r24, r20
    4c7e:       95 0b           sbc     r25, r21
    4c80:       08 95           ret
    4c82:       c9 01           movw    r24, r18
    4c84:       84 1b           sub     r24, r20
    4c86:       95 0b           sbc     r25, r21
    4c88:       81 50           subi    r24, 0x01       ; 1
    4c8a:       9f 4f           sbci    r25, 0xFF       ; 255
    4c8c:       08 95           ret
Wth this commit it gets translated to:
00004c40 <test>:
    4c40:       28 2f           mov     r18, r24
    4c42:       30 e0           ldi     r19, 0x00       ; 0
    4c44:       46 2f           mov     r20, r22
    4c46:       50 e0           ldi     r21, 0x00       ; 0
    4c48:       86 17           cp      r24, r22
    4c4a:       20 f0           brcs    .+8             ; 0x4c54 <test+0x14>
    4c4c:       c9 01           movw    r24, r18
    4c4e:       84 1b           sub     r24, r20
    4c50:       95 0b           sbc     r25, r21
    4c52:       08 95           ret
    4c54:       c9 01           movw    r24, r18
    4c56:       84 1b           sub     r24, r20
    4c58:       95 0b           sbc     r25, r21
    4c5a:       93 95           inc     r25
    4c5c:       08 95           ret
There is not much performance improvement in this case, however at least with this
commit it functions correctly.

Note: The following commit will improve compiler output for the latter example.

* tmk_core/common: Improve code generation for TIMER_DIFF* macros

Because of integer promotion the compiler is having a hard time generating
efficient code to calculate TIMER_DIFF* macros in some situations.
In the below example, the return value is "int", and this is causing the
trouble.

Example C code:

int __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer)
{
    return TIMER_DIFF_8(current_timer, start_timer);
}

BEFORE: (with -Os)

00004c40 <test>:
    4c40:       28 2f           mov     r18, r24
    4c42:       30 e0           ldi     r19, 0x00       ; 0
    4c44:       46 2f           mov     r20, r22
    4c46:       50 e0           ldi     r21, 0x00       ; 0
    4c48:       86 17           cp      r24, r22
    4c4a:       20 f0           brcs    .+8             ; 0x4c54 <test+0x14>
    4c4c:       c9 01           movw    r24, r18
    4c4e:       84 1b           sub     r24, r20
    4c50:       95 0b           sbc     r25, r21
    4c52:       08 95           ret
    4c54:       c9 01           movw    r24, r18
    4c56:       84 1b           sub     r24, r20
    4c58:       95 0b           sbc     r25, r21
    4c5a:       93 95           inc     r25
    4c5c:       08 95           ret

AFTER: (with -Os)

00004c40 <test>:
    4c40:       86 1b           sub     r24, r22
    4c42:       90 e0           ldi     r25, 0x00       ; 0
    4c44:       08 95           ret

Note: the example is showing -Os but improvements can be seen at all optimization levels,
including -O0. We never use -O0, but I tested it to make sure that no extra code is
generated in that case.OA

* quantum/debounce: Fix custom wrapping timers in eager_pr and eager_pk debounce algorithms

Please see the below simulated sequence of events:
Column A is the 16-bit value returned by read_timer();
Column B is the value returned by custom_wrap_timer_read();
Column C is the original code: (timer_read() % MAX_DEBOUNCE)

    A,     B,     C
65530,    19,    30
65531,    20,    31
65532,    21,    32
65533,    22,    33
65534,    23,    34
65535,    24,    35
    0     25,     0
    1,    26,     1
    2,    27,     2
    3,    28,     3
    4,    29,     4
    5,    30,     5

read_timer() wraps about every 1.09 seconds, and so debouncing might
fail at these times without this commit.

* quantum/debounce/eager_pr and eager_pk: modifications for code readability according to code review.

* quantum/debounce/eager_pr and eager_pk: modifications for code readability according to code review. (2)
sowbug pushed a commit to sowbug/qmk_firmware that referenced this pull request May 24, 2020
* tmk_core/common: Fixing TIMER_DIFF macro to calculate difference correctly after the timer wraps.

Let's go through an example, using the following macro:

If the first timer read is 0xe4 and the second one is 0x32, the timer wrapped.
If the timer would have had more bits, it's new value would have been 0x132,
and the correct difference in time is 0x132 - 0xe4 = 0x4e

old code TIMER_DIFF_8(0x32, 0xe4) = 0xff - 0xe4 + 0x32 = 0x4d, which is wrong.
new code TIMER_DIFF_8(0x32, 0xe4) = 0xff + 1 - 0xe4 + 0x32 = 0x4e, which is correct.

This also gives a chance for a smart compiler to optimize the code using normal
integer overflow.

For example on AVR, the following C code:
uint8_t __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer)
{
    return TIMER_DIFF_8(current_timer, start_timer);
}
With the original code, it gets translated to the following list of instructions:
00004c6e <test>:
    4c6e:       98 2f           mov     r25, r24
    4c70:       86 1b           sub     r24, r22
    4c72:       96 17           cp      r25, r22
    4c74:       08 f4           brcc    .+2             ; 0x4c78 <test+0xa>
    4c76:       81 50           subi    r24, 0x01       ; 1
    4c78:       08 95           ret
But with this commit, it gets translated to a single instruction:
00004c40 <test>:
    4c40:       86 1b           sub     r24, r22
    4c42:       08 95           ret

This unfortunately doesn't always work so nicely, for example the following C code:
int __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer)
{
    return TIMER_DIFF_8(current_timer, start_timer);
}
(Note: return type changed to int)
With the original code it gets translated to:
00004c6e <test>:
    4c6e:       28 2f           mov     r18, r24
    4c70:       30 e0           ldi     r19, 0x00       ; 0
    4c72:       46 2f           mov     r20, r22
    4c74:       50 e0           ldi     r21, 0x00       ; 0
    4c76:       86 17           cp      r24, r22
    4c78:       20 f0           brcs    .+8             ; 0x4c82 <test+0x14>
    4c7a:       c9 01           movw    r24, r18
    4c7c:       84 1b           sub     r24, r20
    4c7e:       95 0b           sbc     r25, r21
    4c80:       08 95           ret
    4c82:       c9 01           movw    r24, r18
    4c84:       84 1b           sub     r24, r20
    4c86:       95 0b           sbc     r25, r21
    4c88:       81 50           subi    r24, 0x01       ; 1
    4c8a:       9f 4f           sbci    r25, 0xFF       ; 255
    4c8c:       08 95           ret
Wth this commit it gets translated to:
00004c40 <test>:
    4c40:       28 2f           mov     r18, r24
    4c42:       30 e0           ldi     r19, 0x00       ; 0
    4c44:       46 2f           mov     r20, r22
    4c46:       50 e0           ldi     r21, 0x00       ; 0
    4c48:       86 17           cp      r24, r22
    4c4a:       20 f0           brcs    .+8             ; 0x4c54 <test+0x14>
    4c4c:       c9 01           movw    r24, r18
    4c4e:       84 1b           sub     r24, r20
    4c50:       95 0b           sbc     r25, r21
    4c52:       08 95           ret
    4c54:       c9 01           movw    r24, r18
    4c56:       84 1b           sub     r24, r20
    4c58:       95 0b           sbc     r25, r21
    4c5a:       93 95           inc     r25
    4c5c:       08 95           ret
There is not much performance improvement in this case, however at least with this
commit it functions correctly.

Note: The following commit will improve compiler output for the latter example.

* tmk_core/common: Improve code generation for TIMER_DIFF* macros

Because of integer promotion the compiler is having a hard time generating
efficient code to calculate TIMER_DIFF* macros in some situations.
In the below example, the return value is "int", and this is causing the
trouble.

Example C code:

int __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer)
{
    return TIMER_DIFF_8(current_timer, start_timer);
}

BEFORE: (with -Os)

00004c40 <test>:
    4c40:       28 2f           mov     r18, r24
    4c42:       30 e0           ldi     r19, 0x00       ; 0
    4c44:       46 2f           mov     r20, r22
    4c46:       50 e0           ldi     r21, 0x00       ; 0
    4c48:       86 17           cp      r24, r22
    4c4a:       20 f0           brcs    .+8             ; 0x4c54 <test+0x14>
    4c4c:       c9 01           movw    r24, r18
    4c4e:       84 1b           sub     r24, r20
    4c50:       95 0b           sbc     r25, r21
    4c52:       08 95           ret
    4c54:       c9 01           movw    r24, r18
    4c56:       84 1b           sub     r24, r20
    4c58:       95 0b           sbc     r25, r21
    4c5a:       93 95           inc     r25
    4c5c:       08 95           ret

AFTER: (with -Os)

00004c40 <test>:
    4c40:       86 1b           sub     r24, r22
    4c42:       90 e0           ldi     r25, 0x00       ; 0
    4c44:       08 95           ret

Note: the example is showing -Os but improvements can be seen at all optimization levels,
including -O0. We never use -O0, but I tested it to make sure that no extra code is
generated in that case.OA

* quantum/debounce: Fix custom wrapping timers in eager_pr and eager_pk debounce algorithms

Please see the below simulated sequence of events:
Column A is the 16-bit value returned by read_timer();
Column B is the value returned by custom_wrap_timer_read();
Column C is the original code: (timer_read() % MAX_DEBOUNCE)

    A,     B,     C
65530,    19,    30
65531,    20,    31
65532,    21,    32
65533,    22,    33
65534,    23,    34
65535,    24,    35
    0     25,     0
    1,    26,     1
    2,    27,     2
    3,    28,     3
    4,    29,     4
    5,    30,     5

read_timer() wraps about every 1.09 seconds, and so debouncing might
fail at these times without this commit.

* quantum/debounce/eager_pr and eager_pk: modifications for code readability according to code review.

* quantum/debounce/eager_pr and eager_pk: modifications for code readability according to code review. (2)
fdidron pushed a commit to zsa/qmk_firmware that referenced this pull request Jun 12, 2020
* tmk_core/common: Fixing TIMER_DIFF macro to calculate difference correctly after the timer wraps.

Let's go through an example, using the following macro:

If the first timer read is 0xe4 and the second one is 0x32, the timer wrapped.
If the timer would have had more bits, it's new value would have been 0x132,
and the correct difference in time is 0x132 - 0xe4 = 0x4e

old code TIMER_DIFF_8(0x32, 0xe4) = 0xff - 0xe4 + 0x32 = 0x4d, which is wrong.
new code TIMER_DIFF_8(0x32, 0xe4) = 0xff + 1 - 0xe4 + 0x32 = 0x4e, which is correct.

This also gives a chance for a smart compiler to optimize the code using normal
integer overflow.

For example on AVR, the following C code:
uint8_t __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer)
{
    return TIMER_DIFF_8(current_timer, start_timer);
}
With the original code, it gets translated to the following list of instructions:
00004c6e <test>:
    4c6e:       98 2f           mov     r25, r24
    4c70:       86 1b           sub     r24, r22
    4c72:       96 17           cp      r25, r22
    4c74:       08 f4           brcc    .+2             ; 0x4c78 <test+0xa>
    4c76:       81 50           subi    r24, 0x01       ; 1
    4c78:       08 95           ret
But with this commit, it gets translated to a single instruction:
00004c40 <test>:
    4c40:       86 1b           sub     r24, r22
    4c42:       08 95           ret

This unfortunately doesn't always work so nicely, for example the following C code:
int __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer)
{
    return TIMER_DIFF_8(current_timer, start_timer);
}
(Note: return type changed to int)
With the original code it gets translated to:
00004c6e <test>:
    4c6e:       28 2f           mov     r18, r24
    4c70:       30 e0           ldi     r19, 0x00       ; 0
    4c72:       46 2f           mov     r20, r22
    4c74:       50 e0           ldi     r21, 0x00       ; 0
    4c76:       86 17           cp      r24, r22
    4c78:       20 f0           brcs    .+8             ; 0x4c82 <test+0x14>
    4c7a:       c9 01           movw    r24, r18
    4c7c:       84 1b           sub     r24, r20
    4c7e:       95 0b           sbc     r25, r21
    4c80:       08 95           ret
    4c82:       c9 01           movw    r24, r18
    4c84:       84 1b           sub     r24, r20
    4c86:       95 0b           sbc     r25, r21
    4c88:       81 50           subi    r24, 0x01       ; 1
    4c8a:       9f 4f           sbci    r25, 0xFF       ; 255
    4c8c:       08 95           ret
Wth this commit it gets translated to:
00004c40 <test>:
    4c40:       28 2f           mov     r18, r24
    4c42:       30 e0           ldi     r19, 0x00       ; 0
    4c44:       46 2f           mov     r20, r22
    4c46:       50 e0           ldi     r21, 0x00       ; 0
    4c48:       86 17           cp      r24, r22
    4c4a:       20 f0           brcs    .+8             ; 0x4c54 <test+0x14>
    4c4c:       c9 01           movw    r24, r18
    4c4e:       84 1b           sub     r24, r20
    4c50:       95 0b           sbc     r25, r21
    4c52:       08 95           ret
    4c54:       c9 01           movw    r24, r18
    4c56:       84 1b           sub     r24, r20
    4c58:       95 0b           sbc     r25, r21
    4c5a:       93 95           inc     r25
    4c5c:       08 95           ret
There is not much performance improvement in this case, however at least with this
commit it functions correctly.

Note: The following commit will improve compiler output for the latter example.

* tmk_core/common: Improve code generation for TIMER_DIFF* macros

Because of integer promotion the compiler is having a hard time generating
efficient code to calculate TIMER_DIFF* macros in some situations.
In the below example, the return value is "int", and this is causing the
trouble.

Example C code:

int __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer)
{
    return TIMER_DIFF_8(current_timer, start_timer);
}

BEFORE: (with -Os)

00004c40 <test>:
    4c40:       28 2f           mov     r18, r24
    4c42:       30 e0           ldi     r19, 0x00       ; 0
    4c44:       46 2f           mov     r20, r22
    4c46:       50 e0           ldi     r21, 0x00       ; 0
    4c48:       86 17           cp      r24, r22
    4c4a:       20 f0           brcs    .+8             ; 0x4c54 <test+0x14>
    4c4c:       c9 01           movw    r24, r18
    4c4e:       84 1b           sub     r24, r20
    4c50:       95 0b           sbc     r25, r21
    4c52:       08 95           ret
    4c54:       c9 01           movw    r24, r18
    4c56:       84 1b           sub     r24, r20
    4c58:       95 0b           sbc     r25, r21
    4c5a:       93 95           inc     r25
    4c5c:       08 95           ret

AFTER: (with -Os)

00004c40 <test>:
    4c40:       86 1b           sub     r24, r22
    4c42:       90 e0           ldi     r25, 0x00       ; 0
    4c44:       08 95           ret

Note: the example is showing -Os but improvements can be seen at all optimization levels,
including -O0. We never use -O0, but I tested it to make sure that no extra code is
generated in that case.OA

* quantum/debounce: Fix custom wrapping timers in eager_pr and eager_pk debounce algorithms

Please see the below simulated sequence of events:
Column A is the 16-bit value returned by read_timer();
Column B is the value returned by custom_wrap_timer_read();
Column C is the original code: (timer_read() % MAX_DEBOUNCE)

    A,     B,     C
65530,    19,    30
65531,    20,    31
65532,    21,    32
65533,    22,    33
65534,    23,    34
65535,    24,    35
    0     25,     0
    1,    26,     1
    2,    27,     2
    3,    28,     3
    4,    29,     4
    5,    30,     5

read_timer() wraps about every 1.09 seconds, and so debouncing might
fail at these times without this commit.

* quantum/debounce/eager_pr and eager_pk: modifications for code readability according to code review.

* quantum/debounce/eager_pr and eager_pk: modifications for code readability according to code review. (2)
turky pushed a commit to turky/qmk_firmware that referenced this pull request Jun 13, 2020
* tmk_core/common: Fixing TIMER_DIFF macro to calculate difference correctly after the timer wraps.

Let's go through an example, using the following macro:

If the first timer read is 0xe4 and the second one is 0x32, the timer wrapped.
If the timer would have had more bits, it's new value would have been 0x132,
and the correct difference in time is 0x132 - 0xe4 = 0x4e

old code TIMER_DIFF_8(0x32, 0xe4) = 0xff - 0xe4 + 0x32 = 0x4d, which is wrong.
new code TIMER_DIFF_8(0x32, 0xe4) = 0xff + 1 - 0xe4 + 0x32 = 0x4e, which is correct.

This also gives a chance for a smart compiler to optimize the code using normal
integer overflow.

For example on AVR, the following C code:
uint8_t __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer)
{
    return TIMER_DIFF_8(current_timer, start_timer);
}
With the original code, it gets translated to the following list of instructions:
00004c6e <test>:
    4c6e:       98 2f           mov     r25, r24
    4c70:       86 1b           sub     r24, r22
    4c72:       96 17           cp      r25, r22
    4c74:       08 f4           brcc    .+2             ; 0x4c78 <test+0xa>
    4c76:       81 50           subi    r24, 0x01       ; 1
    4c78:       08 95           ret
But with this commit, it gets translated to a single instruction:
00004c40 <test>:
    4c40:       86 1b           sub     r24, r22
    4c42:       08 95           ret

This unfortunately doesn't always work so nicely, for example the following C code:
int __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer)
{
    return TIMER_DIFF_8(current_timer, start_timer);
}
(Note: return type changed to int)
With the original code it gets translated to:
00004c6e <test>:
    4c6e:       28 2f           mov     r18, r24
    4c70:       30 e0           ldi     r19, 0x00       ; 0
    4c72:       46 2f           mov     r20, r22
    4c74:       50 e0           ldi     r21, 0x00       ; 0
    4c76:       86 17           cp      r24, r22
    4c78:       20 f0           brcs    .+8             ; 0x4c82 <test+0x14>
    4c7a:       c9 01           movw    r24, r18
    4c7c:       84 1b           sub     r24, r20
    4c7e:       95 0b           sbc     r25, r21
    4c80:       08 95           ret
    4c82:       c9 01           movw    r24, r18
    4c84:       84 1b           sub     r24, r20
    4c86:       95 0b           sbc     r25, r21
    4c88:       81 50           subi    r24, 0x01       ; 1
    4c8a:       9f 4f           sbci    r25, 0xFF       ; 255
    4c8c:       08 95           ret
Wth this commit it gets translated to:
00004c40 <test>:
    4c40:       28 2f           mov     r18, r24
    4c42:       30 e0           ldi     r19, 0x00       ; 0
    4c44:       46 2f           mov     r20, r22
    4c46:       50 e0           ldi     r21, 0x00       ; 0
    4c48:       86 17           cp      r24, r22
    4c4a:       20 f0           brcs    .+8             ; 0x4c54 <test+0x14>
    4c4c:       c9 01           movw    r24, r18
    4c4e:       84 1b           sub     r24, r20
    4c50:       95 0b           sbc     r25, r21
    4c52:       08 95           ret
    4c54:       c9 01           movw    r24, r18
    4c56:       84 1b           sub     r24, r20
    4c58:       95 0b           sbc     r25, r21
    4c5a:       93 95           inc     r25
    4c5c:       08 95           ret
There is not much performance improvement in this case, however at least with this
commit it functions correctly.

Note: The following commit will improve compiler output for the latter example.

* tmk_core/common: Improve code generation for TIMER_DIFF* macros

Because of integer promotion the compiler is having a hard time generating
efficient code to calculate TIMER_DIFF* macros in some situations.
In the below example, the return value is "int", and this is causing the
trouble.

Example C code:

int __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer)
{
    return TIMER_DIFF_8(current_timer, start_timer);
}

BEFORE: (with -Os)

00004c40 <test>:
    4c40:       28 2f           mov     r18, r24
    4c42:       30 e0           ldi     r19, 0x00       ; 0
    4c44:       46 2f           mov     r20, r22
    4c46:       50 e0           ldi     r21, 0x00       ; 0
    4c48:       86 17           cp      r24, r22
    4c4a:       20 f0           brcs    .+8             ; 0x4c54 <test+0x14>
    4c4c:       c9 01           movw    r24, r18
    4c4e:       84 1b           sub     r24, r20
    4c50:       95 0b           sbc     r25, r21
    4c52:       08 95           ret
    4c54:       c9 01           movw    r24, r18
    4c56:       84 1b           sub     r24, r20
    4c58:       95 0b           sbc     r25, r21
    4c5a:       93 95           inc     r25
    4c5c:       08 95           ret

AFTER: (with -Os)

00004c40 <test>:
    4c40:       86 1b           sub     r24, r22
    4c42:       90 e0           ldi     r25, 0x00       ; 0
    4c44:       08 95           ret

Note: the example is showing -Os but improvements can be seen at all optimization levels,
including -O0. We never use -O0, but I tested it to make sure that no extra code is
generated in that case.OA

* quantum/debounce: Fix custom wrapping timers in eager_pr and eager_pk debounce algorithms

Please see the below simulated sequence of events:
Column A is the 16-bit value returned by read_timer();
Column B is the value returned by custom_wrap_timer_read();
Column C is the original code: (timer_read() % MAX_DEBOUNCE)

    A,     B,     C
65530,    19,    30
65531,    20,    31
65532,    21,    32
65533,    22,    33
65534,    23,    34
65535,    24,    35
    0     25,     0
    1,    26,     1
    2,    27,     2
    3,    28,     3
    4,    29,     4
    5,    30,     5

read_timer() wraps about every 1.09 seconds, and so debouncing might
fail at these times without this commit.

* quantum/debounce/eager_pr and eager_pk: modifications for code readability according to code review.

* quantum/debounce/eager_pr and eager_pk: modifications for code readability according to code review. (2)
jakobaa pushed a commit to jakobaa/qmk_firmware that referenced this pull request Jul 7, 2020
* tmk_core/common: Fixing TIMER_DIFF macro to calculate difference correctly after the timer wraps.

Let's go through an example, using the following macro:

If the first timer read is 0xe4 and the second one is 0x32, the timer wrapped.
If the timer would have had more bits, it's new value would have been 0x132,
and the correct difference in time is 0x132 - 0xe4 = 0x4e

old code TIMER_DIFF_8(0x32, 0xe4) = 0xff - 0xe4 + 0x32 = 0x4d, which is wrong.
new code TIMER_DIFF_8(0x32, 0xe4) = 0xff + 1 - 0xe4 + 0x32 = 0x4e, which is correct.

This also gives a chance for a smart compiler to optimize the code using normal
integer overflow.

For example on AVR, the following C code:
uint8_t __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer)
{
    return TIMER_DIFF_8(current_timer, start_timer);
}
With the original code, it gets translated to the following list of instructions:
00004c6e <test>:
    4c6e:       98 2f           mov     r25, r24
    4c70:       86 1b           sub     r24, r22
    4c72:       96 17           cp      r25, r22
    4c74:       08 f4           brcc    .+2             ; 0x4c78 <test+0xa>
    4c76:       81 50           subi    r24, 0x01       ; 1
    4c78:       08 95           ret
But with this commit, it gets translated to a single instruction:
00004c40 <test>:
    4c40:       86 1b           sub     r24, r22
    4c42:       08 95           ret

This unfortunately doesn't always work so nicely, for example the following C code:
int __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer)
{
    return TIMER_DIFF_8(current_timer, start_timer);
}
(Note: return type changed to int)
With the original code it gets translated to:
00004c6e <test>:
    4c6e:       28 2f           mov     r18, r24
    4c70:       30 e0           ldi     r19, 0x00       ; 0
    4c72:       46 2f           mov     r20, r22
    4c74:       50 e0           ldi     r21, 0x00       ; 0
    4c76:       86 17           cp      r24, r22
    4c78:       20 f0           brcs    .+8             ; 0x4c82 <test+0x14>
    4c7a:       c9 01           movw    r24, r18
    4c7c:       84 1b           sub     r24, r20
    4c7e:       95 0b           sbc     r25, r21
    4c80:       08 95           ret
    4c82:       c9 01           movw    r24, r18
    4c84:       84 1b           sub     r24, r20
    4c86:       95 0b           sbc     r25, r21
    4c88:       81 50           subi    r24, 0x01       ; 1
    4c8a:       9f 4f           sbci    r25, 0xFF       ; 255
    4c8c:       08 95           ret
Wth this commit it gets translated to:
00004c40 <test>:
    4c40:       28 2f           mov     r18, r24
    4c42:       30 e0           ldi     r19, 0x00       ; 0
    4c44:       46 2f           mov     r20, r22
    4c46:       50 e0           ldi     r21, 0x00       ; 0
    4c48:       86 17           cp      r24, r22
    4c4a:       20 f0           brcs    .+8             ; 0x4c54 <test+0x14>
    4c4c:       c9 01           movw    r24, r18
    4c4e:       84 1b           sub     r24, r20
    4c50:       95 0b           sbc     r25, r21
    4c52:       08 95           ret
    4c54:       c9 01           movw    r24, r18
    4c56:       84 1b           sub     r24, r20
    4c58:       95 0b           sbc     r25, r21
    4c5a:       93 95           inc     r25
    4c5c:       08 95           ret
There is not much performance improvement in this case, however at least with this
commit it functions correctly.

Note: The following commit will improve compiler output for the latter example.

* tmk_core/common: Improve code generation for TIMER_DIFF* macros

Because of integer promotion the compiler is having a hard time generating
efficient code to calculate TIMER_DIFF* macros in some situations.
In the below example, the return value is "int", and this is causing the
trouble.

Example C code:

int __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer)
{
    return TIMER_DIFF_8(current_timer, start_timer);
}

BEFORE: (with -Os)

00004c40 <test>:
    4c40:       28 2f           mov     r18, r24
    4c42:       30 e0           ldi     r19, 0x00       ; 0
    4c44:       46 2f           mov     r20, r22
    4c46:       50 e0           ldi     r21, 0x00       ; 0
    4c48:       86 17           cp      r24, r22
    4c4a:       20 f0           brcs    .+8             ; 0x4c54 <test+0x14>
    4c4c:       c9 01           movw    r24, r18
    4c4e:       84 1b           sub     r24, r20
    4c50:       95 0b           sbc     r25, r21
    4c52:       08 95           ret
    4c54:       c9 01           movw    r24, r18
    4c56:       84 1b           sub     r24, r20
    4c58:       95 0b           sbc     r25, r21
    4c5a:       93 95           inc     r25
    4c5c:       08 95           ret

AFTER: (with -Os)

00004c40 <test>:
    4c40:       86 1b           sub     r24, r22
    4c42:       90 e0           ldi     r25, 0x00       ; 0
    4c44:       08 95           ret

Note: the example is showing -Os but improvements can be seen at all optimization levels,
including -O0. We never use -O0, but I tested it to make sure that no extra code is
generated in that case.OA

* quantum/debounce: Fix custom wrapping timers in eager_pr and eager_pk debounce algorithms

Please see the below simulated sequence of events:
Column A is the 16-bit value returned by read_timer();
Column B is the value returned by custom_wrap_timer_read();
Column C is the original code: (timer_read() % MAX_DEBOUNCE)

    A,     B,     C
65530,    19,    30
65531,    20,    31
65532,    21,    32
65533,    22,    33
65534,    23,    34
65535,    24,    35
    0     25,     0
    1,    26,     1
    2,    27,     2
    3,    28,     3
    4,    29,     4
    5,    30,     5

read_timer() wraps about every 1.09 seconds, and so debouncing might
fail at these times without this commit.

* quantum/debounce/eager_pr and eager_pk: modifications for code readability according to code review.

* quantum/debounce/eager_pr and eager_pk: modifications for code readability according to code review. (2)
thorstenweber83 pushed a commit to thorstenweber83/qmk_firmware that referenced this pull request Sep 2, 2020
* tmk_core/common: Fixing TIMER_DIFF macro to calculate difference correctly after the timer wraps.

Let's go through an example, using the following macro:

If the first timer read is 0xe4 and the second one is 0x32, the timer wrapped.
If the timer would have had more bits, it's new value would have been 0x132,
and the correct difference in time is 0x132 - 0xe4 = 0x4e

old code TIMER_DIFF_8(0x32, 0xe4) = 0xff - 0xe4 + 0x32 = 0x4d, which is wrong.
new code TIMER_DIFF_8(0x32, 0xe4) = 0xff + 1 - 0xe4 + 0x32 = 0x4e, which is correct.

This also gives a chance for a smart compiler to optimize the code using normal
integer overflow.

For example on AVR, the following C code:
uint8_t __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer)
{
    return TIMER_DIFF_8(current_timer, start_timer);
}
With the original code, it gets translated to the following list of instructions:
00004c6e <test>:
    4c6e:       98 2f           mov     r25, r24
    4c70:       86 1b           sub     r24, r22
    4c72:       96 17           cp      r25, r22
    4c74:       08 f4           brcc    .+2             ; 0x4c78 <test+0xa>
    4c76:       81 50           subi    r24, 0x01       ; 1
    4c78:       08 95           ret
But with this commit, it gets translated to a single instruction:
00004c40 <test>:
    4c40:       86 1b           sub     r24, r22
    4c42:       08 95           ret

This unfortunately doesn't always work so nicely, for example the following C code:
int __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer)
{
    return TIMER_DIFF_8(current_timer, start_timer);
}
(Note: return type changed to int)
With the original code it gets translated to:
00004c6e <test>:
    4c6e:       28 2f           mov     r18, r24
    4c70:       30 e0           ldi     r19, 0x00       ; 0
    4c72:       46 2f           mov     r20, r22
    4c74:       50 e0           ldi     r21, 0x00       ; 0
    4c76:       86 17           cp      r24, r22
    4c78:       20 f0           brcs    .+8             ; 0x4c82 <test+0x14>
    4c7a:       c9 01           movw    r24, r18
    4c7c:       84 1b           sub     r24, r20
    4c7e:       95 0b           sbc     r25, r21
    4c80:       08 95           ret
    4c82:       c9 01           movw    r24, r18
    4c84:       84 1b           sub     r24, r20
    4c86:       95 0b           sbc     r25, r21
    4c88:       81 50           subi    r24, 0x01       ; 1
    4c8a:       9f 4f           sbci    r25, 0xFF       ; 255
    4c8c:       08 95           ret
Wth this commit it gets translated to:
00004c40 <test>:
    4c40:       28 2f           mov     r18, r24
    4c42:       30 e0           ldi     r19, 0x00       ; 0
    4c44:       46 2f           mov     r20, r22
    4c46:       50 e0           ldi     r21, 0x00       ; 0
    4c48:       86 17           cp      r24, r22
    4c4a:       20 f0           brcs    .+8             ; 0x4c54 <test+0x14>
    4c4c:       c9 01           movw    r24, r18
    4c4e:       84 1b           sub     r24, r20
    4c50:       95 0b           sbc     r25, r21
    4c52:       08 95           ret
    4c54:       c9 01           movw    r24, r18
    4c56:       84 1b           sub     r24, r20
    4c58:       95 0b           sbc     r25, r21
    4c5a:       93 95           inc     r25
    4c5c:       08 95           ret
There is not much performance improvement in this case, however at least with this
commit it functions correctly.

Note: The following commit will improve compiler output for the latter example.

* tmk_core/common: Improve code generation for TIMER_DIFF* macros

Because of integer promotion the compiler is having a hard time generating
efficient code to calculate TIMER_DIFF* macros in some situations.
In the below example, the return value is "int", and this is causing the
trouble.

Example C code:

int __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer)
{
    return TIMER_DIFF_8(current_timer, start_timer);
}

BEFORE: (with -Os)

00004c40 <test>:
    4c40:       28 2f           mov     r18, r24
    4c42:       30 e0           ldi     r19, 0x00       ; 0
    4c44:       46 2f           mov     r20, r22
    4c46:       50 e0           ldi     r21, 0x00       ; 0
    4c48:       86 17           cp      r24, r22
    4c4a:       20 f0           brcs    .+8             ; 0x4c54 <test+0x14>
    4c4c:       c9 01           movw    r24, r18
    4c4e:       84 1b           sub     r24, r20
    4c50:       95 0b           sbc     r25, r21
    4c52:       08 95           ret
    4c54:       c9 01           movw    r24, r18
    4c56:       84 1b           sub     r24, r20
    4c58:       95 0b           sbc     r25, r21
    4c5a:       93 95           inc     r25
    4c5c:       08 95           ret

AFTER: (with -Os)

00004c40 <test>:
    4c40:       86 1b           sub     r24, r22
    4c42:       90 e0           ldi     r25, 0x00       ; 0
    4c44:       08 95           ret

Note: the example is showing -Os but improvements can be seen at all optimization levels,
including -O0. We never use -O0, but I tested it to make sure that no extra code is
generated in that case.OA

* quantum/debounce: Fix custom wrapping timers in eager_pr and eager_pk debounce algorithms

Please see the below simulated sequence of events:
Column A is the 16-bit value returned by read_timer();
Column B is the value returned by custom_wrap_timer_read();
Column C is the original code: (timer_read() % MAX_DEBOUNCE)

    A,     B,     C
65530,    19,    30
65531,    20,    31
65532,    21,    32
65533,    22,    33
65534,    23,    34
65535,    24,    35
    0     25,     0
    1,    26,     1
    2,    27,     2
    3,    28,     3
    4,    29,     4
    5,    30,     5

read_timer() wraps about every 1.09 seconds, and so debouncing might
fail at these times without this commit.

* quantum/debounce/eager_pr and eager_pk: modifications for code readability according to code review.

* quantum/debounce/eager_pr and eager_pk: modifications for code readability according to code review. (2)
BorisTestov pushed a commit to BorisTestov/qmk_firmware that referenced this pull request May 23, 2024
* tmk_core/common: Fixing TIMER_DIFF macro to calculate difference correctly after the timer wraps.

Let's go through an example, using the following macro:

If the first timer read is 0xe4 and the second one is 0x32, the timer wrapped.
If the timer would have had more bits, it's new value would have been 0x132,
and the correct difference in time is 0x132 - 0xe4 = 0x4e

old code TIMER_DIFF_8(0x32, 0xe4) = 0xff - 0xe4 + 0x32 = 0x4d, which is wrong.
new code TIMER_DIFF_8(0x32, 0xe4) = 0xff + 1 - 0xe4 + 0x32 = 0x4e, which is correct.

This also gives a chance for a smart compiler to optimize the code using normal
integer overflow.

For example on AVR, the following C code:
uint8_t __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer)
{
    return TIMER_DIFF_8(current_timer, start_timer);
}
With the original code, it gets translated to the following list of instructions:
00004c6e <test>:
    4c6e:       98 2f           mov     r25, r24
    4c70:       86 1b           sub     r24, r22
    4c72:       96 17           cp      r25, r22
    4c74:       08 f4           brcc    .+2             ; 0x4c78 <test+0xa>
    4c76:       81 50           subi    r24, 0x01       ; 1
    4c78:       08 95           ret
But with this commit, it gets translated to a single instruction:
00004c40 <test>:
    4c40:       86 1b           sub     r24, r22
    4c42:       08 95           ret

This unfortunately doesn't always work so nicely, for example the following C code:
int __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer)
{
    return TIMER_DIFF_8(current_timer, start_timer);
}
(Note: return type changed to int)
With the original code it gets translated to:
00004c6e <test>:
    4c6e:       28 2f           mov     r18, r24
    4c70:       30 e0           ldi     r19, 0x00       ; 0
    4c72:       46 2f           mov     r20, r22
    4c74:       50 e0           ldi     r21, 0x00       ; 0
    4c76:       86 17           cp      r24, r22
    4c78:       20 f0           brcs    .+8             ; 0x4c82 <test+0x14>
    4c7a:       c9 01           movw    r24, r18
    4c7c:       84 1b           sub     r24, r20
    4c7e:       95 0b           sbc     r25, r21
    4c80:       08 95           ret
    4c82:       c9 01           movw    r24, r18
    4c84:       84 1b           sub     r24, r20
    4c86:       95 0b           sbc     r25, r21
    4c88:       81 50           subi    r24, 0x01       ; 1
    4c8a:       9f 4f           sbci    r25, 0xFF       ; 255
    4c8c:       08 95           ret
Wth this commit it gets translated to:
00004c40 <test>:
    4c40:       28 2f           mov     r18, r24
    4c42:       30 e0           ldi     r19, 0x00       ; 0
    4c44:       46 2f           mov     r20, r22
    4c46:       50 e0           ldi     r21, 0x00       ; 0
    4c48:       86 17           cp      r24, r22
    4c4a:       20 f0           brcs    .+8             ; 0x4c54 <test+0x14>
    4c4c:       c9 01           movw    r24, r18
    4c4e:       84 1b           sub     r24, r20
    4c50:       95 0b           sbc     r25, r21
    4c52:       08 95           ret
    4c54:       c9 01           movw    r24, r18
    4c56:       84 1b           sub     r24, r20
    4c58:       95 0b           sbc     r25, r21
    4c5a:       93 95           inc     r25
    4c5c:       08 95           ret
There is not much performance improvement in this case, however at least with this
commit it functions correctly.

Note: The following commit will improve compiler output for the latter example.

* tmk_core/common: Improve code generation for TIMER_DIFF* macros

Because of integer promotion the compiler is having a hard time generating
efficient code to calculate TIMER_DIFF* macros in some situations.
In the below example, the return value is "int", and this is causing the
trouble.

Example C code:

int __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer)
{
    return TIMER_DIFF_8(current_timer, start_timer);
}

BEFORE: (with -Os)

00004c40 <test>:
    4c40:       28 2f           mov     r18, r24
    4c42:       30 e0           ldi     r19, 0x00       ; 0
    4c44:       46 2f           mov     r20, r22
    4c46:       50 e0           ldi     r21, 0x00       ; 0
    4c48:       86 17           cp      r24, r22
    4c4a:       20 f0           brcs    .+8             ; 0x4c54 <test+0x14>
    4c4c:       c9 01           movw    r24, r18
    4c4e:       84 1b           sub     r24, r20
    4c50:       95 0b           sbc     r25, r21
    4c52:       08 95           ret
    4c54:       c9 01           movw    r24, r18
    4c56:       84 1b           sub     r24, r20
    4c58:       95 0b           sbc     r25, r21
    4c5a:       93 95           inc     r25
    4c5c:       08 95           ret

AFTER: (with -Os)

00004c40 <test>:
    4c40:       86 1b           sub     r24, r22
    4c42:       90 e0           ldi     r25, 0x00       ; 0
    4c44:       08 95           ret

Note: the example is showing -Os but improvements can be seen at all optimization levels,
including -O0. We never use -O0, but I tested it to make sure that no extra code is
generated in that case.OA

* quantum/debounce: Fix custom wrapping timers in eager_pr and eager_pk debounce algorithms

Please see the below simulated sequence of events:
Column A is the 16-bit value returned by read_timer();
Column B is the value returned by custom_wrap_timer_read();
Column C is the original code: (timer_read() % MAX_DEBOUNCE)

    A,     B,     C
65530,    19,    30
65531,    20,    31
65532,    21,    32
65533,    22,    33
65534,    23,    34
65535,    24,    35
    0     25,     0
    1,    26,     1
    2,    27,     2
    3,    28,     3
    4,    29,     4
    5,    30,     5

read_timer() wraps about every 1.09 seconds, and so debouncing might
fail at these times without this commit.

* quantum/debounce/eager_pr and eager_pk: modifications for code readability according to code review.

* quantum/debounce/eager_pr and eager_pk: modifications for code readability according to code review. (2)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants