-
-
Notifications
You must be signed in to change notification settings - Fork 39.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Various fixes to how timer differences are calculated #8585
Conversation
…ectly after the timer wraps. Let's go through an example, using the following macro: If the first timer read is 0xe4 and the second one is 0x32, the timer wrapped. If the timer would have had more bits, it's new value would have been 0x132, and the correct difference in time is 0x132 - 0xe4 = 0x4e old code TIMER_DIFF_8(0x32, 0xe4) = 0xff - 0xe4 + 0x32 = 0x4d, which is wrong. new code TIMER_DIFF_8(0x32, 0xe4) = 0xff + 1 - 0xe4 + 0x32 = 0x4e, which is correct. This also gives a chance for a smart compiler to optimize the code using normal integer overflow. For example on AVR, the following C code: uint8_t __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer) { return TIMER_DIFF_8(current_timer, start_timer); } With the original code, it gets translated to the following list of instructions: 00004c6e <test>: 4c6e: 98 2f mov r25, r24 4c70: 86 1b sub r24, r22 4c72: 96 17 cp r25, r22 4c74: 08 f4 brcc .+2 ; 0x4c78 <test+0xa> 4c76: 81 50 subi r24, 0x01 ; 1 4c78: 08 95 ret But with this commit, it gets translated to a single instruction: 00004c40 <test>: 4c40: 86 1b sub r24, r22 4c42: 08 95 ret This unfortunately doesn't always work so nicely, for example the following C code: int __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer) { return TIMER_DIFF_8(current_timer, start_timer); } (Note: return type changed to int) With the original code it gets translated to: 00004c6e <test>: 4c6e: 28 2f mov r18, r24 4c70: 30 e0 ldi r19, 0x00 ; 0 4c72: 46 2f mov r20, r22 4c74: 50 e0 ldi r21, 0x00 ; 0 4c76: 86 17 cp r24, r22 4c78: 20 f0 brcs .+8 ; 0x4c82 <test+0x14> 4c7a: c9 01 movw r24, r18 4c7c: 84 1b sub r24, r20 4c7e: 95 0b sbc r25, r21 4c80: 08 95 ret 4c82: c9 01 movw r24, r18 4c84: 84 1b sub r24, r20 4c86: 95 0b sbc r25, r21 4c88: 81 50 subi r24, 0x01 ; 1 4c8a: 9f 4f sbci r25, 0xFF ; 255 4c8c: 08 95 ret Wth this commit it gets translated to: 00004c40 <test>: 4c40: 28 2f mov r18, r24 4c42: 30 e0 ldi r19, 0x00 ; 0 4c44: 46 2f mov r20, r22 4c46: 50 e0 ldi r21, 0x00 ; 0 4c48: 86 17 cp r24, r22 4c4a: 20 f0 brcs .+8 ; 0x4c54 <test+0x14> 4c4c: c9 01 movw r24, r18 4c4e: 84 1b sub r24, r20 4c50: 95 0b sbc r25, r21 4c52: 08 95 ret 4c54: c9 01 movw r24, r18 4c56: 84 1b sub r24, r20 4c58: 95 0b sbc r25, r21 4c5a: 93 95 inc r25 4c5c: 08 95 ret There is not much performance improvement in this case, however at least with this commit it functions correctly. Note: The following commit will improve compiler output for the latter example.
Because of integer promotion the compiler is having a hard time generating efficient code to calculate TIMER_DIFF* macros in some situations. In the below example, the return value is "int", and this is causing the trouble. Example C code: int __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer) { return TIMER_DIFF_8(current_timer, start_timer); } BEFORE: (with -Os) 00004c40 <test>: 4c40: 28 2f mov r18, r24 4c42: 30 e0 ldi r19, 0x00 ; 0 4c44: 46 2f mov r20, r22 4c46: 50 e0 ldi r21, 0x00 ; 0 4c48: 86 17 cp r24, r22 4c4a: 20 f0 brcs .+8 ; 0x4c54 <test+0x14> 4c4c: c9 01 movw r24, r18 4c4e: 84 1b sub r24, r20 4c50: 95 0b sbc r25, r21 4c52: 08 95 ret 4c54: c9 01 movw r24, r18 4c56: 84 1b sub r24, r20 4c58: 95 0b sbc r25, r21 4c5a: 93 95 inc r25 4c5c: 08 95 ret AFTER: (with -Os) 00004c40 <test>: 4c40: 86 1b sub r24, r22 4c42: 90 e0 ldi r25, 0x00 ; 0 4c44: 08 95 ret Note: the example is showing -Os but improvements can be seen at all optimization levels, including -O0. We never use -O0, but I tested it to make sure that no extra code is generated in that case.OA
… debounce algorithms Please see the below simulated sequence of events: Column A is the 16-bit value returned by read_timer(); Column B is the value returned by custom_wrap_timer_read(); Column C is the original code: (timer_read() % MAX_DEBOUNCE) A, B, C 65530, 19, 30 65531, 20, 31 65532, 21, 32 65533, 22, 33 65534, 23, 34 65535, 24, 35 0 25, 0 1, 26, 1 2, 27, 2 3, 28, 3 4, 29, 4 5, 30, 5 read_timer() wraps about every 1.09 seconds, and so debouncing might fail at these times without this commit.
I've sent a subset of these changes to TMK as well: tmk/tmk_keyboard#646 |
I have arrived! I can understand why a +1 might be necessary (fenceposting)
and (timer_read() % MAX_DEBOUNCE) simply makes the timer go from 0->max_debounce-1 and loop instead of 0->65535. With any timer, TIMER_DIFF is guaranteed to give the time difference between two timestamps, assuming of course that you didn't double overflow. Am I missing something here? |
MAX_DEBOUNCE is 250, but the counter only goes up to 35 when the 16-bit timer value overflows (every second or so) |
That diagram is well written and 100% explains it! For performance reasons, do you think there's a way of changing max_DEBOUNCE to be a friendlier multiple of 65535? That way it will work sans the potentially expensive "finding the real difference and adding it back again" |
So (timer_read() % 85(+1?)) could work? |
Yes, but that will only work with powers of two. |
Oh right, i was dumb and did 65535 instead of 65536. I also think that is clearer than getting the timer, calculating the real timer difference and adding back to the modulo'ed counter |
matrix scan rate on a 10x10 fake atmega32u4 board: |
Maybe I would do something like
Note the -50 in there. |
@alex-ong Does that sound okay to you? You get the benefit of both compatibility with high debounce values, and the speed of low debounce values. |
Fast research! Yeah, so probably not measurably slower. I can't do the math right now because my math's been failing me all day, the "slow" version takes 0.02ms (or 0.000002s) / cycle? If i was an approver, I would approve it as is, and also totally put a PR to change it to %128 for speeeeed meeemes. (note i am not an approver on this)
Do you have an example of why the -50 buffer is required? The case you're talking about is if between updates, it takes longer than MAX_DEBOUNCE right? so you're saying if the real-time is > 128ms, it will report a number between 0 and 128ms? (i.e double overflow). I'm assuming that no keyboard scans slower than 10hz, which i think is reasonable. I think it's also reasonable to assume DEBOUNCE would be |
It means, that the current PR version is this much slower per scan:
Let's say you have a keyboard at 10 Hz scan rate, that's 100ms
Alright, let's see what an official approver would say. I'm on the side of implementing that #if type code above for speed, but I don't wanna implement it then have it be reverted if an official approver thinks it's way too complicated. I'm also fine with only using %128, as long as it's not determined to be a breaking change. So any approvers would like to review? |
Example in previous comment edited |
Right i can see how your example works; it only occurs when the scanrate is extremely low / the debounce is high, and a double overflow occurs. The slowest keyboard in qmk still scans at 300hz, here's the same numbers but with 100hz scanning + 100ms debounce. I don't really forsee something as pitiful as 10hz scanning (remember normal keyboards are in 300-1000 range; mine is 14000 lmao)
I understand that the %128 is not "correct", but it is correct in real use, and simple to digest. |
…ility according to code review.
That's strange, I didn't intend to remove any request to review. I only clicked the re-request review for @vomindoraan , but that for some reason removed qmk/collaborators, and I can't edit at all the list of review requests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! One final change I'd make (sorry for not catching this earlier) would be to rename wrap_timer_read()
to wrapping_timer_read()
, since “wrap” could be interpreted as an action at a casual glance, whereas the latter name clearly conveys that you're reading something from a timer that wraps (wrapping timer).
…ility according to code review. (2)
@vomindoraan Done |
Thanks! |
* tmk_core/common: Fixing TIMER_DIFF macro to calculate difference correctly after the timer wraps. Let's go through an example, using the following macro: If the first timer read is 0xe4 and the second one is 0x32, the timer wrapped. If the timer would have had more bits, it's new value would have been 0x132, and the correct difference in time is 0x132 - 0xe4 = 0x4e old code TIMER_DIFF_8(0x32, 0xe4) = 0xff - 0xe4 + 0x32 = 0x4d, which is wrong. new code TIMER_DIFF_8(0x32, 0xe4) = 0xff + 1 - 0xe4 + 0x32 = 0x4e, which is correct. This also gives a chance for a smart compiler to optimize the code using normal integer overflow. For example on AVR, the following C code: uint8_t __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer) { return TIMER_DIFF_8(current_timer, start_timer); } With the original code, it gets translated to the following list of instructions: 00004c6e <test>: 4c6e: 98 2f mov r25, r24 4c70: 86 1b sub r24, r22 4c72: 96 17 cp r25, r22 4c74: 08 f4 brcc .+2 ; 0x4c78 <test+0xa> 4c76: 81 50 subi r24, 0x01 ; 1 4c78: 08 95 ret But with this commit, it gets translated to a single instruction: 00004c40 <test>: 4c40: 86 1b sub r24, r22 4c42: 08 95 ret This unfortunately doesn't always work so nicely, for example the following C code: int __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer) { return TIMER_DIFF_8(current_timer, start_timer); } (Note: return type changed to int) With the original code it gets translated to: 00004c6e <test>: 4c6e: 28 2f mov r18, r24 4c70: 30 e0 ldi r19, 0x00 ; 0 4c72: 46 2f mov r20, r22 4c74: 50 e0 ldi r21, 0x00 ; 0 4c76: 86 17 cp r24, r22 4c78: 20 f0 brcs .+8 ; 0x4c82 <test+0x14> 4c7a: c9 01 movw r24, r18 4c7c: 84 1b sub r24, r20 4c7e: 95 0b sbc r25, r21 4c80: 08 95 ret 4c82: c9 01 movw r24, r18 4c84: 84 1b sub r24, r20 4c86: 95 0b sbc r25, r21 4c88: 81 50 subi r24, 0x01 ; 1 4c8a: 9f 4f sbci r25, 0xFF ; 255 4c8c: 08 95 ret Wth this commit it gets translated to: 00004c40 <test>: 4c40: 28 2f mov r18, r24 4c42: 30 e0 ldi r19, 0x00 ; 0 4c44: 46 2f mov r20, r22 4c46: 50 e0 ldi r21, 0x00 ; 0 4c48: 86 17 cp r24, r22 4c4a: 20 f0 brcs .+8 ; 0x4c54 <test+0x14> 4c4c: c9 01 movw r24, r18 4c4e: 84 1b sub r24, r20 4c50: 95 0b sbc r25, r21 4c52: 08 95 ret 4c54: c9 01 movw r24, r18 4c56: 84 1b sub r24, r20 4c58: 95 0b sbc r25, r21 4c5a: 93 95 inc r25 4c5c: 08 95 ret There is not much performance improvement in this case, however at least with this commit it functions correctly. Note: The following commit will improve compiler output for the latter example. * tmk_core/common: Improve code generation for TIMER_DIFF* macros Because of integer promotion the compiler is having a hard time generating efficient code to calculate TIMER_DIFF* macros in some situations. In the below example, the return value is "int", and this is causing the trouble. Example C code: int __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer) { return TIMER_DIFF_8(current_timer, start_timer); } BEFORE: (with -Os) 00004c40 <test>: 4c40: 28 2f mov r18, r24 4c42: 30 e0 ldi r19, 0x00 ; 0 4c44: 46 2f mov r20, r22 4c46: 50 e0 ldi r21, 0x00 ; 0 4c48: 86 17 cp r24, r22 4c4a: 20 f0 brcs .+8 ; 0x4c54 <test+0x14> 4c4c: c9 01 movw r24, r18 4c4e: 84 1b sub r24, r20 4c50: 95 0b sbc r25, r21 4c52: 08 95 ret 4c54: c9 01 movw r24, r18 4c56: 84 1b sub r24, r20 4c58: 95 0b sbc r25, r21 4c5a: 93 95 inc r25 4c5c: 08 95 ret AFTER: (with -Os) 00004c40 <test>: 4c40: 86 1b sub r24, r22 4c42: 90 e0 ldi r25, 0x00 ; 0 4c44: 08 95 ret Note: the example is showing -Os but improvements can be seen at all optimization levels, including -O0. We never use -O0, but I tested it to make sure that no extra code is generated in that case.OA * quantum/debounce: Fix custom wrapping timers in eager_pr and eager_pk debounce algorithms Please see the below simulated sequence of events: Column A is the 16-bit value returned by read_timer(); Column B is the value returned by custom_wrap_timer_read(); Column C is the original code: (timer_read() % MAX_DEBOUNCE) A, B, C 65530, 19, 30 65531, 20, 31 65532, 21, 32 65533, 22, 33 65534, 23, 34 65535, 24, 35 0 25, 0 1, 26, 1 2, 27, 2 3, 28, 3 4, 29, 4 5, 30, 5 read_timer() wraps about every 1.09 seconds, and so debouncing might fail at these times without this commit. * quantum/debounce/eager_pr and eager_pk: modifications for code readability according to code review. * quantum/debounce/eager_pr and eager_pk: modifications for code readability according to code review. (2)
* tmk_core/common: Fixing TIMER_DIFF macro to calculate difference correctly after the timer wraps. Let's go through an example, using the following macro: If the first timer read is 0xe4 and the second one is 0x32, the timer wrapped. If the timer would have had more bits, it's new value would have been 0x132, and the correct difference in time is 0x132 - 0xe4 = 0x4e old code TIMER_DIFF_8(0x32, 0xe4) = 0xff - 0xe4 + 0x32 = 0x4d, which is wrong. new code TIMER_DIFF_8(0x32, 0xe4) = 0xff + 1 - 0xe4 + 0x32 = 0x4e, which is correct. This also gives a chance for a smart compiler to optimize the code using normal integer overflow. For example on AVR, the following C code: uint8_t __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer) { return TIMER_DIFF_8(current_timer, start_timer); } With the original code, it gets translated to the following list of instructions: 00004c6e <test>: 4c6e: 98 2f mov r25, r24 4c70: 86 1b sub r24, r22 4c72: 96 17 cp r25, r22 4c74: 08 f4 brcc .+2 ; 0x4c78 <test+0xa> 4c76: 81 50 subi r24, 0x01 ; 1 4c78: 08 95 ret But with this commit, it gets translated to a single instruction: 00004c40 <test>: 4c40: 86 1b sub r24, r22 4c42: 08 95 ret This unfortunately doesn't always work so nicely, for example the following C code: int __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer) { return TIMER_DIFF_8(current_timer, start_timer); } (Note: return type changed to int) With the original code it gets translated to: 00004c6e <test>: 4c6e: 28 2f mov r18, r24 4c70: 30 e0 ldi r19, 0x00 ; 0 4c72: 46 2f mov r20, r22 4c74: 50 e0 ldi r21, 0x00 ; 0 4c76: 86 17 cp r24, r22 4c78: 20 f0 brcs .+8 ; 0x4c82 <test+0x14> 4c7a: c9 01 movw r24, r18 4c7c: 84 1b sub r24, r20 4c7e: 95 0b sbc r25, r21 4c80: 08 95 ret 4c82: c9 01 movw r24, r18 4c84: 84 1b sub r24, r20 4c86: 95 0b sbc r25, r21 4c88: 81 50 subi r24, 0x01 ; 1 4c8a: 9f 4f sbci r25, 0xFF ; 255 4c8c: 08 95 ret Wth this commit it gets translated to: 00004c40 <test>: 4c40: 28 2f mov r18, r24 4c42: 30 e0 ldi r19, 0x00 ; 0 4c44: 46 2f mov r20, r22 4c46: 50 e0 ldi r21, 0x00 ; 0 4c48: 86 17 cp r24, r22 4c4a: 20 f0 brcs .+8 ; 0x4c54 <test+0x14> 4c4c: c9 01 movw r24, r18 4c4e: 84 1b sub r24, r20 4c50: 95 0b sbc r25, r21 4c52: 08 95 ret 4c54: c9 01 movw r24, r18 4c56: 84 1b sub r24, r20 4c58: 95 0b sbc r25, r21 4c5a: 93 95 inc r25 4c5c: 08 95 ret There is not much performance improvement in this case, however at least with this commit it functions correctly. Note: The following commit will improve compiler output for the latter example. * tmk_core/common: Improve code generation for TIMER_DIFF* macros Because of integer promotion the compiler is having a hard time generating efficient code to calculate TIMER_DIFF* macros in some situations. In the below example, the return value is "int", and this is causing the trouble. Example C code: int __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer) { return TIMER_DIFF_8(current_timer, start_timer); } BEFORE: (with -Os) 00004c40 <test>: 4c40: 28 2f mov r18, r24 4c42: 30 e0 ldi r19, 0x00 ; 0 4c44: 46 2f mov r20, r22 4c46: 50 e0 ldi r21, 0x00 ; 0 4c48: 86 17 cp r24, r22 4c4a: 20 f0 brcs .+8 ; 0x4c54 <test+0x14> 4c4c: c9 01 movw r24, r18 4c4e: 84 1b sub r24, r20 4c50: 95 0b sbc r25, r21 4c52: 08 95 ret 4c54: c9 01 movw r24, r18 4c56: 84 1b sub r24, r20 4c58: 95 0b sbc r25, r21 4c5a: 93 95 inc r25 4c5c: 08 95 ret AFTER: (with -Os) 00004c40 <test>: 4c40: 86 1b sub r24, r22 4c42: 90 e0 ldi r25, 0x00 ; 0 4c44: 08 95 ret Note: the example is showing -Os but improvements can be seen at all optimization levels, including -O0. We never use -O0, but I tested it to make sure that no extra code is generated in that case.OA * quantum/debounce: Fix custom wrapping timers in eager_pr and eager_pk debounce algorithms Please see the below simulated sequence of events: Column A is the 16-bit value returned by read_timer(); Column B is the value returned by custom_wrap_timer_read(); Column C is the original code: (timer_read() % MAX_DEBOUNCE) A, B, C 65530, 19, 30 65531, 20, 31 65532, 21, 32 65533, 22, 33 65534, 23, 34 65535, 24, 35 0 25, 0 1, 26, 1 2, 27, 2 3, 28, 3 4, 29, 4 5, 30, 5 read_timer() wraps about every 1.09 seconds, and so debouncing might fail at these times without this commit. * quantum/debounce/eager_pr and eager_pk: modifications for code readability according to code review. * quantum/debounce/eager_pr and eager_pk: modifications for code readability according to code review. (2)
* tmk_core/common: Fixing TIMER_DIFF macro to calculate difference correctly after the timer wraps. Let's go through an example, using the following macro: If the first timer read is 0xe4 and the second one is 0x32, the timer wrapped. If the timer would have had more bits, it's new value would have been 0x132, and the correct difference in time is 0x132 - 0xe4 = 0x4e old code TIMER_DIFF_8(0x32, 0xe4) = 0xff - 0xe4 + 0x32 = 0x4d, which is wrong. new code TIMER_DIFF_8(0x32, 0xe4) = 0xff + 1 - 0xe4 + 0x32 = 0x4e, which is correct. This also gives a chance for a smart compiler to optimize the code using normal integer overflow. For example on AVR, the following C code: uint8_t __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer) { return TIMER_DIFF_8(current_timer, start_timer); } With the original code, it gets translated to the following list of instructions: 00004c6e <test>: 4c6e: 98 2f mov r25, r24 4c70: 86 1b sub r24, r22 4c72: 96 17 cp r25, r22 4c74: 08 f4 brcc .+2 ; 0x4c78 <test+0xa> 4c76: 81 50 subi r24, 0x01 ; 1 4c78: 08 95 ret But with this commit, it gets translated to a single instruction: 00004c40 <test>: 4c40: 86 1b sub r24, r22 4c42: 08 95 ret This unfortunately doesn't always work so nicely, for example the following C code: int __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer) { return TIMER_DIFF_8(current_timer, start_timer); } (Note: return type changed to int) With the original code it gets translated to: 00004c6e <test>: 4c6e: 28 2f mov r18, r24 4c70: 30 e0 ldi r19, 0x00 ; 0 4c72: 46 2f mov r20, r22 4c74: 50 e0 ldi r21, 0x00 ; 0 4c76: 86 17 cp r24, r22 4c78: 20 f0 brcs .+8 ; 0x4c82 <test+0x14> 4c7a: c9 01 movw r24, r18 4c7c: 84 1b sub r24, r20 4c7e: 95 0b sbc r25, r21 4c80: 08 95 ret 4c82: c9 01 movw r24, r18 4c84: 84 1b sub r24, r20 4c86: 95 0b sbc r25, r21 4c88: 81 50 subi r24, 0x01 ; 1 4c8a: 9f 4f sbci r25, 0xFF ; 255 4c8c: 08 95 ret Wth this commit it gets translated to: 00004c40 <test>: 4c40: 28 2f mov r18, r24 4c42: 30 e0 ldi r19, 0x00 ; 0 4c44: 46 2f mov r20, r22 4c46: 50 e0 ldi r21, 0x00 ; 0 4c48: 86 17 cp r24, r22 4c4a: 20 f0 brcs .+8 ; 0x4c54 <test+0x14> 4c4c: c9 01 movw r24, r18 4c4e: 84 1b sub r24, r20 4c50: 95 0b sbc r25, r21 4c52: 08 95 ret 4c54: c9 01 movw r24, r18 4c56: 84 1b sub r24, r20 4c58: 95 0b sbc r25, r21 4c5a: 93 95 inc r25 4c5c: 08 95 ret There is not much performance improvement in this case, however at least with this commit it functions correctly. Note: The following commit will improve compiler output for the latter example. * tmk_core/common: Improve code generation for TIMER_DIFF* macros Because of integer promotion the compiler is having a hard time generating efficient code to calculate TIMER_DIFF* macros in some situations. In the below example, the return value is "int", and this is causing the trouble. Example C code: int __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer) { return TIMER_DIFF_8(current_timer, start_timer); } BEFORE: (with -Os) 00004c40 <test>: 4c40: 28 2f mov r18, r24 4c42: 30 e0 ldi r19, 0x00 ; 0 4c44: 46 2f mov r20, r22 4c46: 50 e0 ldi r21, 0x00 ; 0 4c48: 86 17 cp r24, r22 4c4a: 20 f0 brcs .+8 ; 0x4c54 <test+0x14> 4c4c: c9 01 movw r24, r18 4c4e: 84 1b sub r24, r20 4c50: 95 0b sbc r25, r21 4c52: 08 95 ret 4c54: c9 01 movw r24, r18 4c56: 84 1b sub r24, r20 4c58: 95 0b sbc r25, r21 4c5a: 93 95 inc r25 4c5c: 08 95 ret AFTER: (with -Os) 00004c40 <test>: 4c40: 86 1b sub r24, r22 4c42: 90 e0 ldi r25, 0x00 ; 0 4c44: 08 95 ret Note: the example is showing -Os but improvements can be seen at all optimization levels, including -O0. We never use -O0, but I tested it to make sure that no extra code is generated in that case.OA * quantum/debounce: Fix custom wrapping timers in eager_pr and eager_pk debounce algorithms Please see the below simulated sequence of events: Column A is the 16-bit value returned by read_timer(); Column B is the value returned by custom_wrap_timer_read(); Column C is the original code: (timer_read() % MAX_DEBOUNCE) A, B, C 65530, 19, 30 65531, 20, 31 65532, 21, 32 65533, 22, 33 65534, 23, 34 65535, 24, 35 0 25, 0 1, 26, 1 2, 27, 2 3, 28, 3 4, 29, 4 5, 30, 5 read_timer() wraps about every 1.09 seconds, and so debouncing might fail at these times without this commit. * quantum/debounce/eager_pr and eager_pk: modifications for code readability according to code review. * quantum/debounce/eager_pr and eager_pk: modifications for code readability according to code review. (2)
* tmk_core/common: Fixing TIMER_DIFF macro to calculate difference correctly after the timer wraps. Let's go through an example, using the following macro: If the first timer read is 0xe4 and the second one is 0x32, the timer wrapped. If the timer would have had more bits, it's new value would have been 0x132, and the correct difference in time is 0x132 - 0xe4 = 0x4e old code TIMER_DIFF_8(0x32, 0xe4) = 0xff - 0xe4 + 0x32 = 0x4d, which is wrong. new code TIMER_DIFF_8(0x32, 0xe4) = 0xff + 1 - 0xe4 + 0x32 = 0x4e, which is correct. This also gives a chance for a smart compiler to optimize the code using normal integer overflow. For example on AVR, the following C code: uint8_t __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer) { return TIMER_DIFF_8(current_timer, start_timer); } With the original code, it gets translated to the following list of instructions: 00004c6e <test>: 4c6e: 98 2f mov r25, r24 4c70: 86 1b sub r24, r22 4c72: 96 17 cp r25, r22 4c74: 08 f4 brcc .+2 ; 0x4c78 <test+0xa> 4c76: 81 50 subi r24, 0x01 ; 1 4c78: 08 95 ret But with this commit, it gets translated to a single instruction: 00004c40 <test>: 4c40: 86 1b sub r24, r22 4c42: 08 95 ret This unfortunately doesn't always work so nicely, for example the following C code: int __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer) { return TIMER_DIFF_8(current_timer, start_timer); } (Note: return type changed to int) With the original code it gets translated to: 00004c6e <test>: 4c6e: 28 2f mov r18, r24 4c70: 30 e0 ldi r19, 0x00 ; 0 4c72: 46 2f mov r20, r22 4c74: 50 e0 ldi r21, 0x00 ; 0 4c76: 86 17 cp r24, r22 4c78: 20 f0 brcs .+8 ; 0x4c82 <test+0x14> 4c7a: c9 01 movw r24, r18 4c7c: 84 1b sub r24, r20 4c7e: 95 0b sbc r25, r21 4c80: 08 95 ret 4c82: c9 01 movw r24, r18 4c84: 84 1b sub r24, r20 4c86: 95 0b sbc r25, r21 4c88: 81 50 subi r24, 0x01 ; 1 4c8a: 9f 4f sbci r25, 0xFF ; 255 4c8c: 08 95 ret Wth this commit it gets translated to: 00004c40 <test>: 4c40: 28 2f mov r18, r24 4c42: 30 e0 ldi r19, 0x00 ; 0 4c44: 46 2f mov r20, r22 4c46: 50 e0 ldi r21, 0x00 ; 0 4c48: 86 17 cp r24, r22 4c4a: 20 f0 brcs .+8 ; 0x4c54 <test+0x14> 4c4c: c9 01 movw r24, r18 4c4e: 84 1b sub r24, r20 4c50: 95 0b sbc r25, r21 4c52: 08 95 ret 4c54: c9 01 movw r24, r18 4c56: 84 1b sub r24, r20 4c58: 95 0b sbc r25, r21 4c5a: 93 95 inc r25 4c5c: 08 95 ret There is not much performance improvement in this case, however at least with this commit it functions correctly. Note: The following commit will improve compiler output for the latter example. * tmk_core/common: Improve code generation for TIMER_DIFF* macros Because of integer promotion the compiler is having a hard time generating efficient code to calculate TIMER_DIFF* macros in some situations. In the below example, the return value is "int", and this is causing the trouble. Example C code: int __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer) { return TIMER_DIFF_8(current_timer, start_timer); } BEFORE: (with -Os) 00004c40 <test>: 4c40: 28 2f mov r18, r24 4c42: 30 e0 ldi r19, 0x00 ; 0 4c44: 46 2f mov r20, r22 4c46: 50 e0 ldi r21, 0x00 ; 0 4c48: 86 17 cp r24, r22 4c4a: 20 f0 brcs .+8 ; 0x4c54 <test+0x14> 4c4c: c9 01 movw r24, r18 4c4e: 84 1b sub r24, r20 4c50: 95 0b sbc r25, r21 4c52: 08 95 ret 4c54: c9 01 movw r24, r18 4c56: 84 1b sub r24, r20 4c58: 95 0b sbc r25, r21 4c5a: 93 95 inc r25 4c5c: 08 95 ret AFTER: (with -Os) 00004c40 <test>: 4c40: 86 1b sub r24, r22 4c42: 90 e0 ldi r25, 0x00 ; 0 4c44: 08 95 ret Note: the example is showing -Os but improvements can be seen at all optimization levels, including -O0. We never use -O0, but I tested it to make sure that no extra code is generated in that case.OA * quantum/debounce: Fix custom wrapping timers in eager_pr and eager_pk debounce algorithms Please see the below simulated sequence of events: Column A is the 16-bit value returned by read_timer(); Column B is the value returned by custom_wrap_timer_read(); Column C is the original code: (timer_read() % MAX_DEBOUNCE) A, B, C 65530, 19, 30 65531, 20, 31 65532, 21, 32 65533, 22, 33 65534, 23, 34 65535, 24, 35 0 25, 0 1, 26, 1 2, 27, 2 3, 28, 3 4, 29, 4 5, 30, 5 read_timer() wraps about every 1.09 seconds, and so debouncing might fail at these times without this commit. * quantum/debounce/eager_pr and eager_pk: modifications for code readability according to code review. * quantum/debounce/eager_pr and eager_pk: modifications for code readability according to code review. (2)
* tmk_core/common: Fixing TIMER_DIFF macro to calculate difference correctly after the timer wraps. Let's go through an example, using the following macro: If the first timer read is 0xe4 and the second one is 0x32, the timer wrapped. If the timer would have had more bits, it's new value would have been 0x132, and the correct difference in time is 0x132 - 0xe4 = 0x4e old code TIMER_DIFF_8(0x32, 0xe4) = 0xff - 0xe4 + 0x32 = 0x4d, which is wrong. new code TIMER_DIFF_8(0x32, 0xe4) = 0xff + 1 - 0xe4 + 0x32 = 0x4e, which is correct. This also gives a chance for a smart compiler to optimize the code using normal integer overflow. For example on AVR, the following C code: uint8_t __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer) { return TIMER_DIFF_8(current_timer, start_timer); } With the original code, it gets translated to the following list of instructions: 00004c6e <test>: 4c6e: 98 2f mov r25, r24 4c70: 86 1b sub r24, r22 4c72: 96 17 cp r25, r22 4c74: 08 f4 brcc .+2 ; 0x4c78 <test+0xa> 4c76: 81 50 subi r24, 0x01 ; 1 4c78: 08 95 ret But with this commit, it gets translated to a single instruction: 00004c40 <test>: 4c40: 86 1b sub r24, r22 4c42: 08 95 ret This unfortunately doesn't always work so nicely, for example the following C code: int __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer) { return TIMER_DIFF_8(current_timer, start_timer); } (Note: return type changed to int) With the original code it gets translated to: 00004c6e <test>: 4c6e: 28 2f mov r18, r24 4c70: 30 e0 ldi r19, 0x00 ; 0 4c72: 46 2f mov r20, r22 4c74: 50 e0 ldi r21, 0x00 ; 0 4c76: 86 17 cp r24, r22 4c78: 20 f0 brcs .+8 ; 0x4c82 <test+0x14> 4c7a: c9 01 movw r24, r18 4c7c: 84 1b sub r24, r20 4c7e: 95 0b sbc r25, r21 4c80: 08 95 ret 4c82: c9 01 movw r24, r18 4c84: 84 1b sub r24, r20 4c86: 95 0b sbc r25, r21 4c88: 81 50 subi r24, 0x01 ; 1 4c8a: 9f 4f sbci r25, 0xFF ; 255 4c8c: 08 95 ret Wth this commit it gets translated to: 00004c40 <test>: 4c40: 28 2f mov r18, r24 4c42: 30 e0 ldi r19, 0x00 ; 0 4c44: 46 2f mov r20, r22 4c46: 50 e0 ldi r21, 0x00 ; 0 4c48: 86 17 cp r24, r22 4c4a: 20 f0 brcs .+8 ; 0x4c54 <test+0x14> 4c4c: c9 01 movw r24, r18 4c4e: 84 1b sub r24, r20 4c50: 95 0b sbc r25, r21 4c52: 08 95 ret 4c54: c9 01 movw r24, r18 4c56: 84 1b sub r24, r20 4c58: 95 0b sbc r25, r21 4c5a: 93 95 inc r25 4c5c: 08 95 ret There is not much performance improvement in this case, however at least with this commit it functions correctly. Note: The following commit will improve compiler output for the latter example. * tmk_core/common: Improve code generation for TIMER_DIFF* macros Because of integer promotion the compiler is having a hard time generating efficient code to calculate TIMER_DIFF* macros in some situations. In the below example, the return value is "int", and this is causing the trouble. Example C code: int __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer) { return TIMER_DIFF_8(current_timer, start_timer); } BEFORE: (with -Os) 00004c40 <test>: 4c40: 28 2f mov r18, r24 4c42: 30 e0 ldi r19, 0x00 ; 0 4c44: 46 2f mov r20, r22 4c46: 50 e0 ldi r21, 0x00 ; 0 4c48: 86 17 cp r24, r22 4c4a: 20 f0 brcs .+8 ; 0x4c54 <test+0x14> 4c4c: c9 01 movw r24, r18 4c4e: 84 1b sub r24, r20 4c50: 95 0b sbc r25, r21 4c52: 08 95 ret 4c54: c9 01 movw r24, r18 4c56: 84 1b sub r24, r20 4c58: 95 0b sbc r25, r21 4c5a: 93 95 inc r25 4c5c: 08 95 ret AFTER: (with -Os) 00004c40 <test>: 4c40: 86 1b sub r24, r22 4c42: 90 e0 ldi r25, 0x00 ; 0 4c44: 08 95 ret Note: the example is showing -Os but improvements can be seen at all optimization levels, including -O0. We never use -O0, but I tested it to make sure that no extra code is generated in that case.OA * quantum/debounce: Fix custom wrapping timers in eager_pr and eager_pk debounce algorithms Please see the below simulated sequence of events: Column A is the 16-bit value returned by read_timer(); Column B is the value returned by custom_wrap_timer_read(); Column C is the original code: (timer_read() % MAX_DEBOUNCE) A, B, C 65530, 19, 30 65531, 20, 31 65532, 21, 32 65533, 22, 33 65534, 23, 34 65535, 24, 35 0 25, 0 1, 26, 1 2, 27, 2 3, 28, 3 4, 29, 4 5, 30, 5 read_timer() wraps about every 1.09 seconds, and so debouncing might fail at these times without this commit. * quantum/debounce/eager_pr and eager_pk: modifications for code readability according to code review. * quantum/debounce/eager_pr and eager_pk: modifications for code readability according to code review. (2)
* tmk_core/common: Fixing TIMER_DIFF macro to calculate difference correctly after the timer wraps. Let's go through an example, using the following macro: If the first timer read is 0xe4 and the second one is 0x32, the timer wrapped. If the timer would have had more bits, it's new value would have been 0x132, and the correct difference in time is 0x132 - 0xe4 = 0x4e old code TIMER_DIFF_8(0x32, 0xe4) = 0xff - 0xe4 + 0x32 = 0x4d, which is wrong. new code TIMER_DIFF_8(0x32, 0xe4) = 0xff + 1 - 0xe4 + 0x32 = 0x4e, which is correct. This also gives a chance for a smart compiler to optimize the code using normal integer overflow. For example on AVR, the following C code: uint8_t __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer) { return TIMER_DIFF_8(current_timer, start_timer); } With the original code, it gets translated to the following list of instructions: 00004c6e <test>: 4c6e: 98 2f mov r25, r24 4c70: 86 1b sub r24, r22 4c72: 96 17 cp r25, r22 4c74: 08 f4 brcc .+2 ; 0x4c78 <test+0xa> 4c76: 81 50 subi r24, 0x01 ; 1 4c78: 08 95 ret But with this commit, it gets translated to a single instruction: 00004c40 <test>: 4c40: 86 1b sub r24, r22 4c42: 08 95 ret This unfortunately doesn't always work so nicely, for example the following C code: int __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer) { return TIMER_DIFF_8(current_timer, start_timer); } (Note: return type changed to int) With the original code it gets translated to: 00004c6e <test>: 4c6e: 28 2f mov r18, r24 4c70: 30 e0 ldi r19, 0x00 ; 0 4c72: 46 2f mov r20, r22 4c74: 50 e0 ldi r21, 0x00 ; 0 4c76: 86 17 cp r24, r22 4c78: 20 f0 brcs .+8 ; 0x4c82 <test+0x14> 4c7a: c9 01 movw r24, r18 4c7c: 84 1b sub r24, r20 4c7e: 95 0b sbc r25, r21 4c80: 08 95 ret 4c82: c9 01 movw r24, r18 4c84: 84 1b sub r24, r20 4c86: 95 0b sbc r25, r21 4c88: 81 50 subi r24, 0x01 ; 1 4c8a: 9f 4f sbci r25, 0xFF ; 255 4c8c: 08 95 ret Wth this commit it gets translated to: 00004c40 <test>: 4c40: 28 2f mov r18, r24 4c42: 30 e0 ldi r19, 0x00 ; 0 4c44: 46 2f mov r20, r22 4c46: 50 e0 ldi r21, 0x00 ; 0 4c48: 86 17 cp r24, r22 4c4a: 20 f0 brcs .+8 ; 0x4c54 <test+0x14> 4c4c: c9 01 movw r24, r18 4c4e: 84 1b sub r24, r20 4c50: 95 0b sbc r25, r21 4c52: 08 95 ret 4c54: c9 01 movw r24, r18 4c56: 84 1b sub r24, r20 4c58: 95 0b sbc r25, r21 4c5a: 93 95 inc r25 4c5c: 08 95 ret There is not much performance improvement in this case, however at least with this commit it functions correctly. Note: The following commit will improve compiler output for the latter example. * tmk_core/common: Improve code generation for TIMER_DIFF* macros Because of integer promotion the compiler is having a hard time generating efficient code to calculate TIMER_DIFF* macros in some situations. In the below example, the return value is "int", and this is causing the trouble. Example C code: int __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer) { return TIMER_DIFF_8(current_timer, start_timer); } BEFORE: (with -Os) 00004c40 <test>: 4c40: 28 2f mov r18, r24 4c42: 30 e0 ldi r19, 0x00 ; 0 4c44: 46 2f mov r20, r22 4c46: 50 e0 ldi r21, 0x00 ; 0 4c48: 86 17 cp r24, r22 4c4a: 20 f0 brcs .+8 ; 0x4c54 <test+0x14> 4c4c: c9 01 movw r24, r18 4c4e: 84 1b sub r24, r20 4c50: 95 0b sbc r25, r21 4c52: 08 95 ret 4c54: c9 01 movw r24, r18 4c56: 84 1b sub r24, r20 4c58: 95 0b sbc r25, r21 4c5a: 93 95 inc r25 4c5c: 08 95 ret AFTER: (with -Os) 00004c40 <test>: 4c40: 86 1b sub r24, r22 4c42: 90 e0 ldi r25, 0x00 ; 0 4c44: 08 95 ret Note: the example is showing -Os but improvements can be seen at all optimization levels, including -O0. We never use -O0, but I tested it to make sure that no extra code is generated in that case.OA * quantum/debounce: Fix custom wrapping timers in eager_pr and eager_pk debounce algorithms Please see the below simulated sequence of events: Column A is the 16-bit value returned by read_timer(); Column B is the value returned by custom_wrap_timer_read(); Column C is the original code: (timer_read() % MAX_DEBOUNCE) A, B, C 65530, 19, 30 65531, 20, 31 65532, 21, 32 65533, 22, 33 65534, 23, 34 65535, 24, 35 0 25, 0 1, 26, 1 2, 27, 2 3, 28, 3 4, 29, 4 5, 30, 5 read_timer() wraps about every 1.09 seconds, and so debouncing might fail at these times without this commit. * quantum/debounce/eager_pr and eager_pk: modifications for code readability according to code review. * quantum/debounce/eager_pr and eager_pk: modifications for code readability according to code review. (2)
* tmk_core/common: Fixing TIMER_DIFF macro to calculate difference correctly after the timer wraps. Let's go through an example, using the following macro: If the first timer read is 0xe4 and the second one is 0x32, the timer wrapped. If the timer would have had more bits, it's new value would have been 0x132, and the correct difference in time is 0x132 - 0xe4 = 0x4e old code TIMER_DIFF_8(0x32, 0xe4) = 0xff - 0xe4 + 0x32 = 0x4d, which is wrong. new code TIMER_DIFF_8(0x32, 0xe4) = 0xff + 1 - 0xe4 + 0x32 = 0x4e, which is correct. This also gives a chance for a smart compiler to optimize the code using normal integer overflow. For example on AVR, the following C code: uint8_t __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer) { return TIMER_DIFF_8(current_timer, start_timer); } With the original code, it gets translated to the following list of instructions: 00004c6e <test>: 4c6e: 98 2f mov r25, r24 4c70: 86 1b sub r24, r22 4c72: 96 17 cp r25, r22 4c74: 08 f4 brcc .+2 ; 0x4c78 <test+0xa> 4c76: 81 50 subi r24, 0x01 ; 1 4c78: 08 95 ret But with this commit, it gets translated to a single instruction: 00004c40 <test>: 4c40: 86 1b sub r24, r22 4c42: 08 95 ret This unfortunately doesn't always work so nicely, for example the following C code: int __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer) { return TIMER_DIFF_8(current_timer, start_timer); } (Note: return type changed to int) With the original code it gets translated to: 00004c6e <test>: 4c6e: 28 2f mov r18, r24 4c70: 30 e0 ldi r19, 0x00 ; 0 4c72: 46 2f mov r20, r22 4c74: 50 e0 ldi r21, 0x00 ; 0 4c76: 86 17 cp r24, r22 4c78: 20 f0 brcs .+8 ; 0x4c82 <test+0x14> 4c7a: c9 01 movw r24, r18 4c7c: 84 1b sub r24, r20 4c7e: 95 0b sbc r25, r21 4c80: 08 95 ret 4c82: c9 01 movw r24, r18 4c84: 84 1b sub r24, r20 4c86: 95 0b sbc r25, r21 4c88: 81 50 subi r24, 0x01 ; 1 4c8a: 9f 4f sbci r25, 0xFF ; 255 4c8c: 08 95 ret Wth this commit it gets translated to: 00004c40 <test>: 4c40: 28 2f mov r18, r24 4c42: 30 e0 ldi r19, 0x00 ; 0 4c44: 46 2f mov r20, r22 4c46: 50 e0 ldi r21, 0x00 ; 0 4c48: 86 17 cp r24, r22 4c4a: 20 f0 brcs .+8 ; 0x4c54 <test+0x14> 4c4c: c9 01 movw r24, r18 4c4e: 84 1b sub r24, r20 4c50: 95 0b sbc r25, r21 4c52: 08 95 ret 4c54: c9 01 movw r24, r18 4c56: 84 1b sub r24, r20 4c58: 95 0b sbc r25, r21 4c5a: 93 95 inc r25 4c5c: 08 95 ret There is not much performance improvement in this case, however at least with this commit it functions correctly. Note: The following commit will improve compiler output for the latter example. * tmk_core/common: Improve code generation for TIMER_DIFF* macros Because of integer promotion the compiler is having a hard time generating efficient code to calculate TIMER_DIFF* macros in some situations. In the below example, the return value is "int", and this is causing the trouble. Example C code: int __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer) { return TIMER_DIFF_8(current_timer, start_timer); } BEFORE: (with -Os) 00004c40 <test>: 4c40: 28 2f mov r18, r24 4c42: 30 e0 ldi r19, 0x00 ; 0 4c44: 46 2f mov r20, r22 4c46: 50 e0 ldi r21, 0x00 ; 0 4c48: 86 17 cp r24, r22 4c4a: 20 f0 brcs .+8 ; 0x4c54 <test+0x14> 4c4c: c9 01 movw r24, r18 4c4e: 84 1b sub r24, r20 4c50: 95 0b sbc r25, r21 4c52: 08 95 ret 4c54: c9 01 movw r24, r18 4c56: 84 1b sub r24, r20 4c58: 95 0b sbc r25, r21 4c5a: 93 95 inc r25 4c5c: 08 95 ret AFTER: (with -Os) 00004c40 <test>: 4c40: 86 1b sub r24, r22 4c42: 90 e0 ldi r25, 0x00 ; 0 4c44: 08 95 ret Note: the example is showing -Os but improvements can be seen at all optimization levels, including -O0. We never use -O0, but I tested it to make sure that no extra code is generated in that case.OA * quantum/debounce: Fix custom wrapping timers in eager_pr and eager_pk debounce algorithms Please see the below simulated sequence of events: Column A is the 16-bit value returned by read_timer(); Column B is the value returned by custom_wrap_timer_read(); Column C is the original code: (timer_read() % MAX_DEBOUNCE) A, B, C 65530, 19, 30 65531, 20, 31 65532, 21, 32 65533, 22, 33 65534, 23, 34 65535, 24, 35 0 25, 0 1, 26, 1 2, 27, 2 3, 28, 3 4, 29, 4 5, 30, 5 read_timer() wraps about every 1.09 seconds, and so debouncing might fail at these times without this commit. * quantum/debounce/eager_pr and eager_pk: modifications for code readability according to code review. * quantum/debounce/eager_pr and eager_pk: modifications for code readability according to code review. (2)
* tmk_core/common: Fixing TIMER_DIFF macro to calculate difference correctly after the timer wraps. Let's go through an example, using the following macro: If the first timer read is 0xe4 and the second one is 0x32, the timer wrapped. If the timer would have had more bits, it's new value would have been 0x132, and the correct difference in time is 0x132 - 0xe4 = 0x4e old code TIMER_DIFF_8(0x32, 0xe4) = 0xff - 0xe4 + 0x32 = 0x4d, which is wrong. new code TIMER_DIFF_8(0x32, 0xe4) = 0xff + 1 - 0xe4 + 0x32 = 0x4e, which is correct. This also gives a chance for a smart compiler to optimize the code using normal integer overflow. For example on AVR, the following C code: uint8_t __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer) { return TIMER_DIFF_8(current_timer, start_timer); } With the original code, it gets translated to the following list of instructions: 00004c6e <test>: 4c6e: 98 2f mov r25, r24 4c70: 86 1b sub r24, r22 4c72: 96 17 cp r25, r22 4c74: 08 f4 brcc .+2 ; 0x4c78 <test+0xa> 4c76: 81 50 subi r24, 0x01 ; 1 4c78: 08 95 ret But with this commit, it gets translated to a single instruction: 00004c40 <test>: 4c40: 86 1b sub r24, r22 4c42: 08 95 ret This unfortunately doesn't always work so nicely, for example the following C code: int __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer) { return TIMER_DIFF_8(current_timer, start_timer); } (Note: return type changed to int) With the original code it gets translated to: 00004c6e <test>: 4c6e: 28 2f mov r18, r24 4c70: 30 e0 ldi r19, 0x00 ; 0 4c72: 46 2f mov r20, r22 4c74: 50 e0 ldi r21, 0x00 ; 0 4c76: 86 17 cp r24, r22 4c78: 20 f0 brcs .+8 ; 0x4c82 <test+0x14> 4c7a: c9 01 movw r24, r18 4c7c: 84 1b sub r24, r20 4c7e: 95 0b sbc r25, r21 4c80: 08 95 ret 4c82: c9 01 movw r24, r18 4c84: 84 1b sub r24, r20 4c86: 95 0b sbc r25, r21 4c88: 81 50 subi r24, 0x01 ; 1 4c8a: 9f 4f sbci r25, 0xFF ; 255 4c8c: 08 95 ret Wth this commit it gets translated to: 00004c40 <test>: 4c40: 28 2f mov r18, r24 4c42: 30 e0 ldi r19, 0x00 ; 0 4c44: 46 2f mov r20, r22 4c46: 50 e0 ldi r21, 0x00 ; 0 4c48: 86 17 cp r24, r22 4c4a: 20 f0 brcs .+8 ; 0x4c54 <test+0x14> 4c4c: c9 01 movw r24, r18 4c4e: 84 1b sub r24, r20 4c50: 95 0b sbc r25, r21 4c52: 08 95 ret 4c54: c9 01 movw r24, r18 4c56: 84 1b sub r24, r20 4c58: 95 0b sbc r25, r21 4c5a: 93 95 inc r25 4c5c: 08 95 ret There is not much performance improvement in this case, however at least with this commit it functions correctly. Note: The following commit will improve compiler output for the latter example. * tmk_core/common: Improve code generation for TIMER_DIFF* macros Because of integer promotion the compiler is having a hard time generating efficient code to calculate TIMER_DIFF* macros in some situations. In the below example, the return value is "int", and this is causing the trouble. Example C code: int __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer) { return TIMER_DIFF_8(current_timer, start_timer); } BEFORE: (with -Os) 00004c40 <test>: 4c40: 28 2f mov r18, r24 4c42: 30 e0 ldi r19, 0x00 ; 0 4c44: 46 2f mov r20, r22 4c46: 50 e0 ldi r21, 0x00 ; 0 4c48: 86 17 cp r24, r22 4c4a: 20 f0 brcs .+8 ; 0x4c54 <test+0x14> 4c4c: c9 01 movw r24, r18 4c4e: 84 1b sub r24, r20 4c50: 95 0b sbc r25, r21 4c52: 08 95 ret 4c54: c9 01 movw r24, r18 4c56: 84 1b sub r24, r20 4c58: 95 0b sbc r25, r21 4c5a: 93 95 inc r25 4c5c: 08 95 ret AFTER: (with -Os) 00004c40 <test>: 4c40: 86 1b sub r24, r22 4c42: 90 e0 ldi r25, 0x00 ; 0 4c44: 08 95 ret Note: the example is showing -Os but improvements can be seen at all optimization levels, including -O0. We never use -O0, but I tested it to make sure that no extra code is generated in that case.OA * quantum/debounce: Fix custom wrapping timers in eager_pr and eager_pk debounce algorithms Please see the below simulated sequence of events: Column A is the 16-bit value returned by read_timer(); Column B is the value returned by custom_wrap_timer_read(); Column C is the original code: (timer_read() % MAX_DEBOUNCE) A, B, C 65530, 19, 30 65531, 20, 31 65532, 21, 32 65533, 22, 33 65534, 23, 34 65535, 24, 35 0 25, 0 1, 26, 1 2, 27, 2 3, 28, 3 4, 29, 4 5, 30, 5 read_timer() wraps about every 1.09 seconds, and so debouncing might fail at these times without this commit. * quantum/debounce/eager_pr and eager_pk: modifications for code readability according to code review. * quantum/debounce/eager_pr and eager_pk: modifications for code readability according to code review. (2)
* tmk_core/common: Fixing TIMER_DIFF macro to calculate difference correctly after the timer wraps. Let's go through an example, using the following macro: If the first timer read is 0xe4 and the second one is 0x32, the timer wrapped. If the timer would have had more bits, it's new value would have been 0x132, and the correct difference in time is 0x132 - 0xe4 = 0x4e old code TIMER_DIFF_8(0x32, 0xe4) = 0xff - 0xe4 + 0x32 = 0x4d, which is wrong. new code TIMER_DIFF_8(0x32, 0xe4) = 0xff + 1 - 0xe4 + 0x32 = 0x4e, which is correct. This also gives a chance for a smart compiler to optimize the code using normal integer overflow. For example on AVR, the following C code: uint8_t __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer) { return TIMER_DIFF_8(current_timer, start_timer); } With the original code, it gets translated to the following list of instructions: 00004c6e <test>: 4c6e: 98 2f mov r25, r24 4c70: 86 1b sub r24, r22 4c72: 96 17 cp r25, r22 4c74: 08 f4 brcc .+2 ; 0x4c78 <test+0xa> 4c76: 81 50 subi r24, 0x01 ; 1 4c78: 08 95 ret But with this commit, it gets translated to a single instruction: 00004c40 <test>: 4c40: 86 1b sub r24, r22 4c42: 08 95 ret This unfortunately doesn't always work so nicely, for example the following C code: int __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer) { return TIMER_DIFF_8(current_timer, start_timer); } (Note: return type changed to int) With the original code it gets translated to: 00004c6e <test>: 4c6e: 28 2f mov r18, r24 4c70: 30 e0 ldi r19, 0x00 ; 0 4c72: 46 2f mov r20, r22 4c74: 50 e0 ldi r21, 0x00 ; 0 4c76: 86 17 cp r24, r22 4c78: 20 f0 brcs .+8 ; 0x4c82 <test+0x14> 4c7a: c9 01 movw r24, r18 4c7c: 84 1b sub r24, r20 4c7e: 95 0b sbc r25, r21 4c80: 08 95 ret 4c82: c9 01 movw r24, r18 4c84: 84 1b sub r24, r20 4c86: 95 0b sbc r25, r21 4c88: 81 50 subi r24, 0x01 ; 1 4c8a: 9f 4f sbci r25, 0xFF ; 255 4c8c: 08 95 ret Wth this commit it gets translated to: 00004c40 <test>: 4c40: 28 2f mov r18, r24 4c42: 30 e0 ldi r19, 0x00 ; 0 4c44: 46 2f mov r20, r22 4c46: 50 e0 ldi r21, 0x00 ; 0 4c48: 86 17 cp r24, r22 4c4a: 20 f0 brcs .+8 ; 0x4c54 <test+0x14> 4c4c: c9 01 movw r24, r18 4c4e: 84 1b sub r24, r20 4c50: 95 0b sbc r25, r21 4c52: 08 95 ret 4c54: c9 01 movw r24, r18 4c56: 84 1b sub r24, r20 4c58: 95 0b sbc r25, r21 4c5a: 93 95 inc r25 4c5c: 08 95 ret There is not much performance improvement in this case, however at least with this commit it functions correctly. Note: The following commit will improve compiler output for the latter example. * tmk_core/common: Improve code generation for TIMER_DIFF* macros Because of integer promotion the compiler is having a hard time generating efficient code to calculate TIMER_DIFF* macros in some situations. In the below example, the return value is "int", and this is causing the trouble. Example C code: int __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer) { return TIMER_DIFF_8(current_timer, start_timer); } BEFORE: (with -Os) 00004c40 <test>: 4c40: 28 2f mov r18, r24 4c42: 30 e0 ldi r19, 0x00 ; 0 4c44: 46 2f mov r20, r22 4c46: 50 e0 ldi r21, 0x00 ; 0 4c48: 86 17 cp r24, r22 4c4a: 20 f0 brcs .+8 ; 0x4c54 <test+0x14> 4c4c: c9 01 movw r24, r18 4c4e: 84 1b sub r24, r20 4c50: 95 0b sbc r25, r21 4c52: 08 95 ret 4c54: c9 01 movw r24, r18 4c56: 84 1b sub r24, r20 4c58: 95 0b sbc r25, r21 4c5a: 93 95 inc r25 4c5c: 08 95 ret AFTER: (with -Os) 00004c40 <test>: 4c40: 86 1b sub r24, r22 4c42: 90 e0 ldi r25, 0x00 ; 0 4c44: 08 95 ret Note: the example is showing -Os but improvements can be seen at all optimization levels, including -O0. We never use -O0, but I tested it to make sure that no extra code is generated in that case.OA * quantum/debounce: Fix custom wrapping timers in eager_pr and eager_pk debounce algorithms Please see the below simulated sequence of events: Column A is the 16-bit value returned by read_timer(); Column B is the value returned by custom_wrap_timer_read(); Column C is the original code: (timer_read() % MAX_DEBOUNCE) A, B, C 65530, 19, 30 65531, 20, 31 65532, 21, 32 65533, 22, 33 65534, 23, 34 65535, 24, 35 0 25, 0 1, 26, 1 2, 27, 2 3, 28, 3 4, 29, 4 5, 30, 5 read_timer() wraps about every 1.09 seconds, and so debouncing might fail at these times without this commit. * quantum/debounce/eager_pr and eager_pk: modifications for code readability according to code review. * quantum/debounce/eager_pr and eager_pk: modifications for code readability according to code review. (2)
* tmk_core/common: Fixing TIMER_DIFF macro to calculate difference correctly after the timer wraps. Let's go through an example, using the following macro: If the first timer read is 0xe4 and the second one is 0x32, the timer wrapped. If the timer would have had more bits, it's new value would have been 0x132, and the correct difference in time is 0x132 - 0xe4 = 0x4e old code TIMER_DIFF_8(0x32, 0xe4) = 0xff - 0xe4 + 0x32 = 0x4d, which is wrong. new code TIMER_DIFF_8(0x32, 0xe4) = 0xff + 1 - 0xe4 + 0x32 = 0x4e, which is correct. This also gives a chance for a smart compiler to optimize the code using normal integer overflow. For example on AVR, the following C code: uint8_t __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer) { return TIMER_DIFF_8(current_timer, start_timer); } With the original code, it gets translated to the following list of instructions: 00004c6e <test>: 4c6e: 98 2f mov r25, r24 4c70: 86 1b sub r24, r22 4c72: 96 17 cp r25, r22 4c74: 08 f4 brcc .+2 ; 0x4c78 <test+0xa> 4c76: 81 50 subi r24, 0x01 ; 1 4c78: 08 95 ret But with this commit, it gets translated to a single instruction: 00004c40 <test>: 4c40: 86 1b sub r24, r22 4c42: 08 95 ret This unfortunately doesn't always work so nicely, for example the following C code: int __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer) { return TIMER_DIFF_8(current_timer, start_timer); } (Note: return type changed to int) With the original code it gets translated to: 00004c6e <test>: 4c6e: 28 2f mov r18, r24 4c70: 30 e0 ldi r19, 0x00 ; 0 4c72: 46 2f mov r20, r22 4c74: 50 e0 ldi r21, 0x00 ; 0 4c76: 86 17 cp r24, r22 4c78: 20 f0 brcs .+8 ; 0x4c82 <test+0x14> 4c7a: c9 01 movw r24, r18 4c7c: 84 1b sub r24, r20 4c7e: 95 0b sbc r25, r21 4c80: 08 95 ret 4c82: c9 01 movw r24, r18 4c84: 84 1b sub r24, r20 4c86: 95 0b sbc r25, r21 4c88: 81 50 subi r24, 0x01 ; 1 4c8a: 9f 4f sbci r25, 0xFF ; 255 4c8c: 08 95 ret Wth this commit it gets translated to: 00004c40 <test>: 4c40: 28 2f mov r18, r24 4c42: 30 e0 ldi r19, 0x00 ; 0 4c44: 46 2f mov r20, r22 4c46: 50 e0 ldi r21, 0x00 ; 0 4c48: 86 17 cp r24, r22 4c4a: 20 f0 brcs .+8 ; 0x4c54 <test+0x14> 4c4c: c9 01 movw r24, r18 4c4e: 84 1b sub r24, r20 4c50: 95 0b sbc r25, r21 4c52: 08 95 ret 4c54: c9 01 movw r24, r18 4c56: 84 1b sub r24, r20 4c58: 95 0b sbc r25, r21 4c5a: 93 95 inc r25 4c5c: 08 95 ret There is not much performance improvement in this case, however at least with this commit it functions correctly. Note: The following commit will improve compiler output for the latter example. * tmk_core/common: Improve code generation for TIMER_DIFF* macros Because of integer promotion the compiler is having a hard time generating efficient code to calculate TIMER_DIFF* macros in some situations. In the below example, the return value is "int", and this is causing the trouble. Example C code: int __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer) { return TIMER_DIFF_8(current_timer, start_timer); } BEFORE: (with -Os) 00004c40 <test>: 4c40: 28 2f mov r18, r24 4c42: 30 e0 ldi r19, 0x00 ; 0 4c44: 46 2f mov r20, r22 4c46: 50 e0 ldi r21, 0x00 ; 0 4c48: 86 17 cp r24, r22 4c4a: 20 f0 brcs .+8 ; 0x4c54 <test+0x14> 4c4c: c9 01 movw r24, r18 4c4e: 84 1b sub r24, r20 4c50: 95 0b sbc r25, r21 4c52: 08 95 ret 4c54: c9 01 movw r24, r18 4c56: 84 1b sub r24, r20 4c58: 95 0b sbc r25, r21 4c5a: 93 95 inc r25 4c5c: 08 95 ret AFTER: (with -Os) 00004c40 <test>: 4c40: 86 1b sub r24, r22 4c42: 90 e0 ldi r25, 0x00 ; 0 4c44: 08 95 ret Note: the example is showing -Os but improvements can be seen at all optimization levels, including -O0. We never use -O0, but I tested it to make sure that no extra code is generated in that case.OA * quantum/debounce: Fix custom wrapping timers in eager_pr and eager_pk debounce algorithms Please see the below simulated sequence of events: Column A is the 16-bit value returned by read_timer(); Column B is the value returned by custom_wrap_timer_read(); Column C is the original code: (timer_read() % MAX_DEBOUNCE) A, B, C 65530, 19, 30 65531, 20, 31 65532, 21, 32 65533, 22, 33 65534, 23, 34 65535, 24, 35 0 25, 0 1, 26, 1 2, 27, 2 3, 28, 3 4, 29, 4 5, 30, 5 read_timer() wraps about every 1.09 seconds, and so debouncing might fail at these times without this commit. * quantum/debounce/eager_pr and eager_pk: modifications for code readability according to code review. * quantum/debounce/eager_pr and eager_pk: modifications for code readability according to code review. (2)
* tmk_core/common: Fixing TIMER_DIFF macro to calculate difference correctly after the timer wraps. Let's go through an example, using the following macro: If the first timer read is 0xe4 and the second one is 0x32, the timer wrapped. If the timer would have had more bits, it's new value would have been 0x132, and the correct difference in time is 0x132 - 0xe4 = 0x4e old code TIMER_DIFF_8(0x32, 0xe4) = 0xff - 0xe4 + 0x32 = 0x4d, which is wrong. new code TIMER_DIFF_8(0x32, 0xe4) = 0xff + 1 - 0xe4 + 0x32 = 0x4e, which is correct. This also gives a chance for a smart compiler to optimize the code using normal integer overflow. For example on AVR, the following C code: uint8_t __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer) { return TIMER_DIFF_8(current_timer, start_timer); } With the original code, it gets translated to the following list of instructions: 00004c6e <test>: 4c6e: 98 2f mov r25, r24 4c70: 86 1b sub r24, r22 4c72: 96 17 cp r25, r22 4c74: 08 f4 brcc .+2 ; 0x4c78 <test+0xa> 4c76: 81 50 subi r24, 0x01 ; 1 4c78: 08 95 ret But with this commit, it gets translated to a single instruction: 00004c40 <test>: 4c40: 86 1b sub r24, r22 4c42: 08 95 ret This unfortunately doesn't always work so nicely, for example the following C code: int __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer) { return TIMER_DIFF_8(current_timer, start_timer); } (Note: return type changed to int) With the original code it gets translated to: 00004c6e <test>: 4c6e: 28 2f mov r18, r24 4c70: 30 e0 ldi r19, 0x00 ; 0 4c72: 46 2f mov r20, r22 4c74: 50 e0 ldi r21, 0x00 ; 0 4c76: 86 17 cp r24, r22 4c78: 20 f0 brcs .+8 ; 0x4c82 <test+0x14> 4c7a: c9 01 movw r24, r18 4c7c: 84 1b sub r24, r20 4c7e: 95 0b sbc r25, r21 4c80: 08 95 ret 4c82: c9 01 movw r24, r18 4c84: 84 1b sub r24, r20 4c86: 95 0b sbc r25, r21 4c88: 81 50 subi r24, 0x01 ; 1 4c8a: 9f 4f sbci r25, 0xFF ; 255 4c8c: 08 95 ret Wth this commit it gets translated to: 00004c40 <test>: 4c40: 28 2f mov r18, r24 4c42: 30 e0 ldi r19, 0x00 ; 0 4c44: 46 2f mov r20, r22 4c46: 50 e0 ldi r21, 0x00 ; 0 4c48: 86 17 cp r24, r22 4c4a: 20 f0 brcs .+8 ; 0x4c54 <test+0x14> 4c4c: c9 01 movw r24, r18 4c4e: 84 1b sub r24, r20 4c50: 95 0b sbc r25, r21 4c52: 08 95 ret 4c54: c9 01 movw r24, r18 4c56: 84 1b sub r24, r20 4c58: 95 0b sbc r25, r21 4c5a: 93 95 inc r25 4c5c: 08 95 ret There is not much performance improvement in this case, however at least with this commit it functions correctly. Note: The following commit will improve compiler output for the latter example. * tmk_core/common: Improve code generation for TIMER_DIFF* macros Because of integer promotion the compiler is having a hard time generating efficient code to calculate TIMER_DIFF* macros in some situations. In the below example, the return value is "int", and this is causing the trouble. Example C code: int __attribute__ ((noinline)) test(uint8_t current_timer, uint8_t start_timer) { return TIMER_DIFF_8(current_timer, start_timer); } BEFORE: (with -Os) 00004c40 <test>: 4c40: 28 2f mov r18, r24 4c42: 30 e0 ldi r19, 0x00 ; 0 4c44: 46 2f mov r20, r22 4c46: 50 e0 ldi r21, 0x00 ; 0 4c48: 86 17 cp r24, r22 4c4a: 20 f0 brcs .+8 ; 0x4c54 <test+0x14> 4c4c: c9 01 movw r24, r18 4c4e: 84 1b sub r24, r20 4c50: 95 0b sbc r25, r21 4c52: 08 95 ret 4c54: c9 01 movw r24, r18 4c56: 84 1b sub r24, r20 4c58: 95 0b sbc r25, r21 4c5a: 93 95 inc r25 4c5c: 08 95 ret AFTER: (with -Os) 00004c40 <test>: 4c40: 86 1b sub r24, r22 4c42: 90 e0 ldi r25, 0x00 ; 0 4c44: 08 95 ret Note: the example is showing -Os but improvements can be seen at all optimization levels, including -O0. We never use -O0, but I tested it to make sure that no extra code is generated in that case.OA * quantum/debounce: Fix custom wrapping timers in eager_pr and eager_pk debounce algorithms Please see the below simulated sequence of events: Column A is the 16-bit value returned by read_timer(); Column B is the value returned by custom_wrap_timer_read(); Column C is the original code: (timer_read() % MAX_DEBOUNCE) A, B, C 65530, 19, 30 65531, 20, 31 65532, 21, 32 65533, 22, 33 65534, 23, 34 65535, 24, 35 0 25, 0 1, 26, 1 2, 27, 2 3, 28, 3 4, 29, 4 5, 30, 5 read_timer() wraps about every 1.09 seconds, and so debouncing might fail at these times without this commit. * quantum/debounce/eager_pr and eager_pk: modifications for code readability according to code review. * quantum/debounce/eager_pr and eager_pk: modifications for code readability according to code review. (2)
Description
This PR makes 3 main changes
For details please see each commit message in turn.
Types of Changes
Issues Fixed or Closed by This PR
Checklist
NOTE: I have tested the changes using a atmega32u4 devboard, with some synthetic tests. I have not tested with an actual keyboard.