Relax OS_Time IRQHandler #2833
537ccee to 4111f8e
Brilliant improvement for updatePrintTime, but why did you make void loopTouchScreen 2 to 3 times slower than before? Sabotage? |
Sorry, my sense of humor might be very different from yours, I really cannot decide it was meant as a joke or you are asking for real. |
I'm asking for real, your implementation is 2-3 times slower than before, the sabotage part is obviously a joke. |
Can you please share the benchmark you used to measure the performance decrease? |
Show me... I theoretically can see your implementation is much slower |
Show what?
Are we talking about these 2 lines? |
Yes, I'm talking about loopTouchScreen. |
How did you come to the conclusion that it is 2-3 times slower? Just to clarify: there's no sabotage intention, I spent quite some time trying to increase the performance of that function. |
Like I said, theoretically I can see that. But you said:
So the question is... how did you see performance improvement? |
I have the same question, how do you see performance decrease? |
There is "an issue" in your code, similar to the issue in "ADVANCED_OK", that stops it from being efficient. |
You said there's no need for benchmarks, but you just intrigued me. I will make some benchmarks and will come back with the results. It might take an hour or so. |
OK, take it easy. |
I am already on it. ;) |
How did you test? |
Actually, the two versions perform the same once compiled; the compiler does its job quite well. The difference above could be a rounding issue, as the timer counts microseconds (I did more tests and the results were identical for both code versions). The previous tests were done on a Cortex-M4 MCU; I have the results on a Cortex-M3 MCU also: I'm aware that it is harder to understand the code I wrote, and I have no issue undoing it and keeping the original code now that I know the compiler knows how to optimize it. This test was interesting for me; I wasn't aware that a Cortex-M4 MCU is so much faster than a Cortex-M3. |
There's no secret, I am Hungarian. You could already tell that since I made many additions/changes to the Hungarian language pack. |
My tests show that your loopTouchScreen is only 25% slower than the original code (STM32F207), not the 2-3 times I theoretically expected. Only code readability has decreased by a factor of 2-3. You can make loopTouchScreen a tiny bit faster by dropping the guard inversion, and swapping the normal and else case. This trick can also be used elsewhere in the code, but the benefit is very small.
|
What speed increase did your test show for your proposed code? |
A few percent only. Your modulo calculation replacement using a counter is faster. Usually you should count down if you do this, not up, but for modern compilers it might be irrelevant. For the TFT performance it has virtually no impact. So overall, the conclusions in this PR:
|
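The counter-based replacement for the modulo calculation mentioned above can be sketched roughly as follows. This is only an illustration, not the code from the PR: `PERIOD_MS`, `tick_modulo` and `tick_counter` are hypothetical names, and the period value is made up.

```c
#include <stdbool.h>
#include <stdint.h>

#define PERIOD_MS 10u   /* hypothetical period, not taken from the PR */

/* Modulo version: a remainder operation on every tick. */
static uint32_t ms_total = 0;

bool tick_modulo(void)
{
    ms_total++;
    return (ms_total % PERIOD_MS) == 0;
}

/* Counter version: counts up and resets, so no division is needed. */
static uint32_t ms_count = 0;

bool tick_counter(void)
{
    if (++ms_count == PERIOD_MS) {
        ms_count = 0;
        return true;
    }
    return false;
}
```

Both functions report `true` on every `PERIOD_MS`-th call; the second one trades the remainder operation for a compare and an occasional reset.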
Opinions are always welcome but taken with a grain of salt. |
03ba16b to 2bc45e4
Where is the opinion in this? I only see facts.
This is a fact, even your own tests show that at best loopTouchScreen has not become faster, only more unreadable. That your code is slower was obvious to me immediately when I saw it, although I overestimated the speed decrease that it causes. Look again at the code... it's obvious! If you still don't see it then I can show it to you.
Fact, you also agree to this above.
Very likely, or perhaps you came up with the exact same idea a day after @digant73 did
Fact, like I showed you above, you need to count down, not up to be faster.
Fact (+43 −293), but if you deny it I'm willing to take your side. Your code should not be merged in its current state! I don't understand why you keep "force pushing" a merge while being aware of the issues in your code. BTW: Force-pushing slower and less readable code is sabotage in my book |
Can somebody please review this code change force-pushed by @kisslorand? ORIGINAL CODE:
@kisslorand VERSION: (comments removed that hold the original version, I wonder why those were left in the code)
UPDATE: @kisslorand has seen the light and updated
Eliminating a variable ( |
Actually what I've seen is the rookie mistake I made benchmarking the original function and mine. In the benchmark I didn't use the variables from the functions benchmarked, the compiler optimized the code and eliminated the unused variables. After the realization of my mistake I redid the tests, the results are the following: Where "My new code" is:
"My old code" is:
"Ori code" is:
"Dumb code" is the same as "Ori code" but with the variables not being used; it shows how the compiler's optimization kicks in. So @rondlh was right, my old code was 20% slower than the original code (not 25% as he stated, but who knows, maybe my tests are still inaccurate). I have yet to verify the up/down count speed theory, although it is about the comparison to 0 (zero) and not the direction of the count. The specialized literature mentions that it was a thing in the 1980s and that modern compilers and MCUs do not exhibit this behaviour, but better to test it, right? Note: negative criticism is good as long as it is done in a civilized manner. GitHub also has this rule: "criticize the idea, not the person". |
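The benchmarking pitfall described above (results that are never used, so the optimizer removes the work) is commonly avoided by writing the result to a `volatile` object. A minimal host-side sketch, assuming standard C `clock()` instead of the MCU's microsecond timer; `work`, `bench` and `sink` are hypothetical names and the workload is a placeholder, not the function from the PR:

```c
#include <stdint.h>
#include <time.h>

/* Writing the result to a volatile object means the compiler has to
   produce it; without this, an optimizing build may drop the loop. */
static volatile uint32_t sink;

/* Hypothetical workload standing in for the function under test. */
uint32_t work(uint32_t x)
{
    return (x * 2654435761u) ^ (x >> 7);
}

/* Returns the elapsed CPU time, in clock() ticks, for `iterations` calls. */
clock_t bench(uint32_t iterations)
{
    uint32_t acc = 0;
    clock_t start = clock();

    for (uint32_t i = 0; i < iterations; i++) {
        acc += work(i);
    }
    sink = acc;   /* the accumulated result is observably used */

    return clock() - start;
}
```

Comparing two implementations then means calling `bench` once per implementation with the same iteration count, ideally several times, since a single run at microsecond resolution can be dominated by rounding.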
Up/Down count and comparison to zero results: Upcount: count from 0 to X The difference is clearly from the comparison to 0 (zero), not from the count direction. The results of @rondlh showing down count being faster than up count are probably because he tested a down count to 0 (zero). It is the comparison to 0 (zero) that makes down count faster. Anyhow, I can only see a 3% difference, not 15%, but in any case it is a free trick for speed. Every day we learn something... |
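The three loop shapes being compared can be sketched as below. The function names and the `offset` parameter are hypothetical; the point is only that the second and third loops count in the same direction but differ in whether the terminating comparison is against zero. Whether a compiler emits different code for them depends on the target and optimization level.

```c
#include <stdint.h>

/* Up-count: the loop condition compares against a non-zero bound. */
uint32_t sum_up(const uint8_t *buf, uint32_t n)
{
    uint32_t sum = 0;
    for (uint32_t i = 0; i < n; i++) {
        sum += buf[i];
    }
    return sum;
}

/* Down-count to zero: the decrement result is compared against 0. */
uint32_t sum_down_to_zero(const uint8_t *buf, uint32_t n)
{
    uint32_t sum = 0;
    for (uint32_t i = n; i != 0; i--) {
        sum += buf[i - 1];
    }
    return sum;
}

/* Down-count to a non-zero bound: same direction as above, but the
   loop condition compares against `offset` instead of 0. */
uint32_t sum_down_to_offset(const uint8_t *buf, uint32_t n, uint32_t offset)
{
    uint32_t sum = 0;
    for (uint32_t i = n + offset; i != offset; i--) {
        sum += buf[i - offset - 1];
    }
    return sum;
}
```

All three return the same sum, so timing them against each other isolates the loop control from the loop body.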
What do you aim to achieve by moving `TIMER_INTF(TIMER6) &= (uint16_t)~(1<<0);` to the back of the ISR? |
In many microcontroller architectures, including those that use the STM32 series with the STM32 HAL library, clearing the interrupt flag at the end of the ISR is the recommended and standard practice (check TIM3, the buzzer interrupt). This is to ensure that the ISR completes its main tasks before allowing the processor to respond to new interrupts, minimizing interrupt latency and potential issues with nested interrupts. |
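The two orderings under discussion can be sketched against a simulated register so that the code compiles on a host. `fake_timer_intf` stands in for `TIMER_INTF(TIMER6)`; `os_counter`, `UPDATE_FLAG` and both handler names are hypothetical and not taken from the firmware.

```c
#include <stdint.h>

/* Stand-in for the timer's interrupt flag register (assumption). */
static volatile uint16_t fake_timer_intf;
/* Stand-in for the work done by the OS time handler (assumption). */
static volatile uint32_t os_counter;

#define UPDATE_FLAG ((uint16_t)(1u << 0))

/* Variant 1: clear the update flag first, then do the work. */
void handler_clear_first(void)
{
    if (fake_timer_intf & UPDATE_FLAG) {
        fake_timer_intf &= (uint16_t)~UPDATE_FLAG;
        os_counter++;
    }
}

/* Variant 2: do the work first, clear the update flag last. */
void handler_clear_last(void)
{
    if (fake_timer_intf & UPDATE_FLAG) {
        os_counter++;
        fake_timer_intf &= (uint16_t)~UPDATE_FLAG;
    }
}
```

On a host the two variants are indistinguishable; any difference only shows up on real hardware, where it depends on how long the handler runs relative to the timer period.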
I also think it is irrelevant in this case because the ISR will terminate long before the next interrupt arrives. If you have experience with this kind of thing then perhaps you can have a look at the serial write DMA interrupt handler I'm testing. I still have incidental issues (typically 1 byte lost, not received by the host). It's built on top of #2824, not sure if it will work for the current source base. (STM32F2_4 platform only!). I have been testing this with the serial line idle interrupt disabled (manual update of wIndex in Serial_Get). That seems to work fine (still testing), but when I enable the serial line idle interrupt then I see some rare issues (1 byte lost, not received by the host). Note that I use the same ISR for the serial idle interrupt (reading) and the DMA serial writing (transfer complete interrupt). I don't use the DMA transfer complete interrupt because that is very complex, but probably it's the better choice. UPDATE: Nice to see you like it. I think the code is actually fine. I slightly improved it in a search to find an issue I am having. But the issue is not caused by the DMA transfer. I have some data corruption at very high print speeds (> 500%), sometimes a byte is missed at the host, about 1 byte in 6MB. But I see the same issue when using interrupt-based serial writing and unbuffered writing. Serial speed is at 250k baud, now I'm testing at 115200 baud to see if that solves the issue. |
#2824 now contains a performance benchmark that you can enable in Configuration.h (DEBUG_MONITORING). When enabled, in the notification menu you will find another button to bring you to the monitoring screen. The scan rate will show you how many times the loopProcess is executed. This way you can test the impact of your improvement easily. |
Thanks for the info, I did that. |
So you mean that this PR increases the scan rate by about 2%? |
Yes. |
What concrete numbers do you get? Before and after scan rates? What hardware do you use? |
The checking was done on a BTT TFT35 V3.0 with an STM32F207. The scan rate on average was:
|
I think you are doing something very wrong, nothing you state here makes any sense. |
It's the opinion of the same person who has the strong belief that counting down is faster than counting up. :)
A real professional argument brought to the table! :) I guess it's true what they say, a circus is not a circus without a clown. |
0f41e73 to 298df96
This PR has been automatically marked as stale because it has had no activity for the last 60 days. It will be closed in 7 days if no further activity occurs. Thank you for your contribution. |
Requirements
BTT or MKS TFT.
Description
This PR is an attempt to ease up the OS timer IRQ handler. Major credit goes to @rondlh who brought it to our attention here: #2832. More than that @rondlh made several suggestions to make the IRQ handler lighter on the MCU, suggestions that are included in this PR.
The extra changes are:
remove the usage of any conditional and ternary from one of the functions called in that interrupt handler and replace them with bare arithmetic operations
Benefits
In theory it speeds up the TFT.
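The kind of change described under "The extra changes" (a conditional or ternary replaced with bare arithmetic) can be illustrated as follows. This is a sketch with hypothetical function names, not the function touched by the PR; it relies only on the C rule that a comparison evaluates to 0 or 1.

```c
#include <stdint.h>

/* Conditional form: wrap the counter back to 0 when it reaches `limit`. */
uint32_t next_conditional(uint32_t counter, uint32_t limit)
{
    return (counter + 1 >= limit) ? 0 : counter + 1;
}

/* Arithmetic form: the comparison yields 0 or 1 and is used as a
   multiplier, so there is no ternary in the source. */
uint32_t next_arithmetic(uint32_t counter, uint32_t limit)
{
    uint32_t next = counter + 1;
    return next * (uint32_t)(next < limit);
}
```

Removing the ternary from the source does not guarantee faster machine code: the compiler may already compile the conditional form without a branch, and the benchmarks in the conversation found no gain from the arithmetic version.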