Relax OS_Time IRQHandler #2833

kisslorand · 2023-09-10T13:18:25Z

Requirements

BTT or MKS TFT.

Description

This PR is an attempt to ease up the OS timer IRQ handler. Major credit goes to @rondlh who brought it to our attention here: #2832. More than that @rondlh made several suggestions to make the IRQ handler lighter on the MCU, suggestions that are included in this PR.
The extra changes are:

remove modulo operation which is expensive and replace it with bare simple arithmetic ones
~~remove the usage of any conditional and ternary from one of the functions called in that interrupt handler and replaced it with bare arithmetic operations~~
moved the interrupt flag reset to the end of the operations to avoid any unpredictable behaviour of the IRQ handler

Benefits

In theory it speeds up the TFT.

rondlh · 2023-09-10T14:42:21Z

Brilliant improvement for updatePrintTime, but why did you make void loopTouchScreen 2 to 3 times slower than before? Sabotage?

kisslorand · 2023-09-10T15:00:42Z

Sorry, my sense of humor might be very different from yours, I really cannot decide it was meant as a joke or you are asking for real.

rondlh · 2023-09-10T15:04:11Z

Sorry, my sense of humor might be very different from yours, I really cannot decide it was meant as a joke or you are asking for real.

I'm asking for real, your implementation is 2-3 times slower than before, the sabotage part is obviously a joke.

kisslorand · 2023-09-10T15:05:31Z

Can you pleased share the benchmark you used to measure the performance decrease?
All I see is performance increase there.

rondlh · 2023-09-10T15:06:26Z

Can you pleased share the benchmark you used to measure the performance decrease?
All I see is performance increase there.

Show me... I theoretically can see your implementation is much slower

kisslorand · 2023-09-10T15:07:19Z

Show what?

  touch = !XPT2046_Read_Pen() * (touch + !touchScreenIsPress);
  touchScreenIsPress = !(TOUCH_DEBOUNCE_MS - touch);

Are we talking about these 2 lines?

rondlh · 2023-09-10T15:10:50Z

Yes, I'm talking about loopTouchScreen.
You say you see performance increase, what did you see/find? What test? Any numbers?

kisslorand · 2023-09-10T15:17:25Z

How did you come to the conclusion it is 2-3 slower?

Just to clarify: there's no sabotage intention, I spent quite some time to try to increase the performance of that function.

rondlh · 2023-09-10T15:20:22Z

How did you came to the conclusion it is 2-3 slower?

Like I said, theoretically I can see that. But you said:

All I see is performance increase there.

So the question is... how did you see performance improvement?

kisslorand · 2023-09-10T15:21:05Z

I have the same question, how do you see performance decrease?

rondlh · 2023-09-10T15:23:43Z

There is "an issue" in your code, similar to the issue in "ADVANCED_OK", that stops it from being efficient.

kisslorand · 2023-09-10T15:26:19Z

You said there's no need for benchmarks, but you just intrigued me. I will make some benchmarks and will come back with the results. It might take an hour or so.

rondlh · 2023-09-10T15:32:59Z

OK, take it easy.
May I ask what nationality you are? (I'm Dutch)

kisslorand · 2023-09-10T15:33:17Z

I am already on it. ;)

kisslorand · 2023-09-10T15:54:37Z

I have the results. :)

rondlh · 2023-09-10T16:06:10Z

How did you test?
Can you show me the code?

kisslorand · 2023-09-10T16:33:40Z

Actually the two codes compiled do perform the same, the compiler does its job quite well. The difference above could be a rounding issue as the timer is counting microseconds (I did more tests and the results were identical for both code versions). The previous test were done on a Cortex-M4 MCU, I have the results on a Cortex-M3 MCU also:

I'm aware that it is harder to understand the code I wrote, I have no issue undoing it and keeping the original code since I know now that the compiler knows how to optimize it.

This test was interesting for me, I wasn't aware that a Cortex-M4 MCU is so much faster than a Cortex-M3.

kisslorand · 2023-09-10T19:27:03Z

May I ask what nationality you are? (I'm Dutch)

There's no secret, I am Hungarian. You could already tell that since I made many additions/changes to the Hungarian language pack.

rondlh · 2023-09-11T01:39:44Z

My tests show that your loopTouchScreen is only 25% slower than the original code (STM32F207), not the 2-3 times I theoretically expected. Only code readability has decreased by a factor of 2-3.

You can make loopTouchScreen a tiny bit faster by dropping the guard inversion, and swapping the normal and else case. This trick can also be used elsewhere in the code, but the benefit is very small.
The DMA serial writing I have under test has significantly more impact.

Slightly faster code:

void loopTouchScreen(void)  // Handle in interrupt
{
  static uint8_t touch;
  if (XPT2046_Read_Pen())
  {
    touchScreenIsPress = false;
    touch = 0;
  }
  else
  {
    if (touch >= 20)  // 20ms
    {
      touchScreenIsPress = true;
    }
    else
    {
      touch++;
    }
  }
}

kisslorand · 2023-09-11T04:47:35Z

What speed increase did your test show for your proposed code?

rondlh · 2023-09-11T05:27:58Z

What speed increase did your test show for your proposed code?

A few percent only.

Your modulo calculation replacement using a counter is faster. Usually you should count down if you do this, not up, but for modern compilers it might be irrelevant. For the TFT performance it has virtually no impact.
Confirmed: Counting down is about 15% faster than counting up (only considering the counting part).

So overall, the conclusions in this PR:
Cons:

loopTouchScreen has become about 25% slower
loopTouchScreen is completely unreadable and unmaintainable
most of the changes are just copied from Added Marlin ADVANCED_OK + static ADVANCED_OK support #2824
Pros
TIM7_IRQHandler is a bit faster (still not optimal)
Some unused code was removed

kisslorand · 2023-09-11T19:59:05Z

Opinions are always welcome but taken with a grain of salt.

rondlh · 2023-09-12T13:46:38Z

Where is the opinion in this? I only see facts.

loopTouchScreen has become about 25% slower

This is a fact, even your own tests show that at best loopTouchScreen has not become faster, only more unreadable. That your code is slower was obvious to me immediately when I saw it, although I overestimated the speed decrease that it causes. Look again at the code... it's obvious! If you still don't see it then I can show it to you.

loopTouchScreen is completely unreadable and unmaintainable

Fact, you also agree to this above.

most of the changes are just copied from

Very likely, or perhaps you came up with the exact same idea a day after @digant73 did

TIM7_IRQHandler is a bit faster (still not optimal)

Fact, like I showed you above, you need to count down, not up to be faster.

Some unused code was removed

Fact (+43 −293), but if you deny it I'm willing to take your side.

Your code should not be merged in it's current state! I don't understand that you keep "force pushing" a merge while being aware of the issues in your code.

BTW: Force-pushing slower and less readable code is sabotage in my book

rondlh · 2023-09-12T14:15:34Z

Can somebody please review this code change force-pushed by @kisslorand?
Please consider execution speed and code readability. In my view it's obvious, but I still did a benchmark which confirmed my suspicions. This is a routine handling the touch screen presses of the TFT and is executed 1000x a second.

ORIGINAL CODE:

static volatile bool touchScreenIsPress = false;
void loopTouchScreen(void)  // Handle in interrupt
{
  static uint8_t touch;
  if (!XPT2046_Read_Pen())
  {
    if (touch >= 20)  // 20ms
    {
      touchScreenIsPress = true;
	  lastTouchTime = OS_GetTimeMs(); // IRON, ADDED TO RECORD TOUCH TIME
    }
    else
    {
      if (OS_GetTimeMs() - lastTouchTime > 250) // IRON, PREVENT TOO FAST TOUCHING
      touch++;
    }
  }
  else
  {
    touchScreenIsPress = false;
    touch = 0;
  }
}

@kisslorand VERSION: (comments removed that hold the original version, I wonder why those were left in the code)

static volatile bool touchScreenIsPress = false;
void loopTouchScreen(void)  // Handle in interrupt
{
  static uint8_t touch;
  touch = !XPT2046_Read_Pen() * (touch + !touchScreenIsPress);
  touchScreenIsPress = !(TOUCH_DEBOUNCE_MS - touch);
}

UPDATE: @kisslorand has seen the light and updated loopTouchScreen to:

void loopTouchScreen(void)  // Handle in interrupt
{
  if (XPT2046_Read_Pen() == LOW)
    touchCountdown -= !!touchCountdown;
  else
    touchCountdown = TOUCH_DEBOUNCE_MS;
}

Eliminating a variable (touchScreenIsPress) makes sense as it basically represented the same thing as touch was. It's a bit of a dirty trick with touchCountdown, but this should work fine and is a bit faster, good job.

kisslorand · 2023-09-13T20:22:44Z

@kisslorand has seen the light

Actually what I've seen is the rookie mistake I made benchmarking the original function and mine. In the benchmark I didn't use the variables from the functions benchmarked, the compiler optimized the code and eliminated the unused variables.
A lesson for everybody, you are not benchmarking the code you wrote, you are benchmarking the code the compiler generates!

After the realization of my mistake I redid the tests, the results are the following:

Where "My new code" is:

void loopTouchScreen(void)  // Handle in interrupt
{
  if (XPT2046_Read_Pen() == LOW)
    touchCountdown -= !!touchCountdown;
  else
    touchCountdown = TOUCH_DEBOUNCE_MS;
}

"My old code" is:

void loopTouchScreen(void)  // Handle in interrupt
{
  static uint8_t touch;
  touch = !XPT2046_Read_Pen() * (touch + !touchScreenIsPress);
  touchScreenIsPress = !(TOUCH_DEBOUNCE_MS - touch);
}

"Ori code" is:

void loopTouchScreen(void)  // Handle in interrupt
{
    static uint8_t touch;
    if (!XPT2046_Read_Pen())
    {
      if (touch >= 20)  // 20ms
      {
        touchScreenIsPress = true;
      }
      else
      {
        touch++;
      }
    }
    else
    {
      touchScreenIsPress = false;
      touch = 0;
    }
}

"Dumb code" is same as "Ori code" but variables not being used, it can be observed how the optimization of compiler the kicks in.

So @rondlh was right, my old code was 20% slower than the original code (not 25% as he stated, but who knows, maybe my tests are still inaccurate).

I have yet to verify the up/down count speed theory, although it is about the comparison to 0 (zero) and not the direction of the count. The specialized literature mentions that it was a thing in the 1980s but modern compilers and MCUs do not exhibit this behaviour but better test it, right?

Note: negative criticism is good as long as it is done in a civilized manner. Github also have this rule: "criticize the idea, not the person".

kisslorand · 2023-09-13T20:50:48Z

Up/Down count and comparison to zero results:

Upcount: count from 0 to X
Downcount: count from X + 10 down to 10
CompToZero: count from X down to 0

The difference is clearly from comparison to 0 (zero), not from the count direction. The results of @rondlh for down count being more faster than up count is probably because he tested a down count to 0 (zero). It is the comparison to 0 (zero) that makes down count faster. Anyhow I can only see a 3% difference, not 15%, but in any case it is a free trick for speed.

Everyday we learn something...

rondlh · 2023-09-14T19:16:30Z

What do you aim to achieve by moving `TIMER_INTF(TIMER6) &= (uint16_t)~(1<<0); to the back of the ISR?
Generally that's not the right thing to do, but I'm curios to hear your explanation.

kisslorand · 2023-09-14T19:54:15Z

In many microcontroller architectures, including those that use the STM32 series with the STM32 HAL library, clearing the interrupt flag at the end of the ISR is the recommended and standard practice (check TIM3, the buzzer interrupt). This is to ensure that the ISR completes its main tasks before allowing the processor to respond to new interrupts, minimizing interrupt latency and potential issues with nested interrupts.
In our case in reality it is irrelevant where's the interrupt reset done in the ISR for TIM7. I guess I did it out of habit.

rondlh · 2023-09-15T07:59:02Z

I also think it is irrelevant in this case because the ISR will terminate long before the next interrupt arrives.

If you have experience with this kind of thing then perhaps you can have a look at the serial write DMA interrupt handler I'm testing. I still have incidental issues (typically 1 byte lost, not received by the host). It's build on top of #2824, not sure if it will work for the current source base. (STM32F2_4 platform only!).
src-user-HAL-DMA Writing.zip
srt-user-API-SerialConnection.zip

I have been testing this with the serial line idle interrupt disabled (manual update of wIndex in Serial_Get). That seems to work fine (still testing), but when I enable the serial line idle interrupt then I see some rare issues (1 byte lost, not received by the host). Note that I use the same ISR for the serial idle interrupt (reading) and the DMA serial writing (transfer complete interrupt). I don't use the DMA transfer complete interrupt because that is very complex, but probably it's the better choice.
Any insight would be appreciate.

UPDATE: Nice to see you like it. I think the code is actually fine. I slightly improved it in a search to find an issue I am having. But the issue is not causes by the DMA transfer. I have some data corruption at very high print speeds (> 500%), sometimes a byte is missed at the host, about 1 byte in 6MB. But I see the same issue when using Interrupt based serial writing and unbuffered writing. Serial speed is at 250k Baud, now I'm testing at 115200 baud to see if that solves the issue.

rondlh · 2023-09-21T17:21:41Z

#2824 now contains a performance benchmark that you can enable in Configuration.h (DEBUG_MONITORING). When enabled, in the notification menu you will find another button to bring you to the monitoring screen. The scan rate will show you how many times the loopProcess is executed. This way you can test the impact of your improvement easily.

kisslorand · 2023-09-21T20:01:37Z

Thanks for the info, I did that.
The speed increase is at best 2% compared to the original code. I even tested modulo with 1024 (os_timer & 0x03ff == 0) which gave occasionally at best 3% speed increase. It's amazing how practice beats theory. :)
I think this PR can be entirely ignored given the new data.
Yes, the TIM7 is 10% faster than it was originally but in the big scheme of the entire FW it gives at best 2% speed increase.

rondlh · 2023-09-21T21:00:34Z

So you mean that this PR increases the scan rate by about 2%?

kisslorand · 2023-09-22T03:43:17Z

So you mean that this PR increases the scan rate by about 2%?

Yes.

rondlh · 2023-09-22T05:47:02Z

What concrete numbers do you get? Before and after scan rates? What hardware do you use?

kisslorand · 2023-09-22T06:16:03Z

The checking was done on a BTT TFT35 V3.0 with STM32F207

The scan rate on average was:

master branch -> 29400~29600
this PR -> 29900~30100

rondlh · 2023-09-22T08:39:18Z

So you mean that this PR increases the scan rate by about 2%?

Yes.

The checking was done on a BTT TFT35 V3.0 with STM32F207

The scan rate on average was:
1. master branch -> 29400~29600

2. this PR -> 29900~30100

I think you are doing something very wrong, nothing you state here makes any sense.
The speed improvement is a solid 0.0% (BTT TFT35 V3.0)
2% is complete non-sense, anybody can see that when looking at the tiny source code changes.

kisslorand · 2023-09-24T20:29:42Z

I think you are doing something very wrong, nothing you state here makes any sense.

It's the opinion of the same person who has the strong belief that counting down is faster than counting up. :)

2% is complete non-sense, anybody can see that when looking at the tiny source code changes.

A real professional argument brought to the table! :)

I guess it's true what they say, a circus is not a circus without a clown.

github-actions · 2024-04-20T01:48:44Z

This PR has been automatically marked as stale because it has had no activity for the last 60 days. It will be closed in 7 days if no further activity occurs. Thank you for your contribution.

kisslorand force-pushed the OS_Time branch 5 times, most recently from 537ccee to 4111f8e Compare September 10, 2023 13:53

rondlh mentioned this pull request Sep 10, 2023

[BUG] Buggy time handling code (doesn't consider timer rollover) #2832

Closed

kisslorand force-pushed the OS_Time branch 2 times, most recently from 03ba16b to 2bc45e4 Compare September 11, 2023 22:11

kisslorand force-pushed the OS_Time branch from 2bc45e4 to cdc59b5 Compare September 26, 2023 07:52

rondlh mentioned this pull request Sep 29, 2023

[Q] @bigtreetech Faulty/buggy PR #2844 by Kisslorand, PLEASE DO NOT MERGE #2847

Closed

kisslorand mentioned this pull request Oct 8, 2023

[FR] New Artillery not flashing LCD on GD32F305 #2391

Closed

kisslorand force-pushed the OS_Time branch 4 times, most recently from 0f41e73 to 298df96 Compare November 7, 2023 05:06

Relax OS_Time IRQHandler

db9c0d6

kisslorand force-pushed the OS_Time branch from 298df96 to db9c0d6 Compare November 7, 2023 19:17

github-actions bot added the Stale label Apr 20, 2024

Relax OS_Time IRQHandler #2833

Are you sure you want to change the base?

Relax OS_Time IRQHandler #2833

Conversation

kisslorand commented Sep 10, 2023 • edited Loading

Requirements

Description

Benefits

rondlh commented Sep 10, 2023

kisslorand commented Sep 10, 2023

rondlh commented Sep 10, 2023

kisslorand commented Sep 10, 2023

rondlh commented Sep 10, 2023

kisslorand commented Sep 10, 2023

rondlh commented Sep 10, 2023

kisslorand commented Sep 10, 2023 • edited Loading

rondlh commented Sep 10, 2023

kisslorand commented Sep 10, 2023

rondlh commented Sep 10, 2023

kisslorand commented Sep 10, 2023

rondlh commented Sep 10, 2023 • edited Loading

kisslorand commented Sep 10, 2023

kisslorand commented Sep 10, 2023

rondlh commented Sep 10, 2023

kisslorand commented Sep 10, 2023 • edited Loading

kisslorand commented Sep 10, 2023

rondlh commented Sep 11, 2023 • edited Loading

kisslorand commented Sep 11, 2023

rondlh commented Sep 11, 2023 • edited Loading

kisslorand commented Sep 11, 2023 • edited Loading

rondlh commented Sep 12, 2023 • edited Loading

rondlh commented Sep 12, 2023 • edited Loading

kisslorand commented Sep 13, 2023 • edited Loading

kisslorand commented Sep 13, 2023 • edited Loading

rondlh commented Sep 14, 2023

kisslorand commented Sep 14, 2023

rondlh commented Sep 15, 2023 • edited Loading

rondlh commented Sep 21, 2023

kisslorand commented Sep 21, 2023

rondlh commented Sep 21, 2023

kisslorand commented Sep 22, 2023

rondlh commented Sep 22, 2023 • edited Loading

kisslorand commented Sep 22, 2023

rondlh commented Sep 22, 2023

kisslorand commented Sep 24, 2023

github-actions bot commented Apr 20, 2024

kisslorand commented Sep 10, 2023 •

edited

Loading

kisslorand commented Sep 10, 2023 •

edited

Loading

rondlh commented Sep 10, 2023 •

edited

Loading

kisslorand commented Sep 10, 2023 •

edited

Loading

rondlh commented Sep 11, 2023 •

edited

Loading

rondlh commented Sep 11, 2023 •

edited

Loading

kisslorand commented Sep 11, 2023 •

edited

Loading

rondlh commented Sep 12, 2023 •

edited

Loading

rondlh commented Sep 12, 2023 •

edited

Loading

kisslorand commented Sep 13, 2023 •

edited

Loading

kisslorand commented Sep 13, 2023 •

edited

Loading

rondlh commented Sep 15, 2023 •

edited

Loading

rondlh commented Sep 22, 2023 •

edited

Loading