Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

added improvements to color scaling and blurring #3904

Merged
merged 3 commits into from
Apr 16, 2024

Conversation

DedeHai
Copy link
Collaborator

@DedeHai DedeHai commented Apr 15, 2024

  • made blurring faster by not writing the color and then reading it back but keeping it as a variable: on a C3, FX black hole goes from 55FPS to 71FPS
  • added optional parameter to blur function (smear) that can be used in combination with SEGMENT.clear(), blurring the frame without dimming the current display (repeated calls without clearing will flood the segment). this is useful to blur without 'motion blurring' being added.
  • scale8 is inlined and repeated calls uses flash, plus it is slower than native 32bit, so I added 'color_scale' function which is native 32bit and scales 32bit colors (RGBW).
  • bonus: changes save roughly 600bytes of flash memory

-changes save roughly 600bytes of flash
-made blurring faster by not writing the color and then reading it back but keeping it as a variable: on a C3, FX black hole goes from 55FPS to 71FPS
-added optional parameter to blur (smear) that can be used in combination with SEGMENT.clear(), blurring the frame without dimming the current frame (repeated calls without clearing will result in white). this is useful to blur without 'motion blurring' being added
-scale8 is inlined and repeated calls uses flash, plus it is slower than native 32bit, so I added 'color_scale' function which is native 32bit and scales 32bit colors (RGBW).
Copy link
Collaborator

@blazoncek blazoncek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I approve of the changes with minor issues regarding formatting and unnecessary modifications.

I would avoid introducing new function color_scale() when color_fade() is intended to do the same. If you find the fade misleading (as opposed to scale) then please change the original function name instead. There are only a handful of references used, but I would recommend against it.

wled00/FX_2Dfcn.cpp Outdated Show resolved Hide resolved
wled00/FX_2Dfcn.cpp Outdated Show resolved Hide resolved
wled00/FX_2Dfcn.cpp Outdated Show resolved Hide resolved
wled00/FX_fcn.cpp Outdated Show resolved Hide resolved
wled00/FX_fcn.cpp Outdated Show resolved Hide resolved
wled00/colors.cpp Outdated Show resolved Hide resolved
wled00/fcn_declare.h Outdated Show resolved Hide resolved
@DedeHai
Copy link
Collaborator Author

DedeHai commented Apr 16, 2024

Thanks for the review (and for spotting that bug in fx_fcn, it should indeed be 255, leftover from testing). I will fix the unnecessary changes, that was my auto-format plugin that sometimes does this if I am not careful. The new function should indeed be integrated into color_fade, this resulting code is what came from a lot of code changes that did not make it into the final version.

also replaced scale8_video with 32bit calculation in color_fade for consistency and speed.
Undo indent
@blazoncek blazoncek merged commit d126611 into Aircoookie:0_15 Apr 16, 2024
18 checks passed
@blazoncek
Copy link
Collaborator

FYI I do not see any performance gain on ESP32 or ESP8266.

@DedeHai
Copy link
Collaborator Author

DedeHai commented Apr 16, 2024

You can test it using 'Black Hole' FX. Here are my test result:
ESP C3, 16x16: 55FPS -> 71FPS
ESP8266 (160MHz), 16x16: 29FPS -> 33FPS
ESP32, one output: 33FPS (32x16) or 62FPS (16x16) -> 55FPS (32x16) or 76FPS (16x16)
except on the ESP8266 this is a significant improvement.

@blazoncek
Copy link
Collaborator

blazoncek commented Apr 16, 2024

Well, in my case it is:

  • 46 FPS 20x20 ESP32
  • 55 FPS 16x16 ESP32-C3
  • 43 FPS 16x16 ESP32-S2
  • 58 FPS 16x16 ESP32-S3

Limit was set to 60FPS.

softhack007 pushed a commit to MoonModules/WLED-MM that referenced this pull request Apr 18, 2024
added improvements to color scaling and blurring
@softhack007
Copy link
Collaborator

softhack007 commented Apr 21, 2024

FYI I do not see any performance gain on ESP32 or ESP8266.

@DedeHai thanks for this PR 😃 I've also merged it into the MoonModules fork. It does give me the performance gain you describe, at least on larger fixtures (tested with 32x32 on 4 led pins). The main speedup comes from your optimizations of 2D blur, looks like color_add only has a minor part in the speedup. Good improvement, as blur() and fade_out() are indeed on a very critical path👍

@DedeHai
Copy link
Collaborator Author

DedeHai commented Apr 22, 2024

@softhack007 you are welcome. I discovered this 'insufficiency' in my particle system ventures where I noticed that accessing getPixelColorXY() and setPixelColorXY() are quite slow due to the many checks (and conversions) that need to be performed and blurring did it twice per pixel which I changed to only once per pixel, making it much faster.
The other changes give minor speed boosts, removing access to scale8() is not that much faster but uses way less code (scale8 is inlined each time). I did not change color_add() but color_fade(), saving some CPU cycles on all color fade functions.
there are many other places where the fastled functions that are optimized for 8bit ATMEGAs could be replaced with native 32bit functions which would save some more code but it is probably not worth the effort.
Further speed improvements could be made by rendering blur/fade to a local RGB buffer and then transferring that to the outputbuffer (which I am doing in the ParticleSystem, roughly cutting blur computation time in half).

@softhack007
Copy link
Collaborator

softhack007 commented Apr 22, 2024

Further speed improvements could be made by rendering blur/fade to a local RGB buffer and then transferring that to the outputbuffer (which I am doing in the ParticleSystem, roughly cutting blur computation time in half).

@DedeHai I think this is already happening in 0_15 if you set "Use global LED buffer" in LED settings - Pixels are rendered into a buffer, and GetPixelColor will return what's in the buffer. The complete buffer is transferred to the NPB driver on busses.show() in one shot.

🤔 Maybe conversions and several layers of abstractions - that happen between Segment::setPixelColor() and BusDigital::setPixelColor() - are still slowing down the process, so you see a 2X speedup when adding your own buffer on top...


(off-topic)

just for comparison, you could try to run your particle code in the MoonModules fork
We are a few months behind the mainstream "Aircoookie" code, however we have a number of experimental optimizations to improve speed on larger setups. A few differences you would need to consider:

  • disable global LED buffer, to allow buffering on segment level
  • your effects must explicitly request buffering with if (SEGENV.call == 0) SEGENV.setUpLeds();
  • use a buildenv like env:esp32_4MB_S which uses optimization for speed (i.e. "gcc -O2" instead of "gcc -Os").
  • back up your cfg.json and presets.json - some MM builds have a different partition table, so your old filesystem might get erased.
  • experiment with enabling this line - it prevents re-writing pixels with the same color (kind of dangerous, and some effects will not like it)

I would be interested to know if the MM code performs better for you. maybe we find the pieces that could be brought back into "upstream AC" to further optimize speed.

@blazoncek
Copy link
Collaborator

FYI a future version of NPB does away with double/tripple buffering and GetPixelColor() will return unmodified original SetPixelColor() value which will remove the need for global LED buffer. I am in talks with @Makuna for testing that. There are several optimisations in NPB as a result of this transition.

If we reintroduce my infamous setUpLeds() is debatable and quite possible as a remedy for numerous translations along the way of setPixelColor(). It will not be CRGB though.

@DedeHai
Copy link
Collaborator Author

DedeHai commented Apr 22, 2024

Maybe conversions and several layers of abstractions - that happen between Segment::setPixelColor() and BusDigital::setPixelColor() - are still slowing down the process, so you see a 2X speedup when adding your own buffer on top...

the issue is not the buffers themselves but the checking that setPixelColor() (rightfully) does. Using a local buffer with a known size can get away without all those checks and just straight forward copy the data, which is a lot faster. the blur function acts on a known segment size and could safely do so.

@blazoncek
Copy link
Collaborator

Using a local buffer with a known size can get away without all those checks and just straight forward copy the data, which is a lot faster. the blur function acts on a known segment size and could safely do so.

That's what setUpLeds() was all about. But it was incompatible with Harry Baas segment set-up so it had to go.

@Makuna
Copy link

Makuna commented Apr 22, 2024

Maybe conversions and several layers of abstractions - that happen between Segment::setPixelColor() and BusDigital::setPixelColor() - are still slowing down the process, so you see a 2X speedup when adding your own buffer on top...

the issue is ... the checking that setPixelColor() (rightfully) does.

the Pixels() method returns a pointer to buffer directly, You can use that to bypass the range checking.

@blazoncek
Copy link
Collaborator

blazoncek commented Apr 22, 2024

@Makuna we have translation from Segment (a virtual canvas) into pixel strip. Segments can overlap and can also encompass inexistent pixels so quite a few translations have to be made to set/get exact pixel.

EDIT: @DedeHai overlapping Segments will wreck havoc if you do not blend local buffers correctly. Which you can't from within Segment environment (effect function).

@DedeHai
Copy link
Collaborator Author

DedeHai commented Apr 23, 2024

overlapping Segments will wreck havoc if you do not blend local buffers correctly. Which you can't from within Segment environment (effect function).

how so? OR: how are overlapping segments handled in general? What I am wondering: with overlapping segments setPixelColor() will overwrite what the segment put there or is there a check for that and it will blend?

@blazoncek
Copy link
Collaborator

Segments are individual and do not share any data between themselves - except actual LED color returned by getPixelColor() if they overlap and non-local LED buffer is used.

Non-local LED buffer is either global buffer or NPB's edit buffer. The problem with (current) NPB's edit buffer is the modified LED data if brightness is set to anything other than 255 (or 65535 for 16bit) as when you apply setBrightness() every setPixelColor() is scaled according to brightness and stored in edit buffer as such. When retrieved with getPixelColor() it is no longer equal to what was set with setPixelColor(). That's why global buffer was introduced.

setUpLeds() introduced local LED buffer (prior to introducing global buffer) so that effect that relied on previous pixel values worked correctly (several Audioreactive/FastLED effects). With local buffer every Segment was totally independent from any other Segment. This was removed when global buffer was added as Harry Baas segment set-up didn't work with local buffer. This was a demand from SR team (and MM fork implemented several of their own solutions to tackle this).

How overlapping segments behave (with non-local buffer) very much depends on how the effects are written. Some will take into account previous/underlying pixel values some will not. So you can expect anything from only one effect showing to uncontrolled flickering and since segments are stored in a vector their order may get shifted when vectors are reallocated in memory (though that should be rare the case).

@DedeHai
Copy link
Collaborator Author

DedeHai commented Apr 23, 2024

Ok, I think I understand how it's done. I just tested overlapping segments and FX that use additive colors do overlap nicely and FX that use setPixelColor just overwrite the lower numbered segment.
I don't see a good solution that would always work except using a local persistent buffer per segment and then blending the overlapping segments by adding the colors before transferring it to the NPB or global buffer. At least for RGB this would work but since I do not have a full picture on all the variation you already tried this may just be what you had with setUpLeds().

Edit: was just thinking if I could somehow make the particle system render on overlapping segments and I think it is currently not possible. It relies on a black frame and each particle will add its color to it, so I cannot use getPixelColor to blend with other FX. A global rendering function would have to take care of blending the segments, as described above.

@Makuna
Copy link

Makuna commented Apr 23, 2024

A global rendering function would have to take care of blending the segments, as described above.

Out of band question. When you render your buffers are you thinking it would be multiple being blended into a destination? How many source buffers? And the destination buffer by something like a GetPixelColor? I was just experimenting with a two source buffers for a render (NeoPixelBus terms which are similar to PC game graphics terms), and was considering how many sources would be useful.

@blazoncek
Copy link
Collaborator

blazoncek commented Apr 23, 2024

FastLED implemented + operator (and +=) to simplify blending of two colors (it just added respective RGB channels) so many FastLED examples will exhibit something like:

CRGB leds[200];
CRGB col1 = CRGB::RED;
CRGB col2 = CRGB::GREEN;
CRGB yellow = col1 + col2;
...
leds[100] = CRGB::BLUE;
leds[100] += yellow;

So the + operator was actually SetPixelColor(x, blend(GetPixelColor(x), somenewcolor)); in NPB terms. Where blend() would be a function that somehow combines two RgbxColor objects.

Currently in WLED we don't do any special blending (though I did have a branch that blended two local buffers) as Segments are rendered one after the other and effect functions are all over the place regarding how they treat existing pixels (some will fill(BLACK), etc). As said, mostly FastLED ported effects relied on previous pixel values but not exclusively.

@Makuna
Copy link

Makuna commented Apr 23, 2024

Thanks, enough with my side bar conversations.
JFYI: In NPB, blending is more flexible in the amounts of each source to blend by supplying a progress value.

    static RgbColor LinearBlend(const RgbColor& left, const RgbColor& right, float progress);
   static RgbColor LinearBlend(const RgbColor& left, const RgbColor& right, uint8_t progress);

I wonder if that snippet you provided is a bilinear blend?

    static RgbColor BilinearBlend(const RgbColor& c00, 
        const RgbColor& c01, 
        const RgbColor& c10, 
        const RgbColor& c11, 
        float x, 
        float y);

@DedeHai DedeHai deleted the FX_fcn_improvements branch May 14, 2024 14:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants