Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve image decompression speed and handling #2414

Closed
matejcik opened this issue Jul 26, 2022 · 6 comments · Fixed by #2983
Closed

Improve image decompression speed and handling #2414

matejcik opened this issue Jul 26, 2022 · 6 comments · Fixed by #2983
Assignees
Labels
code Code improvements

Comments

@matejcik
Copy link
Contributor

comment thread from a different PR: #2339 (comment)

measurement results:

Some results from the testing:

  • the fastest is decompress as a whole, without window usage
  • slowest is to decompress byte by byte

This is where intuitive results end though. When decompressing by chunks, fastest seem to be using very small chunks of 2 or 4 bytes, after that, with increasing chunk size the speed actually goes down. I was testing this on two images, the best chunk size differ in each. I also tried comiling the uzlib with -O3 for one image, which sped up the decompressing quite nicely, but didn't change the dependence on the chunk size. Even when decompressing in one chunk when using window, its way slower than without window usage or small chunks.

Just to have some idea what increases we can get:

  • Decompressing by best chunk was 17% to 65% faster than per byte
  • Decompressing by 128 byte chunk was 7% to 40% faster than per byte.
  • Decompressing whole image without window was 70% to 110% faster than per byte.

This will differ a lot by image. Maybe testing on more images would bring better results.

So based on this, i would suggest using chunks of 4 bytes for decompression of large images (that means modify display_icon, display_image functions) and implementing new display_small_icon with some reasonable limits (loader using 64*64 so perhaps that) where we could do the whole decompression at once.

semantic issues:
B&W models (T1, TR) use display_icon to render the homescreen. The homescreen is certainly not an icon, and the important feature of "icon" is that its format is 4bpp alpha and so can be blended into different colored foregrounds/backgrounds -- unlike a "homescreen" which is an image at the display's native color resolution.
This is also incompatible with actual T1 homescreens which are stored as raw 1bpp data stream.

We probably need a separate function for rendering TR/T1 homescreens.
When we have that, we can limit display_icon accepted size: largest icons currently used are 24x24, so 576 pixels, or 288 bytes. That allows us to use window-less decompression into a static buffer.

Loaders are using a larger 64x64 center glyph, which is 4096 pixels or 2048 bytes. This could still be acceptable for window-less decompression.


possibly worth folding in: the new popups are using relatively large full-color icons. These could be split into smaller shape+glyph combinations: #2287 (comment)

@matejcik matejcik added the code Code improvements label Jul 26, 2022
@TychoVrahe
Copy link
Contributor

ad the 2287 issue: agree that splitting could be a way, though the shape is just circle and rounded triangle for which procedural implementation would make more sense. The glyph could then be greyscale, though it wouldn't fit the discussed 24*24 limit.

Maybe the display_image and display_icon functions are also named badly, wouldn't it be more fitting to call them display_16bpp_image and display_4bpp_image? The color is the real difference and neither does care about the size in current impl. Then it would actually make sense to also introduce display_1bpp_image function and small variants (either separate or hidden inside as discussed) with optimized decompression.

@matejcik
Copy link
Contributor Author

display_16bpp_image and display_4bpp_image?

That is an option -- but the real difference IMO is that display_icon takes fg and bg color and uses a blending table instead of just dumping pixel values on screen. I don't think we ever want to display a 4bpp image as-is, but we do want to display the T1/TR homescreen as-is.

FWIW, display_icon does essentially the same thing as rendering a single glyph in display_text_render right?

the shape is just circle and rounded triangle for which procedural implementation would make more sense.

could be, but would the procedural implementation actually be smaller? :)

@TychoVrahe
Copy link
Contributor

TychoVrahe commented Jul 26, 2022

ever want to display a 4bpp image as-is

yeah that doesn't seem to be very useful. but this blending could be similar to 1bpp image usage on TT, in T1/TR homescreens the FG and BG color are just implicit, and theoretically you could want them flipped for some reason

display_icon does essentially the same thing as rendering a single glyph in display_text_render right?

essentially, but the text has more complexity due to the bearing and it also handles the different BPP values internally (in compile time)

could be, but would the procedural implementation actually be smaller? :)

who knows without trying, but procedural impl of basic geometric shapes can be easily reused in different context, sizes etc, which, ultimately, should lead to smaller flash footprint

@TychoVrahe
Copy link
Contributor

One more thing to consider when optimizing decompression in animations like loader with icon is to do the decompression only once instead of during every frame. The decompressed data would(should) be stored on heap/gc allocated if used from rust.

@TychoVrahe TychoVrahe moved this to 🏃‍♀️ In progress in Firmware Aug 9, 2022
@TychoVrahe TychoVrahe self-assigned this Aug 9, 2022
@sime sime moved this from 🏃‍♀️ In progress to 🎯 To do in Firmware Oct 13, 2022
@matejcik
Copy link
Contributor Author

what remains to be done here? could we close, or transfer knowledge to Notion and then close?

@TychoVrahe
Copy link
Contributor

waiting for micropython update #2341 , after that i want to set optimization of the deflate function to -03.

@matejcik matejcik added the blocked Blocked by external force. Third party inputs required. label Jan 19, 2023
@Hannsek Hannsek removed the blocked Blocked by external force. Third party inputs required. label Apr 25, 2023
@TychoVrahe TychoVrahe moved this from 🎯 To do to 🏃‍♀️ In progress in Firmware Apr 25, 2023
@Hannsek Hannsek closed this as completed Apr 26, 2023
@Hannsek Hannsek reopened this Apr 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
code Code improvements
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

3 participants