BC4/5 fixes and performance improvements #18

cwoffenden · 2022-09-04T17:23:57Z

This fixes #17 but goes further:

Lots of text snipped; jump down to the next paragraph. Originally this expanded the internal endpoints to 14-bit, but in testing the RMSE and PSNR were always slightly worse even though the max error was reduced. These errors were higher because they were calculated from the 8-bit PNG file, not from the hardware's representation. Ryg's blog entry has a good explanation of the hardware.

I simplified this commit to address the main issue, which was that blocks with two (or a few) values had errors in hardware due to one endpoint always being interpolated (this doesn't occur with an 8-bit software decoder). This is achieved by starting the search radius at zero and working outwards (0, -1, 1, -2, 2, etc.). Further, once we have zero error, we take that block as the best available and exit early.
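A minimal sketch of the outward search with early exit, for illustration only. It assumes both endpoints are varied independently around an initial guess and that a caller-supplied `block_error(lo, hi)` returns the summed error for that endpoint pair; `radius_search_order`, `refine_endpoints` and `EndpointResult` are hypothetical names, not the rgbcx code.

```cpp
#include <cstdint>
#include <vector>

// Offsets in the order 0, -1, 1, -2, 2, ..., -radius, radius.
std::vector<int> radius_search_order(int radius)
{
    std::vector<int> order{ 0 };
    for (int r = 1; r <= radius; r++)
    {
        order.push_back(-r);
        order.push_back(r);
    }
    return order;
}

struct EndpointResult
{
    int lo, hi;    // candidate endpoints (0..255, lo <= hi)
    uint32_t err;  // summed block error for this pair
};

// Hypothetical refinement loop (not the rgbcx code): tries every
// (lo + dl, hi + dh) pair within the radius, nearest offsets first, keeps the
// lowest error, and stops as soon as a pair with zero error is found.
template <typename ErrFn>
EndpointResult refine_endpoints(int lo, int hi, int radius, ErrFn block_error)
{
    const std::vector<int> order = radius_search_order(radius);
    EndpointResult best{ lo, hi, UINT32_MAX };
    for (int dl : order)
    {
        for (int dh : order)
        {
            const int l = lo + dl, h = hi + dh;
            if (l < 0 || h > 255 || l > h)
                continue; // outside the 8-bit range or endpoints crossed
            const uint32_t err = block_error(l, h);
            if (err < best.err)
            {
                best = { l, h, err };
                if (err == 0)
                    return best; // early out: nothing beats zero error
            }
        }
    }
    return best;
}
```

With a radius of 5 this order has 11 offsets per endpoint, so at most 121 pairs are tried; a block whose zero-error pair sits close to the initial guess stops after only a handful of them.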

This fixes the original issue, keeps the max error, RMSE and PSNR exactly the same, and improves performance. Some timings, using the default -hr5 radius:

| Format | Image | Original code: encoding (secs) | Original code: processing (secs) | This commit: encoding (secs) | This commit: processing (secs) |
|---|---|---|---|---|---|
| BC4 | flowers-2048x2048 | 0.599 | 0.656 | 0.476 | 0.534 |
| BC4 | quenza-2048x2048 | 0.825 | 0.883 | 0.725 | 0.784 |
| BC5 | bunny-nmap-2048x2048 | 0.446 | 0.510 | 0.214 | 0.271 |
| BC5 | can-nmap-2048x2048 | 0.342 | 0.398 | 0.212 | 0.268 |

All timings were from the best of four runs. The biggest improvement was in the normal maps: they have large areas with only 2-3 values hovering around 127, and since the search radius now grows outwards, the zero-error endpoints for these blocks are found early on.
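The relative gains can be checked with a little arithmetic. This sketch divides the original encoding time by the new one for each image; the numbers are copied from the timings above, and the struct and function names are illustrative.

```cpp
#include <cstdio>

// Encoding times in seconds, copied from the timings above.
struct EncodeTiming
{
    const char* image;
    double original_secs;
    double this_commit_secs;
};

const EncodeTiming kTimings[] = {
    { "flowers (BC4)",    0.599, 0.476 },
    { "quenza (BC4)",     0.825, 0.725 },
    { "bunny-nmap (BC5)", 0.446, 0.214 },
    { "can-nmap (BC5)",   0.342, 0.212 },
};

// How many times faster the new encode is (original / new).
double speedup(const EncodeTiming& t)
{
    return t.original_secs / t.this_commit_secs;
}

void print_speedups()
{
    for (const EncodeTiming& t : kTimings)
        std::printf("%-18s %.2fx\n", t.image, speedup(t));
}
```

Calling `print_speedups()` prints 1.26x and 1.14x for the two BC4 images and 2.08x and 1.61x for the BC5 normal maps, consistent with the larger gain on normal maps.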

This fixes richgel999#17 but goes further, since it provides higher accuracy for other blocks with few values. Two-value blocks are special-cased to use the two endpoints. An early out is taken when the error reaches zero.
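A minimal sketch of the two-value special case, assuming a 4x4 block of 8-bit values and the standard BC4 rule that selector 0 decodes to endpoint 0 and selector 1 to endpoint 1 in both modes. `Bc4Block` and `encode_two_value_block` are hypothetical names, not the rgbcx API, and packing the selectors into the block's 48-bit index field is left out.

```cpp
#include <array>
#include <cstdint>
#include <optional>

struct Bc4Block
{
    uint8_t e0, e1;               // endpoint 0 and endpoint 1
    std::array<uint8_t, 16> sel;  // one 3-bit selector per texel (unpacked)
};

// Hypothetical helper: if the 4x4 block contains at most two distinct values,
// store them directly as the endpoints. Selector 0 decodes to e0 and selector 1
// to e1 in both BC4 modes, so no texel relies on an interpolated entry.
// Returns std::nullopt when there are three or more values.
std::optional<Bc4Block> encode_two_value_block(const std::array<uint8_t, 16>& px)
{
    uint8_t lo = px[0], hi = px[0];
    for (uint8_t v : px)
    {
        if (v < lo) lo = v;
        if (v > hi) hi = v;
    }
    for (uint8_t v : px)
    {
        if (v != lo && v != hi)
            return std::nullopt; // a third value: fall back to the full search
    }

    Bc4Block out{};
    out.e0 = hi; // e0 > e1 (8-value mode), or e0 == e1 for a flat block
    out.e1 = lo;
    for (int i = 0; i < 16; i++)
        out.sel[i] = (px[i] == hi) ? 0 : 1;
    return out;
}
```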

At 16 bits we couldn't accumulate the worst-case error without overflowing. Also fixed a bug whereby the values6 were truncated to 8-bit, so the search mostly favoured values8. The return value from encode_bc4_hq() is now scaled to the same range as before the changes.
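To make the palette modes and the overflow concrete, here is a sketch under simplifying assumptions: 8-bit endpoints and texels (not the 16-bit palette on this branch), round-to-nearest integer interpolation (hardware may evaluate the palette differently), and summed squared error as the block error. In BC4, `e0 > e1` gives eight palette entries (six interpolated), while `e0 <= e1` gives six entries plus fixed 0 and 255, which is presumably what `values8` and `values6` refer to. `bc4_palette` and `block_sse` are hypothetical helpers.

```cpp
#include <array>
#include <cstdint>

// BC4 palette for endpoints e0 and e1, using 8-bit round-to-nearest integer
// interpolation (an assumption: hardware may evaluate these differently).
//   e0 >  e1: e0, e1 and six interpolated values         (presumably "values8")
//   e0 <= e1: e0, e1, four interpolated values, 0, 255   (presumably "values6")
std::array<uint8_t, 8> bc4_palette(uint8_t e0, uint8_t e1)
{
    std::array<uint8_t, 8> p{};
    p[0] = e0;
    p[1] = e1;
    if (e0 > e1)
    {
        for (int i = 1; i <= 6; i++)
            p[i + 1] = static_cast<uint8_t>(((7 - i) * e0 + i * e1 + 3) / 7);
    }
    else
    {
        for (int i = 1; i <= 4; i++)
            p[i + 1] = static_cast<uint8_t>(((5 - i) * e0 + i * e1 + 2) / 5);
        p[6] = 0;
        p[7] = 255;
    }
    return p;
}

// Summed squared error of a 4x4 block for the given 3-bit selectors.
// Worst case: 16 * 255 * 255 = 1,040,400, well above the 65,535 a 16-bit
// accumulator can hold, so the sum is kept in 32 bits.
uint32_t block_sse(const std::array<uint8_t, 16>& px,
                   const std::array<uint8_t, 16>& sel,
                   const std::array<uint8_t, 8>& pal)
{
    uint32_t err = 0;
    for (int i = 0; i < 16; i++)
    {
        const int d = int(px[i]) - int(pal[sel[i] & 7]);
        err += static_cast<uint32_t>(d * d);
    }
    return err;
}
```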

richgel999 · 2022-09-05T14:51:42Z

Thank you, this looks very valuable. Have you tested these changes on a large amount of content to verify the output encoding hasn't changed? That's one of my primary initial concerns.

cwoffenden · 2022-09-05T17:44:48Z

I'll try to find the time to throw a few thousand grayscale and normal maps at it and verify the error metrics and times. The encoded output may differ (e.g. two-value BC4 will always use selectors 0 and 1 instead of a single endpoint plus an interpolation) but the decoded output at 8-bit should be the same, so I could hash the decoded PNG. I don't think I'll be able to do that in the next few weeks though.
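One way to do the hash comparison: decode each output to 8-bit pixels, hash the buffer, and record the hash next to the metrics so that runs of the original and changed encoders can be diffed. This sketch uses 64-bit FNV-1a, a simple non-cryptographic hash; decoding the PNG or DDS is left out, and `fnv1a64` is an illustrative name.

```cpp
#include <cstdint>
#include <vector>

// 64-bit FNV-1a hash of a decoded 8-bit pixel buffer.
uint64_t fnv1a64(const std::vector<uint8_t>& pixels)
{
    uint64_t h = 14695981039346656037ull;  // FNV-1a 64-bit offset basis
    for (uint8_t b : pixels)
    {
        h ^= b;
        h *= 1099511628211ull;             // FNV 64-bit prime
    }
    return h;
}
```

Each FNV-1a step (XOR a byte, multiply by an odd prime) is invertible modulo 2^64, so two equal-length buffers that differ in a single byte always hash differently; larger differences could in principle collide, which is acceptable for a regression check.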

cwoffenden · 2022-10-02T14:54:59Z

I have some initial results. I wrote this (rather sprawling) test runner to verify everything:

https://gist.github.com/cwoffenden/98780e9009a2d4f62433ea9f77ef4113

You can give it a directory of PNGs and it'll compress them then collect the metrics in a CSV file. For example:

```
./runbc7enc.py -b 4 -o /Volumes/Temp -x ./bc7enc -l nfproj-grey-orig.csv -t /Volumes/Work/Assets/Test/Numfum/grey
```

This ran the BC4 encoder on 450-ish greyscale files and recorded the max error, RMSE and PSNR (but ignored the time, just so I could do a quick diff). Here are the results: the original and changed code.
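The three metrics the runner collects can be computed as follows for two equal-sized 8-bit greyscale images. These are the standard definitions (PSNR = 20 * log10(255 / RMSE)); bc7enc's own reporting may differ in details such as rounding or how channels are combined, and `compare_grey` is a hypothetical helper, not part of the gist.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <limits>
#include <vector>

struct ErrorMetrics
{
    int max_err;  // largest absolute per-texel difference
    double rmse;  // root-mean-square error
    double psnr;  // 20 * log10(255 / rmse), infinite for identical images
};

// Compares two equal-sized 8-bit greyscale images (hypothetical helper).
ErrorMetrics compare_grey(const std::vector<uint8_t>& orig,
                          const std::vector<uint8_t>& decoded)
{
    assert(orig.size() == decoded.size());
    int max_err = 0;
    double sum_sq = 0.0;
    for (std::size_t i = 0; i < orig.size(); i++)
    {
        const int d = std::abs(int(orig[i]) - int(decoded[i]));
        if (d > max_err)
            max_err = d;
        sum_sq += double(d) * d;
    }
    const double rmse = orig.empty() ? 0.0 : std::sqrt(sum_sq / orig.size());
    const double psnr = rmse > 0.0 ? 20.0 * std::log10(255.0 / rmse)
                                   : std::numeric_limits<double>::infinity();
    return { max_err, rmse, psnr };
}
```

Because RMSE is averaged over every texel in the image, a handful of blocks whose max error drops by one barely moves RMSE or PSNR at the printed precision, which is consistent with only the max error changing in these results.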

The RMSE and PSNR don't change (probably not enough digits) but the max error does, in an interesting way. There are five differences in this set of files, with four of the five having a lower-by-one max error in the new code. It's interesting because it highlights a potential accidental improvement which I'll look at in the week (better selection of the best block).

I'll cover the processing time later when I've thrown more files at it (short version: it's faster, by about 20% on average when fed hundreds of normal maps). On Mac it doesn't build with OpenMP (it's not supported out of the box), so I want to wait until I'm back at work to test on other OSes.

I can share the test files with you if you'd like to verify the results. I have a classifier that goes through internal projects and pulls out different texture types.

cwoffenden · 2022-10-02T15:47:41Z

I ran the same test on approximately 1400 other greyscale files and found two more where the max error is lower in the changed code. CSV files here.

It's totally accidental that it swings this way, since I've seen a few normal maps where the max error is lower in the original code. It comes from taking the summed error and calling the lowest value the best, rather than looking at which of the equal summed errors have lower averages or maximums. Specifically here:

bc7enc_rdo/rgbcx.cpp, line 2859 in e6990bc:

```cpp
if (trial_err < best_err)
```

trial_err needs further refinement.
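One possible refinement, sketched under assumptions: keep the summed error as the primary comparison in the `if (trial_err < best_err)` test, and only on an exact tie prefer the candidate with the lower maximum per-texel error. `BlockError` and `is_better` are hypothetical, and this would require tracking the max error alongside the sum. Since every candidate covers the same 16 texels, equal summed errors already imply equal averages of the same quantity, so the maximum is the useful tie-breaker.

```cpp
#include <cstdint>

struct BlockError
{
    uint32_t sum;  // summed error over the block (the role trial_err plays)
    uint32_t max;  // largest single-texel error (would need to be tracked too)
};

// Summed error decides first; on an exact tie, the lower max error wins.
bool is_better(const BlockError& trial, const BlockError& best)
{
    if (trial.sum != best.sum)
        return trial.sum < best.sum;
    return trial.max < best.max;
}
```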

cwoffenden added 3 commits September 4, 2022 18:45

BC4/5 fixes and performance improvements

457c74f


Simplified 2-value search, removed 8-bit expansion

72b8ea0

cwoffenden force-pushed the 16-bit-palette+shortcuts branch from 9526d5b to 72b8ea0 Compare September 5, 2022 14:08

Match original style

280fe72
