-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BC4/5 fixes and performance improvements #18
base: master
Are you sure you want to change the base?
Conversation
This fixes richgel999#17 but goes further, since it provides higher accuracy for other blocks with few values. Two value blocks are special-cased to use the two endpoints. An early out is taken when the error reaches zero.
As 16-bit we couldn't accumulate the worst-case error without overflowing. Also fixed a bug whereby the values6 were truncated to 8-bit, therefore mostly favouring values8. The return from encode_bc4_hq() is now scaled to the same range from before the changes.
9526d5b
to
72b8ea0
Compare
Thank you, this looks very valuable. Have you tested these changes on a large amount of content to verify the output encoding hasn't changed? That's one of my primary concerns, initially. |
I'll try to find the time to throw a few thousand grayscale and normal maps at it and verify the error metrics and times. The encoded output may differ (e.g. two-value BC4 will always use selectors |
I have some initial results. I wrote this (rather sprawling) test runner to verify everything: https://gist.github.com/cwoffenden/98780e9009a2d4f62433ea9f77ef4113 You can give it a directory of PNGs and it'll compress them then collect the metrics in a CSV file. For example:
This ran the BC4 encoder on 450-ish greyscale files and recorded the max error, RMSE and PSNR (but ignored the time, just so I could do a quick diff). Here are the results: the original and changed code. The RMSE and PSNR don't change (probably not enough digits) but the max error does, in an interesting way. There are five differences in this set of files, with four of the five having a lower-by-one max error in the new code. It's interesting because it highlights a potential accidental improvement which I'll look at in the week (better selection of the best block). I'll cover the processing time later when I've thrown more files at it (short version: it's faster, about 20% average when fed 100s of normal maps). On Mac it doesn't build with OpenMP (it's not supported out of the box) so I want to wait until I'm back at work to test on other OSes. I can share the test files with you so you can verify if you like? I have a classifier go through internal projects and pull out different texture types. |
I ran the same on approx 1400 other greyscale files and recorded two more from them all where the max error is lower in the changed code. CSV files here. It's totally accidental that it swings this way, since I've seen a few normal maps where the max lower is in the original code. It's to do with taking the summed error and calling the lowest value the best, rather than looking at which equal summed errors have lower averages or maximums. Specifically here: Line 2859 in e6990bc
|
This fixes #17 but goes further:
Lots of text snipped, jump down to the next paragraph. Originally this expanded the internal endpoints to 14-bit, but in testing the RMSE and PSNR were always slightly worse even though the max error was reduced. These errors were higher due being calculated from the 8-bit PNG file, not the hardware's representation. Ryg's blog entry has a good explanation of the hardware.
I simplified this commit to address the main issue, which was blocks with two (or few) values having errors in hardware due to one endpoint always being interpolated (which doesn't occur with an 8-bit software decoder). This is achieved by starting the search radius at zero and working outwards (0, -1, 1, -2, 2, etc.). Further, once we have zero error we take this block as the best available and exit early.
This fixes the original issue, keeps the max error, RMSE and PSNR exactly the same, and improves performance. Some timings, using the default
-hr5
radius:Original code:
This commit:
All timings were from the best of four runs. The biggest improvement was in normal maps since there are large areas with 2-3 values hovering around 127, and since the search radius is now growing outwards these are found early on.