
[WIP] Support for encodings using floating-point values #197

Open
wants to merge 1 commit into master

Conversation

JeromeMartinez
Contributor

This pull request expands FFV1 to additional pixel formats by adding support for floating-point values. It permits lossless encoding and decoding of the floating-point “pix_fmt” values that currently exist in FFmpeg (AV_PIX_FMT_GBRPF32, AV_PIX_FMT_GBRAPF32, AV_PIX_FMT_GRAYF32) as well as their (not yet existing) 16-bit counterparts.
Video formats such as EXR can use floating-point values.

Note about the implementation in FFmpeg: as FFmpeg does not (yet) support 16-bit floating-point RGB pixel formats, I plan to send a patch for ffv1dec using exactly the same decoding method that FFmpeg uses for EXR (reusing the lossy float-to-integer conversion function from the EXR implementation after FFV1 decoding, with no encoding side), together with a message warning that the conversion makes the decoding lossy. The additional complexity is minimal (a test on colorspace_type in order to apply the float-to-int conversion after decoding).

Note about the YCbCr part: as FFmpeg has AV_PIX_FMT_GRAYF32, I prefer to anticipate support for such a pix_fmt in order to have a coherent specification, simply by increasing the colorspace_type value by 2 relative to each previously defined colorspace_type value.

Potential optimizations: in practice, for 16-bit content not all bits are used (bit 15 is the sign and so always 0, and bit 14 is set only for values greater than 1.0 and so is also always 0); however, in theory values can be negative or greater than 1 (see the AllHalfValues.exr description in https://github.com/AcademySoftwareFoundation/openexr-images/tree/master/TestImages), so we cannot simply omit these bits in the Parameters. Reducing the bit depth requires more complex changes (similar to how we could reduce the Y bit depth to bit_depth instead of bit_depth+1 with colorspace_type 1, as only Cb and Cr have a range of bit_depth+1 bits), which would be implemented in version 4. The idea of the version 3 implementation is to keep the decoder nearly untouched.
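For reference, the IEEE binary16 bit layout behind the “bit 15 / bit 14” remark can be sketched as follows (the macro names are ours and purely illustrative, not part of the proposal):

```c
/* IEEE binary16 layout:
 *   bit 15      sign
 *   bits 14..10 exponent (bias 15)
 *   bits  9..0  mantissa
 * The top exponent bit (bit 14) is set only for magnitudes of 2.0 and
 * above (and for Inf/NaN), so for typical [0.0, 1.0] content both bit 15
 * and bit 14 are always 0. */
#define HALF_SIGN(h)     (((h) >> 15) & 0x1)
#define HALF_EXPONENT(h) (((h) >> 10) & 0x1F)
#define HALF_MANTISSA(h) ((h) & 0x3FF)
```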

This link has some sample files as a proof of concept (the colorspace_type value is wrong in both the DPX header and the FFV1 bitstream, but it permits decoding the floating-point numbers with current FFmpeg; the FFmpeg patches are hacks that move the float-to-int algorithm from the EXR decoder to the FFV1 decoder).

@michaelni
Member

Note about the implementation in FFmpeg: as FFmpeg does not (yet) support 16-bit floating-point RGB pixel formats, I plan to send a patch for ffv1dec using exactly the same decoding method that FFmpeg uses for EXR (reusing the lossy float-to-integer conversion function from the EXR implementation after FFV1 decoding, with no encoding side), together with a message warning that the conversion makes the decoding lossy. The additional complexity is minimal (a test on colorspace_type in order to apply the float-to-int conversion after decoding).

Please don't. Instead, add a pixel format. If adding support in some part like swscale is too hard, then just skip this, but do not add more hacks like a down-conversion to 16-bit integers.

Note about the YCbCr part: as FFmpeg has AV_PIX_FMT_GRAYF32, I prefer to anticipate support for such a pix_fmt in order to have a coherent specification, simply by increasing the colorspace_type value by 2 relative to each previously defined colorspace_type value.

colorspace_type is the wrong field to indicate float formats.
Why do you not add a new field? It will not decode with existing decoders anyway.

Also, I would not drop the "integer" from the transform; whatever is done, it needs to be an integer transform, else precision and rounding become an issue with floats and losslessness.
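A minimal illustration (our own example, not from the spec) of that precision issue: a float prediction residual may round, so adding it back to the prediction does not recover the original sample, whereas the same round trip in integer arithmetic is always exact.

```c
#include <stdio.h>

int main(void)
{
    float sample     = 1.0f;
    float prediction = 100000000.0f;          /* exactly representable in binary32 */
    float residual   = sample - prediction;   /* -99999999 rounds to -100000000 */
    float decoded    = prediction + residual; /* gives 0.0f, not the original 1.0f */

    printf("decoded = %f\n", decoded);        /* prints 0.000000 */
    return 0;
}
```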

@JeromeMartinez
Contributor Author

Please don't. Instead, add a pixel format.

I was planning to split the work in order to reduce the size of the corresponding patch, but I am fine with adding 16-bit float support in FFmpeg at the same time if I don't have to do the swscale work as well, so:

If adding support in some part like swscale is too hard then just skip this

That makes the patch easier; I was worried about this task and about having my patch rejected if I didn't do it, which is why I was planning to do it the way EXR does.

Why do you not add a new field?

I don't get that; how? IIUC, adding a new field will make old decoders just skip it, which is not what we want. Using colorspace_type is precisely to prevent old decoders from decoding the bitstream as integer values.
Please hint at where you would add this new field while still having the bitstream rejected by older decoders.

Also, I would not drop the "integer" from the transform; whatever is done, it needs to be an integer transform, else precision and rounding become an issue with floats and losslessness.

The idea is definitely the opposite: I'll keep the word and add a "MUST consider float values as integers" somewhere else.

@michaelni
Member

Why do you not add a new field?

I don't get that; how? IIUC, adding a new field will make old decoders just skip it, which is not what we want. Using colorspace_type is precisely to prevent old decoders from decoding the bitstream as integer values.
Please hint at where you would add this new field while still having the bitstream rejected by older decoders.

Does a version 3 decoder decode it to something meaningful?
If not, it is not a version 3 feature, so it can be added only to later versions.

Version 4 is not final yet, so any decoder attempting to decode it accepts potential failure.
Anyone implementing a decoder can choose not to decode not-yet-final versions, to avoid attempting to decode a file and then failing.
We could bump versions more often to get finer-grained behavior; I am not sure whether that is a good idea or not.

If you want to add a method of more fine-grained "file support" detection, that is fine and would not be a bad thing. But please don't hack new features into semantically wrong fields.

@michaelni
Member

Potential optimizations: in practice, for 16-bit content not all bits are used (bit 15 is the sign and so always 0, and bit 14 is set only for values greater than 1.0 and so is also always 0); however, in theory values can be negative or greater than 1 (see the AllHalfValues.exr description in https://github.com/AcademySoftwareFoundation/openexr-images/tree/master/TestImages), so we cannot simply omit these bits in the Parameters.

Instead of one float flag in the header, we could add fields that specify the number of bits in the mantissa, whether there is a sign bit, and the range of exponents (how far above 1 and how much detail around 0). This would be a superset covering single- and double-precision IEEE floats, and it should also improve speed, as fewer "always 0" planes would be stored.
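As an illustration only (the struct and field names are hypothetical, not part of any FFV1 draft), the suggested Parameters extension could look roughly like this:

```c
typedef struct FloatSampleParams {
    unsigned mantissa_bits; /* e.g. 10 for binary16, 23 for binary32, 52 for binary64 */
    unsigned has_sign_bit;  /* 0 when all samples are known to be non-negative */
    int      exponent_min;  /* lowest exponent in use: how much detail around 0 */
    int      exponent_max;  /* highest exponent in use: how far above 1.0 values go */
} FloatSampleParams;
```

An encoder could then skip the "always 0" planes implied by an unused sign bit and unused exponent range.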

@retokromer
Contributor

I strongly support the idea of implementing floating-point formats in version 4 only, avoiding a "hack" for version 3, and doing it consistently from scratch.

@JeromeMartinez JeromeMartinez changed the title Support for encodings using floating-point values [WIP] Support for encodings using floating-point values May 25, 2020
@michaelni
Member

Any update on this?
I can maybe work on this if no one else has.

@richardpl

16-bit float to 32-bit float conversion in the EXR decoder with tables is not lossy, it's lossless: no data is lost.
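A sketch of why the promotion is lossless: a binary32 significand (24 bits) and exponent (8 bits) comfortably hold a binary16 significand (11 bits) and exponent (5 bits), so every half value maps to exactly one float value. The normal-number case looks like this (subnormals, Inf and NaN need the usual extra handling omitted here; this is our own illustration, not the EXR decoder's table-based code):

```c
#include <stdint.h>
#include <string.h>

static float half_to_float_normal(uint16_t h)
{
    uint32_t sign     = (uint32_t)(h >> 15) << 31;
    uint32_t exponent = (((uint32_t)(h >> 10) & 0x1F) - 15 + 127) << 23; /* re-bias 15 -> 127 */
    uint32_t mantissa = ((uint32_t)h & 0x3FF) << 13;                     /* 10 -> 23 bits */
    uint32_t bits     = sign | exponent | mantissa;
    float    f;
    memcpy(&f, &bits, sizeof f); /* bit-exact reinterpretation */
    return f;
}
```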

@JeromeMartinez
Contributor Author

I don't have any specific update on this; on our side we went with an awful but working hack, treating EXR 16-bit float as integer and signaling EXR 16-bit float through a side channel (compression is in practice like 16-bit integer, 50% compression on average), but I am still interested in moving to something less hacky.

In my opinion the discussion is more about how to signal the pix_fmt, and maybe about avoiding compressing all-zero bit planes, rather than about having a complex new path for compressing and decompressing. It could also be reused for integers, as sometimes some bits are zero padding or the content is very dark (so the highest bit is always 0 in a slice).

State of my thoughts about a superset of changes for v4:

  • Keep the SliceContent spec untouched: no additional complexity in this part without an argument that it would help compression specifically for float.
  • Consider the bit depth in ConfigurationRecord as the maximum bit depth, plus just an integer/float flag (or something more precise, with a mantissa bit count or any other hint about mapping 16-bit values to 32-bit values, etc.), for the decoded output configuration.
  • Whatever the type (so including integer: no specific code depending on the pixel type, we keep the bitstream simple), a hint in SliceHeader about the count of higher and lower bits that are 0 in all pixels, and no compression of these bits.

The rationale is that we don't know the content in advance; it could for example be negative or greater than 1 in only one frame at the end, so we don't rule out this possibility at encoder and decoder init, but we permit a speed optimization by limiting the count of bits handled by the range coder. On the decoder side it is only an extra bit shift when the lower bits are all 0, and nothing at all for the higher bits.
In practice for float this permits a 15-bit (or less) encoding for 99.999% of frames (the ones with non-negative content; frames with negative values are so uncommon that it is not useful to optimize for them), so 16 bits suffice after RCT (no need for 32-bit intermediate storage), and sometimes it may also help for integers (sometimes 16-bit intermediate storage only, and the possibility of letting the range coder "overflow", so sometimes better compression, to be demonstrated).
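A hypothetical encoder-side sketch of that SliceHeader idea (function name and signature are illustrative only): OR all samples of a slice together, then count the high and low bit positions that are 0 in every sample, so those planes can be signalled and skipped; the decoder then only needs a bit shift.

```c
#include <stdint.h>
#include <stddef.h>

static void count_zero_planes(const uint16_t *samples, size_t count,
                              int *zero_high_bits, int *zero_low_bits)
{
    uint16_t accumulator = 0;
    for (size_t i = 0; i < count; i++)
        accumulator |= samples[i];

    if (!accumulator) {           /* all-zero slice: report everything as high bits */
        *zero_high_bits = 16;
        *zero_low_bits  = 0;
        return;
    }

    int high = 0;                 /* leading bit positions that are 0 in all samples */
    while (!(accumulator & (0x8000u >> high)))
        high++;

    int low = 0;                  /* trailing bit positions that are 0 in all samples */
    while (!(accumulator & (1u << low)))
        low++;

    *zero_high_bits = high;
    *zero_low_bits  = low;
}
```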

What other optimizations do you see for this topic?

@michaelni
Member

In my opinion the discussion is more about how to signal the pix_fmt, and maybe about avoiding compressing all-zero bit planes, rather than about having a complex new path for compressing and decompressing. It could also be reused for integers, as sometimes some bits are zero padding or the content is very dark (so the highest bit is always 0 in a slice).

I think there are 3 different things here.

  1. signaling floats (16, 32, 64 bit)
  2. improvements in how symbols are compressed, which may apply to both integers and floats
  3. a totally new coder for floats

We can do 1+2 and treat each independently, or look at 3 first and then decide.

The rationale is that we don't know the content in advance; it could for example be negative or greater than 1 in only one frame at the end, so we don't rule out this possibility at encoder and decoder init, but we permit a speed optimization by limiting the count of bits handled by the range coder. On the decoder side it is only an extra bit shift when the lower bits are all 0, and nothing at all for the higher bits. In practice for float this permits a 15-bit (or less) encoding for 99.999% of frames (the ones with non-negative content; frames with negative values are so uncommon that it is not useful to optimize for them), so 16 bits suffice after RCT (no need for 32-bit intermediate storage), and sometimes it may also help for integers (sometimes 16-bit intermediate storage only, and the possibility of letting the range coder "overflow", so sometimes better compression, to be demonstrated).

What other optimizations do you see for this topic?

We have at least 3 things:

  1. RCT
  2. quantization
  3. predictor

All 3 are wrong for floats, in the sense of not being "homomorphic". That is, if you take a few integers and a few floats that are equivalent in some sense, then these operations do not do the same thing to both.
You can see this if you consider that the predictor's output will always increase by x if all its inputs are offset by x in integers. But you will not see this effect in floats; it will be more chaotic.
We should test the correct corresponding operations to better understand their performance difference before simply using the "wrong" integer ones.
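A small self-contained illustration (our own, not from the discussion above) of that point about offsets: adding the same constant to two float values does not move their bit patterns by the same amount, so a predictor operating on float bit patterns loses the regular behavior it has on integers.

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

static uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

int main(void)
{
    /* The same +0.5 offset moves the bit pattern of 1.0 by 0x00400000,
     * but the bit pattern of 2.0 by only 0x00200000. */
    printf("0x%08X 0x%08X\n",
           (unsigned)(float_bits(1.5f) - float_bits(1.0f)),
           (unsigned)(float_bits(2.5f) - float_bits(2.0f)));
    return 0;
}
```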

@richardpl

Treating floats as integers when compressing? I doubt one can get any big compression gain that way; at least in the audio case it is bad...

@JeromeMartinez
Contributor Author

Treating floats as integers when compressing? I doubt one can get any big compression gain that way; at least in the audio case it is bad...

I cannot share the files, but here is a real use case from a RAWcooked user (non-relevant things removed, same FFV1 config with v3 and 576 slices):

$ rawcooked 00000.exr
$ rawcooked 00000.exr.mkv
$ lzma --keep --extreme 00000.exr
$ ffmpeg -i 00000.exr -c:v ffv1 -slices 576 00000.rgb48.mkv
$ ls -l
49801411 00000.exr
20294634 00000.exr.mkv
23972454 00000.exr.lzma
49801411 00000.RAWcooked.exr
29595221 00000.rgb48.mkv

00000.exr.mkv uses a hack handling float as int, then FFV1.
00000.rgb48.mkv is FFmpeg (lossily) converting float to int, then FFV1, as an example of how the content is encoded as integer (I know, it is not the same content; I am just saying that I would not like to convert to int and then compress...).
00000.RAWcooked.exr is the result of reverting from FFV1 back to EXR with RAWcooked, and is bit-by-bit identical to 00000.exr.
00000.exr.lzma is for reference, showing what a (very slow) generic compressor, one of the best, can do; usually FFV1 produces something a bit smaller than this on 10- or 16-bit integer content, so FFV1's behavior with float as int is really similar to its behavior with int.

TL;DR: FFV1 compresses this file by 60%! More generally we see an average compression ratio like that, better than our 16-bit int results (easy, lots of MSBs at 0... but still good!), which are ~50% compression.

Users greatly appreciate this compression ratio and prefer it to storing EXR files as is, and I doubt we could really do a lot better without a lot of changes in FFV1; the current issue is not the compression ratio but the fact that there is no standard signaling of float.

@michaelni
Member

Treating floats as integers when compressing? I doubt one can get any big compression gain that way; at least in the audio case it is bad...

I cannot share the files, but here is a real use case from a RAWcooked user (non-relevant things removed, same FFV1 config with v3 and 576 slices):

I think we should switch to files that can be shared.

00000.exr.mkv uses a hack handling float as int, then FFV1. 00000.rgb48.mkv is FFmpeg (lossily) converting float to int, then FFV1, as an example of how the content is encoded as integer (I know, it is not the same content; I am just saying that I would not like to convert to int and then compress...). 00000.RAWcooked.exr is the result of reverting from FFV1 back to EXR with RAWcooked, and is bit-by-bit identical to 00000.exr. 00000.exr.lzma is for reference, showing what a (very slow) generic compressor, one of the best, can do; usually FFV1 produces something a bit smaller than this on 10- or 16-bit integer content, so FFV1's behavior with float as int is really similar to its behavior with int.

TL;DR: FFV1 compresses this file by 60%! More generally we see an average compression ratio like that, better than our 16-bit int results (easy, lots of MSBs at 0... but still good!), which are ~50% compression.

Users greatly appreciate this compression ratio and prefer it to storing EXR files as is, and I doubt we could really do a lot better without a lot of changes in FFV1; the current issue is not the compression ratio but the fact that there is no standard signaling of float.

There's a chance FFV1 maintenance work this year, and especially development of float support, will be funded. If that's the case I intend to investigate more completely how to optimally handle floats. The variant of simply treating them as integers isn't bad, and I suggest we support that too as it adds zero complexity, but I agree with Paul that it should be possible to do better than that.

@richardpl

Are these real 32-bit float EXR files, with natural (camera footage) content and with synthetic (Blender-rendered) non-trivial content? EXR just has bad lossless 32-bit float compression, IIRC.

If the current/future coder in FFV1 can make extra reductions with mantissa and exponent bits (with no need for separate coding of the two), that would be a major win.

@JeromeMartinez
Contributor Author

I think we should switch to files that can be shared.

I wish I had that... And I am interested in such files, because it seems very hard to get such content; I developed my hack "blind" and it was enough for my needs.

In the meantime, some non-real-world 16-bit float test cases, e.g.:

  • filesamples.com: -44% with float, -40% with lossy conversion to int, -58% with lzma (well... it is a synthetic picture... relatively classic, and the gain there is good to have, but it is so slow...).
  • ACES_ODT_SampleFrames has ~100 different frames with different content, but none of it seems to be real footage: -33%.

FYI, tests are made with this ugly patch for FFmpeg, and lossless compression is confirmed when the added option is used.

as it adds zero complexity, but I agree with Paul that it should be possible to do better than that.

It would be great to have a demonstration that the additional complexity is worth it; I like FFV1 also because of its "low" complexity (very small code size compared to some other lossless formats).

@retokromer
Contributor

retokromer commented Feb 13, 2024

I used as a starting point: https://openexr.com/en/latest/_test_images/index.html

Still, I will check whether some of our clients are willing to share examples publicly.
