Document performance caveats of RGB image formats versus RGBA #79771

Calinou · 2023-07-21T23:19:40Z

For context, @lyuma ran this benchmark on a 4096×4096 texture:

func _process(delta):
	var time_start: int = Time.get_ticks_usec()
	image_texture.update(image)
	var time_end: int = Time.get_ticks_usec()
	print(str(0.001 * (time_end - time_start)) + " ms")

These are the results they got depending on the Image format used:

Format	Time taken to update
RF	21ms
RGF	48ms
RGBF	650ms
RGBAF	100ms

RH	10ms
RGH	20ms
RGBH	800ms
RGBAH	48ms

R8	6.0ms
RG8	12ms
RGB8	51ms
RGBA8	21ms

Counterintuitively, the more memory-intensive RGBA formats always outperform the RGB format by a significant margin (by a 5× factor on average on this test). This is due to memory alignment, as RGB uses 3 components whereas other formats use 1, 2 or 4 components (powers of 2).

Considering how important the difference is, I think it's better to add notes everywhere it's relevant, even if it's a bit redundant.

See GDExtension: passing Image to ImageTexture with ImageTexture::update is too slow #76994.

torcado194 · 2023-07-22T07:35:31Z

Most of these changes say "Converting to [constant FORMAT_RGBA8] is slow, ... consider using [constant FORMAT_RGBA8]", instead of "[constant FORMAT_RGB8] is slow". Is this correct or am I misunderstanding?

aaronfranke · 2023-07-22T17:41:41Z

doc/classes/Image.xml

@@ -107,6 +107,7 @@
 			<param index="0" name="format" type="int" enum="Image.Format" />
 			<description>
 				Converts the image's format. See [enum Format] constants.
+				[b]Note:[/b] Converting to [constant FORMAT_RGBA8] is slow, as it is not aligned to 1, 2 or 4 bytes. If you need to frequently convert an image, consider using [constant FORMAT_RGBA8], or better, [constant FORMAT_RG8] or [constant FORMAT_R8] if possible. The same applies to [constant FORMAT_RGBAH] and [constant FORMAT_RGBAF] versus their RGB/RG/R counterparts.


Suggested change

[b]Note:[/b] Converting to [constant FORMAT_RGBA8] is slow, as it is not aligned to 1, 2 or 4 bytes. If you need to frequently convert an image, consider using [constant FORMAT_RGBA8], or better, [constant FORMAT_RG8] or [constant FORMAT_R8] if possible. The same applies to [constant FORMAT_RGBAH] and [constant FORMAT_RGBAF] versus their RGB/RG/R counterparts.

[b]Note:[/b] Converting to [constant FORMAT_RGB8] is slow, as it is not aligned to 1, 2 or 4 bytes. If you need to frequently convert an image, consider using [constant FORMAT_RGBA8], or for images without a blue channel, [constant FORMAT_RG8] or [constant FORMAT_R8] if possible. The same applies to [constant FORMAT_RGBH] and [constant FORMAT_RGBF] versus their RGBA/RG/R counterparts.

I think we can at least do without the bit-alignment piece of knowledge.

Suggested change

[b]Note:[/b] Converting to [constant FORMAT_RGBA8] is slow, as it is not aligned to 1, 2 or 4 bytes. If you need to frequently convert an image, consider using [constant FORMAT_RGBA8], or better, [constant FORMAT_RG8] or [constant FORMAT_R8] if possible. The same applies to [constant FORMAT_RGBAH] and [constant FORMAT_RGBAF] versus their RGB/RG/R counterparts.

[b]Note:[/b] Converting to [constant FORMAT_RGB8] is [i]very[/i] slow. If you need to frequently convert an image, consider using [constant FORMAT_RGBA8], [constant FORMAT_RG8], or [constant FORMAT_R8], which are faster. The same applies to [constant FORMAT_RGBH] and [constant FORMAT_RGBF] compared to their RGBA/RG/R counterparts.

Or

Suggested change

[b]Note:[/b] Converting to [constant FORMAT_RGBA8] is slow, as it is not aligned to 1, 2 or 4 bytes. If you need to frequently convert an image, consider using [constant FORMAT_RGBA8], or better, [constant FORMAT_RG8] or [constant FORMAT_R8] if possible. The same applies to [constant FORMAT_RGBAH] and [constant FORMAT_RGBAF] versus their RGB/RG/R counterparts.

[b]Note:[/b] Converting to [constant FORMAT_RGB8] is [i]very[/i] slow. If you need to frequently convert an image, consider using [constant FORMAT_RGBA8]. If possible, [constant FORMAT_RG8] or [constant FORMAT_R8] are faster. The same applies to [constant FORMAT_RGBH] and [constant FORMAT_RGBF] compared to their RGBA/RG/R counterparts.

Or similar.

I feel like we can make the text both informative and user-friendly by taking your suggestion but prepending "Due to memory alignment reasons," or appending "This slowness happens due to memory alignment reasons."

clayjohn · 2023-07-22T18:10:36Z

Before documenting this, I'd really like to see a performance analysis of what exactly is slow. I don't see an image conversion in the texture update codepath. We need to understand if the slowness is in the GPU driver (in which case we need to document it) or if the slowness is from something we are doing (in which case we may be able to fix the underlying issue)

bitsawer · 2023-07-22T19:00:52Z

@clayjohn I did some digging during #74238 when we noticed that sometimes custom image mipmaps were wiped out in the renderer/storage: #66848 (comment) There are a few places in texture storage where images are converted, for example this one (same also for FORMAT_RGBF and FORMAT_RGBH):

godot/servers/rendering/renderer_rd/storage_rd/texture_storage.cpp

Lines 1467 to 1481 in 1c1524a

    
           case Image::FORMAT_RGB8: { 
        
           	//this format is not mandatory for specification, check if supported first 
        
           	if (false && RD::get_singleton()->texture_is_format_supported_for_usage(RD::DATA_FORMAT_R8G8B8_UNORM, RD::TEXTURE_USAGE_SAMPLING_BIT | RD::TEXTURE_USAGE_CAN_UPDATE_BIT) && RD::get_singleton()->texture_is_format_supported_for_usage(RD::DATA_FORMAT_R8G8B8_SRGB, RD::TEXTURE_USAGE_SAMPLING_BIT | RD::TEXTURE_USAGE_CAN_UPDATE_BIT)) { 
        
           		r_format.format = RD::DATA_FORMAT_R8G8B8_UNORM; 
        
           		r_format.format_srgb = RD::DATA_FORMAT_R8G8B8_SRGB; 
        
           	} else { 
        
           		//not supported, reconvert 
        
           		r_format.format = RD::DATA_FORMAT_R8G8B8A8_UNORM; 
        
           		r_format.format_srgb = RD::DATA_FORMAT_R8G8B8A8_SRGB; 
        
           		image->convert(Image::FORMAT_RGBA8); 
        
           	} 
        
           	r_format.swizzle_r = RD::TEXTURE_SWIZZLE_R; 
        
           	r_format.swizzle_g = RD::TEXTURE_SWIZZLE_G; 
        
           	r_format.swizzle_b = RD::TEXTURE_SWIZZLE_B; 
        
           	r_format.swizzle_a = RD::TEXTURE_SWIZZLE_ONE;

And a few other places:

godot/servers/rendering/renderer_rd/storage_rd/texture_storage.cpp

Lines 1267 to 1271 in 1c1524a

    
           Ref<Image> image = Image::create_from_data(tex->width, tex->height, tex->mipmaps > 1, tex->validated_format, data); 
        
           ERR_FAIL_COND_V(image->is_empty(), Ref<Image>()); 
        
           if (tex->format != tex->validated_format) { 
        
           	image->convert(tex->format); 
        
           }

clayjohn · 2023-07-22T19:31:00Z

@bitsawer Thanks for pointing that out. Looks like the condition was set to always be false in 42b44f4. Although, it probably doesn't matter much as actual GPU support is really poor
(linear tiling)

I think we still need to check whether the call to image->convert is to blame though. If so, we can likely optimize that code

sakrel · 2023-07-22T21:13:45Z

This is a ~~frame~~ flame graph for RGB8 using the benchmark from the OP with a custom build:

6588a4a - Merge pull request #79661 from sepTN/fix-typo-batch
scons platform=windows arch=x86_64 verbose=yes warnings=no progress=no production=yes debug_symbols=yes target=editor
Found MSVC version 14.3, arch x86_64

RenderingDeviceVulkan:_texture_update spends most time in _copy_region

godot/drivers/vulkan/rendering_device_vulkan.cpp

Line 2598 in 6588a4a

_copy_region(read_ptr, write_ptr, x, y, region_w, region_h, width, pixel_size);

Image::convert

godot/core/io/image.cpp

Line 552 in 6588a4a

Image new_img(width, height, mipmaps, p_new_format);

godot/core/io/image.cpp

Line 640 in 6588a4a

_convert<3, false, 3, true, false, false>(mip_width, mip_height, rptr, wptr);

Something to note: When I run the benchmark with an official 4.1.1 build, a single iteration takes about 70 ms. With my custom build I get around 70 to 80 ms. I wonder if the difference can be explained by me using MSVC to compile the build? I copied the scons arguments from https://github.com/godotengine/godot-build-scripts/blob/main/build-windows/build.sh
Are there debug symbols for the official builds I can download? I want to rerun the Visual Studio diagnostic tools with the official build.

Calinou · 2023-07-23T16:56:50Z

I wonder if the difference can be explained by me using MSVC to compile the build? I

Official builds are compiled with a recent MinGW-GCC using optimize=speed and full LTO (lto=full). These options are aliased as production=yes. However, production=yes does not enable LTCG¹ on MSVC because it has known issues.

Are there debug symbols for the official builds I can download? I want to rerun the Visual Studio diagnostic tools with the official build.

Unfortunately, no. This is planned for a future release, but since we use MinGW, you'd have to convert the DWARF debug symbols to PDB format (there's a tool for that out there).

Link-time code generation (MSVC's LTO equivalent). ↩

Mickeon · 2024-11-12T21:21:59Z

Is this still relevant now? I feel like the note in the current PR is particularly excessive.

aaronfranke · 2024-11-12T21:31:59Z

@Mickeon Considering how extreme the performance disparity is, a strongly worded note is important.

clayjohn · 2024-11-13T02:42:19Z

@Mickeon Considering how extreme the performance disparity is, a strongly worded note is important.

Both of the problematic functions have been massively optimized in the past year. So this needs to be re-evaluated

Calinou · 2024-11-19T22:53:24Z

Both of the problematic functions have been massively optimized in the past year. So this needs to be re-evaluated

I ran the benchmark again using an updated MRP on 4.4.dev fd4c29a: test_image_performance.zip

PC specifications

CPU: Intel Core i9-13900K
GPU: NVIDIA GeForce RTX 4090
RAM: 64 GB (2×32 GB DDR5-5800 C30)
SSD: Solidigm P44 Pro 2 TB
OS: Linux (Fedora 41)

Using an optimized editor build for the tests.

Format	Time taken to update
RF	7.59 ms
RGF	13.93 ms
RGBF	136.30 ms
RGBAF	28.75 ms

RH	4.63 ms
RGH	7.84 ms
RGBH	62.48 ms
RGBAH	17.39 ms

R8	3.11 ms
RG8	3.76 ms
RGB8	29.02 ms
RGBA8	7.85 ms

The RGB functions are on average 4 times slower than the RGBA ones, despite storing less data.

Mickeon · 2024-11-19T23:29:48Z

The optimization is incredible, but the stark difference in performance with RGB is still way too noticeable to ignore, yeah.

clayjohn · 2024-11-20T00:45:47Z

@Calinou I found your results surprising, so I tested on 4.4 dev4 on my device (i7-1165G7). I suspect the difference comes from the graphics drivers

Format	Time taken to update
RF	14.67 ms
RGF	38.26 ms
RGBF	48.72 ms
RGBAF	63.42 ms

RH	5.90 ms
RGH	14.91 ms
RGBH	25.17 ms
RGBAH	39.74 ms

R8	4.43 ms
RG8	5.97 ms
RGB8	54.72 ms
RGBA8	15.48 ms

Document performance caveats of RGB image formats versus RGBA

73c6b8d

Calinou requested a review from a team as a code owner July 21, 2023 23:19

Calinou added enhancement documentation cherrypick:4.0 cherrypick:4.1 Considered for cherry-picking into a future 4.1.x release labels Jul 21, 2023

Calinou added this to the 4.2 milestone Jul 21, 2023

aaronfranke reviewed Jul 22, 2023

View reviewed changes

YuriSizov added discussion performance and removed cherrypick:4.0 labels Oct 30, 2023

YuriSizov modified the milestones: 4.2, 4.x Oct 30, 2023

YuriSizov removed the cherrypick:4.1 Considered for cherry-picking into a future 4.1.x release label Jan 23, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Document performance caveats of RGB image formats versus RGBA #79771

Document performance caveats of RGB image formats versus RGBA #79771

Calinou commented Jul 21, 2023 •

edited

Loading

torcado194 commented Jul 22, 2023 •

edited

Loading

aaronfranke Jul 22, 2023

Mickeon Nov 12, 2024

aaronfranke Nov 13, 2024 •

edited

Loading

clayjohn commented Jul 22, 2023

bitsawer commented Jul 22, 2023 •

edited

Loading

clayjohn commented Jul 22, 2023

sakrel commented Jul 22, 2023 •

edited

Loading

Calinou commented Jul 23, 2023 •

edited

Loading

Mickeon commented Nov 12, 2024

aaronfranke commented Nov 12, 2024

clayjohn commented Nov 13, 2024

Calinou commented Nov 19, 2024 •

edited

Loading

Mickeon commented Nov 19, 2024

clayjohn commented Nov 20, 2024

	[b]Note:[/b] Converting to [constant FORMAT_RGBA8] is slow, as it is not aligned to 1, 2 or 4 bytes. If you need to frequently convert an image, consider using [constant FORMAT_RGBA8], or better, [constant FORMAT_RG8] or [constant FORMAT_R8] if possible. The same applies to [constant FORMAT_RGBAH] and [constant FORMAT_RGBAF] versus their RGB/RG/R counterparts.
	[b]Note:[/b] Converting to [constant FORMAT_RGB8] is slow, as it is not aligned to 1, 2 or 4 bytes. If you need to frequently convert an image, consider using [constant FORMAT_RGBA8], or for images without a blue channel, [constant FORMAT_RG8] or [constant FORMAT_R8] if possible. The same applies to [constant FORMAT_RGBH] and [constant FORMAT_RGBF] versus their RGBA/RG/R counterparts.

Document performance caveats of RGB image formats versus RGBA #79771

Are you sure you want to change the base?

Document performance caveats of RGB image formats versus RGBA #79771

Conversation

Calinou commented Jul 21, 2023 • edited Loading

torcado194 commented Jul 22, 2023 • edited Loading

aaronfranke Jul 22, 2023

Choose a reason for hiding this comment

Mickeon Nov 12, 2024

Choose a reason for hiding this comment

aaronfranke Nov 13, 2024 • edited Loading

Choose a reason for hiding this comment

clayjohn commented Jul 22, 2023

bitsawer commented Jul 22, 2023 • edited Loading

clayjohn commented Jul 22, 2023

sakrel commented Jul 22, 2023 • edited Loading

Calinou commented Jul 23, 2023 • edited Loading

Footnotes

Mickeon commented Nov 12, 2024

aaronfranke commented Nov 12, 2024

clayjohn commented Nov 13, 2024

Calinou commented Nov 19, 2024 • edited Loading

Mickeon commented Nov 19, 2024

clayjohn commented Nov 20, 2024

Calinou commented Jul 21, 2023 •

edited

Loading

torcado194 commented Jul 22, 2023 •

edited

Loading

aaronfranke Nov 13, 2024 •

edited

Loading

bitsawer commented Jul 22, 2023 •

edited

Loading

sakrel commented Jul 22, 2023 •

edited

Loading

Calinou commented Jul 23, 2023 •

edited

Loading

Calinou commented Nov 19, 2024 •

edited

Loading