Use shelf allocation for font texture atlases #723

leduyquang753 · 2025-01-18T02:56:13Z

This pull request aims to resolve the following comment in FontFaceLayer.cpp:

@performance: We could be much smarter about this, e.g. such as adding new glyphs to the existing texture layout and textures. Right now we re-generate the whole thing, including textures.

My approach is to use the shelf packing algorithm, which is employed in Firefox. (For perspective, Skia in Chromium divides the atlas into plots and uses the skyline algorithm for each plot.)
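For readers unfamiliar with shelf packing, the core idea fits in a few lines. The page is divided into horizontal shelves stacked from the top. Each glyph goes onto an existing shelf that is tall enough and still has room, and a new shelf is opened only when none fits. This is a simplified illustration of that description, not the allocator from this pull request or Firefox's. `ShelfPacker` and `AtlasRect` are hypothetical names.

```cpp
#include <optional>
#include <vector>

// Rectangle allocated for a glyph inside one atlas page, in pixels.
struct AtlasRect {
    int x, y, width, height;
};

// Minimal shelf packer (illustrative only): shelves are stacked from the top
// of the page and each shelf is filled from left to right.
class ShelfPacker {
public:
    ShelfPacker(int page_width, int page_height)
        : page_width_(page_width), page_height_(page_height) {}

    // Returns where the glyph was placed, or std::nullopt if the page is full.
    std::optional<AtlasRect> Allocate(int width, int height) {
        if (width <= 0 || height <= 0 || width > page_width_) return std::nullopt;

        // Among shelves that are tall enough and still have room on the right,
        // pick the shortest one to waste the least vertical space.
        Shelf* best = nullptr;
        for (Shelf& shelf : shelves_) {
            const bool fits = shelf.height >= height && page_width_ - shelf.cursor_x >= width;
            if (fits && (best == nullptr || shelf.height < best->height)) best = &shelf;
        }

        // No existing shelf fits: open a new one below the last shelf.
        if (best == nullptr) {
            if (page_height_ - next_shelf_y_ < height) return std::nullopt;
            shelves_.push_back(Shelf{next_shelf_y_, height, 0});
            next_shelf_y_ += height;
            best = &shelves_.back();
        }

        const AtlasRect rect{best->cursor_x, best->y, width, height};
        best->cursor_x += width;
        return rect;
    }

private:
    struct Shelf {
        int y;         // top edge of the shelf
        int height;    // fixed when the shelf is opened
        int cursor_x;  // first free x position on the shelf
    };

    int page_width_;
    int page_height_;
    int next_shelf_y_ = 0;
    std::vector<Shelf> shelves_;
};
```

Here a new shelf takes the exact height of the glyph that opened it. Practical shelf allocators often round shelf heights up to a few size classes so that glyphs of similar height can share shelves.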

My further plan is to also conserve memory by removing glyphs and fonts that have not been used for a while.
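One way to decide which glyphs "have not been used for a while" is to stamp each glyph with the last frame it was drawn in and periodically evict stale entries. The sketch below assumes a monotonically increasing frame counter and uses codepoints as keys. `GlyphUsageTracker`, `Touch`, and `EvictStale` are hypothetical names. A production cache might prefer an LRU list over the full scan shown here.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical bookkeeping for evicting glyphs that have not been drawn for a while.
class GlyphUsageTracker {
public:
    // Record that a glyph was used during the given frame.
    void Touch(char32_t codepoint, std::uint64_t frame) { last_used_[codepoint] = frame; }

    // Removes and returns every glyph idle for more than max_idle_frames.
    // The caller would then free the glyphs' atlas rectangles.
    // Assumes current_frame is never smaller than any recorded frame.
    std::vector<char32_t> EvictStale(std::uint64_t current_frame, std::uint64_t max_idle_frames) {
        std::vector<char32_t> evicted;
        for (auto it = last_used_.begin(); it != last_used_.end();) {
            if (current_frame - it->second > max_idle_frames) {
                evicted.push_back(it->first);
                it = last_used_.erase(it);
            } else {
                ++it;
            }
        }
        return evicted;
    }

    std::size_t Size() const { return last_used_.size(); }

private:
    std::unordered_map<char32_t, std::uint64_t> last_used_;
};
```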

This is my first iteration of the solution. It brings in a shelf allocator I implemented previously, so its code style and conventions do not yet match RmlUi's. In the meantime, I would welcome thoughts on the design, as well as behavior and performance testing.

mikke89 · 2025-01-21T19:43:57Z

First of all, very cool! I have been wanting to improve the font engine for some time, so it is great to see efforts in this direction.

We do already have our texture packer, which I'm sure you're aware of. I wonder if it makes sense to integrate the feature into that existing code, or if we might as well start over like it seems you have done here?

By the way, did you write this from scratch?

I'd be really interested in some numbers; that's really decisive for an improvement like this. Do you think you could make some realistic before-and-after benchmarks? For example, how long does it take to add one new character to a font texture, both with a minimal number of existing characters and with a large number of them, and for both large and small textures/font sizes?
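A measurement along these lines could be structured as follows: fill a layer with N distinct glyphs, then time adding one more. Consecutive codepoints from the CJK Unified Ideographs block (starting at U+4E00) give a plentiful supply of distinct glyphs. `reset` and `add_glyph` are hypothetical hooks into the font engine, not existing RmlUi functions. A single timing like this is noisy, so in practice it would be repeated and summarized, or run under a benchmarking library.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// First `count` codepoints of the CJK Unified Ideographs block (U+4E00 onward).
std::vector<char32_t> CjkCodepoints(std::size_t count) {
    std::vector<char32_t> result;
    result.reserve(count);
    for (std::size_t i = 0; i < count; ++i) result.push_back(char32_t(0x4E00 + i));
    return result;
}

// Fills a font layer with `existing` glyphs, then times adding one more.
// `reset` and `add_glyph` are hypothetical hooks into the font engine.
std::chrono::nanoseconds TimeAddOneGlyph(std::size_t existing,
                                         const std::function<void()>& reset,
                                         const std::function<void(char32_t)>& add_glyph) {
    const std::vector<char32_t> glyphs = CjkCodepoints(existing + 1);
    reset();
    for (std::size_t i = 0; i < existing; ++i) add_glyph(glyphs[i]);

    // Only the final insertion is timed.
    const auto start = std::chrono::steady_clock::now();
    add_glyph(glyphs[existing]);
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(end - start);
}
```

Running it for small and large `existing` counts at several font sizes covers the matrix of cases asked about above.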

Removing unused glyphs is also something I've been thinking of. That would be a great addition, and especially important for CJK and many other languages. One thing I've also considered is whether we could share texture atlases between font sizes, or even font faces/families.
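Sharing an atlas across sizes or faces mostly changes what identifies a glyph: the cache key has to include the face and size, not just the codepoint. A minimal sketch of such a key follows. The names `GlyphKey`, `AtlasLocation`, and `SharedGlyphCache` are hypothetical, as is the numeric face identifier.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>

// Hypothetical key for a glyph cache shared across font faces and sizes.
struct GlyphKey {
    std::uint32_t face_id;  // illustrative identifier for a font face
    std::uint16_t size;     // font size in pixels
    char32_t codepoint;

    bool operator==(const GlyphKey& other) const {
        return face_id == other.face_id && size == other.size && codepoint == other.codepoint;
    }
};

struct GlyphKeyHash {
    std::size_t operator()(const GlyphKey& key) const {
        // Combine the fields into one 64-bit value. Overlapping bits can only
        // cause hash collisions, never wrong lookups, since operator== decides.
        const std::uint64_t packed = (std::uint64_t(key.face_id) << 32) ^
                                     (std::uint64_t(key.size) << 21) ^
                                     std::uint64_t(key.codepoint);
        return std::hash<std::uint64_t>{}(packed);
    }
};

// Where a glyph lives inside a shared set of atlas pages.
struct AtlasLocation {
    std::uint16_t page;
    std::uint16_t x, y, width, height;
};

using SharedGlyphCache = std::unordered_map<GlyphKey, AtlasLocation, GlyphKeyHash>;
```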

leduyquang753 · 2025-01-22T10:31:20Z

> We do already have our texture packer, which I'm sure you're aware of. I wonder if it makes sense to integrate the feature into that existing code, or if we might as well start over like it seems you have done here?

> By the way, did you write this from scratch?

This is indeed my own from-scratch implementation. The existing texture packer is quite primitive in that it re-lays out all glyphs on every change to the glyph set. My implementation will later be refined to blend in with the codebase, but since it is so radically different, it will effectively be a full-on rewrite of the original packer.

> I'd be really interested in some numbers; that's really decisive for an improvement like this. Do you think you could make some realistic before-and-after benchmarks? For example, how long does it take to add one new character to a font texture, both with a minimal number of existing characters and with a large number of them, and for both large and small textures/font sizes?

I plan to introduce proper benchmarks once the full solution takes shape. But since it is quite similar to what Firefox uses, I would expect it to be similarly fast, at least when finding where to place a glyph: within each atlas page, the shelf allocator iterates only through unallocated areas, which are usually quite large as long as the page is not too fragmented.
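Iterating only through unallocated areas can be illustrated with a single shelf that keeps a sorted list of its free horizontal spans. Allocation is first-fit over the free spans, and freeing a glyph merges its span with adjacent free spans, which also gives glyph eviction a way to return space. This is a simplified sketch under those assumptions, not the allocator in this PR. `Shelf`, `Allocate`, and `Free` are hypothetical names.

```cpp
#include <cstddef>
#include <iterator>
#include <list>
#include <optional>

// One shelf that tracks its unallocated horizontal spans (illustrative only).
class Shelf {
public:
    explicit Shelf(int width) { free_.push_back(Span{0, width}); }

    // First fit over the free spans; returns the x offset of the new glyph.
    std::optional<int> Allocate(int width) {
        for (auto it = free_.begin(); it != free_.end(); ++it) {
            if (it->width < width) continue;
            const int x = it->x;
            it->x += width;
            it->width -= width;
            if (it->width == 0) free_.erase(it);
            return x;
        }
        return std::nullopt;
    }

    // Returns a span to the free list, merging it with touching free spans.
    void Free(int x, int width) {
        auto next = free_.begin();
        while (next != free_.end() && next->x < x) ++next;
        auto it = free_.insert(next, Span{x, width});
        // Merge with the following span if they touch.
        if (next != free_.end() && it->x + it->width == next->x) {
            it->width += next->width;
            free_.erase(next);
        }
        // Merge with the preceding span if they touch.
        if (it != free_.begin()) {
            auto prev = std::prev(it);
            if (prev->x + prev->width == it->x) {
                prev->width += it->width;
                free_.erase(it);
            }
        }
    }

    std::size_t FreeSpanCount() const { return free_.size(); }

private:
    struct Span { int x; int width; };
    std::list<Span> free_;  // sorted by x, never overlapping
};
```

Because the list holds only free spans, a mostly full shelf has a short list to scan, which matches the observation that allocation stays fast while a page is not too fragmented.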

Further performance gains could come from uploading only the updated portion of the texture atlas.
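A simple way to upload only the updated portion is to accumulate the bounding box of every rectangle written since the last upload, then send just that sub-rectangle. The sketch below assumes a tightly packed pixel buffer with a configurable number of bytes per pixel. `DirtyRegion` and `ExtractRegion` are hypothetical names, and the renderer's actual upload call is left out.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Rect { int x, y, width, height; };

// Bounding box of every region written since the last upload.
class DirtyRegion {
public:
    void Add(const Rect& r) {
        if (!dirty_) { bounds_ = r; dirty_ = true; return; }
        const int left = std::min(bounds_.x, r.x);
        const int top = std::min(bounds_.y, r.y);
        const int right = std::max(bounds_.x + bounds_.width, r.x + r.width);
        const int bottom = std::max(bounds_.y + bounds_.height, r.y + r.height);
        bounds_ = Rect{left, top, right - left, bottom - top};
    }
    bool IsDirty() const { return dirty_; }
    Rect Bounds() const { return bounds_; }
    void Clear() { dirty_ = false; }  // call after the upload succeeds

private:
    Rect bounds_{0, 0, 0, 0};
    bool dirty_ = false;
};

// Copies the dirty rows out of a tightly packed atlas into a contiguous
// buffer for a sub-image upload. bytes_per_pixel is an assumed format detail.
std::vector<std::uint8_t> ExtractRegion(const std::vector<std::uint8_t>& atlas, int atlas_width,
                                        int bytes_per_pixel, const Rect& r) {
    const std::size_t row_bytes = std::size_t(r.width) * std::size_t(bytes_per_pixel);
    std::vector<std::uint8_t> out(row_bytes * std::size_t(r.height));
    for (int row = 0; row < r.height; ++row) {
        const std::size_t src =
            (std::size_t(r.y + row) * std::size_t(atlas_width) + std::size_t(r.x)) * std::size_t(bytes_per_pixel);
        std::copy_n(atlas.data() + src, row_bytes, out.data() + std::size_t(row) * row_bytes);
    }
    return out;
}
```

A single bounding box can over-upload when two distant glyphs change, so keeping a short list of rectangles is a natural refinement. With OpenGL, setting `GL_UNPACK_ROW_LENGTH` via `glPixelStorei` lets `glTexSubImage2D` read a sub-rectangle directly from the full atlas, avoiding the intermediate copy.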

> One thing I've also considered is whether we could share texture atlases between font sizes, or even font faces/families.

If you think this is a good idea, I would be glad to implement it; Firefox and Skia also employ such a strategy. However, I anticipate it would affect a larger portion of the codebase, so some assistance with the design of this feature would be of great help.

mikke89 · 2025-01-26T20:51:28Z

> This is indeed my own from-scratch implementation. The existing texture packer is quite primitive in that it re-lays out all glyphs on every change to the glyph set. My implementation will later be refined to blend in with the codebase, but since it is so radically different, it will effectively be a full-on rewrite of the original packer.

Great, sounds good. I think it's reasonable to start from scratch here. Let's just make sure to also clean up the old code (i.e. remove or refactor it) so that we don't end up with duplicate code doing similar things. Of course, I understand this is just a draft for now; I just wanted to say that up front.

> I plan to introduce proper benchmarks once the full solution takes shape. But since it is quite similar to what Firefox uses, I would expect it to be similarly fast, at least when finding where to place a glyph:

What I'm mostly interested in here is a baseline for the current implementation. I am sure things can be made a lot more performant, but I'm not even sure what the current state is, i.e. whether or not this is a significant bottleneck at all. I think that should be established first, before we make a large effort in this direction.

> If you think this is a good idea, I would be glad to implement it; Firefox and Skia also employ such a strategy. However, I anticipate it would affect a larger portion of the codebase, so some assistance with the design of this feature would be of great help.

Great, I'll be glad to help out here later on. I think we should do it in steps, though: essentially, start with what you're doing here, one font face at a time. Then, once that work is done and merged, we can start expanding towards a more global solution.

leduyquang753 · 2025-02-05T14:39:23Z

I have just added a benchmark suite that cycles through a number of Chinese characters. The same suite is also applied to the original implementation in my fontTextureAtlasBenchmarks branch. The results on my machine are as follows:

Original implementation

| relative | ns/op | op/s | err% | total (s) | Font texture atlas |
|---:|---:|---:|---:|---:|:---|
| 100.0% | 1,463,500.00 | 683.29 | 4.5% | 0.02 | `Size 12 with 10 glyphs` |
| 15.4% | 9,496,000.00 | 105.31 | 5.0% | 0.11 | 〰️ `Size 12 with 100 glyphs` (Unstable with ~1.0 iters. Increase `minEpochIterations` to e.g. 10) |
| 0.5% | 270,057,000.00 | 3.70 | 9.1% | 3.05 | 〰️ `Size 12 with 1000 glyphs` (Unstable with ~1.0 iters. Increase `minEpochIterations` to e.g. 10) |
| 77.1% | 1,899,400.00 | 526.48 | 12.0% | 0.02 | 〰️ `Size 16 with 10 glyphs` (Unstable with ~1.0 iters. Increase `minEpochIterations` to e.g. 10) |
| 14.8% | 9,893,500.00 | 101.08 | 7.2% | 0.11 | 〰️ `Size 16 with 100 glyphs` (Unstable with ~1.0 iters. Increase `minEpochIterations` to e.g. 10) |
| 0.3% | 423,145,600.00 | 2.36 | 1.8% | 4.74 | `Size 16 with 1000 glyphs` |
| 71.2% | 2,055,000.00 | 486.62 | 9.0% | 0.02 | 〰️ `Size 24 with 10 glyphs` (Unstable with ~1.0 iters. Increase `minEpochIterations` to e.g. 10) |
| 9.0% | 16,296,900.00 | 61.36 | 1.5% | 0.18 | `Size 24 with 100 glyphs` |
| 0.2% | 872,547,100.00 | 1.15 | 1.5% | 9.66 | `Size 24 with 1000 glyphs` |
| 26.8% | 5,470,000.00 | 182.82 | 6.9% | 0.06 | 〰️ `Size 48 with 10 glyphs` (Unstable with ~1.0 iters. Increase `minEpochIterations` to e.g. 10) |
| 2.5% | 58,260,700.00 | 17.16 | 2.2% | 0.64 | `Size 48 with 100 glyphs` |
| 0.1% | 2,166,604,600.00 | 0.46 | 0.9% | 23.81 | `Size 48 with 1000 glyphs` |
| 8.4% | 17,323,600.00 | 57.72 | 2.4% | 0.19 | `Size 96 with 10 glyphs` |
| 0.8% | 188,429,800.00 | 5.31 | 1.6% | 2.08 | `Size 96 with 100 glyphs` |
| 0.0% | 3,080,031,700.00 | 0.32 | 1.0% | 34.03 | `Size 96 with 1000 glyphs` |

New implementation

| relative | ns/op | op/s | err% | total (s) | Font texture atlas |
|---:|---:|---:|---:|---:|:---|
| 100.0% | 14,082,400.00 | 71.01 | 2.2% | 0.16 | `Size 12 with 10 glyphs` |
| 14.4% | 97,583,700.00 | 10.25 | 1.9% | 1.08 | `Size 12 with 100 glyphs` |
| 1.8% | 802,651,300.00 | 1.25 | 1.8% | 8.89 | `Size 12 with 1000 glyphs` |
| 109.7% | 12,841,200.00 | 77.87 | 3.9% | 0.14 | `Size 16 with 10 glyphs` |
| 16.6% | 84,954,500.00 | 11.77 | 1.5% | 0.94 | `Size 16 with 100 glyphs` |
| 1.7% | 813,553,700.00 | 1.23 | 2.0% | 9.19 | `Size 16 with 1000 glyphs` |
| 122.7% | 11,475,100.00 | 87.15 | 2.2% | 0.13 | `Size 24 with 10 glyphs` |
| 17.1% | 82,121,200.00 | 12.18 | 1.1% | 0.91 | `Size 24 with 100 glyphs` |
| 1.7% | 851,945,600.00 | 1.17 | 5.5% | 9.90 | 〰️ `Size 24 with 1000 glyphs` (Unstable with ~1.0 iters. Increase `minEpochIterations` to e.g. 10) |
| 97.3% | 14,468,900.00 | 69.11 | 5.7% | 0.17 | 〰️ `Size 48 with 10 glyphs` (Unstable with ~1.0 iters. Increase `minEpochIterations` to e.g. 10) |
| 14.8% | 95,021,400.00 | 10.52 | 3.4% | 1.12 | `Size 48 with 100 glyphs` |
| 1.2% | 1,191,679,700.00 | 0.84 | 21.5% | 14.18 | 〰️ `Size 48 with 1000 glyphs` (Unstable with ~1.0 iters. Increase `minEpochIterations` to e.g. 10) |
| 75.3% | 18,713,600.00 | 53.44 | 3.6% | 0.21 | `Size 96 with 10 glyphs` |
| 13.3% | 105,910,400.00 | 9.44 | 1.0% | 1.17 | `Size 96 with 100 glyphs` |
| 1.4% | 1,014,052,200.00 | 0.99 | 1.5% | 11.12 | `Size 96 with 1000 glyphs` |

My implementation appears slower, but this is because processing time is dominated by copying the whole texture atlas each time it is regenerated. My implementation uses a fixed 1024 × 1024 atlas size, while the original implementation chooses a size just large enough for the current glyphs, so its texture data takes less time to copy. Because my implementation keeps the texture data around at all times, performance could be improved dramatically by referring to that data directly during upload, and possibly by uploading only the dirty regions of the atlas.
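The difference between copying the atlas for every upload and referring to the retained data can be shown with a hypothetical page type. The copying path allocates and fills a full page buffer (4 MiB for a 1024 × 1024 page at an assumed 4 bytes per pixel), while the view path hands the uploader a pointer into the page's own storage. `AtlasPage`, `PixelView`, and `UploadTexture` are illustrative names; `UploadTexture` stands in for whatever upload call the renderer provides.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Read-only view of pixel data owned elsewhere (pointer plus length).
struct PixelView {
    const std::uint8_t* data;
    std::size_t size;
};

// Hypothetical atlas page that keeps its pixel data for its whole lifetime.
class AtlasPage {
public:
    AtlasPage(int width, int height, int bytes_per_pixel)
        : pixels_(std::size_t(width) * std::size_t(height) * std::size_t(bytes_per_pixel), 0) {}

    // Copying path: every upload gets a freshly allocated copy of the page.
    std::vector<std::uint8_t> CopyForUpload() const { return pixels_; }

    // Non-copying path: the uploader reads the page's own storage. The view
    // is only valid while the page is alive and its buffer is not resized.
    PixelView ViewForUpload() const { return PixelView{pixels_.data(), pixels_.size()}; }

    // Glyph rasterization writes into the page in place.
    std::uint8_t* MutablePixels() { return pixels_.data(); }

private:
    std::vector<std::uint8_t> pixels_;
};

// Stand-in for the renderer's texture upload; here it only reports the size.
std::size_t UploadTexture(PixelView view) { return view.size; }
```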

Update: I removed the unnecessary copy, and it is indeed much faster now. It beats the original implementation at higher glyph counts (for example, 88,539,300 versus 2,166,604,600 ns/op for size 48 with 1000 glyphs) because it does not have to rearrange all the glyphs.

New benchmark results

| relative | ns/op | op/s | err% | total (s) | Font texture atlas |
|---:|---:|---:|---:|---:|:---|
| 100.0% | 2,970,200.00 | 336.68 | 4.4% | 0.03 | `Size 12 with 10 glyphs` |
| 56.3% | 5,273,900.00 | 189.61 | 4.8% | 0.06 | `Size 12 with 100 glyphs` |
| 9.1% | 32,638,600.00 | 30.64 | 3.2% | 0.36 | `Size 12 with 1000 glyphs` |
| 93.7% | 3,169,900.00 | 315.47 | 5.7% | 0.04 | 〰️ `Size 16 with 10 glyphs` (Unstable with ~1.0 iters. Increase `minEpochIterations` to e.g. 10) |
| 55.2% | 5,385,200.00 | 185.69 | 4.2% | 0.06 | `Size 16 with 100 glyphs` |
| 6.9% | 43,134,600.00 | 23.18 | 2.3% | 0.48 | `Size 16 with 1000 glyphs` |
| 71.4% | 4,157,300.00 | 240.54 | 7.0% | 0.05 | 〰️ `Size 24 with 10 glyphs` (Unstable with ~1.0 iters. Increase `minEpochIterations` to e.g. 10) |
| 43.6% | 6,812,600.00 | 146.79 | 6.6% | 0.08 | 〰️ `Size 24 with 100 glyphs` (Unstable with ~1.0 iters. Increase `minEpochIterations` to e.g. 10) |
| 6.4% | 46,560,900.00 | 21.48 | 4.1% | 0.51 | `Size 24 with 1000 glyphs` |
| 59.1% | 5,027,900.00 | 198.89 | 4.2% | 0.06 | `Size 48 with 10 glyphs` |
| 27.9% | 10,648,500.00 | 93.91 | 6.0% | 0.12 | 〰️ `Size 48 with 100 glyphs` (Unstable with ~1.0 iters. Increase `minEpochIterations` to e.g. 10) |
| 3.4% | 88,539,300.00 | 11.29 | 1.9% | 0.97 | `Size 48 with 1000 glyphs` |
| 29.6% | 10,019,900.00 | 99.80 | 1.6% | 0.11 | `Size 96 with 10 glyphs` |
| 13.2% | 22,514,700.00 | 44.42 | 0.5% | 0.25 | `Size 96 with 100 glyphs` |
| 1.5% | 204,386,200.00 | 4.89 | 1.3% | 2.27 | `Size 96 with 1000 glyphs` |
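To make the comparison at high glyph counts concrete, the speedup can be computed directly from the ns/op columns of the original-implementation table and the table above. This only reuses the posted numbers; `Row`, `kThousandGlyphRows`, and `Speedup` are illustrative names.

```cpp
#include <array>

// ns/op for the 1000-glyph rows: original implementation vs. new results.
struct Row {
    int font_size;
    double original_ns;
    double new_ns;
};

constexpr std::array<Row, 5> kThousandGlyphRows{{
    {12, 270'057'000.0, 32'638'600.0},
    {16, 423'145'600.0, 43'134'600.0},
    {24, 872'547'100.0, 46'560'900.0},
    {48, 2'166'604'600.0, 88'539'300.0},
    {96, 3'080'031'700.0, 204'386'200.0},
}};

// A value above 1 means the new implementation is faster than the original.
constexpr double Speedup(const Row& row) { return row.original_ns / row.new_ns; }
```

By this measure the 1000-glyph cases run roughly 8× (size 12) to 24× (size 48) faster. At 10 glyphs, sizes 12 to 24 are still slower than the original (for example, 2,970,200 versus 1,463,500 ns/op at size 12), which is consistent with the fixed 1024 × 1024 page costing more to upload than a small, fitted atlas.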

leduyquang753 added 2 commits February 4, 2025 15:45

First experimental iteration of shelf allocation for font atlases.

81ac069

Reformatted code of SpriteSet class.

a21a1a2

leduyquang753 force-pushed the fontAtlasShelfAllocation branch 3 times, most recently from 8c06684 to 949fef2 February 5, 2025 14:16

Added font texture atlas benchmarks.

e1dc4b2

leduyquang753 force-pushed the fontAtlasShelfAllocation branch from 949fef2 to e1dc4b2 February 5, 2025 14:19

Removed unnecessary copy of the texture atlas when uploading.

41e5053
