Use shelf allocation for font texture atlases #723

Draft: wants to merge 4 commits into master

leduyquang753 (Contributor) commented Jan 18, 2025

This pull request aims to resolve the following comment in FontFaceLayer.cpp:

> @performance: We could be much smarter about this, e.g. such as adding new glyphs to the existing texture layout and textures. Right now we re-generate the whole thing, including textures.

My approach is to use the shelf packing algorithm, which is employed in Firefox. (For perspective, Skia in Chromium divides the atlas into plots and uses the skyline algorithm for each plot.)
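For readers unfamiliar with shelf packing, here is a minimal sketch of the general idea in C++17. It is not the code in this pull request: `ShelfAtlasPage`, `Rect`, and `Allocate` are hypothetical names, and the sketch only supports adding rectangles (no deallocation, no multiple pages). The page is split into horizontal shelves; each new rectangle goes into the shortest shelf that is tall enough and still has room, and a new shelf is opened only when none fits.

```cpp
#include <optional>
#include <vector>

// Hypothetical names for illustration; not RmlUi's or the PR's API.
struct Rect {
    int x, y, width, height;
};

class ShelfAtlasPage {
public:
    ShelfAtlasPage(int width, int height) : width_(width), height_(height) {}

    // Returns the placement of a width x height rectangle,
    // or std::nullopt if it does not fit on this page.
    std::optional<Rect> Allocate(int width, int height) {
        if (width <= 0 || height <= 0 || width > width_ || height > height_)
            return std::nullopt;

        // Best fit: among shelves that are tall enough and have room left,
        // pick the shortest one to waste as little vertical space as possible.
        Shelf* best = nullptr;
        for (Shelf& shelf : shelves_) {
            const bool fits = height <= shelf.height && width <= width_ - shelf.cursor_x;
            if (fits && (best == nullptr || shelf.height < best->height))
                best = &shelf;
        }

        if (best == nullptr) {
            // No existing shelf fits: open a new one below the others, if possible.
            if (height > height_ - next_shelf_y_)
                return std::nullopt;
            shelves_.push_back(Shelf{next_shelf_y_, height, 0});
            next_shelf_y_ += height;
            best = &shelves_.back();
        }

        const Rect placed{best->cursor_x, best->y, width, height};
        best->cursor_x += width; // Advance the shelf's horizontal cursor.
        return placed;
    }

private:
    struct Shelf {
        int y;        // Top edge of the shelf within the page.
        int height;   // Fixed when the shelf is opened.
        int cursor_x; // Next free x position on the shelf.
    };

    int width_;
    int height_;
    int next_shelf_y_ = 0;
    std::vector<Shelf> shelves_;
};
```

The important property for this pull request is that existing rectangles never move, so adding a glyph does not require laying out the earlier ones again.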

My further plan is to also conserve memory by removing glyphs and fonts that have not been used for a while.

This is my first iteration of the solution. It brings in a shelf allocator I implemented previously, so the code style and conventions do not match RmlUi's yet. In the meantime, I would like to receive some thoughts about the design, as well as behavior and performance testing.

mikke89 (Owner) commented Jan 21, 2025

First of all, very cool! I have been wanting to improve the font engine for some time, so it is great to see efforts in this direction.

We do already have our texture packer, which I'm sure you're aware of. I wonder if it makes sense to integrate the feature into that existing code, or if we might as well start over like it seems you have done here?

By the way, did you write this from scratch?

I'd be really interested in some numbers; they are really decisive for an improvement like this. Do you think you could make some realistic before-and-after benchmarks? For example, how long does it take to add one new character to a font texture, both with a minimal number of existing characters and with a large number of existing characters, and for both large and small texture / font sizes?

Removing unused glyphs is also something I've been thinking of. That would be a great addition, and especially important for CJK and many other languages. One thing I've also considered is whether we could share texture atlases between font sizes, or even font faces/families.
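As a rough illustration of the glyph removal idea, one option is to record the last frame in which each glyph was used and periodically drop those that have gone unused for too long. The sketch below assumes a monotonically increasing frame counter; `GlyphUsageTracker`, `MarkUsed`, and `EvictStale` are hypothetical names, not existing RmlUi API, and freeing the glyph's atlas space is left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical helper: tracks when each glyph was last used.
class GlyphUsageTracker {
public:
    // Call whenever a glyph is rendered or laid out in the given frame.
    void MarkUsed(char32_t code_point, std::uint64_t frame) {
        last_used_[code_point] = frame;
    }

    // Removes and returns all glyphs whose last use is more than
    // `max_age` frames before `current_frame`.
    std::vector<char32_t> EvictStale(std::uint64_t current_frame, std::uint64_t max_age) {
        std::vector<char32_t> evicted;
        for (auto it = last_used_.begin(); it != last_used_.end();) {
            const bool stale = current_frame > it->second && current_frame - it->second > max_age;
            if (stale) {
                evicted.push_back(it->first);
                it = last_used_.erase(it); // erase returns the next valid iterator
            } else {
                ++it;
            }
        }
        return evicted;
    }

    std::size_t Size() const { return last_used_.size(); }

private:
    std::unordered_map<char32_t, std::uint64_t> last_used_;
};
```

The caller would then release the atlas regions of the returned code points, which requires an allocator that supports deallocation.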

leduyquang753 (Contributor, Author) commented Jan 22, 2025

> We do already have our texture packer, which I'm sure you're aware of. I wonder if it makes sense to integrate the feature into that existing code, or if we might as well start over like it seems you have done here?
>
> By the way, did you write this from scratch?

This is indeed my own from-scratch implementation. The existing texture packer is quite primitive in that it lays out all glyphs again on each glyph set change. My implementation will later be refined to blend in with the codebase but, as it is so radically different, it will effectively be a full-on rewrite of the original packer.

> I'd be really interested in some numbers; they are really decisive for an improvement like this. Do you think you could make some realistic before-and-after benchmarks? For example, how long does it take to add one new character to a font texture, both with a minimal number of existing characters and with a large number of existing characters, and for both large and small texture / font sizes?

I plan to introduce proper benchmarks when the full solution somewhat takes shape. But as it is quite similar to what Firefox is using, I would expect it to be similarly fast at least when finding out where to place a glyph: within each atlas page, the shelf allocator iterates through only unallocated areas, which are usually quite large as long as the page is not too fragmented.

Also, further potential performance improvements could be achieved by only uploading the updated portion of the texture atlas.

> One thing I've also considered is whether we could share texture atlases between font sizes, or even font faces/families.

If this is a good idea to you then I would be glad to implement it; Firefox and Skia also employ such a strategy. I anticipate it would affect a larger portion of the codebase, however, so it would be of great help if I could receive some assistance in the design of this feature.

mikke89 (Owner) commented Jan 26, 2025

> This is indeed my own from-scratch implementation. The existing texture packer is quite primitive in that it lays out all glyphs again on each glyph set change. My implementation will later be refined to blend in with the codebase but, as it is so radically different, it will effectively be a full-on rewrite of the original packer.

Great, sounds good, I think it's reasonable to start from scratch here. Let's just make sure to also clean up old code (i.e. remove it or refactor) so that we don't have any duplicate code doing similar things. I of course understand this is just a draft for now, just wanted to say that up-front.

> I plan to introduce proper benchmarks when the full solution somewhat takes shape. But as it is quite similar to what Firefox is using, I would expect it to be similarly fast at least when finding out where to place a glyph:

What I am mostly interested in here is a baseline for the current implementation. I am sure this can be made a lot more performant, but I'm not even sure what the current state is, i.e. whether or not this is a significant bottleneck at all. I think that should be established first before we make a large effort in this direction.

> If this is a good idea to you then I would be glad to implement it; Firefox and Skia also employ such a strategy. I anticipate it would affect a larger portion of the codebase, however, so it would be of great help if I could receive some assistance in the design of this feature.

Great, I'll be glad to help out later on here. I think we should do it in steps though, so essentially start with what you're doing here, with just one font face at a time. Then once that work is done and merged, we can start expanding towards a more global solution.

leduyquang753 force-pushed the fontAtlasShelfAllocation branch 3 times, most recently from 8c06684 to 949fef2 (February 5, 2025 14:16)
leduyquang753 force-pushed the fontAtlasShelfAllocation branch from 949fef2 to e1dc4b2 (February 5, 2025 14:19)
leduyquang753 (Contributor, Author) commented Feb 5, 2025

I have just added a benchmark suite that involves cycling through a number of Chinese characters; the same suite is also applied to the original implementation in my fontTextureAtlasBenchmarks branch. The results on my machine are as follows:

Original implementation

| relative | ns/op | op/s | err% | total | Font texture atlas |
|---:|---:|---:|---:|---:|:---|
| 100.0% | 1,463,500.00 | 683.29 | 4.5% | 0.02 | Size 12 with 10 glyphs |
| 15.4% | 9,496,000.00 | 105.31 | 5.0% | 0.11 | 〰️ Size 12 with 100 glyphs (Unstable with ~1.0 iters. Increase minEpochIterations to e.g. 10) |
| 0.5% | 270,057,000.00 | 3.70 | 9.1% | 3.05 | 〰️ Size 12 with 1000 glyphs (Unstable with ~1.0 iters. Increase minEpochIterations to e.g. 10) |
| 77.1% | 1,899,400.00 | 526.48 | 12.0% | 0.02 | 〰️ Size 16 with 10 glyphs (Unstable with ~1.0 iters. Increase minEpochIterations to e.g. 10) |
| 14.8% | 9,893,500.00 | 101.08 | 7.2% | 0.11 | 〰️ Size 16 with 100 glyphs (Unstable with ~1.0 iters. Increase minEpochIterations to e.g. 10) |
| 0.3% | 423,145,600.00 | 2.36 | 1.8% | 4.74 | Size 16 with 1000 glyphs |
| 71.2% | 2,055,000.00 | 486.62 | 9.0% | 0.02 | 〰️ Size 24 with 10 glyphs (Unstable with ~1.0 iters. Increase minEpochIterations to e.g. 10) |
| 9.0% | 16,296,900.00 | 61.36 | 1.5% | 0.18 | Size 24 with 100 glyphs |
| 0.2% | 872,547,100.00 | 1.15 | 1.5% | 9.66 | Size 24 with 1000 glyphs |
| 26.8% | 5,470,000.00 | 182.82 | 6.9% | 0.06 | 〰️ Size 48 with 10 glyphs (Unstable with ~1.0 iters. Increase minEpochIterations to e.g. 10) |
| 2.5% | 58,260,700.00 | 17.16 | 2.2% | 0.64 | Size 48 with 100 glyphs |
| 0.1% | 2,166,604,600.00 | 0.46 | 0.9% | 23.81 | Size 48 with 1000 glyphs |
| 8.4% | 17,323,600.00 | 57.72 | 2.4% | 0.19 | Size 96 with 10 glyphs |
| 0.8% | 188,429,800.00 | 5.31 | 1.6% | 2.08 | Size 96 with 100 glyphs |
| 0.0% | 3,080,031,700.00 | 0.32 | 1.0% | 34.03 | Size 96 with 1000 glyphs |

New implementation

| relative | ns/op | op/s | err% | total | Font texture atlas |
|---:|---:|---:|---:|---:|:---|
| 100.0% | 14,082,400.00 | 71.01 | 2.2% | 0.16 | Size 12 with 10 glyphs |
| 14.4% | 97,583,700.00 | 10.25 | 1.9% | 1.08 | Size 12 with 100 glyphs |
| 1.8% | 802,651,300.00 | 1.25 | 1.8% | 8.89 | Size 12 with 1000 glyphs |
| 109.7% | 12,841,200.00 | 77.87 | 3.9% | 0.14 | Size 16 with 10 glyphs |
| 16.6% | 84,954,500.00 | 11.77 | 1.5% | 0.94 | Size 16 with 100 glyphs |
| 1.7% | 813,553,700.00 | 1.23 | 2.0% | 9.19 | Size 16 with 1000 glyphs |
| 122.7% | 11,475,100.00 | 87.15 | 2.2% | 0.13 | Size 24 with 10 glyphs |
| 17.1% | 82,121,200.00 | 12.18 | 1.1% | 0.91 | Size 24 with 100 glyphs |
| 1.7% | 851,945,600.00 | 1.17 | 5.5% | 9.90 | 〰️ Size 24 with 1000 glyphs (Unstable with ~1.0 iters. Increase minEpochIterations to e.g. 10) |
| 97.3% | 14,468,900.00 | 69.11 | 5.7% | 0.17 | 〰️ Size 48 with 10 glyphs (Unstable with ~1.0 iters. Increase minEpochIterations to e.g. 10) |
| 14.8% | 95,021,400.00 | 10.52 | 3.4% | 1.12 | Size 48 with 100 glyphs |
| 1.2% | 1,191,679,700.00 | 0.84 | 21.5% | 14.18 | 〰️ Size 48 with 1000 glyphs (Unstable with ~1.0 iters. Increase minEpochIterations to e.g. 10) |
| 75.3% | 18,713,600.00 | 53.44 | 3.6% | 0.21 | Size 96 with 10 glyphs |
| 13.3% | 105,910,400.00 | 9.44 | 1.0% | 1.17 | Size 96 with 100 glyphs |
| 1.4% | 1,014,052,200.00 | 0.99 | 1.5% | 11.12 | Size 96 with 1000 glyphs |

It appears that my implementation is slower, but this is because the processing time is absolutely dominated by the operation of copying the whole texture atlas each time it is regenerated. My implementation uses a fixed 1024 × 1024 atlas size, while the original implementation determines a size that is just enough for the current glyphs and takes less time to copy the texture data. Because my implementation maintains the texture data at all times, performance could be dramatically improved by directly referring to that texture data while uploading, and possibly only uploading dirty regions within the atlas.
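To put the copy cost in perspective: assuming 4 bytes per pixel (an assumption; the pixel format is not stated here), a 1024 × 1024 atlas is 1024 × 1024 × 4 = 4,194,304 bytes, i.e. 4 MiB per full copy, while a single 24 × 24 glyph covers only 24 × 24 × 4 = 2,304 bytes. The dirty-region idea could be sketched roughly as follows. `DirtyRect` and `ExtractDirtyPixels` are hypothetical names, and the atlas is assumed to be a tightly packed RGBA buffer; an actual renderer would hand the extracted block to its partial-update call (for example `glTexSubImage2D` in OpenGL).

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: bounding box of everything written since the last upload.
struct DirtyRect {
    int x0 = 0, y0 = 0, x1 = 0, y1 = 0; // Half-open: [x0, x1) x [y0, y1)

    bool Empty() const { return x1 <= x0 || y1 <= y0; }

    // Grows the box so that it also covers the given rectangle.
    void Include(int x, int y, int width, int height) {
        if (Empty()) {
            x0 = x;
            y0 = y;
            x1 = x + width;
            y1 = y + height;
            return;
        }
        x0 = std::min(x0, x);
        y0 = std::min(y0, y);
        x1 = std::max(x1, x + width);
        y1 = std::max(y1, y + height);
    }
};

// Copies only the dirty block out of a tightly packed RGBA atlas
// (4 bytes per pixel is an assumption of this sketch).
std::vector<std::uint8_t> ExtractDirtyPixels(const std::vector<std::uint8_t>& atlas, int atlas_width,
                                             const DirtyRect& dirty) {
    constexpr std::size_t bytes_per_pixel = 4;
    std::vector<std::uint8_t> out;
    if (dirty.Empty())
        return out;

    const std::size_t row_bytes = static_cast<std::size_t>(dirty.x1 - dirty.x0) * bytes_per_pixel;
    out.resize(row_bytes * static_cast<std::size_t>(dirty.y1 - dirty.y0));

    for (int y = dirty.y0; y < dirty.y1; ++y) {
        const std::size_t src_offset =
            (static_cast<std::size_t>(y) * static_cast<std::size_t>(atlas_width) + static_cast<std::size_t>(dirty.x0)) *
            bytes_per_pixel;
        const std::size_t dst_offset = static_cast<std::size_t>(y - dirty.y0) * row_bytes;
        std::memcpy(out.data() + dst_offset, atlas.data() + src_offset, row_bytes);
    }
    return out;
}
```

A single bounding box is the simplest choice; it can over-upload when two small glyphs land far apart, in which case a list of rectangles may be preferable.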

Update: I removed the unnecessary copy and it is indeed much faster now. It beats the original implementation at higher glyph counts thanks to not having to rearrange all glyphs.

New benchmark results

| relative | ns/op | op/s | err% | total | Font texture atlas |
|---:|---:|---:|---:|---:|:---|
| 100.0% | 2,970,200.00 | 336.68 | 4.4% | 0.03 | Size 12 with 10 glyphs |
| 56.3% | 5,273,900.00 | 189.61 | 4.8% | 0.06 | Size 12 with 100 glyphs |
| 9.1% | 32,638,600.00 | 30.64 | 3.2% | 0.36 | Size 12 with 1000 glyphs |
| 93.7% | 3,169,900.00 | 315.47 | 5.7% | 0.04 | 〰️ Size 16 with 10 glyphs (Unstable with ~1.0 iters. Increase minEpochIterations to e.g. 10) |
| 55.2% | 5,385,200.00 | 185.69 | 4.2% | 0.06 | Size 16 with 100 glyphs |
| 6.9% | 43,134,600.00 | 23.18 | 2.3% | 0.48 | Size 16 with 1000 glyphs |
| 71.4% | 4,157,300.00 | 240.54 | 7.0% | 0.05 | 〰️ Size 24 with 10 glyphs (Unstable with ~1.0 iters. Increase minEpochIterations to e.g. 10) |
| 43.6% | 6,812,600.00 | 146.79 | 6.6% | 0.08 | 〰️ Size 24 with 100 glyphs (Unstable with ~1.0 iters. Increase minEpochIterations to e.g. 10) |
| 6.4% | 46,560,900.00 | 21.48 | 4.1% | 0.51 | Size 24 with 1000 glyphs |
| 59.1% | 5,027,900.00 | 198.89 | 4.2% | 0.06 | Size 48 with 10 glyphs |
| 27.9% | 10,648,500.00 | 93.91 | 6.0% | 0.12 | 〰️ Size 48 with 100 glyphs (Unstable with ~1.0 iters. Increase minEpochIterations to e.g. 10) |
| 3.4% | 88,539,300.00 | 11.29 | 1.9% | 0.97 | Size 48 with 1000 glyphs |
| 29.6% | 10,019,900.00 | 99.80 | 1.6% | 0.11 | Size 96 with 10 glyphs |
| 13.2% | 22,514,700.00 | 44.42 | 0.5% | 0.25 | Size 96 with 100 glyphs |
| 1.5% | 204,386,200.00 | 4.89 | 1.3% | 2.27 | Size 96 with 1000 glyphs |
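To make the before/after comparison concrete, the sketch below divides the original ns/op by the updated ns/op for three rows copied from the results in this comment. The ratios come out at roughly 0.49× for size 12 with 10 glyphs (the updated code is about twice as slow there), about 8.3× for size 12 with 1000 glyphs, and about 15× for size 96 with 1000 glyphs. These are single runs on one machine, so they should be read as indicative only. `Sample`, `kSamples`, and `Speedup` are names made up for this example.

```cpp
#include <cstdio>

// One benchmark row: ns/op before and after, copied from the results above.
struct Sample {
    const char* name;
    double original_ns;
    double updated_ns;
};

constexpr Sample kSamples[] = {
    {"Size 12 with 10 glyphs", 1463500.0, 2970200.0},
    {"Size 12 with 1000 glyphs", 270057000.0, 32638600.0},
    {"Size 96 with 1000 glyphs", 3080031700.0, 204386200.0},
};

// Values above 1 mean the updated implementation is faster.
constexpr double Speedup(const Sample& sample) {
    return sample.original_ns / sample.updated_ns;
}

void PrintSpeedups() {
    for (const Sample& sample : kSamples)
        std::printf("%-26s %6.2fx\n", sample.name, Speedup(sample));
}
```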
