
Reorder structure fields to put commonly-used fields first: quick wins #4747

Closed
gilles-peskine-arm opened this issue Jul 1, 2021 · 4 comments · Fixed by #5189 or #5268
Labels: enhancement, good-first-issue (Good for newcomers), size-s (Estimated task size: small (~2d))

Comments


gilles-peskine-arm commented Jul 1, 2021

There are advantages to grouping the commonly-used fields of a structure together. On Cortex-M0, accessing a field that lies within the first 128 slots of its access size (p->x when offsetof(t, x) / sizeof(x) < 128, where sizeof(x) is 1, 2 or 4) uses less code than accessing a field beyond this boundary. On platforms with a cache, putting commonly-used fields in the same cache line improves cache utilization.
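
As a rough illustration of the condition above, a build-time check can verify that a frequently accessed field stays within the cheap region. This is a generic sketch, not code from Mbed TLS: the structure, the field names, and the use of a C11 static assertion are all made up for the example, and 128 is simply the bound quoted above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical context structure: the hot field is placed first so that
 * offsetof(t, x) / sizeof(x) stays below the limit described above. */
typedef struct {
    uint32_t hot_counter;           /* accessed on every call */
    unsigned char rarely_used[512]; /* stand-in for rarely accessed state */
} example_ctx_t;

/* Build-time check that the hot field is still reachable with a short
 * immediate offset (128 is the bound quoted in the issue description). */
_Static_assert(offsetof(example_ctx_t, hot_counter) / sizeof(uint32_t) < 128,
               "hot_counter has drifted out of the cheap-access region");
```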

Anecdotal evidence suggests that in some structures, such as the SSL context, the placement of a new field can make hundreds of bytes of difference in the size of the library. In terms of effort per byte, this is a very nice win.

Note that when reordering fields, we should avoid creating holes due to padding on typical architectures, since these holes are both wasted RAM and wasted space in the “easy access” region.
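
As a generic sketch of the padding concern (not taken from any Mbed TLS header), on a typical 32-bit ABI where uint32_t needs 4-byte alignment:

```c
#include <stdint.h>

/* Poor ordering: 3 bytes of padding after flag_a and 3 bytes of trailing
 * padding after flag_b, so sizeof(padded_t) is typically 12. */
typedef struct {
    uint8_t  flag_a;
    uint32_t counter;
    uint8_t  flag_b;
} padded_t;

/* Reordered: no interior holes and only 2 bytes of trailing padding, so
 * sizeof(compact_t) is typically 8. */
typedef struct {
    uint32_t counter;
    uint8_t  flag_a;
    uint8_t  flag_b;
} compact_t;
```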

Hanno writes:

It was r649. The logic applied there, however, was not to move commonly used fields to the top, but rather to sort them by size from smallest to biggest. This helps because the maximum immediate offset is 127 * the size of the access, so 32-bit accesses have a larger max. offset than byte accesses. E.g. if you have a struct with 128 byte fields, 64 halfword fields and 32 word fields, you can access all of them from the base of the structure through an immediate offset if and only if you order them in precisely this way.
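
Below is a sketch of the smallest-to-biggest ordering described in the quote. The type and field names are invented, and the arrays merely stand in for many individual scalar fields of each size:

```c
#include <stdint.h>

/* Smallest-to-biggest ordering: byte-sized state first, then halfwords,
 * then words, so each group stays within the immediate-offset range of
 * its own access width (per the 127 * access-size limit quoted above). */
typedef struct {
    uint8_t  byte_state[128];    /* element offsets 0..127                 */
    uint16_t halfword_state[64]; /* element offsets 128..254 (even)        */
    uint32_t word_state[32];     /* element offsets 256..380 (word-aligned) */
} sorted_by_size_t;
```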

The goal of this issue is to identify and perform some quick wins. I expect to at least move a few fields around in mbedtls_ssl_context and mbedtls_ssl_config. The wins should be visible in the sizes reported by tests/scripts/all.sh build_arm_none_eabi_gcc_m0plus.

I also expect some ideas on how to progress further: how might we measure which fields are the most commonly used? Which structures are large enough for this to matter? Observations and suggestions should be recorded and filed as a follow-up covering the less quick wins.

This should be done before the 2.2x LTS since we try not to change the ABI in an LTS.

hanno-becker commented

Note: A first step should be ordering fields by size to get as many of them as possible accessible through an immediate offset from the base. Only if that leads to the conclusion that a single immediate offset cannot cover the whole structure is it necessary to resort to reasoning about frequency of use.

gilles-peskine-arm added the size-s (Estimated task size: small (~2d)) label on Jul 4, 2021

mpg commented Jul 5, 2021

This should be done before the 2.2x LTS since we try not to change the ABI in an LTS.

I'm not sure I agree with this reasoning. I obviously agree that this is not something we want to do in an LTS, but I disagree that it follows that we should do it in 2.2x before it becomes an LTS - we could also simply not do it in 2.2x and only do it in the 3.x line.

  1. Looking at this table, we can see that r649 "optimize structure" is near the middle of the table at 344 bytes saved (in the reference configuration of the baremetal branch; it would probably be more in the default configuration, to be fair). Why prioritize it over other changes that are higher in the table and that we also can't do post-LTS, such as making more things compile-time optional (AES decrypt, resumption, TLS as opposed to DTLS...) or other changes which allow larger savings?
  2. Currently the EPIC description reads "Things we want to do to facilitate maintenance, that would be too disruptive for an LTS branch." Again, I agree this issue satisfies the second criterion, but I don't think it satisfies the first at all. I don't think we should start adding improvements to the pre-LTS EPIC just because they seem nice to have, easy to do and we can't do them post-LTS. This is a slippery slope that could easily result in the EPIC growing endlessly. I think for pre-LTS work we should really focus on making maintenance easier, and keep improvements for the 3.x line.

gilles-peskine-arm (issue author) commented

Compared to other higher wins from #3535 (comment), this one breaks the ABI (many don't because they add new compile-time options), is broadly applicable (applies to every TLS user, as opposed to making features optional which only help users who don't need these features), doesn't add a maintenance burden (many of the items on that list are things we probably won't ever do because they make the code too complex), requires little prior knowledge (most of the others require surgical intervention inside the SSL stack), and has a result that's easy to review. So it's a good candidate for a new joiner, hobbyist or Friday afternoon activity. In terms of effort vs win, it beats everything else on that list.


mpg commented Jul 6, 2021

Those are all good points indeed. Considering these reasons I agree it makes sense to have this issue in the pre-LTS EPIC without it being the start of an endless feature creep.

mpg closed this as completed in #5268 on Dec 9, 2021