Optimize copies of large enums #54360

Manishearth · 2018-09-19T14:56:43Z

For types like SmallVec<[T; 1000]>, or in general an enum where the variants have a huge difference in size, we should probably try to optimize the copies better.

Basically, for enums with some large-enough difference between variant sizes, we should use a branch when codegenning copies/moves.

I'm not sure how common this pattern is, but it's worth looking into!
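To make the problem concrete, here is a minimal sketch of such an enum and of what a "branching" copy could look like if written by hand. The `Payload` type and both helper functions are hypothetical names chosen for illustration; nothing here is what rustc emits, and the compiler is free to lower both functions to the same full-size copy.

```rust
use std::mem::size_of;

// Hypothetical enum with a large gap between variant sizes.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Payload {
    Small(u8),
    Large([u8; 4096]),
}

// A plain copy of the whole value: conceptually this moves
// size_of::<Payload>() bytes no matter which variant is active.
fn copy_whole(src: &Payload) -> Payload {
    *src
}

// The same copy written with an explicit branch: each arm only
// reads the fields of the variant that is actually present.
fn copy_branching(src: &Payload) -> Payload {
    match src {
        Payload::Small(b) => Payload::Small(*b),
        Payload::Large(bytes) => Payload::Large(*bytes),
    }
}

fn main() {
    let small = Payload::Small(7);
    // The enum must be at least as large as its largest variant.
    assert!(size_of::<Payload>() >= 4096);
    // Both copies produce the same value; only the work done differs.
    assert_eq!(copy_whole(&small), copy_branching(&small));
    println!("size_of::<Payload>() = {}", size_of::<Payload>());
}
```

The point of the sketch is only the shape of the two copies: for `Payload::Small`, the branching version needs to touch a couple of bytes, while the whole-value copy is sized for the 4096-byte variant.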

cc @rust-lang/wg-codegen


leonardo-m · 2018-09-19T16:15:20Z

Discussed here:
https://old.reddit.com/r/programming/comments/9gzppu/falling_in_love_with_rust/e68tlfw/?st=jm8z60p1

eddyb · 2018-09-19T18:52:17Z

I have two theories/ideas for heuristics (without investigating anything first):

- a branch would cost you less than what memcpy would take to copy one chunk, of whatever best-case-scenario size it can support - and that best-case might be related to platform SIMD sizes
- branching is worth it, if it saves you from accessing more data cache lines - then, you want to take into account common cache line sizes (usually 64 bytes IIUC)
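The second heuristic can be sketched as a small predicate. This is only an illustration under stated assumptions: the 64-byte `CACHE_LINE` constant comes from the "usually 64 bytes" remark above, the function names are hypothetical, and the model ignores alignment (a real copy may straddle an extra cache line).

```rust
// Assumed cache line size, taken from the "usually 64 bytes" remark.
const CACHE_LINE: usize = 64;

// Number of cache lines needed to hold `bytes` bytes, assuming the
// data starts at a cache-line boundary (an idealization).
fn cache_lines(bytes: usize) -> usize {
    (bytes + CACHE_LINE - 1) / CACHE_LINE
}

// Hypothetical heuristic: branching pays off for a variant only if
// copying just that variant touches fewer cache lines than copying
// the whole enum.
fn branch_saves_cache_lines(variant_size: usize, enum_size: usize) -> bool {
    cache_lines(variant_size) < cache_lines(enum_size)
}

fn main() {
    // A 1-byte variant inside a 4097-byte enum: a branch avoids many lines.
    assert!(branch_saves_cache_lines(1, 4097));
    // A 60-byte variant inside a 64-byte enum: both fit in one line.
    assert!(!branch_saves_cache_lines(60, 64));
}
```

Under this model, small differences between variant sizes never justify a branch, which matches the intuition that the optimization should only apply to enums with a large size gap.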

cc @rkruppe

eternaleye · 2018-09-19T19:12:59Z

One way of tackling this might be:

- For each enum type, emit an rodata array mapping discriminants to variant sizes
- For any enum that is "sufficiently heterogeneous" in size, emit memcpys of the result of a lookup in the relevant array

Since the lookup is deterministic, and the indices (and thus arrays) are likely to be small for hugely-heterogeneous enums, the arrays are both 1.) easily prefetched by the CPU and 2.) likely to fit in a cacheline. As a result, most cases may avoid having any observable latency penalty unless the memory bus is saturated.

EDIT: Also, this pollutes the D$ but not the I$ or the branch predictor, and the D$ is going to be prodded by the memcpy anyway.
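A rough model of the table-lookup idea, in safe Rust. The `Tagged` struct is a hypothetical stand-in for an enum's in-memory form (a tag plus storage sized for the largest variant), and `VARIANT_SIZES` plays the role of the per-type rodata array; the names and the two-variant layout are assumptions for illustration, not how rustc lays out or copies real enums.

```rust
// Storage large enough for the biggest (hypothetical) variant.
const MAX_PAYLOAD: usize = 4096;

// Hand-rolled stand-in for an enum: a discriminant plus raw payload bytes.
struct Tagged {
    tag: u8,
    payload: [u8; MAX_PAYLOAD],
}

// Per-type table mapping discriminant -> payload bytes in use.
// In the proposal this would be emitted into read-only data.
static VARIANT_SIZES: [usize; 2] = [1, MAX_PAYLOAD];

// Copy the tag, then only as many payload bytes as the active variant
// uses. Returns the number of payload bytes copied. Panics if the tag
// is out of range for the table.
fn copy_active(src: &Tagged, dst: &mut Tagged) -> usize {
    let len = VARIANT_SIZES[src.tag as usize];
    dst.tag = src.tag;
    dst.payload[..len].copy_from_slice(&src.payload[..len]);
    len
}

fn main() {
    let mut src = Tagged { tag: 0, payload: [0; MAX_PAYLOAD] };
    src.payload[0] = 42;
    let mut dst = Tagged { tag: 1, payload: [0xFF; MAX_PAYLOAD] };

    let copied = copy_active(&src, &mut dst);
    assert_eq!(copied, 1);
    assert_eq!(dst.tag, 0);
    assert_eq!(dst.payload[0], 42);
    // Bytes beyond the active variant are left untouched.
    assert_eq!(dst.payload[1], 0xFF);
}
```

Note that the copy length comes from a data load rather than a conditional jump, which is the trade-off described above: the lookup uses the data cache but not the branch predictor.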

steveklabnik · 2020-06-09T22:29:31Z

Triage: I'm not aware of any changes here

oli-obk · 2020-06-10T07:32:58Z

This should probably be written as a MIR optimization. The mentioned array can be created as a const directly in the MIR (referenced via Rvalue::Use(Operand::Const(...))). If we add a query that takes a Ty<'tcx> and emits said constant, we'll even get deduplication for free.

oli-obk added the A-mir-opt Area: MIR optimizations label Jun 10, 2020

oli-obk mentioned this issue Jun 10, 2020

Const prop creates assignments from large constants #73203

Closed

JulianKnodt mentioned this issue Aug 15, 2020

Add fn on types for layout of variants #75552

Closed

camelid mentioned this issue May 28, 2021

Mir-Opt for copying enums with large discrepancies #85158

Merged

workingjubilee added the C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such label Oct 8, 2023
