Optimize copies of large enums #54360

Open
Manishearth opened this issue Sep 19, 2018 · 5 comments
Labels
A-codegen Area: Code generation A-mir-opt Area: MIR optimizations C-enhancement Category: An issue proposing an enhancement or a PR with one. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@Manishearth
Member

For types like SmallVec<[T; 1000]>, or in general an enum where the variants have a huge difference in size, we should probably try to optimize the copies better.

Basically, for enums with some large-enough difference between variant sizes, we should branch on the discriminant when codegenning copies/moves, so that only the bytes of the active variant get copied.
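To make the idea concrete, here is a minimal source-level sketch of what a branching copy would mean. `Payload`, `live_payload_bytes`, and `branching_copy` are hypothetical names made up for illustration, and nothing here says how rustc would actually lower such a copy; the optimizer is free to turn this back into a full-size copy.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for a SmallVec-like type:
/// one tiny variant and one very large variant.
#[derive(Clone, Copy)]
pub enum Payload {
    Small(u8),
    Large([u8; 4096]),
}

/// Number of payload bytes that carry data for the active variant.
pub fn live_payload_bytes(p: &Payload) -> usize {
    match p {
        Payload::Small(_) => size_of::<u8>(),
        Payload::Large(_) => size_of::<[u8; 4096]>(),
    }
}

/// Hand-written copy that branches on the discriminant first,
/// so the `Small` arm only needs to move a single byte of payload.
pub fn branching_copy(src: &Payload) -> Payload {
    match src {
        Payload::Small(b) => Payload::Small(*b),
        Payload::Large(buf) => Payload::Large(*buf),
    }
}
```

A plain `let copy = *src;` is instead allowed to copy all `size_of::<Payload>()` bytes regardless of which variant is active, which is the cost this issue is about.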

I'm not sure how common this pattern is, but it's worth looking into!

cc @rust-lang/wg-codegen

@leonardo-m

@eddyb
Member

eddyb commented Sep 19, 2018

I have two theories/ideas for heuristics (without investigating anything first):

  1. a branch would cost you less than what memcpy would take to copy one chunk, of whatever best-case-scenario size it can support - and that best-case might be related to platform SIMD sizes
  2. branching is worth it, if it saves you from accessing more data cache lines - then, you want to take into account common cache line sizes (usually 64 bytes IIUC)
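As a rough illustration, the two heuristics could be written down as predicates like the ones below. This is only one reading of them: every function name is hypothetical, the 64-byte line size is the assumption stated above, and the chunk size is left as a parameter because the best-case memcpy chunk is platform dependent.

```rust
/// Assumed cache line size in bytes (64 is common, but not universal).
pub const CACHE_LINE: usize = 64;

/// Cache lines touched when copying `bytes` bytes,
/// assuming the copy starts at a line-aligned address.
pub fn lines_touched(bytes: usize) -> usize {
    (bytes + CACHE_LINE - 1) / CACHE_LINE
}

/// One reading of heuristic 1: treat the branch as costing about as much
/// as copying one best-case chunk, so it pays off only if copying the
/// smallest variant saves at least `chunk` bytes.
pub fn branch_beats_one_chunk(
    enum_size: usize,
    smallest_variant_size: usize,
    chunk: usize,
) -> bool {
    enum_size.saturating_sub(smallest_variant_size) >= chunk
}

/// One reading of heuristic 2: branch only if the smallest variant
/// touches fewer cache lines than the whole enum does.
pub fn branch_saves_cache_lines(enum_size: usize, smallest_variant_size: usize) -> bool {
    lines_touched(smallest_variant_size) < lines_touched(enum_size)
}
```

Under these assumptions, an enum of about 4 KiB whose smallest variant is 16 bytes passes both checks, while a 48-byte enum with an 8-byte variant fails the cache line check because both sizes fit in one line.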

cc @rkruppe

@eternaleye
Contributor

eternaleye commented Sep 19, 2018

One way of tackling this might be:

  1. For each enum type, emit an rodata array mapping discriminants to variant sizes
  2. For any enum that is "sufficiently heterogeneous" in size, emit memcpys of the result of a lookup in the relevant array

Since the lookup is deterministic, and the indices (and thus arrays) are likely to be small for hugely-heterogeneous enums, the arrays are both 1.) easily prefetched by the CPU and 2.) likely to fit in a cacheline. As a result, most cases may avoid having any observable latency penalty unless the memory bus is saturated.

EDIT: Also, this pollutes the D$ but not the I$ or the branch predictor, and the D$ is going to be prodded by the memcpy anyway.
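A minimal sketch of the table-lookup approach, using a manually laid-out tagged struct instead of a real enum so that it stays in safe Rust. `Tagged`, `LIVE_BYTES`, and `table_copy` are hypothetical names; a compiler-emitted table would live in rodata and be indexed by the enum's real discriminant.

```rust
pub const TAG_SMALL: u8 = 0;
pub const TAG_LARGE: u8 = 1;
pub const PAYLOAD_CAP: usize = 4096;

/// Stand-in for an enum: a tag followed by storage big enough
/// for the largest variant.
#[repr(C)]
pub struct Tagged {
    pub tag: u8,
    pub payload: [u8; PAYLOAD_CAP],
}

/// Read-only table mapping each tag to the number of live payload bytes.
pub static LIVE_BYTES: [usize; 2] = [1, PAYLOAD_CAP];

/// Copies the tag plus only as many payload bytes as the table says are live.
/// There is no branch on the tag, just an indexed load feeding the copy length.
/// Returns the number of payload bytes copied; panics on an unknown tag.
pub fn table_copy(dst: &mut Tagged, src: &Tagged) -> usize {
    let n = LIVE_BYTES[src.tag as usize];
    dst.tag = src.tag;
    dst.payload[..n].copy_from_slice(&src.payload[..n]);
    n
}
```

Bytes past `n` in `dst` are left untouched here; for a real enum they would be padding of the active variant, so their contents would not matter.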

@jonas-schievink jonas-schievink added I-slow Issue: Problems and improvements with respect to performance of generated code. A-codegen Area: Code generation T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. C-enhancement Category: An issue proposing an enhancement or a PR with one. labels Jan 27, 2019
@steveklabnik
Member

Triage: I'm not aware of any changes here

@oli-obk oli-obk added the A-mir-opt Area: MIR optimizations label Jun 10, 2020
@oli-obk
Contributor

oli-obk commented Jun 10, 2020

This should probably be written as a MIR optimization. The mentioned array can be created as a const directly in the MIR (mentioned via Rvalue::Use(Operand::Const(...))). If we add a query that takes a Ty<'tcx> and emits said constant, we'll even get deduplication for free.

@workingjubilee workingjubilee added the C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such label Oct 8, 2023