Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Elide unnecessary ret slot allocas #8270

Closed
wants to merge 1 commit into from

Conversation

dotdash
Copy link
Contributor

@dotdash dotdash commented Aug 3, 2013

When there is only a single store to the ret slot that dominates the
load that gets the value for the "ret" instruction, we can elide the
ret slot and directly return the operand of the dominating store
instruction. This is the same thing that clang does, except for a
special case that doesn't seem to affect us.

Fixes #8238

@brson
Copy link
Contributor

brson commented Aug 3, 2013

Did this have an effect on compile time?

@dotdash
Copy link
Contributor Author

dotdash commented Aug 3, 2013

I only measured on a branch together with a few other WIP changes, and all those together amounted to about 2-3s off of a librustc build with llvm assertion disabled. Going from about 1m36 to 1m33s.

@brson
Copy link
Contributor

brson commented Aug 3, 2013

Someone else more familiar with trans should review this as well. @pcwalton?

@brson
Copy link
Contributor

brson commented Aug 5, 2013

A comment I left in IRC: "The thing that most concerns me is that it incidentally adds two high-level bindings for LLVM types, and I'm not sure whether that jives with whatever intentions we have here."

@emberian
Copy link
Member

emberian commented Aug 8, 2013

@dotdash also needs a rebase

@dotdash
Copy link
Contributor Author

dotdash commented Aug 9, 2013

Performed some measurements after rebasing. LLVM passes actually get a bit slower (about 1%), but the resulting code is better. For example, LLVM inlines closures more often. The benchmark from issue #6623 shows this:

Before with -O:

running 2 tests
test test1 ... bench: 2360 ns/iter (+/- 10)
test test2 ... bench: 2359 ns/iter (+/- 6)

test result: ok. 0 passed; 0 failed; 0 ignored; 2 measured

After with -O:

running 2 tests
test test1 ... bench: 2103 ns/iter (+/- 14)
test test2 ... bench: 2102 ns/iter (+/- 8)

test result: ok. 0 passed; 0 failed; 0 ignored; 2 measured

Before without -O:

running 2 tests
test test1 ... bench: 7914 ns/iter (+/- 9)
test test2 ... bench: 7998 ns/iter (+/- 23)

test result: ok. 0 passed; 0 failed; 0 ignored; 2 measured

After without -O:

running 2 tests
test test1 ... bench: 7357 ns/iter (+/- 43)
test test2 ... bench: 7343 ns/iter (+/- 21)

test result: ok. 0 passed; 0 failed; 0 ignored; 2 measured

@dotdash
Copy link
Contributor Author

dotdash commented Aug 9, 2013

FWIW the newtype wrappers were loosely modelled after the existing Type type and are primarily there to deal with possible NULL pointers to get less verbose code (through Options, by using the chain method). If you want that out, just let me know. Regarding "whatever intentions we have here", are those intentions documented anywhere?

@emberian
Copy link
Member

emberian commented Aug 9, 2013

@dotdash can you include the benchmarks in the PR? that's the only thing stopping me from giving this an r+

@dotdash
Copy link
Contributor Author

dotdash commented Aug 9, 2013

@cmr While that benchmark shows some improvement, it's not exactly a good one for this thing, because once #6623 gets fixed, the effect from this change is mostly gone, (only test2 still gains like 3%), I tried to find a better one, but my time is rather limited at the moment.

When there is only a single store to the ret slot that dominates the
load that gets the value for the "ret" instruction, we can elide the
ret slot and directly return the operand of the dominating store
instruction. This is the same thing that clang does, except for a
special case that doesn't seem to affect us.

Fixes rust-lang#8238
@dotdash
Copy link
Contributor Author

dotdash commented Aug 10, 2013

Added a codegen test, that was easy enough ;-)

bors added a commit that referenced this pull request Aug 10, 2013
When there is only a single store to the ret slot that dominates the
load that gets the value for the "ret" instruction, we can elide the
ret slot and directly return the operand of the dominating store
instruction. This is the same thing that clang does, except for a
special case that doesn't seem to affect us.

Fixes #8238
@bors bors closed this Aug 10, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Omit the ret slot alloca when it's not needed
5 participants