Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rustc: Improve compile time of platform intrinsics #32236

Merged
merged 1 commit into from
Mar 16, 2016

Conversation

alexcrichton
Copy link
Member

This commit improves the compile time of rustc_platform_intrinsics from 23s to
3.6s if compiling with -O and from 77s to 17s if compiling with -O -g. The
compiled rlib size also drops from 3.1M to 1.2M.

The wins here were gained by removing the destructors associated with Type by
removing the internal Box and Vec indirections. These destructors meant that
a lot of landing pads and extra code were generated to manage the runtime
representations. Instead everything can basically be statically computed and
shoved into rodata, so all we need is a giant string compare to lookup what's
what.

Closes #28273

@rust-highfive
Copy link
Collaborator

r? @arielb1

(rust_highfive has picked a reviewer for you, use r? to override)

@alexcrichton
Copy link
Member Author

cc @huonw

@alexcrichton alexcrichton force-pushed the better-compile-intrinsics branch from 6e2956b to 0c8b1ba Compare March 14, 2016 00:03
Aggregate(bool, Vec<Type>),
Pointer(&'static Type, Option<&'static Type>, /* const */ bool),
Vector(&'static Type, Option<&'static Type>, u8),
Aggregate(bool, &'static &'static [&'static Type]),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a &&[] so that it is just pointer-sized, not slice sized?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That does seem a little odd, actually. Since it's not going to change the overall size of the type, the alignment means that Pointer and Vector are 4 words big anyway.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah this was actually because the generated static values had the type &'static [T] and that couldn't be used by value (only by reference). I realize now though that all of the statics can have type [T; N] and then we can use references and coercions to get the right type, so no there's no longer double indirection.

@alexcrichton alexcrichton force-pushed the better-compile-intrinsics branch from 0c8b1ba to d92b717 Compare March 14, 2016 04:57
@aturon
Copy link
Member

aturon commented Mar 14, 2016

cc me

@arielb1
Copy link
Contributor

arielb1 commented Mar 15, 2016

Why not store everything in rodata and just do a binary search (instead of a match)?

@arielb1
Copy link
Contributor

arielb1 commented Mar 15, 2016

That's it, generate code of the form

pub static INTRINSICS: &'static [Intrinsic] = &[
        Intrinsic {
            name: "movemask_ps",
            inputs: &[&::F32x4],
            output: &::I32,
            definition: Named("llvm.x86.sse.movmsk.ps")
        },
        Intrinsic {
            name: "storeu_ps",
            inputs: &[&Type::Pointer(&::F32, Some(&::I8), false), &::F32x4],
            output: &::VOID,
            definition: Named("llvm.x86.sse.storeu.ps")
        },
];

@alexcrichton
Copy link
Member Author

@arielb1 yeah that would probably work, I don't think it would improve compile times that much, more, however.

@arielb1
Copy link
Contributor

arielb1 commented Mar 15, 2016

I think it would be cleaner

@nagisa
Copy link
Member

nagisa commented Mar 15, 2016

@alexcrichton last time (quite a while ago) I was dealing with matches of strs, LLVM was spending considerable amount of times figuring out the best way to order memcmps. The most proper way to do matching of this sort would be with a prefix trie, which also turned out to be much faster at runtime back then.

@arielb1
Copy link
Contributor

arielb1 commented Mar 15, 2016

Yeah that would be nice to implement. OTOH, I think that being data-driven is the right thing to do here.

@nagisa
Copy link
Member

nagisa commented Mar 15, 2016

We could also use something phf-like for static tables. I’m pretty sure it wouldn’t be too hard to do something smart about encoding tries into rodata either.

Regardless, this PR stands very well on its own and we could see if there’s anything more to gain here independently.

@arielb1
Copy link
Contributor

arielb1 commented Mar 15, 2016

I'll go with what we have.

@bors r+

@bors
Copy link
Contributor

bors commented Mar 15, 2016

📌 Commit d92b717 has been approved by arielb1

@alexcrichton
Copy link
Member Author

Yeah I was actually thinking this'd be a neat case for phf, which gave me more inspiration to keep pushing on rustbuild :)

@bors
Copy link
Contributor

bors commented Mar 15, 2016

⌛ Testing commit d92b717 with merge 4f8ab7d...

bors added a commit that referenced this pull request Mar 15, 2016
rustc: Improve compile time of platform intrinsics

This commit improves the compile time of `rustc_platform_intrinsics` from 23s to
3.6s if compiling with `-O` and from 77s to 17s if compiling with `-O -g`. The
compiled rlib size also drops from 3.1M to 1.2M.

The wins here were gained by removing the destructors associated with `Type` by
removing the internal `Box` and `Vec` indirections. These destructors meant that
a lot of landing pads and extra code were generated to manage the runtime
representations. Instead everything can basically be statically computed and
shoved into rodata, so all we need is a giant string compare to lookup what's
what.

Closes #28273
@bors
Copy link
Contributor

bors commented Mar 15, 2016

💔 Test failed - auto-linux-64-opt

This commit improves the compile time of `rustc_platform_intrinsics` from 23s to
3.6s if compiling with `-O` and from 77s to 17s if compiling with `-O -g`. The
compiled rlib size also drops from 3.1M to 1.2M.

The wins here were gained by removing the destructors associated with `Type` by
removing the internal `Box` and `Vec` indirections. These destructors meant that
a lot of landing pads and extra code were generated to manage the runtime
representations. Instead everything can basically be statically computed and
shoved into rodata, so all we need is a giant string compare to lookup what's
what.

Closes rust-lang#28273
@alexcrichton alexcrichton force-pushed the better-compile-intrinsics branch from d92b717 to 87ede2d Compare March 16, 2016 00:32
@alexcrichton
Copy link
Member Author

@bors: r=arielb1 87ede2d

@bors
Copy link
Contributor

bors commented Mar 16, 2016

⌛ Testing commit 87ede2d with merge 0986d64...

bors added a commit that referenced this pull request Mar 16, 2016
rustc: Improve compile time of platform intrinsics

This commit improves the compile time of `rustc_platform_intrinsics` from 23s to
3.6s if compiling with `-O` and from 77s to 17s if compiling with `-O -g`. The
compiled rlib size also drops from 3.1M to 1.2M.

The wins here were gained by removing the destructors associated with `Type` by
removing the internal `Box` and `Vec` indirections. These destructors meant that
a lot of landing pads and extra code were generated to manage the runtime
representations. Instead everything can basically be statically computed and
shoved into rodata, so all we need is a giant string compare to lookup what's
what.

Closes #28273
@bors bors merged commit 87ede2d into rust-lang:master Mar 16, 2016
@alexcrichton alexcrichton deleted the better-compile-intrinsics branch March 27, 2016 17:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants