Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable the use of [SU]Int32Size and EnumSize templates for AArch64 #11102

Closed
wants to merge 1 commit into from

Conversation

avieira-arm
Copy link
Contributor

Hi,

When benchmarking proto_benchmark from fleetbench on an AArch64 target we found that clang is able to vectorize these functions and they offer better performance than the scalar alternative.

I ran //src/google/protobuf:arena_unittest on aarch64-none-linux-gnu. Should I run any other tests? Also protobuf used to have its own set of benchmarks, but I can't find these when I query all targets with bazel. Let me know if you'd like me to run anything else, I couldn't find instructions on what the full test run is.

@ckennelly
Copy link
Contributor

The benchmarks are now located in https://github.com/google/fleetbench

@avieira-arm
Copy link
Contributor Author

Rebased this as I thought that might be the cause for the 'Mergeable 1/1 Fail(s): LABEL', but that still seems to be there. Any idea what that's trying to tell me?

And @ckennelly, thanks. I used proto_benchmark off that repo to benchmark protobuf and it's how I found out we weren't using this path for Arm. The reason I asked about benchmarks was because I remembered the protobuf-repo ones were slightly different (I think they had smaller messages).

@fowles
Copy link
Contributor

fowles commented Jan 6, 2023

mergable fail is a safeguard to prevent google engineers from accidentally merging it without doing internal testing

@danlark1
Copy link

danlark1 commented Jan 8, 2023

For fleetbench we see 1-1.5% improvement on G2A instances

name                 old cpu/op  new cpu/op  delta
BM_Protogen_Arena    19.1ms ± 2%  18.9ms ± 3%  -1.48%  (p=0.000 n=36+45)
BM_Protogen_NoArena  20.6ms ± 1%  20.4ms ± 3%  -1.18%  (p=0.000 n=30+44)

For int32 size microbenchmarks we have something like 2x speed up

BM_RepeatedFieldSize/1        2.67ns ± 0%  5.39ns ± 0%  +101.67%      (p=0.000 n=495+552)
BM_RepeatedFieldSize/8        10.6ns ± 0%   6.5ns ± 1%   -38.57%      (p=0.000 n=515+581)
BM_RepeatedFieldSize/64       63.8ns ± 0%  30.0ns ± 0%   -52.98%      (p=0.000 n=550+438)
BM_RepeatedFieldSize/512       490ns ± 0%   216ns ± 0%   -55.84%      (p=0.000 n=563+547)
BM_RepeatedFieldSize/1k        981ns ± 0%   430ns ± 0%   -56.15%      (p=0.000 n=545+506)

Godbolt looks great. We managed to get a combination of cmhi and sub with usra when needed for >>31. This is compared to clz + usra in the previous version. https://gcc.godbolt.org/z/W9bGd81KM

image

LGTM from the arm code generation side. Thanks a lot!

@avieira-arm
Copy link
Contributor Author

For int32 size microbenchmarks we have something like 2x speed up

BM_RepeatedFieldSize/1        2.67ns ± 0%  5.39ns ± 0%  +101.67%      (p=0.000 n=495+552)

How do you run the BM_RepeatedFieldSize Benchmarks?

@danlark1
Copy link

danlark1 commented Jan 9, 2023

For int32 size microbenchmarks we have something like 2x speed up

BM_RepeatedFieldSize/1        2.67ns ± 0%  5.39ns ± 0%  +101.67%      (p=0.000 n=495+552)

How do you run the BM_RepeatedFieldSize Benchmarks?

They are internal to us as we did not want to expose gbench at the time, I guess. Not sure about the current state, we probably should expose them.

The benchmark is simple, it just runs Int32Size on a random repeated field. We located it in protobuf/wire_format_unittest.cc

@haberman
Copy link
Member

@avieira-arm could you please rebase on main? Sorry for the trouble, you've caught us in the middle of a migration to GitHub Actions.

@avieira-arm
Copy link
Contributor Author

I've rebased it but I've not been able to rebuild proto_benchmark due to some bazel stuff, so I'm hoping the CI can check whether it builds fine here.

While I have your attention, maybe you can help me resolve the issue I have with building the proto benchmark in fleetbench.

I build it overriding the com_google_protobuf, com_google_absl and com_google_tcmalloc repository's using local ones (that are unchanged, other than this protobuf patch). I get an error complaining the C++ version is too old, and when I check the build commands with --verbose_failures I see '-std=c++0x' is beeing passed. I can't find this option being added in any of the local repositories so I have to assume it is being added by some bazel rule that is being downloaded. I'm not super experienced with bazel and I've only done a minimal level of looking into this for now, in the past I hacked up the config files in abseil to get me past the errors, but the projects now actually use C++14 stuff so making these changes is no longer viable.

@avieira-arm
Copy link
Contributor Author

Looks like none of the tests are running because of some missing 'secrets'. I suspect this is an internal thing too?

@avieira-arm
Copy link
Contributor Author

I keep getting emails about workflows failing to run. I don't think theres much I can do about that though, can someone please confirm.

It would be nice to get this merged or dropped if you don't want it, but the benchmarks seem to indicate that it would be a desirable change :)

@avieira-arm
Copy link
Contributor Author

Rebased again.

@fowles fowles added the 🅰️ safe for tests Mark a commit as safe to run presubmits over label Apr 25, 2023
@github-actions github-actions bot removed the 🅰️ safe for tests Mark a commit as safe to run presubmits over label Apr 25, 2023
@fowles
Copy link
Contributor

fowles commented Apr 25, 2023

sorry to be a pain but you are synced to a bad point and we need another rebase

When benchmarking proto_benchmark from fleetbench on an AArch64 target we found
that clang is able to vectorize these functions and they offer better
performance than the scalar alternative.
@avieira-arm
Copy link
Contributor Author

Rebased again. Let me know if you still want this.

@danlark1
Copy link

We definitely want this, this should be a very safe change. LGTM from me

@fowles

@sbenzaquen sbenzaquen removed the request for review from mcy May 15, 2023 21:52
@fowles fowles added the 🅰️ safe for tests Mark a commit as safe to run presubmits over label May 15, 2023
@github-actions github-actions bot removed the 🅰️ safe for tests Mark a commit as safe to run presubmits over label May 15, 2023
@fowles fowles added the platform related Any issue releated to specific platform or OS label May 16, 2023
@fowles
Copy link
Contributor

fowles commented May 17, 2023

Sorry for all the delays on this one!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c++ platform related Any issue releated to specific platform or OS
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants