Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LLVM] missing some? NEON intrinsics in darwin-aarch64 #4726

Open
Transfusion opened this issue Jul 15, 2022 · 10 comments
Open

[LLVM] missing some? NEON intrinsics in darwin-aarch64 #4726

Transfusion opened this issue Jul 15, 2022 · 10 comments
Assignees

Comments

@Transfusion
Copy link

Describe GraalVM and your environment :

  • GraalVM version or commit id if built from source: 3c4313396a95b90005432f64e9100b0212f92dcf
  • CE or EE: CE
  • JDK version: OpenJDK Runtime Environment GraalVM CE 22.3.0-dev (build 11.0.16+7-jvmci-22.3-b01)
  • OS and OS Version: macOS 12.4
  • Architecture: aarch64
  • The output of java -Xinternalversion:
OpenJDK 64-Bit Server VM (11.0.16+7-jvmci-22.3-b01) for bsd-aarch64 JRE (11.0.16+7-jvmci-22.3-b01), built on Jul  6 2022 11:42:36 by "graal" with clang Apple LLVM 12.0.0 (clang-1200.0.32.29)

Have you verified this issue still happens when using the latest snapshot?

Yes

Describe the issue

I encountered missing LLVM builtin: llvm.aarch64.neon.ld2.v16i8.p0v16i8 when I tried to use the google-protobuf gem via truffleruby.

/Users/transfusion/graalvm-ce-java11-22.3.0-dev/Contents/Home/languages/ruby/lib/gems/gems/google-protobuf-3.21.2/ext/google/protobuf_c/third_party/utf8_range/range2-neon.c:36:in `utf8_range2': missing LLVM builtin: llvm.aarch64.neon.ld2.v16i8.p0v16i8 (Polyglot::ForeignException)
	from /Users/transfusion/graalvm-ce-java11-22.3.0-dev/Contents/Home/languages/ruby/lib/gems/gems/google-protobuf-3.21.2/ext/google/protobuf_c/ruby-upb.c:1221:in `decode_msg'

range2-neon.c may be found here

Code snippet or code repository that reproduces the problem

#include <iostream>
#include <arm_neon.h>

const uint8_t _range_adjust_tbl[] = {
    /* index -> 0~15  16~31 <- index */
    /*  E0 -> */ 2,
    3, /* <- F0  */ 0,   0,    0,    0,    0,    0,    0,
    4, /* <- F4  */
    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
    /*  ED -> */ 3,
    0,    0,    0,    0,    0,
};

int main()
{
  const uint8x16x2_t range_adjust_tbl = vld2q_u8(_range_adjust_tbl);
  auto a = range_adjust_tbl.val[0];
  auto b = range_adjust_tbl.val[1];
  std::cout << vgetq_lane_u8(a, 0) << std::endl;
}

Steps to reproduce the problem

  1. $LLVM_TOOLCHAIN/clang++ foo.cpp -emit-llvm -c -o foo.bc
  2. $GRAAALVM_HOME/bin/lli foo.bc

Output

missing LLVM builtin: llvm.aarch64.neon.ld2.v16i8.p0v16i8
        at <llvm> main(neon_intrinsic.cpp:64:1223)

Expected behavior

Program executes without runtime errors like on stock LLVM.

@lewurm lewurm self-assigned this Jul 18, 2022
@lewurm
Copy link
Member

lewurm commented Jul 19, 2022

@eregon is it possible for the user to sneak in CFLAGS when installing a gem? I think -Xclang -target-feature -Xclang -neon should unblock @Transfusion.

I'm tempted to add those flags to our clang wrapper if running on AArch64, as the list of Neon intrinsics to support is quite long: https://github.com/llvm/llvm-project/blob/532dc62b907554b3f07f17205674aa71e76fc863/clang/test/CodeGen/aarch64-neon-intrinsics.c Even just the subset used in protobuf are quite some. Also I doubt we gain any speed up by emulating vector instructions, so it's probably even more performant to use the generic fallback in this case. On the other hand, armv8 includes the Neon extension, so I'm unsure how much problems this would cause in practice.

@rschatz what is your take on this? How is the situation handled around e.g. AVX on x86_64?

@rschatz
Copy link
Member

rschatz commented Jul 19, 2022

Currently, we disable what we can on x86_64. And for the rest, we implement the intrinsics as we see them.

Concretely, we disable SSE3 and higher, and AVX:


Unfortunately, it's not possible to disable SSE2 completely, since that would also disable the scalar floating point operations, not just the vectorized ones.

@eregon
Copy link
Member

eregon commented Jul 19, 2022

@eregon is it possible for the user to sneak in CFLAGS when installing a gem?

Not currently, no.

I'm tempted to add those flags to our clang wrapper if running on AArch64

Yes, I think we should do that, similar to what we do on x86_64.

@eregon eregon added the ruby label Jul 19, 2022
@lewurm
Copy link
Member

lewurm commented Jul 20, 2022

See #4738

@Transfusion can you provide some steps to reproduce the issue?

Using https://github.com/cyb70289/utf8 as an example kind of confirms my fear that disabling NEON support won't fly with unclean codebases: it needed a few #if defined(__ARM_NEON) guards to make it work. Interestingly enough the copy in the protobuf repository has the right guards in place.

@eregon
Copy link
Member

eregon commented Aug 8, 2022

#4738 has been merged, and that's part of the current truffleruby-dev build.
However there is still some problems to install grpc on macOS M1 with Neon, @lewurm could you look into it?
See oracle/truffleruby#2697 (comment)

@lewurm
Copy link
Member

lewurm commented Aug 10, 2022

Heavy sigh: There is __ARM_NEON which the PR has taken care of, but there is also __ARM_NEON__. The former is the recommended macro by ARM, but Apple remains to use the latter all over in their SDKs. LLVM sets the latter unconditionally for Darwin, unlike __ARM_NEON which is guarded by whatever is set via -target-feature. This is a problem as for example a header used by gprc tests for both macros: https://github.com/grpc/grpc/blob/9479089ac8cb99e66a71eab687b06ce220a94838/third_party/xxhash/xxhash.h#L2716 and thus still doesn't work then.

We could fix this on the LLVM side (and try to upstream it), like this: https://gist.github.com/lewurm/746ab6a78374be9529ce6a58063ae7f0

I tried this fix locally, and tried to build OpenJDK with the Sulong toolchain, but running into this scenario:

In file included from /Users/lewurm/work/labsjdk-ce-17/src/java.desktop/macosx/native/libawt_lwawt/font/CGGlyphImages.m:26:
In file included from /Applications/Xcode13.3.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX12.3.sdk/System/Library/Frameworks/Accelerate.framework/Headers/Accelerate.h:20:
In file included from /Applications/Xcode13.3.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX12.3.sdk/System/Library/Frameworks/Accelerate.framework/Headers/../Frameworks/vecLib.framework/Headers/vecLib.h:25:
In file included from /Applications/Xcode13.3.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX12.3.sdk/System/Library/Frameworks/Accelerate.framework/Frameworks/vecLib.framework/Headers/vBasicOps.h:42:
/Users/lewurm/work/graal/sdk/mxbuild/darwin-aarch64/GRAALVM_3AA0483B57_JAVA17/graalvm-3aa0483b57-java17-22.3.0-dev/Contents/Home/lib/llvm/lib/clang/14.0.6/include/arm_neon.h:32:
2: error: "NEON support not enabled"
#error "NEON support not enabled"

The problem is in vBasicOps.h:
https://github.com/phracker/MacOSX-SDKs/blob/041600eda65c6a668f66cb7d56b7d1da3e8bcc93/MacOSX11.3.sdk/System/Library/Frameworks/Kernel.framework/Versions/A/Headers/vecLib/vBasicOps.h#L44-L46
First, it doesn't guard the #include <arm_neon.h> with a test of __ARM_NEON (or __ARM_NEON__ for that matter) as recommended by the ARM C Language Extensions (Section 4.4). But second, even if it would, it doesn't provide a fallback and thus definitions later in the header file would fail anyway.

I guess that Apple assumes that arm64 implies Neon being available is fair as they control the whole stack down to the hardware. I'm reporting it anyway via Radar, but even if they would fix that (and provide a generic implementation), that is probably years away from happening.

So I think we are at a loss here, and have to (1) revert the Sulong toolchain PR that disabled Neon, and (2) start implement Neon intrinsics in Sulong as needed.

Any thoughts?

@eregon
Copy link
Member

eregon commented Aug 10, 2022

and tried to build OpenJDK with the Sulong toolchain

Do we need to do that? I don't think we need OpenJDK compiled by the Sulong toolchain (could be another compiler, or without the Sulong toolchain wrappers), do we?

@lewurm
Copy link
Member

lewurm commented Aug 16, 2022

and tried to build OpenJDK with the Sulong toolchain

Do we need to do that? I don't think we need OpenJDK compiled by the Sulong toolchain (could be another compiler, or without the Sulong toolchain wrappers), do we?

We need it for Espresso which supports a mode when running on HotSpot that is called nfi-llvm. The problem that occurs there is that for example libjava has to be opened multiple times, first by OpenJDK itself and then per each Espresso context. dlopen doesn't support that1, but we can exploit Sulong for that. Because of that we need to have OpenJDK libs with bitcode available. We ship that today for linux-x86_64 and darwin-x86_64 (check out labsjdk-17, builds with -sulong suffix contain bitcode) and we plan to ship that for aarch64 as well.

Footnotes

  1. For example we want static vars of libjava to be initialized per JVM instance. There is dlmopen on glibc that supports such isolation via namespaces, but it's only available on newer glibc versions and suffers from bugs. And of course, it's a linux-only solution.

@rschatz
Copy link
Member

rschatz commented Aug 16, 2022

So I think we are at a loss here, and have to (1) revert the Sulong toolchain PR that disabled Neon, and (2) start implement Neon intrinsics in Sulong as needed.

I agree, even though I don't really like it. If everyone expects that to be there on the majority of chips out there, we can't really do anything about it, we'll have to implement it, similar to SSE2 on x86_64. The #ifndef __ARM_NEON #error ... is the good case, I'm sure a lot of people will just use them without any checks at all ;)

Let's hope it's not too many distinct operations we have to support.

graalvmbot pushed a commit that referenced this issue Sep 6, 2022
…h64"

This reverts commit d4f3bed and a511c27.

This flag does not disable `__ARM_NEON__` which is used on Apple. We
could fix LLVM on our side, but there are too many assumptions in the
Apple SDK that this is available (see
#4726 (comment) ).
graalvmbot pushed a commit that referenced this issue Sep 9, 2022
…h64"

This reverts commit d4f3bed and a511c27.

This flag does not disable `__ARM_NEON__` which is used on Apple. We
could fix LLVM on our side, but there are too many assumptions in the
Apple SDK that this is available (see
#4726 (comment) ).
@lewurm
Copy link
Member

lewurm commented Sep 20, 2022

@Transfusion this should be resolved now. Could you please verify with the latest dev build from https://github.com/graalvm/graalvm-ce-dev-builds/releases ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants