[LLVM] missing some? NEON intrinsics in `darwin-aarch64` #4726

Transfusion · 2022-07-15T06:33:24Z

Describe GraalVM and your environment :

GraalVM version or commit id if built from source: 3c4313396a95b90005432f64e9100b0212f92dcf
CE or EE: CE
JDK version: OpenJDK Runtime Environment GraalVM CE 22.3.0-dev (build 11.0.16+7-jvmci-22.3-b01)
OS and OS Version: macOS 12.4
Architecture: aarch64
The output of java -Xinternalversion:

OpenJDK 64-Bit Server VM (11.0.16+7-jvmci-22.3-b01) for bsd-aarch64 JRE (11.0.16+7-jvmci-22.3-b01), built on Jul  6 2022 11:42:36 by "graal" with clang Apple LLVM 12.0.0 (clang-1200.0.32.29)

Have you verified this issue still happens when using the latest snapshot?

Yes

Describe the issue

I encountered missing LLVM builtin: llvm.aarch64.neon.ld2.v16i8.p0v16i8 when I tried to use the google-protobuf gem via truffleruby.

/Users/transfusion/graalvm-ce-java11-22.3.0-dev/Contents/Home/languages/ruby/lib/gems/gems/google-protobuf-3.21.2/ext/google/protobuf_c/third_party/utf8_range/range2-neon.c:36:in `utf8_range2': missing LLVM builtin: llvm.aarch64.neon.ld2.v16i8.p0v16i8 (Polyglot::ForeignException)
	from /Users/transfusion/graalvm-ce-java11-22.3.0-dev/Contents/Home/languages/ruby/lib/gems/gems/google-protobuf-3.21.2/ext/google/protobuf_c/ruby-upb.c:1221:in `decode_msg'

range2-neon.c may be found here

Code snippet or code repository that reproduces the problem

#include <iostream>
#include <arm_neon.h>

const uint8_t _range_adjust_tbl[] = {
    /* index -> 0~15  16~31 <- index */
    /*  E0 -> */ 2,
    3, /* <- F0  */ 0,   0,    0,    0,    0,    0,    0,
    4, /* <- F4  */
    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,    0,
    /*  ED -> */ 3,
    0,    0,    0,    0,    0,
};

int main()
{
  const uint8x16x2_t range_adjust_tbl = vld2q_u8(_range_adjust_tbl);
  auto a = range_adjust_tbl.val[0];
  auto b = range_adjust_tbl.val[1];
  // Cast so the uint8_t lane prints as a number rather than a raw character.
  std::cout << static_cast<int>(vgetq_lane_u8(a, 0)) << std::endl;
}

Steps to reproduce the problem

$LLVM_TOOLCHAIN/clang++ foo.cpp -emit-llvm -c -o foo.bc
$GRAALVM_HOME/bin/lli foo.bc

Output

missing LLVM builtin: llvm.aarch64.neon.ld2.v16i8.p0v16i8
        at <llvm> main(neon_intrinsic.cpp:64:1223)

Expected behavior

Program executes without runtime errors like on stock LLVM.
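
For readers unfamiliar with the intrinsic: `vld2q_u8` maps to the NEON LD2 load, which reads 32 bytes and de-interleaves them into two 16-byte vectors. The following is only a scalar sketch of that behaviour, not how Sulong or LLVM implements it; `U8x16x2` and `ld2_u8_scalar` are hypothetical names introduced for illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Portable stand-in for uint8x16x2_t: two vectors of 16 bytes each.
// (Hypothetical type, introduced only for this sketch.)
struct U8x16x2 {
  std::array<std::uint8_t, 16> val[2];
};

// Scalar sketch of what vld2q_u8 computes: read 32 consecutive bytes and
// de-interleave them, even-indexed bytes into val[0] and odd-indexed bytes
// into val[1]. The caller must provide at least 32 readable bytes.
inline U8x16x2 ld2_u8_scalar(const std::uint8_t* src) {
  U8x16x2 out{};
  for (std::size_t i = 0; i < 16; ++i) {
    out.val[0][i] = src[2 * i];
    out.val[1][i] = src[2 * i + 1];
  }
  return out;
}
```

Applied to the table in the reproducer, this puts the `E0` and `ED` entries in `val[0]` (lanes 0 and 13) and the `F0` and `F4` entries in `val[1]` (lanes 0 and 4), which matches the `0~15` / `16~31` index comment in the source.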


lewurm · 2022-07-19T10:36:24Z

@eregon is it possible for the user to sneak in CFLAGS when installing a gem? I think -Xclang -target-feature -Xclang -neon should unblock @Transfusion.

I'm tempted to add those flags to our clang wrapper if running on AArch64, as the list of Neon intrinsics to support is quite long: https://github.com/llvm/llvm-project/blob/532dc62b907554b3f07f17205674aa71e76fc863/clang/test/CodeGen/aarch64-neon-intrinsics.c Even just the subset used in protobuf is sizable. Also, I doubt we gain any speedup by emulating vector instructions, so it's probably even more performant to use the generic fallback in this case. On the other hand, armv8 includes the Neon extension, so I'm unsure how many problems this would cause in practice.

@rschatz what is your take on this? How is the situation handled around e.g. AVX on x86_64?

rschatz · 2022-07-19T13:05:53Z

Currently, we disable what we can on x86_64. And for the rest, we implement the intrinsics as we see them.

Concretely, we disable SSE3 and higher, and AVX:

graal/sulong/projects/com.oracle.truffle.llvm.toolchain.launchers/src/com/oracle/truffle/llvm/toolchain/launchers/common/ClangLikeBase.java

Line 183 in 2cdb368

return Arrays.asList("-mno-sse3", "-mno-avx");

Unfortunately, it's not possible to disable SSE2 completely, since that would also disable the scalar floating point operations, not just the vectorized ones.

eregon · 2022-07-19T13:26:35Z

> @eregon is it possible for the user to sneak in CFLAGS when installing a gem?

Not currently, no.

> I'm tempted to add those flags to our clang wrapper if running on AArch64

Yes, I think we should do that, similar to what we do on x86_64.

lewurm · 2022-07-20T15:15:02Z

See #4738

@Transfusion can you provide some steps to reproduce the issue?

Using https://github.com/cyb70289/utf8 as an example kind of confirms my fear that disabling NEON support won't fly with unclean codebases: it needed a few #if defined(__ARM_NEON) guards to make it work. Interestingly enough the copy in the protobuf repository has the right guards in place.
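
To illustrate the kind of guard meant here, the following is a minimal sketch of a function that only touches NEON when `__ARM_NEON` is defined and otherwise falls back to plain C++. `sum_bytes` is a hypothetical example and is not taken from the utf8 or protobuf sources.

```cpp
#include <cstddef>
#include <cstdint>

// Only pull in the NEON header when the compiler says NEON is enabled.
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Sum of all bytes in buf (hypothetical example function).
inline std::uint32_t sum_bytes(const std::uint8_t* buf, std::size_t len) {
  std::uint32_t total = 0;
  std::size_t i = 0;
#if defined(__ARM_NEON)
  // NEON path: process 16 bytes per iteration using pairwise widening adds.
  for (; i + 16 <= len; i += 16) {
    uint64x2_t s = vpaddlq_u32(vpaddlq_u16(vpaddlq_u8(vld1q_u8(buf + i))));
    total += static_cast<std::uint32_t>(vgetq_lane_u64(s, 0) +
                                        vgetq_lane_u64(s, 1));
  }
#endif
  // Generic path: handles the tail, or the whole buffer when NEON is off.
  for (; i < len; ++i) {
    total += buf[i];
  }
  return total;
}
```

With the guard in place, the same source compiles whether or not the toolchain passes `-Xclang -target-feature -Xclang -neon`; without it, the `#include <arm_neon.h>` alone is enough to break the build.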

eregon · 2022-08-08T14:56:40Z

#4738 has been merged, and that's part of the current truffleruby-dev build.
However, there are still some problems installing grpc on macOS M1 with Neon, @lewurm could you look into it?
See oracle/truffleruby#2697 (comment)

lewurm · 2022-08-10T13:53:25Z

Heavy sigh: there is __ARM_NEON, which the PR has taken care of, but there is also __ARM_NEON__. The former is the macro recommended by ARM, but Apple continues to use the latter all over their SDKs. LLVM sets the latter unconditionally for Darwin, unlike __ARM_NEON, which is guarded by whatever is set via -target-feature. This is a problem because, for example, a header used by grpc checks for both macros: https://github.com/grpc/grpc/blob/9479089ac8cb99e66a71eab687b06ce220a94838/third_party/xxhash/xxhash.h#L2716 and thus it still doesn't work.
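
A quick way to see which of the two macros a given toolchain invocation predefines is a small probe like the one below. This is only a sketch; `neon_macro_report` is a hypothetical helper name.

```cpp
#include <string>

// Reports which NEON feature macros the compiler predefined for this build.
// The result depends on the target and on flags such as -target-feature.
inline std::string neon_macro_report() {
  std::string report;
#if defined(__ARM_NEON)
  report += "__ARM_NEON=defined ";
#else
  report += "__ARM_NEON=undefined ";
#endif
#if defined(__ARM_NEON__)
  report += "__ARM_NEON__=defined";
#else
  report += "__ARM_NEON__=undefined";
#endif
  return report;
}
```

Going by the description above, a Darwin build with Neon disabled via `-target-feature` would be expected to report `__ARM_NEON` as undefined while `__ARM_NEON__` stays defined, which is exactly the combination that trips up headers checking for either macro.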

We could fix this on the LLVM side (and try to upstream it), like this: https://gist.github.com/lewurm/746ab6a78374be9529ce6a58063ae7f0

I tried this fix locally, and tried to build OpenJDK with the Sulong toolchain, but ran into this error:

In file included from /Users/lewurm/work/labsjdk-ce-17/src/java.desktop/macosx/native/libawt_lwawt/font/CGGlyphImages.m:26:
In file included from /Applications/Xcode13.3.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX12.3.sdk/System/Library/Frameworks/Accelerate.framework/Headers/Accelerate.h:20:
In file included from /Applications/Xcode13.3.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX12.3.sdk/System/Library/Frameworks/Accelerate.framework/Headers/../Frameworks/vecLib.framework/Headers/vecLib.h:25:
In file included from /Applications/Xcode13.3.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX12.3.sdk/System/Library/Frameworks/Accelerate.framework/Frameworks/vecLib.framework/Headers/vBasicOps.h:42:
/Users/lewurm/work/graal/sdk/mxbuild/darwin-aarch64/GRAALVM_3AA0483B57_JAVA17/graalvm-3aa0483b57-java17-22.3.0-dev/Contents/Home/lib/llvm/lib/clang/14.0.6/include/arm_neon.h:32:2: error: "NEON support not enabled"
#error "NEON support not enabled"

The problem is in vBasicOps.h:
https://github.com/phracker/MacOSX-SDKs/blob/041600eda65c6a668f66cb7d56b7d1da3e8bcc93/MacOSX11.3.sdk/System/Library/Frameworks/Kernel.framework/Versions/A/Headers/vecLib/vBasicOps.h#L44-L46
First, it doesn't guard the #include <arm_neon.h> with a test of __ARM_NEON (or __ARM_NEON__ for that matter), as recommended by the ARM C Language Extensions (Section 4.4). Second, even if it did, it doesn't provide a fallback, so definitions later in the header file would fail anyway.

I guess Apple's assumption that arm64 implies Neon being available is fair, as they control the whole stack down to the hardware. I'm reporting it anyway via Radar, but even if they fixed it (and provided a generic implementation), that is probably years away from happening.

So I think we are at a loss here, and have to (1) revert the Sulong toolchain PR that disabled Neon, and (2) start implementing Neon intrinsics in Sulong as needed.

Any thoughts?

eregon · 2022-08-10T15:15:56Z

> and tried to build OpenJDK with the Sulong toolchain

Do we need to do that? I don't think we need OpenJDK compiled by the Sulong toolchain (could be another compiler, or without the Sulong toolchain wrappers), do we?

lewurm · 2022-08-16T07:50:04Z

> > and tried to build OpenJDK with the Sulong toolchain
>
> Do we need to do that? I don't think we need OpenJDK compiled by the Sulong toolchain (could be another compiler, or without the Sulong toolchain wrappers), do we?

We need it for Espresso, which supports a mode called nfi-llvm when running on HotSpot. The problem there is that, for example, libjava has to be opened multiple times: first by OpenJDK itself and then once per Espresso context. dlopen doesn't support that¹, but we can exploit Sulong for that. Because of that, we need to have OpenJDK libs with bitcode available. We ship that today for linux-x86_64 and darwin-x86_64 (check out labsjdk-17; builds with the -sulong suffix contain bitcode) and we plan to ship that for aarch64 as well.

¹ For example, we want static vars of libjava to be initialized per JVM instance. There is dlmopen on glibc that supports such isolation via namespaces, but it's only available on newer glibc versions and suffers from bugs. And of course, it's a Linux-only solution.

rschatz · 2022-08-16T13:29:45Z

> So I think we are at a loss here, and have to (1) revert the Sulong toolchain PR that disabled Neon, and (2) start implementing Neon intrinsics in Sulong as needed.

I agree, even though I don't really like it. If everyone expects that to be there on the majority of chips out there, we can't really do anything about it, we'll have to implement it, similar to SSE2 on x86_64. The #ifndef __ARM_NEON #error ... is the good case, I'm sure a lot of people will just use them without any checks at all ;)

Let's hope it's not too many distinct operations we have to support.

…h64" This reverts commit d4f3bed and a511c27. This flag does not disable `__ARM_NEON__` which is used on Apple. We could fix LLVM on our side, but there are too many assumptions in the Apple SDK that this is available (see #4726 (comment) ).

lewurm · 2022-09-20T07:45:30Z

@Transfusion this should be resolved now. Could you please verify with the latest dev build from https://github.com/graalvm/graalvm-ce-dev-builds/releases ?

Transfusion added the llvm label Jul 15, 2022

lewurm self-assigned this Jul 18, 2022

eregon added the ruby label Jul 19, 2022

eregon mentioned this issue Aug 8, 2022

grpc doesn't build on M1 Mac (aarch64) with 22.2.0 oracle/truffleruby#2697

Closed

ollym mentioned this issue Aug 26, 2022

The grpc gem does not work yet at runtime (it installs fine) oracle/truffleruby#2247

Open
