osx, build: Node v6.3.0 won't build on Clang 3.2 (OSX 10.8 Darwin 12.5.0) #7618

gibfahn · 2016-07-08T14:43:38Z

Version: v6.3.0
Platform: Mac OSX 10.8 (Darwin 12.5.0)
Subsystem: build

The build is failing on OSX 10.8 but passing on 10.10. It looks like the problem is due to this commit: 4014ecb, which was added in #7157 and backported in #7546. I note that @zbjornson said that:

Tested with gcc/g++ 4.8.3, clang 3.4 and MSVC 2015 (i.e. something like #4284 shouldn't happen here).

But I'm still getting this problem. I assume something similar to #4290 would fix it. @bnoordhuis

Failing machine:

Darwin Kernel Version 12.5.0
$ cc --version
Apple LLVM version 4.2 (clang-425.0.28) (based on LLVM 3.2svn)
Target: x86_64-apple-darwin12.5.0

Passing machine:

Darwin Kernel Version 14.0.0
$ cc --version
Apple LLVM version 6.1.0 (clang-602.0.53) (based on LLVM 3.6.0svn)
Target: x86_64-apple-darwin14.0.0
Thread model: posix

Error:

../src/node_buffer.cc:1193:19: error: use of undeclared identifier '__builtin_bswap16'
      data16[i] = BSWAP_INTRINSIC_2(data16[i]);
                  ˆ
../src/node_buffer.cc:56:30: note: expanded from macro 'BSWAP_INTRINSIC_2'
#define BSWAP_INTRINSIC_2(x) __builtin_bswap16(x)
                             ˆ
  c++ '-D_DARWIN_USE_64_BIT_INODE=1' '-DNODE_ARCH="x64"' '-DNODE_WANT_INTERNALS=1' '-DV8_DEPRECATION_WARNINGS=1' '-DNODE_USE_V8_PLATFORM=1' '-DNODE_TAG="-IBMBuild-"' '-DNODE_HAVE_I18N_SUPPORT=1' '-DNODE_HAVE_SMALL_ICU=1' '-DHAVE_INSPECTOR=1' '-DV8_INSPECTOR_USE_STL=1' '-DHAVE_OPENSSL=1' '-DHAVE_DTRACE=1' '-D__POSIX__' '-DNODE_PLATFORM="darwin"' '-DUCONFIG_NO_TRANSLITERATION=1' '-DUCONFIG_NO_SERVICE=1' '-DUCONFIG_NO_REGULAR_EXPRESSIONS=1' '-DU_ENABLE_DYLOAD=0' '-DU_STATIC_IMPLEMENTATION=1' '-DU_HAVE_STD_STRING=0' '-DUCONFIG_NO_BREAK_ITERATION=0' '-DUCONFIG_NO_LEGACY_CONVERSION=1' '-DUCONFIG_NO_CONVERSION=1' '-DHTTP_PARSER_STRICT=0' '-D_LARGEFILE_SOURCE' '-D_FILE_OFFSET_BITS=64' -I../src -I../tools/msvs/genfiles -I../deps/uv/src/ares -I/build/node/out/Release/obj/gen -I../deps/v8_inspector -I../deps/v8_inspector/deps/wtf -I/build/node/out/Release/obj/gen/blink -I../deps/v8/include -I../deps/icu-small/source/i18n -I../deps/icu-small/source/common -I../deps/openssl/openssl/include -I../deps/zlib -I../deps/http_parser -I../deps/cares/include -I../deps/uv/include  -Os -gdwarf-2 -mmacosx-version-min=10.7 -arch x86_64 -Wall -Wendif-labels -W -Wno-unused-parameter -std=gnu++0x -fno-rtti -fno-exceptions -fno-threadsafe-statics -fno-strict-aliasing -MMD -MF /build/node/out/Release/.deps//build/node/out/Release/obj.target/node/src/node_file.o.d.raw   -c -o /build/node/out/Release/obj.target/node/src/node_file.o ../src/node_file.cc
1 error generated.
make[2]: *** [/build/node/out/Release/obj.target/node/src/node_buffer.o] Error 1
make[2]: *** Waiting for unfinished jobs....
rm _node_deps_v8_inspector_platform_v8_inspector_v8_inspector_gyp_protocol_sources_target_generateV8InspectorProtocolBackendSources.intermediate
make[1]: *** [node] Error 2
make: *** [node-v7.0.0-darwin-x64.tar] Error 2

The text was updated successfully, but these errors were encountered:

MylesBorins · 2016-07-08T15:17:54Z

/cc @nodejs/build regarding our OSX testing infra

mscdex · 2016-07-08T15:21:09Z

I'm not familiar with clang, so I'm not entirely sure which version we should be looking at, but the last version number on the failing machine shows 3.2 which is less than 3.4. shrug

zbjornson · 2016-07-08T15:26:20Z

Urg. I tested 3.4 because that was the oldest version listed in https://github.com/nodejs/node/blob/master/BUILDING.md. If 3.2 needs to be supported, then we can detect builtin presence and fallback to one of the alternative definitions.

gibfahn · 2016-07-08T15:47:03Z

@mscdex @zbjornson I had assumed that the clang version was the 425.0.28 (i.e. 4.2), but if it is the 3.2 then our clang level might be out of date.

I think this clang level is the default one for OSX 10.8, which is theoretically supported by node.

zbjornson · 2016-07-08T16:54:49Z

Apple versions clang kinda unusually. Their 4.2 is based on LLVM 3.2, and their 5.1 is based on LLVM 3.4.

I can work on a patch today, unless the answer is to require people who build to update xcode to a version with LLVM 3.4+ support.

zbjornson · 2016-07-10T03:22:00Z

@gibfahn when you run ./configure do you get the message "WARNING: C++ compiler too old, need g++ 4.8 or clang++ 3.4 (CXX=clang++)"? I don't have a mac to play with, and I'm facing problems unrelated to 4014ecb trying to get LLVM clang 3.2 to compile node on ubuntu.

bnoordhuis · 2016-07-10T07:47:56Z

@zbjornson No, no warning. OS X 10.8 with Apple LLVM version 4.2 (clang-425.0.24) (based on LLVM 3.2svn).

For anyone who wants to try it, you have to build with make CXX.host=c++ now because apple's g++ driver doesn't cut it, it doesn't understand the -std=gnu++0x switch.

bnoordhuis · 2016-07-10T09:00:13Z

Opening gambit: #7644

zbjornson · 2016-07-10T20:01:49Z

Contra gambit: #7645 :)

Said builtins are not supported by older versions of apple-gcc, breaking the build on OS X 10.8. Fixes: nodejs#7618 Refs: nodejs#4290 Refs: nodejs#7157

mhdawson · 2016-08-22T16:56:11Z

Any update on this ?

zbjornson · 2016-08-22T17:08:23Z

I have a small change (addressing my own comment) that I seem to have forgotten to push and that's sitting on a computer that I can't get to 'till tonight. Not sure what @bnoordhuis wants to do, but I'd propose moving forward with #7645 because it benchmarks faster at least on Windows.

Will push that change and post the updated benchmarks tonight.

bnoordhuis · 2016-08-29T16:12:59Z

I'm not hugely invested in this issue so I don't really care which way it goes. The thing I like about my own PR is that it reduces line count and removes the intrinsics (something I've never been a fan of) but if it's that much slower with VS, then I guess we should go with Zach's PR.

zbjornson · 2016-09-19T21:25:03Z

(Really sorry about the delay.)

I'm also not highly invested in this.

#7645 currently benchmarks faster on windows in 5 of 6 cases (slower for aligned 16-bit; faster for other combos of size/alignment) and I think would be okay to merge. If anyone wants, I could work to make the final case to match Ben's on perf. I haven't had time to benchmark on linux recently.

If we go with Ben's, I do have an npm package for fast byte swapping that would satisfy anyone else who needs as-fast-as-possible swapping. Thus, I don't really care which moves forward.

Edit Looked at linux benchmarks. Even with -march=native gcc will only use movbe element-wise and won't vectorize (whereas msvc does). Vectorizing it by hand works nicely, but (a) requires more preprocessor macros and builtins, (b) as discussed won't help with the default build config, and (c) because of (a) means it's harder to get test coverage on. So, I don't think it's a great idea to pursue that, but can if someone thinks it's the right path. (I'm putting the manual vectorization for GCC into my package though.)

gibfahn · 2016-09-21T08:20:37Z

So to summarise, the two options are #7644 (@bnoordhuis ), which makes the code cleaner/more maintainable but reduces performance on windows, or #7645 (@zbjornson ), which increases performance on Windows but keeps the intrinsics.

@zbjornson Do you have a rough estimate of how significant the speedup will be? Is it only windows which is currently slower on #7644? As VS 2013 has been dropped on master, the windows performance loss only affects us if it happens on VS 2015, so it'd be useful to have some specific numbers. If you've written a benchmark you could point me to, I'd be happy to try it with VS 2015.

It'd be great to have master building on Clang 3.2 again either way!

EDIT: Looking at #7645 (comment), it sounds like #7644 also includes the performance boost from #7157, so we'll get @zbjornson's performance improvements either way!

zbjornson · 2016-09-21T17:58:54Z

Mostly correct, except for the edit:

src: don't use __builtin_bswap16() and friends #7644 and src: fix build failure for clang 3.2; consolidate byte swapping code; fix buffer writes for unaligned ucs2 strings #7645 are roughly equivalent on linux (and presumably Mac OSX but I haven't tested) with GCC and the default build params.
src: fix build failure for clang 3.2; consolidate byte swapping code; fix buffer writes for unaligned ucs2 strings #7645 is approx. 10x faster on Windows with VS2015 (and VS2013) for 5 of the 6 combinations of alignment and element size. It is ~50% slower for aligned 16-bit types.

Less important:

src: fix build failure for clang 3.2; consolidate byte swapping code; fix buffer writes for unaligned ucs2 strings #7645 even with -march=native doesn't get much faster with GCC (up to a snapshot of GCC 7) because it only emits movbe and not [v]pshufb. MSVC emits pshufb with the default build params. GCC emits bswap (32/64) or ror (16).
Vectorizing the loop by hand is effective for GCC and clang (emit vpshufb), but requires another intrinsic that is only available when non-default compiler flags are used. I'm taking this approach in my bswap lib, but it seems pointless given that it would be rarely engaged and hard to maintain in node.js core.

If you'd like to play with benchmarks, they're in https://github.com/nodejs/node/blob/master/benchmark/buffers/buffer-swap.js. You probably want to just pick one len (a high value like 8192 so it uses the C++ impl) and drop the n, otherwise it's dreadfully slow to run them all.

Given the above, I think it would make sense to use the intrinsics on Windows only and remove them from the rest, which will keep perf where it's easy and commonly available, and cut the line count down. I can try to get to this tonight.

Edit example godbolt: https://godbolt.org/g/4rvmAa

gibfahn · 2016-09-21T22:32:20Z

@zbjornson You're right, the edit was unclear. I meant that the performance improvements from #7157 were in both #7644 and #7645 except for windows as mentioned earlier.

It would seem to make a lot of sense to only keep the intrinsics in for windows if that's where they make a big performance difference. Is there a way to get to at least the previous performance for aligned 16-bit types? The speedup for the other types might make it worth it even if not.

#7645 even with -march=native doesn't get much faster with GCC (up to a snapshot of GCC 7) because it only emits movbe and not [v]pshufb. MSVC emits pshufb with the default build params. GCC emits bswap (32/64) or ror (16).

Could you explain a little more here? #7645 doesn't get much faster that what? Are you saying that #7645 doesn't get much faster than #7644 (i.e. using the builtins doesn't give a large boost without using non-default compiler flags)?

So I guess your updated PR would be halfway between #7645 and #7644? That makes sense to me (as long as @bnoordhuis approves).

zbjornson · 2016-09-22T00:17:14Z

Is there a way to get to at least the previous performance for aligned 16-bit types?

Sorry, to be clear: #7645 does not have a perf regression for aligned 16-bit types on Windows, if that's what you mean. Rather, Ben's PR gives an improvement to this case.

Could you explain a little more here?

I meant, even if someone were to compile node with -march=native to allow SSE/AVX, GCC emits MOVBE (which is fast for one element) for the bswap builtin, but it does not vectorize the loop, so you don't see huge perf gains with or without the builtin. MSVC vectorizes and ends up a lot faster.

Here's what the hand-vectorized code looks like:
https://github.com/zbjornson/node-bswap/blob/master/src/bswap.cc#L13-L56
and here's how it benchmarks on win and linux:
https://github.com/zbjornson/node-bswap#benchmarks
The node values are equivalent to what we would get from #7645. The bswap.native column is what it could be if we went all-out and risked breaking non-x86 platforms (~30% faster on Windows and 2 to 7x faster on Linux).

Working on the halfway PR now that I've figured that stuff out. :)

zbjornson · 2016-09-22T02:48:03Z

OK, #7645 is now modified to use builtins for Windows only and vanilla C++ for the rest.

Benchmarks faster than or comparably to #7644 in all cases on Windows, and aligned cases on Linux:

I could keep fiddling with this to improve unaligned perf on Linux, but given that unaligned is less common than aligned, I'd also be happy with this as-is.

Removes use of builtins that are unavailable for older clang. Per benchmarks, only uses builtins on Windows, where speedup is significant. Fixes: nodejs#7618

Removes use of builtins that are unavailable for older clang. Per benchmarks, only uses builtins on Windows, where speedup is significant. Also adds test for unaligned ucs2 buffer write. Between #3410 and #7645, bytes were swapped twice on bigendian platforms if buffer was not two-byte aligned. See comment in #7645. PR-URL: #7645 Fixes: #7618 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl> Reviewed-By: James M Snell <jasnell@gmail.com>

Removes use of builtins that are unavailable for older clang. Per benchmarks, only uses builtins on Windows, where speedup is significant. Also adds test for unaligned ucs2 buffer write. Between #3410 and #7645, bytes were swapped twice on bigendian platforms if buffer was not two-byte aligned. See comment in #7645. PR-URL: #7645 Fixes: #7618 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl> Reviewed-By: James M Snell <jasnell@gmail.com> Conflicts: src/node_buffer.cc

gibfahn changed the title ~~osx, build: Node v6.3.0 won't build on OSX 10.8 Darwin 12.5.0~~ osx, build: Node v6.3.0 won't build on OSX 10.8 Darwin 12.5.0 Clang 4.2 Jul 8, 2016

mscdex added build Issues and PRs related to build files or the CI. macos Issues and PRs related to the macOS platform / OSX. labels Jul 8, 2016

bnoordhuis mentioned this issue Jul 10, 2016

src: don't use __builtin_bswap16() and friends #7644

Closed

gibfahn changed the title ~~osx, build: Node v6.3.0 won't build on OSX 10.8 Darwin 12.5.0 Clang 4.2~~ osx, build: Node v6.3.0 won't build on Clang 3.2 (OSX 10.8 Darwin 12.5.0) Jul 10, 2016

zbjornson mentioned this issue Jul 10, 2016

src: fix build failure for clang 3.2; consolidate byte swapping code; fix buffer writes for unaligned ucs2 strings #7645

Closed

bnoordhuis added a commit to bnoordhuis/io.js that referenced this issue Jul 11, 2016

src: don't use __builtin_bswap16() and friends

cd020e3

Said builtins are not supported by older versions of apple-gcc, breaking the build on OS X 10.8. Fixes: nodejs#7618 Refs: nodejs#4290 Refs: nodejs#7157

zbjornson added a commit to zbjornson/node that referenced this issue Sep 30, 2016

src: fix build for older clang

62ba6c8

Removes use of builtins that are unavailable for older clang. Per benchmarks, only uses builtins on Windows, where speedup is significant. Fixes: nodejs#7618

gibfahn closed this as completed in 7420835 Oct 4, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

osx, build: Node v6.3.0 won't build on Clang 3.2 (OSX 10.8 Darwin 12.5.0) #7618

osx, build: Node v6.3.0 won't build on Clang 3.2 (OSX 10.8 Darwin 12.5.0) #7618

gibfahn commented Jul 8, 2016

MylesBorins commented Jul 8, 2016

mscdex commented Jul 8, 2016 •

edited

Loading

zbjornson commented Jul 8, 2016

gibfahn commented Jul 8, 2016

zbjornson commented Jul 8, 2016

zbjornson commented Jul 10, 2016 •

edited

Loading

bnoordhuis commented Jul 10, 2016

bnoordhuis commented Jul 10, 2016

zbjornson commented Jul 10, 2016

mhdawson commented Aug 22, 2016

zbjornson commented Aug 22, 2016 •

edited

Loading

bnoordhuis commented Aug 29, 2016

zbjornson commented Sep 19, 2016 •

edited

Loading

gibfahn commented Sep 21, 2016 •

edited

Loading

zbjornson commented Sep 21, 2016 •

edited

Loading

gibfahn commented Sep 21, 2016

zbjornson commented Sep 22, 2016 •

edited

Loading

zbjornson commented Sep 22, 2016 •

edited

Loading

osx, build: Node v6.3.0 won't build on Clang 3.2 (OSX 10.8 Darwin 12.5.0) #7618

osx, build: Node v6.3.0 won't build on Clang 3.2 (OSX 10.8 Darwin 12.5.0) #7618

Comments

gibfahn commented Jul 8, 2016

Failing machine:

Passing machine:

Error:

MylesBorins commented Jul 8, 2016

mscdex commented Jul 8, 2016 • edited Loading

zbjornson commented Jul 8, 2016

gibfahn commented Jul 8, 2016

zbjornson commented Jul 8, 2016

zbjornson commented Jul 10, 2016 • edited Loading

bnoordhuis commented Jul 10, 2016

bnoordhuis commented Jul 10, 2016

zbjornson commented Jul 10, 2016

mhdawson commented Aug 22, 2016

zbjornson commented Aug 22, 2016 • edited Loading

bnoordhuis commented Aug 29, 2016

zbjornson commented Sep 19, 2016 • edited Loading

gibfahn commented Sep 21, 2016 • edited Loading

zbjornson commented Sep 21, 2016 • edited Loading

gibfahn commented Sep 21, 2016

zbjornson commented Sep 22, 2016 • edited Loading

zbjornson commented Sep 22, 2016 • edited Loading

mscdex commented Jul 8, 2016 •

edited

Loading

zbjornson commented Jul 10, 2016 •

edited

Loading

zbjornson commented Aug 22, 2016 •

edited

Loading

zbjornson commented Sep 19, 2016 •

edited

Loading

gibfahn commented Sep 21, 2016 •

edited

Loading

zbjornson commented Sep 21, 2016 •

edited

Loading

zbjornson commented Sep 22, 2016 •

edited

Loading

zbjornson commented Sep 22, 2016 •

edited

Loading