Add support for including GNU Gold linker's section ordering #16891
Where would be a good location to store the documentation to regenerate the function ordering?
Is there a script or something that generates
Yes! @addaleax, I wanted to add this documentation:
1. Collect the perf profile for the workload.
2. Use the perf script command to decode the collected profile.
3. Use nm to dump the binary's symbol information.
4. Run the hfsort program to determine the function ordering for the node binary that is ideal for this workload.
This application will create two files, hotfuncs.txt and result-hfsort.txt; hotfuncs.txt is what is used here. hfsort is one way to generate it. If I write a script, I am not sure whether it should assume the user has the hfsort utility available.
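The four steps above could be wired together roughly as follows. This is only a dry-run sketch under stated assumptions: the file names (`perf.data`, `perf.txt`, `node.nm`), the `server.js` workload, and the `profile_steps` helper are hypothetical, and the hfsort command line is left to the caller because its arguments are not given in this thread.

```python
def profile_steps(binary, workload, hfsort_argv):
    """Return (argv, stdout_file) pairs for the profiling pipeline.

    Nothing is executed here; the caller decides how to run each step.
    """
    return [
        # 1. Collect a perf profile (with call graphs) while the workload runs.
        (['perf', 'record', '-g', '-o', 'perf.data', '--'] + list(workload),
         None),
        # 2. Decode the collected profile into text.
        (['perf', 'script', '-i', 'perf.data'], 'perf.txt'),
        # 3. Dump the binary's symbols, including their sizes.
        (['nm', '-S', binary], 'node.nm'),
        # 4. Run hfsort; its arguments are an assumption supplied by the caller.
        (list(hfsort_argv), None),
    ]


if __name__ == '__main__':
    steps = profile_steps('./node',
                          ['./node', '--perf-basic-prof', 'server.js'],
                          ['hfsort'])
    for argv, stdout_file in steps:
        suffix = ' > ' + stdout_file if stdout_file else ''
        print(' '.join(argv) + suffix)
```

Keeping the steps as data rather than running them directly makes it easy to print the plan first, which seems useful when not every user has hfsort installed.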
@@ -678,7 +684,6 @@ def check_compiler(o):
else:
o['variables']['gas_version'] = get_gas_version(CC)
Can you undo this change?
configure
Outdated
@@ -1062,6 +1067,28 @@ def configure_static(o):
o['libraries'] += ['-static-libasan']


def configure_gold(o):
try:
proc = subprocess.Popen(shlex.split('ld.gold') + ['-v'], stdin=subprocess.PIPE,
You don't need to use shlex.split() here since you're passing the static string 'ld.gold'.
That said, it should arguably be overridable through a command line option or an environment variable, like CC and CXX are, and result in -fuse-ld=/path/to/ld.gold being passed so that the compiler picks up the right linker.
As well, this step should probably be skipped when cross-compiling. configure_node() currently contains the cross-compiling logic; that should probably be factored out somehow. For now, making cross_compiling global is probably acceptable.
diff --git a/configure b/configure
index 95f103fbcb..04cbb91ea9 100755
--- a/configure
+++ b/configure
@@ -35,6 +35,9 @@ import subprocess
import shutil
import string
+# Set by configure_node().
+cross_compiling = False
+
# gcc and g++ as defaults matches what GYP's Makefile generator does,
# except on OS X.
CC = os.environ.get('CC', 'cc' if sys.platform == 'darwin' else 'gcc')
@@ -837,6 +840,7 @@ def configure_node(o):
o['variables']['target_arch'] = target_arch
o['variables']['node_byteorder'] = sys.byteorder
+ global cross_compiling
cross_compiling = (options.cross_compiling
if options.cross_compiling is not None
else target_arch != host_arch)
@@ -1402,7 +1406,7 @@ if (options.dest_os):
flavor_params['flavor'] = options.dest_os
flavor = GetFlavor(flavor_params)
-configure_node(output)
+configure_node(output) # Sets `cross_compiling` as a side effect.
configure_library('zlib', output)
configure_library('http_parser', output)
configure_library('libuv', output)
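A minimal sketch of how the two suggestions above (an override and a cross-compiling guard) might combine. The helper name `gold_linker_flag` is hypothetical, the use of the `LD` environment variable is an assumption taken from the later commit message, and whether a given compiler accepts a path after `-fuse-ld=` is not verified here.

```python
import os


def gold_linker_flag(environ, cross_compiling):
    """Return a -fuse-ld flag for gold, or None when detection is skipped."""
    if cross_compiling:
        # A host ld.gold says nothing about the target toolchain.
        return None
    # Assumed override: honour LD if it is set, else fall back to 'gold'.
    linker = environ.get('LD')
    return '-fuse-ld=' + (linker if linker else 'gold')


if __name__ == '__main__':
    print(gold_linker_flag(os.environ, cross_compiling=False))
```

Passing the environment and the cross-compiling state in as arguments, rather than reading globals, keeps the helper easy to test on its own.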
Thanks! I hadn't thought about cross-compiling.
configure
Outdated
o['variables']['goldl_function_reorder'] = options.section_reordering
else:
o['variables']['goldl_function_reorder'] = os.path.realpath('tools/gold_linker_section_reordering.txt')
if(flavor in ('linux', 'freebsd', 'openbsd')):
Superfluous parens and can you make sure lines are <= 80 columns?
Also, why is this restricted to Linux and two BSDs? Isn't the presence of the gold linker enough?
Do you think it would make sense to keep this check if cross_compiling is defined? ld.gold is only supported on these platforms and is available by default when the binutils package is installed.
If there is an ld.gold for the target, then logically it supports function reordering, doesn't it?
I've been thinking it over but I can't think of a good reason to restrict it.
configure
Outdated
proc = subprocess.Popen(shlex.split('ld.gold') + ['-v'], stdin=subprocess.PIPE,
stderr=subprocess.PIPE, stdout=subprocess.PIPE)
match = re.match(r"(GNU gold) .* ([0-9]\.[0-9]+)",
proc.communicate()[0])
Can you make the arguments line up?
configure
Outdated
if match:
gold_version = match.group(2)
if gold_version > '1.1':
o['variables']['gold_linker'] = 'true'
Don't use o['variables'] if the settings aren't used in the *.gyp files. Just use normal variables.
o['ldflags'] += ["-fuse-ld=gold -Wl,--section-ordering-file=" + o['variables']['goldl_function_reorder']]
else:
return 0
except OSError:
The try/except block should just be around the Popen call.
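One way the detection could look with the try/except narrowed to the Popen call, sketched under assumptions: `parse_gold_version` and `get_gold_version` are hypothetical names, and the banner string used in the demo is illustrative rather than captured output. The sketch also compares versions as integer tuples, since a string comparison such as `'1.11' > '1.9'` is false.

```python
import re
import subprocess


def parse_gold_version(banner):
    """Extract (major, minor) from an `ld.gold -v` style banner, or None."""
    match = re.match(r'GNU gold .* ([0-9]+)\.([0-9]+)', banner)
    if not match:
        return None
    return (int(match.group(1)), int(match.group(2)))


def get_gold_version(linker='ld.gold'):
    """Run the linker with -v and return its version tuple, or None."""
    try:
        # Only the process launch is guarded; parsing happens outside.
        proc = subprocess.Popen([linker, '-v'],
                                stdin=subprocess.PIPE,
                                stderr=subprocess.PIPE,
                                stdout=subprocess.PIPE)
    except OSError:
        return None
    out = proc.communicate()[0]
    return parse_gold_version(out.decode('utf-8', 'replace'))


if __name__ == '__main__':
    # Illustrative banner, not real captured output.
    sample = 'GNU gold (GNU Binutils for Ubuntu 2.26.1) 1.11'
    print(parse_gold_version(sample) > (1, 1))
```

Separating the parsing from the process launch also lets the version logic be exercised without a gold linker installed.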
configure
Outdated
o['variables']['goldl_function_reorder'] = os.path.realpath('tools/gold_linker_section_reordering.txt')
if(flavor in ('linux', 'freebsd', 'openbsd')):
o['cflags'] += ["-fuse-ld=gold -ffunction-sections"]
o['ldflags'] += ["-fuse-ld=gold -Wl,--section-ordering-file=" + o['variables']['goldl_function_reorder']]
Can you use single quotes?
I don't think this actually works right now. You are adding single arguments to cflags and ldflags instead of two separate ones. It should probably look like this:
o['cflags'] += ['-fuse-ld=gold', '-ffunction-sections']
o['ldflags'] += ['-fuse-ld=gold', '-Wl,--section-ordering-file=...']
But that aside, hacking the flags straight into cflags and ldflags is somewhat horrible. That should really be done in common.gypi:
diff --git a/common.gypi b/common.gypi
index d152c81498..bc73591391 100644
--- a/common.gypi
+++ b/common.gypi
@@ -42,6 +42,8 @@
# Don't use ICU data file (icudtl.dat) from V8, we use our own.
'icu_use_data_file_flag%': 0,
+ 'gold_section_ordering_file%': '', # export this from configure
+
'conditions': [
['GENERATOR=="ninja"', {
'OBJ_DIR': '<(PRODUCT_DIR)/obj',
@@ -431,7 +433,14 @@
'ldflags': [
'-Wl,--export-dynamic',
],
- }]
+ }],
+ ['gold_section_ordering_file!=""', {
+ 'cflags': [ '-fuse-ld=gold', '-ffunction-sections' ],
+ 'ldflags': [
+ '-fuse-ld=gold',
+ '-Wl,--section-ordering-file=<(gold_section_ordering_file)',
+ ],
+ }],
],
}
}
-I../../common -fuse-ld=gold -ffunction-sections -pthread -Wall -Wextra -Wno-unused-parameter -m64 -fno-strict-aliasing -m64 -fdata-sections -O3 -fno-omit-frame-pointer -fno-rtti -fno-exceptions -std=gnu++0x -MMD -MF
This is what it shows up as on the Make output. But I think separating it with commas would work better.
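A small illustration of why both forms can look identical in the Make output yet still differ as list entries. It assumes the generator simply joins the flags with spaces, as the output above suggests; the variable names are made up for the example.

```python
import shlex

combined = ['-fuse-ld=gold -ffunction-sections']
separate = ['-fuse-ld=gold', '-ffunction-sections']

# Joined into one command line, the two forms are indistinguishable.
same_when_joined = ' '.join(combined) == ' '.join(separate)

# As list entries they differ: one argument versus two.
print(same_when_joined, len(combined), len(separate))

# Splitting the joined string the way a shell would recovers the two flags.
print(shlex.split(' '.join(combined)) == separate)
```

Any consumer that passes the list to the compiler without going through a shell would see the combined form as a single, invalid argument.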
I agree, it's horrible to hack straight into ldflags and cflags. I just couldn't figure out a way to export the value of gold_section_ordering_file% into common.gypi, which is a GYP include file. I got an error saying the symbol gold_section_ordering_file is not found. I will try with the
+ 'gold_section_ordering_file%': '', # export this from configure
+
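On the configure side, exporting the value might look roughly like this. It is a sketch that assumes entries in `o['variables']` end up visible to the GYP files, as the other variables in the diffs above do; `configure_section_ordering` is a hypothetical helper name.

```python
import os
import pprint


def configure_section_ordering(o, ordering_file):
    """Record the ordering file path as a GYP variable ('' disables it)."""
    o['variables']['gold_section_ordering_file'] = (
        os.path.realpath(ordering_file) if ordering_file else '')


if __name__ == '__main__':
    output = {'variables': {}}
    configure_section_ordering(
        output, 'tools/gold_linker_section_reordering.txt')
    pprint.pprint(output)
```

With the `%` default declared in common.gypi, leaving the variable empty would keep the gold-specific condition from firing.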
I updated the branch, added a doc/md to this..
At least this failure on alpine seems like it could very well be related to this PR. New CI since there were definitely unrelated failures in here as well: https://ci.nodejs.org/job/node-test-commit/14517/
@addaleax not sure why it says openssl-cli is unsupported, I built it on ubuntu 14.10.. alpine is Ubuntu 12
@sathvikl Sorry, missed the ping here and the old CI run is no longer accessible. :/
@sathvikl would you be so kind and please rebase? :-)
@BridgeAR sure, will do. My dev system is messed up, but I will find a way..
merge upstream nodejs/node on Jan 30th
5eb45c9
to
1882886
Compare
can you please start another CI run with this rebased branch. |
…ng-file
GNU Gold linker will be auto-detected on Linux systems. If available, the pre-generated section ordering file will be used while linking the node binary. This will help improve performance through a reduction of iTLB misses, as the most frequently used functions in the runtime will be packed together.
Add documentation to generate the section reordering file.
Add an option to override this configure option if the LD environment variable is defined.
It seems like there are some failures in the CI that might be related. @sathvikl would you be so kind and have a look?
@BridgeAR Sathvik has moved onto another project but I'll take a look at the failures and try to address it.
@uttampawar thanks a lot for the heads up and for taking over.
Ping @nodejs/build
This adds a
(By the way, that last comment is not a blocking objection to this. That file can be moved at a later date.)
They're all using binutils-2.28-r3, binutils-libs-2.28-r3 and binutils-gold-2.28-r3. Could it be the musl version? That's incremented with each one, with a jump from musl-1.1.16-r14 to musl-1.1.18-r3 with Alpine 3.6 to 3.7. We could clear this up by just ditching the old Alpine machines. I'm going to propose a pruning to the Build WG to get rid of unsupported versions of these short-lived distros (Fedora and Alpine) simply to reduce complexity and our maintenance burden. That would leave just Alpine 3.7 for now. It would be nice to understand the problem here though. I've asked for a rerun of the arm build here: https://ci.nodejs.org/job/node-test-commit-arm/14320/ because the error in @BridgeAR's build looks like the problem I just cleaned up where one of those machines was using gcc 4.6 instead of 4.9.
I am not sure if there were any changes to fix this from the build side, so I thought it might make sense to just start a new CI and see what happens:
@nodejs/build this is sadly still failing on Alpine.
I'm not sure what we're supposed to do with this - people are building Node on Alpine 3.6 still and that's what we're running in alpine-last-latest-x64, we'll drop that from CI when the next major Alpine is released and maybe we'll be past that but unless there's something we can do in the container to make this work (and perhaps document that in BUILDING.md if we're expecting users to make this work), or, something in |
One way of dealing with this might be to exclude this feature from Alpine builds. @sathvikl would that be possible? |
@rvagg the current version is 3.7 |
@nodejs/build I guess it would be best to just exclude this feature from Alpine builds. Is that something feasible? |
Can someone please paste here what the exact build error is on alpine? The links to CI are no longer valid and not very helpful. |
@rvagg @BridgeAR I think current stable Alpine is 3.8.0 (https://wiki.alpinelinux.org/wiki/Alpine_Linux:Releases)? It also has binutils 2.30 (https://git.alpinelinux.org/cgit/aports/log/?h=3.8-stable&qt=grep&q=binutils, release was at https://git.alpinelinux.org/cgit/aports/commit/?h=3.8-stable&id=2484b3eda99f681c7de0866b438f63cdcc31b5da) and musl 1.1.19. I think it's worth trying to build it on newest Alpine (Though I'm not sure if we support it yet, do we have a machine with it?) @sathvikl would you be willing to continue working on this? It needs a rebase on master (and possibly some way of excluding this from Alpine build but that's for later). |
Oh, I missed the Alpine 3.8 release. I'll get that into CI this week and retire 3.6. Thanks for the heads-up @lundibundi. |
Alpine 3.8 got added to CI and 3.6 was removed. Unfortunately we're now getting 3.8 failures on a couple of cluster tests #22308 |
ping @sathvikl. |
Unfortunately I think it might be time to close this out given that the original author has not been back in over 10 months and no one else has taken this up. If you would like to continue in this effort, do feel free to reopen the issue. |
Adds support for using a section ordering file with the gold linker. This makes it possible to reorder functions in a build to optimize for a specific workload. `hfsort` is a tool that can be used to generate such a file from perf-recorded last branch record (LBR) data by running Node.js as `node --perf-basic-prof`.
Refs: https://github.com/facebook/hhvm/tree/9966d482c19c6120c621c6f3896525fb19fb3842/hphp/tools/hfsort
Refs: https://software.intel.com/content/www/us/en/develop/articles/runtime-optimization-blueprint-IA-optimization-with-last-branch-record.html
Refs: nodejs#16891
Signed-off-by: Gabriel Schulhof <gabriel.schulhof@intel.com>

Adds support for using a section ordering file with the gold linker. This makes it possible to reorder functions in a build to optimize for a specific workload. `hfsort` is a tool that can be used to generate such a file from perf-recorded last branch record (LBR) data by running Node.js as `node --perf-basic-prof`.
Refs: https://github.com/facebook/hhvm/tree/9966d482c19c6120c621c6f3896525fb19fb3842/hphp/tools/hfsort
Refs: https://software.intel.com/content/www/us/en/develop/articles/runtime-optimization-blueprint-IA-optimization-with-last-branch-record.html
Refs: #16891
Signed-off-by: Gabriel Schulhof <gabriel.schulhof@intel.com>
PR-URL: #35272
Reviewed-By: Christian Clauss <cclauss@me.com>
Reviewed-By: Richard Lau <riclau@uk.ibm.com>
GNU Gold linker will be auto-detected on Linux systems.
If available the pre-generated section ordering file will be used while
linking the node binary.
The most frequently used functions in the node+deps runtime will be packed
together as the gold linker will re-order the text sections based on the input.
This will help improve performance of production workloads through reduction of iTLB misses.
The benchmark used for training is nodejs-ghost-bench
https://github.com/sathvikl/ghostjs-benchmark
Checklist
make -j4 test (UNIX), or vcbuild test (Windows) passes
Affected core subsystem(s)
Build