Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for including GNU Gold linker's section ordering #16891

Closed
wants to merge 2 commits into from

Conversation

sathvikl
Copy link

@sathvikl sathvikl commented Nov 8, 2017

GNU Gold linker will be auto-detected on Linux systems.
If available the pre-generated section ordering file will be used while
linking the node binary.
The most frequently used functions in the node+deps runtime will be packed
together as the gold linker will re-order the text sections based on the input.
This will help improve performance of production workloads through reduction of iTLB misses.

The benchmark used for training is nodejs-ghost-bench
https://github.com/sathvikl/ghostjs-benchmark

Checklist
  • [ x ] make -j4 test (UNIX), or vcbuild test (Windows) passes
  • [ x ] tests and/or benchmarks are included
  • documentation is changed or added
  • [x ] commit message follows commit guidelines
Affected core subsystem(s)

Build

@nodejs-github-bot nodejs-github-bot added build Issues and PRs related to build files or the CI. tools Issues and PRs related to the tools directory. labels Nov 8, 2017
@sathvikl
Copy link
Author

sathvikl commented Nov 8, 2017

Where would be a good location to store the documentation to re-generate the function ordering ?
Discussion on this PR is here #16131

@addaleax
Copy link
Member

addaleax commented Nov 8, 2017

Is there a script or something that generates tools/gold_linker_section_reordering.txt? It seems like something we’d want to regenerate every now and then

@sathvikl
Copy link
Author

sathvikl commented Nov 8, 2017

yes !! @addaleax

I wanted to add this documentation
Once the workload has reached a steady state/ post warm-up phase

Collect the perf profile for the workload:

 perf record -ag -e cycles:pp  -o <perf_profile_cycles_pp> --  sleep 30 

:pp offers a more precise dist of cycles accounting

Use perf script command to decode the collected profile

  perf script -i perf_profile_inst -f comm,ip -c node | gzip -c > perf_profile_decoded.pds.gz

Use nm to dump the binary's symbol information

    nm -S node > node.nm

Run the hfsort program to determine the function ordering for the node binary ideal for this workload.
https://github.com/facebook/hhvm/tree/master/hphp/tools/hfsort

      Usage: hfsort [-p] [-e <EDGCNT_FILE>] <SYMBOL_FILE> <PERF_DATA_FILE>

       -p,   use pettis-hansen algorithm for code layout  (optional)

       -e    <EDGCNT_FILE>, use edge profile result to build the call graph

    hfsort -p node.nm perf_profile_decoded.pds.gz

This application will create 2 files hotfuncs.txt and result-hfsort.txt

hotfuncs.txt is what is used here. hfsort is one way to generate it.

I am not sure if I write a script, it would assume the user has hfsort utility available.
Do you think a documentation write-up would be sufficient ?

@@ -678,7 +684,6 @@ def check_compiler(o):
else:
o['variables']['gas_version'] = get_gas_version(CC)


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you undo this change?

configure Outdated
@@ -1062,6 +1067,28 @@ def configure_static(o):
o['libraries'] += ['-static-libasan']


def configure_gold(o):
try:
proc = subprocess.Popen(shlex.split('ld.gold') + ['-v'], stdin=subprocess.PIPE,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to use shlex.split() here since you're passing the static string 'ld.gold'.

That said, it should arguably be overridable through a command line option or an environment variable, like CC and CXX are and result in -fuse-ld=/path/to/ld.gold being passed so that the compiler picks up the right linker.

As well, this step should probably be skipped when cross-compiling.

configure_node() currently contains the cross-compiling logic, that should probably be factored out somehow. For now, making cross_compiling global is probably acceptable.

diff --git a/configure b/configure
index 95f103fbcb..04cbb91ea9 100755
--- a/configure
+++ b/configure
@@ -35,6 +35,9 @@ import subprocess
 import shutil
 import string
 
+# Set by configure_node().
+cross_compiling = False
+
 # gcc and g++ as defaults matches what GYP's Makefile generator does,
 # except on OS X.
 CC = os.environ.get('CC', 'cc' if sys.platform == 'darwin' else 'gcc')
@@ -837,6 +840,7 @@ def configure_node(o):
   o['variables']['target_arch'] = target_arch
   o['variables']['node_byteorder'] = sys.byteorder
 
+  global cross_compiling
   cross_compiling = (options.cross_compiling
                      if options.cross_compiling is not None
                      else target_arch != host_arch)
@@ -1402,7 +1406,7 @@ if (options.dest_os):
   flavor_params['flavor'] = options.dest_os
 flavor = GetFlavor(flavor_params)
 
-configure_node(output)
+configure_node(output)  # Sets `cross_compiling` as a side effect.
 configure_library('zlib', output)
 configure_library('http_parser', output)
 configure_library('libuv', output)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks ! hadn't thought about cross compiling

configure Outdated
o['variables']['goldl_function_reorder'] = options.section_reordering
else:
o['variables']['goldl_function_reorder'] = os.path.realpath('tools/gold_linker_section_reordering.txt')
if(flavor in ('linux', 'freebsd', 'openbsd')):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Superfluous parens and can you make sure lines are <= 80 columns?

Also, why is this restricted to Linux and two BSDs? Isn't the presence of the gold linker enough?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it would make sense to keep this check if cross_compiling is defined ?
ld.gold is only supported on these platforms and available by default when bin.utils package is installed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is a ld.gold for the target, then logically it supports function reordering, doesn't it?

I've been thinking it over but I can't think of a good reason to restrict it.

configure Outdated
proc = subprocess.Popen(shlex.split('ld.gold') + ['-v'], stdin=subprocess.PIPE,
stderr=subprocess.PIPE, stdout=subprocess.PIPE)
match = re.match(r"(GNU gold) .* ([0-9]\.[0-9]+)",
proc.communicate()[0])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make the arguments line up?

configure Outdated
if match:
gold_version = match.group(2)
if gold_version > '1.1':
o['variables']['gold_linker'] = 'true'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't use o['variables'] if the settings aren't used in the *.gyp files. Just use normal variables.

o['ldflags'] += ["-fuse-ld=gold -Wl,--section-ordering-file=" + o['variables']['goldl_function_reorder']]
else:
return 0
except OSError:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The try/except block should just be around the Popen call.

configure Outdated
o['variables']['goldl_function_reorder'] = os.path.realpath('tools/gold_linker_section_reordering.txt')
if(flavor in ('linux', 'freebsd', 'openbsd')):
o['cflags'] += ["-fuse-ld=gold -ffunction-sections"]
o['ldflags'] += ["-fuse-ld=gold -Wl,--section-ordering-file=" + o['variables']['goldl_function_reorder']]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use single quotes?

I don't think this actually works right now. You are adding single arguments to cflags and ldflags instead of two separate ones. It should probably look like this:

o['cflags'] += ['-fuse-ld=gold', '-ffunction-sections']
o['ldflags'] += ['-fuse-ld=gold', '-Wl,--section-ordering-file=...']

But that aside, hacking the flags straight into cflags and ldflags is somewhat horrible. That should really be done in common.gypi:

diff --git a/common.gypi b/common.gypi
index d152c81498..bc73591391 100644
--- a/common.gypi
+++ b/common.gypi
@@ -42,6 +42,8 @@
     # Don't use ICU data file (icudtl.dat) from V8, we use our own.
     'icu_use_data_file_flag%': 0,
 
+    'gold_section_ordering_file%': '',  # export this from configure
+
     'conditions': [
       ['GENERATOR=="ninja"', {
         'OBJ_DIR': '<(PRODUCT_DIR)/obj',
@@ -431,7 +433,14 @@
         'ldflags': [
           '-Wl,--export-dynamic',
         ],
-      }]
+      }],
+      ['gold_section_ordering_file!=""', {
+        'cflags': [ '-fuse-ld=gold', '-ffunction-sections' ],
+        'ldflags': [
+          '-fuse-ld=gold',
+          '-Wl,--section-ordering-file=<(gold_section_ordering_file)',
+        ],
+      }],
     ],
   }
 }

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-I../../common -fuse-ld=gold -ffunction-sections -pthread -Wall -Wextra -Wno-unused-parameter -m64 -fno-strict-aliasing -m64 -fdata-sections -O3 -fno-omit-frame-pointer -fno-rtti -fno-exceptions -std=gnu++0x -MMD -MF

This is what it shows up as on the Make output. But I think separating it with commas would work better.

I agree with, it's horrible to hack straight into ldflags and cflags.
I just couldn't figure out a way to export the value of gold_section_ordering_file% into common.gypi which is a GYP include file.

I got an error saying symbol gold_section_ordering_file is not found..
I will try with the

+    'gold_section_ordering_file%': '',  # export this from configure
+

@sathvikl
Copy link
Author

I updated the branch, added a doc/md to this..

@addaleax addaleax added the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Nov 30, 2017
@addaleax
Copy link
Member

@addaleax addaleax removed the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Dec 1, 2017
@addaleax
Copy link
Member

addaleax commented Dec 1, 2017

At least this failure on alpine seems like it could very well be related to this PR.

New CI since there were definitely unrelated failures in here as well: https://ci.nodejs.org/job/node-test-commit/14517/

@sathvikl
Copy link
Author

sathvikl commented Dec 2, 2017

@addaleax not sure why it says openssl-cli is unsupported, I built it on ubuntu 14.10.. alpine is Ubuntu 12
can you please point me to the console results of the new CI job, I clicked through and just found "failure".

@addaleax
Copy link
Member

addaleax commented Dec 27, 2017

@sathvikl Sorry, missed the ping here and the old CI run is no longer accessible. :/

CI: https://ci.nodejs.org/job/node-test-commit/15084/
CI without rebase: https://ci.nodejs.org/job/node-test-commit/15085/

@BridgeAR
Copy link
Member

@sathvikl would you be so kind and please rebase? :-)

@sathvikl
Copy link
Author

@BridgeAR sure, will do. My dev system is messed up, but I will find a way..

merge upstream nodejs/node on Jan 30th
@sathvikl
Copy link
Author

can you please start another CI run with this rebased branch.

…ng-file

GNU Gold linker will be auto-detected on Linux systems.
If available the pre-generated section ordering file will be used while
linking the node binary.
This will help improve performance through reduction of iTLB misses
as the most frequently used functions in the runtime will be packed
together.

Add documentation to generate the section reorderding file
Add an option to override this configure option if
LD environment variable is defined.
@BridgeAR
Copy link
Member

CI https://ci.nodejs.org/job/node-test-pull-request/12829/

@BridgeAR BridgeAR added the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Jan 31, 2018
@BridgeAR
Copy link
Member

BridgeAR commented Feb 1, 2018

@BridgeAR
Copy link
Member

BridgeAR commented Feb 6, 2018

It seems like there are some failures in the CI that might be related. @sathvikl would you be so kind and have a look?

@uttampawar
Copy link
Contributor

@BridgeAR Sathvik has moved onto another project but I'll take a look at the failures and try to address it.

@BridgeAR
Copy link
Member

BridgeAR commented Feb 9, 2018

@uttampawar thanks a lot for the heads up and for taking over.

@uttampawar
Copy link
Contributor

@addaleax @sathvikl By adding a check for the binutils version would help address this issue?

@BridgeAR
Copy link
Member

BridgeAR commented Mar 2, 2018

Ping @nodejs/build

@Trott
Copy link
Member

Trott commented Mar 2, 2018

This adds a .md documentation file to the tools directory. There are no other .md files in that directory. Is there a better location for this particular file? @nodejs/documentation @nodejs/build

@Trott
Copy link
Member

Trott commented Mar 2, 2018

(By the way, that last comment is not a blocking objection to this. That file can be moved at a later date.)

@rvagg
Copy link
Member

rvagg commented Mar 4, 2018

They're all using binutils-2.28-r3, binutils-libs-2.28-r3 and binutils-gold-2.28-r3. Could it be the musl version? That's incremented with each one, with a jump from musl-1.1.16-r14 to musl-1.1.18-r3 with Alpine 3.6 to 3.7.

We could clear this up by just ditching the old Alpine machines. I'm going to propose a pruning to the Build WG to get rid of unsupported versions of these short-lived distros (Fedora and Alpine) simply to reduce complexity and our maintenance burden. That would leave just Apline 3.7 for now. It would be nice to understand the problem here tho.

I've asked for a rerun of the arm build here: https://ci.nodejs.org/job/node-test-commit-arm/14320/ cause the error in @BridgeAR's build looks like the problem I just cleaned up where one of those machines was using gcc 4.6 instead of 4.9.

@BridgeAR
Copy link
Member

I am not sure if there where any changes to fix this from the build side, so I thought it might make sense to just start a new CI and see what happens:

CI https://ci.nodejs.org/job/node-test-pull-request/14161/

@BridgeAR
Copy link
Member

@nodejs/build this is sadly still failing on Apline

@rvagg
Copy link
Member

rvagg commented Apr 16, 2018

I'm not sure what we're supposed to do with this - people are building Node on Alpine 3.6 still and that's what we're running in alpine-last-latest-x64, we'll drop that from CI when the next major Alpine is released and maybe we'll be past that but unless there's something we can do in the container to make this work (and perhaps document that in BUILDING.md if we're expecting users to make this work), or, something in configure we can do to opt out of this option where it's not supported.

@BridgeAR
Copy link
Member

One way of dealing with this might be to exclude this feature from Alpine builds. @sathvikl would that be possible?

@BridgeAR
Copy link
Member

@rvagg the current version is 3.7

@BridgeAR
Copy link
Member

@nodejs/build I guess it would be best to just exclude this feature from Alpine builds. Is that something feasible?

@ncopa
Copy link
Contributor

ncopa commented May 15, 2018

Can someone please paste here what the exact build error is on alpine? The links to CI are no longer valid and not very helpful.

@addaleax
Copy link
Member

@jasnell jasnell added the stalled Issues and PRs that are stalled. label Aug 12, 2018
@lundibundi
Copy link
Member

@rvagg @BridgeAR I think current stable Alpine is 3.8.0 (https://wiki.alpinelinux.org/wiki/Alpine_Linux:Releases)? It also has binutils 2.30 (https://git.alpinelinux.org/cgit/aports/log/?h=3.8-stable&qt=grep&q=binutils, release was at https://git.alpinelinux.org/cgit/aports/commit/?h=3.8-stable&id=2484b3eda99f681c7de0866b438f63cdcc31b5da) and musl 1.1.19. I think it's worth trying to build it on newest Alpine (Though I'm not sure if we support it yet, do we have a machine with it?)

@sathvikl would you be willing to continue working on this? It needs a rebase on master (and possibly some way of excluding this from Alpine build but that's for later).

@rvagg
Copy link
Member

rvagg commented Aug 13, 2018

Oh, I missed the Alpine 3.8 release. I'll get that into CI this week and retire 3.6. Thanks for the heads-up @lundibundi.

@rvagg
Copy link
Member

rvagg commented Aug 14, 2018

Alpine 3.8 got added to CI and 3.6 was removed. Unfortunately we're now getting 3.8 failures on a couple of cluster tests #22308

@lundibundi
Copy link
Member

ping @sathvikl.

@apapirovski
Copy link
Member

Unfortunately I think it might be time to close this out given that the original author has not been back in over 10 months and no one else has taken this up. If you would like to continue in this effort, do feel free to reopen the issue.

gabrielschulhof pushed a commit to gabrielschulhof/node that referenced this pull request Sep 23, 2020
Adds support for using a section ordering file with the gold linker.
This makes it possible to reorder functions in a build to optimize for
a specific workload.

`hfsort` is a tool that can be used to generate such a file from perf-
recorded last branch record (LBR) data by running Node.js as
`node --perf-basic-prof`.

Refs: https://github.com/facebook/hhvm/tree/9966d482c19c6120c621c6f3896525fb19fb3842/hphp/tools/hfsort
Refs: https://software.intel.com/content/www/us/en/develop/articles/runtime-optimization-blueprint-IA-optimization-with-last-branch-record.html
Refs: nodejs#16891
Signed-off-by: Gabriel Schulhof <gabriel.schulhof@intel.com>
gabrielschulhof pushed a commit that referenced this pull request Sep 30, 2020
Adds support for using a section ordering file with the gold linker.
This makes it possible to reorder functions in a build to optimize for
a specific workload.

`hfsort` is a tool that can be used to generate such a file from perf-
recorded last branch record (LBR) data by running Node.js as
`node --perf-basic-prof`.

Refs: https://github.com/facebook/hhvm/tree/9966d482c19c6120c621c6f3896525fb19fb3842/hphp/tools/hfsort
Refs: https://software.intel.com/content/www/us/en/develop/articles/runtime-optimization-blueprint-IA-optimization-with-last-branch-record.html
Refs: #16891
Signed-off-by: Gabriel Schulhof <gabriel.schulhof@intel.com>
PR-URL: #35272
Reviewed-By: Christian Clauss <cclauss@me.com>
Reviewed-By: Richard Lau <riclau@uk.ibm.com>
danielleadams pushed a commit that referenced this pull request Oct 6, 2020
Adds support for using a section ordering file with the gold linker.
This makes it possible to reorder functions in a build to optimize for
a specific workload.

`hfsort` is a tool that can be used to generate such a file from perf-
recorded last branch record (LBR) data by running Node.js as
`node --perf-basic-prof`.

Refs: https://github.com/facebook/hhvm/tree/9966d482c19c6120c621c6f3896525fb19fb3842/hphp/tools/hfsort
Refs: https://software.intel.com/content/www/us/en/develop/articles/runtime-optimization-blueprint-IA-optimization-with-last-branch-record.html
Refs: #16891
Signed-off-by: Gabriel Schulhof <gabriel.schulhof@intel.com>
PR-URL: #35272
Reviewed-By: Christian Clauss <cclauss@me.com>
Reviewed-By: Richard Lau <riclau@uk.ibm.com>
joesepi pushed a commit to joesepi/node that referenced this pull request Jan 8, 2021
Adds support for using a section ordering file with the gold linker.
This makes it possible to reorder functions in a build to optimize for
a specific workload.

`hfsort` is a tool that can be used to generate such a file from perf-
recorded last branch record (LBR) data by running Node.js as
`node --perf-basic-prof`.

Refs: https://github.com/facebook/hhvm/tree/9966d482c19c6120c621c6f3896525fb19fb3842/hphp/tools/hfsort
Refs: https://software.intel.com/content/www/us/en/develop/articles/runtime-optimization-blueprint-IA-optimization-with-last-branch-record.html
Refs: nodejs#16891
Signed-off-by: Gabriel Schulhof <gabriel.schulhof@intel.com>
PR-URL: nodejs#35272
Reviewed-By: Christian Clauss <cclauss@me.com>
Reviewed-By: Richard Lau <riclau@uk.ibm.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
build Issues and PRs related to build files or the CI. stalled Issues and PRs that are stalled. tools Issues and PRs related to the tools directory.
Projects
None yet
Development

Successfully merging this pull request may close these issues.