Hetzner Benchmarking Machine Replacements #3657
Comments
@mcollina is it ok for performance measurements to have a CPU with a mix of non-identical cores? |
It should be possible to schedule certain processes only on a subset of the cores with taskset. This should be part of the testing phase. |
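A minimal sketch of what pinning a process to a subset of cores could look like from a shell script, using util-linux taskset (the tool named later in this thread). The run_pinned helper name and the 0-7 CPU range in the usage comment are assumptions for illustration. Which logical CPUs map to the identical cores on these machines would need to be checked first, for example with lscpu.

```shell
# run_pinned CPU_LIST COMMAND [ARGS...]
# Runs COMMAND restricted to the logical CPUs in CPU_LIST (e.g. "0-7" or "0,2,4").
# run_pinned is a hypothetical helper, not part of any existing job script.
run_pinned() {
  _cpus="$1"
  shift
  if command -v taskset >/dev/null 2>&1; then
    # -c takes a CPU list rather than a hexadecimal mask.
    taskset -c "$_cpus" "$@"
  else
    # Fall back to an unpinned run so the job still works without taskset.
    echo "taskset not found; running unpinned" >&2
    "$@"
  fi
}

# Example (the CPU range is an assumption, not measured on the EX44s):
# run_pinned 0-7 ./node benchmark/run.js buffers
```

The fallback branch is a design choice: it keeps a job usable on hosts without taskset, at the cost of silently unpinned numbers unless stderr is checked.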
These machines have been provisioned and added to Jenkins with the same labels/configs as the former Nearform machines. Next steps: Can somebody with permissions kick off a benchmarking job and some V8 builds to verify that all is working as intended? |
I've marked the two Nearform machines offline in Jenkins and started a V8 build which is running on test-hetzner-ubuntu2204-x64-1: |
This has failed:
02:47:26 + DEPOT_TOOLS_DIR=/home/iojs/build/workspace/node-test-commit-v8-linux/deps/v8/_depot_tools
02:47:26 + PATH=/home/iojs/build/workspace/node-test-commit-v8-linux/deps/v8/_depot_tools:/home/iojs/build/workspace/node-test-commit-v8-linux/depot_tools:/home/iojs/venv/bin:/home/iojs/nghttp2/src:/home/iojs/wrk:/usr/lib/ccache:/usr/lib64/ccache:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin tools/dev/v8gen.py x64.release --no-goma
02:47:26
02:47:26 Hint: You can raise verbosity (-vv) to see the output of failed commands.
02:47:26
02:47:26 Traceback (most recent call last):
02:47:26 File "/home/iojs/build/workspace/node-test-commit-v8-linux/deps/v8/tools/dev/v8gen.py", line 309, in <module>
02:47:26 sys.exit(gen.main())
02:47:26 File "/home/iojs/build/workspace/node-test-commit-v8-linux/deps/v8/tools/dev/v8gen.py", line 303, in main
02:47:26 return self._options.func()
02:47:26 File "/home/iojs/build/workspace/node-test-commit-v8-linux/deps/v8/tools/dev/v8gen.py", line 162, in cmd_gen
02:47:26 self._call_cmd([
02:47:26 File "/home/iojs/build/workspace/node-test-commit-v8-linux/deps/v8/tools/dev/v8gen.py", line 211, in _call_cmd
02:47:26 output = subprocess.check_output(
02:47:26 File "/usr/lib/python3.10/subprocess.py", line 421, in check_output
02:47:26 return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
02:47:26 File "/usr/lib/python3.10/subprocess.py", line 526, in run
02:47:26 raise CalledProcessError(retcode, process.args,
02:47:26 subprocess.CalledProcessError: Command '['/usr/bin/python3', '-u', 'tools/mb/mb.py', 'gen', '-f', 'infra/mb/mb_config.pyl', '-m', 'developer_default', '-b', 'x64.release', 'out.gn/x64.release']' returned non-zero exit status 1.
02:47:26 make: *** [Makefile:303: v8] Error 1
Logging into the machine and running the failing command with -vv, as the hint above suggests:
iojs@test-hetzner-ubuntu2204-x64-1:~/build/workspace/node-test-commit-v8-linux/deps/v8$ PATH=/home/iojs/build/workspace/node-test-commit-v8-linux/deps/v8/_depot_tools:/home/iojs/build/workspace/node-test-commit-v8-linux/depot_tools:/home/iojs/venv/bin:/home/iojs/nghttp2/src:/home/iojs/wrk:/usr/lib/ccache:/usr/lib64/ccache:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin tools/dev/v8gen.py x64.release --no-goma -vv
################################################################################
/usr/bin/python3 -u tools/mb/mb.py gen -f infra/mb/mb_config.pyl -m developer_default -b x64.release out.gn/x64.release
Writing """\
dcheck_always_on = false
is_debug = false
target_cpu = "x64"
""" to /home/iojs/build/workspace/node-test-commit-v8-linux/deps/v8/out.gn/x64.release/args.gn.
/home/iojs/build/workspace/node-test-commit-v8-linux/deps/v8/buildtools/linux64/gn gen out.gn/x64.release --check
-> returned 1
ERROR at //build/config/linux/pkg_config.gni:104:17: Script returned non-zero exit code.
pkgresult = exec_script(pkg_config_script, args, "json")
^----------
Current dir: /home/iojs/build/workspace/node-test-commit-v8-linux/deps/v8/out.gn/x64.release/
Command: python3 /home/iojs/build/workspace/node-test-commit-v8-linux/deps/v8/build/config/linux/pkg-config.py -s /home/iojs/build/workspace/node-test-commit-v8-linux/deps/v8/build/linux/debian_bullseye_amd64-sysroot -a x64 glib-2.0 gmodule-2.0 gobject-2.0 gthread-2.0
Returned 1.
stderr:
Traceback (most recent call last):
File "/home/iojs/build/workspace/node-test-commit-v8-linux/deps/v8/build/config/linux/pkg-config.py", line 247, in <module>
sys.exit(main())
File "/home/iojs/build/workspace/node-test-commit-v8-linux/deps/v8/build/config/linux/pkg-config.py", line 142, in main
prefix = GetPkgConfigPrefixToStrip(options, args)
File "/home/iojs/build/workspace/node-test-commit-v8-linux/deps/v8/build/config/linux/pkg-config.py", line 80, in GetPkgConfigPrefixToStrip
prefix = subprocess.check_output([options.pkg_config,
File "/usr/lib/python3.10/subprocess.py", line 421, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "/usr/lib/python3.10/subprocess.py", line 503, in run
with Popen(*popenargs, **kwargs) as process:
File "/usr/lib/python3.10/subprocess.py", line 971, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "/usr/lib/python3.10/subprocess.py", line 1863, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'pkg-config'
See //build/config/linux/BUILD.gn:58:3: whence it was called.
pkg_config("glib") {
^-------------------
See //build/config/compiler/BUILD.gn:300:18: which caused the file to be included.
configs += [ "//build/config/linux:compiler" ]
^------------------------------
GN gen failed: 1
Traceback (most recent call last):
File "/home/iojs/build/workspace/node-test-commit-v8-linux/deps/v8/tools/dev/v8gen.py", line 309, in <module>
sys.exit(gen.main())
File "/home/iojs/build/workspace/node-test-commit-v8-linux/deps/v8/tools/dev/v8gen.py", line 303, in main
return self._options.func()
File "/home/iojs/build/workspace/node-test-commit-v8-linux/deps/v8/tools/dev/v8gen.py", line 162, in cmd_gen
self._call_cmd([
File "/home/iojs/build/workspace/node-test-commit-v8-linux/deps/v8/tools/dev/v8gen.py", line 211, in _call_cmd
output = subprocess.check_output(
File "/usr/lib/python3.10/subprocess.py", line 421, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "/usr/lib/python3.10/subprocess.py", line 526, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/usr/bin/python3', '-u', 'tools/mb/mb.py', 'gen', '-f', 'infra/mb/mb_config.pyl', '-m', 'developer_default', '-b', 'x64.release', 'out.gn/x64.release']' returned non-zero exit status 1.
iojs@test-hetzner-ubuntu2204-x64-1:~/build/workspace/node-test-commit-v8-linux/deps/v8$ |
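The verbose trace above bottoms out in a FileNotFoundError for pkg-config, i.e. a tool missing from PATH rather than a V8 problem. A small preflight check run before the build could surface this kind of gap earlier. The sketch below is only an illustration: missing_tools is a hypothetical helper, and the tool names in the usage comment are taken from this thread, not from any official requirements list.

```shell
# missing_tools TOOL...
# Prints, one per line, every named tool that cannot be found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Example: list which of the tools mentioned in this thread are absent.
# missing_tools pkg-config Rscript taskset
```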
Looking at the other Nearform benchmarking machines, I see a considerable number of manual package installations that are outside of Ansible's setup: at least 220 packages have been installed manually. There's also the likelihood that I was supposed to configure Ansible differently to set up these machines, beyond what I understood. I attempted to set 'is_benchmark' = true to add the benchmark role, as I see that pkg-config is installed that way, but some of the other packages in that role have been dropped from Ubuntu and the role hasn't been updated for six years, so it probably has some stale packages in it. List of missing packages
Take a look at the above list; we should decide whether we need to update Ansible to include some of this setup. |
I suspect it is the
We should probably add
I'm less familiar with what is needed to run the benchmarks. |
Adding pkg-config this way has allowed the V8 CI to build and run tests.
Neither encountered the networking issues we had with the Nearform hosted benchmark machines (🎉). For the missing build/ansible/playbooks/jenkins/worker/create.yml Lines 74 to 77 in 51ad778
|
That intel tag was what was being used to target the Nearform intel donated machines. We should definitely change that to target the Hetzner ones now.
How shall we approach getting these machines into a stable usable state going forward? I can continue to adjust which packages are installed as part of the ansible setup, but I don't want to inadvertently step on or undo any work that anybody else is doing. (Though I also lack any background in what the jobs do/accomplish) |
Ran the benchmark job, which fails: https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/1498/ It looks like there may be a directory missing. It may have been created manually, as these machines were set up a long time ago. |
@ryanaslett have you added the linux-perf role? That might be all that is needed to get the v8 jobs running as well as they were before on the machines. @richardlau is that your expectation? |
Manually creating the directory /w, owned by iojs and with group iojs, has let the benchmark run get further. |
@ryanaslett if you are updating the ansible scripts, is there a section which is specific to the benchmark machines that we can add the creation of the /w directory owned by iojs with group iojs? |
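For reference, the manual step being discussed amounts to something like the sketch below. The ensure_owned_dir helper is hypothetical, and this is only the ad hoc shell form. An Ansible version would presumably use a file task with state: directory plus owner and group, but where that belongs in the playbooks is exactly the open question here.

```shell
# ensure_owned_dir DIR OWNER GROUP
# Creates DIR if it does not exist, then sets its owner and group.
# Changing ownership to a different user requires root.
ensure_owned_dir() {
  mkdir -p "$1" && chown "$2:$3" "$1"
}

# The manual step described above would then be (run as root):
# ensure_owned_dir /w iojs iojs
```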
Job to see if perf job runs ok after adding the /w directory - https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/1499/ |
That's sort of what I was asking in #3657 (comment): mostly, who should be doing this. |
And also, whether we're trying to capture every change in Ansible, or just doing some manual steps (which we should still document). |
I went ahead and modified the Jenkins worker config to target the Hetzner machines, and modified the benchmarking role to remove any packages that are not currently installed (mostly Python 2 packages that are no longer on Ubuntu). There is still the question of "all the rest of the packages". Seems like we can either
|
@ryanaslett I suspect that most of the missing packages will not be needed. Hoping @richardlau can confirm that for the V8 benchmarking part, and the jobs I'm kicking off should help show whether that is true for running the benchmarking job. The last one I kicked off failed because I used the same parameters as the last run, but that PR has landed since then and therefore there were conflicts. Kicked off this one to see if it passes on a fresh PR - https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/1500/ |
We should capture everything in Ansible. The manual change I made to create the /w directory is just to see whether that actually resolves the problem. We should capture it in Ansible for the benchmarking machines. Next time, hopefully, we can just run the Ansible script and everything will work afterwards. |
@ryanaslett not necessarily something that needs to be resolved immediately, but are there any Hetzner machines where the CPUs are all the same? It seems we'd have to avoid using either the 4 P-cores or the 8 E-cores for the benchmark tests. I guess I'd leave it up to @mcollina and @anonrig to comment on whether that number of cores will be OK. |
My understanding was that it would be worked around using taskset, per #3657 (comment) |
I believe @mcollina had a solution for that in #3657 (comment), using taskset, but the goal of this current exercise is really to ensure that the servers we have will work as replacements for the servers we had before. |
I have wrangled both the Can somebody kick off another build to try it out? I'm still lacking Jenkins admin. |
@ryanaslett @mhdawson we'd need to change the job so that benchmarks are run using taskset. I have no idea how to wire it into the benchmark Jenkins jobs. |
From a V8 CI POV the CI is failing on the new machine, but I think those are known issues that were unfortunately semi-masked by #3050 (which isn't occurring on the Hetzner machines 🎉):
cc @nodejs/v8-update In other words, for the V8 CI, the new machines are in no worse state than the Nearform machines that are being replaced. |
I posted a script showing how to download the right version of the Linux source code to build perf; that'll probably make the perf test failures go away. Hoping someone who knows how to convert it to Ansible can pick up the rest: nodejs/node#50079 (comment) |
@mcollina the script that runs the benchmarking jobs is in - https://github.com/nodejs/benchmarking/blob/master/experimental/benchmarks/community-benchmark/run.sh. The person who wrote it is long gone so I think somebody from @nodejs/performance is going to need to figure out how to inject taskset if we want to solve the problem that way. My point was that even if we do that it means only a fraction of the machine can be used for the performance run. I was not sure if that made sense or not. |
from
Seems like one last thing is missing from the machines:
|
@ryanaslett it seems like the ansible scripts should be installing Rscript
- name: Install Rscript repo | {{ os }}
  when: os|startswith("ubuntu")
  shell: echo "deb https://ftp.heanet.ie/mirrors/cran.r-project.org/bin/linux/ubuntu {{ ansible_distribution_release }}-cran40/" > /etc/apt/sources.list.d/r.list
- name: Add R key
  apt_key:
    keyserver: keyserver.ubuntu.com
    id: E084DAB9
- name: Update keys
  shell: "apt-key update"
- name: Update packages
  include_role:
    name: package-upgrade
- name: Install Rscript packages
  package:
    name: "{{ package }}"
    state: present
  loop_control:
    loop_var: package
  with_items:
Did you run Ansible with the benchmarking role? |
The benchmarking script run by the CI is https://github.com/nodejs/benchmarking/blob/master/experimental/benchmarks/community-benchmark/run.sh. |
Isn't it the performance team's responsibility? |
@rluvaton make a PR to the performance repo and add the script there. |
Added nodejs/performance#156 |
Hey, can you please update the benchmarking scripts location to be from here: |
Done. |
I left a comment in nodejs/performance#157 -- I think the script is currently limiting the benchmarks to a single CPU, which is probably not the intent 🙂. See nodejs/node#52233 (comment). Perhaps the script could parameterize the |
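One way the CPU selection could be parameterized is sketched below: an environment variable overrides the list, and the default covers every online CPU so that nothing is limited to a single core by accident. BENCH_CPUS, cpu_list_for and resolve_cpus are hypothetical names, not something the existing script defines, and the default assumes online CPUs are numbered contiguously from 0.

```shell
# cpu_list_for N prints a taskset-style list covering CPUs 0..N-1.
cpu_list_for() {
  if [ "$1" -le 1 ]; then
    echo 0
  else
    echo "0-$(($1 - 1))"
  fi
}

# resolve_cpus prints BENCH_CPUS if it is set, otherwise every online CPU.
# BENCH_CPUS is a hypothetical variable name, not one the CI job defines.
resolve_cpus() {
  if [ -n "${BENCH_CPUS:-}" ]; then
    echo "$BENCH_CPUS"
  else
    cpu_list_for "$(getconf _NPROCESSORS_ONLN)"
  fi
}

# Example: taskset -c "$(resolve_cpus)" ./node benchmark/run.js buffers
```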
Done: nodejs/performance#158 |
@rluvaton btw we can use |
@rluvaton based on #3657 (comment), this does not seem to be working as quite a lot of results are NaN. |
This was before I started using CPUs 0-11; should that change anything? |
Not sure what's the problem but there was no analysis of the result. |
If this is still happening after the change of 0-11 CPU I will revert it... Currently the CI is locked |
As it is still not working, I reverted it. @thisalihassan can you please take a look at why all the results are NaN? |
@rluvaton sure also where can I see the logs? |
You have this example: https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/1514/consoleFull but you can run locally |
FWIW there's a difference in output, e.g. 21:42:04 "new","test_runner/suite-tests.js","concurrency='no' testType='async' testsPerSuite=1000 numberOfSuites=100",34812.26643390852,2.872550691 vs 13:39:14 test_runner/suite-tests.js concurrency="no" testType="async" testsPerSuite=1000 numberOfSuites=100: 29,112.48474928658 i.e. as if the |
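The two lines quoted above differ in shape: the first is a CSV row, the second is the human-readable form. A quick way to check which form a console log contains is to count the CSV-looking rows. The sketch below assumes such rows begin with two double-quoted fields, optionally preceded by a Jenkins timestamp, as in the first example. count_csv_rows is a hypothetical helper.

```shell
# count_csv_rows reads benchmark output on stdin and prints how many lines
# start (after an optional HH:MM:SS prefix) with two double-quoted,
# comma-separated fields.
count_csv_rows() {
  awk '/^([0-9:]+ )?"[^"]*","[^"]*",/ { n++ } END { print n + 0 }'
}

# Example (the file name is illustrative):
# count_csv_rows < console.log
```

A count of zero on a run that should have produced results would point at the output format rather than at the benchmarks themselves.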
@rluvaton nodejs/node#52456: I unintentionally removed "ipc" from the spawn stdio, which is why the child.on message listener wasn't receiving events. With fork, IPC is established by default, but with spawn it is not, hence the listener wasn't working. |
The procurement process has completed, and I have created two EX44's (https://www.hetzner.com/dedicated-rootserver/ex44/) at Hetzner.
If all goes well we should be able to have these online and running benchmark tests.
I believe the next steps are