Benchmark machine maintenance #2656
This also reproduces on my local machine (macOS Catalina).
I assume it's a similar issue to nodejs/node#36871. There's a timeout issue that's getting emitted on the request. IMO there's probably a real underlying issue in HTTP, and this is just putting a band-aid on the problem, but it would at least allow benchmarking again.
I've done a big update and cleanup on both benchmark machines, including clearing out workspaces and temp files (although I have a bad feeling I might have been too liberal with my removals, because these machines have some very specific workflows that may be putting things into unexpected places 🤞). They've been rebooted, so let's see if they behave any differently now. We could upgrade to 18.04, but that might need input from maintainers of the benchmarking work - is a jump in OS likely to have any meaningful impact on benchmark numbers? Does it matter?
I might be wrong, but it seems the (only?) benchmark CI that is run on nodejs/node PRs is
I've spawned https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/1030/, let's see how it performs.
Options are in https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/configure if you have access to see what's going on. It does a
Thanks for the info; the script I was looking for is this one: https://github.com/nodejs/benchmarking/blob/master/experimental/benchmarks/community-benchmark/run.sh. The repo is read-only, so I've asked in nodejs/TSC#822 (comment) where we can move this script.
The job seems to be stuck again; it hasn't output anything in the last hour...
I had a look on the machine and it looked like your job had indeed "stalled" by some definition - the load on the machine was effectively zero. While it was going, I attempted to initiate another run from another user account, and it looks like that has caused a port conflict in your job which has caused it to end. This suggests that your job was, in fact, still progressing in some fashion, even though it wasn't visibly using much CPU or producing additional output, so it's possible it would have eventually run to completion.
I've re-initiated your job as https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/1031/console, which is now running. I would definitely be in favour of upgrading the machine to 20.04 in principle, although the two benchmarking machines are of a specific type, so we'd need to be certain that they could be upgraded cleanly...
@aduh95 Have you tried to run the benchmarks locally? Maybe something is broken and they cannot finish.
For some reason the benchmark involving http cannot even start on my machine (probably some issue with my config), but I think @mcollina is able to run them on a personal server.
It's the
Just to clarify what I stated earlier - at least locally, for me, I see similar issues with the http-server benchmark in the
Only the
Is there something I can do to help this move forward? I think upgrading to 20.04 would be a good first step, even if it doesn't solve the stalling issue.
Also happy to help, if I can. #2681 is probably blocked on this.
OK, so upgrading to 20.04 is relatively straightforward to do, but we need to make allowance for mess-ups. I'm happy to do the work but need to be told when's a good day to do it. My midday is usually downtime for everyone else (1pm for me now, 3am UTC), but is there a good day of the week for this to happen? Or is any day good enough for this machine? I don't really know what its usage pattern is these days or how critical it might be.
@rvagg I think there are 2 benchmark machines at NearForm: https://ci.nodejs.org/label/benchmark-ubuntu1604-intel-64/. I'm thinking we could do one at a time to limit impact. Most of the jobs seem to run on the same -2 machine, likely because one machine can handle the typical load in terms of concurrent requests.
@rvagg would it be possible for you to do the upgrade after the next v14.x release (scheduled for 2021-07-27)? In case something goes wrong, it should give us more time to fix/rollback before the next release.
Sure, no problem; you might have to remind me though @aduh95, I'd already forgotten about this.
Is this issue why https://benchmarking.nodejs.org/ hasn't seen an update since Dec 2020?
@dany-on-demand No, it isn't. https://benchmarking.nodejs.org/ hasn't been updated since the Benchmarking Working Group was wound down due to lack of participation (nodejs/TSC#822).
I still have it on my list to remove benchmarking.nodejs.org; I'll try to take another look this week.
@rvagg now that the security releases are out, would it be a good time for you to take care of the upgrade? (No planned release until 2021-08-17 on the nodejs/release repo.)
Can we try to get back to this? I think we can upgrade straight to Ubuntu 22.04 now.
This issue is stale because it has been open many days with no activity. It will be closed soon unless the stale label is removed or a comment is made.
I just took https://ci.nodejs.org/manage/computer/test-nearform_intel-ubuntu1804-x64-2/ offline and will try to upgrade it to Ubuntu 22.04.
Here's what I did:
System is up and running, but there seems to be an issue with DNS resolution. I'm not sure what to do:
Edit: Fixed it with
Trying node-test-commit-v8-linux
Suspect that will fail because of #3206.
Something aborted the build 🤔
Another one is running though: https://ci.nodejs.org/job/node-test-commit-v8-linux/nodes=benchmark-ubuntu2204-intel-64,v8test=v8test/5282/console
That's from me adding a
BTW, while monitoring the Ubuntu updates, the download speed seemed quite slow, and that is also the case with git clones in these runs.
Doing the other machine now.
Finished
Is there something wrong with the benchmark machine? For some reason it keeps stalling/hanging, or maybe the connection to Jenkins is silently being severed? I've tried restarting the job a couple times now. Normally the async_hooks benchmarks should only take about 50 minutes or less on a decent modern machine...

Originally posted by @mscdex in nodejs/node#38785 (comment)
The above comment is referencing https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/1027/, https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/1028/, https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/1029/, ... It seems that any benchmark using http.createServer gets disconnected. Also it seems the machine is using Ubuntu 16.04; maybe it's due to an update. Let me know if there's something I can do to help with that.