
tools: build parallel-ssh config files #1891

Merged
merged 2 commits into from
Sep 4, 2019
Conversation

sam-github
Contributor

I'm exploring which machines in the nodejs/build inventory I can actually ssh into. I should have test and release access, but I don't have infra.

@rvagg I use parallel-ssh and hosts files pretty frequently, I suspect you do, too. I've been keeping my configs locally, but it might be worth sharing them.

  1. How do you deal with hosts that throttle ssh connections? I've been seeing that a bunch: p-ssh fails to connect, but it works if I try manually :-(. Very frustrating. I'm currently attempting a workaround with `parallel-ssh -o pass -e fail -h ../release -h ../test -p 1 "echo CONNECTED && sleep 10"`, which is quite slow. It occurs to me that randomizing the input host order might help, too.
  2. should I replace all the {{ ... }} with the explicit path to my node test ssh key?
  3. should I be able to ssh into any msft or windows machines?
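The host-order randomization idea in point 1 can be sketched in a couple of lines. This is a hypothetical workaround, not something from this thread: the hosts file path and host names below are made up, and `shuf` is assumed to be available (GNU coreutils).

```shell
# Stand-in hosts file (in practice this would be ../release and ../test).
printf 'host-a\nhost-b\nhost-c\n' > /tmp/hosts

# Shuffle the host order so repeated runs don't always hit the same
# throttling hosts first; the set of hosts is unchanged, only the order.
shuf /tmp/hosts > /tmp/hosts-shuffled

# Then run parallel-ssh against the shuffled list as before, e.g.:
# parallel-ssh -o pass -e fail -h /tmp/hosts-shuffled -p 1 "echo CONNECTED && sleep 10"
```

Whether this actually helps depends on how the throttling is keyed (per source IP vs. per destination host), so it's only worth trying, not a guaranteed fix.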

There are lots of machines I can't ssh into. I'm not sure if that's expected or not. I'll post a gist soon.

@sam-github
Contributor Author

I'm going to try hacking the bad Ansible lines and try again, but with a .ssh/config generated by #1890, this is the state:

How: `rm -rf pass fail; parallel-ssh -o pass -e fail -h ../release -h ../test -p 1 "echo CONNECTED && sleep 10"; find pass fail -type f -size 0 | xargs rm -f`
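Spelled out, that one-liner is a small triage loop: parallel-ssh writes one file per host into `pass/` or `fail/`, and the `find` pass prunes the zero-length ones so only hosts with real output remain. A sketch of the same flow, with the parallel-ssh call commented out (it needs the real ../release and ../test host files) and its result simulated with hypothetical host names:

```shell
# Start clean; parallel-ssh's -o/-e would create these output dirs itself.
rm -rf pass fail
mkdir -p pass fail

# The real command from the comment above:
# parallel-ssh -o pass -e fail -h ../release -h ../test -p 1 "echo CONNECTED && sleep 10"

# Simulate its result: one host connected, one errored, one left an empty file.
printf 'CONNECTED\n' > pass/host-ok
printf 'ssh: connect to host refused\n' > fail/host-bad
: > pass/host-empty

# Drop the zero-length files, keeping only hosts with real output or errors.
find pass fail -type f -size 0 | xargs rm -f
```

After the prune, `ls pass fail` is effectively the connectivity report: reachable hosts in `pass/`, unreachable ones in `fail/`.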

@sam-github
Contributor Author

Redid with {{ ... }} fixed up.

I'd be prepared to shrug these off, except that the last time I did that the arm builds failed the next day and I couldn't get in to look at them. In fairness, there is no way I could have fixed the systems; it turned out not to be a simple cleanup of /tmp.

@rvagg @mhdawson For the machines listed in https://gist.github.com/sam-github/c06aa2aa63473811ffafa8bb60e5374c that I can't access, should I care? If so, any suggestions?

For this PR... is it useful to anybody in @nodejs/build?

@rvagg
Member

rvagg commented Aug 20, 2019

Yes, I love parallel-ssh, but I mainly use it locally; for stuff outside my network in the Node CI I use Ansible or manual ssh.

How do you deal with hosts that throttle ssh connections? I have been seeing that a bunch... p-ssh fails to connect, but it works if I try manually :-(. Very frustrating.

I fix the throttling itself; for example, on the jump host for the arm cluster I've raised the SSH connection limit to accommodate connecting to them all. Beyond that, the only workaround I have is changing the parallelism (-p) if that's a cause.
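Raising the SSH connection limit on a jump host is typically done via `MaxStartups` in the host's sshd_config. A hedged sketch with illustrative values only (the thread doesn't say what the arm jump host actually uses):

```
# /etc/ssh/sshd_config on the jump host -- illustrative values, not the real config.
# MaxStartups start:rate:full -- begin randomly dropping new unauthenticated
# connections at 50 concurrent, rising to always-drop at 100.
MaxStartups 50:30:100
# Allow more multiplexed sessions per connection, if ControlMaster is in use.
MaxSessions 50
```

The default `MaxStartups` (10:30:100 in recent OpenSSH) is low enough that a parallel-ssh fan-out through one jump host can easily trip it, which matches the "fails under p-ssh but works manually" symptom.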

should I replace all the {{ ... }} with the explicit path to my node test ssh key?

Yes, that's a bug; if it's in your ssh config then it should be replaced. But note the key name: some are release, some are infra, and some are test. #1483

A couple of things I regularly find useful that may be worth documenting:

  • -i if you want to actually see the output (your example omits it, so it's probably not going to show anything useful).
  • -t to raise the timeout, since the default is too short for anything more than a quick command (I use -t 1200 when doing apt updates across the Pis).
  • -p to increase parallelism where your connections can support it (the default is pretty low; same with Ansible, where I use -f to speed things up).
  • -O UserKnownHostsFile=/dev/null -O StrictHostKeyChecking=no if you have host-key matching problems that become annoying to solve... although maybe that's not awesome advice.
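The -O flag passes raw OpenSSH client options through, so the host-key workaround in the last bullet can equally live in a scoped ~/.ssh/config block instead of on every command line. A hedged example; the `test-*` host pattern is an assumption, not from this repo's actual config:

```
# ~/.ssh/config -- same effect as
#   -O UserKnownHostsFile=/dev/null -O StrictHostKeyChecking=no
# but scoped to test hosts only, so real hosts keep host-key checking.
Host test-*
    UserKnownHostsFile /dev/null
    StrictHostKeyChecking no
```

Scoping it this way limits the (real) man-in-the-middle exposure of disabling host-key checking to the throwaway test machines.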

@rvagg
Member

rvagg commented Aug 20, 2019

re your ability to access:

fail/release-*

Without the release key you won't be able to access these. We can consider upgrading your access to make this happen, but we maintain this very tightly so we can offer a high level of release asset integrity guarantee. Of course, we have to weigh that against the cost of having so few people able to access & maintain these machines!

fail/test-azure_msft-win*
fail/test-rackspace-win*

I don't think you should expect ssh access to these; at least I've never had it, but maybe Joao has some magic enabled for it (since Ansible is supposed to work on them?). I always do these manually via Remote Desktop; you should set that up on your machine. My Mac has all of these in my config for the Microsoft Remote Desktop app, so it's pretty easy to jump to a full desktop whenever I need to do something on them.

fail/test-mininodes-ubuntu1604-arm64_odroid_c2-1
fail/test-mininodes-ubuntu1604-arm64_odroid_c2-2
fail/test-mininodes-ubuntu1604-arm64_odroid_c2-3

No, we need to remove these, we dropped them.

fail/test-rackspace-fedora26-x64-1

Borked, I've forced a restart and it's back up now.

fail/test-requireio-osx1010-x64-1

I turned this off this weekend, we've migrated to macstadium for release and test so I'm keen to save on the maintenance burden of this.

fail/test-requireio_rvagg-ubuntu1404-arm64_odroidxu-1
fail/test-requireio_rvagg-ubuntu1404-arm64_odroidxu-2
fail/test-requireio_rvagg-ubuntu1404-arm64_odroidxu3-1

They still exist but have been off for a long time; they're a bit of a maintenance burden due to power connector problems, so I think they should be removed. Our Scaleway instances help spread our ARMv7 coverage, and while more device diversity would be nice, it's a maintenance cost for unprovable benefit (i.e. we haven't seen many, or any, bugs specific to particular ARMv7 hardware rather than spanning all ARMv7 hardware).

Member

@mhdawson mhdawson left a comment


LGTM

@sam-github sam-github merged commit c923d74 into nodejs:master Sep 4, 2019
@sam-github sam-github deleted the p-ssh-config branch September 4, 2019 21:12