How to deploy and restart runners
Runner physical node configuration: four nodes, each with 4 cores, 8 GB of RAM, and 100 GB of storage. Suggested system image: ubuntu-20.04-2nic
How to deploy the four nodes:
- Build the runner deployer
cd vHive/scripts/github_runner
go build .
- Modify the conf.json
Edit conf.json so that it follows this format:
{
  "ghOrg": "<GitHub account>",
  "ghPat": "<GitHub PAT>",
  "hostUsername": "<username>",
  "runners": {
    "<hostname-1>": {
      "type": "cri",
      "sandbox": "firecracker"
    },
    "<hostname-2>": {
      "type": "cri",
      "sandbox": "gvisor"
    },
    "<hostname-3>": {
      "type": "integ",
      "num": 2,
      "restart": false
    },
    "<hostname-4>": {
      "type": "profile"
    }
  }
}
Note that in conf.json, ghOrg is vhive-serverless, and ghPat should be your own account's Personal Access Token, as long as your account has the correct permissions for the vhive-serverless org.
<username>:<hostname-1/2/3/4> is the SSH username and hostname, so if you use SCSE cloud nodes as runners, <hostname-1/2/3/4> should be their IP addresses.
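Because the deployer reads conf.json directly, it can help to check the file before deploying. The following is a minimal sketch, not part of the repository: check_conf is a hypothetical helper name, it assumes python3 is installed on the machine running the deployer, and the key check is a crude text search for the top-level keys shown in the format above.

```shell
# check_conf FILE: fail early if FILE is not valid JSON or lacks an expected key.
# Hypothetical helper; assumes python3 is available on this machine.
check_conf() {
    conf="$1"
    # json.tool rejects syntax errors such as a trailing comma after the last field.
    if ! python3 -m json.tool "$conf" >/dev/null 2>&1; then
        echo "$conf: not valid JSON" >&2
        return 1
    fi
    # Crude presence check: only looks for the quoted key names anywhere in the file.
    for key in ghOrg ghPat hostUsername runners; do
        if ! grep -q "\"$key\"" "$conf"; then
            echo "$conf: missing key $key" >&2
            return 1
        fi
    done
}
```

After sourcing the function, running check_conf conf.json before ./deploy_runners reports a syntax error or a missing key without contacting any node.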
After modifying this, deploy the runners remotely by running:
./deploy_runners --loglvl=debug
If it fails with an error like “dial unix: missing address”, start an SSH agent and add your key:
eval `ssh-agent`
ssh-add ~/.ssh/<private_key>
Here <private_key> should be the key that has SSH access to all four runners; typically it is id_rsa.
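To tell in advance whether this step is needed, one can check whether an agent socket is already reachable from the current shell. This is a sketch under assumptions: agent_ready is a hypothetical helper name, and the link between the error above and an unset SSH_AUTH_SOCK is inferred from the fix given here, not confirmed from the deployer's source.

```shell
# agent_ready: succeed only if SSH_AUTH_SOCK names an existing socket.
# Hypothetical helper; it does not start an agent or load any key.
agent_ready() {
    # An unset or empty variable means there is no agent address to connect to.
    [ -n "${SSH_AUTH_SOCK:-}" ] && [ -S "$SSH_AUTH_SOCK" ]
}
```

If agent_ready fails, run the two commands above; if it succeeds, ssh-add -l lists the keys the agent currently holds.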
It is normal for this script not to succeed in one pass; simply re-run the deployment script after a while.
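The re-run can be wrapped in a small retry loop. The sketch below is illustrative only: retry is a hypothetical helper, and the attempt count and delay are arbitrary values, not recommendations from the vHive maintainers.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND...: run COMMAND until it succeeds
# or ATTEMPTS runs have failed. Hypothetical helper, POSIX sh compatible.
retry() {
    attempts="$1"
    delay="$2"
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $i/$attempts failed" >&2
        # Wait before the next attempt, but not after the final one.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}
```

For example, retry 3 60 ./deploy_runners --loglvl=debug would try the deployment up to three times, waiting a minute between attempts.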
On SCSE cloud, rebuild the three nodes and redeploy them.
For the firecracker and gvisor CRI tests, if a test gets stuck at "helloworld is waiting for a Revision to be ready", the firecracker and gvisor CRI runners need to be restarted (you can also restart only the affected runner).
But if the firecracker or gvisor CRI test passed the "Setup vHive CRI test environment" step and failed in the "Run vHive CRI tests" step, this is typically a sporadic failure and can be resolved by re-running the tests; clicking the re-run button on the GitHub web page is enough.