What are permissions needed by provider in GCP / ability to use default application credentials #16
The provider needs to be able to create VMs in your GCP account. For testing purposes, I used In terms of authentication to the API, we should be able to use whatever the golang SDK permits. We went with service account keys for now, but other methods can be added. Do you plan to run GARM in GCP? I believe there is something similar to "managed identity" in Azure or "IAM roles" in EC2. I need to look it up.
Ahh, so FindDefaultCredentials should do the trick. I will test it out and push a PR when it's done. It might take a while, though.
OK, so it only creates VMs; no instance templates, node pools, etc. are necessary, I assume. Thanks!
Yes, it would be great if GARM could use default application credentials first and fall back to the SA JSON file. Thanks for having a look at this.
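The lookup asked for here can be modelled in a few lines. In Go, `google.FindDefaultCredentials` from `golang.org/x/oauth2/google` implements Application Default Credentials (ADC). The Python sketch below only mirrors the documented ADC search order (the `GOOGLE_APPLICATION_CREDENTIALS` variable, then the gcloud user credentials file, then the metadata server) and then falls back to a service account key file, as requested above. The function name, the `sa_key_file` option and the returned labels are hypothetical and not part of the provider's configuration; the sketch only decides which source would be used and does not load any credentials.

```python
import os
from typing import Callable, Mapping, Optional

# Location written by `gcloud auth application-default login` on Linux and macOS.
GCLOUD_ADC_FILE = os.path.join("~", ".config", "gcloud", "application_default_credentials.json")


def resolve_credential_source(
    env: Mapping[str, str],
    on_gcp: bool,
    sa_key_file: Optional[str] = None,
    path_exists: Callable[[str], bool] = os.path.exists,
) -> str:
    """Return a label naming the credential source that would be used.

    Steps 1-3 follow the documented ADC search order; step 4 is a
    hypothetical fallback to an explicitly configured SA JSON key.
    """
    # 1. A key file named by the standard environment variable.
    key_path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if key_path:
        return f"env:{key_path}"
    # 2. User credentials created by the gcloud CLI.
    if path_exists(os.path.expanduser(GCLOUD_ADC_FILE)):
        return "gcloud-user"
    # 3. The service account attached to the VM, GKE workload or Cloud Run
    #    service, served by the metadata server.
    if on_gcp:
        return "metadata-server"
    # 4. Hypothetical provider-level fallback to a configured SA JSON key.
    if sa_key_file:
        return f"key-file:{sa_key_file}"
    raise RuntimeError("no credentials found")
```

Passing `path_exists` in makes the order easy to exercise without touching the real home directory; with the default, a GCP VM with no variables set resolves to `"metadata-server"` unless a gcloud user credentials file happens to exist.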
Yes. At the moment I'm evaluating GARM and trying to set it up locally with ngrok, but ultimately I will either run GARM on Cloud Run or give the GKE provider a go and run it in k8s (we use workload identity there).
Thank you! Quick question: am I right that an instance is spun up on the job-requested webhook and should be terminated on job completion? (So runners are one-time use, and do you pass
Yes. GARM only ever spins up ephemeral runners. Persistent runners are not advised unless you absolutely trust the authors of a PR. Even then, their systems should be treated as adversarial, as they may be compromised without the authors' knowledge. So we prefer to keep our tin foil hats on for this one and only support ephemeral runners. You have the option to keep a "warmed up" pool of runners by setting
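The ephemeral lifecycle discussed here (one instance per job, created when the job is queued and torn down when it completes) can be sketched as a mapping from the `action` field of GitHub's `workflow_job` webhook to a pool operation. This is a simplified, hypothetical model, not GARM's actual handler: the `PoolAction` names are made up, and a real pool may hand a queued job to an already idle runner instead of creating a new one.

```python
from enum import Enum
from typing import Any, Mapping


class PoolAction(Enum):
    """Hypothetical operations a pool manager might take."""
    CREATE_RUNNER = "create_runner"
    MARK_BUSY = "mark_busy"
    DELETE_RUNNER = "delete_runner"
    IGNORE = "ignore"


def action_for_workflow_job(event: Mapping[str, Any]) -> PoolAction:
    """Map a `workflow_job` webhook payload to a pool action."""
    action = event.get("action")
    if action == "queued":
        # A job is waiting: make sure an ephemeral runner exists for it.
        return PoolAction.CREATE_RUNNER
    if action == "in_progress":
        # A runner picked the job up; being ephemeral, it takes no other job.
        return PoolAction.MARK_BUSY
    if action == "completed":
        # The ephemeral runner is done: delete the instance behind it.
        return PoolAction.DELETE_RUNNER
    # Anything else (e.g. "waiting"): nothing to do yet.
    return PoolAction.IGNORE
```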
Perfect, we prefer to use ephemeral runners as a security and cost-cutting measure.
In my experience, if you have jobs that take a long time to run, an extra 5 minutes for the runner to spin up doesn't make much of a difference. A warm pool of idle runners only really makes sense if your jobs run quickly, or if keeping a runner online costs nothing extra, as with the LXD/Incus providers or the k8s provider. In those cases, the LXD/Incus server is already running and incurring cost, and so is the k8s cluster, so you can potentially keep your min-idle-runners equal to your Have a look at the Using GARM guide. Keep in mind that the version in
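The trade-off above comes down to a little arithmetic. As a rough, hypothetical model (not GARM's actual scheduler), a pool that should still have `min_idle` runners waiting after its idle runners absorb the queued jobs, and that must never exceed `max_runners` instances in total, needs to create this many new instances:

```python
def runners_to_create(idle: int, total: int, queued_jobs: int,
                      min_idle: int, max_runners: int) -> int:
    """Hypothetical warm-pool model: how many new instances to request."""
    # Idle runners absorb queued jobs first; then top the pool back up to min_idle.
    needed = max(min_idle + queued_jobs - idle, 0)
    # Never go past the pool's hard cap on instances.
    headroom = max(max_runners - total, 0)
    return min(needed, headroom)
```

In this model, `min_idle=0` gives a purely on-demand pool where every job pays the spin-up delay, while raising `min_idle` trades idle instance cost for faster job starts.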
We will definitely test.
Just merged the default credentials PR. Give it a shot. It worked well in my tests, after I created a VM and gave it access to a service account.
To use default credentials, leave the

environment_variables = ["GOOGLE_APPLICATION_CREDENTIALS", "SOME_OTHER_VAR_YOU_WANT_TO_PASS"]

If you're running on GCP, you don't need to pass any variables. The VM will get its creds from the metadata server. I updated the README. Give that a look.
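On GCP no variables are needed because Google's client libraries, including the Go SDK's default credentials lookup, ask the VM's metadata server for an access token for the attached service account. The Python sketch below shows that request using only the standard library. The endpoint and the mandatory `Metadata-Flavor: Google` header are the documented ones; the helper names are ours, and `fetch_access_token` only works on a GCP VM (or GKE/Cloud Run) that has a service account attached.

```python
import json
import urllib.request

# Documented metadata-server endpoint for the default service account's token.
TOKEN_URL = ("http://metadata.google.internal/computeMetadata/v1/"
             "instance/service-accounts/default/token")


def build_token_request() -> urllib.request.Request:
    # The metadata server rejects requests that lack this header.
    return urllib.request.Request(TOKEN_URL, headers={"Metadata-Flavor": "Google"})


def fetch_access_token(timeout: float = 2.0) -> str:
    """Return a short-lived OAuth2 access token (works on GCP only)."""
    with urllib.request.urlopen(build_token_request(), timeout=timeout) as resp:
        # Body looks like {"access_token": ..., "expires_in": ..., "token_type": "Bearer"}.
        return json.load(resp)["access_token"]
```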
Thanks. At the moment I'm still starting this up the old way. Once I get it working, I'm definitely going to try default credentials. Is the provider invoked by the GARM manager on a per-request basis, meaning it could technically be a Python script that calls Asking because, before granting permissions to the provider's service account, I started GARM with the GCP provider and was getting some errors that made me think the provider is kept running all the time?
@gabriel-samfira I'm also getting errors after adding permissions (used It looks like the provider tries to create the instance (which does not work, as I do not see any in the cloud console), but I do not see any errors in the GARM output related to creating an instance:
This is immediately followed by:
It looks to me like something goes wrong when adding the instance (which I cannot see from the logs); then the provider tries to get info about the instance, gets a 404, then tries to clean up and fails (as the instance does not exist). It might be worth reporting this as a separate issue; happy to do that.
The error you're getting is a bug. The provider should not die like that because of a 404. I have a PR here: #19, which I will merge as soon as the tests finish.
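One way to avoid dying on a 404 during cleanup is to make deletion idempotent: an instance that is already gone counts as deleted. The Python sketch below illustrates the idea only; `NotFoundError` and the injected `delete_fn` are hypothetical stand-ins for whatever the compute client actually raises and calls, and this is not the change in #19.

```python
from typing import Callable


class NotFoundError(Exception):
    """Hypothetical error a compute client raises for a missing instance (HTTP 404)."""


def delete_instance(delete_fn: Callable[[str], None], name: str) -> bool:
    """Delete `name`. Return True if it was deleted now, False if it was already gone."""
    try:
        delete_fn(name)
    except NotFoundError:
        # Nothing to clean up: treat "already gone" as success instead of failing.
        return False
    return True
```

Any other error (for example a permission problem) still propagates, so only the "not found" case is softened.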
The provider is just an executable that gets exec-ed with some environment variables set, and in the case of See: https://github.com/cloudbase/garm/blob/main/doc/external_provider.md

It might be a bit outdated and sparse on info, but essentially the provider can be anything, as long as it's an executable, it respects the external provider interface, and you point GARM to the executable. It can be bash, Python, etc. Doesn't matter.

The providers written in Go all use this common scaffolding: https://github.com/cloudbase/garm-provider-common/blob/d0fe67934a5bcb773503553555274080ba60a852/execution/execution.go#L150-L204

This is the interface that external providers need to implement:

This is where GARM executes the provider:
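To make "the provider is just an executable" concrete, here is a minimal Python skeleton of an external provider. It assumes the contract as we understand it from the external provider document linked above: GARM runs the executable once per operation (so nothing is kept running), names the operation in the `GARM_COMMAND` environment variable (values such as `CreateInstance`, `DeleteInstance` or `ListInstances`), passes the target instance in `GARM_INSTANCE_ID`, sends bootstrap parameters as JSON on stdin for `CreateInstance`, and reads JSON results from stdout. Check those names against the document before relying on them; the handler bodies are placeholders and the returned fields are illustrative, not the real instance schema.

```python
import json
import os
import sys
from typing import Any, Callable, Dict, Mapping, Optional, TextIO

Handler = Callable[[Mapping[str, str], TextIO], Optional[Any]]


def handle_create(env: Mapping[str, str], stdin: TextIO) -> Dict[str, Any]:
    params = json.load(stdin)  # bootstrap parameters sent by GARM (assumed)
    # Placeholder: a real provider would create the VM here.
    return {"name": params["name"], "status": "running"}  # illustrative fields only


def handle_delete(env: Mapping[str, str], stdin: TextIO) -> None:
    instance_id = env["GARM_INSTANCE_ID"]
    # Placeholder: delete the instance, treating "not found" as success.
    print(f"would delete {instance_id}", file=sys.stderr)


def handle_list(env: Mapping[str, str], stdin: TextIO) -> list:
    return []  # placeholder: no instances


HANDLERS: Dict[str, Handler] = {
    "CreateInstance": handle_create,
    "DeleteInstance": handle_delete,
    "ListInstances": handle_list,
}


def run(env: Mapping[str, str], stdin: TextIO, stdout: TextIO) -> int:
    """Handle one GARM invocation; the process exits right after."""
    command = env.get("GARM_COMMAND", "")
    handler = HANDLERS.get(command)
    if handler is None:
        print(f"unsupported command: {command!r}", file=sys.stderr)
        return 1
    result = handler(env, stdin)
    if result is not None:
        json.dump(result, stdout)
    return 0


if __name__ == "__main__":
    sys.exit(run(os.environ, sys.stdin, sys.stdout))
```

A real provider also has to handle the remaining commands (for example `GetInstance`, `RemoveAllInstances`, `Start` and `Stop`, as we understand the interface); this sketch rejects them with a non-zero exit code.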
When GARM fails to create an instance, the
Try building the latest main branch. I suspect that the real error is masked by the nil pointer bug. After you rebuild main, if it fails again, run:

garm-cli runner show <runner name>

If the runner is in an error state, you should see the provider error there.
Yeah, I figured as much. I'm trying to find out why it failed to create an instance; I cannot see anything useful in GARM's logs after that line.
From the logs, it looks like it fails to create an instance, because it cannot find it afterwards:
I'm using the latest main. I can see the runner in the list as pending; the error in
My configs:
and provider:
I really appreciate your help. I really like the idea behind GARM and would love to make it work.
Let's try the service account way.

Create a new service account:

gcloud iam service-accounts create garm-vm

Grant the needed roles:

gcloud projects add-iam-policy-binding prj-redacted \
--member="serviceAccount:garm-vm@prj-redacted.iam.gserviceaccount.com" \
--role=roles/compute.instanceAdmin.v1

gcloud projects add-iam-policy-binding prj-redacted \
--member="serviceAccount:garm-vm@prj-redacted.iam.gserviceaccount.com" \
--role=roles/iam.serviceAccountUser

gcloud projects add-iam-policy-binding prj-redacted \
--member="serviceAccount:garm-vm@prj-redacted.iam.gserviceaccount.com" \
--role=roles/iam.serviceAccountTokenCreator

gcloud iam service-accounts add-iam-policy-binding garm-vm@prj-redacted.iam.gserviceaccount.com \
--member="user:yourGCPUser@example.com" \
--role=roles/iam.serviceAccountUser

Create a VM in GCP that runs as this service account:

gcloud compute instances create garm-vm \
--service-account=garm-vm@prj-redacted.iam.gserviceaccount.com \
--scopes=https://www.googleapis.com/auth/cloud-platform \
--image=ubuntu-pro-2404-noble-amd64-v20240607 \
--image-project=ubuntu-os-pro-cloud \
--zone=europe-west1-c \
--machine-type=e2-small

That VM should now have access to the project you're using, and the GCP provider should work with a pool like:

garm-cli pool add --repo REPO_ID \
--enabled true \
--provider-name=gcp \
--flavor=e2-small \
--image=projects/debian-cloud/global/images/debian-11-bullseye-v20240110 \
--min-idle-runners 1 --tags gcp,linux
Apropos, if it's easier, you can also find me on Slack.
Gah. Found it. The error was indeed masked in create instance as well. Really sorry about the headache.
See: #20
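The masked error in this thread follows a common pattern: the create call fails, the failure is swallowed, and the next step (a lookup) reports a confusing 404 instead. A minimal Python sketch of the alternative, under hypothetical names (`ProviderError`, `insert_fn`, `get_fn`), is to stop at the first failure and chain the original cause, so the real reason (for example an organization policy violation) is what gets reported.

```python
from typing import Any, Callable, Dict


class ProviderError(Exception):
    """Hypothetical error type a provider reports back to its caller."""


def create_instance(insert_fn: Callable[[str], None],
                    get_fn: Callable[[str], Dict[str, Any]],
                    name: str) -> Dict[str, Any]:
    try:
        insert_fn(name)
    except Exception as err:
        # Report the real cause and stop; do not go on to look up an
        # instance that was never created (which would only yield a 404).
        raise ProviderError(f"creating instance {name!r}: {err}") from err
    return get_fn(name)
```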
I might do that as well. I think I found what could be going wrong.
The instance cannot be created with an external IP address; apparently we have a policy that restricts this. I assume a public IP is necessary for GitHub to be able to access the runners. I will look into sorting out the constraint, but the bigger question is why GARM did not show this in its logs when trying to create the actual instance.
You can set GARM itself needs to be accessible by GitHub and the runners, so either a public IP or ngrok will work.
@gabriel-samfira thanks for the above fix. It helped me uncover some other permission issues when creating the instance.
I have allowed public IPs (at least for now), as I suspect I might need some debugging access. But the runners are registered as offline and do not pick up jobs, e.g.:
I have followed the quick start and my pool looks like this:
All runners show up as pending:
And do not show any errors via cli:
I will try SSH-ing into a runner to see if I can peek into the GitHub client logs, but tips would be welcome.
The runners you see in GitHub are offline because GARM uses JIT runners. This means that GARM creates them in GitHub beforehand and saves their credentials. Those credentials are transferred to the instances that become the actual runners. In most cases, the fact that instances never transition from
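For reference, a just-in-time runner is registered through GitHub's REST API before any instance exists, which is why it shows as offline until an instance starts with its configuration. The Python sketch below only builds (and does not send) the repository-level `generate-jitconfig` request. The owner, repository, token, runner name and labels are placeholders, and `runner_group_id=1` assumes the default runner group. GitHub's response contains an `encoded_jit_config` string that the runner on the instance is started with (the actions runner accepts it via a `--jitconfig` option). This is not GARM's code.

```python
import json
import urllib.request
from typing import Iterable

API = "https://api.github.com"


def build_jit_config_request(owner: str, repo: str, token: str, runner_name: str,
                             labels: Iterable[str],
                             runner_group_id: int = 1) -> urllib.request.Request:
    """Build a POST to GitHub's generate-jitconfig endpoint for one ephemeral runner."""
    body = {
        "name": runner_name,
        "runner_group_id": runner_group_id,  # assumed: the default runner group
        "labels": list(labels),
        "work_folder": "_work",
    }
    return urllib.request.Request(
        f"{API}/repos/{owner}/{repo}/actions/runners/generate-jitconfig",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```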
Thanks for your work on GARM. I'm setting GARM up with the intention of using it with GCP compute runners.
The provider documentation is missing information about what permissions (or roles) are necessary for the provider to work.
Also, it currently states that a service account JSON key is needed. Can we use default application credentials or Workload Identity Federation instead?
Our organisation policy disallows creating service account keys.