
Authentication and service account plan for Pipeline + Kubeflow #374

Closed
IronPan opened this issue Nov 26, 2018 · 5 comments

Comments

@IronPan
Member

IronPan commented Nov 26, 2018

Currently, in the Pipeline instructions, we deploy the cluster with the cloud-platform scope. Without any configuration, the pipeline and any derived Argo jobs will run under the default service account (the default Compute Engine service account). This unrestricted setup was acceptable in the past for development and testing purposes, but as the project goes public, we should move the service account setup to a managed approach.

Kubeflow uses different service accounts for different roles:

  • kf admin: manages the Kubeflow cluster, e.g. networking and the kf deployment config.
  • kf user: has access to various GCP APIs such as GCS and BigQuery, for various ML jobs.

For ML Pipelines, the Argo job, as well as any derived workload such as a TF-Job, should ideally run under the kf-user service account in order to access GCP APIs. To achieve this, here are the TODO items needed across various parts of the pipeline.

  1. The Pipeline System needs to mount the kf-user service account key into each pod of the Argo job and set the GOOGLE_APPLICATION_CREDENTIALS environment variable. Kubeflow stores the service account key as a K8s secret in the cluster, e.g.:
```yaml
apiVersion: v1
kind: Pod
metadata:
  generateName: gcloud
  namespace: kubeflow
spec:
  containers:
  - image: google/cloud-sdk
    name: gcloud
    command: [sleep]
    args: ["10000000"]
    env:
    - name: GOOGLE_APPLICATION_CREDENTIALS
      value: "/etc/secrets/user-gcp-sa.json"
    volumeMounts:
    - name: sa
      mountPath: "/etc/secrets"
      readOnly: true
  volumes:
  - name: sa
    secret:
      secretName: user-gcp-sa
```
  2. The service account needs to be activated inside the pod using gcloud auth activate-service-account before any GCP API call. Please see here for an example. (A DSL sketch covering items 1 and 2 follows this list.)
  3. For the tf-job samples, the scheduler needs to do the same and mount the service account into the TF-Job; otherwise the TF-Job can't write output to GCS. E.g. here
  4. For other places where we create pods that read data from GCS, we need to do the same, e.g. TensorBoard.
  5. The Kaniko build, which is used by the notebook to build images, also needs to be part of this effort, since it touches the container registry.
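
For reference, here is a minimal sketch of how a pipeline step could do items 1 and 2 through the KFP Python DSL, assuming an SDK version where ContainerOp exposes add_volume / add_volume_mount / add_env_variable directly; the step name, image command, and bucket are only illustrative.

```python
from kubernetes import client as k8s_client
import kfp.dsl as dsl


@dsl.pipeline(name='gcp-auth-example', description='Illustrative only')
def gcp_auth_pipeline():
    # A step that needs GCP access; the command and bucket are placeholders.
    op = dsl.ContainerOp(
        name='copy-to-gcs',
        image='google/cloud-sdk',
        command=['sh', '-c'],
        arguments=[
            # Activate the mounted key before calling any GCP API (item 2).
            'gcloud auth activate-service-account '
            '--key-file=$GOOGLE_APPLICATION_CREDENTIALS && '
            'gsutil cp /tmp/out.txt gs://my-bucket/out.txt'
        ],
    )
    # Mount the user-gcp-sa secret and point GOOGLE_APPLICATION_CREDENTIALS
    # at the key file, mirroring the pod spec above (item 1).
    op.add_volume(k8s_client.V1Volume(
        name='sa',
        secret=k8s_client.V1SecretVolumeSource(secret_name='user-gcp-sa')))
    op.add_volume_mount(k8s_client.V1VolumeMount(
        name='sa', mount_path='/etc/secrets', read_only=True))
    op.add_env_variable(k8s_client.V1EnvVar(
        name='GOOGLE_APPLICATION_CREDENTIALS',
        value='/etc/secrets/user-gcp-sa.json'))
```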
@IronPan
Member Author

IronPan commented Nov 26, 2018

Update:
TensorBoard fixed #273
Kaniko fixed #343
DSL supports volume and env APIs now #300

TODO:
Update the currently released samples to use the strongly typed GCP op #314
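
For context, a minimal sketch of what the strongly typed approach looks like with the SDK's GCP helper, assuming kfp.gcp.use_gcp_secret is available in the installed kfp version and that a secret named user-gcp-sa exists in the cluster; the image, command, and bucket are hypothetical:

```python
import kfp.dsl as dsl
import kfp.gcp as gcp


@dsl.pipeline(name='sample-with-gcp-secret', description='Illustrative only')
def sample_pipeline():
    # Any ContainerOp that talks to GCS/BQ; image and arguments are placeholders.
    train = dsl.ContainerOp(
        name='train',
        image='gcr.io/my-project/trainer:latest',   # hypothetical image
        command=['python', '/app/train.py'],
        arguments=['--output', 'gs://my-bucket/model'],  # hypothetical bucket
    )
    # Mounts the user-gcp-sa secret and sets GOOGLE_APPLICATION_CREDENTIALS
    # so the step runs under the kf-user service account, replacing the
    # manual volume/env wiring shown in the first comment.
    train.apply(gcp.use_gcp_secret('user-gcp-sa'))
```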

@jlewi
Contributor

jlewi commented Dec 3, 2018

@IronPan what is the remaining work here?

@IronPan
Member Author

IronPan commented Dec 3, 2018

@Ark-kun How is the work going on migrating the samples to use the GCP credential?

@jlewi
Contributor

jlewi commented Dec 17, 2018

@Ark-kun @IronPan Any update on this issue?

@IronPan
Member Author

IronPan commented Dec 31, 2018

The samples are now updated to use the right permissions.

@IronPan IronPan closed this as completed Dec 31, 2018
HumairAK pushed a commit to red-hat-data-services/data-science-pipelines that referenced this issue Mar 11, 2024