
Use in-cluster config instead of kubeconfig when running NNI from within a Kubernetes container. #3719

Merged
merged 7 commits into microsoft:master on Jun 4, 2021

Conversation

rmfan
Contributor

@rmfan rmfan commented Jun 2, 2021

Problem this pull request fixes:

Currently, when NNI manages a Kubernetes object (e.g. for an AdaptDL job), it loads the Kubernetes client config from the kubeconfig file via config.fromKubeconfig(). However, inside a Kubernetes container the kubeconfig file generally does not exist, so NNI raises an exception.

Solution:

Before creating the Kubernetes client config, determine whether the code is running inside a Kubernetes container. If it is, load the in-cluster config (via config.getInCluster()); otherwise, load the kubeconfig as before.
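
A minimal sketch of that branch (not the exact diff in this PR), assuming the godaddy kubernetes-client package whose config.getInCluster() and config.fromKubeconfig() helpers are referenced above; the helper name selectKubernetesConfig is purely illustrative:

```typescript
import { config } from 'kubernetes-client';

// Pick the client config based on whether we appear to be inside a Kubernetes pod.
function selectKubernetesConfig(): any {
    // Kubernetes injects KUBERNETES_SERVICE_HOST into every container it starts,
    // so its presence is used as the "running inside a pod" signal (see the notes below).
    if (process.env.KUBERNETES_SERVICE_HOST !== undefined) {
        return config.getInCluster();   // service-account token + CA cert mounted in the pod
    }
    return config.fromKubeconfig();     // fall back to the kubeconfig file as before
}
```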

Some notes on implementation:

  • config.getInCluster() and config.fromKubeconfig() are both deprecated (see this document); a sketch of the non-deprecated equivalent is included after this list.
  • Currently I am using the presence of the environment variable KUBERNETES_SERVICE_HOST to determine whether the code is running inside a Kubernetes container. This is an established way of doing so (at least according to Stack Overflow), but it is a bit hacky and can obviously produce false positives. Other methods include checking for certain files, which have the same issues; as far as I know there isn't a better way to do this.
  • I am unsure if NNI has any test infrastructure that would support testing these code changes properly (notably, the code needs to run from within a Kubernetes cluster).
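
Not part of this PR, just context for the deprecation note above: a sketch of the same decision using the non-deprecated KubeConfig class, assuming a kubernetes-client version that re-exports KubeConfig from @kubernetes/client-node:

```typescript
import { KubeConfig } from 'kubernetes-client';

// Same in-cluster vs. kubeconfig decision, expressed with the newer KubeConfig API.
function loadKubeConfig(): KubeConfig {
    const kubeconfig = new KubeConfig();
    if (process.env.KUBERNETES_SERVICE_HOST !== undefined) {
        kubeconfig.loadFromCluster();   // in-cluster: pod service account + mounted CA
    } else {
        kubeconfig.loadFromDefault();   // $KUBECONFIG or ~/.kube/config
    }
    return kubeconfig;
}
```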

@ghost

ghost commented Jun 2, 2021

CLA assistant check
All CLA requirements met.

@SparkSnail
Contributor

SparkSnail commented Jun 3, 2021

Why will the code run in a Kubernetes container? Do you mean you start the NNI experiment in a container?

@rmfan
Contributor Author

rmfan commented Jun 3, 2021

Why will the code run in a Kubernetes container? Do you mean you start the NNI experiment in a container?

Yeah, this is a somewhat unusual use case. Here, nnictl is being run from within a Kubernetes container.

@ultmaster ultmaster merged commit e82731f into microsoft:master Jun 4, 2021