
Use in-cluster config instead of kubeconfig when running NNI from within a Kubernetes container. #3719

Merged
merged 7 commits into microsoft:master on Jun 4, 2021

Conversation

rmfan
Contributor

@rmfan rmfan commented Jun 2, 2021

Problem this pull request fixes:

Currently, when NNI manages a Kubernetes object (e.g. for an AdaptDL job), it loads the Kubernetes client config from the kubeconfig file via config.fromKubeconfig(). However, inside a Kubernetes container the kubeconfig file generally does not exist, so NNI raises an exception.

Solution:

Before creating the Kubernetes client config, determine whether the code is running inside a Kubernetes container. If it is, load the in-cluster config (via config.getInCluster()); otherwise, load the kubeconfig as before.
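
A minimal sketch of that branch (not the exact diff in this PR), assuming the godaddy kubernetes-client package whose config.getInCluster() and config.fromKubeconfig() helpers are referenced above; the helper name selectKubernetesConfig is purely illustrative:

```typescript
import { config } from 'kubernetes-client';

// Pick the client config based on whether we appear to be inside a Kubernetes pod.
function selectKubernetesConfig(): any {
    // Kubernetes injects KUBERNETES_SERVICE_HOST into every container it starts,
    // so its presence is used as the "running inside a pod" signal (see the notes below).
    if (process.env.KUBERNETES_SERVICE_HOST !== undefined) {
        return config.getInCluster();   // service-account token + CA cert mounted in the pod
    }
    return config.fromKubeconfig();     // fall back to the kubeconfig file as before
}
```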

Some notes on implementation:

  • config.getInCluster() and config.fromKubeconfig() are both deprecated (see this document); a sketch of the non-deprecated equivalent is included after this list.
  • Currently I am using the presence of the environment variable KUBERNETES_SERVICE_HOST to determine whether the code is running inside a Kubernetes container. This is an established way of doing so (at least according to Stack Overflow), but it is a bit hacky and can obviously produce false positives. Other methods include checking for certain files, which have the same issues; as far as I know there isn't a better way to do this.
  • I am unsure if NNI has any test infrastructure that would support testing these code changes properly (notably, the code needs to run from within a Kubernetes cluster).
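
Not part of this PR, just context for the deprecation note above: a sketch of the same decision using the non-deprecated KubeConfig class, assuming a kubernetes-client version that re-exports KubeConfig from @kubernetes/client-node:

```typescript
import { KubeConfig } from 'kubernetes-client';

// Same in-cluster vs. kubeconfig decision, expressed with the newer KubeConfig API.
function loadKubeConfig(): KubeConfig {
    const kubeconfig = new KubeConfig();
    if (process.env.KUBERNETES_SERVICE_HOST !== undefined) {
        kubeconfig.loadFromCluster();   // in-cluster: pod service account + mounted CA
    } else {
        kubeconfig.loadFromDefault();   // $KUBECONFIG or ~/.kube/config
    }
    return kubeconfig;
}
```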

@ghost

ghost commented Jun 2, 2021

CLA assistant check
All CLA requirements met.

@SparkSnail
Contributor

SparkSnail commented Jun 3, 2021

Why will the code run in a Kubernetes container? Do you mean you start the NNI experiment in a container?

@rmfan
Contributor Author

rmfan commented Jun 3, 2021

Why will the code run in a Kubernetes container? Do you mean you start the NNI experiment in a container?

Yeah, this is a somewhat unusual use case. Here, nnictl is being run from within a Kubernetes container.

@ultmaster ultmaster merged commit e82731f into microsoft:master Jun 4, 2021