
KubernetesExecutor does not work with exec authentication on kubernetes client >= 22.6.0 #22560

Closed
2 tasks done
dszakallas opened this issue Mar 27, 2022 · 7 comments
Labels
area:core · kind:bug · provider:cncf-kubernetes

Comments


dszakallas commented Mar 27, 2022

Apache Airflow version

2.2.4 (latest released)

What happened

When using KubernetesExecutor with in_cluster = False together with a kubeconfig that defines an exec user via kubelogin, authentication does not take place:

hack-scheduler-1  |   File "/airflow/lib/python3.8/site-packages/airflow/executors/kubernetes_executor.py", line 747, in _adopt_completed_pods
hack-scheduler-1  |     pod_list = kube_client.list_namespaced_pod(namespace=self.kube_config.kube_namespace, **kwargs)
hack-scheduler-1  |   File "/airflow/lib/python3.8/site-packages/kubernetes/client/api/core_v1_api.py", line 15697, in list_namespaced_pod
hack-scheduler-1  |     return self.list_namespaced_pod_with_http_info(namespace, **kwargs)  # noqa: E501
hack-scheduler-1  |   File "/airflow/lib/python3.8/site-packages/kubernetes/client/api/core_v1_api.py", line 15812, in list_namespaced_pod_with_http_info
hack-scheduler-1  |     return self.api_client.call_api(
hack-scheduler-1  |   File "/airflow/lib/python3.8/site-packages/kubernetes/client/api_client.py", line 348, in call_api
hack-scheduler-1  |     return self.__call_api(resource_path, method,
hack-scheduler-1  |   File "/airflow/lib/python3.8/site-packages/kubernetes/client/api_client.py", line 180, in __call_api
hack-scheduler-1  |     response_data = self.request(
hack-scheduler-1  |   File "/airflow/lib/python3.8/site-packages/kubernetes/client/api_client.py", line 373, in request
hack-scheduler-1  |     return self.rest_client.GET(url,
hack-scheduler-1  |   File "/airflow/lib/python3.8/site-packages/kubernetes/client/rest.py", line 240, in GET
hack-scheduler-1  |     return self.request("GET", url,
hack-scheduler-1  |   File "/airflow/lib/python3.8/site-packages/kubernetes/client/rest.py", line 234, in request
hack-scheduler-1  |     raise ApiException(http_resp=r)
hack-scheduler-1  | kubernetes.client.exceptions.ApiException: (401)
hack-scheduler-1  | Reason: Unauthorized
hack-scheduler-1  | HTTP response headers: HTTPHeaderDict({'Audit-Id': '9d09a92f-d294-4a82-9aac-bbafe9573469', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'Date': 'Sun, 27 Mar 2022 21:16:39 GMT', 'Content-Length': '129'})

I managed to trace the source of the error back to RefreshConfiguration and to create a workaround. The configuration is loaded here:

cfg = RefreshConfiguration()
load_kube_config(client_configuration=cfg, config_file=config_file, context=cluster_context)

Bypassing the RefreshConfiguration by changing the above two lines to:

config.load_kube_config(context=cluster_context, config_file=config_file)
cfg = None

resolves the problem. I am still debugging what exactly the problem is with RefreshConfiguration and kubelogin.

What you think should happen instead

Authentication should work without a problem.

How to reproduce

It's hard to reproduce given the specificity of the problem.

  1. Create a service principal and assign permissions to be able to create resources on the AKS cluster.
  2. Install kubelogin.
  3. Create a kubeconfig that uses the kubelogin exec authentication flow with service principal authentication, with the correct values filled in. See docs.
  4. Confirm it works by running
    from kubernetes import client, config
    config.load_kube_config()
    print(client.CoreV1Api().list_namespaced_pod('default'))
  5. Try out with Airflow and get lots of 401 errors.
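For step 3, the exec stanza in the kubeconfig looks roughly like this. This is a hedged sketch based on the kubelogin service-principal documentation, not the reporter's actual file; all angle-bracket values are placeholders:

```yaml
users:
  - name: service-principal
    user:
      exec:
        apiVersion: client.authentication.k8s.io/v1beta1
        command: kubelogin
        args:
          - get-token
          - --login
          - spn
          - --environment
          - AzurePublicCloud
          - --server-id
          - <AAD server application ID>
          - --client-id
          - <service principal client ID>
          - --client-secret
          - <service principal secret>
          - --tenant-id
          - <tenant ID>
```

Any client (kubectl or the kubernetes Python client) that loads this kubeconfig will shell out to kubelogin to obtain a token, which is exactly the code path that fails in the scheduler.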

Operating System

Debian

Versions of Apache Airflow Providers

No response

Deployment

Docker-Compose

Deployment details

Proof of concept deployment with Docker compose for local development purposes using KubernetesExecutor to schedule worker pods in an AKS cluster.

Anything else

This issue happens every time.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

dszakallas added the area:core and kind:bug labels on Mar 27, 2022

boring-cyborg bot commented Mar 27, 2022

Thanks for opening your first issue here! Be sure to follow the issue template!


dszakallas commented Mar 27, 2022

I did a bit more debugging; it seems the error is not specific to kubelogin but a much more general problem. I am renaming the ticket, as it has little to do with kubelogin and Azure.

I found this in the logs:

{{refresh_config.py:73}} ERROR - __init__() missing 1 required positional argument: 'cwd'

I am using the following version of kubernetes client:

kubernetes                               23.3.0

Here's the relevant diff in the upstream:

kubernetes-client/python@1c5bf58

So it seems this API broke in v22.6.0.

Changing

status = ExecProvider(self._user['exec']).run()

to

status = ExecProvider(self._user['exec'], os.getcwd()).run()

resolves the issue.
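The signature change above can also be bridged with a small compatibility shim that works on either side of v22.6.0. This is a hypothetical helper, not Airflow's actual fix; `exec_provider_cls` stands in for `kubernetes.config.exec_provider.ExecProvider` so the sketch stays self-contained:

```python
import inspect
import os


def make_exec_provider(exec_provider_cls, exec_config):
    """Instantiate an ExecProvider across kubernetes client versions.

    kubernetes >= 22.6.0 (kubernetes-client/python@1c5bf58) added a
    required positional ``cwd`` argument; earlier releases accept only
    the exec config. Inspecting the signature avoids pinning either way.
    """
    params = inspect.signature(exec_provider_cls.__init__).parameters
    if 'cwd' in params:
        # New API: pass the current working directory explicitly.
        return exec_provider_cls(exec_config, os.getcwd())
    # Old API: only the exec config is accepted.
    return exec_provider_cls(exec_config)
```

Detecting the parameter rather than comparing version strings keeps the shim working even if the argument later becomes keyword-only.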

@dszakallas dszakallas changed the title KubernetesExecutor does not work with kubelogin exec authentication KubernetesExecutor does not work with exec authentication on kubernetes >= 22.6.0 Mar 27, 2022
@dszakallas dszakallas changed the title KubernetesExecutor does not work with exec authentication on kubernetes >= 22.6.0 KubernetesExecutor does not work with exec authentication on kubernetes client >= 22.6.0 Mar 27, 2022
Contributor

eladkal commented Mar 28, 2022

@dszakallas can you open PR with the suggested fix?

Member

potiuk commented Mar 28, 2022

That might be another one (@ephraimbuddy @dstandish) that might make 2.2.5 need an RC2.
There is a huge bump of the K8S client version from 2.2.4 (11.0.0 -> 23.3.0), so I think we should be extra careful and possibly do some more testing to prevent regressions there; otherwise we might need to release 2.2.6 really quickly.

potiuk added the provider:cncf-kubernetes label on Mar 28, 2022
@jedcunningham
Member

@dszakallas, not sure what version of the kubernetes provider you have, but try with this combo:

kubernetes==11.0.0
apache-airflow-providers-cncf-kubernetes==3.1.1

Basically, 2.2.4 only works with 11.0.0, and provider 3.1.2 only works with k8s >=21.7.0. The "common ground" is the above versions.

@jedcunningham
Member

I'll also add: you should be using constraints, which get you a set of versions that are all tested together:
https://github.com/apache/airflow/blob/constraints-2.2.4/constraints-3.7.txt
https://airflow.apache.org/docs/apache-airflow/stable/installation/installing-from-pypi.html#constraints-files

That would have gotten you 11.0.0 and 3.0.2 which would work together as well.
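The constraints mechanism above boils down to handing pip the published constraints file for your Airflow and Python versions. A sketch of the install recipe, assuming Python 3.7 (swap the `3.7` suffix for your interpreter version):

```shell
# Reinstall Airflow 2.2.4 pinned to the dependency set it was tested
# against; among everything else, this pins kubernetes==11.0.0 and a
# compatible cncf.kubernetes provider.
pip install "apache-airflow==2.2.4" \
  --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.2.4/constraints-3.7.txt"
```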

@dstandish
Contributor

Closing for the following reasons:

This issue should be resolved by using the constraints file, as suggested by @jedcunningham.

And as of Airflow 2.3.0, we remove the "refresh configuration" code that is the problem here anyway.

potiuk added a commit to potiuk/airflow that referenced this issue Mar 29, 2022
Kubernetes and Celery are both providers and part of the core.
The dependencies for both are added via "extras", which makes them
"soft" limits, and in case of serious dependency bumps this might
end up with a mess (as we experienced when bumping the min K8S
library version from 11.0.0 to 22.*, resulting in yanking 4
versions of the `cncf.kubernetes` provider).

After this learning, we approach K8S and Celery dependencies a bit
differently than any other dependencies.

* for Celery and K8S (and Dask, but this is rather an afterthought)
  we do not strip off the dependencies from the extra (so for
  example the [cncf.kubernetes] extra will have dependencies on
  both `apache-airflow-providers-cncf-kubernetes` as well as
  directly on the kubernetes library)

* We add upper-bound limits for both Celery and Kubernetes to prevent
  accidental upgrades. Both the Celery and Kubernetes Python libraries
  follow SemVer, and they are crucial components of Airflow, so they
  both squarely fit our "do not upper-bound" exceptions.

* We also add a rule that whenever dependency upper-bound limit is
  raised, we should also make sure that additional testing is done
  and appropriate `apache-airflow` lower-bound limit is added for
  the `apache-airflow-providers-cncf-kubernetes` and
  `apache-airflow-providers-celery` providers.

That should protect our users in all scenarios where they might
unknowingly attempt to upgrade Kubernetes or Celery to incompatible
version.

Related to: apache#22560, apache#21727
potiuk added a commit to potiuk/airflow that referenced this issue Mar 29, 2022
Kubernetes and Celery are both providers and part of the core.
The dependencies for both are added via "extras", which makes them
"soft" limits, and in case of serious dependency bumps this might
end up with a mess (as we experienced when bumping the min K8S
library version from 11.0.0 to 22.*, resulting in yanking 4
versions of the `cncf.kubernetes` provider).

After this learning, we approach K8S and Celery dependencies a bit
differently than any other dependencies.

* for Celery and K8S (and Dask, but this is rather an afterthought)
  we do not strip off the dependencies from the extra (so for
  example the [cncf.kubernetes] extra will have dependencies on
  both `apache-airflow-providers-cncf-kubernetes` as well as
  directly on the kubernetes library)

* We add upper-bound limits for both Celery and Kubernetes to prevent
  accidental upgrades. Both the Celery and Kubernetes Python libraries
  follow SemVer, and they are crucial components of Airflow, so they
  both squarely fit our "do not upper-bound" exceptions.

* We also add a rule that whenever dependency upper-bound limit is
  raised, we should also make sure that additional testing is done
  and appropriate `apache-airflow` lower-bound limit is added for
  the `apache-airflow-providers-cncf-kubernetes` and
  `apache-airflow-providers-celery` providers.

As part of this change we also had to fix two issues:
* the image was needlessly rebuilt during constraint generation, as
  we already have the image and we even warn that it should
  be built before we run constraint generation

* after this change, the currently released, unyanked cncf.kubernetes
  provider cannot be installed with airflow, because it has
  conflicting requirements for the kubernetes library (the provider
  has <11 and airflow has >22.7). Therefore, during constraint
  generation with PyPI providers, we explicitly install the yanked
  3.1.2 version. This should be removed after we release the next
  K8S provider version.

That should protect our users in all scenarios where they might
unknowingly attempt to upgrade Kubernetes or Celery to incompatible
version.

Related to: apache#22560, apache#21727
potiuk added a commit that referenced this issue Mar 29, 2022