
Error in KubernetesPodOperator while fetching logs from kube API #21727

Closed
chengzi0103 opened this issue Feb 22, 2022 · 8 comments
Labels
area:core kind:bug This is a clearly a bug provider:cncf-kubernetes Kubernetes provider related issues

Comments

@chengzi0103

chengzi0103 commented Feb 22, 2022

Apache Airflow version

2.2.3 (latest released)

What happened

Airflow logs the error below when running multiple k8s pods at once:

[2022-02-22, 07:36:47 UTC] {pod_manager.py:163} WARNING - Pod not yet started: ivol.78b6e382f2204fe3a8da9081cb468acb
[2022-02-22, 07:46:48 UTC] {pod_manager.py:205} WARNING - Failed to read logs for pod ivol.78b6e382f2204fe3a8da9081cb468acb
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/urllib3/response.py", line 697, in _update_chunk_length
    self.chunk_left = int(line, 16)
ValueError: invalid literal for int() with base 16: b''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/urllib3/response.py", line 438, in _error_catcher
    yield
  File "/usr/local/lib/python3.8/site-packages/urllib3/response.py", line 764, in read_chunked
    self._update_chunk_length()
  File "/usr/local/lib/python3.8/site-packages/urllib3/response.py", line 701, in _update_chunk_length
    raise InvalidChunkLength(self, line)
urllib3.exceptions.InvalidChunkLength: InvalidChunkLength(got length b'', 0 bytes read)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py", line 201, in follow_logs
    for line in logs:
  File "/usr/local/lib/python3.8/site-packages/urllib3/response.py", line 808, in __iter__
    for chunk in self.stream(decode_content=True):
  File "/usr/local/lib/python3.8/site-packages/urllib3/response.py", line 572, in stream
    for line in self.read_chunked(amt, decode_content=decode_content):
  File "/usr/local/lib/python3.8/site-packages/urllib3/response.py", line 793, in read_chunked
    self._original_response.close()
  File "/usr/local/lib/python3.8/contextlib.py", line 131, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/local/lib/python3.8/site-packages/urllib3/response.py", line 455, in _error_catcher
    raise ProtocolError("Connection broken: %r" % e, e)
urllib3.exceptions.ProtocolError: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))
[2022-02-22, 07:46:48 UTC] {pod_manager.py:218} WARNING - Pod ivol.78b6e382f2204fe3a8da9081cb468acb log read interrupted but container base still running
[2022-02-22, 07:52:04 UTC] {pod_manager.py:218} WARNING - Pod ivol.78b6e382f2204fe3a8da9081cb468acb log read interrupted but container base still running
[2022-02-22, 07:52:05 UTC] {pod_manager.py:218} WARNING - Pod ivol.78b6e382f2204fe3a8da9081cb468acb log read interrupted but container base still running
[2022-02-22, 07:52:06 UTC] {kubernetes_pod.py:417} INFO - Deleting pod: ivol.78b6e382f2204fe3a8da9081cb468acb
[2022-02-22, 07:52:07 UTC] {taskinstance.py:1700} ERROR - Task failed with exception

What you expected to happen

How do I fix it?

How to reproduce

I have three machines:
Machine A: runs the Airflow webserver and scheduler
Machines B and C: run Celery workers
All operators run on the k8s cluster through k8s_config

Whenever I run multiple tasks, this error shows up in the task logs, and I don't know how to solve it.

Operating System

Debian GNU/Linux 11

Versions of Apache Airflow Providers

alembic 1.7.5
amqp 5.0.9
anyio 3.5.0
apache-airflow 2.2.3
apache-airflow-providers-celery 2.1.0
apache-airflow-providers-cncf-kubernetes 3.0.2
apache-airflow-providers-docker 2.4.1
apache-airflow-providers-ftp 2.0.1
apache-airflow-providers-http 2.0.3
apache-airflow-providers-imap 2.2.0
apache-airflow-providers-sqlite 2.1.0
apispec 3.3.2
argcomplete 1.12.3
attrs 20.3.0
Babel 2.9.1
billiard 3.6.4.0
bleach 4.1.0
blinker 1.4
cachetools 5.0.0
cattrs 1.6.0
celery 5.2.2
certifi 2021.10.8
cffi 1.15.0
charset-normalizer 2.0.12
click 8.0.4
click-didyoumean 0.3.0
click-plugins 1.1.1
click-repl 0.2.0
clickclick 20.10.2
colorama 0.4.4
colorlog 5.0.1
commonmark 0.9.1
coverage 6.3.2
croniter 1.0.15
cryptography 36.0.1
datacompy 0.7.3
defusedxml 0.7.1
dill 0.3.4
dnspython 2.2.0
docker 5.0.3
docutils 0.16
email-validator 1.1.3
Flask 1.1.2
Flask-AppBuilder 3.4.4
Flask-Babel 2.0.0
Flask-Caching 1.10.1
Flask-JWT-Extended 3.25.1
Flask-Login 0.4.1
Flask-OpenID 1.3.0
Flask-SQLAlchemy 2.5.1
Flask-WTF 0.14.3
flower 1.0.0
gevent 21.12.0
google-auth 2.6.0
graphviz 0.19.1
greenlet 1.1.2
gunicorn 20.1.0
h11 0.12.0
httpcore 0.13.7
httpx 0.19.0
humanize 4.0.0
idna 3.3
importlib-metadata 4.11.1
importlib-resources 5.4.0
inflection 0.5.1
iniconfig 1.1.1
iso8601 1.0.2
itsdangerous 1.1.0
jeepney 0.7.1
Jinja2 3.0.3
jsonschema 3.2.0
keyring 23.5.0
kombu 5.2.3
kubernetes 22.6.0
lazy-object-proxy 1.7.1
lockfile 0.12.2
Mako 1.1.6
Markdown 3.3.6
MarkupSafe 2.1.0
marshmallow 3.14.1
marshmallow-enum 1.5.1
marshmallow-oneofschema 3.0.1
marshmallow-sqlalchemy 0.26.1
numexpr 2.8.1
numpy 1.22.2
oauthlib 3.2.0
openapi-schema-validator 0.2.3
openapi-spec-validator 0.4.0
ordered-set 4.1.0
packaging 21.3
pandas 1.3.5
pendulum 2.1.2
pip 22.0.3
pkginfo 1.8.2
pluggy 1.0.0
prettytable 3.1.1
prison 0.2.1
prometheus-client 0.13.1
prompt-toolkit 3.0.28
psutil 5.9.0
psycopg2-binary 2.9.3
py 1.11.0
pyasn1 0.4.8
pyasn1-modules 0.2.8
pycparser 2.21
Pygments 2.11.2
PyJWT 1.7.1
pyparsing 3.0.7
pyrsistent 0.18.1
pytest 7.0.1
pytest-cov 3.0.0
python-daemon 2.3.0
python-dateutil 2.8.2
python-nvd3 0.15.0
python-slugify 4.0.1
python3-openid 3.2.0
pytz 2021.3
pytzdata 2020.1
PyYAML 6.0
readme-renderer 32.0
redis 3.5.3
requests 2.27.1
requests-oauthlib 1.3.1
requests-toolbelt 0.9.1
rfc3986 1.5.0
rich 11.2.0
rsa 4.8
SecretStorage 3.3.1
semantic-version 2.9.0
setproctitle 1.2.2
setuptools 59.0.1
setuptools-rust 1.1.2
six 1.16.0
sniffio 1.2.0
SQLAlchemy 1.4.31
SQLAlchemy-JSONField 1.0.0
SQLAlchemy-Utils 0.38.2
swagger-ui-bundle 0.0.9
tables 3.6.1
tabulate 0.8.9
tenacity 8.0.1
termcolor 1.1.0
text-unidecode 1.3
tomli 2.0.1
tornado 6.1
tqdm 4.62.3
twine 3.8.0
typing_extensions 4.1.1
unicodecsv 0.14.1
urllib3 1.26.8
vine 5.0.0
wcwidth 0.2.5
webencodings 0.5.1
websocket 0.2.1
websocket-client 1.2.3
Werkzeug 1.0.1
wheel 0.37.0
WTForms 2.3.3
zipp 3.7.0
zope.event 4.5.0
zope.interface 5.4.0

Deployment

Docker-Compose

Deployment details

I have three machines:
Machine A: runs the Airflow webserver and scheduler
Machines B and C: run Celery workers
All operators run on the k8s cluster through k8s_config

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@chengzi0103 chengzi0103 added area:core kind:bug This is a clearly a bug labels Feb 22, 2022
@boring-cyborg

boring-cyborg bot commented Feb 22, 2022

Thanks for opening your first issue here! Be sure to follow the issue template!

@uranusjr
Member

Looks similar to #12136 and #15990.

@chengzi0103
Author

@uranusjr I noticed that @raphaelauv does not have this problem when using apache-airflow-providers-cncf-kubernetes versions greater than 3.0.0, but my version is 3.0.2, so I don't know where the problem is.

@chengzi0103
Author

chengzi0103 commented Feb 22, 2022

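from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import KubernetesPodOperator

# images_name, name_space, config, mount_local_path, remote_path and the
# get_k8s_pod_mount_volume_of_* helpers are defined elsewhere in my DAG file.
download = KubernetesPodOperator(
    task_id='X12',
    is_delete_operator_pod=True,  # delete the pod once the task finishes
    get_logs=True,  # stream the container's stdout into the Airflow task log
    image=images_name,
    namespace=name_space,
    name='download_api',
    cmds=['python cmd'],
    arguments=[f"--account={config.dags['daily_01_download_process']['account']}"],
    in_cluster=False,  # connect through a kube config, not the in-cluster service account
    volumes=[get_k8s_pod_mount_volume_of_host(mount_local_path)],
    volume_mounts=[get_k8s_pod_mount_volume_of_worker(remote_path)],
)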

@potiuk potiuk added the provider:cncf-kubernetes Kubernetes provider related issues label Mar 28, 2022
@potiuk
Member

potiuk commented Mar 28, 2022

@chengzi0103 Please downgrade your kubernetes library to version 11.0.0 (you have kubernetes 22.6.0). You apparently did not use constraints when you installed Airflow and the providers. We are just adding a protection that makes it harder to upgrade to an incompatible version of the kubernetes library, and we have just yanked the 3.0.2 cncf.kubernetes provider for people who did so. Using constraints as described in https://airflow.apache.org/docs/apache-airflow/stable/installation/installing-from-pypi.html is the best way to get a stable Airflow installation (it is the only officially supported way of installing Airflow).
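A minimal Python sketch of the constraints-based install recommended above, assuming the constraint-file URL pattern from the linked installation page (`constraints-<airflow version>/constraints-<python version>.txt`). The helpers `constraints_url` and `pip_install_command` are hypothetical names; the code only builds the pip command and does not run it.

```python
import sys
from typing import List


def constraints_url(airflow_version: str, python_version: str) -> str:
    """Return the constraints-file URL for one Airflow release and Python minor version."""
    return (
        "https://raw.githubusercontent.com/apache/airflow/"
        f"constraints-{airflow_version}/constraints-{python_version}.txt"
    )


def pip_install_command(
    airflow_version: str,
    python_version: str,
    extras: str = "celery,cncf.kubernetes",
) -> List[str]:
    """Build (but do not run) a pip command that installs Airflow under its constraints."""
    return [
        sys.executable, "-m", "pip", "install",
        f"apache-airflow[{extras}]=={airflow_version}",
        # --constraint limits every dependency pip installs (kubernetes included)
        # to the versions listed in the constraints file for this release.
        "--constraint", constraints_url(airflow_version, python_version),
    ]


if __name__ == "__main__":
    py = f"{sys.version_info.major}.{sys.version_info.minor}"
    print(" ".join(pip_install_command("2.2.3", py)))
```

To actually install, the list can be passed to `subprocess.run(cmd, check=True)` on every machine (webserver/scheduler and both Celery workers) so they all resolve the same library versions.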

@potiuk potiuk closed this as completed Mar 28, 2022
@dstandish dstandish changed the title The problem of using the KubernetesPodOperator log to report errors Error in KubernetesPodOperator while fetching logs from kube API Mar 28, 2022
potiuk added a commit to potiuk/airflow that referenced this issue Mar 29, 2022
Kubernetes and Celery are both providers and part of the core.
The dependencies for both are added via "extras" which makes them
"soft" limits and in case of serious dependency bumps this might
end up with a mess (as we experienced with bumping min K8S
library version from 11.0.0 to 22.*, resulting in yanking 4
versions of the `cncf.kubernetes` provider).

After this lesson, we approach K8S and Celery dependencies a bit
differently than any other dependencies.

* For Celery and K8S (and Dask, but this is rather an afterthought)
  we do not strip off the dependencies from the extra (so, for
  example, the [cncf.kubernetes] extra will have dependencies on
  both 'apache-airflow-providers-cncf-kubernetes' and directly on
  the kubernetes library).

* We add upper-bound limits for both Celery and Kubernetes to prevent
  accidental upgrades. Both Celery and the Kubernetes Python library
  follow SemVer, and they are crucial components of Airflow, so they
  both squarely fit our "do not upper-bound" exceptions.

* We also add a rule that whenever a dependency upper-bound limit is
  raised, we should also make sure that additional testing is done
  and an appropriate `apache-airflow` lower-bound limit is added for
  the `apache-airflow-providers-cncf-kubernetes` and
  `apache-airflow-providers-celery` providers.

That should protect our users in all scenarios where they might
unknowingly attempt to upgrade Kubernetes or Celery to an incompatible
version.

Related to: apache#22560, apache#21727
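The upper-bound idea in the commit message can be illustrated with a small, dependency-free check. This is a sketch only: `within_bounds` is a hypothetical helper that handles plain numeric versions (no pre-releases or local tags), and the `>=11.0.0,<12` range used below is illustrative, not the limit Airflow actually chose. Real installers evaluate requirement specifiers with the `packaging` library.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '22.6.0' into (22, 6); trailing zeros are dropped so '12' equals '12.0.0'."""
    parts = [int(part) for part in text.split(".")]
    while parts and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def within_bounds(installed: str, lower: str, upper: str) -> bool:
    """True if lower <= installed < upper, mirroring a '>=lower,<upper' requirement."""
    version = parse_version(installed)
    return parse_version(lower) <= version < parse_version(upper)


# An upper bound would have rejected the kubernetes release from this report.
print(within_bounds("22.6.0", "11.0.0", "12"))  # False
print(within_bounds("11.0.0", "11.0.0", "12"))  # True
```

Without the upper bound, only the `>=` half of the check applies, so 22.6.0 would be accepted, which is the accidental upgrade the commit describes.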
potiuk added a commit to potiuk/airflow that referenced this issue Mar 29, 2022
Kubernetes and Celery are both providers and part of the core.
The dependencies for both are added via "extras" which makes them
"soft" limits and in case of serious dependency bumps this might
end up with a mess (as we experienced with bumping min K8S
library version from 11.0.0 to 22.*, resulting in yanking 4
versions of the `cncf.kubernetes` provider).

After this lesson, we approach K8S and Celery dependencies a bit
differently than any other dependencies.

* For Celery and K8S (and Dask, but this is rather an afterthought)
  we do not strip off the dependencies from the extra (so, for
  example, the [cncf.kubernetes] extra will have dependencies on
  both 'apache-airflow-providers-cncf-kubernetes' and directly on
  the kubernetes library).

* We add upper-bound limits for both Celery and Kubernetes to prevent
  accidental upgrades. Both Celery and the Kubernetes Python library
  follow SemVer, and they are crucial components of Airflow, so they
  both squarely fit our "do not upper-bound" exceptions.

* We also add a rule that whenever a dependency upper-bound limit is
  raised, we should also make sure that additional testing is done
  and an appropriate `apache-airflow` lower-bound limit is added for
  the `apache-airflow-providers-cncf-kubernetes` and
  `apache-airflow-providers-celery` providers.

As part of this change we also had to fix two issues:
* The image was needlessly rebuilt during constraint generation, even
  though we already have the image and we even warn that it should
  be built before we run constraint generation.

* After this change, the currently released, unyanked cncf.kubernetes
  provider cannot be installed with Airflow, because it has
  conflicting requirements for the kubernetes library (the provider has
  <11 and Airflow has > 22.7). Therefore, during constraint
  generation where providers are installed from PyPI, we
  explicitly install the yanked 3.1.2 version. This should be
  removed after we release the next K8S provider version.

That should protect our users in all scenarios where they might
unknowingly attempt to upgrade Kubernetes or Celery to an incompatible
version.

Related to: apache#22560, apache#21727
@dstandish
Contributor

Hi @chengzi0103, it appears that your task likely failed for reasons unrelated to the traceback shown.

Sometimes the log read is interrupted due to a connection issue. In that case we catch the error and resume logging, and that's what that traceback is about. Note that it is only a warning, that the logs later resume, and that the task doesn't fail for roughly another five minutes.
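The catch-and-resume behaviour described above can be sketched in plain Python. This is a simplified illustration, not the provider's actual `follow_logs` code: `open_stream` and `container_running` are hypothetical callables, each reopened stream is assumed to continue where the previous one stopped (the real code must avoid re-reading old lines), and `ConnectionError` stands in for the urllib3 `ProtocolError` shown in the traceback.

```python
from typing import Callable, Iterable, List, Tuple, Type


def follow_logs(
    open_stream: Callable[[], Iterable[str]],
    container_running: Callable[[], bool],
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
    max_reconnects: int = 10,
) -> List[str]:
    """Collect log lines, reopening the stream when it breaks while the container still runs."""
    lines: List[str] = []
    reconnects = 0
    while True:
        try:
            for line in open_stream():
                lines.append(line)
            return lines  # stream ended normally: the container has finished
        except retry_on:
            # A broken connection is not a task failure: reconnect if the container
            # is still running, otherwise stop with what has been read so far.
            if not container_running() or reconnects >= max_reconnects:
                return lines
            reconnects += 1
```

For example, a stream that breaks once after two lines and then yields a third returns all three lines, mirroring the "log read interrupted but container base still running" warnings in the report.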

Your issue report inspired us to move that traceback to the DEBUG level in #22595, so as not to cause false alarms or confusion.
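A minimal sketch of the logging idea described above, using only the standard `logging` module: the interruption is reported as a one-line warning, and the full traceback is emitted only at DEBUG level. The function name `report_interrupted_read`, the logger name, and the messages are hypothetical; #22595 is the authoritative record of what was changed.

```python
import logging

log = logging.getLogger("pod_log_sketch")


def report_interrupted_read(pod_name: str, error: BaseException) -> None:
    """Log a broken log stream without alarming users with a traceback."""
    # Visible at the default WARNING level: short and without a stack trace.
    log.warning("Log read for pod %s was interrupted; reconnecting if it is still running", pod_name)
    # The traceback is attached only to a DEBUG record, so it appears only when DEBUG is enabled.
    log.debug("Details of the interrupted log read", exc_info=error)


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    try:
        raise ConnectionError("Connection broken: InvalidChunkLength")
    except ConnectionError as err:
        report_interrupted_read("example-pod", err)  # emits only the warning line
```

Passing the exception instance as `exc_info` lets the traceback be kept for debugging without printing it at the default level.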

@raphaelauv
Copy link
Contributor

raphaelauv commented Mar 29, 2022

@chengzi0103

Yes, my setting is:

 kubernetes==11.0.0
 apache-airflow-providers-cncf-kubernetes==3.0.2
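To confirm which versions are actually installed (the report lists kubernetes 22.6.0, while the setting above uses 11.0.0), a small standard-library check can compare installed distributions against expected pins. `find_mismatches` is a hypothetical helper and the pins simply repeat the setting above; it relies on `importlib.metadata`, available since Python 3.8.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, List, Optional, Tuple


def find_mismatches(pins: Dict[str, str]) -> List[Tuple[str, str, Optional[str]]]:
    """Return (package, expected, installed) for every pin that is not matched exactly."""
    mismatches = []
    for package, expected in pins.items():
        try:
            installed: Optional[str] = version(package)
        except PackageNotFoundError:
            installed = None  # the distribution is not installed at all
        if installed != expected:
            mismatches.append((package, expected, installed))
    return mismatches


if __name__ == "__main__":
    pins = {
        "kubernetes": "11.0.0",
        "apache-airflow-providers-cncf-kubernetes": "3.0.2",
    }
    for package, expected, installed in find_mismatches(pins):
        print(f"{package}: expected {expected}, found {installed}")
```

In the reporter's setup it would need to run on machines B and C, since the Celery workers are where KubernetesPodOperator executes and talks to the kube API.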

potiuk added a commit that referenced this issue Mar 29, 2022
@chengzi0103
Author

> @chengzi0103
>
> Yes my setting is
>
>  kubernetes==11.0.0
>  apache-airflow-providers-cncf-kubernetes==3.0.2

Thank you for your answers. I will try using larger clusters and more resources, since the problem then occurs less often, and I will follow your suggestions. Thanks again!
