InvalidChunkLength(got length b'', 0 bytes read)) issue in airflow scheduler which is using kubernetes executor #41753

Open
2 tasks done
shubham-vonage opened this issue Aug 26, 2024 · 3 comments
Labels: area:providers, kind:bug, needs-triage, provider:cncf-kubernetes

Comments

@shubham-vonage

Apache Airflow Provider(s)

cncf-kubernetes

Versions of Apache Airflow Providers

apache-airflow-providers-cncf-kubernetes==8.0.1
kubernetes==29.0.0
kubernetes_asyncio==29.0.0

kubernetes version:
Client Version: v1.28.0
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.28.9

Apache Airflow version

2.8.3

Operating System

Linux

Deployment

Official Apache Airflow Helm Chart

Deployment details

No response

What happened

I deployed Airflow in my cluster using the official Helm chart. Everything seems to work fine, but when the scheduler is idle it logs this error:

ERROR - Unknown error in KubernetesJobWatcher. Failing
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/urllib3/response.py", line 761, in _update_chunk_length
    self.chunk_left = int(line, 16)
ValueError: invalid literal for int() with base 16: b''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/urllib3/response.py", line 444, in _error_catcher
    yield
  File "/usr/local/lib/python3.9/site-packages/urllib3/response.py", line 828, in read_chunked
    self._update_chunk_length()
  File "/usr/local/lib/python3.9/site-packages/urllib3/response.py", line 765, in _update_chunk_length
    raise InvalidChunkLength(self, line)
urllib3.exceptions.InvalidChunkLength: InvalidChunkLength(got length b'', 0 bytes read)

During handling of the above exception, another exception occurred:


Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/airflow/providers/cncf/kubernetes/executors/kubernetes_executor_utils.py", line 112, in run
    self.resource_version = self._run(
  File "/usr/local/lib/python3.9/site-packages/airflow/providers/cncf/kubernetes/executors/kubernetes_executor_utils.py", line 168, in _run
    for event in self._pod_events(kube_client=kube_client, query_kwargs=kwargs):
  File "/usr/local/lib/python3.9/site-packages/kubernetes/watch/watch.py", line 178, in stream
    for line in iter_resp_lines(resp):
  File "/usr/local/lib/python3.9/site-packages/kubernetes/watch/watch.py", line 56, in iter_resp_lines
    for segment in resp.stream(amt=None, decode_content=False):
  File "/usr/local/lib/python3.9/site-packages/urllib3/response.py", line 624, in stream
    for line in self.read_chunked(amt, decode_content=decode_content):
  File "/usr/local/lib/python3.9/site-packages/urllib3/response.py", line 857, in read_chunked
    self._original_response.close()
  File "/usr/local/lib/python3.9/contextlib.py", line 137, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/usr/local/lib/python3.9/site-packages/urllib3/response.py", line 461, in _error_catcher
    raise ProtocolError("Connection broken: %r" % e, e)
urllib3.exceptions.ProtocolError: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))
Process KubernetesJobWatcher-7:
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/urllib3/response.py", line 761, in _update_chunk_length
    self.chunk_left = int(line, 16)
ValueError: invalid literal for int() with base 16: b''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/urllib3/response.py", line 444, in _error_catcher
    yield
  File "/usr/local/lib/python3.9/site-packages/urllib3/response.py", line 828, in read_chunked
    self._update_chunk_length()
  File "/usr/local/lib/python3.9/site-packages/urllib3/response.py", line 765, in _update_chunk_length
    raise InvalidChunkLength(self, line)
urllib3.exceptions.InvalidChunkLength: InvalidChunkLength(got length b'', 0 bytes read)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/local/lib/python3.9/site-packages/airflow/providers/cncf/kubernetes/executors/kubernetes_executor_utils.py", line 112, in run
    self.resource_version = self._run(
  File "/usr/local/lib/python3.9/site-packages/airflow/providers/cncf/kubernetes/executors/kubernetes_executor_utils.py", line 168, in _run
    for event in self._pod_events(kube_client=kube_client, query_kwargs=kwargs):
  File "/usr/local/lib/python3.9/site-packages/kubernetes/watch/watch.py", line 178, in stream
    for line in iter_resp_lines(resp):
  File "/usr/local/lib/python3.9/site-packages/kubernetes/watch/watch.py", line 56, in iter_resp_lines
    for segment in resp.stream(amt=None, decode_content=False):
  File "/usr/local/lib/python3.9/site-packages/urllib3/response.py", line 624, in stream
    for line in self.read_chunked(amt, decode_content=decode_content):
  File "/usr/local/lib/python3.9/site-packages/urllib3/response.py", line 857, in read_chunked
    self._original_response.close()
  File "/usr/local/lib/python3.9/contextlib.py", line 137, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/usr/local/lib/python3.9/site-packages/urllib3/response.py", line 461, in _error_catcher
    raise ProtocolError("Connection broken: %r" % e, e)
urllib3.exceptions.ProtocolError: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))
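
For context on the first exception in the chain, here is a minimal Python sketch (not urllib3's actual code) of why an empty chunk-size line fails to parse. The helper name `parse_chunk_length` is hypothetical; only the `int(line, 16)` call is taken from the traceback above. An empty line at this point usually indicates that the connection ended before the next chunk header arrived.

```python
# Illustration only: mimics the chunk-size parsing step seen in the traceback.
# `parse_chunk_length` is a hypothetical helper, not part of urllib3.
def parse_chunk_length(line: bytes) -> int:
    """Parse the hexadecimal chunk-size line of an HTTP chunked response."""
    # Drop any chunk extension after ';' and surrounding whitespace/CRLF.
    size_field = line.split(b";", 1)[0].strip()
    return int(size_field, 16)


# A normal chunk header: hexadecimal 1a means 26 bytes of data follow.
print(parse_chunk_length(b"1a\r\n"))

try:
    # An empty read, for example when the peer closes the connection
    # between chunks, leaves nothing to parse.
    parse_chunk_length(b"")
except ValueError as exc:
    print(f"ValueError: {exc}")
```

The `ValueError` raised for the empty input is the same kind of failure that urllib3 wraps in `InvalidChunkLength` and then `ProtocolError` in the log above.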

The scheduler works normally, but this error is logged continuously. The same chart does not produce the error in another cluster.
What might be the issue?

P.S.: I have read issue #33066 but did not find a conclusion there.

What you think should happen instead

The scheduler should not log any errors. The KubernetesJobWatcher seems to be the component at the center of this issue.

How to reproduce

Use the official Helm chart 1.13.1 with these values:

airflowVersion: 2.8.3

# Ingress configuration
ingress:
  enabled: false
  # Enable web ingress resource
  web:
    enabled: True
    # Annotations for the web Ingress

config:
  core:
    parallelism: 32
    max_active_tasks_per_dag: 16
    max_active_runs_per_dag: 16
    dagbag_import_timeout: 100
    dag_file_processor_timeout: 50
    min_serialized_dag_update_interval: 60
    min_serialized_dag_fetch_interval: 30

pgbouncer:
  # Enable PgBouncer
  enabled: true

scheduler:
  replicas: 1
  resources:
    limits:
      cpu: 1500m
      memory: 1500Mi
    requests:
      cpu: 1000m
      memory: 1200Mi

  livenessProbe:
    initialDelaySeconds: 120
    timeoutSeconds: 30
    failureThreshold: 5
    periodSeconds: 300
    command: ~

webserver:
  replicas: 1
  resources:
    limits:
      cpu: 1000m
      memory: 1500Mi
    requests:
      cpu: 500m
      memory: 1200Mi

  livenessProbe:
    initialDelaySeconds: 15
    timeoutSeconds: 30
    failureThreshold: 20
    periodSeconds: 5

  readinessProbe:
    initialDelaySeconds: 15
    timeoutSeconds: 30
    failureThreshold: 20
    periodSeconds: 5

  webserverConfig: |
    from flask_appbuilder.security.manager import AUTH_DB
    # use embedded DB for auth
    AUTH_TYPE = AUTH_DB
    

dags:
  persistence:
    enabled: true
    size: 10Gi
    accessMode: ReadWriteMany

logs:
  persistence:
    enabled: true
    size: 10Gi

postgresql:
  enabled: True

Anything else

This occurs continuously whenever the Airflow scheduler is idle (that is, not running any DAGs).
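
As a rough illustration of how a watch loop could tolerate idle disconnects like this, here is a minimal, self-contained Python sketch. It is not Airflow's or the Kubernetes client's implementation: `ConnectionBroken`, `watch_with_resume`, `fake_stream`, and the event shape are all hypothetical stand-ins, and the sketch assumes each event carries a resource version from which the stream can be reopened.

```python
import time
from typing import Callable, Iterator, Optional


class ConnectionBroken(Exception):
    """Hypothetical stand-in for urllib3.exceptions.ProtocolError."""


def watch_with_resume(
    open_stream: Callable[[Optional[str]], Iterator[dict]],
    max_reconnects: int = 3,
    backoff_seconds: float = 0.0,
) -> Iterator[dict]:
    """Yield events, reopening the stream from the last seen resource version."""
    resource_version: Optional[str] = None
    reconnects = 0
    while True:
        try:
            for event in open_stream(resource_version):
                # Remember where we are so a reconnect can resume from here.
                resource_version = event["resource_version"]
                yield event
            return  # the stream ended cleanly
        except ConnectionBroken:
            reconnects += 1
            if reconnects > max_reconnects:
                raise  # give up and surface the error to the caller
            time.sleep(backoff_seconds)


def fake_stream(resource_version: Optional[str]) -> Iterator[dict]:
    """Hypothetical event source that drops the first connection once."""
    start = int(resource_version or 0) + 1
    for n in range(start, 5):
        if n == 3 and resource_version is None:
            raise ConnectionBroken("idle connection closed")
        yield {"resource_version": str(n)}


events = [e["resource_version"] for e in watch_with_resume(fake_stream)]
print(events)
```

In this sketch the first connection breaks after two events and the loop resumes from the last seen version, so no event is lost or repeated. Whether the real watcher can resume this way depends on the API server still having that resource version available.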

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@shubham-vonage added the area:providers, kind:bug, and needs-triage labels Aug 26, 2024

boring-cyborg bot commented Aug 26, 2024

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.

@dosubot bot added the provider:cncf-kubernetes label Aug 26, 2024