Prefect agent raises InvalidChunkLength while streaming Kubernetes logs #7653

Closed
4 tasks done
BitTheByte opened this issue Nov 24, 2022 · 4 comments
Labels
bug Something isn't working

Comments

@BitTheByte
Contributor

First check

  • I added a descriptive title to this issue.
  • I used the GitHub search to find a similar issue and didn't find it.
  • I searched the Prefect documentation for this issue.
  • I checked that this issue is related to Prefect and not one of its dependencies.

Bug summary

Prefect agent fails randomly with "InvalidChunkLength" error

Reproduction

Any flow run

Error

09:29:33.114 | ERROR   | prefect.agent - An error occured while monitoring flow run '0e2484ad-3cf9-4ada-beb4-7bd031f7fd7a'. The flow run will not be marked as failed, but an issue may have occurred.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/urllib3/response.py", line 748, in _update_chunk_length
    self.chunk_left = int(line, 16)
ValueError: invalid literal for int() with base 16: b''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/urllib3/response.py", line 443, in _error_catcher
    yield
  File "/usr/local/lib/python3.10/dist-packages/urllib3/response.py", line 815, in read_chunked
    self._update_chunk_length()
  File "/usr/local/lib/python3.10/dist-packages/urllib3/response.py", line 752, in _update_chunk_length
    raise InvalidChunkLength(self, line)
urllib3.exceptions.InvalidChunkLength: InvalidChunkLength(got length b'', 0 bytes read)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/prefect/agent.py", line 267, in _submit_run_and_capture_errors
    result = await infrastructure.run(task_status=task_status)
  File "/usr/local/lib/python3.10/dist-packages/prefect/infrastructure/kubernetes.py", line 286, in run
    return await run_sync_in_worker_thread(self._watch_job, job_name)
  File "/usr/local/lib/python3.10/dist-packages/prefect/utilities/asyncutils.py", line 68, in run_sync_in_worker_thread
    return await anyio.to_thread.run_sync(call, cancellable=True)
  File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.10/dist-packages/prefect/infrastructure/kubernetes.py", line 471, in _watch_job
    for log in logs.stream():
  File "/usr/local/lib/python3.10/dist-packages/urllib3/response.py", line 623, in stream
    for line in self.read_chunked(amt, decode_content=decode_content):
  File "/usr/local/lib/python3.10/dist-packages/urllib3/response.py", line 803, in read_chunked
    with self._error_catcher():
  File "/usr/lib/python3.10/contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/usr/local/lib/python3.10/dist-packages/urllib3/response.py", line 460, in _error_catcher
    raise ProtocolError("Connection broken: %r" % e, e)
urllib3.exceptions.ProtocolError: ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))

Versions

Version:             2.6.8
API version:         0.8.3
Python version:      3.10.8
Git commit:          68044e28
Built:               Thu, Nov 17, 2022 3:19 PM
OS/Arch:             linux/x86_64
Profile:             default
Server type:         hosted

Additional context

No response

@BitTheByte added the bug (Something isn't working) and status:triage labels on Nov 24, 2022
@BitTheByte
Contributor Author

Related to kubernetes-client/python#728

@zanieb changed the title from "Prefect agent InvalidChunkLength" to "Prefect agent raises InvalidChunkLength while streaming Kubernetes logs" on Nov 28, 2022
@zanieb
Contributor

zanieb commented Nov 28, 2022

@BitTheByte Thanks for the issue. Please try to include clearer reproduction instructions; this looks specific to a Kubernetes job with streaming logs enabled. Have you tried disabling stream_output?

@zzstoatzz I see you linked to the same issue in prefect-kubernetes — do you have a solution for this?

@BitTheByte
Contributor Author

> Please try to include clearer reproduction instructions

I think it's caused by networking hiccups, so there is no way to reproduce it reliably.

> Have you tried disabling stream_output?

Will give it a try
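
For anyone reading the traceback: the innermost ValueError comes from parsing an empty chunk-size line, which is what a reader sees when the peer closes a chunked HTTP response mid-stream. The sketch below is a simplified, standard-library-only imitation of that parsing step, not urllib3's actual code. `read_chunks` and `ChunkLengthError` are hypothetical names used only for illustration.

```python
import io


class ChunkLengthError(Exception):
    """Hypothetical stand-in for urllib3.exceptions.InvalidChunkLength."""


def read_chunks(stream):
    """Yield the payload of each chunk in a chunked-encoded byte stream.

    Simplified on purpose: chunk extensions and trailers are ignored.
    """
    while True:
        line = stream.readline()          # e.g. b"5\r\n"; b"" if the peer hung up
        try:
            size = int(line.strip(), 16)  # chunk sizes are hexadecimal
        except ValueError as exc:
            raise ChunkLengthError(f"got length {line!r}") from exc
        if size == 0:                     # terminating zero-length chunk
            return
        yield stream.read(size)
        stream.readline()                 # consume the CRLF after the payload


# A well-formed body ends with a zero-length chunk.
complete = io.BytesIO(b"5\r\nhello\r\n0\r\n\r\n")
complete_chunks = list(read_chunks(complete))

# A connection dropped after the first chunk: the terminating chunk never arrives.
truncated = io.BytesIO(b"5\r\nhello\r\n")
received = []
error = None
try:
    for chunk in read_chunks(truncated):
        received.append(chunk)
except ChunkLengthError as exc:
    error = exc
```

In the truncated case the first chunk is still delivered before the error surfaces, which would be consistent with the failure appearing at an arbitrary point during log streaming.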

@zzstoatzz
Collaborator

Hmm, I haven't seen this error in particular.

In case it's helpful, my connection to that issue (retrying expired watches) was solved by this
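
As a rough illustration of the retry idea, one option is to wrap the log stream so that a dropped connection reopens the stream instead of propagating. This is only a sketch under assumptions, not Prefect's or the Kubernetes client's fix: `open_stream`, `stream_with_retries`, and `max_retries` are hypothetical names, and the built-in `ConnectionError` stands in for `urllib3.exceptions.ProtocolError`.

```python
from typing import Callable, Iterable, Iterator


def stream_with_retries(
    open_stream: Callable[[], Iterable[str]],
    max_retries: int = 3,
) -> Iterator[str]:
    """Yield items from open_stream(), reopening it after a connection error.

    Re-raises after more than max_retries consecutive failures.
    """
    failures = 0
    while True:
        try:
            for item in open_stream():
                failures = 0  # progress was made, so reset the retry budget
                yield item
            return  # the stream ended cleanly
        except ConnectionError:
            failures += 1
            if failures > max_retries:
                raise


# Demo: a fake log source that breaks once, then resumes where it left off.
lines = ["line 1", "line 2", "line 3"]
state = {"position": 0, "broken_once": False}


def fake_log_stream() -> Iterator[str]:
    while state["position"] < len(lines):
        if state["position"] == 1 and not state["broken_once"]:
            state["broken_once"] = True
            raise ConnectionError("Connection broken")
        line = lines[state["position"]]
        state["position"] += 1
        yield line


collected = list(stream_with_retries(fake_log_stream))
```

The fake source remembers its position, so nothing is lost or duplicated. A real log stream reopened from scratch may replay or skip lines unless the caller resumes from a known point.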
