Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stream dmypy output instead of dumping everything at the end #16252

Merged
merged 4 commits into from
Oct 16, 2023

Conversation

svalentin
Copy link
Collaborator

@svalentin svalentin commented Oct 11, 2023

This does 2 things:

  1. It changes the IPC code to work with multiple messages.
  2. It changes the dmypy client/server communication so that it streams stdout/stderr instead of dumping everything at the end.

For 1, we have to provide a way to separate out different messages. I chose to frame messages as bytes separated by whitespace character. That means we have to encode the message in a scheme that escapes whitespace. The urllib.parse quote/unquote seems reasonable. It encodes more than needed but the application is not IPC IO limited so it should be fine. With this convention in place, all we have to do is read from the socket stream until we have a whitespace character.
The framing logic can be easily changed.

For 2, since we communicate with JSONs, it's easy to add a "finished" key that tells us it's the final response from dmypy. Anything else is just stdout/stderr output.

Note: dmypy server also returns out/err which is the output of actual mypy type checking. Right now this change does not stream that output. We can stream that in a followup change. We just have to decide on how to differenciate the 4 text streams (stdout/stderr/out/err) that will now be interleaved.

Played around with it on Linux quite a bit. Will also test it some more on Windows.

The WriteToConn class could use more love. I just put a bare minimum to test the rest.

This does 2 things:
1. It changes the IPC code to work with multiple messages.
2. It changes the dmypy client/server communication so that it streams
stdout/stderr instead of dumping everything at the end.

For 1, we have to provide a way to separate out different messages.
We can frame messages as bytes separated by whitespace character. That
means we have to encode the message in a scheme that escapes whitespace.
The urllib.parse quote/unquote seems reasonable. It encodes more than
needed but the application is not IPC IO limited so it should be fine.
With this convention in place, all we have to do is read from the socket
stream until we have a whitespace character.

For 2, since we communicate with JSONs, it's easy to add a "finished"
key that tells us it's the final response from dmypy. Anything else is
just stdout/stderr output.

Note: dmypy server also returns out/err which is the output of actual
mypy type checking. Right now this change does not stream that output.
We can stream that in a followup change. We just have to decide on how
to differenciate the 4 text streams (stdout/stderr/out/err) that will
now be interleaved.

Played around with it on Linux quite a bit. Will also test it some more
on Windows.

The WriteToConn class could use more love. I just put a bare minimum to
test the rest.
@svalentin svalentin requested a review from JukkaL October 11, 2023 22:27
mypy/dmypy_server.py Outdated Show resolved Hide resolved
@github-actions

This comment has been minimized.

Copy link
Collaborator

@JukkaL JukkaL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, overall looks good! This will make debugging daemon issues easier. Left a few comments (not a full review).

mypy/ipc.py Outdated
@@ -13,6 +13,7 @@
import tempfile
from types import TracebackType
from typing import Callable, Final
from urllib.parse import quote, unquote
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a very minor optimization, what about using codecs.encode(<bytes_data>, 'hex') and codecs.decode(<encoded_data>, 'hex') instead? We wouldn't need to depend on urllib.parse, which is a Python package so importing it may take some time, and it would probably be a bit faster as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A similar minor optimization could be to use bytes.hex() and bytes.fromhex().

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ended up going with codecs.encode(<bytes_data>, 'base64'). I think it's a better encoding for this. It has a wider alphabet, but without space. Can easily change it to hex if you think it's better.

Base64 is a scheme for converting binary data to printable ASCII characters, namely the upper- and lower-case Roman alphabet characters (A–Z, a–z), the numerals (0–9), and the "+" and "/" symbols, with the "=" symbol as a special suffix code.

p.start()
connection_name = queue.get()
with IPCClient(connection_name, timeout=1) as client:
client.write("test1")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test writing whitespace and non-ascii characters? Maybe test all unicode code points within some range.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! Also added a test for multiple consecutive messages to test when a buffer has multiple frames inside it.

Copy link
Collaborator Author

@svalentin svalentin Oct 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though I didn't test all unicode code points. I tested some. I would trust codecs.encode to base64 does the right thing and we don't have to worry about it.

@github-actions
Copy link
Contributor

According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅

@svalentin svalentin merged commit 2bcec24 into python:master Oct 16, 2023
17 checks passed
@svalentin svalentin deleted the stream-output branch October 16, 2023 17:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants