-
-
Notifications
You must be signed in to change notification settings - Fork 2.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stream dmypy output instead of dumping everything at the end #16252
Conversation
This does 2 things: 1. It changes the IPC code to work with multiple messages. 2. It changes the dmypy client/server communication so that it streams stdout/stderr instead of dumping everything at the end. For 1, we have to provide a way to separate out different messages. We can frame messages as bytes separated by whitespace character. That means we have to encode the message in a scheme that escapes whitespace. The urllib.parse quote/unquote seems reasonable. It encodes more than needed but the application is not IPC IO limited so it should be fine. With this convention in place, all we have to do is read from the socket stream until we have a whitespace character. For 2, since we communicate with JSONs, it's easy to add a "finished" key that tells us it's the final response from dmypy. Anything else is just stdout/stderr output. Note: dmypy server also returns out/err which is the output of actual mypy type checking. Right now this change does not stream that output. We can stream that in a followup change. We just have to decide on how to differenciate the 4 text streams (stdout/stderr/out/err) that will now be interleaved. Played around with it on Linux quite a bit. Will also test it some more on Windows. The WriteToConn class could use more love. I just put a bare minimum to test the rest.
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, overall looks good! This will make debugging daemon issues easier. Left a few comments (not a full review).
mypy/ipc.py
Outdated
@@ -13,6 +13,7 @@ | |||
import tempfile | |||
from types import TracebackType | |||
from typing import Callable, Final | |||
from urllib.parse import quote, unquote |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a very minor optimization, what about using codecs.encode(<bytes_data>, 'hex')
and codecs.decode(<encoded_data>, 'hex')
instead? We wouldn't need to depend on urllib.parse
, which is a Python package so importing it may take some time, and it would probably be a bit faster as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A similar minor optimization could be to use bytes.hex() and bytes.fromhex().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ended up going with codecs.encode(<bytes_data>, 'base64')
. I think it's a better encoding for this. It has a wider alphabet, but without space. Can easily change it to hex if you think it's better.
Base64 is a scheme for converting binary data to printable ASCII characters, namely the upper- and lower-case Roman alphabet characters (A–Z, a–z), the numerals (0–9), and the "+" and "/" symbols, with the "=" symbol as a special suffix code.
mypy/test/testipc.py
Outdated
p.start() | ||
connection_name = queue.get() | ||
with IPCClient(connection_name, timeout=1) as client: | ||
client.write("test1") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test writing whitespace and non-ascii characters? Maybe test all unicode code points within some range.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done! Also added a test for multiple consecutive messages to test when a buffer has multiple frames inside it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though I didn't test all unicode code points. I tested some. I would trust codecs.encode to base64 does the right thing and we don't have to worry about it.
9081b1e
to
73ecb8a
Compare
According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅ |
This does 2 things:
For 1, we have to provide a way to separate out different messages. I chose to frame messages as bytes separated by whitespace character. That means we have to encode the message in a scheme that escapes whitespace. The urllib.parse quote/unquote seems reasonable. It encodes more than needed but the application is not IPC IO limited so it should be fine. With this convention in place, all we have to do is read from the socket stream until we have a whitespace character.
The framing logic can be easily changed.
For 2, since we communicate with JSONs, it's easy to add a "finished" key that tells us it's the final response from dmypy. Anything else is just stdout/stderr output.
Note: dmypy server also returns out/err which is the output of actual mypy type checking. Right now this change does not stream that output. We can stream that in a followup change. We just have to decide on how to differenciate the 4 text streams (stdout/stderr/out/err) that will now be interleaved.
Played around with it on Linux quite a bit. Will also test it some more on Windows.
The WriteToConn class could use more love. I just put a bare minimum to test the rest.