-
-
Notifications
You must be signed in to change notification settings - Fork 9.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Factor out IO-selection code, prepare for PR #2466 #2762
Conversation
@claui Can you explain a bit on what this class does, when it would be used, how it is implemented and where else in Homebrew we'd use it (and why)? At the moment it's a bit opaque to me. Thanks! |
@MikeMcQuaid Sure! A recap on the technical background
What the class does
UsageAs an example to use the class, you do: selector = IOSelector.new(stdout: $stdout, stderr: $stderr) and then: selector.each_line_nonblock do |tag, line|
case tag
when :stdout then # `line` comes from standard output
else # `line` comes from standard error
end
end tl;drWhat the class does is,
|
@MikeMcQuaid I appreciate this info a lot – it’s a sign that there’s still wiggle room to make the code simpler and more expressive. Feedback welcome! |
end | ||
|
||
def select_streams | ||
IO.select(pending_streams)[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is short enough to be in-lined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your feedback. While I technically agree, I beg to differ from an engineering point of view, even more so as we’re dealing with Ruby code. As Avdi Grimm puts it in Confident Ruby,
Writing maintainable code is all about focusing our mental energy on one task at a time, and equally importantly, on one level of abstraction at a time.
That maintainability thing is why I almost never inline such methods in Ruby.
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, that makes sense. In that case, I'd name this readable_streams
and close_all
close_streams
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair point. Updated.
end | ||
|
||
def close_all | ||
all_streams.each(&:close_read) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could also be in-lined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above
@pending_streams ||= all_streams.dup | ||
end | ||
|
||
def initialize(streams = {}, separator = $/) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd move this up to be the first method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, thanks. Done
private | ||
|
||
def each_line_nonblock_enum | ||
Enumerator.new do |yielder| |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the reason for using an Enumerator
here, instead of simply calling this directly in each_line_nonblock
above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, this is in fact one of my favorite concepts in the Ruby standard library! 😊
The benefit here is that we can invoke each_line_nonblock
without a block – as a second form of calling the method, so to speak.
Not having a block causes b == nil
, so we’re basically calling #each(nil)
on the enumerator, which (by contract) means that the enumerator returns itself.
So the wrapper gives us a second calling form for free:
-
each_line_nonblock { |tag, line| } → obj
-
each_line_nonblock → Enumerator
This, in turn, allows client code to do useful and concise things like e. g.:
selector.each_line_nonblock.map { |_, line| line }.join("")
instead of simply calling this directly […]?
Inlining this would mean we’d have to go with do […] end.each
, which would go against our style guide, and Rubocop would certainly want to have a word with us. 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That all said, I have no strong opinion on doing away with the second calling form altogether. If we feel it’s unneeded, I’m willing to remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't think about calling it without a block, since the only current use for this is with a block. This is neat, but I'm not sure if there will be a use for this without a block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure if there will be a use for this without a block.
I’m not sure either. Removed, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have also removed it from the specs now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to review this more thoroughly but leaving a request changes just so it's not merged before I do.
@reitermarkus @MikeMcQuaid Thanks for committing and taking your time to review. There’s no hurry – so the more eyeballs, the better |
One more nit: Change |
… and done. I have also changed |
|
||
::Utils::IOSelector | ||
.each_line_from(stdout: raw_stdout, stderr: raw_stderr, &b) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this (and the method parameter) be &block
too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I figure that should be &block
too; however that should be left to another PR.
I have left &b
untouched because the corresponding method signature does not really relate to this issue. I’d prefer to keep commits focused, and diffs too
Clarification question: the old |
This is complicated enough that maybe some of the explanation here should go in comments on the new |
Oh - I think there may be a bug where this could return spurious line breaks when the buffer actually fills. Let me do a closer read on this before merging... |
I think |
Yeah, I think there's a minor bug here, having to do with the design of def readline_nonblock(sep = $INPUT_RECORD_SEPARATOR)
line = ""
buffer = ""
loop do
break if buffer == sep
read_nonblock(1, buffer)
line.concat(buffer)
end
line
rescue IO::WaitReadable, EOFError => e
raise e if line.empty?
line
end So, There's nothing in the return value to indicate whether the string was returned due to and end-of-line or a wait. (The trailing record separator can't be used reliably, because it might not exist in the last line of the file, so it can't distinguish EOF from a wait.) So the calling code sees each returned value as a full line. If the blocking IO occurs in the middle of a line, the caller will end up incorrectly processing it as two or more lines. This doesn't matter if all it's doing is just re-echoing the output. But if it's doing something to each line (like prefixing them, or counting lines, or parsing each line), then it could affect correctness. To be correct, I think that It looks like you're doing a similar thing with the "chunked buffers" in #2466. I just think a similar approach needs to be applied to the text-reading functions, too. (Not slurping the whole output streams there; just enough to handle partial line reads.) Also, this won't work on DOS mode text files or other files with multi-byte record separators, since Do you folks see the same issue? Or am I misreading this? |
require "extend/io" | ||
|
||
module Utils | ||
class IOSelector < DelegateClass(Hash) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe IOMultiplexer
would be a clearer name? Easier to Google for, and this class does more than just the selection; it actually performs the I/O, too, to some extent. I dunno.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[This] class does more than just the selection; it actually performs the I/O, too, to some extent.
Very good point. 👍
The class was initially meant to be no more than a wrapper for IO::select
. One of the class-level comments I just have added says:
The class
IOSelector
is a wrapper forIO::select
with the added benefit that it spans the streams' lifetimes.
Apart from IO::select
, the I/O that happens would be eof?
and close_read
. My idea of the class was that the client remains fully responsible for reading. (Otherwise we could have simply used Open3::capture3
.)
Somehow, in #2466, I allowed a convenience method #binread_nonblock
to creep in, which does include reading, just to spare the clients from writing a few more lines. Maybe that was a bad idea; It’s definitely beyond what a selector should do.
It feels like we’re having a hard time to find a good name for the class. It also doesn’t seem immediately obvious what the class does. (@MikeMcQuaid has mentioned it a few days ago.) This raises a few red flags for me. I sense a design smell here, not a naming issue. The class wants to do too much.
I figure it would help if I removed all the non-essential I/O code (i. e. the #binread_nonblock
convenience method and the close_streams
private method), and let the client handle that for now. That would leave the IOSelector
class responsible for IO::select
and eof?
, and nothing else.
Thoughts?
@separator = separator | ||
end | ||
|
||
def each_line_nonblock |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is each_line_nonblock
the right name for this method? It calls readline_nonblock
, which is nonblocking, but each_readable_stream_until_eof
is calling IO::select
, which may block if there is no input on any stream, so this method may block. Maybe each_line_each_stream
or each_line_all_streams
or each_line_dont_deadlock
?
|
||
private | ||
|
||
def each_readable_stream_until_eof(&block) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe read_io_until_eofs
or multiplex_streams_until_eof
or similar would be a better name? To me, in Ruby, a name each_X
suggests iterating over the Xs, processing each once in turn. This method doesn't iterate over the streams; it multiplexes over them in an arbitrary order determined by external state, with possible repeats. (The readable_streams.each
does iterate over some streams, but the streams in each call are an arbitrary sublist of the streams passed in to the IOSelector
object, and doesn't correspond to the behavior visible to the caller.)
My few comments aside, based on your description and references, this makes sense to me, both in terms of how the deadlock happens and how The upshot of this particular PR is that it behaves the same as the current code (for reading text), but it's factored in to smaller and more generic pieces so that binary-reading/non-line-oriented support can be added in #2466, right? |
@reitermarkus We already use |
This commit sets the stage for an upcoming fix for #2466. In a nutshell, this is what it does: - In `cask/system_command.rb`, factor out existing code (“IO selection”) that we’re going to need later in order to fix PR #2466; - move said code into its own class `Utils::IOSelector`; - split the class into smaller methods to make the code more expressive than before; and - add unit tests for `Utils::IOSelector` (they’re a bit bloated because edge cases.)
@apjanke The old code seems to work correctly in all but a few edge cases, which I have never heard of or seen in the wild. The new class now visits each readable stream (not just one) between subsequent Other than that, yes, this PR simply factors out the existing code to make it reusable for #2466.
Good idea. I have added a class-level comment with a short explanation and a link to #2466. |
@apjanke You’re correct. This is an existing bug in our (monkey-patched) As a real-life example,
(Note: per Without the bug, the following code would have been sufficient:
I’d rather not have a fix for In order to still get it fixed (in time before someone starts relying on |
@apjanke That’s exactly what I had in mind. Thank you for this summary – and for your other suggestions. They have helped a lot! |
Great! Glad to be helpful; it's been a while since I've had a Homebrew PR that was really up my alley like this. |
Fine by me. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few inline comments but some general comments to follow.
@@ -0,0 +1,79 @@ | |||
require "delegate" | |||
require "English" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shouldn't be required; it's part of global
now. It may not be needed in the spec_helper
either.
module Utils | ||
# | ||
# The class `IOSelector` is a wrapper for `IO::select` with the | ||
# added benefit that it spans the streams' lifetimes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand why this is a benefit, can you elaborate.
# The class `IOSelector` is a wrapper for `IO::select` with the | ||
# added benefit that it spans the streams' lifetimes. | ||
# | ||
# The class accepts multiple IOs which must be open for reading. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does that mean they must be already open? When would they not be open?
# added benefit that it spans the streams' lifetimes. | ||
# | ||
# The class accepts multiple IOs which must be open for reading. | ||
# It then notifies the client as data becomes available |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the client? What is the notification?
|
||
alias all_streams keys | ||
alias all_tags values | ||
alias tag_of fetch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are the aliases needed?
alias tag_of fetch | ||
|
||
def self.each_line_from(streams = {}, | ||
separator = $INPUT_RECORD_SEPARATOR, &block) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you indent this further in; this looks currently like it's a line in the function.
alias all_tags values | ||
alias tag_of fetch | ||
|
||
def self.each_line_from(streams = {}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not obvious what the streams
input hash format should be.
end | ||
|
||
def readable_streams | ||
IO.select(pending_streams)[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd favour .first
here
Thanks for this @claui, I like where this is going. Some general comments:
|
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. |
@probot-stale[bot] Bear with me for another couple of days, little robot. |
@claui How's this looking? |
@MikeMcQuaid Thanks for the feedback!
I like this idea a lot. I’m in the middle of moving the whole thing one abstraction level up and see what happens. It often amazes me how in coding, we tend to uncover proper abstractions along the way, not up front. A bit like a story that unfolds in a detective novel.
Actually, I’m an avid YAGNI supporter. Thanks for pointing out cases where unneeded abstractions have crept in; I fully agree, and will gladly remove those.
I believe this is somewhat of a trade-off. Even if we completely leave aside the “idiomatic Ruby” argument: In my experience, code gets read much more often than it gets written. Unfortunately, most developers also seem to prefer writing code over reading. This is why I consider myself a readability evangelist. In that specific case you have mentioned, I don’t see yet how separating code into functions, if named properly, could hurt readability. Would you mind giving an example? I. e., could you please name a specific example where inlining code would make it easier to read?
Just to make sure we’re on the same line here, I assume that by unifying, you mean inlining all functions which are only used in one place. I feel you have a good point, and I’ll be happy to inline the methods as requested; that said, I’d also like you to know the reasons why I tend to avoid inlining. When reading or writing code, we consume abstractions all the time, e. g. when calling a standard library function, or even a language keyword. If we only looked at the “how often is this method called?” metric, we’d end up inlining things like system calls, too. This is why I believe there must be more to it. In other words, I feel we should not focus on the single question of “are we using this just once, or more than once?”. There probably is no definitive answer; therefore, I’ll leave it at that for now. I think I’m going to finish that change in abstraction/inlining soon. |
My example would be
Yep!
Yep, I do agree here. I guess that was my specific recommendation for this PR in particular as I'm finding it hard to follow what's going on due to having to jump between function calls which abstract behaviour in a way that's opaque without a method name a similar length to the function itself. |
I'm normally in favor of encapsulating stuff in well-named functions for readability, but I agree with @MikeMcQuaid in this case in terms of readability. It feels like I'm having to keep a lot of abstractions in my head in this case here, and just seeing the inlined implementation would reduce cognitive load. That's the perspective of me as a relative Ruby newbie, at any rate. |
@claui @apjanke This is a good reason on this topic: https://medium.com/@copyconstruct/small-functions-considered-harmful-91035d316c29 |
Closing this as there's been no updates for over a month. We'll review a new PR with comments addressed. |
brew tests
with your changes locally?This commit sets the stage for an upcoming fix for #2466.
In a nutshell, this is what it does:
In
cask/system_command.rb
, factor out existing code (“IO selection”) that we’re going to need later in order to fix PR [WIP]brew search
freezes on some GitHub API responses #2466;move said code into its own class
Utils::IOSelector
;split the class into smaller methods to make the code more expressive than before; and
add unit tests for
Utils::IOSelector
(they’re a bit bloated because edge cases.)@Homebrew/maintainers As we’re not really in a hurry here (the issue seems to come up only sporadically at the moment), I’d appreciate a thorough review even though it brings zero new features. 😊
Thanks!