Factor out IO-selection code, prepare for PR #2466 #2762

Closed
wants to merge 1 commit into from

claui
Contributor

claui commented Jun 9, 2017

  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same change?
  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your changes? Here's an example.
  • Have you successfully run brew tests with your changes locally?

This commit sets the stage for an upcoming fix for #2466.
In a nutshell, this is what it does:

  • In cask/system_command.rb, factor out existing code (“IO selection”) that we’re going to need later in order to fix PR #2466 ([WIP] brew search freezes on some GitHub API responses);

  • move said code into its own class Utils::IOSelector;

  • split the class into smaller methods to make the code more expressive than before; and

  • add unit tests for Utils::IOSelector (they’re a bit bloated because edge cases.)

@Homebrew/maintainers As we’re not really in a hurry here (the issue seems to come up only sporadically at the moment), I’d appreciate a thorough review even though it brings zero new features. 😊

Thanks!

@MikeMcQuaid
Member

@claui Can you explain a bit on what this class does, when it would be used, how it is implemented and where else in Homebrew we'd use it (and why)? At the moment it's a bit opaque to me. Thanks!

@claui
Contributor Author

claui commented Jun 10, 2017

Can you explain a bit on what this class does, when it would be used, how it is implemented and where else in Homebrew we'd use it (and why)?

@MikeMcQuaid Sure!

A recap on the technical background

  • When you run a Unix command and it gives you standard output and standard error, you might expect those two streams to be independent of each other.

  • However, they’re not really that independent. In fact, stdout sometimes stalls completely – for example when stderr has too much unread data. That data fills up the internal (pipe) buffer between the two processes, so the command blocks on its next write to stderr and never gets around to producing more stdout.

  • As of today, utils/curl.rb uses a one-at-a-time approach: “First, I’m going to fetch stdout, all of it, until EOF – and only when I’m done, I’ll fetch stderr, all of it, until EOF.”
    This is the line of code where this happens, albeit a bit subtly written.

  • This kind of behavior leads to sporadic deadlocks, as explained in more detail outside of Homebrew (1) (2). We have been bitten by it in the past; #18638 in Caskroom was the first issue where this got my attention. I figure that for PR #2466, we should reuse the code we already have from the Caskroom #18638 fix.

  • One analogy that has helped me personally understand the deadlock situation a bit better: imagine an email provider that gives me two mailboxes, one work account and one home account, with a combined quota of 10 GB.
    Now I say to myself, »Gee, I’m gonna fetch only work email, and nothing else, UNTIL ALL THE WORK IS DONE«. Guess what happens after a couple of years? My home email will have filled up the quota completely because I never bothered to fetch it. So I can’t fetch email anymore, neither work nor home.
    This is roughly the same thing utils/curl.rb does with stdout and stderr.

  • As a practical, hands-on example of how it fails and in which situation, see this Bash snippet.
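The buffer limit behind this deadlock can be observed directly. The following Ruby sketch is only an illustration (it is not the Bash snippet mentioned above, and `pipe_capacity` is a hypothetical helper): it fills a pipe that nobody reads, using `IO#write_nonblock` so that it stops exactly where a plain, blocking `write` would hang. The number it reports is platform-dependent.

```ruby
# Hypothetical helper: counts how many bytes a pipe accepts while nobody
# reads from it. A blocking write beyond this point would wait forever,
# which is what happens to a child process whose stderr is never drained.
def pipe_capacity(chunk_size = 1024)
  reader, writer = IO.pipe
  chunk = "x" * chunk_size
  total = 0
  loop do
    # write_nonblock may accept only part of the chunk; it returns the
    # number of bytes actually written.
    total += writer.write_nonblock(chunk)
  end
rescue IO::WaitWritable
  # The pipe buffer is full: a plain `write` would block at this point.
  total
ensure
  reader&.close
  writer&.close
end

puts "The pipe accepted #{pipe_capacity} bytes before a write would block."
```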

What the class does

  • The only responsibility of the Utils::IOSelector class is,

    Gimme all your streams and I’ll take care of them. Gonna call you back every time there is data coming in, no stalling, until the end (all streams EOF).

  • To stay with the above email provider analogy, IOSelector could be the email client which downloads all the email to your local storage and deletes it upstream; your quota will be fine while at the same time, it’s still up to you which of the mailboxes you’re gonna ignore.

  • Utils::IOSelector is designed to be general-purpose; for now, the class is intended to be used in two places: one, two. (This PR only handles number one in order to keep PRs smaller.)

  • A quick annotated rundown of the public methods:

    • The constructor accepts a Hash of streams, e. g.
      { stdout: $stdout, stderr: $stderr }.
    • Because the existing code in Homebrew accepts custom separators, the new code does the same.
    • The key method of the class is #each_line_nonblock, which is the entry point of the “gonna call you back” part.
      You call each_line_nonblock with a block that accepts |tag, line|; the method will then call the block whenever there is data available, e. g. (:stderr, "Error: Hostname not found").
      The method is named #each_line_nonblock because of IO#readline_nonblock, a monkey-patch we already have in our code base.
    • There is also a class-method shorthand for #each_line_nonblock called ::each_line_from. It is just my personal preference; it makes client code more concise.
    • The #pending_streams method is just internal bookkeeping; its purpose is to know which streams it can safely ignore.
    • Lastly, there are convenience methods #all_streams, #all_tags, and #tag_of(stream), which are used internally a lot; I didn’t really implement those, since the class delegates to Hash and they are simply aliases for Hash methods.
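A minimal sketch of how such aliases can fall out of delegating to Hash. `TaggedStreams` is a hypothetical name, and the assumption that the constructor inverts a tag => stream Hash (so that the streams become the keys) is ours, not necessarily what the PR does.

```ruby
require "delegate"

# Hypothetical sketch: a Hash (via DelegateClass) mapping each stream to its
# tag, so the convenience methods are plain aliases for Hash methods.
class TaggedStreams < DelegateClass(Hash)
  # Assumption: callers pass tag => stream, as in { stdout: ..., stderr: ... },
  # and the constructor inverts it so that the streams become the keys.
  def initialize(streams = {})
    super(streams.invert)
  end

  alias all_streams keys
  alias all_tags values
  alias tag_of fetch
end

out_reader, _out_writer = IO.pipe
err_reader, _err_writer = IO.pipe
streams = TaggedStreams.new(stdout: out_reader, stderr: err_reader)

streams.all_tags           # => [:stdout, :stderr]
streams.tag_of(err_reader) # => :stderr
```

Since `tag_of` is just `Hash#fetch`, asking for the tag of an unknown stream raises `KeyError`.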

Usage

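As an example of how to use the class, given the raw_stdout and raw_stderr pipes of a child process (e. g. from Open3.popen3), you do:

selector = Utils::IOSelector.new(stdout: raw_stdout, stderr: raw_stderr)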

and then:

selector.each_line_nonblock do |tag, line|
  case tag
  when :stdout then # `line` comes from standard output
  else              # `line` comes from standard error
  end
end

tl;dr

What the class does is,

Gimme all your streams and I’ll take care of them. Gonna call you back every time there is data coming in, no stalling, until the end (all streams EOF).

It’s intended to be used in two places: 1, 2.

@claui
Contributor Author

claui commented Jun 10, 2017

At the moment it's a bit opaque to me.

@MikeMcQuaid I appreciate this info a lot – it’s a sign that there’s still wiggle room to make the code simpler and more expressive. Feedback welcome!

end

def select_streams
IO.select(pending_streams)[0]
Member

I think this is short enough to be in-lined.

Contributor Author

Thanks for your feedback. While I technically agree, I beg to differ from an engineering point of view, even more so as we’re dealing with Ruby code. As Avdi Grimm puts it in Confident Ruby,

Writing maintainable code is all about focusing our mental energy on one task at a time, and equally importantly, on one level of abstraction at a time.

That maintainability thing is why I almost never inline such methods in Ruby.

Thoughts?

Member

Ok, that makes sense. In that case, I'd name this readable_streams and rename close_all to close_streams.

Contributor Author

Fair point. Updated.

end

def close_all
all_streams.each(&:close_read)
Member

This could also be in-lined.

Contributor Author

See above ☺️

@pending_streams ||= all_streams.dup
end

def initialize(streams = {}, separator = $/)
Member

I'd move this up to be the first method.

Contributor Author

Good point, thanks. Done ☺️

private

def each_line_nonblock_enum
Enumerator.new do |yielder|
Member

What is the reason for using an Enumerator here, instead of simply calling this directly in each_line_nonblock above?

Contributor Author

Oh, this is in fact one of my favorite concepts in the Ruby standard library! 😊

The benefit here is that we can invoke each_line_nonblock without a block – as a second form of calling the method, so to speak.

Not having a block causes b == nil, so we’re basically calling #each without a block (#each(&nil)) on the enumerator, which (by contract) means that the enumerator returns itself.

So the wrapper gives us a second calling form for free:

  1. each_line_nonblock { |tag, line| } → obj

  2. each_line_nonblock → Enumerator

This, in turn, allows client code to do useful and concise things like e. g.:

selector.each_line_nonblock.map { |_, line| line }.join("")

instead of simply calling this directly […]?

Inlining this would mean we’d have to go with do […] end.each, which would go against our style guide, and Rubocop would certainly want to have a word with us. 😉
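A small, self-contained illustration of the two calling forms described above. `TaggedLines` and its methods are hypothetical and only mirror the shape of the pattern (a private method builds an Enumerator, the public one forwards its possibly-nil block to Enumerator#each); this is not the PR's code.

```ruby
# Hypothetical toy class: with a block, each_line yields (tag, line) pairs;
# without one, it returns an Enumerator that supports map, to_a, etc.
class TaggedLines
  def initialize(lines_by_tag)
    @lines_by_tag = lines_by_tag
  end

  def each_line(&block)
    # Without a block, `block` is nil, so this is each(&nil), and
    # Enumerator#each then returns the enumerator itself.
    each_line_enum.each(&block)
  end

  private

  def each_line_enum
    Enumerator.new do |yielder|
      @lines_by_tag.each do |tag, lines|
        lines.each { |line| yielder.yield(tag, line) }
      end
    end
  end
end

lines = TaggedLines.new(stdout: ["a\n", "b\n"], stderr: ["oops\n"])
lines.each_line { |tag, line| print "#{tag}: #{line}" }
lines.each_line.map { |_, line| line }.join # => "a\nb\noops\n"
```

Ruby code frequently gets the same effect with `return enum_for(__method__) unless block_given?`, which avoids the separate private method.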

Contributor Author

That all said, I have no strong opinion on doing away with the second calling form altogether. If we feel it’s unneeded, I’m willing to remove it.

Member

I didn't think about calling it without a block, since the only current use for this is with a block. This is neat, but I'm not sure if there will be a use for this without a block.

Contributor Author

not sure if there will be a use for this without a block.

I’m not sure either. Removed, thanks!

Contributor Author

I have also removed it from the specs now.

Member

@MikeMcQuaid left a comment

Need to review this more thoroughly, but I'm leaving a "request changes" review just so it's not merged before I do.

@claui
Contributor Author

claui commented Jun 11, 2017

@reitermarkus @MikeMcQuaid Thanks for committing and taking your time to review. There’s no hurry – so the more eyeballs, the better ☺️

@reitermarkus
Member

One more nit: Change b to block everywhere.

@claui
Contributor Author

claui commented Jun 12, 2017

… and done.

I have also changed $/ to $RS as per the new Rubocop rules.


::Utils::IOSelector
.each_line_from(stdout: raw_stdout, stderr: raw_stderr, &b)
Contributor

Should this (and the method parameter) be &block too?

Contributor Author

Yes, I figure that should be &block too; however that should be left to another PR.

I have left &b untouched because the corresponding method signature does not really relate to this issue. I’d prefer to keep commits focused, and diffs too ☺️

@apjanke
Contributor

apjanke commented Jun 12, 2017

Clarification question: the old SystemCommand#each_line_from in cask/lib/hbc/system_command.rb which is deleted in this PR worked correctly without deadlock, and this change just factors it out to make it usable by the curl_output invocation, right?

@apjanke
Contributor

apjanke commented Jun 12, 2017

This is complicated enough that maybe some of the explanation here should go in comments on the new IOSelector code. I don't think the deadlock/buffer issue would occur to me just from reading the source code.

@apjanke
Contributor

apjanke commented Jun 12, 2017

Oh - I think there may be a bug where this could return spurious line breaks when the buffer actually fills. Let me do a closer read on this before merging...

@reitermarkus
Member

I have also changed $/ to $RS as per the new Rubocop rules.

I think $RS is still kinda hard to read, so I'd use $INPUT_RECORD_SEPARATOR.

@apjanke
Contributor

apjanke commented Jun 13, 2017

Yeah, I think there's a minor bug here, having to do with the design of readline_nonblock. It wasn't introduced by this PR, but now's a good time to fix it, especially if we're expanding its use.

  def readline_nonblock(sep = $INPUT_RECORD_SEPARATOR)
    line = ""
    buffer = ""

    loop do
      break if buffer == sep
      read_nonblock(1, buffer)
      line.concat(buffer)
    end

    line
  rescue IO::WaitReadable, EOFError => e
    raise e if line.empty?
    line
  end

So, readline_nonblock always returns a string, or raises an IO::WaitReadable or EOFError. But sometimes it may return a string containing a partial line: if some bytes have been read from input (appended to line) in this call, and then it encounters a wait condition (i.e., the read would block), it returns the partial line that's been read so far. (It doesn't re-raise the WaitReadable because line.empty? is false.)

There's nothing in the return value to indicate whether the string was returned due to an end-of-line or a wait. (The trailing record separator can't be used reliably, because it might not exist in the last line of the file, so it can't distinguish EOF from a wait.) So the calling code sees each returned value as a full line. If the blocking IO occurs in the middle of a line, the caller will end up incorrectly processing it as two or more lines. This doesn't matter if all it's doing is just re-echoing the output. But if it's doing something to each line (like prefixing them, or counting lines, or parsing each line), then it could affect correctness.

To be correct, I think that readline_nonblock needs to return two values – the buffered input bytes, and an indicator of whether it's a finished line versus a partial buffer due to a wait – and the calling code that's running the select loop (each_readable_stream_until_eof, in this case) needs to maintain its own per-stream buffers to accumulate partial lines, only yielding a line when readline_nonblock indicates it's really hit the end of a line.

It looks like you're doing a similar thing with the "chunked buffers" in #2466. I just think a similar approach needs to be applied to the text-reading functions, too. (Not slurping the whole output streams there; just enough to handle partial line reads.)
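A minimal sketch of the per-stream buffering proposed here, assuming the select loop keeps one buffer per stream and feeds it whatever a non-blocking read returned. `LineBuffer`, `feed`, and `flush` are hypothetical names. The buffer only yields a line once its separator has arrived and hands out the remainder at EOF; because it searches with `String#index`, multi-byte separators such as "\r\n" work too.

```ruby
require "English"

# Hypothetical per-stream buffer (not part of the PR). It accumulates the
# chunks that non-blocking reads return and only hands out complete lines;
# a trailing partial line stays buffered until more input arrives.
class LineBuffer
  def initialize(separator = $INPUT_RECORD_SEPARATOR)
    @separator = separator
    @pending = +""
  end

  # Appends a chunk and yields every line that the chunk completes.
  def feed(chunk)
    @pending << chunk
    while (index = @pending.index(@separator))
      yield @pending.slice!(0, index + @separator.length)
    end
  end

  # Call at EOF: yields the last line even if it has no separator.
  def flush
    return if @pending.empty?

    line = @pending
    @pending = +""
    yield line
  end
end

lines = []
buffer = LineBuffer.new
buffer.feed("Error: Host") { |line| lines << line } # nothing complete yet
buffer.feed("name not found\nnext") { |line| lines << line }
buffer.flush { |line| lines << line }
lines # => ["Error: Hostname not found\n", "next"]
```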

Also, this won't work on DOS mode text files or other files with multi-byte record separators, since readline_nonblock only buffers and compares a single byte at a time to the record separator. Just in case we care about universality: maybe it could test line.end_with?(sep) instead.

Do you folks see the same issue? Or am I misreading this?
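The two-value return suggested above, combined with the end_with? check for multi-byte separators, could look roughly like this. `read_line_nonblock` is a hypothetical standalone function, not Homebrew's monkey-patched IO#readline_nonblock. Here `complete` is false both for a wait and for a final line without a separator; the caller learns about EOF from the EOFError raised by the following call.

```ruby
require "English"

# Hypothetical variant (not Homebrew's code): returns [text, complete] so the
# caller can tell a finished line from a partial one cut short by a
# would-block condition or by EOF.
def read_line_nonblock(io, separator = $INPUT_RECORD_SEPARATOR)
  line = +""
  loop do
    line << io.read_nonblock(1)
    # end_with? also matches multi-byte separators such as "\r\n".
    return [line, true] if line.end_with?(separator)
  end
rescue IO::WaitReadable, EOFError
  # Nothing was read in this call: re-raise so the caller sees the wait/EOF.
  raise if line.empty?

  [line, false]
end
```

A select loop would append each returned text to that stream's buffer and only treat it as a line once `complete` is true.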

require "extend/io"

module Utils
class IOSelector < DelegateClass(Hash)
Contributor

Maybe IOMultiplexer would be a clearer name? Easier to Google for, and this class does more than just the selection; it actually performs the I/O, too, to some extent. I dunno.

Contributor Author

[This] class does more than just the selection; it actually performs the I/O, too, to some extent.

Very good point. 👍

The class was initially meant to be no more than a wrapper for IO::select. One of the class-level comments I have just added says:

The class IOSelector is a wrapper for IO::select with the added benefit that it spans the streams' lifetimes.

Apart from IO::select, the I/O that happens would be eof? and close_read. My idea of the class was that the client remains fully responsible for reading. (Otherwise we could have simply used Open3::capture3.)

Somehow, in #2466, I allowed a convenience method #binread_nonblock to creep in, which does include reading, just to spare the clients from writing a few more lines. Maybe that was a bad idea; it’s definitely beyond what a selector should do.

It feels like we’re having a hard time finding a good name for the class. It also doesn’t seem immediately obvious what the class does. (@MikeMcQuaid mentioned this a few days ago.) This raises a few red flags for me. I sense a design smell here, not a naming issue. The class wants to do too much.

I figure it would help if I removed all the non-essential I/O code (i. e. the #binread_nonblock convenience method and the close_streams private method), and let the client handle that for now. That would leave the IOSelector class responsible for IO::select and eof?, and nothing else.

Thoughts?
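A rough sketch of that narrower responsibility (IO::select plus eof?, with all reading left to the client). `each_readable_until_eof` is a hypothetical helper, not the PR's code. It relies on Ruby's IO.select also reporting a stream as readable when its internal read buffer is non-empty, so data that eof? pulled into the buffer is not lost.

```ruby
# Hypothetical helper: waits with IO.select, drops streams that reached EOF,
# and yields every other readable stream so the caller does the reading.
def each_readable_until_eof(streams)
  pending = streams.dup
  until pending.empty?
    readable, = IO.select(pending)
    readable.each do |io|
      # select reported the stream as readable, so eof? normally returns
      # without waiting: either data is available or the writer has closed.
      if io.eof?
        pending.delete(io)
      else
        yield io
      end
    end
  end
end

out_reader, out_writer = IO.pipe
err_reader, err_writer = IO.pipe
out_writer.write("to stdout\n")
err_writer.write("to stderr\n")
[out_writer, err_writer].each(&:close)

received = Hash.new { |hash, io| hash[io] = +"" }
each_readable_until_eof([out_reader, err_reader]) do |io|
  received[io] << io.readpartial(4096)
end
received[out_reader] # => "to stdout\n"
```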

@separator = separator
end

def each_line_nonblock
Contributor

apjanke commented Jun 13, 2017

Is each_line_nonblock the right name for this method? It calls readline_nonblock, which is nonblocking, but each_readable_stream_until_eof is calling IO::select, which may block if there is no input on any stream, so this method may block. Maybe each_line_each_stream or each_line_all_streams or each_line_dont_deadlock?


private

def each_readable_stream_until_eof(&block)
Contributor

Maybe read_io_until_eofs or multiplex_streams_until_eof or similar would be a better name? To me, in Ruby, a name each_X suggests iterating over the Xs, processing each once in turn. This method doesn't iterate over the streams; it multiplexes over them in an arbitrary order determined by external state, with possible repeats. (The readable_streams.each does iterate over some streams, but the streams in each call are an arbitrary sublist of the streams passed in to the IOSelector object, and doesn't correspond to the behavior visible to the caller.)

@apjanke
Contributor

apjanke commented Jun 13, 2017

My few comments aside, based on your description and references, this makes sense to me, both in terms of how the deadlock happens and how read_nonblock or IOSelector fixes it.

The upshot of this particular PR is that it behaves the same as the current code (for reading text), but it's factored into smaller and more generic pieces so that binary-reading/non-line-oriented support can be added in #2466, right?

@claui
Contributor Author

claui commented Jun 13, 2017

I think $RS is still kinda hard to read, so I'd use $INPUT_RECORD_SEPARATOR.

@reitermarkus We already use $INPUT_RECORD_SEPARATOR in other places so your nit makes sense. Changed.

This commit sets the stage for an upcoming fix for #2466.
In a nutshell, this is what it does:

- In `cask/system_command.rb`, factor out existing code (“IO
  selection”) that we’re going to need later in order to fix
  PR #2466;

- move said code into its own class `Utils::IOSelector`;

- split the class into smaller methods to make the code more
  expressive than before; and

- add unit tests for `Utils::IOSelector` (they’re a bit bloated
  because edge cases.)
@claui
Contributor Author

claui commented Jun 13, 2017

Clarification question: the old SystemCommand#each_line_from in cask/lib/hbc/system_command.rb which is deleted in this PR worked correctly without deadlock, and this change just factors it out to make it usable by the curl_output invocation, right?

@apjanke The old code seems to work correctly in all but a few edge cases, which I have never heard of or seen in the wild.

The new class now visits each readable stream (not just one) between subsequent IO::select invocations. The old SystemCommand#each_line_from always visits only one, which could starve out the other streams in rare cases.

Other than that, yes, this PR simply factors out the existing code to make it reusable for #2466.

This is complicated enough that maybe some of the explanation here should go in comments on the new IOSelector code. I don't think the deadlock/buffer issue would occur to me just from reading the source code.

Good idea. I have added a class-level comment with a short explanation and a link to #2466.

@claui
Contributor Author

claui commented Jun 13, 2017

Yeah, I think there's a minor bug here, having to do with the design of readline_nonblock. It wasn't introduced by this PR, but now's a good time to fix it, especially if we're expanding its use.
[…]
Do you folks see the same issue? Or am I misreading this?

@apjanke You’re correct. This is an existing bug in our (monkey-patched) IO#readline_nonblock.

As a real-life example, io_selector_spec.rb actually includes a few workarounds for this in several places; for example, while setting up the test fixture to simulate a pair of co-dependent streams, I needed to do:

wait(1).for { queue.pop }.to end_with("\n")

(Note: per RSpec::Wait’s contract, this causes the block to be called repeatedly until the #end_with expectation is fulfilled.)

Without the bug, the following code would have been sufficient:

wait(1).for { queue.pop }

I’d rather not have a fix for #readline_nonblock included in this PR at hand as it’s actually a whole different issue, and given that it doesn’t really affect Homebrew (yet).

In order to still get it fixed (in time before someone starts relying on #readline_nonblock’s promise), I’ll file a separate PR right after #2466 is merged.

@claui
Contributor Author

claui commented Jun 13, 2017

The upshot of this particular PR is that it behaves the same as the current code (for reading text), but it's factored into smaller and more generic pieces so that binary-reading/non-line-oriented support can be added in #2466, right?

@apjanke That’s exactly what I had in mind. Thank you for this summary – and for your other suggestions. They have helped a lot!

@apjanke
Contributor

apjanke commented Jun 13, 2017

Great! Glad to be helpful; it's been a while since I've had a Homebrew PR that was really up my alley like this.

@apjanke
Contributor

apjanke commented Jun 13, 2017

I’d rather not have a fix for #readline_nonblock included in this PR at hand as it’s actually a whole different issue, and given that it doesn’t really affect Homebrew (yet).

In order to still get it fixed (in time before someone starts relying on #readline_nonblock’s promise), I’ll file a separate PR right after #2466 is merged.

Fine by me.

Member

@MikeMcQuaid left a comment

A few inline comments; some general comments to follow.

@@ -0,0 +1,79 @@
require "delegate"
require "English"
Member

This shouldn't be required; it's part of global now. It may not be needed in the spec_helper either.

module Utils
#
# The class `IOSelector` is a wrapper for `IO::select` with the
# added benefit that it spans the streams' lifetimes.
Member

I'm not sure I understand why this is a benefit; can you elaborate?

# The class `IOSelector` is a wrapper for `IO::select` with the
# added benefit that it spans the streams' lifetimes.
#
# The class accepts multiple IOs which must be open for reading.
Member

Does that mean they must be already open? When would they not be open?

# added benefit that it spans the streams' lifetimes.
#
# The class accepts multiple IOs which must be open for reading.
# It then notifies the client as data becomes available
Member

What is the client? What is the notification?


alias all_streams keys
alias all_tags values
alias tag_of fetch
Member

Why are the aliases needed?

alias tag_of fetch

def self.each_line_from(streams = {},
separator = $INPUT_RECORD_SEPARATOR, &block)
Member

Can you indent this further in? Currently this looks like it's a line in the function.

alias all_tags values
alias tag_of fetch

def self.each_line_from(streams = {},
Member

It's not obvious what the streams input hash format should be.

end

def readable_streams
IO.select(pending_streams)[0]
Member

I'd favour .first here

@MikeMcQuaid
Member

Thanks for this @claui, I like where this is going. Some general comments:

  • both the current use-cases for this involve reading from Open3.popen3. Perhaps it would be worth making a higher-level abstraction around popen3 that runs the process rather than a separate method that's currently more reusable but not used in different situations
  • similarly, as a general comment: I think the desire for this to be generic and reusable harms the readability. I'd be in favour of limiting this pretty strictly to the current use-cases and it can be extended if needed in future. YAGNI may apply here.
  • I realise it's generally idiomatic Ruby but I find the definition of multiple functions that are only called in a single location to make the code much harder to follow what's going on. I'd advise unifying everything that's called < 2 times and then we can figure out how to split out if needed later on or just use e.g. variables to indicate intermediate state
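A rough sketch of the higher-level abstraction suggested in the first point, assuming the caller only needs tagged output chunks plus the exit status. `run_and_each_chunk` is a hypothetical name, not Homebrew API. The child in the usage example writes 1 MiB to stderr before printing to stdout; reading stdout to EOF before touching stderr would typically hang on it, while draining both pipes with IO.select completes.

```ruby
require "open3"
require "rbconfig"

# Hypothetical wrapper (not Homebrew API): runs a command through
# Open3.popen3, yields each output chunk tagged :stdout or :stderr, and
# drains both pipes with IO.select so that neither can fill up and stall
# the child. Returns the child's Process::Status.
def run_and_each_chunk(*command)
  Open3.popen3(*command) do |stdin, stdout, stderr, wait_thread|
    stdin.close
    tags = { stdout => :stdout, stderr => :stderr }
    pending = tags.keys
    until pending.empty?
      readable, = IO.select(pending)
      readable.each do |io|
        begin
          yield tags.fetch(io), io.read_nonblock(4096)
        rescue IO::WaitReadable
          next # rare: nothing to read after all, so wait in IO.select again
        rescue EOFError
          pending.delete(io)
        end
      end
    end
    wait_thread.value
  end
end

script = '$stderr.write("x" * (1 << 20)); puts "done"'
stderr_bytes = 0
stdout_text = +""
status = run_and_each_chunk(RbConfig.ruby, "-e", script) do |tag, chunk|
  if tag == :stderr
    stderr_bytes += chunk.bytesize
  else
    stdout_text << chunk
  end
end
```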

@stale bot added the stale (No recent activity) label Jul 7, 2017
@stale

stale bot commented Jul 7, 2017

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@claui
Contributor Author

claui commented Jul 14, 2017

@probot-stale[bot] Bear with me for another couple of days, little robot.

@stale bot removed the stale (No recent activity) label Jul 14, 2017
@MikeMcQuaid
Member

@claui How's this looking?

@claui
Contributor Author

claui commented Aug 6, 2017

@MikeMcQuaid Thanks for the feedback!

both the current use-cases for this involve reading from Open3.popen3. Perhaps it would be worth making a higher-level abstraction around popen3 that runs the process rather than a separate method that's currently more reusable but not used in different situations

I like this idea a lot. I’m in the middle of moving the whole thing one abstraction level up to see what happens.

It often amazes me how in coding, we tend to uncover proper abstractions along the way, not up front. A bit like a story that unfolds in a detective novel.

similarly, as a general comment: I think the desire for this to be generic and reusable harms the readability. I'd be in favour of limiting this pretty strictly to the current use-cases and it can be extended if needed in future. YAGNI may apply here.

Actually, I’m an avid YAGNI supporter. Thanks for pointing out cases where unneeded abstractions have crept in; I fully agree, and will gladly remove those.

I realise it's generally idiomatic Ruby but I find the definition of multiple functions that are only called in a single location to make the code much harder to follow what's going on. I'd advise unifying everything that's called < 2 times and then we can figure out how to split out if needed later on or just use e.g. variables to indicate intermediate state

I realise it's generally idiomatic Ruby but I find the definition of multiple functions that are only called in a single location to make the code much harder to follow what's going on.

I believe this is somewhat of a trade-off. Even if we completely leave aside the “idiomatic Ruby” argument: In my experience, code gets read much more often than it gets written. Unfortunately, most developers also seem to prefer writing code over reading. This is why I consider myself a readability evangelist.

In that specific case you have mentioned, I don’t see yet how separating code into functions, if named properly, could hurt readability. Would you mind giving an example? I. e., could you please name a specific example where inlining code would make it easier to read?

I'd advise unifying everything that's called < 2 times and then we can figure out how to split out if needed later

Just to make sure we’re on the same line here, I assume that by unifying, you mean inlining all functions which are only used in one place.

I feel you have a good point, and I’ll be happy to inline the methods as requested; that said, I’d also like you to know the reasons why I tend to avoid inlining.

When reading or writing code, we consume abstractions all the time, e. g. when calling a standard library function, or even a language keyword. If we only looked at the “how often is this method called?” metric, we’d end up inlining things like system calls, too. This is why I believe there must be more to it. In other words, I feel we should not focus on the single question of “are we using this just once, or more than once?”.

There probably is no definitive answer; therefore, I’ll leave it at that for now. I think I’m going to finish that change in abstraction/inlining soon.

@MikeMcQuaid
Member

In that specific case you have mentioned, I don’t see yet how separating code into functions, if named properly, could hurt readability. Would you mind giving an example? I. e., could you please name a specific example where inlining code would make it easier to read?

My example would be each_readable_stream_until_eof. It's only used once, and to understand what each_line_nonblock is doing, I immediately need to scan down the file to find the each_readable_stream_until_eof method. Then I hit readable_streams and need to see what it's doing, and jump between three functions rather than one. In general, I think reading from the top of a given function to the bottom without having to jump around a file makes it easier to follow what's going on and why.

Just to make sure we’re on the same line here, I assume that by unifying, you mean inlining all functions which are only used in one place.

Yep!

In other words, I feel we should not focus on the single question of “are we using this just once, or more than once?”.

Yep, I do agree here. I guess that was my specific recommendation for this PR in particular as I'm finding it hard to follow what's going on due to having to jump between function calls which abstract behaviour in a way that's opaque without a method name a similar length to the function itself.

@apjanke
Contributor

apjanke commented Aug 7, 2017

I'm normally in favor of encapsulating stuff in well-named functions for readability, but I agree with @MikeMcQuaid in this case in terms of readability. It feels like I'm having to keep a lot of abstractions in my head in this case here, and just seeing the inlined implementation would reduce cognitive load.

That's the perspective of me as a relative Ruby newbie, at any rate.

@MikeMcQuaid
Member

@claui @apjanke This is a good read on this topic: https://medium.com/@copyconstruct/small-functions-considered-harmful-91035d316c29

@MikeMcQuaid
Member

Closing this as there's been no updates for over a month. We'll review a new PR with comments addressed.

@MikeMcQuaid closed this Sep 5, 2017
@Homebrew locked and limited conversation to collaborators May 4, 2018
Labels
None yet
Projects
None yet

4 participants