Implement wrapping functions as iterators #257
Conversation
Oh no, performance has regressed for … There are three places where we allocate vectors that could be replaced with iterators after this change: in …
Hey @Koxiaet, thanks for putting this up. It'll take a bit to digest it with Christmas coming up 😄 I'll try and test it and drop some comments over the next few days.
When I was playing with this in the past, I remember that the trade-off between using vectors and iterators wasn't always clear. Sometimes allocating a few vectors and iterating over them is faster than doing everything with iterators. I'm not 100% sure how it works, but I guess it's a matter of having more complex code vs using more memory upfront.
It would be interesting to see the impact of this on the performance. But please separate this into other commits so that you can easily move it to a separate PR.
    WrapOptimalFit {
        fragments: fragments
            .into_iter()
            .enumerate()
            .rev()
            .map(|(i, data)| {
                let eol = i + 1 == line_start;
                if eol {
                    line_start = minima[line_start].0;
                }
                (data.fragment, eol)
            })
            .collect(),
    }
}
Is the WrapOptimalFit struct needed here? It seems to simply contain a list of tuples, and I guess another .map here could make it unnecessary?
The key thing is that WrapOptimalFit::next calls .pop() on the underlying vector, which means that it reverses the order of iteration (which is essential for the algorithm to work). To still support getting the vector directly with Vec::reverse like we had before, WrapOptimalFit also has a .into_vec() method.
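A minimal sketch of the pattern described here (simplified, with assumed names and fields rather than the PR's actual definitions):

```rust
// Simplified sketch of a pop-based iterator with an `into_vec` escape hatch.
// The struct name and field are assumptions for illustration only.
pub struct WrapOptimalFit<T> {
    // Items are stored in reverse order, so popping from the back yields
    // them in forward order without an up-front Vec::reverse.
    items: Vec<T>,
}

impl<T> WrapOptimalFit<T> {
    /// Returns the remaining items as a vector in forward order,
    /// reversing once instead of popping element by element.
    pub fn into_vec(mut self) -> Vec<T> {
        self.items.reverse();
        self.items
    }
}

impl<T> Iterator for WrapOptimalFit<T> {
    type Item = T;

    fn next(&mut self) -> Option<T> {
        // Popping from the back undoes the reversed storage order.
        self.items.pop()
    }
}
```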
    pub fn wrap_optimal_fit<F, W>(
        fragments: F,
        line_widths: W,
    ) -> WrapOptimalFit<<F as IntoIterator>::Item>
Is there a reason not to let the functions return impl Iterator instead of concrete structs? That should slim down the API, just like I did in #201.
For both of the iterators I have a .terminate_eol() function, which I think is very useful - I use both (terminating and non-terminating) variants in different parts of the library. It could be added as an extra parameter to the wrapping functions of course, but that makes the functions harder to call, and it is not obvious what the true in something like wrap_optimal_fit(fragments, widths, true) would mean - on the other hand, wrap_optimal_fit(fragments, widths).terminate_eol() is very clear.
Also, WrapOptimalFit::into_vec is useful for efficiency, and it couldn't exist with impl Iterator.
I haven't yet figured out what this bool is for 🙈 but I haven't looked super carefully at the code yet.
I looked more at the code and I think I see it now: instead of yielding full lines, your iterators yield once per Fragment. The bool here tells the caller if the Fragment you just saw is at the end of a line.

This sounds expensive to me since the caller needs to "touch" every fragment. The existing code allows the caller in wrap_first_fit to simply deal with a full line at a time.
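To make the per-fragment shape concrete, a caller of such an iterator might look like this (illustrative only; `(&str, bool)` stands in for the `(Fragment, bool)` items discussed here, and character count stands in for display width):

```rust
// Illustrative consumer: the caller sees every fragment and uses the
// end-of-line flag to act at line boundaries, without per-line buffers.
fn widest_line<'a, I>(wrapped: I) -> usize
where
    I: Iterator<Item = (&'a str, bool)>,
{
    let mut widest = 0;
    let mut current_width = 0;
    for (fragment, end_of_line) in wrapped {
        // Character count is a stand-in for real display width here.
        current_width += fragment.chars().count();
        if end_of_line {
            widest = widest.max(current_width);
            current_width = 0;
        }
    }
    // Account for a final line without an explicit end-of-line marker.
    widest.max(current_width)
}
```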
> Also, WrapOptimalFit::into_vec is useful for efficiency, and it couldn't exist with impl Iterator.

Would it be possible to implement Into<Vec> instead?
Do you have such a use case? While it would be technically cool to support …
No, I do not have a use case, and I agree we shouldn't bother actually supporting it until someone asks. But designing it this way would avoid breaking changes should someone ask. Also, I still like iterators over …
It's worth mentioning that the previous implementation was technically incorrect; if the fragments' widths had been nearing the integer limit, it would have had all sorts of problems with integer overflow. On the other hand, this version only uses checked arithmetic, which does degrade performance but allows it to be more correct. I benchmarked this version using the (incorrect) unchecked arithmetic and performance increased by up to 10%.
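For readers unfamiliar with the distinction, the checked version of a width sum looks roughly like this (illustrative helper functions, not the PR's actual cost computation):

```rust
// With widths close to usize::MAX, the unchecked sum overflows: it panics
// in debug builds and silently wraps around in release builds.
fn total_width_unchecked(widths: &[usize]) -> usize {
    widths.iter().sum()
}

// The checked version pays for an overflow test on every addition but
// reports the overflow instead of producing a wrong result.
fn total_width_checked(widths: &[usize]) -> Option<usize> {
    widths.iter().try_fold(0usize, |acc, &w| acc.checked_add(w))
}
```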
Yeah, I'm fighting with those overflow problems in #247... My thinking is that …

Allowing the input string to come from an iterator has been discussed in #224 — not for the purpose of wrapping strings larger than …
Yes, good point. I think we agree that a fully iterator-based design is the most flexible and powerful design. My "plan" was roughly to use iterators for all the helper functions and then let …

It's of course always a question if the flexible solution is reasonable in terms of complexity and performance. The code already got slower when I introduced the …
Yeah, fair enough.
I looked at the flamegraph to try and find what was causing the slowdown. Here is the flamegraph on the old code: [flamegraph image]

And here is the flamegraph on the new code: [flamegraph image]
But the time spent on …
@mgeisler Please can you finish reviewing this? It's starting to conflict.
Good morning! Yes, I've also had this on my mind the last few days. The conflicts should be trivial: I just moved the …

You mentioned integer limits above and I've put up #259 to handle this. It's not merged yet since I haven't heard back from @robinkrahl if this approach is really better than a simple …
I'm not sure I understand this? The helpers stay the same, but the …

Today the input and output types of the helper functions look like this (where I wrote …):

    let line: &str = ...;
    let words: Iterator<Word> = find_words(text);
    let split_words: Iterator<Word> = split_words(words);
    let broken_words: Vec<Word> = break_words(split_words);
    let wrapped_lines: Vec<&[Word]> = wrap_first_fit(&broken_words);
    let cow_lines: Vec<Cow<str>> = ... // loop over each wrapped line and slice `line`

With your changes, the work is shifted a bit since …:

    let line: &str = ...;
    let words: Iterator<Word> = find_words(text);
    let split_words: Iterator<Word> = split_words(words);
    let broken_words: Vec<Word> = break_words(split_words);
    let wrapped_words: Iterator<(Word, bool)> = wrap_first_fit(&broken_words);
    let cow_lines: Vec<Cow<str>> = ... // loop over each Word, check if eol is set and slice `line`

The …

As a whole, I'm concerned about the 50% slowdown for the otherwise fast …

    // This currently isn't as efficient as it could be as we collect into a vec even when it
    // isn't necessary.
    let wrapped_words = match options.wrap_algorithm {
        core::WrapAlgorithm::OptimalFit => core::wrap_optimal_fit(broken_words, line_lengths)
            .terminate_eol()
            .into_vec(),
        core::WrapAlgorithm::FirstFit => core::wrap_first_fit(broken_words, line_lengths)
            .terminate_eol()
            .collect(),
    };

As you mention, this vector is much bigger than the old per-line vector. This points to a problem I ran into with iterators: it's super annoying to match up the types between different iterators. Whereas one vector of … works in both cases, we cannot write

    let wrapped_words = match options.wrap_algorithm {
        core::WrapAlgorithm::OptimalFit => core::wrap_optimal_fit(broken_words, line_lengths),
        core::WrapAlgorithm::FirstFit => core::wrap_first_fit(broken_words, line_lengths),
    };

since the two functions will return different iterators — but we can easily assign the results to the same variable when we collect into a vector. So if …
You will end up collecting at least once, but at the moment there are three more collects than necessary -
I wouldn't consider my
This is what I was referring to when I was talking about the collects earlier; I also suspect that.
Yes, agreed. I have a plan to make an enumeration type over …
Yes, and
I generated the above with
So that's indeed a nice win 👍
I see that it's roughly doing the same since the tests pass 😄 However, it's still somehow ~30-35% slower than before:
So this means this PR is offering more code 👎, more public API surface 👎, more future flexibility 👍, potential future savings 👍, and an immediate performance regression 👎. The balance of this seems to be potential future gains, and we pay for this with more complexity now. I think the way forward is to simplify this change as much as absolutely possible. Rip out anything that isn't essential — send PRs with the uncontroversial parts such as your nice explanation of the cost matrix. You also added a TODO about making …

I cannot expect you to know the history of the library, but as I've mentioned before, the library changed a lot between version 0.12 and 0.13. I hated the way the code had grown to be in version 0.12.1: it was messy, overly complex, and inflexible. On the other hand, I'm so far very happy with the cleanups I've done in the last few months. So this is why I'm pushing back on what seems to be "regressions" compared to these PRs:
So it seems much of the conflict here is over whether to use …

First of all, an argument very similar to the ones I used in the comments above - lines of code != cost. Complexity is cost. And although manual iterator types introduce many more lines of code, their complexity is identical to …

Second, being able to name types is useful. Yes, this limitation won't exist in a few years once …

Third, precedent. Once again I will point to the standard library, which never uses …

Fourth, it allows us to implement traits conditionally. A trait impl like "only implement …"
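As an illustration of the fourth point, a named iterator type can forward a trait conditionally, which a bare impl Iterator return type cannot express (hypothetical wrapper, not a type from this PR):

```rust
// Hypothetical named wrapper around another iterator.
pub struct Wrapped<I> {
    inner: I,
}

impl<I: Iterator> Iterator for Wrapped<I> {
    type Item = I::Item;

    fn next(&mut self) -> Option<Self::Item> {
        self.inner.next()
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        // Forwarding the size hint lets ExactSizeIterator::len work below.
        self.inner.size_hint()
    }
}

// The conditional impl: Wrapped<I> is an ExactSizeIterator only when the
// inner iterator is one. Callers keep this extra capability when it exists.
impl<I: ExactSizeIterator> ExactSizeIterator for Wrapped<I> {}
```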
I think the main issue is that this change enables potential future flexibility and that the cost for this is paid now. The performance regression for …

I understand that this is a stepping stone towards a more streaming API. In the limit, this could allow wrapping text from an …
This makes me say that we need some real use case to pay the price of the performance hit. Ideally, the change should be a performance improvement — which would justify what I see as downsides in terms of code organization and more types in the public API. I had a streaming API in the past — and crates like clap would simply wrap a
I tried to point to the old code above to show how it was much more annoying to work with — the artificial split between the function returning the iterator, the struct itself, and the
Is this not simply because the code was written long before |
I think that's the problem here - we have different definitions of "simple" and "complex". I want this API, because I consider iterators to be simpler than vectors, even if they have more lines of code. I believe that APIs, in order to be maximally simple, should do one thing and one thing only. An API that returns a sequence of values shouldn't do other things like allocate, it shouldn't prescribe a collection type for the user, it shouldn't require that slices of fragments are what the user needs to pass in - these are all out of scope and over-complicate the API. Iterators on the other hand, while they do need more code to implement, are theoretically simpler because they don't do anything other than their singular purpose. So if there's a slight performance loss or there are more types in the public API, so be it. To me it's far better to have a simple API than a fast one or a small one (unless it's really slow of course 😄). This situation is an interesting clash between the two philosophies of "do one thing and do it well" and using as few lines as possible 😆. I can understand why you'd subscribe to the latter, I just find it a bit reductionist.
Hmm, this is somewhat of a problem. Maybe more aggressive feature-flagging is the way to go? I see you're already doing that a bit with the other PRs in this repo.
Is having them next to each other in the source not enough? I agree it's not quite as together as if it was all in one function, but it's not that much harder to navigate.
No, the standard library still prefers named types. For example, the recently added …
    .map(|(i, data)| {
        let eol = i + 1 == line_start;
        if eol {
            line_start = minima[line_start].0;
        }
        (data.fragment, eol)
    })
Have you noticed how this changes the complexity of this piece of code to O(number_of_fragments)? Before it was O(number_of_lines). Because wrap_optimal_fit returns an iterator which yields every single Fragment, this seems to be built into the design — which is unfortunate. It feels like a "tax" that we should be able to avoid.

Could wrap_optimal_fit and wrap_first_fit instead return an Iterator<Item=Line> where Line is itself something which encapsulates, well, a line of text? This would basically mean grouping the Fragments according to the EOL bool you have, but the rest of the layers would no longer deal with individual Fragments but instead deal in full lines. Similar to how the upper layers deal in &[Fragment] today.
The problem with a Line type is that the complexity within the function would still be O(number_of_fragments) as each fragment would have to be copied into the line, and it would incur a fairly large allocation cost - each line would probably need a Vec.

Almost all code that uses the returned iterator currently iterates over each fragment in the line, so the overall complexity would remain the same most of the time. This approach also avoids double-loops, which might be faster overall, but I'm not sure.
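For illustration, a grouping adapter along the lines discussed here could look like this (hypothetical sketch, not code from the PR; it consumes the `(F, bool)` items and pays the per-line Vec allocation mentioned above):

```rust
// Hypothetical adapter turning a per-fragment iterator into a per-line one.
pub struct Lines<I> {
    inner: I,
}

impl<I, F> Iterator for Lines<I>
where
    I: Iterator<Item = (F, bool)>,
{
    type Item = Vec<F>;

    fn next(&mut self) -> Option<Vec<F>> {
        // Each yielded line needs its own allocation.
        let mut line = Vec::new();
        for (fragment, end_of_line) in &mut self.inner {
            line.push(fragment);
            if end_of_line {
                return Some(line);
            }
        }
        // Emit a trailing line that was not explicitly terminated.
        if line.is_empty() {
            None
        } else {
            Some(line)
        }
    }
}
```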
    #[rustfmt::skip]
    let line_lengths = |i| if i == 0 { initial_width } else { subsequent_width };
    // This currently isn't as efficient as it could be as we collect into a vec even when it
    // isn't necessary.
Do you think it will be possible to reconcile this somehow so that both wrapping functions return the same type of iterator? If not, then it seems we lose the efficiency here?
I want to later introduce an enum iterator that can be WrapFirstFit or WrapOptimalFit, which will hopefully gain back the efficiency. But I want to keep that in a separate PR.
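The enum mentioned here could look roughly like this (hypothetical sketch, not part of this PR): both wrapping functions' iterators could then be assigned to a single variable without collecting or boxing.

```rust
// Hypothetical enum over the two wrapping iterators.
pub enum WrapIter<A, B> {
    FirstFit(A),
    OptimalFit(B),
}

impl<A, B, T> Iterator for WrapIter<A, B>
where
    A: Iterator<Item = T>,
    B: Iterator<Item = T>,
{
    type Item = T;

    fn next(&mut self) -> Option<T> {
        // Delegate to whichever variant is active.
        match self {
            WrapIter::FirstFit(iter) => iter.next(),
            WrapIter::OptimalFit(iter) => iter.next(),
        }
    }
}
```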
Yes, I think you hit the nail on the head there 😄 I agree that an … However, I still think there are a bunch of trade-offs here:
I get the theoretical purity of this — however, I find it hard to agree that all the …

Similarly, I'm looking for something concrete which is improved by switching things to using …

It would be cool to see a function which can take input in an incremental fashion — that's something which would require extra refactoring to support today (though the divide-and-conquer approach in #259 seems like it could be generalized).
One could argue that wrapping text is the goal of the library 😄 Until very recently, this was all it could do: wrap text. There was an underlying assumption that the caller would be able to hold both the input and the output in memory. This makes …

That being said, #224 makes a case for keeping state while wrapping — I mentioned this above somewhere. So yeah, I would like textwrap to play well with situations where the input isn't known up-front or where the output isn't consumed all at once.
Yeah, I started that recently as a reaction to that discussion. I think it's already pretty modular now... While I don't like the idea of including a huge library for a tiny function, I also don't like the idea that people turn off useful functionality in order to make binaries faster to build — that's what we have caches for. I'm used to Bazel at work, which makes it feasible to have both a lot of dependencies and reasonably fast builds due to extensive caching.

I hope that textwrap can still carry its own weight by providing more correct and better wrapping than what people can hand-roll in an afternoon. The corner case bugs I've fixed over the years tell me that wrapping text isn't completely trivial. The second commit already fixed a small bug: 0e07230 😄 In any case, we should keep an eye on this aspect too, even if it's a bit annoying.
Yeah, it's of course not that much harder, but it just feels like a small step backwards to me. I think it's a matter of ensuring that things are self-contained. If you see
Okay, thanks, I was not aware of this at all 👍
To sum up my concerns:
The natural API for
I could definitely do this - in fact, this can even be done with the current code. You'd just have to replace the for loop with a
So I think the solution here is to break up the modules - while it's confusing to have it return a named iterator type that's defined lower down, next to lines that could be for anything else, it makes it much less confusing if it's all in
I should mention the original motivation for this PR. I'm writing some code for a UI library whose API looks roughly like this:

    struct Text { ... }

    impl Text {
        fn new(s: String) -> Self { ... }
    }

    impl Component for Text {
        fn draw(&self, canvas: Canvas) { ... }
    }

The …
This is the first mergeable PR using the ideas from #244.

Motivation:

- no_std. The current implementation of wrap_first_fit does not require allocation at any stage, and so can be used in no_std contexts if we support that in the future.
- wrap_first_fit and wrap_optimal_fit now do not require the fragments to be stored contiguously, and instead simply require any iterator over fragments. This is done without losing any performance (in theory at least). This allows piping the result of splitting functions directly into wrapping functions without collecting into a vector first.

Implementation notes:

- Fragment is implemented for &F so that passing a slice into wrap_first_fit won't break (as iterating over a slice produces references); a sketch of this is included below.
- The WrapFirstFit and WrapOptimalFit iterators iterate over (F, bool) tuples where the bool indicates whether it's the last fragment of its line; from experience of porting this library's code and tests to the new system it is about equally easy to use.
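A sketch of the first implementation note (the Fragment trait shown here is a simplified stand-in; the method names are illustrative):

```rust
// Simplified stand-in for the library's Fragment trait.
pub trait Fragment {
    fn width(&self) -> usize;
    fn whitespace_width(&self) -> usize;
    fn penalty_width(&self) -> usize;
}

// Blanket impl for references: iterating over a slice yields &F, so this
// keeps slices usable as input to the iterator-based wrapping functions.
impl<'a, F: Fragment + ?Sized> Fragment for &'a F {
    fn width(&self) -> usize {
        (**self).width()
    }
    fn whitespace_width(&self) -> usize {
        (**self).whitespace_width()
    }
    fn penalty_width(&self) -> usize {
        (**self).penalty_width()
    }
}
```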