Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SequenceReader<T>: Round 2 #69
SequenceReader<T>: Round 2 #69
Changes from all commits
8232b90
b4aa05d
ab6feeb
09bc6ec
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why force a defensive copy span allocation if its not required?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also should there be a return flag to indicate if the
copyBuffer
was used?Otherwise there is ambiguity. Is there is a writable version (or copy) in the copyBuffer? Can the copyBuffer be reused prior to consuming the
ReadOnlySpan
(i.e. is it part of the ReadOnlySpan or was it unused).Does it throw an exception if the
copyBuffer
is too small (assume so); but what if the copy buffer is not needed because the data is a single span, but the copy buffer is too small; does that still throw an exception?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To resolve the ambiguity; especially around exceptions, maybe it should be:
Then the pattern would be something like:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems really rare that you'd care, though, right? Presumably you'd just assume you can't use the supplied scratch buffer again until you're done with the returned span. And in the rare case where you actually cared, you could use Span.Overlaps to determine whether the returned span is part of the scratch buffer or not.
That said, I agree in general that this pattern is confusing, and it'd be better not to force the caller to allocate until they know they have to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Problem is if you want to modify the returned data (mask/unmask etc) then with
You need one scratch buffer for the call (which may or may not be used); then a second scratch buffer to copy the returned span into; whereas it would be easier just to jump straight to
.CopyTo
For just reading; then the scratch buffer is unnecessary for the fast-path; but you need a
.Length
sized buffer allocated just in case and if the wholeReadOnlySequence
is being read then the scratch buffer is unlikely to be used for more than a single read (e.g. returned to pool prior to awaiting for more data).Also question on what happens if
.Length
is greater than 2Bn elements.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Branchless made it worse:
Not terribly surprising- ripping the span out being costly is the only reason
SequenceReader
is ref after all. :) Flipping to return a span made it slightly better, but still slower thanTryAsSpan
and is pretty awkard imho:I'm not sure what you're asking here? Having the
SliceOrCopy
methods period is due to the performance improvement in the normal single segment case. Having one without a delegate is again about performance- as crossing segments is so unlikely,new T[]
is a good-enough solution for many. Given that it is expected to be common and is faster it makes sense to have an "even-faster" overload for "default" behavior.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are the results I get copying and pasting the benchmarks you shared:
I ran it multiple times, the results always come out relatively the same, with the TwoStage_Array being faster than SliceOrCopy_Array. Note that that's with:
since it wasn't clear to me from what you wrote what _arraySequence was. If instead change it to be an instance, the versions end up being almost the same perf-wise for me:
We're talking nanosecond differences here, in either direction. Maybe it's just minor differences on the machine. Maybe it's the processor involved. Maybe it's something else. But let's be very careful not to add lots of APIs because we found a microbenchmark showed an improvement on one machine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, why have these helpers at all? They're one-liners, using public APIs that anyone can use. When I run the same test with:
for example, I get:
So, if we're going for utmost in throughput, worried about developers concerned about this level of microoptimization, and if these numbers are to be believed, then why have the method at all? Let the developer use the ReadOnlySequence APIs however they need, rather than trying to come up with the various "helper" APIs on top of them that are all effectively one-liners.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is the complete code:
https://gist.github.com/JeremyKuhne/2eb803b80058fa8b108c81352bc31ba0
Ditto:
Well, sure, let's not be blind. But lets also not throw this out because some machines don't get a boost. For the record, my run is on a 6 core i8700K with a 2.1 .NET Core console app and the current BenchmarkDotNet (CTRL-F5ed of course) on Windows.
They are simple because these are meant to validate the overhead involved. The whole point of the Allocator is to allow plugging in whatever you want- which may be significantly more complicated. The ArrayPool tracker I mentioned above is one such example. I plan to move onto
The simple
SliceOrCopy_Array
with no arguments as it makes writing the typical case easy. It also helps discoverability of the fact that.ToArray()
is potentially inefficient. I wouldn't toss it simply because someone can easily rewrite the code.For the record, here is what I get plugging in your
OpenCoded
:Not as much as an improvement for me.
Having these methods does not preclude you from writing your own logic. Having them, however, allows easy piping of the Allocator approach (e.g. I can take one as input to
SequenceReader
), facilitates reusability of solutions, and makes things significantly faster on some machines. All of that makes it pretty compelling to me.If you could try to run again with the code in the gist and share your configuration perhaps we might find out what it is that makes your run slower than mine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And is slower on some machines apparently, and clutters the surface area (which can actually make it harder for a consumer of the API to do the "right thing'"), and adds surface area that may encourage developers to allocate when they don't actually need to, and a developer who's really concerned about eeking out the best possible throughout arguably won't use them, anyway, and for everyone else a nanosecond difference doesn't matter, and they're barely shorter than just writing out the ternary manually. As for "piping the allocator", that could still be used by ToArray if it's desirable, so that doesn't seem to actually change anything.
I will try the shared code again this weekend on a few machines. But at the moment I'm not seeing the benefit here (for ToArray, yes, for SliceOrCopy, no). I would rather we teach developers about the few core APIs that are there and how to best use thm efficiently, saving additional APIs we add for when they're truly meaningful (like the general reader APIs you've been working on).