Return EOF and ErrUnexpectedEOF correctly #64

MarcoPolo · 2022-03-10T00:02:00Z

Context

Fixes #62. See that issue for context.

Proposed solution

At first I considered finding every spot after our first read and returning ErrUnexpectedEOF there, but that was error-prone.

Instead I created a new wrapper type that implements io.Reader but keeps context on whether this is the first read or not. If it is the first read it returns what the inner reader returns. Otherwise, if it isn't the first read and the inner reader returns EOF, this reader knows that we ran into an EOF in the context of expecting to read more data from this inner reader, so the wrapper reader returns ErrUnexpectedEOF instead of EOF.

This does add overhead, so I'm curious to see if we notice it.

r? @Stebalien

Stebalien · 2022-03-10T00:39:33Z

I pushed a small fix to make link scanning a bit faster, but it's still pretty terrible.

Also, I think we need to update "has read once" in all the "read a byte" methods, which will make performance worse.

You can test performance with go test -bench . in the testing directory.

MarcoPolo · 2022-03-10T01:35:51Z

The ScanForLinks method is pretty small so we could avoid allocation there like in my latest commit.

Stebalien · 2022-03-10T02:14:14Z

Yeah, the performance seems reasonable now. It's slightly slower, but not by much.

Also regenerate tests, that's how I found this.

Stebalien · 2022-03-10T02:21:07Z

Hm. Actually, it's about 2x slower on decode and has 3x the number of allocations. I needed to regenerate the tests to find that.

Stebalien · 2022-03-10T02:41:43Z

So, I've done what I can, but it's still much slower. Remaining things are:

- GetPeeker has lost some of its magic because it can't see through the new reader type.
- I'm not sure where the new unmarshal allocations are coming from.

I recommend testing with:

go test -bench . -benchmem -memprofile=mem.out -cpuprofile=cpu.out

Then look at the allocation counts and cpu profile with pprof.

MarcoPolo · 2022-03-10T04:13:56Z

oh I see my mistake, I didn't realize I needed to regen the _gen.go files

MarcoPolo · 2022-03-10T05:01:28Z

https://gist.github.com/MarcoPolo/b72fdd0e43e04d0721a5056b4f579190

Now we're at fewer allocs than before after the GetPeeker change, but still 8 more than before.

MarcoPolo · 2022-03-10T05:32:41Z

ah I see what's happening.

We pass the reader from GetPeeker to the nested calls to UnmarshalCBOR. That reader isn't a ReaderWithEOFContext, so we allocate a new one. (Also, I think my last commit subtly breaks this, since it essentially unwraps the Reader from ReaderWithEOFContext and drops the context.)

I wonder if doing a defer at the top level would also solve this without requiring us to wrap the reader. I'll try this tomorrow.

Thanks for the tips :)

MarcoPolo · 2022-03-10T06:24:07Z

Alright, I'm using defer now to remove the extra allocation from the wrapped reader. My understanding is that simple deferred function calls like this are inlined by the compiler (please correct me if I'm wrong).

I updated the benchmark results here: https://gist.github.com/MarcoPolo/b72fdd0e43e04d0721a5056b4f579190

So no extra allocation compared to master, but a smidge slower.

Also, I think this needs a more careful review since it now relies on us doing this at the first read in Unmarshal/ScanForLinks.

Stebalien

This actually seems quite a bit cleaner.

Stebalien · 2022-03-10T16:27:04Z

Also fixed Deferred to handle EOF.

From my benchmarks, any perf difference is noise. But we should get a third-party review from someone not involved.

MarcoPolo · 2022-03-10T16:28:26Z

Let me add a test case here too

MarcoPolo · 2022-03-10T16:57:49Z

(test currently fails, I'll investigate in a bit)

Stebalien · 2022-03-10T18:13:25Z

Ah... we're doing error wrapping internally.

MarcoPolo · 2022-03-10T18:39:53Z

Yup, I switched to using errors.Is, which does make me a bit uneasy since it needs to traverse the error chain to see if an error matches.

The benchmarks don't show much though: https://gist.github.com/MarcoPolo/b72fdd0e43e04d0721a5056b4f579190

Stebalien · 2022-03-10T19:02:48Z

gen.go

-		if err == io.EOF {
-			err = io.ErrUnexpectedEOF
+		if errors.Is(err, io.EOF) {
+			err = xerrors.Errorf("%w: %v", io.ErrUnexpectedEOF, err)


This doesn't quite work. %w needs to be last.

It's probably fine to just leave it as-is? The only thing we really care about is not returning an EOF if we've read bytes.

MarcoPolo · 2022-03-10

Does it? https://go.dev/play/p/k2wOyoCk4WO makes it seem like both work.

I figured we cared enough to wrap the original error, so might be nice to keep that info.

Stebalien · 2022-03-10

Ah, being last only matters for Unwrap. (docs: https://pkg.go.dev/golang.org/x/xerrors#Errorf)

> I figured we cared enough to wrap the original error, so might be nice to keep that info.

My point was that we could just leave everything as-is. I.e., we don't want an EOF unless we read no data, but we're fine with any other error (including a wrapped EOF) otherwise.

But if this works and doesn't hurt performance too much, then it's fine by me.

MarcoPolo · 2022-03-10

> Ah, being last only matters for Unwrap. (docs: https://pkg.go.dev/golang.org/x/xerrors#Errorf)

From that link:

> If the format specifier includes a %w verb with an error operand in a position other than at the end, the returned error will still implement an Unwrap method returning the operand, but the error's Format method will not return the wrapped error.

What is the "error's Format method"?

Anyway, I understand what you mean by as-is. I changed this to return the wrapped error and only replace if we return EOF. I also changed the test to only check that we aren't returning EOF exactly if it could read some bytes.

Stebalien · 2022-03-10

> What is the "error's Format method"?

🤷‍♂️ I'm having a lot of difficulty with those docs.

MarcoPolo · 2022-03-17T19:23:52Z

Friendly bump on this @Stebalien :)

Stebalien

LGTM!

MarcoPolo and others added 2 commits March 9, 2022 15:51

Use readerWithEOFContext to handle EOF vs ErrUnexpectedEOF cases

099ab3e

fix: specialize discard for readerWithEOFContext

6eb0095

Stebalien force-pushed the MarcoPolo/issue62 branch from c098cce to 6eb0095 Compare March 10, 2022 00:37

MarcoPolo added 2 commits March 9, 2022 17:19

Check hasReadOnce in readByte methods

0326313

Avoid allocation in ScanForLinks

eecee09

fix: export eof reader and fix compile

1fdee4f

Also regenerate tests, that's how I found this.

feat: avoid creating new ReaderWithEOFContexts where possible

3dcc3ab

Fix GetPeeker magic

4343a5f

Use defer and remove ReaderWithEOFContext

22488b2

MarcoPolo force-pushed the MarcoPolo/issue62 branch from f178403 to 22488b2 Compare March 10, 2022 06:19

Stebalien reviewed Mar 10, 2022

Stebalien added 2 commits March 10, 2022 08:24

feat: simplify "read once" checks

083da62

fix: check for unexpected EOF in deferred unmarshaling

86717ea

Add TestErrUnexpectedEOF

e120cdc

Use errors.Is to detect wrapped EOF errors

ae38a94

Wrap error at end so we don't lose error context

b23fed4

Stebalien reviewed Mar 10, 2022

MarcoPolo added 3 commits March 10, 2022 12:13

Remove expectation of ErrUnexpectedEOF we just want to make sure we don't get EOF

19357f6

Revert "Wrap error at end so we don't lose error context"

0396fd1

This reverts commit b23fed4.

Revert "Use errors.Is to detect wrapped EOF errors"

c353e0a

This reverts commit ae38a94.

Stebalien approved these changes Mar 17, 2022

Stebalien merged commit 87edca1 into whyrusleeping:master Mar 17, 2022

iand mentioned this pull request Mar 18, 2022

Replace scratch buffers with pools #65

Closed

MarcoPolo deleted the MarcoPolo/issue62 branch March 21, 2022 17:17
