Add (*Reader).CopyNext(w) (int64, error) #167

pwaller · 2016-11-11T17:56:10Z

It is useful to be able to efficiently copy objects without
decoding them.

My use case is filtering when I already know the indices of
the objects I want to keep, and for rewriting a dictionary
of objects as a column of objects.

This commit:

Exports GetNextSize.
Adds a method ReadNext(p) to *Reader.

I wasn't sure about exporting GetNextSize, but I can't see the
harm in it, and it may be useful for finer-grained control in
the copying.

I also experimented with a NextReader() io.Reader method which
returned io.LimitReader(m.R, GetNextSize()), but for my use case
this was twice as slow and also did not handle nested objects.
In principle that might be useful for very large simple objects,
but is not included in this PR.

This change is

philhofer · 2016-11-11T18:09:46Z

I think the errors returned from ReadNext need to implement msgp.Error to preserve that API contract. Beyond that, 👍

philhofer · 2016-11-11T18:15:04Z

Actually, as I squint harder at this, it's not clear that ReadNext does what you want it to do. The returned size value from getNextSize for an object encoded in a fixmap, for instance, is 1 byte; you don't get the size of the actual object.

pwaller · 2016-11-11T18:15:31Z

Ah. Any ideas on how to fix this or otherwise do it correctly?

pwaller · 2016-11-11T18:16:38Z

My implementation was inspired by Skip(), so, does skip have that problem or does it avoid that somehow?

philhofer · 2016-11-11T18:17:22Z

Skip operates recursively. You'd have to do the same thing.

pwaller · 2016-11-11T18:17:47Z

Oh, crap, I have not pushed the latest commit. Oops.

pwaller · 2016-11-11T18:18:10Z

Force pushed.

pwaller · 2016-11-11T18:18:41Z

Will fix Error return, too.

philhofer · 2016-11-11T18:20:37Z

Ok cool; this looks correct.

I might suggest ReadNext(w io.Writer) (int, error), though. That may make the API easier to use (e.g. buf := bytes.NewBuffer(...); r.ReadNext(buf); ...)

philhofer · 2016-11-11T18:21:46Z

Also, given the non-obvious nature of GetNextSize, perhaps it shouldn't be exported as-is.

pwaller · 2016-11-11T18:23:07Z

Nice idea for ReadNext would suit me well. How about CopyNext as a name? ReadNext as a thing taking a writer seems confusing.

For GetNextSize I agree, though I may re-propose that later if I find I do need it.

Still fixing the error contract.

philhofer · 2016-11-11T18:24:40Z

CopyNext sounds good to me.

pwaller · 2016-11-11T18:32:22Z

On second thought, Copy/io.Writer-style rather than Read means that the caller loses control of the buffer which is not good for allocation unless we do something ala io.CopyBuffer. Can we reconsider ReadNext as it is? Thoughts?
I pushed Error interface just now.
I am also unsure about whether the additional size check is necessary, do you have an opinion?

    n, err := m.R.ReadFull(p[:sz])
    if err != nil {
        return 0, err
    }
    if uintptr(n) != sz {
        return 0, fmt.Errorf("wrong # bytes read (%d != %d)", n, int64(sz))
    }

pwaller · 2016-11-11T18:32:57Z

(If (3) is necessary, does it want to be yet another named error in errors.go, or something else?)

Thanks for the rapid review by the way.

philhofer · 2016-11-11T19:05:08Z

The caller doesn't necessarily lose control of the buffer. bytes.NewBuffer(buf) will do the 'right' thing in the case where your buffer is large enough; you just need to call Reset() before copying the next object.

The size check should be unnecessary; it's a bug if ReadFull returns without error and doesn't fill the buffer.

philhofer · 2016-11-11T19:07:36Z

(The Copy API also doesn't require that you hold the object in memory, either; it could be used to proxy objects or sub-objects of arbitrary size.)

pwaller · 2016-11-11T19:09:41Z

The caller doesn't necessarily lose control of the buffer.

But in order to call the w.Write from inside CopyNext(w), I need to call ReadFull with a byte slice. From whence does this byte slice come?

philhofer · 2016-11-11T19:14:38Z

Yeah; you'd need to refactor the code a bit. (*Reader).Next() gives you a pointer directly into the reader's buffer.

philhofer · 2016-11-11T19:19:08Z

~~Also, *fwd.Reader implements WriteTo, so io.CopyN() shouldn't allocate a buffer.~~
EDIT: Never mind, io.CopyN doesn't use io.WriterTo. But still, if it were a bytes.Buffer, for instance, you'd still get the ReadFrom optimization.

It is useful to be able to efficiently copy objects without decoding them. My use case is filtering when I already know the indices of the objects I want to keep, and for rewriting a dictionary of objects as a column of objects.

pwaller · 2016-11-11T19:27:34Z

That was the bit I was missing. That is freaking awesome:

--- a/msgp/read.go
+++ b/msgp/read.go
@@ -180,6 +180,32 @@ func (m *Reader) ReadNext(p []byte) (int, error) {
    return tot, nil
 }

+// CopyNext reads the next object from m without decoding it and writes it to w.
+// It avoids unnecessary copies internally.
+func (m *Reader) CopyNext(w io.Writer) (int64, error) {
+   sz, o, err := getNextSize(m.R)
+   if err != nil {
+       return 0, err
+   }
+
+   // avoids allocating because m.R implements WriteTo.
+   n, err := io.CopyN(w, m.R, int64(sz))
+   if err != nil {
+       return 0, err
+   }
+
+   // for maps and slices, read elements
+   for x := uintptr(0); x < o; x++ {
+       var n2 int64
+       n2, err = m.CopyNext(w)
+       if err != nil {
+           return n, err
+       }
+       n += n2
+   }
+   return n, nil
+}
+
 // ReadFull implements `io.ReadFull`
 func (m *Reader) ReadFull(p []byte) (int, error) {
    return m.R.ReadFull(p)

I have fully squashed my patches and force pushed.

pwaller · 2016-11-11T19:28:53Z

Sorry, please hold, I forgot to remove ReadNext correctly.

Also, do we need to revisit in light of your strikeout in the above comment?

philhofer · 2016-11-11T19:30:44Z

I would use (*fwd.Reader).Next(), and then fall back to CopyN when that fails. The reasoning here is that very frequently, Next() will work, and it will only fail when you ask for more bytes than can fit in the read buffer, in which case the cost of CopyN will amortize well, even if it does have to allocate a buffer.

pwaller · 2016-11-11T19:41:07Z

Pushed. My current status: happy. Taking a break for a bit.

I learned some neat tricks tonight, thanks for that 😎

philhofer · 2016-12-13T04:54:28Z

@pwaller hey, did you want to finish this off? If not, I'm happy to do so myself.

pwaller · 2016-12-13T17:56:32Z

AFAIK, I think it's done? I've ticked "Allow edits from maintainers" if you would like to make any additional changes.

- only call (*Reader).Next() when we're sure it won't realloc its buffer - promote io.ErrUnexpectedEOF to msgp.ErrShortBytes

philhofer · 2016-12-13T19:38:38Z

Cool. Pushed a small change in the error handling to make it a little more consistent with the rest of the code.

Thanks again!

pwaller · 2016-12-13T19:52:48Z

👍

* Add (*Reader).CopyNext(w) (int64, error) It is useful to be able to efficiently copy objects without decoding them. My use case is filtering when I already know the indices of the objects I want to keep, and for rewriting a dictionary of objects as a column of objects. * Remove ReadNext * Add opportunistic optimization with m.R.Next * Remove unused ReadNextError * Remove commented code * small fixup - only call (*Reader).Next() when we're sure it won't realloc its buffer - promote io.ErrUnexpectedEOF to msgp.ErrShortBytes

pwaller force-pushed the read-next-object branch from 2cdb467 to 8a1bc12 Compare November 11, 2016 18:18

Add (*Reader).CopyNext(w) (int64, error)

5ca5615

It is useful to be able to efficiently copy objects without decoding them. My use case is filtering when I already know the indices of the objects I want to keep, and for rewriting a dictionary of objects as a column of objects.

pwaller force-pushed the read-next-object branch from 200fd71 to 5ca5615 Compare November 11, 2016 19:27

pwaller added 4 commits November 11, 2016 19:31

Remove ReadNext

2684df0

Add opportunistic optimization with m.R.Next

e5aa905

Remove unused ReadNextError

37ba031

Remove commented code

eef5f70

pwaller force-pushed the read-next-object branch from 2d1a0cb to eef5f70 Compare November 11, 2016 19:40

pwaller changed the title ~~Export GetNextSize and implement ReadNext~~ Add (*Reader).CopyNext(w) (int64, error) Nov 11, 2016

small fixup

bd807ca

- only call (*Reader).Next() when we're sure it won't realloc its buffer - promote io.ErrUnexpectedEOF to msgp.ErrShortBytes

philhofer merged commit d7c86ea into tinylib:master Dec 13, 2016

pwaller deleted the read-next-object branch December 13, 2016 19:52

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add (*Reader).CopyNext(w) (int64, error) #167

Add (*Reader).CopyNext(w) (int64, error) #167

pwaller commented Nov 11, 2016 •

edited

Loading

philhofer commented Nov 11, 2016

philhofer commented Nov 11, 2016

pwaller commented Nov 11, 2016

pwaller commented Nov 11, 2016

philhofer commented Nov 11, 2016

pwaller commented Nov 11, 2016

pwaller commented Nov 11, 2016

pwaller commented Nov 11, 2016

philhofer commented Nov 11, 2016

philhofer commented Nov 11, 2016

pwaller commented Nov 11, 2016

philhofer commented Nov 11, 2016

pwaller commented Nov 11, 2016

pwaller commented Nov 11, 2016

philhofer commented Nov 11, 2016

philhofer commented Nov 11, 2016

pwaller commented Nov 11, 2016 •

edited

Loading

philhofer commented Nov 11, 2016

philhofer commented Nov 11, 2016 •

edited

Loading

pwaller commented Nov 11, 2016

pwaller commented Nov 11, 2016

philhofer commented Nov 11, 2016

pwaller commented Nov 11, 2016

philhofer commented Dec 13, 2016

pwaller commented Dec 13, 2016

philhofer commented Dec 13, 2016

pwaller commented Dec 13, 2016

Add (*Reader).CopyNext(w) (int64, error) #167

Add (*Reader).CopyNext(w) (int64, error) #167

Conversation

pwaller commented Nov 11, 2016 • edited Loading

philhofer commented Nov 11, 2016

philhofer commented Nov 11, 2016

pwaller commented Nov 11, 2016

pwaller commented Nov 11, 2016

philhofer commented Nov 11, 2016

pwaller commented Nov 11, 2016

pwaller commented Nov 11, 2016

pwaller commented Nov 11, 2016

philhofer commented Nov 11, 2016

philhofer commented Nov 11, 2016

pwaller commented Nov 11, 2016

philhofer commented Nov 11, 2016

pwaller commented Nov 11, 2016

pwaller commented Nov 11, 2016

philhofer commented Nov 11, 2016

philhofer commented Nov 11, 2016

pwaller commented Nov 11, 2016 • edited Loading

philhofer commented Nov 11, 2016

philhofer commented Nov 11, 2016 • edited Loading

pwaller commented Nov 11, 2016

pwaller commented Nov 11, 2016

philhofer commented Nov 11, 2016

pwaller commented Nov 11, 2016

philhofer commented Dec 13, 2016

pwaller commented Dec 13, 2016

philhofer commented Dec 13, 2016

pwaller commented Dec 13, 2016

pwaller commented Nov 11, 2016 •

edited

Loading

pwaller commented Nov 11, 2016 •

edited

Loading

philhofer commented Nov 11, 2016 •

edited

Loading