More flexible extension support #192

Open
crabmusket opened this issue Sep 21, 2021 · 0 comments

crabmusket commented Sep 21, 2021

TLDR:

I'd like to suggest a tweak to the extension API. In addition to the current process, where an extension returns a blob that the encoder then copies, I propose allowing an extension to use the encoder's "private" API to perform writes directly.

I have prototyped this extension mechanism in a fork. The changes are minimal. See comparison here.

If you agree that this idea is desirable I'm happy to clean up my branch, add tests and make a PR.


I came across this need when I was trying to design an extension to handle TypedArrays natively, but also to align them to the correct byte indexes so that they can be used efficiently after decoding. You can see my extension here - the readme explains the alignment situation. (And my original comment here.)

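// Encoding and decoding a TypedArray with no boilerplate:
import assert from "assert";
import { encode, decode } from "@msgpack/msgpack";
// extensionCodec is the codec provided by the extension linked above.
const floatArray = new Float32Array([1, 2, 3, 4, 5]);
const encoded = encode({ floatArray }, { extensionCodec });
assert.deepStrictEqual(decode(encoded, { extensionCodec }), { floatArray });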

Using the current extension API to do this, I would need to:

  1. Do an extra copy, from the TypedArray to a Uint8Array containing my extra wrapper and alignment bytes
  2. Predict which of MsgPack's extension headers the base encoder will use when writing my extension data, and adjust alignment accordingly
  3. Use a reference to the encoder anyway, to get the current value of pos. I'd have to send this circular reference into the extension codec's context object
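
To make step 2 concrete, here is a rough sketch of the header arithmetic involved. The sizes come from the MessagePack spec's ext family; that the encoder always picks the smallest header that fits is an assumption about its current behaviour, and `extHeaderSize` is a hypothetical helper name, not part of the library.

```typescript
// Hypothetical helper: size in bytes of the ext header that precedes
// `dataLength` bytes of extension data, per the MessagePack spec.
function extHeaderSize(dataLength: number): number {
  if (!Number.isInteger(dataLength) || dataLength < 0) {
    throw new RangeError("dataLength must be a non-negative integer");
  }
  // fixext 1/2/4/8/16: one format byte + one type byte
  if ([1, 2, 4, 8, 16].includes(dataLength)) return 2;
  // ext 8: format byte + uint8 length + type byte
  if (dataLength < 0x100) return 3;
  // ext 16: format byte + uint16 length + type byte
  if (dataLength < 0x10000) return 4;
  // ext 32: format byte + uint32 length + type byte
  if (dataLength < 0x100000000) return 6;
  throw new RangeError("extension data too large for MessagePack");
}
```

Note that the header size is not monotonic in the data length: 15 bytes needs a 3-byte header, 16 bytes a 2-byte header, and 17 bytes a 3-byte header again. Adding padding bytes to the data can therefore change the header size, which in turn changes how much padding is needed.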

Problem 1 isn't such a big deal; there's a lot of copying during encoding anyway, though I minimise it by choosing appropriate initial buffer sizes. And if copying can be avoided, why not?

Problem 2 is the main issue I am interested in; I was frankly too lazy to do the maths required to predict the ext header size and then modify the alignment accordingly. My fork allows extensions to directly call the encoder's write methods, which means that I was able to always choose an ext 32 header. This makes it easy to predict the alignment requirements. A few bytes are wasted, but my use case is large data arrays, where a handful of bytes makes almost no difference to the final payload size.
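
As an illustration of why a fixed ext 32 header simplifies things: the header is always 6 bytes (format byte, uint32 length, type byte), so the padding depends only on the current write position. This is a simplified sketch, not the code from my fork; it assumes the padding sits immediately after the header with no other wrapper bytes, and `alignmentPadding` is a hypothetical name.

```typescript
// ext 32 header: 0xc9 format byte + uint32 length + int8 type
const EXT32_HEADER_SIZE = 6;

// Hypothetical helper: number of padding bytes to write after an ext 32
// header that starts at `pos`, so that the array data begins at an offset
// that is a multiple of `bytesPerElement`.
function alignmentPadding(pos: number, bytesPerElement: number): number {
  const dataStart = pos + EXT32_HEADER_SIZE;
  return (bytesPerElement - (dataStart % bytesPerElement)) % bytesPerElement;
}
```

For example, a Float32Array (4 bytes per element) written at position 0 would need 2 padding bytes so that its data starts at offset 8. Offsets here are relative to the start of the encoded buffer, so the decoder still has to be handed a buffer that is itself suitably aligned.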


In order to implement this extension efficiently, I added a write method to ExtData. My plugin returns a subclass of ExtData which has its own implementation of write.

This was just a rough cut to see if the concept worked. If I were designing this properly, I'd replace the use of the ExtData class with an interface to avoid mandatory inheritance hierarchies. You can see in my TypedArrayExtData I'm already being silly by calling the parent constructor with a new Uint8Array() which is completely unused.
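
A minimal sketch of what the interface-based version might look like. All names here (`ByteSink`, `SelfWritingExt`, `ArraySink`, `typedArrayExt`) are hypothetical and do not exist in the library; `ArraySink` is only a stand-in for the encoder so that the example is self-contained, and alignment padding is left out to keep it short.

```typescript
// Hypothetical: the slice of encoder functionality an extension needs.
interface ByteSink {
  readonly pos: number;
  writeU8(value: number): void;
  writeU32(value: number): void;
  writeBytes(bytes: Uint8Array): void;
}

// Hypothetical: what an extension could return instead of an ExtData subclass.
interface SelfWritingExt {
  readonly type: number;
  write(sink: ByteSink): void;
}

// Stand-in for the real encoder, backed by a plain array.
class ArraySink implements ByteSink {
  private readonly bytes: number[] = [];
  get pos(): number {
    return this.bytes.length;
  }
  writeU8(value: number): void {
    this.bytes.push(value & 0xff);
  }
  writeU32(value: number): void {
    // Big-endian, as MessagePack requires.
    this.bytes.push((value >>> 24) & 0xff, (value >>> 16) & 0xff, (value >>> 8) & 0xff, value & 0xff);
  }
  writeBytes(bytes: Uint8Array): void {
    for (let i = 0; i < bytes.length; i++) this.bytes.push(bytes[i]);
  }
  toUint8Array(): Uint8Array {
    return Uint8Array.from(this.bytes);
  }
}

// A plain object satisfies the interface; no parent constructor to call.
function typedArrayExt(type: number, array: Float32Array): SelfWritingExt {
  return {
    type,
    write(sink: ByteSink): void {
      const view = new Uint8Array(array.buffer, array.byteOffset, array.byteLength);
      sink.writeU8(0xc9); // always ext 32
      sink.writeU32(view.byteLength);
      sink.writeU8(type);
      sink.writeBytes(view); // written straight from the array's memory
    },
  };
}
```

With this shape, the unused `new Uint8Array()` passed to the parent constructor goes away, because the extension never has to pretend it holds a data blob.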


Pros of this idea:

  • Allows fewer copies during encoding
  • Allows me to implement alignment much more easily by controlling which ext header is written

Cons of this idea:

  • Introduces an additional method call during the encoding process (extData.write, which by default calls straight back to encoder.encodeExtension; I think this is an example of "double dispatch" in OOP parlance?)
  • Extensions using this approach rely on the private API of the Encoder
  • I haven't thought about recursive encoding 🤔

For my use case, I didn't mind using the private API. This library seems stable enough. But even with the current extension API, I would have to depend on internal implementation details when guessing alignment: if the library ever used different ext headers for the same data size, then my alignment guesses would be wrong. I see no reason why that logic would ever change, but it's still an implementation detail.

However, you might not want to allow clients to depend on this behaviour. It could be worked around, e.g. by providing an adapter with stable public methods which call the encoder's write methods.
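
One possible shape for such an adapter, as a sketch only. `EncoderInternals`, `ExtensionWriter` and `makeToyEncoder` are hypothetical names; the real encoder's private methods may look quite different, which is exactly what the adapter would hide from extensions.

```typescript
// Hypothetical view of whatever private methods the encoder has today.
interface EncoderInternals {
  getPos(): number;
  writeByte(value: number): void;
}

// Hypothetical adapter: the only surface extensions would be allowed to use.
// If the encoder's internals are renamed, only this class needs to change.
class ExtensionWriter {
  constructor(private readonly encoder: EncoderInternals) {}

  get position(): number {
    return this.encoder.getPos();
  }

  // Writes an ext 32 header: 0xc9, big-endian uint32 length, type byte.
  writeExt32Header(type: number, length: number): void {
    this.encoder.writeByte(0xc9);
    this.encoder.writeByte((length >>> 24) & 0xff);
    this.encoder.writeByte((length >>> 16) & 0xff);
    this.encoder.writeByte((length >>> 8) & 0xff);
    this.encoder.writeByte(length & 0xff);
    this.encoder.writeByte(type & 0xff);
  }

  writeBytes(bytes: Uint8Array): void {
    for (let i = 0; i < bytes.length; i++) this.encoder.writeByte(bytes[i]);
  }
}

// Toy encoder used only to exercise the adapter in this example.
function makeToyEncoder(): EncoderInternals & { bytes: number[] } {
  const bytes: number[] = [];
  return {
    bytes,
    getPos: () => bytes.length,
    writeByte: (value: number) => {
      bytes.push(value & 0xff);
    },
  };
}
```

An extension's write method would then receive an `ExtensionWriter` rather than the encoder itself, so the public contract stays small and the encoder remains free to change its internals.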
