Any descendant of the state can be a binary_type #1194
Conversation
…ializing numpy arrays, in addition to shape, dtype etc
Oh, and respect to those responsible for thinking out the method for doing binary transfers; this is truly a good idea!
Why not just save the entire array + size + dtype into a binary format? For example, we could implement this: https://docs.scipy.org/doc/numpy/neps/npy-format.html#format-specification-version-1-0
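(For context, a minimal sketch of that suggestion, assuming plain numpy and no widget machinery at all: the .npy container already carries shape and dtype alongside the data.)

import io
import numpy as np

def array_to_npy_bytes(arr):
    # Write the array, including shape and dtype, into the .npy container format.
    buf = io.BytesIO()
    np.save(buf, arr, allow_pickle=False)
    return buf.getvalue()

def npy_bytes_to_array(data):
    # Rebuild the array from the .npy bytes.
    return np.load(io.BytesIO(data), allow_pickle=False)

ar = np.arange(6, dtype=np.float64).reshape(2, 3)
assert np.array_equal(npy_bytes_to_array(array_to_npy_bytes(ar)), ar)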
That would be an option, but I will also need to serialize a dict of numpy arrays in the future, so even if what you propose covers some cases, this is more flexible. And I guess that would require a memcpy (although it's probably on the order of megabytes, so it shouldn't be an issue).
Regardless of the method used for the case of numpy arrays, I think that allowing binary fields will be necessary for other use cases.
Yes, and also other metadata: units, descriptions, etc. You don't want to have to put it all in binary form on each side of the wire; this would make life much easier.
I hesitate to start doing things that involve picking apart objects and replacing their innards. Are we replacing things arbitrarily deep in the nesting here?
We can probably address all possible use cases by allowing only top-level binary keys.
Yes, any level deep, as far as lists and dicts go. We could disallow that, but it doesn't change the code at all (except for the recursive part). It is not any different than
It would actually require hardcoded checks, so as it is now, it's the easiest to maintain, I think.
I think something that makes this complex (and it was already the case before this PR) is that first an object gets created/deserialized here: https://github.com/ipython/ipywidgets/blob/master/jupyter-js-widgets/src/manager-base.ts#L346. Because now, if say my numpy array 'x' (with the buffer value missing on the js side) gets deserialized, and then later the buffer gets filled in, the deserializer isn't getting called anymore.
This is essentially what we have now, one level up, at the object's state keys. Some keys can be binary, some can be JSON. What would we need to isolate this to a custom serializer (like the widgets serializer)? For example, what if we allowed multiple buffers to be associated with one key (to have possible zero-copy semantics)? (I'm not totally against this idea, but it certainly makes the serialization more complicated, so I want to explore some simpler ideas, or ways to move the complexity out of the base codebase.)
Actually, that is also what @SylvainCorlay thought, but it is more like 0 levels deep.
I agree that it makes it more complex, but I think the complexity starts at the two places where the deserialization is happening. If, say, we already had the buffers available at object construction, that could make this simpler: we pop buffers out of the dicts, replace list values with sentinels, and put them back in place directly before the object is constructed. Now the object is first constructed with anything containing a buffer missing (before, it was only level-0 keys holding buffer objects that were missing), and then an update adds in the missing buffer objects.
@jasongrout the current mechanism allows top-level properties of the object to be memory views, not top-level properties of a given trait attribute.
Right - any particular trait (i.e., a key/value in the object's state) can be binary.
Yes, I understand what we're trying to do. Another way to think about this proposal is that we are simply extending our base JSON encoder/decoder to handle binary objects anywhere in it. (The popping of binary buffers happens after the object's custom serializer has run, right?)

What if we allowed any particular trait to have arbitrary JSON metadata associated with it? We sort of have that now: if a trait is binary, some metadata is associated with it indicating which buffer the data is in. What if the serializer could return both a value and a metadata dict, and the deserializer could take both a data value and a metadata dict?

Also, the difference between what we have now (any trait can be encoded in binary) and a level-1 binary replacer is that the serialization/deserialization happens at the trait level. With the level-1 binary replacer, a single trait can (very easily) be encoded as both binary data and JSON data, whereas with what we have now, there needs to be some extra magic to associate the JSON data with the binary data in two different traits.
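(A rough sketch of that (value, metadata) idea, purely as illustration; the to_binary/from_binary names and the two-return-value convention are hypothetical, not an existing ipywidgets API.)

import numpy as np

def to_binary(value, widget=None):
    # Hypothetical serializer: hand back the raw buffer plus a JSON-safe metadata dict.
    arr = np.ascontiguousarray(value)
    return memoryview(arr), {'shape': list(arr.shape), 'dtype': str(arr.dtype)}

def from_binary(buffer, metadata, widget=None):
    # Hypothetical deserializer: rebuild the array from the buffer and its metadata.
    arr = np.frombuffer(buffer, dtype=metadata['dtype'])
    return arr.reshape(metadata['shape'])

ar = np.arange(12, dtype=np.float32).reshape(3, 4)
buf, meta = to_binary(ar)
assert np.array_equal(from_binary(buf, meta), ar)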
I like the idea, but I can think of an example which doesn't really fit into it. Take a masked array: it would have two binary blobs associated with it, and I wouldn't call the array with the mask the metadata. It would make more sense to write it like this:

return {'data': memoryview(data), 'mask': memoryview(mask), 'shape': data.shape}
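(A possible counterpart on the receiving side, just as a sketch; it assumes the dtype is known or sent along with the shape.)

import numpy as np

def deserialize_masked(state, dtype=np.float64):
    # Rebuild a masked array from the two buffers plus the shape metadata.
    data = np.frombuffer(state['data'], dtype=dtype).reshape(state['shape'])
    mask = np.frombuffer(state['mask'], dtype=bool).reshape(state['shape'])
    return np.ma.MaskedArray(data, mask=mask)

ar = np.ma.MaskedArray([1.0, 2.0, 3.0], mask=[False, True, False])
state = {'data': memoryview(np.ascontiguousarray(ar.data)),
         'mask': memoryview(np.ascontiguousarray(ar.mask)),
         'shape': ar.shape}
assert np.array_equal(deserialize_masked(state).mask, ar.mask)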
Yes, I'm agreeing with this PR's idea more and more, especially as I think about it as being just a way of augmenting our JSON format to take advantage of the binary buffers we have available.
Ok, good to know. The split
Can you show an example of what a deeper nesting path would look like (a state dictionary of ...)? Implementing the serialization/deserialization in js definitely should be done too. Thanks for tackling this! It's been a frustration we've had for a while (particularly with numpy arrays).
Ok, implemented; demo notebook here: https://gist.github.com/maartenbreddels/ac41cdcd22c8f84c9fc639dea7c90848. The state {'data': mv, 'shape': ar.shape, 'nested': {'list': [mv, 'test']}} gets passed along from x to y, just to demonstrate that it handles nested items ok; in the last cell you can see it is properly serialized and deserialized.
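(To make the nested case concrete, here is how I read the split for that state; the variable names are illustrative and the exact wire format may differ.)

import numpy as np

ar = np.arange(4, dtype=np.uint8)
mv = memoryview(ar)
state = {'data': mv, 'shape': ar.shape, 'nested': {'list': [mv, 'test']}}

# Buffers in dicts are popped, buffers in lists are replaced by None,
# and each removed buffer is recorded together with its path:
state_without_buffers = {'shape': ar.shape, 'nested': {'list': [None, 'test']}}
buffer_paths = [['data'], ['nested', 'list', 0]]
buffers = [mv, mv]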
I've answered your question in the docstring of _split_state_buffers. Furthermore, (in a previous commit) I've renamed buffer_keys to buffer_paths, since it better covers its usage. I also tried to keep the code simple (like a for loop in _handle_msg, instead of using reduce), to keep it more readable for those who are not used to functional style, but I don't know what your guidelines are on this.
FYI: Found some bugs in this PR, will commit new changes later.
@maartenbreddels are you still making changes to this PR? I did a rough review and this looks good. What issues did you encounter?
…ut_buffers for py/ts codes are equivalent, unittest for py side covers more cases also putting it back together, bugfix (covered by python test) for _remove_buffers, typo corrections
Ok, it's more consistent now. Both the python and javascript sides have a (_)remove_buffers/(_)put_buffers. remove_buffers doesn't modify existing dicts/objects (it clones when needed), but references unmodified ancestors. The js side has cyclic protection (needed to support Controller); the py side does not. _put_buffers does modify the state, but that should be fine since it is coming off the wire.
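(For readers following along, a simplified Python sketch of the semantics described above; this is not the actual implementation in widget.py, and it ignores the clone-versus-share details and the js-side cycle protection.)

def remove_buffers(state):
    # Return (state_without_buffers, buffer_paths, buffers); the input is not mutated.
    buffer_paths, buffers = [], []

    def separate(substate, path):
        if isinstance(substate, dict):
            out = {}
            for key, value in substate.items():
                if isinstance(value, (bytes, bytearray, memoryview)):
                    buffer_paths.append(path + [key])
                    buffers.append(value)  # dict entries holding buffers are simply left out
                else:
                    out[key] = separate(value, path + [key])
            return out
        if isinstance(substate, (list, tuple)):
            out = []
            for i, value in enumerate(substate):
                if isinstance(value, (bytes, bytearray, memoryview)):
                    buffer_paths.append(path + [i])
                    buffers.append(value)
                    out.append(None)  # list items holding buffers are replaced by None
                else:
                    out.append(separate(value, path + [i]))
            return out
        return substate

    return separate(state, []), buffer_paths, buffers

def put_buffers(state, buffer_paths, buffers):
    # Mutates state in place, writing each buffer back at its recorded path.
    for path, buffer in zip(buffer_paths, buffers):
        obj = state
        for key in path[:-1]:
            obj = obj[key]
        obj[path[-1]] = buffer
    return state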
I think I figured out why you needed the circular dependency check on the js side. Basically, we use JavaScript's toJSON feature (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON/stringify#toJSON()_behavior) to serialize models, so the models are not serialized to "IPY_MODEL_..." strings until after remove_buffers runs, which means remove_buffers recurses into these complicated model objects. Looking into it...
Yes, I was surprised to see unpack_models, but not the inverse. But I think if toJSON can handle it, we should be able to handle it as well.
I'll push a few commits with these changes...
self.assertIn('shape', state['y'])
self.assertNotIn('ar', state['x'])
self.assertFalse(state['x'])  # should be empty
self.assertNotIn('data', state['x'])
state['y']
self.assertIsNot(state['y'], y)
self.assertIs(state['y']['shape'], y_shape)

# check that the buffers paths really points to the right buffer
point
self.assertIn('plain', state)
self.assertIn('shape', state['y'])
self.assertNotIn('ar', state['x'])
self.assertFalse(state['x'])  # should be empty
self.assertEqual(state['x'], {})
ipywidgets/widgets/widget.py
Outdated
obj = obj[key]
obj[buffer_path[-1]] = buffer

def _seperate_buffers(substate, path, buffer_paths, buffers):
separate
ipywidgets/widgets/widget.py
Outdated
    buffers.append(v)
    buffer_paths.append(path + [i])
elif isinstance(v, (dict, list, tuple)):
    vnew = _seperate_buffers(v, path + [i], buffer_paths, buffers)
separate
ipywidgets/widgets/widget.py
Outdated
    buffers.append(v)
    buffer_paths.append(path + [k])
elif isinstance(v, (dict, list, tuple)):
    vnew = _seperate_buffers(v, path + [k], buffer_paths, buffers)
separate
ipywidgets/widgets/widget.py
Outdated
[<memory at 0x107ffec48>, <memory at 0x107ffed08>])
"""
buffer_paths, buffers = [], []
state = _seperate_buffers(state, [], buffer_paths, buffers)
separate
ipywidgets/widgets/widget.py
Outdated
args = dict(target_name='jupyter.widget', data=state)
args = dict(target_name='jupyter.widget', data={'state': state, 'buffer_paths': buffer_paths})
dict(..., buffers=buffers)
(and delete the assignment below) to keep the buffer_paths with the buffers...
jupyter-js-widgets/src/utils.ts
Outdated
else {
    var new_value = remove(value, path.concat([key]));
    // only assigned when the value changes, we may serialize objects that don't support assignment
    if(new_value != value) {
!==
jupyter-js-widgets/src/utils.ts
Outdated
else {
    var new_value = remove(value, path.concat([i]));
    // only assigned when the value changes, we may serialize objects that don't support assignment
    if(new_value != value) {
!==
I agree; good that you spotted those bugs in the unit test.
This removes the need for the visited node list.
@maartenbreddels - can you try this out? I think it should work, even with Controller...
With toJSON, the Controller should work fine. Happy to see it get merged! Great work.
You did the great work here! Thanks!
Thanks for the careful and thorough review 👍
…ay, or python 3 bytes. We disregard the python 2 buffer type - it is obsolete, even on python 2. See jupyter-widgets#1194
Before, you could only do something like the first form in the sketch below when serializing a state. However, you'd like to be able to send the shape along as well; this patch makes the second form possible:
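(The inline code examples did not survive in this rendering of the description; the following is a rough reconstruction of the two forms being contrasted. The serializer signature and names are assumptions, not the exact original snippets.)

import numpy as np

# Before: a custom trait serializer could only hand back a single binary value.
def serialize_before(value, widget=None):
    return memoryview(np.ascontiguousarray(value))

# With this patch: the returned structure may mix buffers with plain JSON
# values (shape, dtype, ...), at any nesting depth.
def serialize_after(value, widget=None):
    arr = np.ascontiguousarray(value)
    return {'data': memoryview(arr), 'shape': arr.shape, 'dtype': str(arr.dtype)}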
To be able to do this, the state is separated into a state_without_buffers (the state on the js side) and a state_with_buffers, where every dict item that is a buffer is popped and every list item that is a buffer is replaced by None. If state_with_buffers were not treated separately, the state would first be deserialized completely, and the buffers would be 'patched in' later. By keeping state_with_buffers separate, all the buffers are first put back in place, and deserialization happens afterwards. Maybe it's not the cleanest solution, but my understanding of the code only goes this far.

This does not handle serialization (frontend -> backend); before continuing, I'd like to get 'approval' of this approach.