Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Any descendant of the state can be a binary_type #1194

Merged
merged 21 commits into from
Mar 21, 2017
Merged
Show file tree
Hide file tree
Changes from 9 commits
Commits
Show all changes
21 commits
Select commit Hold shift + click to select a range
8803a05
Any descendant of the state can be a binary_type now, usefull for ser…
maartenbreddels Mar 8, 2017
035e75e
serializing of nested buffer supported
maartenbreddels Mar 8, 2017
01e03ca
better docstring for _split_state_buffers
maartenbreddels Mar 9, 2017
7866d2c
fix: although state is a copy, it's ancestors may refer to original d…
maartenbreddels Mar 15, 2017
542940d
if a top level key enters twice, don't try to remove it twice
maartenbreddels Mar 15, 2017
761b62f
added unittest, make _split_state_buffer a function instead of method…
maartenbreddels Mar 15, 2017
1fa8e08
fix: toplevel buffers triggered a bug, since items were already remov…
maartenbreddels Mar 15, 2017
caadc22
test: using byte/string for memoryview for python27 compat
maartenbreddels Mar 15, 2017
1fccf08
fix unittest
maartenbreddels Mar 15, 2017
d5740b4
better docstring
maartenbreddels Mar 16, 2017
9d896f7
not using a seperate state with keeps the state which have buffer, mu…
maartenbreddels Mar 16, 2017
cca2110
_split_state_buffer renamed for ts/js consistency, _remove_buffers/_p…
maartenbreddels Mar 18, 2017
33e7569
Support object toJSON serialization in js
jasongrout Mar 20, 2017
926d40a
!= -> !==
jasongrout Mar 20, 2017
6ff791a
We can assume hasOwnProperty in our supported browsers.
jasongrout Mar 20, 2017
03323a6
whitespace changes
jasongrout Mar 20, 2017
91e8497
fix some unit tests
jasongrout Mar 20, 2017
abe3ee8
seperate -> separate
jasongrout Mar 20, 2017
c3e13ec
Add buffers to args dict on creation.
jasongrout Mar 20, 2017
0a1dffb
var -> let
jasongrout Mar 20, 2017
036487c
add newline
jasongrout Mar 20, 2017
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
18 changes: 18 additions & 0 deletions ipywidgets/widgets/tests/test_traits.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,11 +2,13 @@
# Distributed under the terms of the Modified BSD License.

"""Test trait types of the widget packages."""
import array

from unittest import TestCase
from traitlets import HasTraits
from traitlets.tests.test_traitlets import TraitTestBase
from ipywidgets import Color
from ipywidgets.widgets.widget import _split_state_buffers


class ColorTrait(HasTraits):
Expand All @@ -19,3 +21,19 @@ class TestColor(TraitTestBase):
_good_values = ["blue", "#AA0", "#FFFFFF"]
_bad_values = ["vanilla", "blues"]


class TestBuffers(TestCase):
def test_state_with_buffers(self):
mv1 = memoryview(b'test1')
mv2 = memoryview(b'test2')
state = {'plain': [0, 'text'], 'x': {'ar': mv1}, 'y': {'shape': (10, 10), 'data': mv1}, 'z': [mv1, mv2], 'top': mv1}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add one more item that is nested several levels deep? Perhaps something like:

'nested': {'a': mv1, 'b': [mv2, {'c': mv1, 'd': [mv1, mv2]}]}

state, state_with_buffers, buffer_paths, buffers = _split_state_buffers(state)
print("executed", state, state_with_buffers, buffer_paths, buffers)
self.assertIn('plain', state)
self.assertNotIn('x', state)
self.assertNotIn('y', state)
self.assertNotIn('z', state)
for path, buffer in [(['x', 'ar'], mv1), (['y', 'data'], mv1), (['z', 0], mv1), (['z', 1], mv2), (['top'], mv1)]:
self.assertIn(path, buffer_paths, "%r not in path" % path)
index = buffer_paths.index(path)
self.assertEqual(buffer, buffers[index])
79 changes: 64 additions & 15 deletions ipywidgets/widgets/widget.py
Original file line number Diff line number Diff line change
Expand Up @@ -50,6 +50,55 @@ def _json_to_widget(x, obj):
else:
_binary_types = (memoryview, buffer)


def _split_state_buffers(state):
"""Return (state_without_buffers, state_with_buffers, buffer_paths, buffers) for binary message parts

state_with_buffers is a dict where any of it's decendents is is a binary_type.

As an example:
>>> state = {'plain': [0, 'text'], 'x': {'ar': memoryview(ar1)}, 'y': {'shape': (10,10), 'data': memoryview(ar2)}}
>>> widget._split_state_buffers(state)
({'plain': [0, 'text']}, {'x': {}, 'y': {'shape': (10, 10)}}, [['x', 'ar'], ['y', 'data']],
[<memory at 0x107ffec48>, <memory at 0x107ffed08>])
"""

def seperate_buffers(substate, path, buffer_paths, buffers):
Copy link
Member

@jasongrout jasongrout Mar 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

separate - spelled with an 'a' in the middle.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I think this should be a top-level utility function - it seems wasteful to define it over and over in a function that is called very often.

# remove binary types from dicts and lists, and keep there key, e.g. {'x': {'ar': ar}, 'y': [ar2, ar3]}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"keep there key" -> "keep their keys" ?

# where are ar* are binary types
# will result in {'x': {}, 'y': [None, None]}, [ar, ar2, ar3], [['x', 'ar'], ['y', 0], ['y', 1]]
# instead of removing elements from the list, this will make replacing the buffers on the js side much easier
if isinstance(substate, (list, tuple)):
for i, v in enumerate(substate):
if isinstance(v, (dict, list, tuple)):
seperate_buffers(v, path + [i], buffer_paths, buffers)
if isinstance(v, _binary_types):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if -> elif

substate[i] = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assigning None won't work if substate is a tuple.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we do need to handle the tuple case, since python happily converts tuples to JSON arrays. One approach would be to always convert a tuple to a list right before enumerating it, but that seems wasteful. Another way to deal with it is to keep track of the indices that would be replaced, and at the very end, if there are indices to replace and substate is a tuple, convert to a list and assign the indices. That way we only convert if there is a binary value, which is probably fairly rare.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tuple thing hit me when I was overthinking this in bed... good you catched it :)
I made it reflect the ts/js version more, clone only when needed. It's also reflected in the unittests, if we don't do this, someone will be bit by us modifying their data (see also comments below)

buffers.append(v)
buffer_paths.append(path + [i])
elif isinstance(substate, dict):
for k, v in list(substate.items()): # we need to copy to a list since substate will be modified
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My guess is that pulling out a binary buffer will be relatively rare. Perhaps we should not copy the list of items by default, but instead keep track below of the keys that need to be popped. If there are keys that need to be popped, do it once at the very end after iterating through the dict.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Running a few timing tests, it seems that, for example, this

%%timeit
d = {1:1, 2:2, 3:3, 4:4, 5:5, 6:6, 7:7, 8:8, 9:9, 10:10,11:11,12:12,13:13,14:14,15:15}
for k,v in list(d.items()):
    if k % 100 == 0:
        del d[k]

is about 30% slower (after subtracting the overhead for creating the dictionary d) than

%%timeit
d = {1:1, 2:2, 3:3, 4:4, 5:5, 6:6, 7:7, 8:8, 9:9, 10:10,11:11,12:12,13:13,14:14,15:15}
keys = []
for k,v in d.items():
    if k % 100 == 0:
        keys.append(k)
    pass
for k in keys:
    del d[k]

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code will not be the bottleneck, and I think that deleting afterwards makes the code more difficults to read, let me know what you think of the (changed) code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was also worried about the memory pressure of constantly creating new lists. You're probably right that it won't be the bottleneck.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let me know what you think of the (changed) code.

Sure, soon as you push the changes.

if isinstance(v, (dict, list, tuple)):
seperate_buffers(v, path + [k], buffer_paths, buffers)
if isinstance(v, _binary_types):
substate.pop(k)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

del substate[k] is faster

buffers.append(v)
buffer_paths.append(path + [k])
else:
raise ValueError("expected state to be a list or dict, not %r" % substate)

buffer_paths, buffers = [], []
seperate_buffers(state, [], buffer_paths, buffers)
state_with_buffers = {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need state_with_buffers anymore.

# any part of the state that has buffers needs to be treated seperately
# since of a issue as indicated in .open(..)
# also remove top level elements if they contain nested buffers (len(k) > 1)
for key in set([k[0] for k in buffer_paths if len(k) > 1]):
state_with_buffers[key] = state[key]
del state[key]
return state, state_with_buffers, buffer_paths, buffers


class CallbackDispatcher(LoggingConfigurable):
"""A structure for registering and running callbacks"""
callbacks = List()
Expand Down Expand Up @@ -213,7 +262,7 @@ def __del__(self):
def open(self):
"""Open a comm to the frontend if one isn't already open."""
if self.comm is None:
state, buffer_keys, buffers = self._split_state_buffers(self.get_state())
state, state_with_buffers, buffer_paths, buffers = _split_state_buffers(self.get_state())

args = dict(target_name='jupyter.widget', data=state)
if self._model_id is not None:
Expand All @@ -223,6 +272,10 @@ def open(self):
if buffers:
# FIXME: workaround ipykernel missing binary message support in open-on-init
# send state with binary elements as second message
# TODO: if this gets fixed, _split_state_buffers does not need to have a seperate
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you tried it? It seems that Comm supports buffers in the init: https://github.com/ipython/ipykernel/blob/master/ipykernel/comm/comm.py#L50

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not, but I will try

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it was added about 1.5 years ago: ipython/ipykernel@64c6921

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, the GitHub UI shows which tags the commit is part of - shows that the fix is in ipykernel 4.2 and later...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, setup.py requires > 4.5 as well as the conda-forge recipe

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, perhaps we should have a layer in front of both of those functions, right after we get the comm message off the wire, that puts back in the binary buffers, and then we're ready to go.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(and a few other places, basically we keep the serialized state as binary until just before we send it, and we deserialize the binary state just after we get the message.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds like a good plan, I'll give it a go

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically, we think of the binary serialization as just a detail of the actual wire format (which it is), so we keep those details as close to the wire as we can.

# state_with_buffers, this is needed since first the object is created without sending
# the buffers, then on the js side, it tries to serialize, while the buffer have not been
# patched in yet, see also widget.ts:_handle_comm_msg
self.send_state()

@observe('comm')
Expand Down Expand Up @@ -258,16 +311,6 @@ def close(self):
self.comm = None
self._ipython_display_ = None

def _split_state_buffers(self, state):
"""Return (state_without_buffers, buffer_keys, buffers) for binary message parts"""
buffer_keys, buffers = [], []
for k, v in list(state.items()):
if isinstance(v, _binary_types):
state.pop(k)
buffers.append(v)
buffer_keys.append(k)
return state, buffer_keys, buffers

def send_state(self, key=None):
"""Sends the widget state, or a piece of it, to the front-end.

Expand All @@ -277,8 +320,8 @@ def send_state(self, key=None):
A single property's name or iterable of property names to sync with the front-end.
"""
state = self.get_state(key=key)
state, buffer_keys, buffers = self._split_state_buffers(state)
msg = {'method': 'update', 'state': state, 'buffers': buffer_keys}
state, state_with_buffers, buffer_paths, buffers = _split_state_buffers(state)
msg = {'method': 'update', 'state': state, 'buffers': buffer_paths, 'state_with_buffers': state_with_buffers}
self._send(msg, buffers=buffers)

def get_state(self, key=None, drop_defaults=False):
Expand Down Expand Up @@ -453,8 +496,14 @@ def _handle_msg(self, msg):
if 'sync_data' in data:
# get binary buffers too
sync_data = data['sync_data']
for i,k in enumerate(data.get('buffer_keys', [])):
sync_data[k] = msg['buffers'][i]
if 'buffer_paths' in data:
for path, buffer in zip(data['buffer_paths'], msg['buffers']):
# we'd like to set say sync_data['x'][0]['y'] = buffer
# where path in this example would be ['x', 0, 'y']
obj = sync_data
for key in path[:-1]:
obj = obj[key]
obj[path[-1]] = buffer
self.set_state(sync_data) # handles all methods

# Handle a state request.
Expand Down
96 changes: 81 additions & 15 deletions jupyter-js-widgets/src/widget.ts
Original file line number Diff line number Diff line change
Expand Up @@ -172,11 +172,18 @@ class WidgetModel extends Backbone.Model {
case 'update':
this.state_change = this.state_change
.then(() => {
var state = msg.content.data.state || {};
var buffer_keys = msg.content.data.buffers || [];
// see Widget.open/_split_state_buffers about why we need state_with_buffers
var state = _.extend({}, msg.content.data.state || {}, msg.content.data.state_with_buffers);
var buffer_paths = msg.content.data.buffers || [];
var buffers = msg.buffers || [];
for (var i=0; i<buffer_keys.length; i++) {
state[buffer_keys[i]] = buffers[i];
for (var i=0; i<buffer_paths.length; i++) {
// say we want to set state[x][y[z] = buffers[i]
var obj = state;
// we first get obj = state[x][y]
for (var j = 0; j < buffer_paths[i].length-1; j++)
obj = obj[buffer_paths[i][j]];
// and then set: obj[z] = buffers[i]
obj[buffer_paths[i][buffer_paths[i].length-1]] = buffers[i];
}
return (this.constructor as typeof WidgetModel)._deserialize_state(state, this.widget_manager);
}).then((state) => {
Expand Down Expand Up @@ -399,23 +406,82 @@ class WidgetModel extends Backbone.Model {
// get binary values, then send
var keys = Object.keys(state);
var buffers = [];
var buffer_keys = [];
for (var i=0; i<keys.length; i++) {
var key = keys[i];
var value = state[key];
if (value) {
if (value.buffer instanceof ArrayBuffer
|| value instanceof ArrayBuffer) {
buffers.push(value);
buffer_keys.push(key);
delete state[key];
var buffer_paths = [];
// keep track of what we visited, to avoid endless loops, when an object ancester
// refers to itself, we cannot use an object, since using objects for keys don't work
var visited_set = [];
// this function goes through lists and object and removes arraybuffers
// they will be transferred seperately, since they don't json'ify
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

separately

// on the python side the inverse happens
// if we need to remove an object from a list, we need to clone that list, otherwise we may modify
// the internal state of the widget model
// however, we do not want to clone everything, for performance
function seperate_buffers(obj, path) {
if(visited_set.indexOf(obj) != -1) { // we already visited this object
return obj;
}
visited_set.push(obj);
if (_.isArray(obj)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think all of our supported browsers support Array.isArray

var is_cloned = false;
for (var i = 0; i < obj.length; i++) {
var value = obj[i];
if(value) {
if (value.buffer instanceof ArrayBuffer || value instanceof ArrayBuffer) {
if(!is_cloned) {
obj = _.map(obj, _.identity);
is_cloned = true;
}
buffers.push(value);
buffer_paths.push(path.concat([i]));
// easier to just keep the array, but clear the entry, otherwise we have to think
// about array length
obj[i] = null;
}
else {
var new_value = seperate_buffers(value, path.concat([i]));
if((new_value != value) && !is_cloned) { // only clone when we have to
obj = _.map(obj, _.identity);
is_cloned = true;
obj[i] = new_value;
}
}
}
}
}
else if(_.isObject(obj)) {
for (var key in obj) {
var is_cloned = false;
if (!obj.hasOwnProperty || obj.hasOwnProperty(key)) {
var value = obj[key];
if(value) {
if (value.buffer instanceof ArrayBuffer || value instanceof ArrayBuffer) {
if(!is_cloned) {
obj = _.mapObject(obj, _.identity); // clone only once for performance
is_cloned = true;
}
buffers.push(value);
buffer_paths.push(path.concat([key]));
delete obj[key]; // for objects/dicts we just delete them
}
else {
var new_value = seperate_buffers(value, path.concat([key]));
if((new_value != value) && !is_cloned) { // only clone when we have to
obj = _.mapObject(obj, _.identity);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_.extend(obj, {}) would be a more familiar idiom to widget developers.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah thanks, actually, would _.clone not be better?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I almost wrote that, but in this case, we know we're not a primitive or an array, right? _.clone degenerates to _.extend after it checks for other types.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it reads better.

is_cloned = true;
obj[key] = new_value;
}
}
}
}
}
}
return obj;
}
state = seperate_buffers(state, []); // could return a clone
this.comm.send({
method: 'backbone',
sync_data: state,
buffer_keys: buffer_keys
buffer_paths: buffer_paths
}, callbacks, {}, buffers);
}).catch((error) => {
this.pending_msgs--;
Expand Down