-
Notifications
You must be signed in to change notification settings - Fork 947
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Any descendant of the state can be a binary_type #1194
Changes from 9 commits
8803a05
035e75e
01e03ca
7866d2c
542940d
761b62f
1fa8e08
caadc22
1fccf08
d5740b4
9d896f7
cca2110
33e7569
926d40a
6ff791a
03323a6
91e8497
abe3ee8
c3e13ec
0a1dffb
036487c
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -50,6 +50,55 @@ def _json_to_widget(x, obj): | |
else: | ||
_binary_types = (memoryview, buffer) | ||
|
||
|
||
def _split_state_buffers(state): | ||
"""Return (state_without_buffers, state_with_buffers, buffer_paths, buffers) for binary message parts | ||
|
||
state_with_buffers is a dict where any of it's decendents is is a binary_type. | ||
|
||
As an example: | ||
>>> state = {'plain': [0, 'text'], 'x': {'ar': memoryview(ar1)}, 'y': {'shape': (10,10), 'data': memoryview(ar2)}} | ||
>>> widget._split_state_buffers(state) | ||
({'plain': [0, 'text']}, {'x': {}, 'y': {'shape': (10, 10)}}, [['x', 'ar'], ['y', 'data']], | ||
[<memory at 0x107ffec48>, <memory at 0x107ffed08>]) | ||
""" | ||
|
||
def seperate_buffers(substate, path, buffer_paths, buffers): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. separate - spelled with an 'a' in the middle. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Also, I think this should be a top-level utility function - it seems wasteful to define it over and over in a function that is called very often. |
||
# remove binary types from dicts and lists, and keep there key, e.g. {'x': {'ar': ar}, 'y': [ar2, ar3]} | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. "keep there key" -> "keep their keys" ? |
||
# where are ar* are binary types | ||
# will result in {'x': {}, 'y': [None, None]}, [ar, ar2, ar3], [['x', 'ar'], ['y', 0], ['y', 1]] | ||
# instead of removing elements from the list, this will make replacing the buffers on the js side much easier | ||
if isinstance(substate, (list, tuple)): | ||
for i, v in enumerate(substate): | ||
if isinstance(v, (dict, list, tuple)): | ||
seperate_buffers(v, path + [i], buffer_paths, buffers) | ||
if isinstance(v, _binary_types): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. if -> elif |
||
substate[i] = None | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Assigning None won't work if substate is a tuple. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we do need to handle the tuple case, since python happily converts tuples to JSON arrays. One approach would be to always convert a tuple to a list right before enumerating it, but that seems wasteful. Another way to deal with it is to keep track of the indices that would be replaced, and at the very end, if there are indices to replace and substate is a tuple, convert to a list and assign the indices. That way we only convert if there is a binary value, which is probably fairly rare. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The tuple thing hit me when I was overthinking this in bed... good you catched it :) |
||
buffers.append(v) | ||
buffer_paths.append(path + [i]) | ||
elif isinstance(substate, dict): | ||
for k, v in list(substate.items()): # we need to copy to a list since substate will be modified | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My guess is that pulling out a binary buffer will be relatively rare. Perhaps we should not copy the list of items by default, but instead keep track below of the keys that need to be popped. If there are keys that need to be popped, do it once at the very end after iterating through the dict. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Running a few timing tests, it seems that, for example, this %%timeit
d = {1:1, 2:2, 3:3, 4:4, 5:5, 6:6, 7:7, 8:8, 9:9, 10:10,11:11,12:12,13:13,14:14,15:15}
for k,v in list(d.items()):
if k % 100 == 0:
del d[k] is about 30% slower (after subtracting the overhead for creating the dictionary %%timeit
d = {1:1, 2:2, 3:3, 4:4, 5:5, 6:6, 7:7, 8:8, 9:9, 10:10,11:11,12:12,13:13,14:14,15:15}
keys = []
for k,v in d.items():
if k % 100 == 0:
keys.append(k)
pass
for k in keys:
del d[k] There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This code will not be the bottleneck, and I think that deleting afterwards makes the code more difficults to read, let me know what you think of the (changed) code. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I was also worried about the memory pressure of constantly creating new lists. You're probably right that it won't be the bottleneck. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Sure, soon as you push the changes. |
||
if isinstance(v, (dict, list, tuple)): | ||
seperate_buffers(v, path + [k], buffer_paths, buffers) | ||
if isinstance(v, _binary_types): | ||
substate.pop(k) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
buffers.append(v) | ||
buffer_paths.append(path + [k]) | ||
else: | ||
raise ValueError("expected state to be a list or dict, not %r" % substate) | ||
|
||
buffer_paths, buffers = [], [] | ||
seperate_buffers(state, [], buffer_paths, buffers) | ||
state_with_buffers = {} | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We don't need |
||
# any part of the state that has buffers needs to be treated seperately | ||
# since of a issue as indicated in .open(..) | ||
# also remove top level elements if they contain nested buffers (len(k) > 1) | ||
for key in set([k[0] for k in buffer_paths if len(k) > 1]): | ||
state_with_buffers[key] = state[key] | ||
del state[key] | ||
return state, state_with_buffers, buffer_paths, buffers | ||
|
||
|
||
class CallbackDispatcher(LoggingConfigurable): | ||
"""A structure for registering and running callbacks""" | ||
callbacks = List() | ||
|
@@ -213,7 +262,7 @@ def __del__(self): | |
def open(self): | ||
"""Open a comm to the frontend if one isn't already open.""" | ||
if self.comm is None: | ||
state, buffer_keys, buffers = self._split_state_buffers(self.get_state()) | ||
state, state_with_buffers, buffer_paths, buffers = _split_state_buffers(self.get_state()) | ||
|
||
args = dict(target_name='jupyter.widget', data=state) | ||
if self._model_id is not None: | ||
|
@@ -223,6 +272,10 @@ def open(self): | |
if buffers: | ||
# FIXME: workaround ipykernel missing binary message support in open-on-init | ||
# send state with binary elements as second message | ||
# TODO: if this gets fixed, _split_state_buffers does not need to have a seperate | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Have you tried it? It seems that Comm supports buffers in the init: https://github.com/ipython/ipykernel/blob/master/ipykernel/comm/comm.py#L50 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I did not, but I will try There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Looks like it was added about 1.5 years ago: ipython/ipykernel@64c6921 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Cool, the GitHub UI shows which tags the commit is part of - shows that the fix is in ipykernel 4.2 and later... There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nice, setup.py requires > 4.5 as well as the conda-forge recipe There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. well, perhaps we should have a layer in front of both of those functions, right after we get the comm message off the wire, that puts back in the binary buffers, and then we're ready to go. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Perhaps we just insert the binary deserialization at https://github.com/ipython/ipywidgets/blob/master/jupyter-js-widgets/src/manager-base.ts#L187 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. (and a few other places, basically we keep the serialized state as binary until just before we send it, and we deserialize the binary state just after we get the message.) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. sounds like a good plan, I'll give it a go There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Basically, we think of the binary serialization as just a detail of the actual wire format (which it is), so we keep those details as close to the wire as we can. |
||
# state_with_buffers, this is needed since first the object is created without sending | ||
# the buffers, then on the js side, it tries to serialize, while the buffer have not been | ||
# patched in yet, see also widget.ts:_handle_comm_msg | ||
self.send_state() | ||
|
||
@observe('comm') | ||
|
@@ -258,16 +311,6 @@ def close(self): | |
self.comm = None | ||
self._ipython_display_ = None | ||
|
||
def _split_state_buffers(self, state): | ||
"""Return (state_without_buffers, buffer_keys, buffers) for binary message parts""" | ||
buffer_keys, buffers = [], [] | ||
for k, v in list(state.items()): | ||
if isinstance(v, _binary_types): | ||
state.pop(k) | ||
buffers.append(v) | ||
buffer_keys.append(k) | ||
return state, buffer_keys, buffers | ||
|
||
def send_state(self, key=None): | ||
"""Sends the widget state, or a piece of it, to the front-end. | ||
|
||
|
@@ -277,8 +320,8 @@ def send_state(self, key=None): | |
A single property's name or iterable of property names to sync with the front-end. | ||
""" | ||
state = self.get_state(key=key) | ||
state, buffer_keys, buffers = self._split_state_buffers(state) | ||
msg = {'method': 'update', 'state': state, 'buffers': buffer_keys} | ||
state, state_with_buffers, buffer_paths, buffers = _split_state_buffers(state) | ||
msg = {'method': 'update', 'state': state, 'buffers': buffer_paths, 'state_with_buffers': state_with_buffers} | ||
self._send(msg, buffers=buffers) | ||
|
||
def get_state(self, key=None, drop_defaults=False): | ||
|
@@ -453,8 +496,14 @@ def _handle_msg(self, msg): | |
if 'sync_data' in data: | ||
# get binary buffers too | ||
sync_data = data['sync_data'] | ||
for i,k in enumerate(data.get('buffer_keys', [])): | ||
sync_data[k] = msg['buffers'][i] | ||
if 'buffer_paths' in data: | ||
for path, buffer in zip(data['buffer_paths'], msg['buffers']): | ||
# we'd like to set say sync_data['x'][0]['y'] = buffer | ||
# where path in this example would be ['x', 0, 'y'] | ||
obj = sync_data | ||
for key in path[:-1]: | ||
obj = obj[key] | ||
obj[path[-1]] = buffer | ||
self.set_state(sync_data) # handles all methods | ||
|
||
# Handle a state request. | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -172,11 +172,18 @@ class WidgetModel extends Backbone.Model { | |
case 'update': | ||
this.state_change = this.state_change | ||
.then(() => { | ||
var state = msg.content.data.state || {}; | ||
var buffer_keys = msg.content.data.buffers || []; | ||
// see Widget.open/_split_state_buffers about why we need state_with_buffers | ||
var state = _.extend({}, msg.content.data.state || {}, msg.content.data.state_with_buffers); | ||
var buffer_paths = msg.content.data.buffers || []; | ||
var buffers = msg.buffers || []; | ||
for (var i=0; i<buffer_keys.length; i++) { | ||
state[buffer_keys[i]] = buffers[i]; | ||
for (var i=0; i<buffer_paths.length; i++) { | ||
// say we want to set state[x][y[z] = buffers[i] | ||
var obj = state; | ||
// we first get obj = state[x][y] | ||
for (var j = 0; j < buffer_paths[i].length-1; j++) | ||
obj = obj[buffer_paths[i][j]]; | ||
// and then set: obj[z] = buffers[i] | ||
obj[buffer_paths[i][buffer_paths[i].length-1]] = buffers[i]; | ||
} | ||
return (this.constructor as typeof WidgetModel)._deserialize_state(state, this.widget_manager); | ||
}).then((state) => { | ||
|
@@ -399,23 +406,82 @@ class WidgetModel extends Backbone.Model { | |
// get binary values, then send | ||
var keys = Object.keys(state); | ||
var buffers = []; | ||
var buffer_keys = []; | ||
for (var i=0; i<keys.length; i++) { | ||
var key = keys[i]; | ||
var value = state[key]; | ||
if (value) { | ||
if (value.buffer instanceof ArrayBuffer | ||
|| value instanceof ArrayBuffer) { | ||
buffers.push(value); | ||
buffer_keys.push(key); | ||
delete state[key]; | ||
var buffer_paths = []; | ||
// keep track of what we visited, to avoid endless loops, when an object ancester | ||
// refers to itself, we cannot use an object, since using objects for keys don't work | ||
var visited_set = []; | ||
// this function goes through lists and object and removes arraybuffers | ||
// they will be transferred seperately, since they don't json'ify | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. separately |
||
// on the python side the inverse happens | ||
// if we need to remove an object from a list, we need to clone that list, otherwise we may modify | ||
// the internal state of the widget model | ||
// however, we do not want to clone everything, for performance | ||
function seperate_buffers(obj, path) { | ||
if(visited_set.indexOf(obj) != -1) { // we already visited this object | ||
return obj; | ||
} | ||
visited_set.push(obj); | ||
if (_.isArray(obj)) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think all of our supported browsers support Array.isArray |
||
var is_cloned = false; | ||
for (var i = 0; i < obj.length; i++) { | ||
var value = obj[i]; | ||
if(value) { | ||
if (value.buffer instanceof ArrayBuffer || value instanceof ArrayBuffer) { | ||
if(!is_cloned) { | ||
obj = _.map(obj, _.identity); | ||
is_cloned = true; | ||
} | ||
buffers.push(value); | ||
buffer_paths.push(path.concat([i])); | ||
// easier to just keep the array, but clear the entry, otherwise we have to think | ||
// about array length | ||
obj[i] = null; | ||
} | ||
else { | ||
var new_value = seperate_buffers(value, path.concat([i])); | ||
if((new_value != value) && !is_cloned) { // only clone when we have to | ||
obj = _.map(obj, _.identity); | ||
is_cloned = true; | ||
obj[i] = new_value; | ||
} | ||
} | ||
} | ||
} | ||
} | ||
else if(_.isObject(obj)) { | ||
for (var key in obj) { | ||
var is_cloned = false; | ||
if (!obj.hasOwnProperty || obj.hasOwnProperty(key)) { | ||
var value = obj[key]; | ||
if(value) { | ||
if (value.buffer instanceof ArrayBuffer || value instanceof ArrayBuffer) { | ||
if(!is_cloned) { | ||
obj = _.mapObject(obj, _.identity); // clone only once for performance | ||
is_cloned = true; | ||
} | ||
buffers.push(value); | ||
buffer_paths.push(path.concat([key])); | ||
delete obj[key]; // for objects/dicts we just delete them | ||
} | ||
else { | ||
var new_value = seperate_buffers(value, path.concat([key])); | ||
if((new_value != value) && !is_cloned) { // only clone when we have to | ||
obj = _.mapObject(obj, _.identity); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah thanks, actually, would _.clone not be better? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I almost wrote that, but in this case, we know we're not a primitive or an array, right? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think it reads better. |
||
is_cloned = true; | ||
obj[key] = new_value; | ||
} | ||
} | ||
} | ||
} | ||
} | ||
} | ||
return obj; | ||
} | ||
state = seperate_buffers(state, []); // could return a clone | ||
this.comm.send({ | ||
method: 'backbone', | ||
sync_data: state, | ||
buffer_keys: buffer_keys | ||
buffer_paths: buffer_paths | ||
}, callbacks, {}, buffers); | ||
}).catch((error) => { | ||
this.pending_msgs--; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add one more item that is nested several levels deep? Perhaps something like:
'nested': {'a': mv1, 'b': [mv2, {'c': mv1, 'd': [mv1, mv2]}]}