Building a notebook extension that lets a cell register other cells as its execution dependencies #1193

Closed
benelot opened this issue Jan 12, 2018 · 4 comments


@benelot
Contributor

benelot commented Jan 12, 2018

Hello everyone,

I am thinking about writing an extension that lets each cell be annotated with the cells its execution depends on. This would simplify working in a modular fashion, because short code segments could be written once and run as initialization for others. For example, I can write a dataset download and extract section, which is then set as a dependency for a graph drawing cell. If I run the graph drawing cell, the cells it depends on are run first to ensure its successful execution.

Can anybody give me some hints on where to start (which part of the docs I should read; I am not even sure whether this is a front-end or back-end extension, or both) and how this could be done? Rough pseudocode would be enough, such as "you catch the event notification of the cell to be run and trigger all cells that are listed in its JSON tag called deps. The deps can be collected for each cell by inserting them as special tags".

@juhasch
Member

juhasch commented Jan 13, 2018

From the technical side this should not be too difficult to implement. You can take a look at the runtools extension, for example, where you can mark cells and execute them.

It is not difficult to programmatically execute a given cell or to modify the execution behavior. You can do this with a custom execution command (called from a hotkey, menu entry or button), or you can change the logic by catching events or by overriding the default execution function with your own logic. Unfortunately there is no "official" API. It is best to start by looking at existing extensions in this or other repositories.

I think the more difficult part is to create a user interface that allows doing what you want in an intuitive and comprehensible manner. You might want to look at cell tags. You can add self-defined tags to a cell. Executing cells with this tag is a simple for loop.
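The "simple for loop" over tagged cells could look roughly like the sketch below. It assumes cells are objects with a metadata.tags array and an execute() method, as returned by Jupyter.notebook.get_cells() in the classic notebook; the helper names cellsWithTag and executeTagged are made up for illustration and are not part of any Jupyter API.

```javascript
// Hypothetical helper: returns the cells whose metadata.tags contains `tag`.
// `cells` is any array of cell-like objects, e.g. the result of
// Jupyter.notebook.get_cells() in the classic notebook.
function cellsWithTag(cells, tag) {
    return cells.filter(function (cell) {
        var tags = (cell.metadata && cell.metadata.tags) || [];
        return tags.indexOf(tag) >= 0;
    });
}

// Hypothetical helper: executes every cell carrying `tag`, in notebook
// order, and returns how many cells were executed.
function executeTagged(cells, tag) {
    var tagged = cellsWithTag(cells, tag);
    for (var i = 0; i < tagged.length; i++) {
        tagged[i].execute();
    }
    return tagged.length;
}
```

Inside an nbextension this would presumably be called as executeTagged(Jupyter.notebook.get_cells(), 'init'), with 'init' standing in for whatever tag you chose.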

@jcb91
Member

jcb91 commented Jan 13, 2018

As @juhasch says, this could be implemented as a pure-javascript frontend nbextension without huge difficulty.

> I think the more difficult part is to create a user interface that allows doing what you want in an intuitive and comprehensible manner.

Definitely. The simplest thing I can think of would be to use cell tags (items in cell.metadata.tags, can be set/removed using the cell toolbar as of jupyter/notebook#2048), specifying dependencies as a list of tags stored in another metadata item. Then patch the cell execute method so that when executing a given cell, its dependencies are found (searching the tags) and executed first. You may find some slight complexity in preventing running things lots of times (e.g., both cell 2 & cell 3 depend on cell 1, but you don't want to run cell 1 twice in order to run all three). In this case, you could add some record of which cells have been executed at least once, and not re-run them, but you'd need to be careful to make clear to users what was happening, and be careful to e.g. reset the record when restarting the kernel. You can find an example of patching the codecell execute function at freeze/main.js#L42-L53.

Something like

define([
    'base/js/namespace',
    'notebook/js/codecell'
], function (
    Jupyter,
    codecell
) {
    "use strict";

    return {
        load_ipython_extension: function () {
            console.log('[exec_deps] patching CodeCell.execute');
            // Keep a reference to the original method so it can still be called.
            var orig_execute = codecell.CodeCell.prototype.execute;
            codecell.CodeCell.prototype.execute = function (stop_on_error) {
                var root_cell = this;
                var dep_tags = root_cell.metadata.exec_deps || [];
                var dependency_cells = Jupyter.notebook.get_cells().filter(function (cell, idx, cells) {
                    if (cell === root_cell) return false;
                    var tags = cell.metadata.tags || [];
                    for (var ii = 0; ii < dep_tags.length && tags.length > 0; ii++) {
                        if (tags.indexOf(dep_tags[ii]) >= 0) {
                            console.log('[exec_deps] executing cell', idx, 'for dependency tag', dep_tags[ii]);
                            return true;
                        }
                    }
                    return false;
                });
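                // Dependencies run through the patched execute, so their own
                // dependencies run too; circular tags would recurse without end.
                dependency_cells.forEach(function (cell) { cell.execute(stop_on_error); });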
                orig_execute.call(this, stop_on_error);
            };
            console.log('[exec_deps] loaded');
        }
    };
});

plus some UI stuff to allow you to specify which tags are dependencies for a given cell
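For the "don't run cell 1 twice" concern, one possible approach is to resolve the whole dependency list up front, with each cell appearing once, and only then execute it. The sketch below assumes the same metadata layout as the snippet above (names in cell.metadata.tags, dependencies in cell.metadata.exec_deps); collectDependencies is a hypothetical helper, not an existing API. It only de-duplicates within a single run and does not keep a record across runs or kernel restarts.

```javascript
// Hypothetical helper, not part of any Jupyter API.
// Returns the cells that must run before `root`, each exactly once, ordered
// so that every cell comes after the cells it depends on.
// Assumed convention: cell.metadata.tags names a cell, and
// cell.metadata.exec_deps lists the tags that cell depends on.
function collectDependencies(cells, root) {
    var visited = [];  // cells already seen; guards against duplicates and cycles
    var ordered = [];  // dependencies in execution order

    function visit(cell) {
        if (visited.indexOf(cell) >= 0) return;
        visited.push(cell);
        var depTags = (cell.metadata && cell.metadata.exec_deps) || [];
        cells.forEach(function (other) {
            var tags = (other.metadata && other.metadata.tags) || [];
            var isDep = depTags.some(function (t) { return tags.indexOf(t) >= 0; });
            if (isDep) visit(other);
        });
        // The root itself is executed by the caller, so it is left out.
        if (cell !== root) ordered.push(cell);
    }

    visit(root);
    return ordered;
}
```

In the patched execute, the resulting list would probably be run with orig_execute.call(cell, stop_on_error) rather than cell.execute(...), so that each dependency does not trigger its own dependency lookup again.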

@benelot
Contributor Author

benelot commented Jan 13, 2018

I think I would directly use the UI for the tags to represent the dependencies. Something like #A for the id and =>B for the dependency could be a simple representation of the dependency graph. Thanks for the hints, that is EXACTLY the kind of help I hoped for. I will keep you posted on the progress.
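A minimal sketch of how such tags might be parsed, assuming the convention is exactly "#" followed by an id and "=>" followed by the id of a dependency. The function name parseDependencyTags and the returned object shape are invented for illustration, and whether the tag toolbar accepts these characters has not been checked here.

```javascript
// Hypothetical parser for the tag convention proposed above:
//   "#A"  gives the cell the id "A"
//   "=>B" declares a dependency on the cell with id "B"
// Any other tag is ignored.
function parseDependencyTags(tags) {
    var result = { ids: [], deps: [] };
    (tags || []).forEach(function (tag) {
        if (tag.indexOf('=>') === 0 && tag.length > 2) {
            result.deps.push(tag.slice(2));
        } else if (tag.charAt(0) === '#' && tag.length > 1) {
            result.ids.push(tag.slice(1));
        }
    });
    return result;
}
```

For a cell, the input would be cell.metadata.tags, so both the id and the dependencies live in the one tag list the existing toolbar already edits.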

@jcb91
Member

jcb91 commented Jan 15, 2018

> Something like #A for the id and =>B for the dependency could be a simple representation of the dependency graph.

Ah yes, good plan! Nice, neat idea, and simple to implement :)
