Add offline ChartWidget based on AnyWidget #3108

Closed
wants to merge 16 commits into from


jonmmease
Contributor

This PR is a WIP example of adding a ChartWidget Jupyter Widget for Altair charts based on AnyWidget. See #3106 for some discussion.

In this first iteration, I've used the exact widget implementation from the AnyWidget blog post. There's a lot I want to add to this implementation, but starting simple for the sake of discussion.

Usage

Define a chart

import altair as alt
from vega_datasets import data

source = data.cars()
brush = alt.selection_interval()

points = alt.Chart(source).mark_point().encode(
    x="Horsepower",
    y="Miles_per_Gallon",
    color=alt.condition(brush, "Origin", alt.value("lightgray"))
).add_params(
    brush
)

bars = alt.Chart(source).mark_bar().encode(
    y="Origin",
    color="Origin",
    x="count(Origin)"
).transform_filter(
    brush
)

chart = points & bars

Wrap in a ChartWidget and wire up a selection callback

from altair.widget import ChartWidget
chart_widget = ChartWidget(spec=chart.to_json())

# Prints updates to the log console (JupyterLab: View > Show Log Console)
chart_widget.observe(lambda selection: print(selection.new), names=["selection"])

chart_widget

Building

To support offline usage, I adapted the blog example to use the esbuild bundler, as recommended in the AnyWidget docs.

@manzt, is this the right idea for using esbuild to bundle everything for offline support?

To build the JavaScript portion of the widget:

cd widget/
npm install
npm run build

This writes altair/widget/static/index.js, which includes all dependencies inline (around 800 KB minified).

The ChartWidget class in altair/widget/__init__.py references the bundle using the _esm property.

Notes

To get from here to the initial widget functionality discussed in #3106, it would mostly be a matter of adding functionality to the render JavaScript function in widget/src/index.js and to the ChartWidget Python class in altair/widget/__init__.py.

I'll keep working on the functionality over the coming weeks, but how do folks feel about including this widget in the Altair repo and package?

@jonmmease jonmmease marked this pull request as draft July 16, 2023 19:54

class ChartWidget(anywidget.AnyWidget):
    _esm = _here / "static" / "index.js"
    spec = traitlets.Unicode().tag(sync=True)
Member

Why isn't spec also a dict?

Contributor Author

Yeah, I think it could be. This was just copied from the blog post.

At this point I'm only really looking for feedback on the project layout. The Python ChartWidget and JavaScript render functions are going to be rewritten.

Member

Makes sense. I still feel this could live in ipyvega, since there is nothing strictly Altair-specific in the renderer, but I don’t feel strongly either way.

Contributor

My hope, but please correct me if I'm wrong here, is that incorporating anywidget makes it possible to add a new template or upgrade the existing HTML/JS templates in https://github.com/altair-viz/altair/blob/master/altair/utils/html.py.

Member

Would it make sense to deprecate ipyvega when this is ready then? I'd think so.

Contributor Author

@manzt, what does it take to write an anywidget to a self-contained HTML file? I may be misremembering, but I thought you said that was possible.

Contributor Author

Would it make sense to deprecate ipyvega when this is ready then? I'd think so.

Yeah, I think this could cover the same use-cases as the VegaWidget in ipyvega. I'm not sure if the mimerenderer portion of ipyvega is covered by Altair right now.

Contributor

You should be able to use ipywidgets.embed.embed_minimal_html:

chart_widget = ChartWidget(...)

from ipywidgets.embed import embed_minimal_html

embed_minimal_html("chart.html", views=[chart_widget], title="my chart")

At the moment, though, this doesn't seem to work for any of the third-party widgets I've tested... I'll take a look and try to figure out what's going on.

@manzt
Contributor

manzt commented Jul 18, 2023

@manzt, is this the right idea for using esbuild to bundle everything for offline support?

Yup, exactly.

You should also be able to run

npm run build -- --watch

during development to enable esbuild's watch mode and get live reloading from anywidget.

@mattijn
Contributor

mattijn commented Jul 19, 2023

Two more questions, @jonmmease. This package-lock.json scared me a bit; I thought we would have to handle the dependency management and version control of the correct (~125) node modules ourselves.

But in my current understanding, these node modules are defined indirectly, when running npm install, through these lines:

  "dependencies": {
    "vega-embed": "^6.22.1"
  },

Since you also didn't commit the node_modules folder, would it make sense to place the package-lock.json in the .gitignore file?

Currently, the vega-embed version used within the Altair HTML templates is defined by these lines:

VEGALITE_VERSION: Final = SCHEMA_VERSION.lstrip("v")
VEGA_VERSION: Final = "5"
VEGAEMBED_VERSION: Final = "6"

Would it make sense to somehow make sure these versions stay in sync with each other?

@jonmmease
Contributor Author

But in my current understanding, these node modules are defined indirectly, when running npm install, through these lines:

Yes, package-lock.json is autogenerated by the npm install command (if it doesn't already exist) from the dependencies section of package.json. It's standard practice to commit this file to git to ensure that builds are repeatable across the team, and so that they don't break silently when transitive dependency updates are released (you opt in to updates of transitive dependencies with npm update).

node_modules is typically not committed to git because it's often somewhat large and can be exactly regenerated based on the package-lock.json file. If package-lock.json were not committed to the repo, the contents of node_modules could be different for each member of the team.
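To make the lock-file point concrete: a caret range such as the "^6.22.1" above accepts any later 6.x release, so two fresh installs made at different times can resolve to different versions unless package-lock.json records the exact one. The following is a simplified model of npm's caret rule, assuming plain MAJOR.MINOR.PATCH versions with a major component of at least 1 (prerelease tags and the stricter 0.x rules are not modeled); the function names are illustrative, not part of npm.

```python
def parse_version(text: str) -> tuple[int, int, int]:
    """Parse a plain "MAJOR.MINOR.PATCH" string (no prerelease tag) into ints."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def satisfies_caret(version: str, caret_range: str) -> bool:
    """Simplified npm caret check for ranges whose major version is at least 1.

    "^6.22.1" means ">=6.22.1 <7.0.0": any later minor or patch release in
    the same major line is accepted.
    """
    if not caret_range.startswith("^"):
        raise ValueError("expected a caret range such as '^6.22.1'")
    lower = parse_version(caret_range[1:])
    if lower[0] == 0:
        raise ValueError("0.x caret ranges follow stricter rules not modeled here")
    candidate = parse_version(version)
    # Tuples compare element by element, so this is a half-open version interval.
    return lower <= candidate < (lower[0] + 1, 0, 0)


# Hypothetical candidate versions: the first three satisfy the range, so a fresh
# install could land on any of them; the lock file records which one was used.
for candidate in ["6.22.1", "6.22.2", "6.30.0", "7.0.0"]:
    print(candidate, satisfies_caret(candidate, "^6.22.1"))
```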

Currently, the vega-embed version used within the Altair HTML templates is defined by these lines:

Yes, we should come up with some kind of approach to either sync these automatically, or test that the versions match. We would also want to do the same for the versions of Vega-Lite and Vega that Altair is using.
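One possible shape for the "test that the versions match" option, sketched under assumptions: the check receives the text of widget/package.json and a mapping from npm package names to the Python-side version strings quoted above (for example VEGAEMBED_VERSION = "6"), and a shorter Python-side string such as "6" only constrains the major version. The function name version_mismatches is hypothetical, and only leading ^, ~ or = operators are stripped.

```python
import json


def version_mismatches(package_json: str, expected: dict[str, str]) -> list[str]:
    """Describe each npm dependency whose version disagrees with Altair's.

    ``expected`` maps npm package names to Python-side version strings,
    e.g. {"vega-embed": "6"}; "6" constrains only the major component,
    while "5.8.0" would constrain all three.
    """
    deps = json.loads(package_json).get("dependencies", {})
    problems = []
    for name, want in expected.items():
        spec = deps.get(name)
        if spec is None:
            problems.append(f"{name}: missing from package.json")
            continue
        # Drop a leading range operator such as "^" or "~" to get the base version.
        have = spec.lstrip("^~=")
        want_parts = want.split(".")
        have_parts = have.split(".")[: len(want_parts)]
        if have_parts != want_parts:
            problems.append(f"{name}: package.json has {spec!r}, Python expects {want!r}")
    return problems


package_json = '{"dependencies": {"vega-embed": "^6.22.1"}}'
print(version_mismatches(package_json, {"vega-embed": "6"}))
```

In an actual test, the mapping would be built from the Altair constants, the text read from widget/package.json, and the test would assert that the returned list is empty.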

@mattijn
Contributor

mattijn commented Jul 19, 2023

Thanks for explaining. In my understanding, we always want the exact same versions of the sub-dependencies that were used in the releases of the corresponding Vega-Embed/Vega and Vega-Lite packages. Do you think this package-lock.json can provide this? Or wouldn't this matter?
I feel a bit ignorant, but can we also make use of the build-versions of Vega-Embed/Vega/Vega-Lite (so that we don't need to do npm run build)?

@jonmmease
Contributor Author

In my understanding, we always want the exact same versions of the sub-dependencies that were used in the releases of the corresponding Vega-Embed/Vega and Vega-Lite packages. Do you think this package-lock.json can provide this?

To match what Altair is using, I'll add vega-lite and vega entries to the dependencies object in package.json that locks down their versions.

For transitive dependencies, we may not get the exact same versions that are used in the vega/vega-lite bundles that we pull from the CDN. This shouldn't be a problem, but if it ever comes up, we can pin the versions of these transitive dependencies by adding them to package.json with specific versions.

can we also make use of the build-versions of Vega-Embed/Vega/Vega-Lite (so that we don't need to do npm run build)?

What do you mean by "the build-versions"? Right now, I don't think we bundle Vega/Vega-Lite with Altair. This is why altair_viewer is a dependency for offline html export.

At this point, I don't think there's a way around running npm install & npm run build if we want the widget to work offline (Is that accurate @manzt?). As long as we commit the bundle to the repo, we'll only need to run npm run build when making changes to the widget (or its bundled dependencies). Developers who aren't making widget changes will be able to use the widget without having npm installed.

@mattijn
Contributor

mattijn commented Jul 19, 2023

This is why altair_viewer is a dependency for offline html export.

I was wondering indeed if we could make use of these here.

Thanks for the clarifications!

@jonmmease
Contributor Author

I've made some updates to the widget to get it into (what I consider to be) an MVP state.

  1. ChartWidget can now wrap an Altair chart directly.
  2. The top-level params are extracted from the chart, split into "selection params" and "regular params" (more on that below), and stashed in the private _selection_watches and _param_watches traitlets.
  3. The client installs Vega callbacks for the *_watches signals and syncs their values back to Python using the _selections and _params private traitlets.
  4. The public selections traitlet wraps _selections in the SelectionParam dataclass, and the public params traitlet wraps _params in the Param dataclass.
  5. The widget is compatible with the "vegafusion" data transformer (in this case the Vega spec is sent to the client after pre-transforming).
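A rough sketch of steps 2 and 4 above, assuming the chart has already been converted to a Vega-Lite spec dict (e.g. with chart.to_dict()) whose top-level "params" list holds the param definitions. It relies on the Vega-Lite convention that selection parameters carry a "select" property, while regular parameters carry "value" or "expr" instead. The dataclass fields mirror the Param and SelectionParam output shown later in this thread; the helper names and the shape of the raw synced data are hypothetical, not the PR's actual code.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Param:
    name: str
    value: Any


@dataclass
class SelectionParam:
    name: str
    value: Any
    store: list = field(default_factory=list)


def split_params(spec: dict) -> tuple[dict, dict]:
    """Split a Vega-Lite spec's top-level params into selection and regular params."""
    selection_params, regular_params = {}, {}
    for p in spec.get("params", []):
        # Selection params declare "select"; variable params use "value"/"expr".
        if "select" in p:
            selection_params[p["name"]] = p
        else:
            regular_params[p["name"]] = p
    return selection_params, regular_params


def wrap_selections(raw: dict) -> dict:
    """Wrap raw {name: {"value": ..., "store": ...}} data, as a private synced
    trait might hold it (assumed shape), into SelectionParam objects."""
    return {
        name: SelectionParam(name=name, value=entry.get("value"), store=entry.get("store", []))
        for name, entry in raw.items()
    }
```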

I chose to separate regular params from selection params because I want to work toward a future where we can automatically filter a chart's input data using the active selection. This is why I also included the {selection}_store dataset in the SelectionParam data class. This dataset is what Vega uses internally to actually apply the selection to datasets, so having it available will make it easier to use in the future for automatic filtering.
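To illustrate how the store could drive automatic filtering, here is a simplified Python reading of the two store layouts that appear in the example output further down: type "R" entries hold an interval [lo, hi] per field (possibly in descending order), and type "E" entries hold an exact value per field. Assumptions: a row is kept if it matches any store entry, all fields within an entry must match, range bounds are inclusive, an empty store keeps every row, and other store types are not handled. This is an illustration, not Vega's implementation, and the function names are hypothetical.

```python
def _entry_matches(row: dict, entry: dict) -> bool:
    """True if a row satisfies every field test in one selection store entry."""
    for field_def, value in zip(entry["fields"], entry["values"]):
        x = row.get(field_def["field"])
        if field_def["type"] == "R":
            # Interval (range) test with inclusive bounds, given in either order.
            lo, hi = sorted(value)
            if x is None or not (lo <= x <= hi):
                return False
        elif field_def["type"] == "E":
            # Enumerated (point) test: exact equality.
            if x != value:
                return False
        else:
            raise NotImplementedError(f"store type {field_def['type']!r} not handled")
    return True


def filter_by_store(rows: list[dict], store: list[dict]) -> list[dict]:
    """Keep rows matching any store entry; an empty store keeps everything."""
    if not store:
        return list(rows)
    return [row for row in rows if any(_entry_matches(row, e) for e in store)]
```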

There's a lot more functionality I want to add (in particular binary serialization of datasets in Arrow format, and listening to and updating arbitrary signals and datasets), but I think it makes sense to do that in follow-on PRs.

I think this is already extremely useful, especially given the amount of code that was required. Please give it a try and see what you think!

Example:

Here's an example chart with two selection params and one regular param:

import altair as alt
from altair.widget import ChartWidget
from vega_datasets import data

source = data.cars()
brush = alt.selection_interval(name="brush")
legend_selection = alt.selection_point(
  name="legend", fields=["Origin"], bind="legend"
)

corner_radius = alt.param(name="cornerRadius", value=0, bind=alt.BindRange("range"))

points = alt.Chart(source).mark_point().encode(
    x="Horsepower",
    y="Miles_per_Gallon",
    color=alt.condition(brush, "Origin", alt.value("lightgray")),
    size=alt.condition(legend_selection, alt.value(80), alt.value(20))
).add_params(
    brush, legend_selection
)

bars = alt.Chart(source).mark_bar(
    cornerRadius=corner_radius
).encode(
    y="Origin",
    color="Origin",
    x="count(Origin)"
).transform_filter(
    brush
).add_params(corner_radius)

chart = points & bars

chart_widget = ChartWidget(chart=chart)
chart_widget

Now make a box selection and a legend selection and look at the chart_widget.selections property:

chart_widget.selections
{
    'brush': SelectionParam(
        name='brush', 
        value={'Horsepower': [69.4583251953125, 143.0583251953125], 'Miles_per_Gallon': [18.716435750325523, 35.04976908365885]}, 
        store=[{'unit': 'view_2', 'fields': [{'field': 'Horsepower', 'channel': 'x', 'type': 'R'}, {'field': 'Miles_per_Gallon', 'channel': 'y', 'type': 'R'}], 'values': [[69.4583251953125, 143.0583251953125], [35.04976908365885, 18.716435750325523]]}]
    ),
    'legend': SelectionParam(
        name='legend', 
        value={'Origin': ['Europe'], 'vlPoint': {'or': [{'Origin': 'Europe'}]}}, 
        store=[{'fields': [{'type': 'E', 'field': 'Origin'}], 'values': ['Europe']}]
    )
}

Then drag the slider to change the bar corner radius and view the chart_widget.params property:

chart_widget.params
{'cornerRadius': Param(name='cornerRadius', value=34)}

@jonmmease jonmmease marked this pull request as ready for review July 20, 2023 16:46
@jonmmease jonmmease changed the title [WIP] Add ChartWidget based on AnyWidget Add ChartWidget based on AnyWidget Jul 20, 2023
@manzt
Contributor

manzt commented Jul 20, 2023

At this point, I don't think there's a way around running npm install & npm run build if we want the widget to work offline (Is that accurate @manzt?).

Yeah, that's correct. The assets must be sent from the jupyter kernel for things to work offline.

@manzt
Contributor

manzt commented Jul 21, 2023

@jonmmease, if you want to avoid checking in the built artifact (static/index.js), I noticed that you are now using hatchling as the build system, and there is a nice plugin to run the npm build. When you publish to PyPI, the artifacts will be included if you add:

[tool.hatch.build]
artifacts = ["altair/static/index.js"]

[tool.hatch.build.hooks.jupyter-builder]
build-function = "hatch_jupyter_builder.npm_builder"
ensured-targets = ["altair/static/index.js"]
skip-if-exists = ["altair/static/index.js"]
dependencies = ["hatch-jupyter-builder>=0.5.0"]

[tool.hatch.build.hooks.jupyter-builder.build-kwargs]
npm = "npm"
build_cmd = "build"
path = "widget"

to your pyproject.toml.

This is how jupyter-scatter is set up. It just depends on whether you want to avoid checking in the artifacts, but the compromise is a slightly more complex build system.

@@ -0,0 +1,104 @@
import anywidget # type: ignore
Contributor

Oh interesting, are anywidget's types not working with mypy? Maybe I forgot a py.typed?

"main": "index.js",
"scripts": {
"build": "esbuild src/index.js --bundle --format=esm --outfile=../altair/widget/static/index.js --minify",
"watch": "esbuild src/index.js --bundle --format=esm --watch --outfile=../altair/widget/static/index.js --minify"
Contributor

If you want, you can reuse the prior build command and supply extra flags like:

Suggested change
"watch": "esbuild src/index.js --bundle --format=esm --watch --outfile=../altair/widget/static/index.js --minify"
"watch": "npm run build -- --sourcemap=inline --watch"

The inline sourcemaps are useful during development because you get readable tracebacks in the console for any errors.

Contributor Author

Good idea!

@jonmmease
Contributor Author

It just depends on whether you want to avoid checking in the artifacts, but the compromise is a slightly more complex build system.

Thanks for the example. The hatch approach is really nice!

@mattijn @binste @joelostblom, do you all have a preference between these two options?

  1. Commit the ~1 MB bundle to git (and recommit when we make changes to the widget), but don't require developers to have npm installed unless they are making changes to the widget.
  2. Use hatch to build the bundle automatically (without committing the bundle to Git), but require developers to have npm installed to work on Altair (only for developers; npm is not required to use the widget).

@mattijn
Contributor

mattijn commented Jul 21, 2023

It's great to see these dicts of the selections and params coming out of the chart in Python. Big accomplishment!

I'm leaning toward (2) at the moment. I already have Node.js in my environment, and I'm fine with including it in the development dependencies.

I tried it a bit and naively thought that the change_params function could also set params, but this didn't work:

class ChangePar:
    def __init__(self, new_params):
        self.new = new_params
        
par = chart_widget.params['cornerRadius']
par.value = 40
change_par = ChangePar(new_params={par.name: {"value": par.value}})

chart_widget.change_params(change_par)
chart_widget.params['cornerRadius']
Param(name='cornerRadius', value=40)
# no change when rendering chart_widget 

Is change_params meant to function only as a watcher, or is setting params also in scope?

Could this also be an attempt to extend the capabilities of alt.Chart() with (some additional) functionality through ChartWidget, e.g. by supercharging it?

@jonmmease
Contributor Author

Is change_params meant to function only as a watcher, or is setting params also in scope?

Yeah, this should be a private function named something more like _on_param_change(). It's called when the front-end makes a change to the private _params traitlet. I'll give it a better name and make it private.

I'm planning to add support for updating params in the future (VegaFusion will need this). Maybe a method like widget.set_params({"cornerRadius": 2}) that would send a message to the front end, which would call view.signal("cornerRadius", 2). Or maybe widget.set_params(Param(name="cornerRadius", value=2)) to reuse the Param dataclass.
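The proposed set_params idea could be sketched as follows. It also shows why mutating the Param object in the earlier attempt had no visible effect: nothing was sent to the browser. Everything here is hypothetical: the ParamSetter class, the "set_signal" message type, and the send callable, which stands in for a widget's custom-message channel (for an ipywidgets-based widget, Widget.send). The front end would still need to handle the message, call view.signal(name, value), and then re-run the view.

```python
from dataclasses import dataclass
from typing import Any, Callable, Union


@dataclass
class Param:
    name: str
    value: Any


class ParamSetter:
    """Hypothetical sketch: turn set_params(...) calls into front-end messages."""

    def __init__(self, send: Callable[[dict], None]):
        # ``send`` stands in for the widget's message channel to the browser.
        self._send = send

    def set_params(self, params: Union[dict, Param]) -> None:
        # Accept either {"name": value} or a single Param, as floated above.
        if isinstance(params, Param):
            params = {params.name: params.value}
        for name, value in params.items():
            self._send({"type": "set_signal", "name": name, "value": value})


sent = []
setter = ParamSetter(sent.append)
setter.set_params({"cornerRadius": 2})
setter.set_params(Param(name="cornerRadius", value=5))
```

Accepting both forms keeps the dict shorthand convenient while still allowing the Param dataclass to be reused, which were the two options mentioned above.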

I'm leaning toward (2) at the moment. I already have Node.js in my environment, and I'm fine with including it in the development dependencies.

FWIW, that's my preference as well. One thing to clarify, though, is that npm is not installable with pip (though it is with conda), so I don't know whether there's a way to have it installed automatically by hatch (unless the plugin Trevor mentioned above does that).

@mattijn
Contributor

mattijn commented Jul 21, 2023

I don't have good experiences with mixing hatch and conda in GitHub Actions workflows.
It seems jupyter-scatter does this without conda, relying on hatch-jupyter-builder; I'm not exactly sure what this npm_builder function is doing.

@joelostblom
Contributor

Wow, so much progress being made quickly here, thank you @jonmmease! Regarding your question, I would also lean towards 2; if npm is only required for development, then I think it makes sense to require it as a development dependency rather than increase the size of the package to make the development install easier. I don't feel strongly here though, and could easily be convinced if there are good arguments for option 1.

@manzt
Contributor

manzt commented Jul 21, 2023

I don't have good experiences with mixing hatch and conda in GitHub Actions workflows.

Just to be clear, this build does not require the hatch CLI. The hatch-jupyter-builder is a plugin for the PEP 518-compliant hatchling build-system. You can build and publish this package in a Python environment (including one created with conda) via:

python -m build .
twine upload dist/*

not exactly sure what this npm_builder function is doing.

The npm_builder is a hook for the hatchling build system that runs npm (or another package manager) install plus a build command defined in package.json when the Python package is installed or built. This provides a mechanism to statically declare, within pyproject.toml, the steps that create altair/static/index.js.
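As a mental model only (this is not hatch-jupyter-builder's source), the behavior described above, together with the skip-if-exists and ensured-targets options from the earlier pyproject.toml snippet, roughly amounts to the following. The build_js_bundle name and the injectable run parameter are hypothetical; run exists so the sketch can be exercised without npm installed.

```python
import subprocess
from pathlib import Path
from typing import Callable, Sequence

Runner = Callable[[Sequence[str], Path], None]


def _run(cmd: Sequence[str], cwd: Path) -> None:
    # Run a command in the JavaScript project directory, failing loudly on error.
    subprocess.run(list(cmd), cwd=cwd, check=True)


def build_js_bundle(
    js_dir: Path,
    target: Path,
    npm: str = "npm",
    build_cmd: str = "build",
    run: Runner = _run,
) -> bool:
    """Build the JS bundle unless it already exists.

    Returns False when the target is already present (cf. skip-if-exists);
    otherwise runs `npm install` then `npm run <build_cmd>` and checks that
    the target was produced (cf. ensured-targets).
    """
    if target.exists():
        return False
    run([npm, "install"], js_dir)
    run([npm, "run", build_cmd], js_dir)
    if not target.exists():
        raise RuntimeError(f"build did not produce {target}")
    return True
```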

FWIW, I think I would also prefer option 2.

@jonmmease
Contributor Author

Thanks for the input, all. I'll work on incorporating the hatchling npm plugin.

pyproject.toml Outdated
@@ -94,6 +94,7 @@ allow-direct-references = true

[tool.hatch.build]
include = ["/altair"]
artifact = "altair/widget/static/index.js"
Contributor

Should be “artifacts”; sorry for the typo in my suggestion.

@binste
Contributor

binste commented Jul 22, 2023

This is awesome! 🔥 Really cool feature and great to see so many people providing inputs.

A question regarding putting this into Altair vs. ipyvega. In past discussions such as #2807, it was noted that the Vega-Lite and Vega JS code was kept separate in altair_viewer to keep Altair small. I think there were other discussions, but I can't find them right now. An Altair wheel file is currently 470 KB, which is a great advantage in the WASM world, and I'm wondering if it's worth keeping it this way. Alternatively, ipyvega could be installed, together with other optional dependencies such as vl-convert, using pip install "altair[full]" as described in #2818.

Apart from speed, another upside would be that we don't need npm for development. I don't mind it personally, but it's already not straightforward to start contributing to Altair, although it got a lot easier since we no longer need altair_saver. We would again lose part of that advantage, as it would no longer be a simple pip install ".[dev]".

I'm also on board with including it in Altair and want to reiterate that I find this an awesome feature. Just want to hear your thoughts on these points. Maybe they were already discussed somewhere and I missed it.

@mattijn
Contributor

mattijn commented Jul 22, 2023

Good questions @binste!

I think we all like the improved access to Vega's View API within Altair that this provides. Regarding added LOC, I think this PR is small for the functionality it provides.

In my understanding, this PR would (eventually) enable resolving the three most upvoted issues in the Altair GitHub repository, namely #426, #1153, #290 (and also #435, #1801). Is that true, @jonmmease?

Regarding WASM, I'm not sure if size is the bottleneck. I was in favor of including vl-convert as a hard dependency (see #2816) and was willing to accept a 20 MB increase in size. But the inability to install a compiled library (vl-convert) within WASM and in other places with certain ARM architectures was the argument against it. For me, these few extra KB are acceptable, even if they double the total size. I'm not sure we need these extra KB at this moment, though; see below.

If we follow this route and include a bundled version for offline rendering, will this remove the dependency on altair_viewer? The currently released version of altair_viewer does not support Altair version 5 and higher, and I'm not certain there will be a new release.

Maintenance burden is a good point. Will npm/nodejs still be required as a dependency if we first adopt online rendering only? In that case, we could limit this PR to online rendering and discuss how to include offline rendering in another PR.

And while I like the hatch-bundling approach, it is not yet working for me. This is the result when I run hatch run test:

  error: subprocess-exited-with-error

  Preparing editable metadata (pyproject.toml) did not run successfully.
  exit code: 1

  [138 lines of output]
  INFO:hatch_jupyter_builder.utils:Running jupyter-builder
  INFO:hatch_jupyter_builder.utils:Building with hatch_jupyter_builder.npm_builder
  INFO:hatch_jupyter_builder.utils:With kwargs: {'npm': 'npm', 'build_cmd': 'build', 'path': 'widget'}
  INFO:hatch_jupyter_builder.utils:Installing build dependencies with npm.  This may take a while...
  INFO:hatch_jupyter_builder.utils:> D:\Software\mambaforge\envs\stable\npm.CMD install
  npm WARN vega-embed@6.22.1 requires a peer of vega@^5.21.0 but none is installed. You must install peer dependencies yourself.
  npm WARN vega-embed@6.22.1 requires a peer of vega-lite@* but none is installed. You must install peer dependencies yourself.
  npm WARN vega-themes@2.14.0 requires a peer of vega@* but none is installed. You must install peer dependencies yourself.
  npm WARN vega-themes@2.14.0 requires a peer of vega-lite@* but none is installed. You must install peer dependencies yourself.
  npm WARN altair_widget@0.1.0 No repository field. 
...

@manzt
Contributor

manzt commented Jul 22, 2023

Will npm/nodejs still be required as a dependency if we first adopt online rendering only?

It shouldn’t be. In this case, you’d remove widget/ and check in an altair/static/index.js that imports its dependencies from a CDN.

@jonmmease
Contributor Author

Thanks for the thoughtful comments @binste and @mattijn.

Added functionality

In my understanding, this PR would (eventually) enable resolving the three most upvoted issues in the Altair GitHub repository, namely #426, #1153, #290 (and also #435, #1801). Is that true, @jonmmease?

Yes, I think there's a clear path to covering all of these. The only one I didn't already have in mind was #435 (Having the ability to stream data to the client). But ipyvega already has the ability, so it's certainly something we can get working here.

Bundle size

The bundle size is a valid concern. Altair's current wheel size is dramatically smaller than alternative Python visualization libraries.

Altair 5.0.1 wheel: 456 KB
Matplotlib 3.7.2 wheel: ~7.5 MB
Bokeh 3.2.1 wheel: 7.8 MB
Plotly 5.15 wheel: 15.5 MB

At the moment, this PR would increase the Altair wheel size to 765 KB, a ~70% increase. This would go up further when adding the apache-arrow JavaScript library for binary deserialization (maybe up to around 1 MB?). On the one hand, this is still 7x smaller than the next smallest option. On the other hand, that's not necessarily justification for doubling the wheel size.
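The percentages above follow from simple arithmetic on the quoted sizes. Treating 1 MB as 1000 KB and taking the rough ~1 MB estimate at face value:

```python
# Wheel sizes quoted above, in KB (1 MB treated as 1000 KB for this rough comparison).
altair_current_kb = 456
altair_with_widget_kb = 765
altair_with_arrow_kb = 1000  # the rough "maybe up to around 1 MB" estimate
matplotlib_kb = 7500  # next smallest option in the list

increase = (altair_with_widget_kb - altair_current_kb) / altair_current_kb
ratio = matplotlib_kb / altair_with_arrow_kb
print(f"size increase: {increase:.0%}")
print(f"Matplotlib vs. ~1 MB Altair: {ratio:.1f}x")
```

This prints a 68% increase (rounded to ~70% above) and a 7.5x ratio (the "7x smaller" figure).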

Since part of the discussion here is about Altair's use in wasm environments, it's worth noting that I think the widget should be functional in JupyterLite, so there would be some benefit to some Python wasm users.

Online vs Offline

You make a good point, @mattijn, that reworking this PR to be online-only might be a good compromise as a starting point. This would eliminate the bundle size increase and remove the need for npm and package.json.

An interesting consequence of how AnyWidget dynamically loads JS is that it should be possible to swap the online JS for an offline bundle without changing the Python portion of the widget. Maybe we eventually add an altair-offline package that includes the JS bundles we need for offline html rendering (instead of altair_viewer) and the offline bundle for the ChartWidget. Then we could add an offline flag to the ChartWidget constructor that would load the JS from altair-offline package.

There are details to figure out, but I'm liking the idea of including an online version of the widget directly in the altair package and then figuring out how to offer offline support using a separate package in the future.
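One way the proposed offline flag might choose what to load, sketched under assumptions: anywidget's _esm can be given either inline ES module source or a path to a bundled file, the online module imports vega-embed from a CDN (the esm.sh URL is an illustrative choice), and the offline bundle lives in a directory supplied by a separate package. choose_esm, ONLINE_ESM, and the bundle location are hypothetical; the render function reads a JSON string from a spec trait like the one in the original ChartWidget.

```python
from pathlib import Path
from typing import Optional, Union

# Hypothetical online module: fetches vega-embed from a CDN at render time.
ONLINE_ESM = """
import embed from "https://esm.sh/vega-embed@6";

export function render({ model, el }) {
  embed(el, JSON.parse(model.get("spec")));
}
"""


def choose_esm(offline: bool, bundle_dir: Optional[Path] = None) -> Union[str, Path]:
    """Pick the JavaScript module for a widget's _esm attribute.

    Online: return ES module source that loads dependencies from a CDN.
    Offline: return the path of a pre-built bundle, which in the proposal
    above would ship in a separate package (directory and file name assumed).
    """
    if not offline:
        return ONLINE_ESM
    if bundle_dir is None:
        raise ValueError("offline mode needs the directory holding the bundled index.js")
    bundle = bundle_dir / "index.js"
    if not bundle.exists():
        raise FileNotFoundError(bundle)
    return bundle
```

Because only the JavaScript source changes between the two branches, the Python side of the widget (its traits and callbacks) can stay identical, which is the property the paragraph above relies on.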

Next steps

Given the change in direction (both of bundling and ChartWidget itself), I think I'll close this PR and then open a new online-only version where we can review the widget implementation itself.

Thanks again for all of the engagement on this issue!

@jonmmease jonmmease closed this Jul 22, 2023
@jonmmease jonmmease changed the title Add ChartWidget based on AnyWidget Add offline ChartWidget based on AnyWidget Jul 22, 2023
6 participants