Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade to Vega-Lite 5 #2425

Closed
mattijn opened this issue Mar 2, 2021 · 25 comments
Closed

Upgrade to Vega-Lite 5 #2425

mattijn opened this issue Mar 2, 2021 · 25 comments

Comments

@mattijn
Copy link
Contributor

mattijn commented Mar 2, 2021

How can we prepare Altair and its community to upgrade towards the new release of Vega-Lite 5?

It's probably wishful thinking that @jakevdp has the time available to do this completely by himself, but it's probably also wishful thinking that someone can do this completely without the help of Jake.

The skills and knowledge required to change the Vega-Lite schema into workable Python code is quite high. While Altair's codebase is flexible by nature to accommodate little changes within Vega-Lite, bigger changes such as datum and the recently introduced params, require more changes in the codebase.

Maybe someone is just about to finish all the work required for VL5 or maybe we have to become creative¹.

I'm sure more people have thoughts on this!

Useful references:
Interesting comments regarding development in the API of Altair can be found in these issues/PRs: #2148, #1567, #1629

And here is the maintenance guide for Altair internals for people that already like to give it a go: https://github.com/altair-viz/altair/blob/master/NOTES_FOR_MAINTAINERS.md

¹ such as:

  • asking Jake for a code-explanation tour in a virtual setting so people with an interest in the project have a guideline to succeed in core-work PRs
  • set up an Altair sponsor page to establish a fund for paid upgrading
@joelostblom
Copy link
Contributor

Thanks for starting this discussion! I would love to see Altair being updated to work with the new exciting additions to Vega-Lite! I am not sure if I can contribute meaningfully in terms of the needed core changes, but I am more than happy to help out with other tasks that would free up the time for those working on this.

In terms of sponsorship, maybe Altair and/or Vega-Lite would qualify to become a Numfocus project? They support Python and JS projects among others and both Matplotlib and Bokeh receive their sponsorship.

@jakevdp
Copy link
Collaborator

jakevdp commented Mar 2, 2021

Thanks! As far as Altair internals go... there's unfortunately not much of a process beyond what's documented in https://github.com/altair-viz/altair/blob/master/NOTES_FOR_MAINTAINERS.md

When it the update script leads to errors, the process is essentially:

  • play around until you fix the code generation errors
  • run the unit tests to see what backward-incompatible changes are implied by the updated Vega-Lite spec
  • do your very best to accommodate those without breaking the existing Python API

Unfortunately, Vega-Lite 4.9 made some pretty significant changes that have resulted in it being quite difficult to build a new Python wrapper... I've set out to try it a few times in the last year or so, and always had to give up before getting close to solving all the issues. Unfortunately, I don't think this is a type of problem that lends itself well to sharing the burden across distributed collaborators. Someone will just have to sit down and figure out what assumptions the Altair API makes that Vega-Lite 4.9 broke, and figure out options for moving forward while breaking as few users as possible.

Trying to build a stable API on an ever-shifting specification is pretty difficult, it turns out 😁.

@jakevdp
Copy link
Collaborator

jakevdp commented Mar 2, 2021

Side-note: all of these difficulties are caused by extra Python bells and whistles on top of the raw specification... so if you're one of the people who has been disappointed over the years by me turning down your API improvement idea, this is exactly why: our API is already far too "improved" to allow Altair to be easily maintainable in the long-run.

@mattijn
Copy link
Contributor Author

mattijn commented Mar 3, 2021

Thanks for the response! First of all, nobody is disappointed here! Fan for a lifetime!

I hope that we can we reduce the first hurdle so more people will try to generate the API. The output generated by the generate_schema_wrapper.py is good to follow, but the generator itself is a bit more complicated. Not just to understand, but also to implement changes with confidence.

With 6-years codebase experience this might feel like playing, but for motivated newcomers it is probably a bit more scary. Python code in comments? Mixins? Directed acyclic graph? Shorthands?

When introducing changes, in which parts should one be careful, not too-careful, or can one even be careless?

Over the years numerous shorthands have been introduced within the generator, which seems to bite us here, since they do not completely align anymore with an evolving VL specification.
How important are which shorthands? Can one make a first step without these? Can they still be introduced outside the generator proces? Might maintainability improves by reducing complexity with the cost of increasing tediosity?

@jakevdp
Copy link
Collaborator

jakevdp commented Mar 3, 2021

shorthands are things like

chart.encode(x='sum(x):Q')

which is expanded to

chart.encode(x=alt.X(field='x', type='quantitative', aggregate-'sum'))

I don't think things like this can be removed, because they are front-and-center in the API that most users use for basic charts.

@mattijn
Copy link
Contributor Author

mattijn commented Mar 3, 2021

I love the shorthands, but I'm not against releasing an alpha release without these bells and whistles in order to get a better feeling what the new functionalities of VL mean for Altair.

All current viszs can still be made and more, albeit a bit more verbose. Users with existing specifications including shorthands have to wait a bit more.

By that time @joelostblom is up and running and has discovered some new routes to bring back the shorthands:)

@jakevdp
Copy link
Collaborator

jakevdp commented Mar 3, 2021

@mattijn interesting idea, but note that would break most tests that Altair uses to validate new releases. You could work around that by rewriting all the tests (which would be a lot of work that would later have to be undone) or disabling all the tests (which would leave us unable to validate whether the simplified wrappers even work).

All told, I think time would be better spent supporting the full API at the start.

@domoritz
Copy link
Member

Thank you all for kicking off this conversation. I would like to offer full support from the Vega-Lite team to answer any questions you will run into.

@ramayer-fli
Copy link

Why not target full-Vega instead of Vega-Lite? Seems to me Vega-Lite is short-and-easy enough to just use in a IPython.display.IFrame(...) directly; and a bigger benefit of a pythonic API would be to allow more complex full-vega json to be created more elegantly.

@mattijn
Copy link
Contributor Author

mattijn commented Apr 20, 2021

Naming is hard, consider Vega-Lite as Vega-Compact and it make more sense to stick to Vega-Lite instead of Vega(-Verbose).

@domoritz
Copy link
Member

Vega-Lite (yes, not a great name we admit) has gotten substantially more flexible and now supports a lot what previously only worked in Vega. Vega still needs an order of magnitude or two more spec for the same chart.

@jakevdp
Copy link
Collaborator

jakevdp commented Apr 20, 2021

altair does have wrappers for full vega specs: from altair import vega. Nothing high-level, but all the low-level definitions are there for you to use if you wish.

@jbloom
Copy link

jbloom commented Jul 19, 2021

I was just curious about this upgrade and the general future of altair, as I've noticed commits and code updates have slowed a lot over last year or so. I certainly don't have ability to maintain package myself, but do think it's an awesome package. However, I'm trying to decide whether to use it for some long-term projects I'm starting, and hence am sort of curious whether it's likely altair continues to undergo a lot of active development and growth, or if it's sort of in maintenance mode now?

@mattijn
Copy link
Contributor Author

mattijn commented Oct 25, 2021

I'm sure we can move this a bit more forward if there are more pointers on the table in order to get some rightly aimed twilight shots (instead of shots in the dark) from awesome people (one is enough).

@domoritz, I noticed this tweet from you stating a potential project around automatic API generation. Is this something to wait for for Altair?

Also I noticed that Vega-Lite API has VL5 support for some time already, are there any pointers there, that could be useful for going forward here?

@jakevdp, I'm sure you have been trying as well to push things more forward. I hope for you (and us😉) that this tweet is not about Altair, but I won't be surprised it is. If it is, could you maybe unveil some more tips of the iceberg where one need to focus on when approaching a serious attempt? For example: is the current codebase a good starting-point?

@jakevdp
Copy link
Collaborator

jakevdp commented Oct 25, 2021

Regarding automatic API generation... that's what Altair uses at its core, and if it were only that, we'd have no problem upgrading automatically. The problems arise with all the syntactic sugar that makes up the bulk of the API. When someone writes alt.Chart('data.txt').encode(x='x:Q') there is a lot of additional logic to choose what schema elements to generate, and when the schema changes that logic must also change.

Early on we decided that writing something like this:

alt.Chart('data.txt').mark_point().encode(x='x:Q')

would be far more usable than the automatically-generated raw API, which would look something like this:

alt.Chart(
    data=alt.UrlData(url='data.txt'),
    mark=alt.MarkDef(type='point'),
    encoding=alt.Encoding(
        x=alt.PositionFieldDef(field='x', type='quantitative')
    )
)

(which, by the way, is valid code in Altair! You can use the automatically-generated API if you wish)

Maybe it would have been better not to add the syntactic sugar... it certainly would have made staying in sync with Vega-Lite much easier, but I don't think people would have been as excited about that version of Altair.

@jakevdp
Copy link
Collaborator

jakevdp commented Oct 25, 2021

The main challenge with upgrading beyond vega-lite 4.8 is that the encode(x='x') shorthand is built around the assumption that x='x' can either refer to a PositionFieldDef or an XValueDef. Vega-Lite 4.9 made it so that it can also be a PositionDatumDef, and there is some logic that pre-supposed only two options.

So, along with FieldChannelMixin, and ValueChannelMixin, we'll also probably need a DatumChannelMixin, and to generate all the associated classes, and then update infer_encoding_types to infer the correct class for whatever argument the user specifies.

I don't think it's that hard, it will just take some time and some experimentation. I estimate I could probably upgrade Altair to VL4.9 with maybe a ~week of engineering work.

@ChristopherDavisUCI
Copy link
Contributor

I'm very excited about this conversation!

May I ask a naive question? I understand that tools/generate_schema_wrapper.py is one of the most important files to update (and it in turn automatically updates several others). What are the other most important files to update? Is altair/vegalite/v4/api.py the clear second-most important?

@mattijn
Copy link
Contributor Author

mattijn commented Oct 25, 2021

Thanks for the prompt and detailed response!

What is VL-API doing different then? In these examples I do see compact-formed code.
There is then also added lot of sugaring syntax code to make these compact forms possible, which needs precise maintenance as well? Or there it is somehow auto-generated?

@domoritz
Copy link
Member

@domoritz, I noticed this tweet from you stating a potential project around automatic API generation. Is this something to wait for for Altair?

I'm thinking about a project to generate APIs directly from TypeScript (instead of from JSON schema). The goal is to support multiple languages as well. @jheer and I talked about it briefly last week. However, it's not more than an idea right now so Altair should definitely not wait for it. The tool would probably be built on ideas from this thread: vega/ts-json-schema-generator#101.

@firasm
Copy link

firasm commented Oct 25, 2021

I don't think it's that hard, it will just take some time and some experimentation. I estimate I could probably upgrade Altair to VL4.9 with maybe a ~week of engineering work.

I'd love to sponsor this work with money (since I have 0 capacity to take it on myself)!

I'm not sure how much a week of engineering costs, but if there are others willing, I'm sure we could put our wallets together for this.

Let me know if this is something we should explore as a model to continue supporting this amazing library.

@ChristopherDavisUCI
Copy link
Contributor

I've been experimenting with trying to allow vega-lite schema 4.17.0 with Altair. Adding to some code of @mattijn from a few months ago, I have some things working in this branch.

For example, this code works

import pandas as pd
import altair as alt

df = pd.DataFrame({"values":[12, 23, 47, 6, 52, 19]})

c1 = alt.Chart(df).mark_arc(innerRadius=20,stroke="#fff")

c2 = alt.Chart(df).mark_text(radiusOffset=10).encode(
    text = "values:Q"
)

(c1+c2).encode(
    theta = alt.Theta("values:Q", stack=True),
    radius = alt.Radius("values", scale=alt.Scale(type="sqrt",zero=True,rangeMin = 20)),
    color = "values:N"
)

to produce this chart:
offset

and this code works

import altair as alt
from vega_datasets import data

source = data.stocks()

chart1 = alt.Chart(source).mark_line().encode(
    x = alt.X("date"),
    y = alt.Y("price"),
    color = "symbol"
)

chart2 = alt.Chart(source).mark_rule().encode(
    y = alt.YDatum(350)
)

chart1+chart2

to produce this chart:
rule2

Most of the errors I've introduced seem to come from RepeatChart. I can't get the tests to run in a helpful way unless I remove choropleth_repeat.py from the examples section. Once I remove that from the examples section, if I then run pytest --doctest-modules altair (not sure if that's the natural thing to run or not... I was copying some code from GitHub tests), then I get this response:

FAILED altair/utils/tests/test_core.py::test_parse_shorthand_all_timeunits - AssertionError: assert {'field': 'we...quantitative'} == {'field':...
FAILED altair/vegalite/v4/tests/test_api.py::test_chart_from_dict - AssertionError: assert <class 'dict'> is <class 'altair.vegalite.v4.api.Rep...
FAILED altair/vegalite/v4/tests/test_api.py::test_resolve[repeat] - ValueError: <class 'altair.vegalite.v4.api.RepeatChart'> object has no attr...
=================================================== 3 failed, 432 passed, 145 skipped in 9.72s ===================================================

I have enjoyed working on it as a learning experience, and am very open to any guidance. If anyone wants to comment without spamming the rest of the thread (I will try not to post often), my email address is daviscj@uci.edu

@mattijn
Copy link
Contributor Author

mattijn commented Oct 26, 2021

Great progress for v4.17 @ChristopherDavisUCI!

@jakevdp, can you, and are you willing to allocate time for one-off issues like these during normal working hours? If yes, this probably requires organizing sponsors from the Altair community to hire you through your employer?

I don't know you and your employer position, but like @firasm said, I think the funds can be organized.

@PaleNeutron
Copy link

Hey, more than one year since this issue opened and I have see GitHub actions testing vega-light version 5.2.

Is there any plan to upgrade altair to 5.x?

@mattijn
Copy link
Contributor Author

mattijn commented Aug 4, 2022

Thanks for checking in. As realized early on, the change from VL4 tot VL5 is significant.

Over the last year there has been quite extensive progress.

This means that the development version of altair repo on GitHub is already on VL5.

Basic charts do work fine and most of the compound charts are working fine too, see this discussion for the path to release a first release candidate.

Remaining is maintenance, refactoring and restyling docs.

@joelostblom
Copy link
Contributor

There is now a release candidate of Altair 5, which supports Vega-Lite 5! Details and info on how to install can be found in #2937

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants