Documentation: Add polyglot specs to example gallery ? #7758

Open
behrica opened this issue Oct 16, 2021 · 46 comments

Comments

@behrica

behrica commented Oct 16, 2021

The Clojure data science community makes heavy use of Vega-Lite for plotting.
It has become the dominant plotting technology in Clojure.

Adoption of Vega(-Lite) in Clojure would be easier if there were a gallery of examples that use Clojure syntax for the Vega-Lite JSON specs, for example:

{:$schema "https://vega.github.io/schema/vega-lite/v5.json"
 :data {:url "data/cars.json"}
 :description "A scatterplot showing horsepower and miles per gallons for various cars."
 :encoding {:x {:field "Horsepower" :type "quantitative"}
            :y {:field "Miles_per_Gallon" :type "quantitative"}}
 :mark "point"}

We could either create a Clojure-specific example gallery ourselves (reusing/copying the Vega-Lite examples from this repository),
or the official example gallery could become "polyglot",
showing the examples in more than just JSON.

This could be done similarly to the "polyglot" examples in some API documentation; just for illustration:

image

My question/proposal is whether you think it would be a good idea to change all the individual example web pages, such as https://vega.github.io/vega-lite/examples/point_2d.html,

to become "polyglot" and show each example in more than just JSON.

Clojure could be a starting point.

@domoritz
Member

Thank you for submitting the issue. @kanitw and I have been talking about this before. It would be lovely to have examples/docs in Vega-Lite, Altair, Vega-Lite-API and others. The question is how we best set this up and maintain it. Ideally, we would translate all examples automatically to another language. Were you thinking of implementing that or some way to manually add an example in a different language? That could also work and the community could gradually contribute examples.

@behrica
Author

behrica commented Oct 23, 2021

In Clojure it is easy to translate the JSON specs into "EDN".
I have done this here:

https://github.com/behrica/vl-galery/blob/577329ab0c81e4bd61eccda7bc07a6aa7110104c/src/convert.clj#L46

This is the most common usage of vega in Clojure.
As it was so easy, I made a toy web page with all Vega lite examples:
https://behrica.github.io/vl-galery/convert/

So EDN could be fully automated.
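
To illustrate why this conversion is easy to automate, here is a minimal sketch in Python (it is not the Clojure code linked above). The names `edn_key` and `to_edn` are made up for this example, and the layout rule (one map entry per line, no commas) is an assumption about the preferred style, not something the gallery prescribes.

```python
import json


def edn_key(key):
    """Render a JSON object key as an EDN keyword when that is readable."""
    if key and not any(ch.isspace() for ch in key):
        return ":" + key
    # Empty keys or keys with whitespace cannot be keywords; keep them as strings.
    return json.dumps(key, ensure_ascii=False)


def to_edn(value, col=0):
    """Render a JSON-compatible value as EDN text.

    `col` is the column of the opening delimiter, used to align map entries.
    """
    if value is None:
        return "nil"
    if isinstance(value, bool):  # must precede the number check: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        # Assumption: JSON string escapes are also readable as EDN string escapes.
        return json.dumps(value, ensure_ascii=False)
    if isinstance(value, list):
        nested = any(isinstance(v, (dict, list)) for v in value)
        sep = "\n" + " " * (col + 1) if nested else " "
        return "[" + sep.join(to_edn(v, col + 1) for v in value) + "]"
    if isinstance(value, dict):
        parts = []
        for key, item in value.items():
            k = edn_key(key)
            # The value starts after "{", the key, and one space.
            parts.append(k + " " + to_edn(item, col + len(k) + 2))
        return "{" + ("\n" + " " * (col + 1)).join(parts) + "}"
    raise TypeError(f"not a JSON value: {value!r}")
```

Calling `to_edn(json.loads(text))` on the content of a spec file returns the EDN text, with map entries in the same order as in the JSON source.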

@behrica
Author

behrica commented Oct 23, 2021

In the Clojure community, we are currently testing another library for Vega-Lite, which is a bit higher level than Vega-Lite itself.
We call them "Hanami templates", see here:
https://github.com/jsa-aerial/hanami#simple-cars

These would need to be written by hand, I suppose.

There is some discussion here: https://clojurians.zulipchat.com/#narrow/stream/151924-data-science/topic/vega-lite.20examples.20in.20Clojure

For this I would indeed think of having a specific folder somewhere on your website, to which the community could contribute one example after the other over time.

This is basically the best solution, even if EDN is automated: we would then create the needed files automatically in one go.

@behrica
Author

behrica commented Oct 23, 2021

In any case, we would be very happy about this.

There seems to be agreement that being able to simply copy/paste from the very complete Vega-Lite example gallery into Clojure code files would boost the adoption of Vega in Clojure.

@domoritz
Member

domoritz commented Oct 23, 2021

Sounds good. Do you have a proposal for a specific setup? Could you send a PR when we agree on one?

We could also have a separate page similar to how Altair handles it. Notice the quick link to Altair from the menu bar of the Vega and Vega-Lite websites.

@behrica
Author

behrica commented Oct 24, 2021

One part of the setup could be that we put into the folder
vega-lite/tree/master/examples/spec

one extra file for each example, with the suffix ".edn",
for example arc_facet.edn

I could certainly do a PR for this.

@behrica
Author

behrica commented Oct 24, 2021

I have no experience in coding websites, so I could probably not do the next part,
which would be to change the mechanism that generates the example pages.

In this script?

function createPage(example) {

It would generate a "tabbed view", where the current spec is the first tab and there is a second tab called "edn",
which just shows the content of files like "arc_facet.edn" from the folder vega-lite/tree/master/examples/spec

@behrica
Author

behrica commented Oct 24, 2021

For "edn" it could in theory also work to contribute a "script" that generates the edn files from the .json specs.
But it would need to be written in Clojure, which would add a new tool to your website build.

And this approach would not work for "Hanami templates"; they would need to be written by hand by the Clojure community over time.

@behrica
Author

behrica commented Oct 24, 2021

I have no experience in JavaScript or TypeScript either.

@behrica
Author

behrica commented Oct 24, 2021

I think this approach:

  • a tabbed view for the examples
  • a mechanism which copies the different examples (edn, altair, hanami templates) into the right place.
  • we differentiate the different languages by "file extension"
  • the different languages are always "Optional", so the "tab stays empty", if file not found

would scale for other languages.

The main drawback is that any change in the examples (especially the addition of an example) triggers some work for each "language community" to add the example in its language.
But if the tab handles "missing examples" gracefully, this should be fine.
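
As a rough sketch of the "file extension" idea, the Python function below collects the language variants that exist for one example and skips the missing ones, which is one way to handle missing examples gracefully. The suffix table, the tab labels, and the function name `available_tabs` are assumptions made up for illustration; the real site generator works differently.

```python
from pathlib import Path

# Hypothetical mapping from file suffix to tab label (not part of the Vega-Lite site).
LANGUAGE_SUFFIXES = {
    ".vl.json": "JSON",
    ".edn": "EDN",
    ".hanami.edn": "Hanami",
    ".py": "Altair",
}


def available_tabs(spec_dir, example_name, suffixes=LANGUAGE_SUFFIXES):
    """Return (label, text) pairs for every language variant of one example.

    A variant is looked up as "<example_name><suffix>" inside spec_dir.
    Missing or empty files are skipped, so their tab is not produced.
    """
    tabs = []
    for suffix, label in suffixes.items():
        path = Path(spec_dir) / f"{example_name}{suffix}"
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8")
        if text.strip():
            tabs.append((label, text))
    return tabs
```

With only `arc_facet.vl.json` and `arc_facet.edn` present, `available_tabs("examples/specs", "arc_facet")` would return the JSON and EDN entries and nothing for Hanami or Altair.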

@behrica
Author

behrica commented Oct 24, 2021

I don't think that the number of potential languages would become a problem over time.
How many programming languages actively use Vega / Vega-Lite?
Fewer than 10, I would think.
--> 10 tabs max over time

This approach could be proposed to the "R community" as well,
see https://cran.r-project.org/web/packages/vegawidget/vignettes/vegawidget.html

The same goes for Python with Altair.

@behrica
Author

behrica commented Oct 24, 2021

I got something working as a prototype:

image

Only the syntax highlighting inside the tab is not working.
So I left the original spec as-is.
The json tab contains the same, while the edn tab contains Clojure:

image

@behrica
Author

behrica commented Oct 24, 2021

But maybe this is even the most useful arrangement: being able to see the JSON spec and the EDN together.

@domoritz
Member

domoritz commented Oct 25, 2021

I think tabs make sense. I've seen a similar pattern on other documentation websites.

The style and setup need some work, though. It would definitely be great if we could avoid another dependency or at least have a clean automatic process for generating everything (maybe even in GH actions). I don't have the cycles to spend much time on this right now as I have to work on other things that keep the lights on.

@behrica
Author

behrica commented Oct 25, 2021

Clojure dependencies could be avoided by adding the "edn" files in the right place:
"examples/specs/*.edn"
Going this route would then only need an additional change in a single file:

site/_includes/example.html

@behrica
Author

behrica commented Oct 25, 2021

For the "Hanami" specs, which would be contributed over time, my proposal would be to:

  • have them all present in the vega-lite/tree/master/examples/spec folder as well, initially containing the text "TODO"
    • an initial PR could contain all of them with "TODO", and over time the community would make PRs to replace the placeholder files with the real content:

image

The same could be done for "Altair" and "R", once the community starts to create the examples.

@behrica
Author

behrica commented Oct 25, 2021

This would mean a bit more work for contributors of new examples.
They would also need to provide the empty dummy files for all languages.

But "forgetting" those would make the site build fail, so it would be easy to spot.

@domoritz
Member

I think we should only show tabs when there are examples. Otherwise, the website will be incomplete and cause frustration. Not having a tab is a good indicator.

I would expect a proposal pull request to have examples for Altair, JS API, R, Clojure, and also have docs for how to add examples in the various languages. We should also have some infrastructure for testing the examples so that we don't accidentally break things.

@behrica
Author

behrica commented Oct 26, 2021

I contacted the people behind vegawidget (R) and Altair, but I am not sure whether I can bring them on board.
The Altair documentation in particular is already very complete.
I could do one example in all languages myself and work a bit on the instructions for contributions.

@behrica
Author

behrica commented Oct 26, 2021

I would expect a proposal pull request to have examples for Altair, JS API, R, Clojure, and also have docs for how to add examples in the various languages. We should also have some infrastructure for testing the examples so that we don't accidentally break things.

The testing infrastructure seems to be a lot of work.
For Clojure there is not a single library but several that are able to "render" the edn files to the screen, to SVG, or to image buffers (and they all go via JSON + existing JavaScript libraries).
I can understand that you want to guarantee the "quality" of the examples.

I will think about it.

@behrica
Author

behrica commented Oct 26, 2021

I made a draft PR here: #7774

@metasoarous
Contributor

Thanks for considering this @domoritz! This would be super helpful for us.

Just confirming here that for the Clojure community we have a bunch of different tools that use Vega(-Lite), and often each can be used in different ways (APIs for static compilation, live browser views via a repl, embedding in more complex documents, etc). So really focusing on the type of data representation (JSON, EDN, etc) makes the most sense for us. However, if there's already testing infrastructure for the JSON examples, I think it would suffice to simply test that the EDN examples translate back to the equivalent JSON, since all of our libraries more or less do that, and then pass the JSON off to the Vega libs running on a JS environment. Does this seem reasonable to you?

I agree that it would be ideal to have GH actions for the simple data translations that can be automated, like JSON -> EDN. Presumably we could also do JSON -> R list representations (à la the vegawidget API), though this may be trickier, since R list data doesn't (easily) print in a way that can be evaluated as literals, as with Clojure/EDN or JS/JSON (it may require some code that manually traverses the nested list structures and prints out list(x = list(field = "wt", type = "quantitative"), y = list(...)), etc.). I can potentially help adapt @behrica's work into some GH actions for EDN, but we may need to enlist the help of the R community to see if the same would be possible for them (I started a StackOverflow post asking about this).
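
The kind of manual traversal mentioned above could look roughly like the following Python sketch. It is only an illustration: `r_name` and `to_r` are hypothetical names, the name check is a conservative ASCII-only rule that ignores R reserved words, and mapping JSON arrays to unnamed `list(...)` calls is an assumption about what the R side expects.

```python
import json
import re

# Conservative ASCII-only check: a letter, or a dot not followed by a digit,
# then letters, digits, dots, or underscores. Reserved words are not handled.
_R_NAME = re.compile(r"(?:[A-Za-z]|\.(?![0-9]))[A-Za-z0-9._]*")


def r_name(name):
    """Return name as an R argument name, backtick-quoted when not syntactic."""
    return name if _R_NAME.fullmatch(name) else "`" + name + "`"


def to_r(value, indent=0):
    """Render a JSON-compatible value as R source built from list(...) calls."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # before the number check: bool is an int subclass
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        return json.dumps(value, ensure_ascii=False)
    if isinstance(value, dict):
        children = list(value.values())
        items = [f"{r_name(k)} = {to_r(v, indent + 2)}" for k, v in value.items()]
    elif isinstance(value, list):
        children = value
        items = [to_r(v, indent + 2) for v in value]
    else:
        raise TypeError(f"not a JSON value: {value!r}")
    # Lists holding only scalars stay on one line; others get one entry per line.
    if not any(isinstance(c, (dict, list)) for c in children):
        return "list(" + ", ".join(items) + ")"
    pad = "\n" + " " * (indent + 2)
    return "list(" + pad + ("," + pad).join(items) + ")"
```

Keeping scalar-only lists on one line and breaking the others one entry per line is a layout choice made for this sketch, picked to resemble hand-written R.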

Thanks again!

@metasoarous
Contributor

Looks like deparse (and dput, which just prints out deparse lines) may get close to what we want for R, but falls a bit short IMHO, as it doesn't always output code you'd actually expect to see in the wild. E.g.

> spec <- list(
  description="An mtcards example",
  data = list(),
  mark = "point",
  encoding = list(
    x = list(field = "wt", type="quantitative"),
    y = list(field="mpg", type="quantitative"), 
    color = list(field="cyl", type="nominal")))

> dput(spec)
list(description = "An mtcards example", data = list(), mark = "point",
    encoding = list(x = list(field = "wt", type = "quantitative"),
        y = list(field = "mpg", type = "quantitative"), color = list(
            field = "cyl", type = "nominal")))

Probably not the end of the world to see multiple list entries per line when there has to be a split, but I wouldn't expect to see a single list entry split across multiple lines when it could fall on a single line following (i.e. color = list( ....). Maybe this will be a fine fallback in case there isn't a manual translation though, in case no one is able to provide a better solution?

@domoritz
Member

I think it would suffice to simply test that the EDN examples translate back to the equivalent JSON, since all of our libraries more or less do that, and then pass the JSON off to the Vega libs running on a JS environment. Does this seem reasonable to you?

Yes, that makes sense.
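
For illustration, such a round-trip check could be sketched in Python as below. This is an assumption-heavy sketch, not an existing test: the tiny reader only understands the EDN subset that Vega-Lite specs need (maps, vectors, strings, keywords, numbers, true/false/nil), it treats keywords as plain strings, and the names `read_edn` and `edn_matches_json` are invented here.

```python
import json
import re

# Tokens: a delimiter, a double-quoted string, or a run of other characters.
# Whitespace and commas between tokens are skipped (EDN treats commas as whitespace).
_TOKEN = re.compile(r'[{}\[\]]|"(?:\\.|[^"\\])*"|[^\s,{}\[\]"]+')


def _read_form(tokens):
    """Consume one EDN form from the front of the token list."""
    tok = tokens.pop(0)
    if tok == "{":
        result = {}
        while tokens[0] != "}":
            key = _read_form(tokens)
            result[key] = _read_form(tokens)
        tokens.pop(0)  # drop the closing "}"
        return result
    if tok == "[":
        items = []
        while tokens[0] != "]":
            items.append(_read_form(tokens))
        tokens.pop(0)  # drop the closing "]"
        return items
    if tok.startswith('"'):
        return json.loads(tok, strict=False)  # strict=False allows raw newlines
    if tok.startswith(":"):
        return tok[1:]  # keywords compare equal to the matching JSON string
    if tok == "nil":
        return None
    return json.loads(tok)  # true, false, and plain numbers


def read_edn(text):
    """Parse the supported EDN subset into plain Python data."""
    return _read_form(_TOKEN.findall(text))


def edn_matches_json(edn_text, json_text):
    """True when the EDN text describes the same data as the JSON text."""
    return read_edn(edn_text) == json.loads(json_text)
```

Because Python dictionaries compare without regard to key order, the check only verifies that the data is equal, not that the entries appear in the same order.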

@behrica
Author

behrica commented Oct 27, 2021

FYI, I also started 2 discussions here:

vega/altair#2511
vegawidget/vegawidget#172

@behrica behrica changed the title Documentation: Add Clojure specific specs to example gallery ? Documentation: Add polyglot specs to example gallery ? Oct 27, 2021
@metasoarous
Contributor

Looks like the pieces do already exist to do this automatically with R; In short f <- function(x) formatR::tidy_source(text = capture.output(dput(x)), args.newline = TRUE) (see comments in SO post above).

For the JSON -> EDN conversion GH action, I think we may actually just be able to use jet, which makes this pretty darn easy (jet --from json --to edn -k -p), and is available as a simple binary (compiled with GraalVM, I believe). I just verified that it preserves the order of map/object entries, and so should be a pretty good fit. The only reason I can see for not using it verbatim is that I frequently like to use keywords for values instead of strings, since they are quicker to type and make things like autocompletion and grepping around a file for attribute keywords easier (e.g. {:x {:field :some-parameter ...} ...} instead of {:x {:field "some-parameter" ...} ...}). So if we feel like it's worth it, we can write something a bit more custom that turns value strings (without spaces; or maybe only strings mapped to by certain attributes) into keywords. Curious to hear thoughts on this...
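
To make the "keywords for values" idea concrete, here is a small Python sketch of such a rule. Everything in it is hypothetical: the attribute set, the pattern for keyword-safe strings, and the function name `edn_string_value` are assumptions chosen for illustration, and any tool consuming the output would have to turn keyword values back into strings before producing JSON.

```python
import json
import re

# Hypothetical rule: only short identifier-like strings become keywords.
KEYWORD_SAFE = re.compile(r"[A-Za-z][A-Za-z0-9_.\-]*")

# Hypothetical choice of attributes whose string values may become keywords.
KEYWORD_ATTRIBUTES = {"field", "type", "mark", "aggregate"}


def edn_string_value(attribute, text, attributes=KEYWORD_ATTRIBUTES):
    """Render a JSON string value as an EDN keyword or an EDN string.

    A keyword is used only when the attribute is in the allowed set and the
    text contains no spaces or other characters outside KEYWORD_SAFE.
    """
    if attribute in attributes and KEYWORD_SAFE.fullmatch(text):
        return ":" + text
    return json.dumps(text, ensure_ascii=False)
```

Restricting the rule to a fixed set of attributes keeps free text such as titles and URLs as strings even when they happen to contain no spaces.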

Thanks again

@behrica
Author

behrica commented Oct 28, 2021

Yes, EDN is automatable.
R: maybe.
Hanami: no.
Altair: no.

So I still think that "file generation" is better than "automation during site build". I am open to GH actions, though I have never used them.

My code for the "file generation" is here:
https://github.com/behrica/vl-galery/blob/e89e6d039a85493166338ce84225f87700d1a025/src/convert.clj#L45
"jet" produces commas, which I don't like, but there is a workaround:

borkdude/jet#94

@behrica
Author

behrica commented Oct 28, 2021

(That's why I used puget, which can print maps without commas.)

@metasoarous
Contributor

Agreed; I don't mind commas too much, but I tend not to use them, so prefer copying code that leaves them out, for consistency's sake.

As pointed to in the comment you linked to, puget is also available as a binary, so the simple solution is still pretty simple.

@kanitw
Member

kanitw commented Oct 31, 2021

Sorry I'm a bit late to the party here. Thanks so much for doing this.

Polyglot example is a GREAT initiative and I think it would be an amazing contribution to our communities. I've been hoping we'll have this for a long time.

I hope we can find a mechanism that can make this sustainable without introducing significant maintenance burdens to Vega-Lite maintainers and contributors.
After all, we're mostly maintaining this library in our personal time (not at work).
So we're trying to use our time efficiently, so we can continue to contribute features/bug fixes that benefit everyone.

Here are some thoughts:

  • I hope we can minimize code reviews in the main Vega-Lite repos. Note that Vega-Lite maintainers do not have expertise in many of the wrapper languages. So if we include code examples in other languages in the main repo, we will have to review a number of PRs, but in practice we'll just "rubber stamp" them. Basically, we won't really understand the diff but will still carry the burden of looking at it, and will probably click the merge button without understanding much (and if we're busy, we may unintentionally block people).

  • For this reason, I strongly prefer that we don't include examples for other languages in the main Vega-Lite repo. I'd prefer that we do automation during site build, or even automation within the website (just run JS code to convert).

  • I propose that contributors can contribute a transpiler module that Vega-Lite's website build can run and generate the examples. (But these generated files shouldn't be committed, and thus won't add significant review burdens.)

  • Testing infrastructure for each transpiler should live in the transpiler module.

  • In each language's example tab,

    • We should set the expectation that the example is a community contribution and point people to the transpiler repo for issues. So if the generated example doesn't work, they can file an issue in the transpiler repo. (If we don't do so, we'll get a number of issues filed in the Vega-Lite repo that we'll have no idea how to solve.)
  • As a transpiler module gets improved and has a new version released, we're happy to review PRs to upgrade the transpiler for the website. These should be relatively simple PRs that just update the version number in our website build's dependencies. I know that Dominik says ideally we shouldn't have a dependency, but I think a dependency is better than a flood of example PRs for different languages that the Vega-Lite maintainers don't know very well.

  • Unlike what @domoritz suggests, I wouldn't worry about adding one example per language. In fact, I'd rather include only languages that have a comprehensive converter. To make this polyglot example effort sustainable, I strongly prefer not to encourage the community to keep submitting PRs that contribute one example in one language, which isn't a scalable/sustainable solution for the main Vega-Lite repo IMHO. (We could still encourage people to submit examples to each wrapper's website, but I think we should only use scalable solutions for the main Vega-Lite website.)

I know that this is a bit different from what Dominik suggests, but I wanna make sure that we can do this in a sustainable fashion.

@jwoLondon
Contributor

Good to see the number of languages expanding. I can see the logic of a transpiler solution for several of them as the 'other' language can map well onto its Vega-Lite JSON equivalent. I can also see the reason to avoid a PR-per-example solution which is clearly not scalable for the core team.

I have developed, over the last 4 years, a parallel set of examples in the Elm language (and I do the same for Vega). Additionally I have a set of literate visualization examples that map closely to the Vega-Lite examples. For elm-vegalite at least, I don't think a transpiler solution would work. This is not only because there are a few differences that aren't easily mapped computationally, but also because, to take advantage of the language, one might deliberately choose a different approach (e.g. perform data-shaping externally rather than as a Vega-Lite transform).

I wonder therefore if a slightly more flexible approach would be to have pointers on the main Vega-Lite page to other 'approved' language versions, that may in some cases be the transpiled examples, in others to approved repos or web pages. This would require a one-off and occasional periodic review (to check sites hadn't stagnated) so should be scalable.

@behrica
Author

behrica commented Oct 31, 2021

For the EDN format (the main one used in Clojure),
fully automatic conversion works perfectly.

For Hanami (the second format used in Clojure), automatic conversion does not work at all.

Going this way (importing the files during the site build, or via GitHub actions) is certainly possible,
but requires a deep understanding of the Vega site build process, which I don't have.

So the Clojure community could indeed maintain some files "somewhere" on GitHub, which get imported into the Vega site build in some form.

@behrica
Author

behrica commented Oct 31, 2021

In particular, I cannot really make a proposal on whether this should be done via "GitHub actions" or "bash scripts".
It seems that the maintainers here would strongly suggest to

  • have the language-specific spec files hosted outside the vega-lite repo
  • have them pulled in, in some form, during the build

This is fine for me.

If somebody gives me some hints how this should be done, I could have a look at this.
This would likely be a complete new PR.

@behrica
Author

behrica commented Oct 31, 2021

Maybe I should add that the Clojure community does not maintain a proper Clojure-specific Vega-Lite example page
(one reason being that JSON -> EDN conversion is very easy / automatable).
I did one very rough version here:
https://github.com/behrica/vl-galery

But in this repo the EDN specs are not present "as files"; for the moment they get generated "on the fly" by some Clojure code which generates the HTML page: https://github.com/behrica/vl-galery/blob/e89e6d039a85493166338ce84225f87700d1a025/src/convert.clj#L46
-> HTML site: https://behrica.github.io/vl-galery/convert/

But this repo could easily be "extended" to host the EDN / Hanami example specs as files.
So their maintenance would happen via PRs there.

We would probably need to agree on a "file system structure".
This could be something like:

  • EDN/arc_donut.vl
    EDN/bar.vl
  • hanami/arc_donut.vl
    hanami/bar.vl

If I understand correctly, the full list of "examples" comes from this file: https://github.com/behrica/vega-lite/blob/polyglot-examples/site/_data/examples.json
and the "name" is the base file name.

@behrica
Author

behrica commented Oct 31, 2021

This would then mean that a build of the Vega-Lite site would:

  • read a (new) "language configuration file" listing the language-specific GitHub repos
  • do a "git clone" of https://github.com/behrica/vl-galery (and other language-specific repos)
  • copy the files to the right place for "yarn site" to work

@behrica
Author

behrica commented Oct 31, 2021

Such a "language configuration" file (to be present in the vega-lite repo) could look like this:

{
  "EDN": {"git-clone-url": "https://github.com/behrica/vl-galery", "spec-directory": "specs/EDN"},
  "hanami": {"git-clone-url": "https://github.com/behrica/vl-galery", "spec-directory": "specs/hanami"}
}
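
As a sketch of how a build step might consume such a file, the Python function below turns the configuration into a list of planned actions without running anything. The function name `plan_imports`, the checkout location `build/polyglot`, and the shape of the returned plan are assumptions for illustration; only the two configuration keys come from the proposal above.

```python
import json
from pathlib import PurePosixPath


def plan_imports(config_text, checkout_root="build/polyglot"):
    """Turn the language configuration into clone commands and spec folders.

    Nothing is executed here; the caller decides how to run the commands
    and where to copy the files found in each spec folder.
    """
    config = json.loads(config_text)
    plan = []
    for language, entry in config.items():
        url = entry["git-clone-url"]
        # Use the last URL segment as the local folder name.
        repo_dir = PurePosixPath(checkout_root) / url.rstrip("/").split("/")[-1]
        plan.append({
            "language": language,
            "clone": ["git", "clone", "--depth", "1", url, str(repo_dir)],
            "spec_dir": str(repo_dir / entry["spec-directory"]),
        })
    return plan
```

When two languages live in the same repository, as in the example configuration, the plan contains the same clone command twice, so a real build step would want to clone each URL only once.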

@metasoarous
Contributor

Thanks for sharing your thoughts and concerns @kanitw. I absolutely understand where you're coming from with respect to keeping the maintenance burden manageable for the Vega dev teams.

I'd prefer that we do automation during site build, or even automation within the website (just run JS code to convert).

In this case, I think the best way to facilitate EDN/Clojure transpilation would be in the build phase. Assuming it's easy enough to call out to a shell command from the build phase, we can just use the binary command line tools jet and puget, which are already very well tested for doing exactly what we need (translating and formatting JSON -> EDN).

If it's preferable not to be calling out to shell commands in the site build phase, we could either:

  • Use ClojureScript to compile a JS module for making these translations, and use this at either the build phase or dynamically on the website.
  • Use a GitHub action (@behrica: which can basically just be a shell command that automatically gets run based on whatever triggers we set up) with jet & puget to create/update the EDN files after PR's have been merged, meaning the Vega team won't have to review the changes.

What seems most reasonable to the Vega team?

Thanks again

@behrica
Author

behrica commented Oct 31, 2021

@metasoarous
Your proposal could indeed solve it for EDN.
But I would also like to have a way to include Hanami specs, and they cannot be translated automatically.
Other languages (Python + Altair) cannot be done automatically either.

So the question is whether to have a common way for all non-JSON specs, or different ways for the "automatable" formats vs. the "hand-made" formats.

@metasoarous
Contributor

Agreed; I too would love to have Elm, Altair, etc. examples supported on the main site. However, I'm not sure of the right way to do that given the concerns @kanitw has expressed.

If I understand your suggestion @behrica, it's to manually compile examples for each of these non-automatable languages/APIs in separate repositories, and import them as modules for the example site. Is this what you're thinking?

This would potentially reduce the number of PRs that the Vega team has to review, since they could just update their pointers to these separate repositories once all the translations were complete, and then periodically when updates are necessary. However, this still poses a challenge as far as making sure that the examples are kept in sync, and I'm not sure how to get around this issue. What are the Vega team's thoughts on this?

@domoritz
Member

domoritz commented Nov 2, 2021

Thank you for your comment @kanitw. I absolutely agree that we need to keep the maintenance burden for the core developers low.

I'm not sure a transpiler will completely eliminate that overhead, though. We will still get issues against the examples even if they are generated by a transpiler. So whether we use a transpiler or hard-coded examples (pulled in from another repo so we move issues there?) doesn't make a huge difference to me. I do prefer a transpiler, though, since it would automatically work for all examples. You have a good point, though, that we don't want to have to review pull requests so I also agree with you now to pull the examples out into another repo.

I agree that we do need a fully automated process built into the site build script. That's going to take some digging into the build scripts and GitHub actions and is not something the core developers will have time for (but are of course happy to help with any advice!).

@kanitw
Member

kanitw commented Nov 2, 2021

@metasoarous

compile examples for each of these non-automatable languages/APIs in separate repositories, and import them as modules for the example site. Is this what you're thinking?

I think this will work fine for my concern. :)

However, this still poses a challenge as far as making sure that the examples are kept in sync, and I'm not sure how to get around this issue. What are the Vega team's thoughts on this?

If example authoring requires manual authoring, this will pose a challenge whether the examples are in the main Vega-Lite repo or not.

That's why I think we should try to do automatic conversion for wrappers/languages that can use that approach.
For wrappers that need manual example authoring, I think we should set a clear expectation that the examples are community-driven and thus can be outdated. Users are welcome to contribute by submitting changes to <the repo>.


@behrica

So the question is whether to have a common way for all non-JSON specs, or different ways for the "automatable" formats vs. the "hand-made" formats.

I think we can do it a bit differently. For the automatable ones, we can make Vega-Lite's website build re-run the example auto-generation script, so the examples will mostly be generated for us. For the manual ones, then we will indeed need to.

@kanitw
Member

kanitw commented Nov 2, 2021

@domoritz

That's going to take some digging into the build scripts and GitHub actions

I think you already set up the GitHub actions to call the website build scripts when we release a new version?
If so, I think people only need to add one extra script to the "presite" command in package.json?

@domoritz
Member

domoritz commented Nov 2, 2021

Possibly. Adding anything to the scripts always needs some amount of fiddling and I recommend keeping it as simple as possible.

@behrica
Author

behrica commented Nov 15, 2021

Adding this:

ls examples/specs/*.vl.json | parallel "cat {} | jet --from json --to edn --keywordize | puget --opts '{:map-delimiter \"\" :print-color false}' > {.}.edn "

which needs 2 command-line tools installed ("jet" and "puget"),
would generate the Clojure edn files for all examples.
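
For readers who prefer to see the pipeline step by step, here is a rough Python equivalent for a single file. It is a sketch under assumptions: it presumes `jet` and `puget` are on the PATH and accept exactly the flags used in the command above, and the helper names `edn_path` and `convert` are made up. Note that `{.}` in GNU parallel drops only the last extension, so `bar.vl.json` becomes `bar.vl.edn`.

```python
import subprocess
from pathlib import Path

# Options passed to puget, copied from the shell command above.
PUGET_OPTS = '{:map-delimiter "" :print-color false}'


def edn_path(json_path):
    """Mirror {.} from GNU parallel: replace only the last extension with .edn."""
    return Path(json_path).with_suffix(".edn")


def convert(json_path):
    """Pipe one JSON spec through jet and puget and write the EDN file."""
    jet = subprocess.run(
        ["jet", "--from", "json", "--to", "edn", "--keywordize"],
        input=Path(json_path).read_bytes(),
        capture_output=True,
        check=True,  # raise if jet exits with an error
    )
    puget = subprocess.run(
        ["puget", "--opts", PUGET_OPTS],
        input=jet.stdout,
        capture_output=True,
        check=True,  # raise if puget exits with an error
    )
    target = edn_path(json_path)
    target.write_bytes(puget.stdout)
    return target
```

Looping `convert` over `Path("examples/specs").glob("*.vl.json")` would cover all examples, one after the other instead of in parallel.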

@behrica
Author

behrica commented Nov 15, 2021

This could be added as the last line in "build-examples.sh"

@behrica
Author

behrica commented Nov 15, 2021

I restarted the PR here: #7774


5 participants