Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: should Layer() not derive from BaseObject? #93

Closed
jakevdp opened this issue May 19, 2016 · 34 comments
Closed

API: should Layer() not derive from BaseObject? #93

jakevdp opened this issue May 19, 2016 · 34 comments
Labels

Comments

@jakevdp
Copy link
Collaborator

jakevdp commented May 19, 2016

Since Layer is the main interface, it would be nice if tab completion on the object only listed relevant pieces of the API so that you can quickly find what plot types are available (e.g. point(), bar(), text(), etc.)

Currently, since it derives from BaseObject the namespace is polluted with all sorts of traitlet stuff that the user probably doesn't care about.

I'd propose something like this:

class LayerObject(BaseObject):
    # traitlet-related stuff goes here
    def __init__(self, *args, **kwargs):
        super(LayerObject, self).__init__(**kwargs)

    # etc.

class Layer(object):
    # non-traitlet-related Layer methods here
    def __init__(self, *args, **kwargs):
        if len(args)==1:
            self.data = args[0]
        self._layerobject = LayerObject(**kwargs)

    def point(self):
        self.mark = 'point'
        return self

    # etc.

The only problem would be if we ever want to pass Layer to some other class this would complicate things. What do you think?

@ellisonbg
Copy link
Collaborator

What about defining a dir

Method?

Sent from my iPhone

On May 18, 2016, at 10:08 PM, Jake Vanderplas notifications@github.com wrote:

Since Layer is the main interface, it would be nice if tab completion on the object only listed relevant pieces of the API so that you can quickly find what plot types are available (e.g. point(), bar(), text(), etc.)

Currently, since it derives from BaseObject the namespace is polluted with all sorts of traitlet stuff that the user probably doesn't care about.

I'd propose something like this:

class LayerObject(BaseObject):
# traitlet-related stuff goes here
def init(self, _args, *_kwargs):
super(LayerObject, self).init(**kwargs)

# etc.

class Layer(object):
# non-traitlet-related Layer methods here
def init(self, _args, *_kwargs):
if len(args)==1:
self.data = args[0]
self._layerobject = LayerObject(**kwargs)

def point(self):
    self.mark = 'point'
    return self

# etc.

The only problem would be if we ever want to pass Layer to some other class this would complicate things. What do you think?


You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub

@jakevdp
Copy link
Collaborator Author

jakevdp commented May 19, 2016

Ah – that's a good idea! I'll try that.

@ellisonbg
Copy link
Collaborator

Traitlets even offers a method for getting a list of all the traitlets attributes - we could combine that with a list of the public methods

Sent from my iPhone

On May 19, 2016, at 9:22 AM, Jake Vanderplas notifications@github.com wrote:

Ah – that's a good idea! I'll try that.


You are receiving this because you commented.
Reply to this email directly or view it on GitHub

@jakevdp
Copy link
Collaborator Author

jakevdp commented May 19, 2016

Another option: what if we change point(), bar(), etc. to draw_point(), draw_bar(), etc. Then users could tab-complete draw_<TAB> to see all the mark methods available.

@ellisonbg
Copy link
Collaborator

Or maybe mark_* to be consistent with the VL naming of things ?

Sent from my iPhone

On May 19, 2016, at 9:30 AM, Jake Vanderplas notifications@github.com wrote:

Another option: what if we change point(), bar(), etc. to draw_point(), draw_circle(), draw_bar(), etc. Then users could tab-complete draw to see all the mark methods available.


You are receiving this because you commented.
Reply to this email directly or view it on GitHub

@kanitw
Copy link
Member

kanitw commented May 20, 2016

I have alternative syntax to propose here. I'm not married to this idea yet, but think it's worth considering.

Currently Altair rely on method chaining to describe mark.

.encode(x=X('Horsepower'), y=Y('Miles_per_Gallon')).point()

What if we make mark first parameter of .encode()?

.encode('point', X('Horsepower'), Y('Miles_per_Gallon'))
// or 
.encode(point(), x=X('Horsepower'), y=Y('Miles_per_Gallon'))

Especially if we call the top-level object a View

View(data).encode('point', X('Horsepower'), Y('Miles_per_Gallon'))

I feel like this seems more natural, in a sense that we encode points by mapping Horsepower to X and Miles per Gallon to Y. (Not sure how much work would this introduce though.)

@ellisonbg
Copy link
Collaborator

One of the core abstractions in altair is that everything is a strongly
typed attribute and that we only add extra methods that provide a
convenience layer on top one attribute at a time.

Thus, the Layer.encoding attribute has a encode method. The
Layer.config has a configure method. Putting the mark type into the
encoding confuses that model and also forces us to do more complex parsing
or positional args.

Lastly, tab completion of the API is pretty important as in the notebook it
is one of the main ways that users discover APIs. Thus, we are looking at
renaming the current mark methods to things like mark_line and
mark_point to allow users to type l.mark_<tab> to see all of the mark
types. Putting the mark as a positional inside the encode method makes
that entire part of the API much less discoverable.

On Thu, May 19, 2016 at 5:32 PM, Kanit Wongsuphasawat <
notifications@github.com> wrote:

I have alternative syntax to propose here. I'm not married to this idea
yet, but think it's worth considering.

Currently Altair rely on method chaining to describe mark.

.encode(x=X('Horsepower'), y=Y('Miles_per_Gallon')).point()

What if we make mark first parameter of .encode()?

.encode('point', X('Horsepower'), Y('Miles_per_Gallon'))
// or
.encode(point(), x=X('Horsepower'), y=Y('Miles_per_Gallon'))

Especially if we call the top-level object a View

View(data).encode('point', X('Horsepower'), Y('Miles_per_Gallon'))

I feel like this seems more natural, in a sense that we encode points by
mapping Horsepower to X and Miles per Gallon to Y. (Not sure how much
work would this introduce though.)


You are receiving this because you commented.
Reply to this email directly or view it on GitHub
https://github.com/ellisonbg/altair/issues/93#issuecomment-220488589

Brian E. Granger
Associate Professor of Physics and Data Science
Cal Poly State University, San Luis Obispo
@ellisonbg on Twitter and GitHub
bgranger@calpoly.edu and ellisonbg@gmail.com

@jakevdp
Copy link
Collaborator Author

jakevdp commented May 20, 2016

I make the mark_point change in ellisonbg@8a593b5

@jakevdp
Copy link
Collaborator Author

jakevdp commented May 20, 2016

On the tab-completion question – is there a way to make the traitlets keyword arguments show up in tab completion? That would be extremely helpful!

@kanitw
Copy link
Member

kanitw commented May 20, 2016

Lastly, tab completion of the API is pretty important as in the notebook it
is one of the main ways that users discover APIs. Thus, we are looking at
renaming the current mark methods to things like mark_line and
mark_point to allow users to type l.mark_<tab> to see all of the mark types.

I like the tab idea. So please ignore my comment above. :)

That said, I wonder why you have to put mark after encode in the examples?

For example,
View/Layer(data).encode(x=X('Horsepower'), y=Y('Miles_per_Gallon')).mark_point()

Why not:
View/Layer(data).mark_point().encode(x=X('Horsepower'), y=Y('Miles_per_Gallon'))

I think the latter is more consistent with how we write specs in Vega-Lite.

@jakevdp
Copy link
Collaborator Author

jakevdp commented May 20, 2016

To be clear, either order works currently.

@kanitw
Copy link
Member

kanitw commented May 20, 2016

To be clear, either order works currently.

Yep. I mean, unless you have strong preference, can we put mark_point() before encode() in the all the example so we can establish convention? (For the same reason, in all VL examples, we put mark before encoding although order does not really matter in JSON. :) )

@ellisonbg
Copy link
Collaborator

I don't feel too strongly about the ordering but like the idea of being consistent with the vegalite examples. So +1 to marking first

Sent from my iPhone

On May 20, 2016, at 9:44 AM, Kanit Wongsuphasawat notifications@github.com wrote:

To be clear, either order works currently.

Yep. I mean, unless you have strong preference, can we put mark_point() before encode() in the all the example so we can establish convention? (For the same reason, in all VL examples, we put mark before encoding although order does not really matter in JSON. :) )


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub

@jakevdp
Copy link
Collaborator Author

jakevdp commented May 20, 2016

Somewhat related: how would you feel about the mark_*() methods take optional arguments that pass to MarkConfig?

For example, here's the current API for creating a red scatterplot:

Layer(data, config=Config(mark=MarkConfig(color='red'))).mark_circle(
).encode(
          x='Horsepower',
          y='Miles_per_Gallon',
)

Here's how we might do the same thing with mark keywords:

Layer(data).mark_circle(
         color='red'
).encode(
          x='Horsepower',
          y='Miles_per_Gallon',
)

@kanitw
Copy link
Member

kanitw commented May 21, 2016

how would you feel about the mark_*() methods take optional arguments that pass to MarkConfig?

I would avoid adding syntax that diverge too much from Vega-Lite syntax because that would lead to confusion when you know one version and want to transition to use another.

@jakevdp
Copy link
Collaborator Author

jakevdp commented May 21, 2016

The mark_*() functions already diverge from Vega-Lite syntax... if you want to stick to exact mirroring of Vega-Lite you can feel free to do Layer(data, mark='point', ...) instead. You could build an entire plot using the low-level object wrappers for the vega-lite spec, without ever calling any of these methods!

My feeling is that Vega-lite is a specification (not an API), while Altair is an API (not a specification), so we should feel free to make these sorts of choices to make life easier for the user.

@jakevdp
Copy link
Collaborator Author

jakevdp commented May 21, 2016

Now you might ask why we have Layer().mark_point() at all, when Layer(mark='point') is almost identical as far as keystrokes and information content. In my mind, the first is much more usable than the second, because it allows the user to take advantage of IPython's tab-completion.

In the first, if you forget all the mark options you type Layer.mark_<TAB> and they will appear in a list. In the second, if you forget all the mark options, you have to stop what you're doing and go search for the right section of the code documentation. Our goal here should be to avoid the need for that type of context switching as much as possible.

@kanitw
Copy link
Member

kanitw commented May 21, 2016

My feeling is that Vega-lite is a specification (not an API), while Altair is an API (not a specification)

I don't think so, for direct Vega-Lite users, our syntax is our API as well.
Also eventually Altair code that calls Altair API will be a visualization specification as well (in form of Python) -- just like D3 code is a form of visualization specification.

So keeping similarity between the two projects would benefit both projects in longer term.
IMHO, we should design Altair in a way that we can write a function that converts VL JSON to Altair easily in the future and use this function to add an "Altair" tab to Vega-Lite docs in the future.
(For now, let's do docs in a notebook form first.)

Now you might ask why we have Layer().mark_point() at all

I actually don't feel against mark_point() because of tabbing reason like you just say. At least structurally they are at the same level of the syntax tree. (no matter Layer(mark='point') or Layer().mark_point() -- it's still a top level property.)

For config, this is a much bigger jump. ( config.mark.color becomes mark.color in the syntax tree in this proposal.)
Plus, there are many things under config and if only config.mark get shortcut like this (but not other type of config.*), it also introduces inconsistency that can be very confusing and would make it impossible to add an "Altair" tab to Vega-Lite docs in the future.

For color's case, you can actually do

{ 
   ...
   encoding: {
     color: {value: 'red'}
   }
}

so I guess in Altair it could still be

.encode(Color(value='red'))

@kanitw
Copy link
Member

kanitw commented May 21, 2016

I guess one tricky part is that we can't do

.encode(color='red')  

because we can't distinguish between red as a field or red as a string value.

@jakevdp
Copy link
Collaborator Author

jakevdp commented May 21, 2016

If a Python version of Vega-lite is what you're after, why use Altair at all? You could define things as Python dictionaries, and use json.dumps directly.

@jakevdp
Copy link
Collaborator Author

jakevdp commented May 21, 2016

All I know is that if I tell people that to make points red simply requires

Layer(config=Config(mark=MarkConfig(color='red'))).mark_circle()

they'll laugh and uninstall the package.

@jakevdp
Copy link
Collaborator Author

jakevdp commented May 21, 2016

IMHO, we should design Altair in a way that we can write a function that converts VL JSON to Altair easily in the future and use this function to add an "Altair" tab to Vega-Lite docs in the future.

I'm working on this now – it doesn't require the one-to-one mapping you have in mind.

@kanitw
Copy link
Member

kanitw commented May 21, 2016

Ok. Your point about

Layer(config=Config(mark=MarkConfig(color='red'))).mark_circle()

or slightly better but still very terrible

Layer().mark_circle().configure(mark=MarkConfig(color='red'))

is a very good one. But I don't know if the proposed solution above is the right one.
(Other types of config would still be super annoying for the same reason.)

Is there a way to solve this that works for other type of config as well?

(Btw, we have a Vega slack channel. If you want we can create Altair channel there so @jheer @ellisonbg and @domoritz can be in the same room.)

@jakevdp
Copy link
Collaborator Author

jakevdp commented May 21, 2016

I imagine that as we hit other pain points in configuration, we can come up with more API solutions similar to mark_point(color='red'). I landed on this as I went through the basic examples and tried to implement them in Altair. I don't know what the right solution will be as we go deeper into configs, but I believe that API-wise, exactly mirroring the VegaLite spec should not be our preferred solution.

@jakevdp
Copy link
Collaborator Author

jakevdp commented May 21, 2016

Maybe one way to have the best of both worlds is to make everything a top-level method. For example we could do something along the lines of

Layer().configure_mark_color('red')

which would be an alias of

Layer().configure_mark(color='red')

which would be an alaias of

Layer().configure(mark=MarkConfig(color='red'))

which would be an alias of

Layer(config=Config(mark=MarkConfig(color='red')))

A benefit would be that Layer.configure_<TAB> would give you all the available options – we'd just have to adjust the completer to make sure things come in the order that's convenient to the users.

@kanitw
Copy link
Member

kanitw commented May 21, 2016

The comment above is very interesting. Let's see what others think.

If we do this alias thing, do we still do the mark_point(color='red') thing? (If we do both, are we introducing too many ways to do the same thing?)

Below the HR are my old reply before reading your latest comment


Ok. I look at a few other vis spec language and think that maybe mark_point(color='red') is what we can settle on.

Here the problem is we want to distinguish between mapping to field and mapping to value.
In ggplot, they use aes to for field-mapping as in geom_line(aes(group=1), colour="#000099") and property of the mark/geom for mapping to value.

Given discusion above, we can't do like ggplot here because it introduces type inconsistency, which will screw up autocompletion. Thus, this solution maybe the right one. :)

I imagine that as we hit pain points in configuration, we can come up with more API solutions similar to mark_point(color='red').

Maybe that's the way to go.

FYI, I look at our configs.

The only ones that we have to think about are

  • Top-level Config
  • Cell Config
    • (TBH – I think our cell.width and height config is too deep in the syntax -- we probably want top level .width() and .height() syntax for Altair. In fact, I might introduce similar change in the JSON syntax, especially after we are done with Responsive Ordinal Scale (Ordinal Fit Mode) vega-lite#194) .
  • Axis / Scale / Legend Config: these three are not really need for one visualization. You can do similar things with encoding.{x,y,color,...}.{scale,axis,legend} but we add this config more for theming. (Correction: but this is still useful if you want to customize both axes at the same time)
  • Facet Config -- Facet will become an operator in the new composition syntax so we can use similar trick to mark_point() if we proceed down this route.

@jakevdp
Copy link
Collaborator Author

jakevdp commented May 21, 2016

OK – interesting to hear that there are other places where Vega-Lite allows configurations in multiple places. It might make sense for the spec to move to doing that more: e.g. rather than mark being a string enum, make it a definition that includes configuration (that's essentially what we're bootstrapping here).

I think a good goal would be to make the config option more like a set of defaults for theming, with the goal that normal usage should never have to touch it directly. If you want the Vega-Lite spec to be an API, that could be very useful.

Another unrelated thought as you talk about the spec evolving: does the spec have any place to specify the version used? If not, it's going to be hard to let it evolve without breaking a lot of people's visualizations.

@ellisonbg
Copy link
Collaborator

Thinking a bit more...

I like the idea of having a set of configure_* methods that provide a shorthand notation for configuring each section of the config:

configure_axis()
configure_cell()
configure_mark()
configure_facet()
configure_legend()
configure_scale()

So then you can do things like Chart(data).mark_point().encode().configure_mark(color='red'). The benefit of this approach is that we never have to worry about name collision in the different config sections with a syntax like Chart(data).mark_point().encode().configure(attr1='foo', attr2='bar').

However, for the case of the mark_* methods, because there is no ambiguity about which section of the config you are interested, I would also support the mark_point(color='red') syntax. It will really help users and there is no risk of name collision.

I am not very fond of the Layer().configure_mark_color('red') syntax as it starts to hide the attribute based approach of altair a bit too much.

@kanitw
Copy link
Member

kanitw commented May 21, 2016

interesting to hear that there are other places where Vega-Lite allows configurations in multiple places.

Yes. To be fair, I was a little inaccurate last night. Scale config is really for theming. (But when you set scale/axis/legend config, it applies for scale/axis/legend of every applicable channel.)

It might make sense for the spec to move to doing that more: e.g. rather than mark being a string enum, make it a definition that includes configuration (that's essentially what we're bootstrapping here).

For Vega-Lite, I don't think we will do it. mark in Vega-Lite is just mark type. And encoding is for mapping either field or value to the channel (mostly visual properties) of the marks. Plus, config.mark isn't that bad to specify in JSON form. Not to mention the backward compatibility issue.

I think a good goal would be to make the config option more like a set of defaults for theming, with the goal that normal usage should never have to touch it directly. If you want the Vega-Lite spec to be an API, that could be very useful.

Right now it's sort of like that. Many mark config (that are properties of the marks) will eventually become encoding channel and support mapping to field -- but that requires implementing default scale range, and legend support. Since the most useful channels are already supported, we do not plan to implement this yet and this might happen over time.

But I guess we won't go to an extreme point where normal usage should "never" have to touch it directly (but yes, probably "rarely"). For example, some rarely used configs (e.g., config.mark.stacked) don't really have better place to be.

Another unrelated thought as you talk about the spec evolving: does the spec have any place to specify the version used? If not, it's going to be hard to let it evolve without breaking a lot of people's visualizations.

Having a version would be useful. Right now we don't have it yet but we discussed about it in vega/vega#448.

In any case, we will mostly add new properties more than removing. In general, we consider multiple alternative syntaxes (esp. the core part) before deciding on a syntax so this help avoid deprecating properties / major restructuring to support backward compatibility. But as you know, we can't 100% prevent this, so we definitely should add versioning in some form.


However, for the case of the mark_* methods, because there is no ambiguity about which section of the config you are interested, I would also support the mark_point(color='red') syntax. It will really help users and there is no risk of name collision.

okay.

I am not very fond of the Layer().configure_mark_color('red') syntax as it starts to hide the attribute based approach of altair a bit too much.

I agree that this should not be shown in examples (as the recommended way).
But I guess this would come for free from auto-generation? If preventing it is more work, I'm okay with having it. (I'm fine either way.)

Note that config.facet has one level deeper of properties. So there should still be config_facet_axis(), config_facet_cell(), config_facet_grid(), config_facet_scale().

@ellisonbg
Copy link
Collaborator

Let's make a decision on these points:

  • Should we rename Layer -> Chart?
  • Should we implement the extra configure_* methods?

I am +1 on both...

@jakevdp
Copy link
Collaborator Author

jakevdp commented May 24, 2016

+1 if we're talking about the one-level-deep options... Not things like configure_mark_color().

@ellisonbg
Copy link
Collaborator

OK, I have to shift back to other stuff, feel free to work on this stuff...

@kanitw
Copy link
Member

kanitw commented May 24, 2016

  • Should we rename Layer -> Chart?
  • Should we implement the extra configure_* methods?

+1 too.

+1 if we're talking about the one-level-deep options... Not things like configure_mark_color().

I agree. configure_mark_color() is not good.

I guess what we actually don't want are methods for the leaves of the config tree?

We probably still want configure_facet_scale(bandSize=...) which is two-level-deep but its leaves are at level-three.

@ellisonbg
Copy link
Collaborator

This is done in master...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants