Reduce the number of entries available in the top level API #2918

joelostblom · 2023-02-22T20:26:14Z

Currently, the top level of the Altair API is huge, there are 573 entries in the tab completion menu of alt.<TAB>. This can make it rather hard to find what you're looking for unless you know the first few character of the name already, and exploring the API from scratch as a new user must be intimidating. Many of these seem unhelpful to have exposed at the top-level, e.g. there are 70 different entries that relate to conditions, but I only use alt.condition myself. I suggest that the top level API only includes the most relevant classes and functions that we think are helpful on a day to day basis, and the rest go under sub-modules.

The Altair API consists of four parts:

API functions
Top level objects
Encoding channels
Low level schema wrappers

I have listed my comments below on how we could reduce these, and I would love to hear what you think @mattijn, @ChristopherDavisUCI, @binste (and anyone else that wants to chime in)

Note: In all the changes I suggest below we should not break any old code, just remove the class/function from tab completion and link it to the new place in the API where this class/function now resides.

API functions

Introduced specifically by Altair as conveniences and we should keep all of them (although maybe check_fields_and_encodings is meant to be a private function based on the name and since there is no docstring?).

(just showing the first few)

Top level objects

The are only eight of these and we can probably keep all of them. Alternatively, since all except Chart have corresponding API functions and are rarely used directly we could move them to a charts sub domain: alt.charts.HConcatChart, etc, to make it more clear that the preferred usage is the API method alt.hconcat, etc.

Encoding channels

There is wasted space here by having each channel repeated three times, e.g. Angle, AngleDatum, AngleValue. Especially as the most common use of datum and value is by using the top level API functions alt.datum and alt.value. I see to it possible ways of improving the situation:

Move the top level classes to methods on the corresponding encoding class: instead of AngleDatum we have Angle.datum.
Moved the top level classes to methods on the API function: instead of AngleDatum, we would have alt.datum.Angle.

Maybe the second is lightly more natural in syntax, but it will probably make it confusing since you can use datum.<colum_name> in expression strings, so the first would be better for that reason.

(just showing the first few)

Low level schema wrappers

This is where we could make the most progress. There are around 400 low level schemer wrappers of which I've only used around 10 myself, most of which are now unnecessary since we added the method based syntax for channel encoding options.

We should identify which of these are just internal helper classes for Altair that end users rarely need to know about. To me it seems like almost everything here could be moved to a submodule, except the classes that end in Params (although they could also be moved if we introduce aliases).

I don't know what a good name of of this submodule might be: maybe alt.extras, alt.schema_wrappers, alt.misc, alt.wrappers, or similar?

(just showing the first few)

Again, none of these changes should break old code. For example, if we move FillDatum to alt.Fill.datum, then there should no longer be any tab completion for alt.FillDatum, but old code that use it should work as it just calls alt.Fill.datum under the hood. The tab completion should be on alt.Fill.<tab> instead, which also seems more natural to me.

The text was updated successfully, but these errors were encountered:

binste · 2023-02-23T11:08:20Z

I really like the idea of cleaning up the top-level API! Thanks for bringing this up. I also find it quite overwhelming to go through the API references in the docs and to use autocompletion if you don't specifically know what you are looking for (and therefore I never use completion on alt.Chart to browse options).

In favour of cleaning up the API references in the docs which we can maybe do without any code changes except to generate_api_docs.py. I can't quite follow your example of refactoring FillDatum without breaking old code, could you expand on this a bit? Would you rename that class or just move it to a new submodule and still import it in the top-level?

Exclusion from tab completion is a bit tricky. As we noticed in #2814, hiding in Jupyter works great but we haven't yet found a way to do the same in VS Code. Of course still worth the effort to adjust __all__ if at least Jupyter users can profit from it.

joelostblom · 2023-02-23T21:07:38Z

Would you rename that class or just move it to a new submodule and still import it in the top-level?

My hope was to move it to a submodule and import it at top level, but hide it from the top level tab completion and only allow tab completion via the submodule. I think this would allow for old code to work, while also encouraging people to use the submodules from now on as well as reduce the top-level tab completion.

Exclusion from tab completion is a bit tricky. As we noticed in #2814, hiding in Jupyter works great but we haven't yet found a way to do the same in VS Code.

I think it would be great to support VS Code too, but if there isn't a way that pylance offers for hiding individual classes/functions, I am not sure there is much we can do until they add that. I tried asking in the discussion you linked before, let's see if there is a way.

joelostblom · 2023-02-23T23:22:47Z

Based on the helpful reply in microsoft/pylance-release#3709 (reply in thread) it seems like we need to get #2614 or similar merged, but maybe it is worthwhile if it resolves some of the VS Code issues we have been seeing and makes it easier to maintain both VS Code and JupyterLab support?

binste · 2024-09-06T12:11:03Z

Some notes from an offline discussion with @mattijn and @arvind:

There is potential in Vega-Lite to simplify the types, i.e. reduce the number of types. This would automatically improve the situation here. This can happen in VL as a first step of a potentially bigger rewrite/refactoring. However, this first requires the appropriate funding and resources so it's not around the corner.
Instead of doing it in VL first, we have the flexibility of doing it directly in Altair as discussed above.

What is very unlikely to break users code but would improve the situation a lot is removing classes from the top-level namespace which are never used by users anyway as they just represent a primitive datatype such as int, float. Examples: ConditionalAxisPropertyTextBaselinenull, DictSelectionInit, ... There are hundreds of these.

A first step could be to analyze which class names only appear ones in our code base, i.e. are defined but never used anywhere else, not even in a type hint. This can be a manual list and with every VL release, we could check in the PR if new entries get added to the top-level, if we really want them there.

Maybe we can remove these classes completely instead of just not making them available in the top-level namespace. This would make the Altair wheels even smaller. It might require a bit of a rewrite of from_dict.

binste · 2024-09-06T12:11:53Z

Due to the many objects, Altair takes up 23MB of memory (if I measure this correctly?) and this excludes it's requirements which are small:

from memory_profiler import profile


@profile
def import_packages():
    import typing_extensions
    import jinja2
    import jsonschema
    import packaging
    import narwhals

    import altair


if __name__ == "__main__":
    import_packages()

dangotbanned · 2024-09-06T12:59:10Z

Some notes from an offline discussion with @mattijn and @arvind:

There is potential in Vega-Lite to simplify the types, i.e. reduce the number of types. This would automatically improve the situation here. This can happen in VL as a first step of a potentially bigger rewrite/refactoring. However, this first requires the appropriate funding and resources so it's not around the corner.

Exciting to hear this being discussed on the Vega-Lite-side!

@binste is there any chance this relates to better support for generic types in more recent drafts of JSON Schema?

A while back I tried exploring what we could do in altair to reduce the number of symbols, but it started getting tricky due to https://github.com/vega/vega-lite/blob/main/scripts/rename-schema.sh

Screenshots

If there is a v6 on the table, I do think it would be worth reviewing how we can better utilise typing features - that wouldn't have been available in earlier major versions of altair

binste · 2024-09-06T13:34:15Z

If it makes it easier for us to generate type hints in Python, we should definitely look at it once we get to that rewrite, thanks for flagging!

In general, I think their is some consensus around simplifying the type hierarchy in VL. At least for Altair, I don't think we need ConditionalParameter, LogicalAnd, etc. as no user will use this. Or, in case we need it for schema generation, we would burry them deep in the codebase as an implementation detail.

binste · 2024-09-06T13:49:20Z

When we look into this, let's also review SPEC 1 which has recommendations on how to do lazy-loading submodules and functions.

dangotbanned · 2024-09-06T14:44:46Z

When we look into this, let's also review SPEC 1 which has recommendations on how to do lazy-loading submodules and functions.

Will find the link later, but polars has a good solution for this which preserves typing

Edit

In terms of LOC, not much more than utils._importers and is doing a whole lot more:

https://github.com/pola-rs/polars/blob/f493f37fde031cf7210d600151747cf2b7ba724a/py-polars/polars/dependencies.py

dangotbanned · 2024-09-29T10:31:26Z

I haven't seen this mentioned anywhere.

We also have several modules available at multiple levels, which is making it complicated to resolve #3610

I discovered this yesterday and have written a summary of my findings here:

refactor: alt.themes -> alt.theme #3618 (comment)

dangotbanned · 2024-10-03T13:16:12Z

@joelostblom @binste #2918 (comment)

I think I can see a path forward here.

Building on some of the ideas from #3366 (comment) and #3618-theme.py

`alt.value`

alt.AngleValue -> alt.value.Angle
This change wouldn't cause issues with existing code, as alt.value is currently a function.
The actual implementation could look something like:

Definitions

from __future__ import annotations

from typing import Any

import altair as alt


class _ValueMeta(type):
    """Metaclass for :class:`value`."""

    @property
    def Angle(cls) -> type[alt.AngleValue]:
        return alt.AngleValue

    @property
    def Color(cls) -> type[alt.ColorValue]:
        return alt.ColorValue

    # ... for all the other `...Value` channels


class value(metaclass=_ValueMeta):
    def __new__(cls, value: Any, **kwds: Any) -> dict[str, Any]:
        """Specify a value for use in an encoding."""
        return dict(value=value, **kwds)

Usage

examples = (
    value(1),
    value(1, offset=-5),
    value.Angle(0),
    value.Color("red"),
    value.Angle(0.5).condition(),
    value.Color("steelblue").condition(),
)
>>> examples
({'value': 1},
 {'value': 1, 'offset': -5},
 AngleValue({
   value: 0
 }),
 ColorValue({
   value: 'red'
 }),
 AngleValue({
   condition: {},
   value: 0.5
 }),
 ColorValue({
   condition: {},
   value: 'steelblue'
 }))

Screenshot

So this change alone is just a very lightweight rewrapping of the existing API.
A v6 version might want to look at changing the wrapped class names and/or reprs.

`alt.datum`

As mentioned by @joelostblom

Moved the top level classes to methods on the API function: instead of AngleDatum, we would have alt.datum.Angle.

Maybe the second is lightly more natural in syntax, but it will probably make it confusing since you can use datum.<colum_name> in expression strings, so the first would be better for that reason.

I originally mistook this for talking about a naive approach at supporting both.

For reference, that would look like:

Definitions

from __future__ import annotations

from typing import Any

import altair as alt
from altair.expr.core import GetAttrExpression, GetItemExpression


class _DatumMeta(type):
    """Metaclass for :class:`datum`."""

    @property
    def Angle(cls) -> type[alt.AngleDatum]:
        return alt.AngleDatum

    @property
    def Color(cls) -> type[alt.ColorDatum]:
        return alt.ColorDatum

    def __repr__(self) -> str:
        return "datum"

    def __getattr__(self, attr) -> GetAttrExpression:
        if attr.startswith("__") and attr.endswith("__"):
            raise AttributeError(attr)
        return GetAttrExpression("datum", attr)

    def __getitem__(self, attr) -> GetItemExpression:
        return GetItemExpression("datum", attr)


class datum(metaclass=_DatumMeta):
    def __new__(cls, datum: Any, **kwds: Any) -> dict[str, Any]:
        """Specify a datum for use in an encoding."""
        return dict(datum=datum, **kwds)

Usage

examples_datum = (
    datum,
    datum(1),
    datum(1, type="quantitative"),
    datum.Angle(1),
    datum.Color("green"),
    datum.Angle(0.2).bandPosition(0.8),
    datum.column_name,
    datum["column with spaces"],
)
>>> examples_datum
(datum,
 {'datum': 1},
 {'datum': 1, 'type': 'quantitative'},
 AngleDatum({
   datum: 1
 }),
 ColorDatum({
   datum: 'green'
 }),
 AngleDatum({
   bandPosition: 0.8,
   datum: 0.2
 }),
 datum.column_name,
 datum['column with spaces'])

Screenshot

Issues

What if you have a column named "Color", or any of the other channel names?

Introspection

v6_color, v5_color = datum.Color, alt.datum.Color
>>> print(
    f"v6 returns a class: {v6_color.__name__!r}\n  "
    f"value: {v6_color!r}\n"
    f"v5 returns an instance of: {type(v5_color).__name__!r}\n  "
    f"value: {v5_color!r}\n"
)
v6 returns a class: 'ColorDatum'
  value: <class 'altair.vegalite.v5.schema.channels.ColorDatum'>
v5 returns an instance of: 'GetAttrExpression'
  value: datum.Color

Works for v5

import polars as pl

df = pl.DataFrame(
    {
        "x": range(8),
        "y": [2, 5, 4, 7, 8, 3, 4, 5],
        "Color": [
            "red",
            "blue",
            "green",
            "steelblue",
            "olive",
            "yellow",
            "red",
            "green",
        ],
    }
)

base = alt.Chart(df).mark_bar().encode(x="x:N", y="y")

v5_predicate = alt.datum.Color == "red"

v5_chart = base.encode(
    color=alt.when(v5_predicate)
    .then(alt.value("black"))
    .otherwise("Color", legend=None)
)

Would break for v6

>>> base.encode(
    color=alt.when(datum.Color == "red")
    .then(alt.value("black"))
    .otherwise("Color", legend=None)
)
TypeError: Expected a predicate, but got: 'bool'

From `predicate=False`.

Would still work for v6

>>> base.encode(
    color=alt.when(datum["Color"] == "red")
    .then(alt.value("black"))
    .otherwise("Color", legend=None)
)

Possible solutions

Do (almost) nothing.
- alt.datum.___ will mean different things, depending on if the name you use collides.
- Explain in the docs about this edge case.
- Advise using datum[...] to avoid the collision.
- Similar situation to:
  - improve feedback to user with "." in column #3451
  - User Guide
Deprecate alt.datum.__getattr__(...) for this purpose
- Maybe controversial, but very possible given a MAJOR version is expected for this anyway.
- Unless I'm missing something, it seems the same functionality is provided by alt.datum.__getitem__(...).
  Additionally:
  - Support for spaces and python keywords
  - (Soon) Support for alt.Parameter
- Additional benefit of there being only one obvious way to reference a column via alt.datum.
Return some kind of a proxy object from alt.datum.___
- Acts like the wrapped channels class when called
- Acts like GetAttrExpression in all other cases
- Handling of this behavior would need to be added everywhere an Expression is currently expected
Add another metaclass, to use for alt.___Datum classes
- Somewhat like OperatorMixin, but these methods would be available on the class
- I can see this working on a technical-level, but probably not intuitive to use

Of these options I strongly prefer [2].
Despite any potential disruption, it makes things less ambiguous and requires the least explanation to new users.

I think [1] would be short-sighted and a point of confusion for even trivial cases like: alt.datum.color vs alt.datum.Color.

Both [3] and [4] would introduce more complexity internally - even if they solved the issue of not breaking existing code.

Summary

I know there is a lot go through here, so no rush on a response 😅

Hope the examples are helpful.
Even if we don't move forward with this idea; maybe they spark an idea for someone else.

joelostblom added the enhancement label Feb 22, 2023

This was referenced Feb 25, 2023

Restrict top-level autocompletion for VS Code (PyLance) #2927

Draft

Add type hints to tools.schemapi.schemapi #2614

Closed

binste mentioned this issue Jun 10, 2023

Support extracting transformed chart data using VegaFusion #3081

Merged

joelostblom mentioned this issue Nov 2, 2023

Type hints: Infer types from vegalite schema for autogenerated code #3208

Merged

binste mentioned this issue Jul 2, 2024

Deprecation/breaking changes policy & process #3454

Closed

dangotbanned mentioned this issue Jul 26, 2024

feat(RFC): Adds agg, field utility classes #3505

Draft

dangotbanned added the v6 Aim to be completed for version 6 label Aug 13, 2024

binste mentioned this issue Sep 6, 2024

Improve expr method signatures #3563

Closed

dangotbanned mentioned this issue Sep 6, 2024

Preparing for Python 3.8 EOL #3497

Closed

dangotbanned mentioned this issue Sep 13, 2024

feat: Adds ThemeConfig (TypedDict) #3536

Merged

10 tasks

This was referenced Oct 27, 2024

add a method for binding_color? #2760

Open

feat: Support Chart.transform_filter(*predicates, **constraints) #3664

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reduce the number of entries available in the top level API #2918

Reduce the number of entries available in the top level API #2918

joelostblom commented Feb 22, 2023 •

edited

Loading

binste commented Feb 23, 2023

joelostblom commented Feb 23, 2023

joelostblom commented Feb 23, 2023 •

edited

Loading

binste commented Sep 6, 2024

binste commented Sep 6, 2024

dangotbanned commented Sep 6, 2024 •

edited

Loading

binste commented Sep 6, 2024

binste commented Sep 6, 2024

dangotbanned commented Sep 6, 2024 •

edited

Loading

dangotbanned commented Sep 29, 2024

dangotbanned commented Oct 3, 2024 •

edited

Loading

Reduce the number of entries available in the top level API #2918

Reduce the number of entries available in the top level API #2918

Comments

joelostblom commented Feb 22, 2023 • edited Loading

API functions

Top level objects

Encoding channels

Low level schema wrappers

binste commented Feb 23, 2023

joelostblom commented Feb 23, 2023

joelostblom commented Feb 23, 2023 • edited Loading

binste commented Sep 6, 2024

binste commented Sep 6, 2024

dangotbanned commented Sep 6, 2024 • edited Loading

binste commented Sep 6, 2024

binste commented Sep 6, 2024

dangotbanned commented Sep 6, 2024 • edited Loading

Edit

dangotbanned commented Sep 29, 2024

dangotbanned commented Oct 3, 2024 • edited Loading

alt.value

alt.datum

Issues

Possible solutions

Summary

joelostblom commented Feb 22, 2023 •

edited

Loading

joelostblom commented Feb 23, 2023 •

edited

Loading

dangotbanned commented Sep 6, 2024 •

edited

Loading

dangotbanned commented Sep 6, 2024 •

edited

Loading

dangotbanned commented Oct 3, 2024 •

edited

Loading

`alt.value`

`alt.datum`