Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Improve the syntax for conditions with multiple predicates #3427

Merged
merged 122 commits into from
Jul 18, 2024

Conversation

dangotbanned
Copy link
Member

@dangotbanned dangotbanned commented May 29, 2024

Fixes #2759

Throughout this PR I frequently reference polars.when, and I want to make sure any reviewers have the origin handy.

Tasks

Review

Core

Stretch

These would be nice to have and were demonstrated in #2759 (comment). Each idea comes from polars.when.

However, they need further discussion due to deviating from alt.condition's original behaviour.

  • Wrap literals passed to then, otherwise in alt.value. 38d425f (#3427)
  • Wrap kwargs passed to when as alt.datum.key.__eq__(value). 38d425f (#3427)
  • Modify when's predicate argument to be variadic, reducing all predicates as an AND reduction. 38d425f (#3427)
    • Probably the least contentious
      • alt.condition already accepts the result of this.
      • when(predicate, **kwargs) -> when(*predicate, **kwargs) doesn't change the meaning of arguments other than predicate
        • A concern that I raised when proposing making this change within alt.condition

Returned type is only `core.SchemaBase` under one very specific case.
Appears to have been there since at least 5.0.0
Should cover all valid combinations, including that only one of `if_true`, `if_false` may be `str`.
Required for an annotation in later commit, but the import complicated the diff
- `expr.Expression` -> `_expr_core.Expression` fixes the type appearing as `Any`
- `dict` -> `TypingDict[str, Any]` fixes the type appearing as `dict[Unknown, Unknown]`
- Generally, the hierarchy of `Union`'s is allowing reuse and trailing docstrings, which I think improves readability.

These will be used in `alt.condition` and related functions
Split out the `_PredicateType`/`_StatementType` parsing for reuse in `when-then-otherwise`.
Unsure if `expr` is still needed here as a side effect, but it is no longer needed internal to `api`.
Purely a conservative step.
This operation was performed on every call, but was entirely static.
These need to be decoupled for the following cases:
- Only `if_true` provided
- multiple (`if_true`,`if_false`)
  - Ending of an `if_false`
  - Ending of an `if_true`

Also adds a placeholder `TypeError`.
Resolves:

> FAILED tests/utils/test_core.py::test_infer_encoding_types_with_condition - AssertionError: assert {'color': Col...  value: 2})} == {'color': Col...  value: 2})}
- Sufficient for solving [vega#3301](vega#3301)
- Still a work-in-progress
Will need updating with tests on `_Then`. Currently the wrapped `dict` should be a parseable input, but the rest of `altair` doesn't know what to do with `_Then` yet.
`list` fails on 3.8
…python>=3.12.0`

Any objects using `__getattr__`, e.g. `Parameter` break on older versions [see PR](python/cpython#103034)
Within a recursive call, an `str` would always be checked for each case - despite the last always being false.
Somewhat duck-typing as `SchemaBase`.
Temporary solution while `when-then-otherwise` is not related to any other `altair` classes.
@dangotbanned dangotbanned marked this pull request as draft July 18, 2024 18:31
@dangotbanned dangotbanned marked this pull request as ready for review July 18, 2024 18:31
@dangotbanned
Copy link
Member Author

👍😊

@mattijn All of #3427 (comment) is finished and ready for review.
Did a little bit of tidying up as well.

If you'd like to defer the docs part to another PR, I'd be happy with merging as-is.

@mattijn
Copy link
Contributor

mattijn commented Jul 18, 2024

Thanks! Looks great! A few question for clarifications.

I was confused about these two new tests:

when_constraint = alt.when(Origin="Europe")

when_constraints = alt.when(
    Name="Name_1", 
    Color="Green", 
    Age=25, 
    StartDate="2000-10-01"
)

expected_constraint = alt.datum.Origin == "Europe"

expected_constraints = (
    (alt.datum.Name == "Name_1")
    & (alt.datum.Color == "Green")
    & (alt.datum.Age == 25)
    & (alt.datum.StartDate == "2000-10-01")
)

This is syntax I have not seen before. I noticed this is mentioned in the polars docs, that mentions this:

constraints
    Apply conditions as `col_name = value` keyword arguments that are
    treated as equality matches, such as `x = 123`. As with the predicates
    parameter, multiple conditions are implicitly combined using `&`.

And

Pass conditions as keyword arguments:

>>> df.with_columns(val=pl.when(foo=4, bar=0).then(99).otherwise(-1))
shape: (3, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ val │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i32 │
╞═════╪═════╪═════╡
│ 1   ┆ 3   ┆ -1  │
│ 3   ┆ 4   ┆ -1  │
│ 4   ┆ 0   ┆ 99  │
└─────┴─────┴─────┘

Do you know where this type of syntax comes from? Is it unique for polars or also used elsewhere?

This only functions with a single = operator? Ie. a predicate like foo<4 is not supported? And what about alt.when('my column has spaces'=4)

@dangotbanned
Copy link
Member Author

dangotbanned commented Jul 18, 2024

Thanks! Looks great! A few question for clarifications.

I was confused about these two new tests:

    when_constraint = alt.when(Origin="Europe")
    when_constraints = alt.when(
        Name="Name_1", Color="Green", Age=25, StartDate="2000-10-01"
    )
    expected_constraint = alt.datum.Origin == "Europe"
    expected_constraints = (
        (alt.datum.Name == "Name_1")
        & (alt.datum.Color == "Green")
        & (alt.datum.Age == 25)
        & (alt.datum.StartDate == "2000-10-01")
    )

This is syntax I have not seen before. I noticed this is mentioned in the polars docs, that mentions this:

    constraints
        Apply conditions as `col_name = value` keyword arguments that are treated as
        equality matches, such as `x = 123`. As with the predicates parameter, multiple
        conditions are implicitly combined using `&`.

And

    Pass conditions as keyword arguments:

    >>> df.with_columns(val=pl.when(foo=4, bar=0).then(99).otherwise(-1))
    shape: (3, 3)
    ┌─────┬─────┬─────┐
    │ foo ┆ bar ┆ val │
    │ --- ┆ --- ┆ --- │
    │ i64 ┆ i64 ┆ i32 │
    ╞═════╪═════╪═════╡
    │ 1   ┆ 3   ┆ -1  │
    │ 3   ┆ 4   ┆ -1  │
    │ 4   ┆ 0   ┆ 99  │
    └─────┴─────┴─────┘

Do you know where this type of syntax comes from? Is it unique for polars or also used elsewhere?

This only functions with a single = operator? Ie. a predicate like foo<4 is not supported? And what about alt.when('my column has spaces'=4)

@mattijn I'm confused also but for a different reason.
This has been in here for a while and it is described with an example in the doc for alt.when

It was also part of the original proposal #2759 (comment)

I also listed it under the Stretch Tasks #3427 (comment)

It utilises **kwargs, so you can technically do it for columns with spaces like this:

alt.when(**{'my column has spaces':4})

I would maybe note this in the User Guide as an option, but I'd probably lean on a different feature for those cases.

Apologies, I'm doing this from my phone - but I'm pretty sure one of the later tests has this syntax of unpacking.

foo<4 will work if foo defines __lt__. It would be collected in *more_predicates, instead of **constraints.

I'll update this tomorrow with a more complete response, but appreciate you checking this out so quickly

@mattijn
Copy link
Contributor

mattijn commented Jul 18, 2024

Great, this docstring you refer to is very clear. Thanks for linking.

It would be great if we can extend the documentation as well in a follow up PR on this new feature's scope to manage user expectations.

Maybe without realizing, but there have been many new things in this PR! Thanks for all these efforts.

I'm approving, but please comment when you think this PR can be merged!

@dangotbanned
Copy link
Member Author

Great, this docstring you refer to is very clear. Thanks for linking.

It would be great if we can extend the documentation as well in a follow up PR on this new feature's scope to manage user expectations.

Maybe without realizing, but there have been many new things in this PR! Thanks for all these efforts.

I'm approving, but please comment when you think this PR can be merged!

@mattijn thank you, I'm ready for merging

Will absolutely be following up with another PR w/ more docs, especially examples on the methods (not just alt.when)

@mattijn mattijn merged commit bdc747d into vega:main Jul 18, 2024
12 checks passed
@mattijn
Copy link
Contributor

mattijn commented Jul 18, 2024

Thank you @dangotbanned!

dangotbanned added a commit to dangotbanned/altair that referenced this pull request Jul 19, 2024
…tion on Click*

Utilises `alt.when`, introduced in vega#3427

Fixes vega#3301
jonmmease pushed a commit that referenced this pull request Jul 20, 2024
* fix: Use absolute imports explicitly

Fixes the mypy issue referenced in `alt.__init__.py`

* refactor: remove dynamic globals manipulation

The results of `_populate_namespace` were only visible in a `IPython` environment

* feat: Reimplement `expr` as a class w/ identicial semantics as the singleton

- all `ConstExpression`s are now read-only properties stored in a metaclass
- `expr` subclasses `ExprRef`, replacing the `__call__` w/ `__new__`
  - No instances of `expr` are ever created
- All `FunctionExpression`s are now classmethods
- Each `Expression` can now be referenced in autocomplete, without needing `IPython`/`Jupyter`
- Docstrings have been reformatted to be compatible with `sphinx`
- Broken links in docstrings fixed (e.g. for d3)
- class-level doc for `expr`, with references & doctest

* fix: use absolute imports in `test_expr`

Previously relied on the dynamic globals

* test(perf): rewrite `expr` tests to run in parallel

Previously 2 tests, now 170+

* test: confirm `expr` constants stay constant

* chore(typing): add ignores not flagged by mypy

* docs: adds `alt.expr` to API Reference

Uses the section heading proposed in #3427 (comment)

Expands on #2904

* test: ensure `test_expr_consts_immutable` pattern works for all versions

* docs: fix typos, regex artifacts, apply some `ruff` `D` rules

* docs: add doc for `_ConstExpressionType`

* test: make `expr` doctest testable

I left this out initially, but by adding static `Parameter` names it is now possible

* fix: re-run `generate-schema-wrapper`

https://github.com/vega/altair/actions/runs/10005909777/job/27657526013?pr=3466

* refactor: Remove `expr` test dependency on constants

Fixes #3466 (comment)

* style: remove trailing commas in `@pytest.mark.parametrize`
@dangotbanned dangotbanned deleted the condition-multiple branch July 21, 2024 11:03
dangotbanned added a commit to dangotbanned/altair that referenced this pull request Jul 21, 2024
Planned in vega#3427 (comment)

Will allow for more reuse
mattijn pushed a commit that referenced this pull request Jul 22, 2024
* refactor: Rename and move `is_undefined`

Planned in: https://github.com/vega/altair/pull/3480/files/419a4e944f231026322e2c4f137e7a3eb94735e8#r1679197993

- `api._is_undefined` -> `schemapi.is_undefined`
- Added to `utils.__all__`

* refactor: Rename and move `OneOrSeq`

Planned in #3427 (comment)

Will allow for more reuse
dangotbanned added a commit to dangotbanned/altair that referenced this pull request Jul 26, 2024
`field` proposed in vega#3239 (comment)
`agg` was developed during vega#3427 (comment) as a solution to part of vega#3476
dangotbanned added a commit that referenced this pull request Oct 30, 2024
Remember discovering this during (#3427), but hadn't documented
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve the syntax for conditions with multiple predicates
5 participants