Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(api): move from .case() to .cases() #9096

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

NickCrews
Copy link
Contributor

@NickCrews NickCrews commented May 1, 2024

Redo of #7914 (with substantive changes) and #9039 (merely switching the base repo to the correct one, my fork)

I will try to keep the summary below up to date with what this PR does:

Breaking change to Value.cases()

Before the API was

def cases(
        self,
        case_result_pairs: Iterable[tuple[ir.BooleanValue, Value]],
        default: Value | None = None,
    ) -> Value:

Now it is

def cases(self, *branches: tuple[Value, Value], else_=None) -> Value:

In other words, it is now variadic, and the else_ is kwarg. The else_ naming convention is now consistent with Value.substitute()

This will be a hard breaking change for users when they upgrade to 10.0.

Alternatives considered (I am currently going for the simplest but harshest path):

  • Some gnarly compact code that detects and translates old-style use of the API, and emits a warning. This was deemed too complex.
  • Some slightly less gnarly code that merely detects the old-style API, and errors.

Soft-deprecates ibis.case() and Value.case()

Existing users will be able to continue to use these APIs with no problems. There will be no warnings or anything else. However, we will remove these functions from the docs, and make the docstring advise to use the new API.
I want there to be ONE endorsed way of doing these conditionals, so I think only advertising one API is very important to me.

Alternatives considered:

  • Hard-deprecation: Add @util.deprecated(instead="use ibis.cases() instead", as_of="10.0") to these functions.
  • Hard-removal: remove them entirely in 10.0
  • Still possible to do: hide .case() even more from users and tab completion, eg behind __getattr__()

See this comment thread for more discussion.

@NickCrews
Copy link
Contributor Author

NickCrews commented May 1, 2024

EDIT: duh, it's because they don't guarantee row order. Updated the assertions to be order-independent.

Any idea as to why the datafusion, exasol, and risingwave column tests are failing? I still have trouble getting those backends running on my M1 so I can't debug locally very well.

@NickCrews NickCrews requested a review from cpcloud May 1, 2024 18:05
@NickCrews NickCrews force-pushed the case-to-cases branch 7 times, most recently from 513bb94 to ae70dc6 Compare May 8, 2024 22:11
@NickCrews
Copy link
Contributor Author

@cpcloud gentle nudge for a review here :)

@NickCrews
Copy link
Contributor Author

@cpcloud anything I can do to help move this forward/easier to review?

@ncclementi
Copy link
Contributor

@NickCrews would you mind fixing the conflicts and CI, then I'll nudge the team to get you a review soon.

@NickCrews
Copy link
Contributor Author

Thanks for the poke. I have a lot of other PRs in flight at the moment with ibis, many of which I think are a lot closer to the finish line than this, I'd love to get those merged first before I add another ball to juggle.

@NickCrews
Copy link
Contributor Author

One thing that should get merged first is #9334, where I fix a datashape bug that is exposed in this PR. Once that is in, then this PR will be simpler.

Copy link
Member

@cpcloud cpcloud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pushing through!

I still have many concerns about the way things are being done in this PR. There also seem to be a few scope increases (e.g., the cast insertion) that should really be avoided.

ibis/backends/pandas/executor.py Outdated Show resolved Hide resolved
ibis/backends/sql/compiler.py Outdated Show resolved Hide resolved
ibis/backends/tests/test_generic.py Outdated Show resolved Hide resolved
ibis/backends/tests/test_generic.py Outdated Show resolved Hide resolved
ibis/backends/tests/test_generic.py Outdated Show resolved Hide resolved
ibis/expr/operations/generic.py Outdated Show resolved Hide resolved
ibis/expr/types/generic.py Outdated Show resolved Hide resolved
"""
import ibis.expr.builders as bl
@staticmethod
def _norm_cases_args(*args, **kwargs):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am very opposed to this function. Instead of massaging this to be something close to maintainable, why not just wait until 10.0? Doesn't seem worth the trouble to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I would love to drop this. I was just trying to provide an upgrade path. Ideally:

  • a user upgrades to 9.2. They now get a deprecation warning, but their code continues to work. There is another API available they can switch to without the deprecation warning.
  • They upgrade to 10.0. The old API disappears

Without this compat gnarliness, then

  • a user upgrades to 9.2 They see no change
  • they upgrade to 10.0. Their code breaks, and won't work until they switch over to the new API.

As a user I definitely prefer the first experience, but if we provide nice actionable error messages then the second experience is definitely bearable. Should we go for the 2nd experience?

Similarly, should we keep the old CaseBuilder stuff around, or should we also hard-delete it all at once and provide an actionable error message?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also it's good to separate out the possible breaking changes. This is what I propose (which I think you agree with IIUC):

  • ibis.case(): we add the deprecation in 10.0, add a test that the old API raises a deprecation but still gives the right result. This isn't too hard to continue support. We fully remove in 11.0
  • Value.case(): Same as above
  • Value.cases(): remove this _norm_cases_args() gnarliness. This will just be hard breaking change for users when they upgrade 9.x to 10.0. No warning or upgrade path, sorry folks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is correct.

I will look into making these changes in this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the commit I just pushed, I think we end up with the behavior above:

  • removed the _norm_cases_args() gnarliness
  • kept the deprecation warnings for ibis.case() and Value.case(). I also am fine with removing the deprecation, letting past users keep using them with no problem, but just not surfacing them in the docs or otherwise advertising them.

ibis/expr/types/generic.py Outdated Show resolved Hide resolved
ibis/expr/types/numeric.py Show resolved Hide resolved
@NickCrews NickCrews force-pushed the case-to-cases branch 2 times, most recently from c33860b to e0fa647 Compare July 12, 2024 17:09
NickCrews added a commit to NickCrews/ibis that referenced this pull request Jul 12, 2024
This is pinning down the expected behavior for cases before tackling
the case() to cases() switch in
ibis-project#9096
so that PR can be simpler
NickCrews added a commit to NickCrews/ibis that referenced this pull request Jul 12, 2024
Prepping for ibis-project#9096

- moves several .ifelse() and .substitute() tests from ibis/backens/test/test_generic.py into a new test_conditionals.py
- moves some .case() tests from the pandas and dask backends into there, so the other backends are now tested as well, and removed duplication.
- adds a test case that `ibis.literal(5).nullif(5).case().when(None, "oops").else_("expected")` results in "expected". Not saying this is actually the best behavior,
but it is currently what we claim to be expected. Perhaps we want to revisit.
NickCrews added a commit to NickCrews/ibis that referenced this pull request Jul 12, 2024
This is pinning down the expected behavior for cases before tackling
the case() to cases() switch in
ibis-project#9096
so that PR can be simpler
NickCrews added a commit to NickCrews/ibis that referenced this pull request Jul 12, 2024
This is pinning down the expected behavior for cases before tackling
the case() to cases() switch in
ibis-project#9096
so that PR can be simpler

I move the validation for comparable-ness down into the operation so that
the logic is consolidated to one place.
in ibis-project#9096 there might be multiple places that construct an ops.SimpleCase, and we don't want
to have to implement the validation in all
calling locations.

We could consider relaxing the limitation for non-empty cases later, but for now lets be strict.
Copy link
Member

@cpcloud cpcloud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still too much going on here. See review comments.

ibis/backends/tests/test_generic.py Outdated Show resolved Hide resolved

for case in cases:
if not rlz.comparable(base, case):
raise TypeError(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any changes that affect validation and/or type checking or construction of operations need to go in separately from API changes. If you want to make this change please put it in a separate PR.

ibis/backends/sql/compiler.py Outdated Show resolved Hide resolved
@cpcloud cpcloud removed this from the 9.2 milestone Jul 16, 2024
NickCrews added a commit to NickCrews/ibis that referenced this pull request Jul 16, 2024
This is pinning down the expected behavior for cases before tackling
the case() to cases() switch in
ibis-project#9096
so that PR can be simpler

I move the validation for comparable-ness down into the operation so that
the logic is consolidated to one place.
in ibis-project#9096 there might be multiple places that construct an ops.SimpleCase, and we don't want
to have to implement the validation in all
calling locations.

We could consider relaxing the limitation for non-empty cases later, but for now lets be strict.
NickCrews added a commit to NickCrews/ibis that referenced this pull request Jul 16, 2024
This is pinning down the expected behavior for cases before tackling
the case() to cases() switch in
ibis-project#9096
so that PR can be simpler

I move the validation for comparable-ness down into the operation so that
the logic is consolidated to one place.
in ibis-project#9096 there might be multiple places that construct an ops.SimpleCase, and we don't want
to have to implement the validation in all
calling locations.

We could consider relaxing the limitation for non-empty cases later, but for now lets be strict.

I already fixed the shape of ops.SearchedCase in ibis-project#9334,
but it looks like in that PR I forgot to also fix ops.SimpleCase, so I do that fix here.
@NickCrews
Copy link
Contributor Author

Once #9559 lands then I think the scope of this will be significantly reduced and this should be much easier, thank you for your reviews!

cpcloud pushed a commit to NickCrews/ibis that referenced this pull request Jul 23, 2024
This is pinning down the expected behavior for cases before tackling
the case() to cases() switch in
ibis-project#9096
so that PR can be simpler

I move the validation for comparable-ness down into the operation so that
the logic is consolidated to one place.
in ibis-project#9096 there might be multiple places that construct an ops.SimpleCase, and we don't want
to have to implement the validation in all
calling locations.

We could consider relaxing the limitation for non-empty cases later, but for now lets be strict.

I already fixed the shape of ops.SearchedCase in ibis-project#9334,
but it looks like in that PR I forgot to also fix ops.SimpleCase, so I do that fix here.
cpcloud pushed a commit to NickCrews/ibis that referenced this pull request Jul 24, 2024
This is pinning down the expected behavior for cases before tackling
the case() to cases() switch in
ibis-project#9096
so that PR can be simpler

I move the validation for comparable-ness down into the operation so that
the logic is consolidated to one place.
in ibis-project#9096 there might be multiple places that construct an ops.SimpleCase, and we don't want
to have to implement the validation in all
calling locations.

We could consider relaxing the limitation for non-empty cases later, but for now lets be strict.

I already fixed the shape of ops.SearchedCase in ibis-project#9334,
but it looks like in that PR I forgot to also fix ops.SimpleCase, so I do that fix here.
@NickCrews NickCrews force-pushed the case-to-cases branch 2 times, most recently from 666d98b to 3ccca68 Compare July 24, 2024 20:08
@NickCrews
Copy link
Contributor Author

OK, this is much simpler now, there are no functionality changes, only API changes.

The one question in the air is what to do with the old APIS. Can you respond in the thread and then I can adjust things?

@NickCrews NickCrews requested a review from cpcloud July 24, 2024 20:22
@jcrist
Copy link
Member

jcrist commented Aug 12, 2024

Going through our PR backlog - @cpcloud, can you please give this another review when you have a moment?

@cpcloud
Copy link
Member

cpcloud commented Aug 25, 2024

Thinking about this PR some more, I'm not sure why we're going to all this trouble.

Do we really need to remove the existing case API? What problems is it causing or what additional maintenance does it add?

Right now cases is implemented in terms of the more primitive case, which is much less complex than what is being proposed here.

I think I am now -1 on this PR altogether. It seems like it might be fine to break cases and make it variadic for 10, but removing case seems wholly unecessary.

@NickCrews
Copy link
Contributor Author

I want the user experience to be consistent, and only one of the APIs to be surfaced in the docs and all the examples. I think I'd be fine not deleting the old API, and not even deprecating it, but just removing it from the docs. Also maybe removing it from tab complete? By putting it under gettattr or something?? In terms of maintenance cost I agree its not bad.

I do want to think about this not in terms of the cost to do the transition between these two states, which is a one time thing, and more in terms of how does the quality of the final state compare to how it was before for the user?

Also, if we're worried about breaking people, I believe it is a thing where we can ship a script that goes through their source code and rewrites the change for them (for the easy instances at least). There's a project that does this I think, just blanking on the name

@NickCrews NickCrews force-pushed the case-to-cases branch 2 times, most recently from 5ed9645 to db52a77 Compare September 11, 2024 18:39
@NickCrews
Copy link
Contributor Author

I'm still motivated to get this merged. I am trying to keep the original post at the top of this PR in sync with the changes, to provide a good summary of the changes, why we are doing them, and alternatives considered. I think I have mitigated some of your concerns above. Please read that and see what you think? Thanks!

@NickCrews
Copy link
Contributor Author

@cpcloud gentle ping, whenever you get the chance :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants