refactor(api): make input value coercion of mutate() identical to select() #8878

Merged: kszucs merged 1 commit into ibis-project:main from bind-prefer-column on Apr 3, 2024


kszucs (Member) commented Apr 3, 2024

String and integer literals passed to select() are interpreted as column references, whereas mutate() interpreted them as literals.

BREAKING CHANGE: strings passed to table.mutate() are now interpreted as column references instead of literals; use ibis.literal(string) to pass a string as a literal.
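To make the new rule concrete, here is a minimal Python sketch that models how a bare value passed to select() or mutate() is bound after this change. It does not use ibis: `Column`, `Literal`, and `coerce_value` are hypothetical stand-ins for a column expression, `ibis.literal(...)`, and ibis's internal binding logic. Positional integer lookup reflects the state at this PR only; it was removed in the follow-up.

```python
from dataclasses import dataclass
from typing import Any, Sequence, Union


@dataclass(frozen=True)
class Column:
    """Hypothetical stand-in for a column reference such as t.x."""
    name: str


@dataclass(frozen=True)
class Literal:
    """Hypothetical stand-in for ibis.literal(value)."""
    value: Any


def coerce_value(value: Any, columns: Sequence[str]) -> Union[Column, Literal]:
    """Model of the binding rule shared by select() and mutate() at this PR."""
    if isinstance(value, (Column, Literal)):
        # Already an expression (e.g. wrapped in ibis.literal): keep it as-is.
        return value
    if isinstance(value, str):
        # Strings name columns, in mutate() as well as select().
        # This sketch raises KeyError for unknown names.
        if value not in columns:
            raise KeyError(f"no column named {value!r}")
        return Column(value)
    if isinstance(value, int) and not isinstance(value, bool):
        # At this PR, integers are still positional column references.
        # (bool is an int subclass; this sketch treats it as a literal.)
        return Column(columns[value])
    # Everything else (floats, dates, ...) becomes a literal.
    return Literal(value)


columns = ["x", "y"]
print(coerce_value("x", columns))           # Column(name='x')
print(coerce_value(Literal("x"), columns))  # Literal(value='x')
print(coerce_value(1, columns))             # Column(name='y')
```

In practice this means `t.mutate(label="x")` no longer creates a constant string column; it refers to column `x`, and a constant now has to be spelled `t.mutate(label=ibis.literal("x"))`.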

@kszucs kszucs requested review from jcrist and cpcloud April 3, 2024 09:38
kszucs (Member, Author) commented Apr 3, 2024

I am actually not a big fan of t.select(42) trying to locate a column. Treating strings as column references is fine but integers are rather weird.

@kszucs kszucs force-pushed the bind-prefer-column branch 2 times, most recently from d652e3c to 0e2cef3 Compare April 3, 2024 10:22
cpcloud (Member) left a comment

Makes sense. Still a good chunk of failures though.

@kszucs kszucs force-pushed the bind-prefer-column branch 2 times, most recently from e9596c0 to ace7011 Compare April 3, 2024 11:20
…ect()

String and integer literals passed to select() are interpreted as columns whereas mutate() interpreted them as literals.

BREAKING CHANGE: string and integer literals passed to table.mutate() are now interpreted as column references
@kszucs kszucs force-pushed the bind-prefer-column branch from ace7011 to cc2e34d Compare April 3, 2024 11:34
jcrist (Member) commented Apr 3, 2024

> I am actually not a big fan of t.select(42) trying to locate a column. Treating strings as column references is fine but integers are rather weird.

I'd agree with this and would propose we deprecate (or just remove) ints as column references everywhere except when directly indexing a table (e.g. t[0]).

I do think that treating everything passed to mutate as a literal is a bit more useful behavior (t.mutate(new_col="existing_col") would be less common IMO than t.mutate(new_col="some_constant")), but I can see the argument here for consistency across methods.

cpcloud (Member) commented Apr 3, 2024

+1 to removing the int-as-column references in mutate and select (I can't remember which one has it and which one doesn't), and keeping strings as column references, but only in select.

We have rename() for the use cases that string column references in mutate would cover.

kszucs (Member, Author) commented Apr 3, 2024

Now both are consistent. My preference is to remove both, but I am afraid that would be too big of a breaking change. Then again, we are already breaking it now, and we could provide a config option to fall back to the previous behavior.

jcrist (Member) commented Apr 3, 2024

> +1 to removing the string-and-int-as-column references in mutate and select (I can't remember which one has it and which one doesn't)

IMO we'll still want to support strings as column references. t.select("x", "y", "z") is such a common pattern, and feels intuitive to me (while t.select(0, 1, 2) doesn't).

kszucs (Member, Author) commented Apr 3, 2024

OK, I think we have agreement on removing the int case. I'm going to put up a follow-up for that.

cpcloud (Member) commented Apr 3, 2024

We need to document this as part of this PR or the next one, because it seems pretty non-trivial to articulate what the rules are, especially around strings.

What exactly is the behavior of str and int inputs to select and mutate when they are not wrapped in ibis.literal?

kszucs (Member, Author) commented Apr 3, 2024

Created a follow-up issue for the documentation: #8879.

@kszucs kszucs merged commit 38e7e14 into ibis-project:main Apr 3, 2024
96 checks passed
@kszucs kszucs deleted the bind-prefer-column branch April 3, 2024 15:24
NickCrews added a commit to NickCrews/mismo that referenced this pull request Apr 10, 2024
cpcloud pushed a commit that referenced this pull request Apr 13, 2024
…erences (#8884)

Removing the int-as-column references in mutate and select. Further discussed in #8878.

BREAKING CHANGE: Integer inputs to `select` and `mutate` are now always interpreted as literals. Columns can still be accessed by their integer index using square-bracket syntax.
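Taken together with #8878, the rule after #8884 can be modeled as: strings name columns, integers and other plain Python values become literals, and positional access is only available through square brackets. This is a minimal sketch that does not use ibis; `Column`, `Literal`, `coerce_value`, and `index_column` are hypothetical names standing in for ibis expressions and internal binding logic, not ibis APIs.

```python
from dataclasses import dataclass
from typing import Any, Sequence, Union


@dataclass(frozen=True)
class Column:
    """Hypothetical stand-in for a column reference such as t.x."""
    name: str


@dataclass(frozen=True)
class Literal:
    """Hypothetical stand-in for ibis.literal(value)."""
    value: Any


def coerce_value(value: Any, columns: Sequence[str]) -> Union[Column, Literal]:
    """Model of how select()/mutate() bind a bare value after #8884."""
    if isinstance(value, (Column, Literal)):
        # Explicit expressions pass through unchanged.
        return value
    if isinstance(value, str):
        # Strings still name columns; this sketch raises KeyError if unknown.
        if value not in columns:
            raise KeyError(f"no column named {value!r}")
        return Column(value)
    # Integers, like every other non-string value, are now literals.
    return Literal(value)


def index_column(columns: Sequence[str], position: int) -> Column:
    """Model of square-bracket positional access such as t[0]."""
    return Column(columns[position])


columns = ["x", "y"]
print(coerce_value(0, columns))   # Literal(value=0)
print(index_column(columns, 0))   # Column(name='x')
```

Under this rule, `t.select(0)` yields a constant column of zeros rather than the first column, and code that relied on positional selection would switch to `t[0]`.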