API: allow dependent assignment? #14207

chris-b1 · 2016-09-12T18:14:26Z

Since kwarg order will be guaranteed in python 3.6, we could allow this...though would only want to try with a 3.6+ version check, otherwise code could more or less randomly work/not work, depending on dict order.

In [44]: df = pd.DataFrame({'a': [1,2,3]})

In [45]: df.assign(b=1, c=lambda x: x['b'] * 2)

TomAugspurger · 2016-09-12T18:40:10Z

xref: #9777 (comment)

This would be great, but I want to think a bit about exposing a feature that would so easily lead to subtle differences between 3.5 and 3.6. Not sure how best to handle it :/

Either way, we could probably deprecate the sorting of keys (documentation only; can't print a warning each time someone uses .assign).

TomAugspurger · 2016-09-12T18:43:01Z

FYI, here's the current docs (in a warning block):

Since the function signature of assign is **kwargs, a dictionary, the order of the new columns in the resulting DataFrame cannot be guaranteed to match the order you pass in. To make things predictable, items are inserted alphabetically (by key) at the end of the DataFrame.
All expressions are computed first, and then assigned. So you can't refer to another column being assigned in the same call to assign.
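
To make the two documented rules concrete, here is a minimal sketch that mimics them with a plain dict of columns standing in for a DataFrame. The function name `assign_sorted` and the dict representation are assumptions for illustration only, not pandas internals.

```python
def assign_sorted(frame, **kwargs):
    """Hypothetical stand-in for the documented behaviour; `frame` is a dict of columns."""
    # Rule 2: every expression is evaluated against the ORIGINAL frame first,
    # so a callable cannot see a column created in the same call.
    results = {}
    for name, value in kwargs.items():
        results[name] = value(frame) if callable(value) else value

    # Rule 1: new columns are appended in alphabetical key order,
    # regardless of the order the keyword arguments were written in.
    out = dict(frame)
    for name in sorted(results):
        out[name] = results[name]
    return out


frame = {"a": [1, 2, 3]}
out = assign_sorted(
    frame,
    z=lambda f: [v * 2 for v in f["a"]],
    b=lambda f: [v + 1 for v in f["a"]],
)
print(list(out))  # ['a', 'b', 'z']
```

Under this sketch, `assign_sorted(frame, b=1, c=lambda f: f["b"])` raises `KeyError`, because `b` does not exist yet when `c` is evaluated.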

chris-b1 · 2016-09-12T20:07:43Z

Yeah, the back-compat is awkward, maybe this could be opt-in from an option, so it would at least raise on older pythons, although that's ugly too.
pd.set_option('ordered_assign', True) # raises on py < 3.6

TomAugspurger · 2017-02-07T23:46:02Z

@mrocklin thoughts on this? IIRC you wanted this initially to make things easier for dask? How much would break for you / dask users if we just removed the sorting of keys?

mrocklin · 2017-02-09T04:06:30Z

I think that Pandas should make whatever decisions make sense for Pandas. Dask.dataframe will adapt as necessary. I suspect that this won't cause a problem. Dask.dataframe will just try the assign operation on an empty dataframe to determine the column ordering. Whatever Pandas does we'll probably do automatically.

shoyer · 2017-02-09T04:35:41Z

Ultimately, I think we should switch to this behavior and stop sorting.

But I don't see much of a rush. It might be better to wait until Python 3.6 is more widely used, even until pandas 2.0.

shoyer · 2017-02-09T04:37:46Z

I would also be comfortable just changing this for Python 3.6+. My guess is that there aren't that many folks using Python 3.6 in production yet, so this will probably cause minimal issues.

TomAugspurger · 2017-09-21T17:37:12Z

I think we should sort this out for 0.21. I propose

No change to 2.7 - 3.5; order is sorted by key
For 3.6+, we use the original order the user passed in
We don't allow dependent assignment yet
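
The three points above can be sketched as follows, again using a plain dict of columns as a stand-in for a DataFrame. `assign_independent` and its `version_info` parameter are hypothetical names for illustration, not the actual pandas implementation.

```python
import sys


def assign_independent(frame, version_info=sys.version_info, **kwargs):
    """Hypothetical sketch: version-dependent column order, no dependent assignment."""
    # Every expression still sees only the original frame.
    results = {
        name: (value(frame) if callable(value) else value)
        for name, value in kwargs.items()
    }

    # Python 3.6+ preserves keyword-argument order; older versions do not,
    # so fall back to sorting by key there.
    if tuple(version_info[:2]) >= (3, 6):
        names = list(results)
    else:
        names = sorted(results)

    out = dict(frame)
    for name in names:
        out[name] = results[name]
    return out


new_style = assign_independent({"a": [1]}, z=0, b=1)
old_style = assign_independent({"a": [1]}, version_info=(3, 5), z=0, b=1)
print(list(new_style))  # ['a', 'z', 'b']
print(list(old_style))  # ['a', 'b', 'z']
```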

Maybe in the future, when there's less 3.5 and lower code around, we can allow for dependent assignment.

jorisvandenbossche · 2017-09-22T07:23:48Z

+1 on your summary

jreback · 2017-09-22T13:45:38Z

sgtm on @TomAugspurger summary as well.

bobhaffner · 2017-09-22T15:01:41Z

Hi all, Tom said this was a good one for new contributors so I'm going after it.

jorisvandenbossche · 2017-09-25T06:49:55Z

Keep this open for the second part: allow dependent assignment somewhere in the future?

jreback · 2017-09-25T11:06:31Z

I would create a new separate issue if you really want to. That is a long time from now though.

jorisvandenbossche · 2017-09-25T12:08:46Z

The title and description in the top post is exactly this, so no need to create a new one I think, if we want an issue for it


TomAugspurger · 2017-12-15T21:26:16Z

If you didn't see, as of python 3.7 dictionaries are ordered, as part of the language https://mail.python.org/pipermail/python-dev/2017-December/151283.html

My objection to allowing dependent assignment now was that it'd (potentially) lead to subtle bugs / confusing errors between versions of python. But since Python itself is opening the door, let's slip through too :) I think we should allow dependent assignment in 0.22.

datajanko · 2017-12-17T21:46:22Z

Obviously, I am not good at looking through existing issues. Please accept my apologies.

As of #18797 I am also interested in letting assign accept dependent kwargs. I just tried the (in my opinion) naive/direct implementation (after adapting the test cases accordingly) and it works fine for callables, where references to the "dependent columns" are evaluated lazily. Something like df.assign(b=1, c=df['b']) will, of course, not work.

So if it's okay, that dependent assignments only work for certain callables, I'd be happy to issue a pull request in the next days.

TomAugspurger · 2017-12-18T11:49:16Z

if it's okay, that dependent assignments only work for certain callables

Yes, that sounds right. You'll need to use a callable to refer to newly created columns in the .assign.
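
A minimal sketch of why only callables can refer to newly created columns, using a plain dict of columns in place of a DataFrame. `assign_dependent` is a hypothetical name, and the loop below is one straightforward way to get this behaviour, not necessarily how pandas implements it.

```python
def assign_dependent(frame, **kwargs):
    """Hypothetical sketch: assign columns one by one, in keyword order."""
    out = dict(frame)
    for name, value in kwargs.items():
        # A callable is evaluated lazily against the frame built so far,
        # so it can see columns created earlier in the same call.
        out[name] = value(out) if callable(value) else value
    return out


frame = {"a": [1, 2, 3]}

# Works: the lambda runs only after 'b' has been added.
ok = assign_dependent(frame, b=[10, 20, 30], c=lambda f: [v * 2 for v in f["b"]])
print(ok["c"])  # [20, 40, 60]

# Fails: frame["b"] is evaluated eagerly, before assign_dependent is even called,
# and the original frame has no 'b' column.
try:
    assign_dependent(frame, b=[10, 20, 30], c=frame["b"])
except KeyError as exc:
    eager_error = exc
```

The eager case fails while Python is building the argument list, which is why no change inside the assign function could make it work.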

Specifically, 'df.assign(b=1, c=lambda x:x['b'])' does not throw an exception in python 3.6 and above. Further details are discussed in Issues pandas-dev#14207 and pandas-dev#18797.

datajanko · 2017-12-18T21:42:56Z

Before issuing a pull request, I obviously need to provide an entry in the what's new file. I assume "other API changes" is the right place, correct? Sorry for asking again, but this is my first coding contribution to an open source project.

chris-b1 · 2017-12-19T00:42:16Z

Yes, I would suggest there with an example. Always feel free to put up a work-in-progress PR for feedback.


Specifically, 'df.assign(b=1, c=lambda x:x['b'])' does not throw an exception in python 3.6 and above. Further details are discussed in Issues pandas-dev#14207 and pandas-dev#18797.

Populates dsintro and frame.py with examples and warning:
- adds example to frame.py
- reworked warning in dsintro
- reworked Notes in frame.py

Remains open: frame.py is probably responsible for Travis not passing (a doc test that requires Python 3.6).


TomAugspurger added the API Design label Sep 12, 2016

jreback added the Python 3.6 label Nov 17, 2016

TomAugspurger added this to the 0.21.0 milestone Sep 21, 2017

TomAugspurger added Difficulty Novice labels Sep 22, 2017

bobhaffner mentioned this issue Sep 22, 2017

preserve kwargs order on assign func for py36plus - #14207 #17632

Merged

4 tasks

jreback closed this as completed in #17632 Sep 24, 2017

jreback pushed a commit that referenced this issue Sep 24, 2017

preserve kwargs order on assign func for py36plus - #14207 (#17632)

965c1c8

jorisvandenbossche modified the milestones: 0.21.0, Someday Sep 25, 2017

jorisvandenbossche reopened this Sep 25, 2017

TomAugspurger added the good first issue label Oct 11, 2017

alanbato pushed a commit to alanbato/pandas that referenced this issue Nov 10, 2017

preserve kwargs order on assign func for py36plus - pandas-dev#14207 (p…

d93fc5f

…andas-dev#17632)

No-Stream pushed a commit to No-Stream/pandas that referenced this issue Nov 28, 2017

preserve kwargs order on assign func for py36plus - pandas-dev#14207 (p…

df11d07

…andas-dev#17632)

jreback removed the Difficulty Novice label Dec 15, 2017

chris-b1 mentioned this issue Dec 15, 2017

Suggestion: Make assign accepts list of dictionaries #18797

Closed

datajanko mentioned this issue Dec 19, 2017

ENH: df.assign accepting dependent **kwargs (#14207) #18852

Merged

3 tasks

jreback modified the milestones: Someday, 0.23.0 Dec 23, 2017

jreback mentioned this issue Dec 31, 2017

ENH: Let Series/DataFrame initialisation from dicts use insertion order when python>=3.6 #19018

Closed

jreback closed this as completed in #18852 Feb 10, 2018

jreback pushed a commit that referenced this issue Feb 10, 2018

ENH: df.assign accepting dependent **kwargs (#14207) (#18852)

bae38fc

shoyer mentioned this issue Feb 10, 2018

Update assign to preserve order for **kwargs pydata/xarray#1901

Open

harisbal pushed a commit to harisbal/pandas that referenced this issue Feb 28, 2018

ENH: df.assign accepting dependent **kwargs (pandas-dev#14207) (panda…

b98e595

…s-dev#18852)
