feat(python): add to_repr methods to DataFrame and Series #7802

Merged
merged 1 commit into from
Mar 27, 2023

Conversation

ghuls
Collaborator

@ghuls ghuls commented Mar 26, 2023

No description provided.

@github-actions github-actions bot added enhancement New feature or an improvement of an existing feature python Related to Python Polars labels Mar 26, 2023
@ghuls
Collaborator Author

ghuls commented Mar 26, 2023

Closes: #7732

@alexander-beedie I also added the missing pl.from_repr documentation link.

@alexander-beedie
Collaborator

alexander-beedie commented Mar 27, 2023

I'm trying to come up with a different name for to_repr, as it's not the actual repr.

Something like df.to_init_repr() or pl.to_init_repr(df) instead?

@ritchie46
Member

ritchie46 commented Mar 27, 2023

> I'm trying to come up with a different name for "to_repr", as it's not the actual repr.

Yes, I agree. I think to_init_repr is better. 👍

Really cool feature @ghuls.

@ghuls
Collaborator Author

ghuls commented Mar 27, 2023

> I'm trying to come up with a different name for to_repr, as it's not the actual repr.
>
> Something like df.to_init_repr() or pl.to_init_repr(df) instead?

I am fine with changing the name, but can you explain to me why it is not the actual repr?

As far as I know __repr__ functions should construct a string that can reinstate the object.

import numpy as np

In [9]: a = np.array([[48984, 4894], [4568, 48968], [468135, 49849]])

In [10]: a.__repr__()
Out[10]: 'array([[ 48984,   4894],\n       [  4568,  48968],\n       [468135,  49849]])'

In [11]: print(a.__repr__())
array([[ 48984,   4894],
       [  4568,  48968],
       [468135,  49849]])

In [13]: from numpy import array

In [14]: eval(a.__repr__())
Out[14]: 
array([[ 48984,   4894],
       [  4568,  48968],
       [468135,  49849]])

Do you plan to find another name for pl.from_repr too?

@alexander-beedie
Collaborator

alexander-beedie commented Mar 27, 2023

> I am fine with changing the name, but can you explain to me why it is not the actual repr?

The short version would be: if it's not what you get back from calling repr(obj), then it's not the object's repr.

> As far as I know __repr__ functions should construct a string that can reinstate the object.

The result from an object's repr is defined specifically as "a string containing a printable representation of an object" [1]. So, something useful/informative that can reasonably be returned in an interactive console, notebook, logs, etc.

Now, for many types it is also a good idea to make the repr usable as if you could eval/init it (and on any smaller-scale custom types I develop at work I make the effort to do so); also from the python docs: "for many types, this function makes an attempt to return a string that would yield an object with the same value when passed to eval()" (but note the use of "many", not "all", and "makes an attempt").
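As a minimal sketch of the eval-friendly style described above: a small value type can return a repr that mirrors its constructor call. `Point` is a hypothetical type used only for illustration; it is not part of Polars.

```python
class Point:
    """Hypothetical small value type; not part of Polars."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        # Mirror the constructor call so the string is valid Python.
        return f"Point(x={self.x!r}, y={self.y!r})"

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)


p = Point(1, 2)
text = repr(p)
print(text)

# eval() rebuilds an equal object, provided the name `Point` is
# available in the namespace that eval() runs in.
restored = eval(text, {"Point": Point})
print(restored == p)
```

This works well here because the object is tiny; the whole state fits in one short line.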

Objects like DataFrames (or anything else that fronts large data) don't usually return eval-style reprs because anything other than a trivial frame would return an absurdly large string, rendering it unprintable and/or less useful as the "official" representation of the given object/data. Hence Polars doesn't, Pandas doesn't, PyArrow doesn't, DB interfaces don't, etc. The general trend in these cases is to have a partial table representation that provides a useful sense of the object (its size, the schema, some data, and so on).
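To make the size argument concrete, here is a rough sketch using a hypothetical `TinyFrame` class (an assumption for illustration, not Polars code). Its `__repr__` returns a short summary, while a separate `to_init_repr()` method returns a constructor-style string containing every value.

```python
class TinyFrame:
    """Hypothetical stand-in for a column-oriented table; not Polars code."""

    def __init__(self, columns):
        # columns: dict mapping column name -> list of values
        self.columns = {name: list(values) for name, values in columns.items()}

    def __repr__(self):
        # Summary repr: shape plus the first few values of each column.
        height = max((len(v) for v in self.columns.values()), default=0)
        lines = [f"<TinyFrame shape=({height}, {len(self.columns)})>"]
        for name, values in self.columns.items():
            head = ", ".join(repr(v) for v in values[:3])
            suffix = ", ..." if len(values) > 3 else ""
            lines.append(f"  {name}: [{head}{suffix}]")
        return "\n".join(lines)

    def to_init_repr(self):
        # Constructor-style string: holds every value, so it grows with the data.
        return f"TinyFrame({self.columns!r})"


frame = TinyFrame({"a": list(range(10_000)), "b": [0.5] * 10_000})

summary = repr(frame)
init_text = frame.to_init_repr()

print(summary)
print(len(summary), len(init_text))

# The constructor-style string round-trips through eval(); the summary does not.
rebuilt = eval(init_text, {"TinyFrame": TinyFrame})
print(rebuilt.columns == frame.columns)
```

The summary stays a few lines long regardless of how many rows the frame holds, while the constructor-style string scales with the number of values, which is why the two are kept as separate methods in this sketch.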

> Do you plan to find another name for pl.from_repr too?

No - because from_repr does parse the DataFrame repr, as advertised ;)
(And should probably handle the Series repr too, in the near future 💭)

Footnotes

  1. https://docs.python.org/3/library/functions.html#repr

@ghuls
Collaborator Author

ghuls commented Mar 27, 2023

> The result from an object's repr is defined specifically as "a string containing a printable representation of an object" [1]. So, something useful/informative that can reasonably be returned in an interactive console, notebook, logs, etc.

I thought that __str__ was meant for that (although for a lot of objects __str__ and __repr__ are the same).
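For reference, a short standard-library sketch of how the two can differ, and of why they often coincide: when a class defines only `__repr__`, `str()` falls back to it. `OnlyRepr` is a hypothetical class used for illustration.

```python
import datetime

d = datetime.date(2023, 3, 27)

# str() gives the readable form; repr() gives the constructor-style form.
print(str(d))   # 2023-03-27
print(repr(d))  # datetime.date(2023, 3, 27)


class OnlyRepr:
    """Hypothetical class that defines __repr__ but not __str__."""

    def __repr__(self):
        return "OnlyRepr()"


# Without a custom __str__, str() falls back to __repr__.
obj = OnlyRepr()
print(str(obj) == repr(obj))  # True
```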

Changed name to to_init_repr.


3 participants