Refactor format_array in pandas\io\formats\format.py #26837
Comments
as per #26833 (comment) IIUC, before EAs, but we now have an ExtensionArrayFormatter that includes logic to convert to a numpy array
whereas before, _formatting_values was used and conversion to a numpy array looked like
even though _formatting_values is deprecated:
it is still used, but does not necessarily return a numpy array
so when creating a repr_html of a DataFrame which uses _format_col, we pass a
It seems a bit convoluted. I'm not sure why a numpy array is not returned from _formatting_values. I think the issues are related, since the discussions revolve around numpy arrays from extension arrays, and formatting probably works best if it is done on a numpy array with an object dtype. Performance is probably not an issue for formatting. |
I think you are confusing two different And it is the Series._formatting_values that can potentially return an EA, as that is what ExtensionBlock.formatting_values decides to return (this is where the
It's a bit of indirection, but that is how the EAs are currently displayed: the values themselves are converted to a numpy array (the |
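The indirection described in this comment can be sketched roughly as follows. This is a simplified stand-in and not the pandas implementation: `ToyArray` and `format_values` are hypothetical names, and a plain list stands in for the object-dtype numpy array. Only the `_formatter(boxed)` hook is modelled on a method that exists on `ExtensionArray`.

```python
from typing import Any, Callable, List


class ToyArray:
    """Hypothetical stand-in for an ExtensionArray holding values with a unit."""

    def __init__(self, values: List[Any], unit: str) -> None:
        self._values = values
        self._unit = unit

    def __iter__(self):
        return iter(self._values)

    def _formatter(self, boxed: bool = False) -> Callable[[Any], str]:
        # Modelled on the ExtensionArray._formatter hook: return a function
        # that formats one scalar at a time.
        return lambda x: "NaN" if x is None else f"{x} {self._unit}"


def format_values(array: ToyArray) -> List[str]:
    # Step 1: materialise the values as plain Python objects
    # (a list here; an object-dtype numpy array in the discussion above).
    objects = list(array)
    # Step 2: apply the scalar formatter element by element.
    formatter = array._formatter(boxed=True)
    return [formatter(x) for x in objects]


print(format_values(ToyArray([1.5, None, 3], "m")))
```

Note that the formatter only ever sees one scalar, which is the limitation raised later in the thread.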
I think the general discussion items are:
|
i guess. is #24858 related? |
but also pass through ExtensionArrayFormatter. Is this generic enough for EA developers? |
What do you mean exactly?
Well, that was an issue prompted by the deprecated |
that would presumably be less performant. on the other hand, shouldn't EA developers have a mechanism to be able to achieve the same performance? |
Yes, and additionally, EAs might also want to use the full array context to format their values. A possible conclusion from this discussion might be that it was maybe the wrong direction we took when deprecating @TomAugspurger do you remember what the motivation was to move from the formatting_values to the scalar formatter? |
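To illustrate what "using the full array context" could mean, here is a minimal sketch contrasting a per-scalar formatter with an array-level one. Both function names are hypothetical and neither is pandas API; the array-level version picks a single precision and width for the whole column, which a scalar formatter cannot do.

```python
from typing import List, Optional


def format_scalar(x: Optional[float]) -> str:
    # Scalar formatter: sees one value and knows nothing about its neighbours.
    return "NaN" if x is None else repr(x)


def format_with_context(values: List[Optional[float]]) -> List[str]:
    # Array-level formatter: choose one number of decimals for the column,
    # based on the value that needs the most, then right-justify.
    present = [v for v in values if v is not None]
    decimals = max((len(repr(v).partition(".")[2]) for v in present), default=0)
    strs = ["NaN" if v is None else f"{v:.{decimals}f}" for v in values]
    width = max((len(s) for s in strs), default=0)
    return [s.rjust(width) for s in strs]


column = [1.5, 2.25, None, 10.0]
print([format_scalar(v) for v in column])
print(format_with_context(column))
```

The trade-off discussed in the thread applies here: the array-level version needs all values up front, whereas the scalar version can be applied lazily.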
ExtensionArrayFormatter has special casing for internal EAs. That's why this issue was opened. The title of this issue may be misleading now, since the PR in which the contents of ExtensionArrayFormatter were moved to format_array has been reverted.
That may be a separate issue, but could it solve the special casing? |
Not sure, as I think any refactor of |
The issue is raised directly from #26833 (comment), where the PR was a precursor to allowing a custom formatter to override the defaults (#26000). The EA formatters are 'incomplete' as custom formatters, as they don't account for other formatting options. I have now updated #26000 to allow EA formatters and custom formatters specified in to_string, to_latex and to_html to share the same format_array machinery. |
Not especially, though if I had to guess it was so that |
As requested, moving this comment here from #26833, discussing EA formatters and particularly the deprecation of
It's easy enough to eliminate the units from the display, but that information is kind of helpful; it's just repeated unnecessarily. As a suggestion, I'd like to see some control over the column headers handed to the EA,
or the like, which is more readable and compact. With some config option or other, we could include some df.info() stuff in the repr. |
Again, as also explained to @simonjayhawkins above, this is mixing up two different
So as far as I know, also the Series repr fully follows the
Those are certainly interesting ideas, but a much bigger discussion I think, outside the scope of this issue. |
Now I see what happened. When I was playing with this a few days ago, I was using a commit between |
So In other words, please go away :)

I've got a small patch to the frame repr code (with a config option) which adds a row of dtypes to the column header. It's pretty small. I implemented it using a small concept which might be relevant here. In Qt there's a concept called a "Role". When you query some object for its value, you pass it a "role" argument (a constant from a list) indicating in what context the value is being asked for. So you have You've already started something like that with For example, In the patch I just mentioned, which includes the dtype in the frame repr's column header, I modified the repr code to call a (new)

    if isinstance(col.dtype, ExtensionDtype):
        try:
            dtype_name = col.dtype.to_string(role="frame.repr.column_header")
        except Exception:
            # fall back to the plain dtype name if the role-aware call fails
            dtype_name = str(col.dtype)

With this approach, you can have very granular control over rendering. |
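For readers unfamiliar with the "role" idea, a self-contained sketch might look like the following. Everything here is hypothetical: `UnitDtype`, `dtype_label`, and the role name "series.repr.footer" are made up for illustration, and `to_string(role=...)` is the commenter's private addition, not a pandas method.

```python
class UnitDtype:
    """Hypothetical dtype; `to_string` and the role names are illustrative only."""

    def __init__(self, unit: str) -> None:
        self.unit = unit

    def __str__(self) -> str:
        return f"unit[{self.unit}]"

    def to_string(self, role: str = "default") -> str:
        # The role names the context in which the string is requested.
        if role == "frame.repr.column_header":
            return self.unit  # compact form for a header row
        return str(self)  # same answer for any role that is not special-cased


def dtype_label(dtype: object, role: str) -> str:
    # Fall back to str() for dtypes that do not implement the hook.
    to_string = getattr(dtype, "to_string", None)
    if to_string is None:
        return str(dtype)
    return to_string(role=role)


print(dtype_label(UnitDtype("m"), "frame.repr.column_header"))
print(dtype_label(UnitDtype("m"), "series.repr.footer"))
```

Checking for the method with `getattr` instead of wrapping the call in `try`/`except` avoids hiding unrelated errors raised inside `to_string`.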
Ah, that was unfortunate :-) But that is also the reason it was reverted.
OK, but that's something more limited than providing the ability to also customize that from the EA. Adding dtype information is a general thing, so if we want to look at this, we should start by looking at the option to show the dtype in the dataframe repr (independent of EA or not). Personally, I find that worth looking into. E.g., R tibbles also show their dtypes. If you want to do that, let's open a new issue for it. |
yes it is. But I mentioned it mostly to introduce the possibility of adding a "role" argument to |
But the |
But in general, I think the role suggestion is certainly more robust than the current |
I used the same concept to add "to_string(role)" method to dtype, for my own use, it's not a PR. In most cases you'd return the same thing, but you would have the control to have each one be different. |
I ran into this today. I'm trying to write an ExtensionArray for a somewhat complicated object (each item is an xarray.DataArray that's a satellite image). xarray loads data lazily, but the call to pandas/pandas/io/formats/format.py Line 1649 in 2d3644c
So for this use case, I think we want the ExtensionArray to handle as much as possible of what

    class ExtensionArray:
        def _format_array(self, formatter, float_format, na_rep, ...) -> List[str]

That would take some / all the kwargs from Finally, we would update |
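A rough sketch of how such a hook could avoid loading lazy data is shown below. This assumes the proposal above and is not how pandas behaves: `LazyImages`, `to_objects`, and this toy `format_array` are hypothetical, and `_format_array` is the proposed method, not an existing one.

```python
from typing import List, Sequence


class LazyImages:
    """Hypothetical array of lazily loaded items (stand-in for an ExtensionArray)."""

    def __init__(self, names: Sequence[str]) -> None:
        self._names = list(names)
        self.loads = 0  # counts how many items were materialised

    def to_objects(self) -> List[str]:
        # Expensive path: would force every lazy item to load.
        self.loads += len(self._names)
        return [f"<loaded {n}>" for n in self._names]

    def _format_array(self, formatter=None, na_rep: str = "NaN") -> List[str]:
        # Cheap path: describe each item without loading it.
        return [f"<image {n}>" for n in self._names]


def format_array(values, formatter=None, na_rep: str = "NaN") -> List[str]:
    # Prefer the array's own hook when it defines one.
    hook = getattr(values, "_format_array", None)
    if hook is not None:
        return hook(formatter=formatter, na_rep=na_rep)
    # Otherwise materialise the objects and format them one by one.
    fmt = formatter or str
    objects = values.to_objects() if hasattr(values, "to_objects") else list(values)
    return [na_rep if x is None else fmt(x) for x in objects]


arr = LazyImages(["a.tif", "b.tif"])
print(format_array(arr), arr.loads)
print(format_array([1, None]))
```

With the hook in place the `loads` counter stays at zero, which is the property the lazy-loading use case needs.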
see #26833 (comment)
To make format_array more generic, we need to push more functionality to the objects being formatted themselves (and PandasArray), rather than encoding the conversions and special casing in format_array.