Rename entrypoint to __consortium_api__? #323
Comments
Slightly dreading starting the conversation though, and the downside is that the minimum pandas version supported by the standard would have to rise to 2.2.

An alternative could be that in

    from typing import Any
    from dataframe_api_compat import dataframe_api

    @dataframe_api(api_version='2023.11-beta')
    def my_dataframe_agnostic_function(df: DataFrame) -> Any:
        for column_name in df.column_names:
            new_column = df.col(column_name)
            new_column = (new_column - new_column.mean()) / new_column.std()
            df = df.assign(new_column.rename(f'{column_name}_scaled'))
        return df.dataframe

Then we don't need to bother pandas, and this looks pretty clean anyway.
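For readers wondering how such a decorator might work, here is a minimal sketch. It assumes the decorator simply calls the entrypoint on the first argument before handing it to the wrapped function. `ToyNativeFrame` and `ToyStandardFrame` are hypothetical stand-ins so the example runs on its own, and the decorator body is an illustration of the idea, not the actual `dataframe-api-compat` implementation.

```python
import functools
from typing import Any, Callable


class ToyStandardFrame:
    """Hypothetical stand-in for a standard-compliant dataframe wrapper."""

    def __init__(self, native: Any, api_version: str) -> None:
        self.dataframe = native  # mirrors the `.dataframe` attribute used in the comment
        self.api_version = api_version


class ToyNativeFrame:
    """Hypothetical stand-in for a library dataframe such as a pandas one."""

    def __init__(self, data: dict) -> None:
        self.data = data

    def __dataframe_consortium_standard__(self, *, api_version: str) -> ToyStandardFrame:
        return ToyStandardFrame(self, api_version)


def dataframe_api(api_version: str) -> Callable:
    """Sketch (assumption): convert the first argument via the entrypoint."""

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)
        def wrapper(df: Any, *args: Any, **kwargs: Any) -> Any:
            compliant = df.__dataframe_consortium_standard__(api_version=api_version)
            return func(compliant, *args, **kwargs)

        return wrapper

    return decorator


@dataframe_api(api_version="2023.11-beta")
def unwrap(df: ToyStandardFrame) -> Any:
    # The function body only ever sees the standard-compliant object.
    assert df.api_version == "2023.11-beta"
    return df.dataframe


native = ToyNativeFrame({"a": [1, 2, 3]})
result = unwrap(native)  # the caller passes and receives the native object
```

With this shape, the caller never touches the entrypoint name directly, which is presumably why a rename would matter less to end users.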
Folks may not want to take on the I have no objections to the name change other than it may be a bit confusing when working across arrays, dataframes, and other future types that may have efforts to standardize APIs. We should probably also have our spec include this dunder method as part of the
It's already mentioned here: dataframe-api/spec/purpose_and_scope.md Lines 261 to 276 in 7be00b6
I don't think If you have a
If I get an arbitrary dataframe as input and I want to confirm it's standard-compliant, how do I do that today? In my mind the easiest way would be to have standard-compliant classes implement
there's
That returns the namespace and not a compliant dataframe object. So the code would end up looking like:

    def get_compliant_dataframe(df):
        if hasattr(df, "__dataframe_namespace__"):
            return df
        else:
            return df.__dataframe_consortium_standard__(...)

It feels a bit clunky but I guess it's not too bad?
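A self-contained version of that helper might look like the sketch below. The two `Toy*` classes are hypothetical stand-ins for a compliant and a non-compliant dataframe, and the `api_version` parameter and the `TypeError` branch are additions for illustration, not something the thread settled on.

```python
from typing import Any, Optional


class ToyCompliantFrame:
    """Hypothetical standard-compliant dataframe."""

    def __init__(self, native: Any) -> None:
        self.dataframe = native

    def __dataframe_namespace__(self) -> Any:
        # A real implementation would return the library's compliant namespace.
        return object()


class ToyNativeFrame:
    """Hypothetical library dataframe exposing only the entrypoint."""

    def __dataframe_consortium_standard__(
        self, *, api_version: Optional[str] = None
    ) -> ToyCompliantFrame:
        return ToyCompliantFrame(self)


def get_compliant_dataframe(df: Any, api_version: Optional[str] = None) -> Any:
    """Return `df` unchanged if it already looks compliant, else convert it."""
    if hasattr(df, "__dataframe_namespace__"):
        return df
    if hasattr(df, "__dataframe_consortium_standard__"):
        return df.__dataframe_consortium_standard__(api_version=api_version)
    raise TypeError(f"{type(df).__name__} does not expose the standard entrypoint")


native = ToyNativeFrame()
compliant = get_compliant_dataframe(native)
same = get_compliant_dataframe(compliant)  # already compliant, returned as-is
```

Checking `__dataframe_namespace__` first keeps the helper idempotent, so calling it twice on the same object does no extra wrapping.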
Yeah, and as Ralf said, in the end people will probably just write their own helper functions. Might as well close then, this isn't too bad.
If #308 goes in, then the return value of `Column.get_value` will change. It will no longer be a Python scalar, but a `Scalar`.
This means I'll have to update the tests in pandas/Polars:
https://github.com/pandas-dev/pandas/blob/f777e67d2b29cda5b835d30c855b633269f5e8e8/pandas/tests/test_downstream.py#L340-L344
I'll change it to something much simpler that realistically will never break, like asserting something about `result.name`.
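As a rough illustration of what such a smoke test could look like, the sketch below assumes `result` is a column fetched from the compliant dataframe. All `Toy*` classes are hypothetical stand-ins so the example runs without pandas or Polars. It is not the actual upstream test.

```python
from typing import Optional


class ToyColumn:
    """Hypothetical stand-in for a standard-compliant column."""

    def __init__(self, name: str, values: list) -> None:
        self.name = name
        self.values = values


class ToyStandardFrame:
    """Hypothetical stand-in for a standard-compliant dataframe."""

    def __init__(self, data: dict) -> None:
        self._data = data

    def col(self, name: str) -> ToyColumn:
        return ToyColumn(name, self._data[name])


class ToyNativeFrame:
    """Hypothetical stand-in for a pandas or Polars dataframe."""

    def __init__(self, data: dict) -> None:
        self._data = data

    def __dataframe_consortium_standard__(
        self, *, api_version: Optional[str] = None
    ) -> ToyStandardFrame:
        return ToyStandardFrame(self._data)


def test_entrypoint_smoke() -> None:
    # Only checks that the entrypoint works and the column name is preserved.
    # It deliberately avoids inspecting scalar return types, which may change.
    df = ToyNativeFrame({"a": [1, 2, 3]}).__dataframe_consortium_standard__()
    result = df.col("a")
    assert result.name == "a"


test_entrypoint_smoke()
```

A test of this shape depends only on the entrypoint and a name attribute, so a change in what `Column.get_value` returns would not break it.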
If I'm going to have to change things upstream, I'd like to take the chance to rename the entrypoint. `__dataframe_consortium_standard__` is just...long. Originally we'd suggested `__dataframe_standard__`, but Brock correctly pointed out that this has normative connotations.
We're starting to get positive responses (see koaning/scikit-lego#597, skrub-data/skrub#786), so the time to make changes is running out.
My hope is that this would then be the last upstream update we need. The rest, we can handle here / in `dataframe-api-compat`.