
fix(python): Fix max_colname_length formatting in glimpse() #13969

Merged (1 commit) on Jan 25, 2024


jacksonthall22 (Contributor) commented Jan 24, 2024

It seems there was an arbitrary limit of 100 characters on the combined length of the column name and dtype strings in glimpse(), with the values on each line getting whatever width remains. If you have long column names and pass a large max_colname_length, you get a formatting error because the width specifier for the last item on the line (the values string) becomes negative.

Specifically, for a DataFrame with a single column named a, df.glimpse(max_colname_length=b) raises a ValueError whenever min(b, len(a)) + len(f"<{_dtype_str_repr(df[a].dtype)}>") > 100. With the 100-character name and <i64> dtype below, that means any b >= 96.

This PR simply removes the width specifier on the last item, which was redundant anyway: a width never truncates a string, and because it was capped at len(val_str), it never added padding either.
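The redundancy argument can be checked with plain Python string formatting, no Polars needed. In this minimal sketch, fmt_value is a hypothetical helper that mimics the old last item of each line, and "1, 2, 3" stands in for a column's values string.

```python
def fmt_value(val_str: str, max_col_values: int) -> str:
    """Hypothetical stand-in for the old last item: width capped at len(val_str)."""
    return f"{val_str:<{min(len(val_str), max_col_values)}}"


val_str = "1, 2, 3"

# A width only pads, it never truncates; capped at len(val_str) it adds nothing.
print(repr(fmt_value(val_str, 20)))  # '1, 2, 3'
print(repr(fmt_value(val_str, 5)))   # '1, 2, 3'

# A negative budget yields the spec "<-1", and "-" is parsed as a sign.
try:
    fmt_value(val_str, -1)
except ValueError as exc:
    print(exc)  # Sign not allowed in string format specifier

# The fix: drop the width. Same text, and nothing left to go negative.
print(repr(f"{val_str}"))  # '1, 2, 3'
```

Since the values string is also the last thing on the line, even trailing padding would have been invisible, so removing the specifier cannot change any output that previously rendered.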

```python
import polars as pl


def test(max_colname_length):
    df = pl.DataFrame({'a' * 100: [1, 2, 3]})
    df.glimpse(max_colname_length=max_colname_length)


test(95)
```
Rows: 3
Columns: 1
$ aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa… <i64> 1, 2, 3
```python
test(96)
```
```
ValueError                                Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_30960\2962826816.py in ?()
      1 test = pl.DataFrame({'a' * 100: [1, 2, 3]})
----> 2 test.glimpse(max_colname_length=96)

c:\Users\user\miniforge3\envs\env\Lib\site-packages\polars\dataframe\frame.py in ?(self, max_items_per_column, max_colname_length, return_as_string)
   4338 
   4339         # print individual columns: one row per column
   4340         for col_name, dtype_str, val_str in data:
   4341             output.write(
-> 4342                 f"$ {col_name:<{max_col_name}}"
   4343                 f" {dtype_str:>{max_col_dtype}}"
   4344                 f" {val_str:<{min(len(val_str), max_col_values)}}\n"
   4345             )

ValueError: Sign not allowed in string format specifier
```
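To see why the threshold in the reproduction is 96, here is a hypothetical, simplified re-creation of the per-column line from the traceback. The helper name glimpse_line and its fixed flag are illustrative, not Polars API. It assumes, based on the 100-character limit described above, that max_col_values is 100 minus the name and dtype widths, and that long names are cut to max_colname_length characters ending in "…" as in the test(95) output. It also handles only one column, whereas glimpse() uses the widest name and dtype across all columns.

```python
def glimpse_line(
    col_name: str,
    dtype_str: str,
    val_str: str,
    max_colname_length: int,
    fixed: bool = True,
) -> str:
    """Hypothetical single-column re-creation of one glimpse() output line."""
    # Assumed truncation rule: cut to max_colname_length chars, ending in "…".
    if len(col_name) > max_colname_length:
        col_name = col_name[: max_colname_length - 1] + "…"

    # With one column, the widest name/dtype are just this column's.
    max_col_name = len(col_name)
    max_col_dtype = len(dtype_str)
    # Assumed 100-character budget; negative once name + dtype exceed 100.
    max_col_values = 100 - max_col_name - max_col_dtype

    line = f"$ {col_name:<{max_col_name}}" f" {dtype_str:>{max_col_dtype}}"
    if fixed:
        # After the PR: no width specifier on the last item.
        return line + f" {val_str}"
    # Before the PR: a negative width turns the spec into e.g. "<-1".
    return line + f" {val_str:<{min(len(val_str), max_col_values)}}"


name = "a" * 100
print(glimpse_line(name, "<i64>", "1, 2, 3", 95, fixed=False))  # renders
for b in (96, 200):
    try:
        glimpse_line(name, "<i64>", "1, 2, 3", b, fixed=False)
    except ValueError as exc:
        print(b, "->", exc)  # Sign not allowed in string format specifier
    print(glimpse_line(name, "<i64>", "1, 2, 3", b))  # fixed version renders
```

Under these assumptions, b = 95 leaves a values width of exactly 0 (95 + 5 = 100), while b = 96 gives -1. The b = 200 case shows that a max_colname_length longer than the name itself also fails, since the untruncated 100-character name plus the 5-character dtype already exceeds the budget.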

stinodego changed the title from "Fix max_colname_length formatting in glimpse()" to "fix(python): Fix max_colname_length formatting in glimpse()" on Jan 25, 2024
github-actions bot added the labels fix (Bug fix) and python (Related to Python Polars) on Jan 25, 2024
stinodego (Member) previously approved these changes on Jan 25, 2024 and left a comment:


I had to look into this a bit to fully understand it, but it is indeed a good fix! Now we properly respect the max_items_per_column / max_colname_length parameters.

The test failure is unrelated.

stinodego dismissed their stale review on Jan 25, 2024 at 08:24

Let me add a test first

stinodego merged commit da867b7 into pola-rs:main on Jan 25, 2024
13 of 14 checks passed