
feat(python,rust): raise default frame/series repr height from 8 to 10 #13699

Merged


alexander-beedie
Collaborator

alexander-beedie commented Jan 13, 2024

Very minor change, but I've received a few comments about this recently and I tend to agree with them. It is simply a small adjustment to favour the common case: being intrinsically more interested in a top 10 than a top 8.

Let's see what everyone thinks ;)

The change

  • Make the default number of visible rows in a frame/series repr 10 instead of 8.
  • That's it. I wasn't kidding when I said it was minor.

Why?

Wanting to look at a top/bottom 5 or a top 10 feels much more common than wanting to look at a top 4 or a top 8. Maybe this is down to humans having 5 fingers per hand, the decimal system, or something else, but there aren't a whole lot of "top 8" lists out there compared to "top 10" lists. The world seems used to thinking about result sets of this size, and there is little harm in going with the flow here.

I present the following incredibly rigorous scientific evidence in the form of "number of search results returned from Google" for the quoted search terms "top 8" vs "top 10" 😉

"top 10": 1,660,000,000
"top 8":     82,200,000

Note that "top 5" also significantly outpaces both "top 8" and "top 4". So when we do truncate, presenting the top/bottom 5 (which is what truncation would leave us with if the default total changes to 10 rows) is likely preferable to the top/bottom 4, particularly during interactive exploration:

"top 5": 633,000,000 
"top 4":  81,100,000

(Chart showing number of Google search results for "top n" search terms; note the huge spikes at 5 and 10).
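To put the quoted counts side by side, here is a throwaway calculation of the ratios. It uses only the numbers quoted above (nothing is re-measured), and the variable names are just for illustration.

```python
# Google result counts exactly as quoted above (not re-measured).
counts = {
    "top 10": 1_660_000_000,
    "top 8": 82_200_000,
    "top 5": 633_000_000,
    "top 4": 81_100_000,
}

# How many times more results the "round" number gets.
ratio_10_vs_8 = counts["top 10"] / counts["top 8"]
ratio_5_vs_4 = counts["top 5"] / counts["top 4"]

print(f"top 10 vs top 8: {ratio_10_vs_8:.1f}x")
print(f"top 5 vs top 4: {ratio_5_vs_4:.1f}x")
```

That works out to roughly 20x more results for "top 10" than "top 8", and roughly 8x more for "top 5" than "top 4".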

How this manifests

With the current default of 8 rows, something like df.head(10) won't show you all of the results; for example:

  • Top 10 Netflix (English) films this week

    Current (can't actually see the whole top 10)

    shape: (10, 4)
    ┌───────────────────────────────────┬──────────────┬────────────┬─────────┐
    │ film                              ┆ hours_viewed ┆ views      ┆ runtime │
    │ ---                               ┆ ---          ┆ ---        ┆ ---     │
    │ str                               ┆ str          ┆ str        ┆ str     │
    ╞═══════════════════════════════════╪══════════════╪════════════╪═════════╡
    │ The Equalizer 3                   ┆ 26,800,000   ┆ 14,800,000 ┆ 01:49   │
    │ Rebel Moon — Part One: A Child o… ┆ 25,100,000   ┆ 11,100,000 ┆ 02:16   │
    │ Leave the World Behind            ┆ 18,700,000   ┆ 7,900,000  ┆ 02:22   │
    │ Exodus: Gods and Kings            ┆ 18,600,000   ┆ 7,400,000  ┆ 02:30   │
    │ …                                 ┆ …            ┆ …          ┆ …       │
    │ Leo                               ┆ 9,800,000    ┆ 5,500,000  ┆ 01:47   │
    │ Those Who Wish Me Dead            ┆ 8,600,000    ┆ 5,200,000  ┆ 01:40   │
    │ Bitconned                         ┆ 7,700,000    ┆ 4,900,000  ┆ 01:34   │
    │ Chicken Run: Dawn of the Nugget   ┆ 8,000,000    ┆ 4,700,000  ┆ 01:42   │
    └───────────────────────────────────┴──────────────┴────────────┴─────────┘
    

    Proposed (top 10 visible without truncation)

    shape: (10, 4)
    ┌───────────────────────────────────┬──────────────┬────────────┬─────────┐
    │ film                              ┆ hours_viewed ┆ views      ┆ runtime │
    │ ---                               ┆ ---          ┆ ---        ┆ ---     │
    │ str                               ┆ str          ┆ str        ┆ str     │
    ╞═══════════════════════════════════╪══════════════╪════════════╪═════════╡
    │ The Equalizer 3                   ┆ 26,800,000   ┆ 14,800,000 ┆ 01:49   │
    │ Rebel Moon — Part One: A Child o… ┆ 25,100,000   ┆ 11,100,000 ┆ 02:16   │
    │ Leave the World Behind            ┆ 18,700,000   ┆ 7,900,000  ┆ 02:22   │
    │ Exodus: Gods and Kings            ┆ 18,600,000   ┆ 7,400,000  ┆ 02:30   │
    │ Aquaman                           ┆ 16,800,000   ┆ 7,000,000  ┆ 02:23   │
    │ The Super Mario Bros. Movie       ┆ 8,700,000    ┆ 5,700,000  ┆ 01:32   │
    │ Leo                               ┆ 9,800,000    ┆ 5,500,000  ┆ 01:47   │
    │ Those Who Wish Me Dead            ┆ 8,600,000    ┆ 5,200,000  ┆ 01:40   │
    │ Bitconned                         ┆ 7,700,000    ┆ 4,900,000  ┆ 01:34   │
    │ Chicken Run: Dawn of the Nugget   ┆ 8,000,000    ┆ 4,700,000  ┆ 01:42   │
    └───────────────────────────────────┴──────────────┴────────────┴─────────┘
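A rough sketch of why the current output hides part of the top 10, assuming (as the two tables suggest) that the visible rows are split evenly between the top and bottom of the frame once it is taller than the limit. `truncated_view` is a hypothetical helper for illustration, not a Polars API, and it works on row positions rather than real frame data.

```python
def truncated_view(rows, max_rows):
    """Split rows into (head, tail, hidden) for a head/tail preview.

    Hypothetical helper: assumes the visible rows are shared equally
    between the top and bottom once the frame is taller than max_rows.
    """
    if len(rows) <= max_rows:
        # Everything fits: no ellipsis row is needed.
        return list(rows), [], []
    half = max_rows // 2
    head = list(rows[:half])
    tail = list(rows[len(rows) - half:])
    hidden = list(rows[half:len(rows) - half])
    return head, tail, hidden


ranks = list(range(1, 11))  # positions in a top 10, e.g. from df.head(10)

# Old default (8): positions 5 and 6 disappear behind the "…" row.
print(truncated_view(ranks, max_rows=8))
# New default (10): the whole top 10 is visible.
print(truncated_view(ranks, max_rows=10))
# Taller frame with the new default: top 5 and bottom 5 are shown.
print(truncated_view(list(range(1, 101)), max_rows=10))
```

In the Netflix example, positions 5 and 6 are "Aquaman" and "The Super Mario Bros. Movie", which are exactly the rows missing from the current output. Odd limits are simply rounded down in this sketch, which is a simplification.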
    

I think this is somewhat trivial, but also actually worthwhile; any additional thoughts, disagreements? 😄

github-actions bot added the enhancement, python, and rust labels Jan 13, 2024
@stinodego
Member

I think this is a good change. I'm not sure what the rationale was for the existing behavior - something with display in notebooks? I don't know. But the proposed change sounds fine to me 👍

@mcrumiller
Contributor

Related #13445.

@stinodego
Member

Probably should have @ritchie46 sign off on this one, but looks good to me!

@ritchie46
Member

Can you mention this PR as a comment? Then from now on, we know the rationale behind our choice. :)

The rationale behind 8 was just me choosing something reasonable, I think.

In any case. Fine by me. 👍

@alexander-beedie
Collaborator Author

> Can you mention this PR as a comment? Then from now on, we know the rationale behind our choice. :)

Good plan; done ;)

alexander-beedie merged commit 729790d into pola-rs:main Jan 14, 2024
23 checks passed
alexander-beedie deleted the tweak-default-repr-height branch January 14, 2024 08:57
@Wainberg
Contributor

@alexander-beedie the default for Series is still 26 rows; should this also be changed to 10?

>>> pl.Series(range(1, 26 + 1))
shape: (26,)
Series: '' [i64]
[
        1
        2
        3
        4
        5
        6
        7
        8
        9
        10
        11
        12
        13
        14
        15
        16
        17
        18
        19
        20
        21
        22
        23
        24
        25
        26
]
>>> pl.Series(range(1, 27 + 1))
shape: (27,)
Series: '' [i64]
[
        1
        2
        3
        4
        5
        6
        7
        8
        9
        10
        11
        12
        …
        15
        16
        17
        18
        19
        20
        21
        22
        23
        24
        25
        26
        27
]
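For the Series case, the two printouts above are consistent with a rule like the one sketched below: a 26-value Series prints in full, while with 27 values the 13th and 14th are hidden behind the ellipsis. `series_preview` and its `limit=25` default are assumptions reverse-engineered from that output, not taken from the Polars source.

```python
def series_preview(values, limit=25):
    """Return the lines a Series preview would show, "…" marking the gap.

    Hypothetical sketch inferred from the printed output, not Polars source:
    print everything when len(values) <= limit + 1, otherwise show
    limit // 2 leading values and (limit + 1) // 2 trailing values.
    """
    n = len(values)
    if n <= limit + 1:
        return list(values)
    head = list(values[: limit // 2])
    tail = list(values[n - (limit + 1) // 2 :])
    return head + ["…"] + tail


print(series_preview(list(range(1, 27))))  # 26 values: printed in full
print(series_preview(list(range(1, 28))))  # 27 values: 1..12, "…", 15..27
print(series_preview(list(range(1, 28)), limit=10))  # 1..5, "…", 23..27
```

Under the same assumed rule, `limit=10` would leave five values on either side of the ellipsis, in line with the new frame default.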
