feat(python): 10% speedup for `to_dicts` method #6415

alexander-beedie · 2023-01-24T16:39:35Z

Note that we will be able to redirect to iterrows(named=True) once 0.16.0 is released, as that will build dictionaries using the same optimisations.

Until then, here's a simple ~10% speedup on the current approach (as tested on a -release build).

gab23r · 2023-01-24T18:16:06Z

@alexander-beedie Just for my understanding, what is the purpose of doing dict_ = dict ?

stinodego

Actually shouldn't we use rows instead of iterrows?

I think we might be able to remove this method, as df.rows(named=True) offers the same functionality after 0.16.0?

alexander-beedie · 2023-01-25T00:45:01Z

Actually shouldn't we use rows instead of iterrows?

As the size of the frame gets very large it's actually more efficient and, perhaps surprisingly, slightly faster to use iterrows (probably because you allocate per slice, not for the whole result set). In local benchmarking there's a transition somewhere between around 20m elements where the crossover happens:

df = pl.DataFrame(
    {
        "x": range(10_000_000),
        "y": date(2023,1,1),
        "z": "abcdefghijklm",
    }
)
with Timer():
    _ = df.rows()
# Elapsed time: 16.1816 seconds

with Timer():
    _ = list(df.iterrows())
# Elapsed time: 15.5870 seconds

I think we might be able to remove this method, as df.rows(named=True) offers the same functionality after 0.16.0?

I'd be in favour of even going one step beyond that and removing rows in favour of just iterrows, given that it's just a list call away, as above 😅 (Should probably better-characterise the domains where rows is faster first though, so we always take the faster of the two options, as it's still a bit faster on smaller data).

(I think I can further speed-up both calls too; will experiment...)

alexander-beedie · 2023-01-25T00:50:37Z

@alexander-beedie Just for my understanding, what is the purpose of doing dict_ = dict ?

It results in a python bytecode optimisation because we have such a simple/obvious hot loop; when the function gets run it'll use calls to LOAD_FAST (or probably LOAD_DEREF in this case, inside the list comprehension) when resolving dict/zip instead of calls to the slightly more expensive LOAD_GLOBAL. Localising self.columns does much the same thing, optimising away a LOAD_ATTR. It's a trick rarely worth using if you have much else going on in your loop, but here it gets us about an 8% speedup vs not doing it, so...

gab23r · 2023-01-25T11:24:06Z

Thanks @alexander-beedie for the explanation !

feat(python): 10% speedup for "to_dicts"

564a3f8

github-actions bot added enhancement New feature or an improvement of an existing feature python Related to Python Polars labels Jan 24, 2023

stinodego approved these changes Jan 24, 2023

View reviewed changes

ritchie46 merged commit fe22509 into pola-rs:master Jan 25, 2023

alexander-beedie deleted the speedup-to-dicts branch January 25, 2023 09:19

alexander-beedie mentioned this pull request Jan 27, 2023

feat(python): ~30% faster iter_rows(named=True) and to_dicts(), if pyarrow available #6493

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(python): 10% speedup for `to_dicts` method #6415

feat(python): 10% speedup for `to_dicts` method #6415

alexander-beedie commented Jan 24, 2023 •

edited

Loading

gab23r commented Jan 24, 2023

stinodego left a comment •

edited

Loading

alexander-beedie commented Jan 25, 2023 •

edited

Loading

alexander-beedie commented Jan 25, 2023 •

edited

Loading

gab23r commented Jan 25, 2023

feat(python): 10% speedup for to_dicts method #6415

feat(python): 10% speedup for to_dicts method #6415

Conversation

alexander-beedie commented Jan 24, 2023 • edited Loading

gab23r commented Jan 24, 2023

stinodego left a comment • edited Loading

Choose a reason for hiding this comment

alexander-beedie commented Jan 25, 2023 • edited Loading

alexander-beedie commented Jan 25, 2023 • edited Loading

gab23r commented Jan 25, 2023

feat(python): 10% speedup for `to_dicts` method #6415

feat(python): 10% speedup for `to_dicts` method #6415

alexander-beedie commented Jan 24, 2023 •

edited

Loading

stinodego left a comment •

edited

Loading

alexander-beedie commented Jan 25, 2023 •

edited

Loading

alexander-beedie commented Jan 25, 2023 •

edited

Loading