perf: optimize DataFrame.describe by presorting columns
Presorting the numerical columns once lets each subsequent quantile, min, and max be computed in O(1).
taki committed Jan 18, 2024
1 parent 36d0e94 commit bb3500e
Showing 1 changed file with 7 additions and 1 deletion.
8 changes: 7 additions & 1 deletion py-polars/polars/dataframe/frame.py
@@ -4451,8 +4451,14 @@ def describe(
             for c in self.columns
         ]

+        # Sort numerical columns to make all quantiles O(1)
+        sort_exprs = [
+            (F.col(c).sort() if c in stat_cols else F.col(c)) for c in self.columns
+        ]
+        df_cols_presorted = self.select(*sort_exprs)
+
         # Calculate metrics in parallel
-        df_metrics = self.select(
+        df_metrics = df_cols_presorted.select(
             F.all().count().name.prefix("count:"),
             F.all().null_count().name.prefix("null_count:"),
             *mean_exprs,
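
The idea behind the change can be illustrated outside Polars. The following is a minimal plain-Python sketch, not the Polars implementation: the helper names (`presort`, `quantile_lower`, `describe_column`) are hypothetical, and the "lower" interpolation rule is an assumption chosen for simplicity. It shows that once a column is sorted, min, max, and every requested quantile reduce to an index lookup.

```python
def presort(values):
    """Sort a column once, dropping None (nulls). Cost: O(n log n)."""
    return sorted(v for v in values if v is not None)


def quantile_lower(sorted_values, q):
    """Quantile on presorted data via a single index lookup: O(1).

    Uses 'lower' interpolation (an assumption made for this sketch).
    """
    if not sorted_values:
        return None
    idx = int(q * (len(sorted_values) - 1))
    return sorted_values[idx]


def describe_column(values, percentiles=(0.25, 0.5, 0.75)):
    """Summarise one column, paying for the sort only once."""
    s = presort(values)
    stats = {
        "count": len(s),
        "null_count": len(values) - len(s),
        # On sorted data, min and max are the first and last elements.
        "min": s[0] if s else None,
        "max": s[-1] if s else None,
    }
    for q in percentiles:
        stats[f"{round(q * 100)}%"] = quantile_lower(s, q)
    return stats


result = describe_column([5, None, 1, 4, 2, 3])
print(result)
```

Because each column is sorted independently, row alignment across columns is lost. That is harmless here, since every statistic is computed per column, but the presorted frame should not be reused for row-wise work.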