Commit

feat(python): support mean for bool columns in Series.describe
taki committed Jan 21, 2024
1 parent 0a701db commit 45ac4da
Showing 2 changed files with 27 additions and 6 deletions.
27 changes: 26 additions & 1 deletion py-polars/polars/series/series.py
@@ -1899,11 +1899,21 @@ def describe(
-----
The median is included by default as the 50% percentile.
The mean for boolean series is the ratio of true values
to the total non-null values.
Returns
-------
DataFrame
Mapping with summary statistics of a Series.
Warnings
--------
We will never guarantee the output of describe to be stable.
It will show statistics that we deem informative and may
be updated in the future.
Examples
--------
>>> s = pl.Series([1, 2, 3, 4, 5])
@@ -1925,6 +1935,20 @@ def describe(
│ max ┆ 5.0 │
└────────────┴──────────┘
>>> s = pl.Series([True, False, True, None, True])
>>> s.describe()
shape: (4, 2)
┌────────────┬───────┐
│ statistic ┆ value │
│ --- ┆ --- │
│ str ┆ f64 │
╞════════════╪═══════╡
│ count ┆ 4.0 │
│ null_count ┆ 1.0 │
│ sum ┆ 3.0 │
│ mean ┆ 0.75 │
└────────────┴───────┘
Non-numeric data types may not have all statistics available.
>>> s = pl.Series(["a", "a", None, "b", "c"])
@@ -1957,11 +1981,12 @@ def describe(
stats["max"] = self.max()

elif self.dtype == Boolean:
- stats_dtype = Int64
+ stats_dtype = Float64
  stats = {
  "count": self.count(),
  "null_count": self.null_count(),
  "sum": self.sum(),
+ "mean": self.mean(),
}
elif self.dtype == String:
stats_dtype = Int64
6 changes: 1 addition & 5 deletions py-polars/tests/unit/series/test_describe.py
@@ -61,11 +61,7 @@ def test_series_describe_boolean() -> None:
s = pl.Series([True, False, None, True, True])
result = s.describe()

- stats = {
- "count": 4,
- "null_count": 1,
- "sum": 3,
- }
+ stats = {"count": 4, "null_count": 1, "sum": 3, "mean": 0.75}
expected = pl.DataFrame({"statistic": stats.keys(), "value": stats.values()})
assert_frame_equal(expected, result)
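
For reference, the boolean statistics this commit reports can be reproduced in plain Python. The sketch below is not the Polars implementation: `describe_bool` is a hypothetical helper written only to illustrate the definition from the docstring (mean = number of true values divided by the number of non-null values) and why the values are now floats rather than integers. Returning `None` as the mean when every value is null is an assumption of this sketch.

```python
from typing import Dict, Optional, Sequence


def describe_bool(values: Sequence[Optional[bool]]) -> Dict[str, Optional[float]]:
    """Hypothetical helper (not part of Polars): summarise nullable booleans."""
    # Nulls are excluded from count, sum and mean; they are only tallied.
    non_null = [v for v in values if v is not None]
    count = len(non_null)
    true_count = sum(non_null)  # True counts as 1, False as 0

    # Assumption of this sketch: with no non-null values there is no ratio
    # to report, so the mean is None.
    mean = true_count / count if count else None

    # Every statistic is stored as a float, since a ratio such as 0.75
    # cannot be held in an integer column.
    return {
        "count": float(count),
        "null_count": float(len(values) - count),
        "sum": float(true_count),
        "mean": mean,
    }


stats = describe_bool([True, False, None, True, True])
print(stats)
```

With the test input above, three of the four non-null values are true, which gives the mean of 0.75 that the updated test expects.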
