`100 * pl.col("a").mean()` returns 0 when `a` is a boolean column #7651
Comments
Uneducated guess:

Seems like the casting is as you suspected, @slonik-az:

```python
import polars as pl

ldf = pl.DataFrame({"a": [True, True, False]}).lazy()

ldf.select(pl.col("a").mean() * 100).describe_optimized_plan()
# ' SELECT [[(col("a").mean().cast(Int32)) * (100)]] FROM\n DF ["a"]; PROJECT 1/1 COLUMNS; SELECTION: "None"'

ldf.select(pl.col("a").mean() * 100.).describe_optimized_plan()
# ' SELECT [[(col("a").mean().cast(Float64)) * (100.0)]] FROM\n DF ["a"]; PROJECT 1/1 COLUMNS; SELECTION: "None"'
```

I'm probably not the best person for this fix, but I may continue to poke at it as an excuse to read some more code. Here's where I got: in `schema.rs`, the function `to_field` infers the output type of `Agg(Mean(Node))` as `bool` for a boolean column. It seems like it should (always?) be `f64` instead.
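The effect of the `Int32` cast in the first plan can be reproduced with plain-Python arithmetic, no Polars required. This is a sketch that assumes the float-to-integer cast truncates toward zero (as float-to-integer conversions do in Rust); it reuses the `[True, True, False]` column from the snippet above and only models the two plans, not Polars itself.

```python
# Plain-Python model of the two optimized plans (no Polars needed).
# Column "a" from the snippet above.
values = [True, True, False]

# The mean of a boolean column is the fraction of True values: 2/3.
mean = sum(values) / len(values)

# Plan 1: col("a").mean().cast(Int32) * 100
# int() truncates toward zero, so 0.666... becomes 0 before the multiplication.
int_plan = int(mean) * 100

# Plan 2: col("a").mean().cast(Float64) * 100.0
# The fraction survives, giving roughly 66.67.
float_plan = float(mean) * 100.0

print(int_plan)               # 0
print(round(float_plan, 4))   # 66.6667
```

Because the cast happens before the multiplication, any mean strictly between 0 and 1 collapses to 0, which matches the result reported in the issue.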
Current:

```rust
// polars/polars/polars-lazy/polars-plan/src/logical_plan/aexpr/schema.rs
impl AExpr {
    pub fn to_field(
        match self {
            Agg(agg) => {
                Mean(expr) => {
                    let mut field = ... // comes in as Boolean type via Col
                    float_type(&mut field); // remains Boolean: DataType::Boolean.is_numeric() == false
                    Ok(field)
```

To fix (basic working code):

```rust
// polars/polars/polars-lazy/polars-plan/src/logical_plan/aexpr/schema.rs
impl AExpr {
    pub fn to_field(
        match self {
            Agg(agg) => {
                Mean(expr) => {
                    let mut field = ... // comes in as Boolean type via Col
                    // Boolean is not numeric, so float_type would leave it untouched;
                    // coerce it to Float64 explicitly instead.
                    if matches!(&field.dtype, DataType::Boolean) {
                        field.coerce(DataType::Float64);
                    } else {
                        float_type(&mut field);
                    }
                    Ok(field)
```

Other comment: in lazy mode, `median()` on a Boolean column infers the wrong dtype (Boolean, while the collected result is Float64):
```python
import polars as pl

pl.DataFrame({"a": [True, True, False, False]}).lazy().select(pl.col("a").median()).dtypes
# [Boolean]
pl.DataFrame({"a": [True, True, False, False]}).lazy().select(pl.col("a").median()).collect().dtypes
# [Float64]
```
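The dtype rule discussed in this thread can be modeled in a few lines of plain Python. This is a hypothetical, simplified sketch: dtypes are plain strings rather than Polars' `DataType` enum, and `float_type`, `mean_dtype_current` and `mean_dtype_fixed` here are illustrative stand-ins, not the real Rust functions. The treatment of `Float32` (left unchanged) is an assumption; the thread only states that `Boolean` is not numeric and therefore passes through `float_type` untouched.

```python
# Hypothetical model of the output-dtype rule for Agg(Mean) in to_field (schema.rs).
# Dtypes are plain strings; the real code operates on Polars' DataType enum.
NUMERIC = {
    "Int8", "Int16", "Int32", "Int64",
    "UInt8", "UInt16", "UInt32", "UInt64",
    "Float32", "Float64",
}


def float_type(dtype: str) -> str:
    """Stand-in for the Rust helper: numeric non-float dtypes become Float64.

    Assumption: float dtypes are kept as they are. Non-numeric dtypes
    (including Boolean) are returned unchanged, which is the bug.
    """
    if dtype in NUMERIC and dtype not in ("Float32", "Float64"):
        return "Float64"
    return dtype


def mean_dtype_current(dtype: str) -> str:
    # Current behaviour: Boolean is not numeric, so it stays Boolean.
    return float_type(dtype)


def mean_dtype_fixed(dtype: str) -> str:
    # Proposed fix: coerce Boolean to Float64 before applying the usual rule.
    if dtype == "Boolean":
        return "Float64"
    return float_type(dtype)


print(mean_dtype_current("Boolean"))  # Boolean
print(mean_dtype_fixed("Boolean"))    # Float64
print(mean_dtype_fixed("Int64"))      # Float64
```

With the current rule, the planner believes the mean is `Boolean`; when it is combined with the integer literal `100`, the expression is then cast to `Int32`, as the optimized plan shows. The lazy `median()` mismatch appears to follow the same pattern, so a similar Boolean special case may be needed there.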
@ritchie46, could you accept this issue as well?
Polars version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of Polars.
Issue description

Multiplying the output of `pl.col("a").mean()` by `100`, where `"a"` is a boolean column, results in `0`.

Reproducible example
Expected behavior
Output should be 33.3333.
Installed versions