feat: Increase verbosity of duplicate column error message #11899

Merged (5 commits) on Feb 13, 2024

mcrumiller
Contributor

@mcrumiller mcrumiller commented Oct 20, 2023

Resolves #14458

I notice that none of our Rust errors are multi-line. Is it okay to use one here?

@github-actions bot added the fix and rust labels Oct 20, 2023
ritchie46 previously approved these changes Oct 20, 2023
crates/polars-error/src/lib.rs (outdated, resolved)
@ritchie46 self-requested a review October 20, 2023 18:33
@ritchie46 dismissed their stale review October 20, 2023 18:33

wrong button

@mcrumiller
Contributor Author

I cannot replicate the hypothesis error. It appears it was failing in this case:

import polars as pl
from datetime import datetime

pl.DataFrame({
    "col0": pl.Series([0, 0], dtype=pl.UInt16),
    "col1": pl.Series([0, 0], dtype=pl.UInt16),
    "col2": pl.Series([datetime(2000, 1, 1), datetime(2000, 1, 1)], dtype=pl.Datetime("ns")),
    "col3": pl.Series([0, 0], dtype=pl.UInt16),
    "col4": pl.Series([0, 0], dtype=pl.UInt8),
    "col5": pl.Series([None, None], dtype=pl.Date),
    "col6": pl.Series([None, None], dtype=pl.Categorical),
}).null_count()
shape: (1, 7)
┌──────┬──────┬──────┬──────┬──────┬──────┬──────┐
│ col0 ┆ col1 ┆ col2 ┆ col3 ┆ col4 ┆ col5 ┆ col6 │
│ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---  ┆ ---  │
│ u32  ┆ u32  ┆ u32  ┆ u32  ┆ u32  ┆ u32  ┆ u32  │
╞══════╪══════╪══════╪══════╪══════╪══════╪══════╡
│ 0    ┆ 0    ┆ 0    ┆ 0    ┆ 0    ┆ 2    ┆ 2    │
└──────┴──────┴──────┴──────┴──────┴──────┴──────┘

This should pass the assertion test; I have no clue why it's reporting 2. I installed the same hypothesis version and set the replication decorator, but the test fails to run on my machine.

@alexander-beedie
Collaborator

> I cannot replicate the hypothesis error.

It's this: #11910

@mcrumiller
Contributor Author

Thanks Alex. Once that gets resolved I'll pull those changes in here.

@mcrumiller
Contributor Author

@ritchie46 this one is now good to go.

@@ -1518,7 +1518,14 @@ impl DataFrame {
         let mut names = PlHashSet::with_capacity(cols.len());
         for name in cols {
             if !names.insert(name.as_str()) {
-                polars_bail!(duplicate = name);
+                let msg = format!(
Member

This is still too broad IMO. I think this should be here:

polars_ensure!(names.insert(name), duplicate = name);
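
For readers less familiar with the Rust macros, the control flow of the narrower check suggested here can be sketched in plain Python. This is only an illustration, not Polars code: `DuplicateError`, `check_unique_names`, and the message wording are hypothetical, and a Python `set` stands in for `PlHashSet`.

```python
class DuplicateError(Exception):
    """Hypothetical stand-in for the Polars duplicate-column error."""


def check_unique_names(cols):
    """Raise DuplicateError on the first repeated name in `cols`."""
    seen = set()
    for name in cols:
        # Plays the role of `names.insert(name)` returning false for a repeat:
        # the error is raised at the exact point where the duplicate is found.
        if name in seen:
            raise DuplicateError(f"column name {name!r} is duplicated")
        seen.add(name)


check_unique_names(["a", "b"])  # unique names pass silently

try:
    check_unique_names(["a", "b", "a"])
except DuplicateError as exc:
    print(exc)
```

Raising at the insert site keeps the error tied to the specific name that collided, which seems to be the point of the review comment.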

@stinodego changed the title from "fix(rust): increase verbosity of duplicate column error message" to "feat: Increase verbosity of duplicate column error message" Feb 13, 2024
@github-actions bot added the enhancement and python labels Feb 13, 2024
@stinodego
Member

@mcrumiller I opened a new issue specifically for the improvement in this PR. Could you address Ritchie's comment and finish this improvement?

@mcrumiller
Contributor Author

mcrumiller commented Feb 13, 2024

@stinodego I think I accomplished what he suggested. Let me know if it still needs work or if you have any suggestions.

Edit: yikes hang on, there's some bad formatting.

@mcrumiller
Contributor Author

Edit: fixed. Got some failures due to the new CI issue.

@mcrumiller
Contributor Author

mcrumiller commented Feb 13, 2024

Should we add newlines for our long error messages, or let the terminal wrap them?

>>> import polars as pl
>>> df = pl.DataFrame({"a": [1], "b": [1]})
>>> df.with_columns(pl.all().alias("a"))
polars.exceptions.ComputeError: the name: 'a' passed to `LazyFrame.with_columns` is duplicate

It's possible that multiple expressions are returning the same default column name. If this is the case, try renaming the columns with `.alias("new_name")` to avoid duplicate column names.
>>> df.lazy().with_columns(pl.all().alias("a")).collect()
polars.exceptions.ComputeError: the name: 'a' passed to `LazyFrame.with_columns` is duplicate

It's possible that multiple expressions are returning the same default column name. If this is the case, try renaming the columns with `.alias("new_name")` to avoid duplicate column names.
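
The shape of the message above (a one-line summary, a blank line, then a hint) can be sketched in plain Python. This is a minimal illustration built from the output shown in this comment, not the Rust code in the PR; `HINT` and `duplicate_message` are hypothetical names.

```python
# Hint text copied from the output shown in this comment.
HINT = (
    "It's possible that multiple expressions are returning the same default "
    "column name. If this is the case, try renaming the columns with "
    '`.alias("new_name")` to avoid duplicate column names.'
)


def duplicate_message(name, context):
    """Build the two-part message: summary line, blank line, then the hint."""
    summary = f"the name: '{name}' passed to `{context}` is duplicate"
    return f"{summary}\n\n{HINT}"


msg = duplicate_message("a", "LazyFrame.with_columns")
print(msg)
```

Keeping the hint as a single unbroken line leaves only the one deliberate blank line between the summary and the hint.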

@stinodego
Member

> Should we add newlines for our long error messages, or let the terminal wrap them?

Terminal should wrap them.
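
One way to see the trade-off is a rough simulation using the standard library's `textwrap`. The sketch below assumes a terminal that wraps on word boundaries, which real terminals generally do not (they break at the column limit), so it is an approximation only; `render` is a hypothetical helper and the widths are arbitrary.

```python
import textwrap

hint = (
    "It's possible that multiple expressions are returning the same default "
    "column name. If this is the case, try renaming the columns with "
    '`.alias("new_name")` to avoid duplicate column names.'
)

# Hard-wrapping bakes one assumed width (80 columns here) into the message.
hard_wrapped = textwrap.fill(hint, width=80)


def render(text, terminal_width):
    """Approximate a terminal by re-wrapping each existing line at the given width."""
    out = []
    for line in text.split("\n"):
        out.extend(textwrap.wrap(line, width=terminal_width) or [""])
    return out


narrow = 50  # a terminal narrower than the baked-in width
print(len(render(hard_wrapped, narrow)), "lines when pre-wrapped at 80 columns")
print(len(render(hint, narrow)), "lines when wrapping is left to the terminal")
```

In a terminal narrower than the baked-in width, the pre-wrapped text tends to produce short, ragged lines, while the single-line message adapts to whatever width is available.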

@mcrumiller
Contributor Author

I'll rebase to fix the CI tests.

@ritchie46 ritchie46 merged commit fb51095 into pola-rs:main Feb 13, 2024
22 checks passed
Labels
enhancement, fix, python, rust
Development

Successfully merging this pull request may close these issues.

More helpful error message for duplicate column names in select/with_columns
4 participants