feat(ux): improve error message on unequal schemas during set ops #9115

NickCrews · 2024-05-03T21:33:26Z

I got tired of ibis.union() failing but only saying "the schemas are different"

The relevant tests are in ibis/tests/expr/test_set_operations.py, looks like the bases are already covered pretty well, don't think I need to add any more

cpcloud · 2024-05-06T13:16:13Z

ibis/expr/operations/relations.py

+            if "Conflicting values" in str(e):
+                raise RelationError(err_msg) from e
+            else:
+                raise


We should really avoid this sort of exception hacking unless it's required.

In no way are the messages coming out of exceptions stable, including messages coming from library internals.

If this ValueError is something we control, then the appropriate thing to do here is to raise a more specific exception in the schema subtract operation and then catch that specific type here.

Raising ValueError is about as useful as other mostly-useless idioms like raising Exception or RuntimeError, which means catching ValueError is very heavy handed (and requires checking for specificity using string hacking).

I agree, was just trying to keep this PR focused on the one thing. I'll switch the valueerror to a relation error :)

OK, check out the new way I do it.

well, actually now I'm thinking about this more. If we make a new error, then if a user wants to do schema1 | schema2, then they will need to import a special ibis exception so they can catch it, and remember its name etc. I actually am tempted to keep the error as some builtin exception and then expect us/users to match via the string. Perhaps we make it so the message string always has a "conflicting values:" prefix (or some other key) and that is what we match on?

What do you think?

I am very against reusing standard library exceptions unless it unambiguously makes sense.

ValueError IMO almost never makes sense to reuse. Generally speaking exceptions that map to a huge variety of errors are terrible candidates for reuse, because any downstream code that uses a ValueError exception handler has now opted into a potentially massive set of error conditions. It's really not that different from catching Exception.

That video isn't about matching builtins AFAICT, it's about not naming exceptions redundantly.

I am definitely on board with avoiding the SomethingWentWrongException in favor of SomethingWentWrong.

I linked to the very start of where he starts talking about exceptions, later he actually gets to his thesis of "stop writing custom exceptions". He talks about that more scattered through the whole talk, you sorta have to listen to the whole thing to get everything (not expecting you to)

Yeah that is the downside to using generic exceptions. I think there are pros and cons of both, which I prefer probably depends on the circumstance. One factor is how likely is the user going to realize something is fallible and try to catch it. If they always wrap the schema1 | schema2 call in a try/catch, then the bubbling up error isn't really a problem. But if someone forgets to do that, then then genericness is a problem. I might expect with no evidence, that in this case, the vast majority of people would not wrap these set ops in a try/catch (they are probably gonna be data scientists) and thus that makes me more likely to agree with you.

I guess the thing I want to avoid is ibis exporting 20 different errors that someone has to keep track of. I would love it if we could use some other existing ibis Exception class here, or do something to simplify the situation

The other typical approach is to have a generic class of error and then some kind of enum-like object embedded in it that allows someone to differentiate.

This is effectively OOP-ization of the errno approach from C programming. OSError in the stdlib is a good example of this approach.

As a user, I don't really care how many exception types there are. I care about getting specific, actionable feedback about what went wrong. If there are 20 ways in which things can go wrong, I want to know about that. Sure, those could all be just IbisError. From an interactive-mode-mostly-end-user perspective, I don't the exception count matters.

As a library author I care about the number of exception types only if they aren't stable. Even in an unstable world where exceptions are changing often, I would much rather have to deal with 20 exception types than have 20 different if "string" in str(e)-style checks littered all over my code, liable to break in hard-to-debug ways just because someone decided to fix a typo or capitalize a letter in their error message.

I'm not saying we need a new exception type for every possible error. We should determine whether it's appropriate on a case by case basis and we shouldn't set some arbitrary limit like "no more than 20 custom exception types".

LOL "Define your own exceptions when you need to"

https://youtu.be/o9pEzgHorH0?feature=shared&t=1575

LOL "Define your own exceptions when you need to"

lol yeah I wish he had seen how this was funny :), can't believe the audience doesn't laugh there more

OK I think I'm happy, I just wanted to make sure you had heard this train of thought, but it appears you have thought about it plenty and know what you're looking for. Sounds fine to me then! Is there something we should to to remind ourselves of all of the custom exceptions we define that we will have to eventually expose once we make that stable? maybe just a tracking issue that just lists them all? I'm worried that eg this one hidden away in this corner will slip through the cracks and won't be considered once we merge/publish the errors

ibis/expr/schema.py

ibis/common/collections.py

cpcloud · 2024-05-07T17:57:17Z

I'll rebase your PR to pick up the fix for the failing docs build.

ibis/common/collections.py

NickCrews marked this pull request as draft May 3, 2024 21:34

NickCrews force-pushed the set_err_msg branch 3 times, most recently from 82d43e8 to 50f3693 Compare May 3, 2024 23:56

NickCrews marked this pull request as ready for review May 4, 2024 00:46

NickCrews requested a review from cpcloud May 4, 2024 00:46

cpcloud requested changes May 6, 2024

View reviewed changes

NickCrews force-pushed the set_err_msg branch 2 times, most recently from 3de6eae to e13cf23 Compare May 6, 2024 18:34

cpcloud reviewed May 6, 2024

View reviewed changes

ibis/common/collections.py Outdated Show resolved Hide resolved

NickCrews requested a review from cpcloud May 7, 2024 01:51

cpcloud reviewed May 7, 2024

View reviewed changes

ibis/common/collections.py Outdated Show resolved Hide resolved

NickCrews force-pushed the set_err_msg branch from e13cf23 to f6a629d Compare May 7, 2024 18:05

feat: improve error message on unequal schemas during set ops

fb0926c

NickCrews force-pushed the set_err_msg branch from f6a629d to fb0926c Compare May 11, 2024 08:08

cpcloud approved these changes May 12, 2024

View reviewed changes

cpcloud added this to the 9.1 milestone May 12, 2024

cpcloud added feature Features or general enhancements ux User experience related issues labels May 12, 2024

cpcloud changed the title ~~feat: improve error message on unequal schemas during set ops~~ feat(ux): improve error message on unequal schemas during set ops May 12, 2024

cpcloud merged commit 5488896 into ibis-project:main May 12, 2024
87 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(ux): improve error message on unequal schemas during set ops #9115

feat(ux): improve error message on unequal schemas during set ops #9115

NickCrews commented May 3, 2024 •

edited

Loading

cpcloud May 6, 2024

NickCrews May 6, 2024

NickCrews May 6, 2024

NickCrews May 6, 2024 •

edited

Loading

cpcloud May 6, 2024

NickCrews May 6, 2024

cpcloud May 6, 2024 •

edited

Loading

cpcloud May 6, 2024 •

edited

Loading

NickCrews May 6, 2024

cpcloud commented May 7, 2024

feat(ux): improve error message on unequal schemas during set ops #9115

feat(ux): improve error message on unequal schemas during set ops #9115

Conversation

NickCrews commented May 3, 2024 • edited Loading

cpcloud May 6, 2024

Choose a reason for hiding this comment

NickCrews May 6, 2024

Choose a reason for hiding this comment

NickCrews May 6, 2024

Choose a reason for hiding this comment

NickCrews May 6, 2024 • edited Loading

Choose a reason for hiding this comment

cpcloud May 6, 2024

Choose a reason for hiding this comment

NickCrews May 6, 2024

Choose a reason for hiding this comment

cpcloud May 6, 2024 • edited Loading

Choose a reason for hiding this comment

cpcloud May 6, 2024 • edited Loading

Choose a reason for hiding this comment

NickCrews May 6, 2024

Choose a reason for hiding this comment

cpcloud commented May 7, 2024

NickCrews commented May 3, 2024 •

edited

Loading

NickCrews May 6, 2024 •

edited

Loading

cpcloud May 6, 2024 •

edited

Loading

cpcloud May 6, 2024 •

edited

Loading