Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(ux): improve error message on unequal schemas during set ops #9115

Merged
merged 1 commit into from
May 12, 2024

Conversation

NickCrews
Copy link
Contributor

@NickCrews NickCrews commented May 3, 2024

I got tired of ibis.union() failing but only saying "the schemas are different"

The relevant tests are in ibis/tests/expr/test_set_operations.py, looks like the bases are already covered pretty well, don't think I need to add any more

@NickCrews NickCrews marked this pull request as draft May 3, 2024 21:34
@NickCrews NickCrews force-pushed the set_err_msg branch 3 times, most recently from 82d43e8 to 50f3693 Compare May 3, 2024 23:56
@NickCrews NickCrews marked this pull request as ready for review May 4, 2024 00:46
@NickCrews NickCrews requested a review from cpcloud May 4, 2024 00:46
Comment on lines 296 to 299
if "Conflicting values" in str(e):
raise RelationError(err_msg) from e
else:
raise
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should really avoid this sort of exception hacking unless it's required.

In no way are the messages coming out of exceptions stable, including messages coming from library internals.

If this ValueError is something we control, then the appropriate thing to do here is to raise a more specific exception in the schema subtract operation and then catch that specific type here.

Raising ValueError is about as useful as other mostly-useless idioms like raising Exception or RuntimeError, which means catching ValueError is very heavy handed (and requires checking for specificity using string hacking).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, was just trying to keep this PR focused on the one thing. I'll switch the valueerror to a relation error :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, check out the new way I do it.

Copy link
Contributor Author

@NickCrews NickCrews May 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, actually now I'm thinking about this more. If we make a new error, then if a user wants to do schema1 | schema2, then they will need to import a special ibis exception so they can catch it, and remember its name etc. I actually am tempted to keep the error as some builtin exception and then expect us/users to match via the string. Perhaps we make it so the message string always has a "conflicting values:" prefix (or some other key) and that is what we match on?

What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am very against reusing standard library exceptions unless it unambiguously makes sense.

ValueError IMO almost never makes sense to reuse. Generally speaking exceptions that map to a huge variety of errors are terrible candidates for reuse, because any downstream code that uses a ValueError exception handler has now opted into a potentially massive set of error conditions. It's really not that different from catching Exception.

That video isn't about matching builtins AFAICT, it's about not naming exceptions redundantly.

I am definitely on board with avoiding the SomethingWentWrongException in favor of SomethingWentWrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I linked to the very start of where he starts talking about exceptions, later he actually gets to his thesis of "stop writing custom exceptions". He talks about that more scattered through the whole talk, you sorta have to listen to the whole thing to get everything (not expecting you to)

Yeah that is the downside to using generic exceptions. I think there are pros and cons of both, which I prefer probably depends on the circumstance. One factor is how likely is the user going to realize something is fallible and try to catch it. If they always wrap the schema1 | schema2 call in a try/catch, then the bubbling up error isn't really a problem. But if someone forgets to do that, then then genericness is a problem. I might expect with no evidence, that in this case, the vast majority of people would not wrap these set ops in a try/catch (they are probably gonna be data scientists) and thus that makes me more likely to agree with you.

I guess the thing I want to avoid is ibis exporting 20 different errors that someone has to keep track of. I would love it if we could use some other existing ibis Exception class here, or do something to simplify the situation

Copy link
Member

@cpcloud cpcloud May 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other typical approach is to have a generic class of error and then some kind of enum-like object embedded in it that allows someone to differentiate.

This is effectively OOP-ization of the errno approach from C programming. OSError in the stdlib is a good example of this approach.

As a user, I don't really care how many exception types there are. I care about getting specific, actionable feedback about what went wrong. If there are 20 ways in which things can go wrong, I want to know about that. Sure, those could all be just IbisError. From an interactive-mode-mostly-end-user perspective, I don't the exception count matters.

As a library author I care about the number of exception types only if they aren't stable. Even in an unstable world where exceptions are changing often, I would much rather have to deal with 20 exception types than have 20 different if "string" in str(e)-style checks littered all over my code, liable to break in hard-to-debug ways just because someone decided to fix a typo or capitalize a letter in their error message.

I'm not saying we need a new exception type for every possible error. We should determine whether it's appropriate on a case by case basis and we shouldn't set some arbitrary limit like "no more than 20 custom exception types".

Copy link
Member

@cpcloud cpcloud May 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LOL "Define your own exceptions when you need to"

https://youtu.be/o9pEzgHorH0?feature=shared&t=1575

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LOL "Define your own exceptions when you need to"

lol yeah I wish he had seen how this was funny :), can't believe the audience doesn't laugh there more

OK I think I'm happy, I just wanted to make sure you had heard this train of thought, but it appears you have thought about it plenty and know what you're looking for. Sounds fine to me then! Is there something we should to to remind ourselves of all of the custom exceptions we define that we will have to eventually expose once we make that stable? maybe just a tracking issue that just lists them all? I'm worried that eg this one hidden away in this corner will slip through the cracks and won't be considered once we merge/publish the errors

ibis/expr/schema.py Outdated Show resolved Hide resolved
@NickCrews NickCrews force-pushed the set_err_msg branch 2 times, most recently from 3de6eae to e13cf23 Compare May 6, 2024 18:34
ibis/common/collections.py Outdated Show resolved Hide resolved
@NickCrews NickCrews requested a review from cpcloud May 7, 2024 01:51
@cpcloud
Copy link
Member

cpcloud commented May 7, 2024

I'll rebase your PR to pick up the fix for the failing docs build.

ibis/common/collections.py Outdated Show resolved Hide resolved
@cpcloud cpcloud added this to the 9.1 milestone May 12, 2024
@cpcloud cpcloud added feature Features or general enhancements ux User experience related issues labels May 12, 2024
@cpcloud cpcloud changed the title feat: improve error message on unequal schemas during set ops feat(ux): improve error message on unequal schemas during set ops May 12, 2024
@cpcloud cpcloud merged commit 5488896 into ibis-project:main May 12, 2024
87 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Features or general enhancements ux User experience related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants