Choosing Exceptions to Raise

Jeff Tratner edited this page Sep 24, 2013 · 9 revisions

This is an opinionated start to a guide to using exceptions in pandas.

Overall: use builtin exceptions wherever possible and avoid using raise Exception!
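
As a rough illustration of why this matters (the function names below are hypothetical, not pandas code): a bare Exception can only be handled by catching everything, which also hides unrelated bugs. A sketch of the same helper written both ways:

```python
def parse_ratio_bad(text):
    # Hypothetical helper: a bare Exception tells the caller nothing specific.
    if ":" not in text:
        raise Exception("bad ratio")
    left, right = text.split(":")
    return int(left) / int(right)


def parse_ratio(text):
    # Same helper, with the builtin exception chosen by the kind of failure.
    if not isinstance(text, str):
        raise TypeError("text must be a str, got %s" % type(text).__name__)
    if text.count(":") != 1:
        raise ValueError("expected exactly one ':' in %r" % text)
    left, right = text.split(":")
    return int(left) / int(right)


def safe_ratio(text, default=None):
    # The caller can handle just the malformed-value case; a TypeError
    # (a programming mistake) still propagates instead of being hidden.
    try:
        return parse_ratio(text)
    except ValueError:
        return default
```

With parse_ratio_bad, the only way to write safe_ratio would be `except Exception`, which would also swallow the TypeError.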

Builtin Exceptions

  • ValueError - passed a value of the right type, but it's not compatible (e.g., incompatibly-shaped objects).
  • TypeError - wrong number of arguments, mutually exclusive arguments. Sidenote: don't raise TypeError if you're doing arithmetic. It's much better to return NotImplemented, which will cause Python to try the reflected __r*__ methods. If both operands return NotImplemented, you'll get TypeError: unsupported operand type(s) for <op>: '<type1>' and '<type2>'
  • Grey area for ValueError and TypeError - incompatible dtypes, wrong option (e.g., if something can take three values, 'a', 'b', and 'c', and you pass 'd', should that be TypeError or ValueError?). If it's really that there are only a few options, it's probably better to go for TypeError.
  • KeyError - anything that has to do with mapping lookups
  • IndexError - lookup that tries to go out of range of sequence (e.g., [1, 2][5])
  • IOError - broad error class that covers most failures of IO (duh...but it's important to differentiate, can be helpful for telling whether to clean something up, etc.)
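
The sidenote about arithmetic can be sketched as follows. Meters is a hypothetical class invented for this example (it is not part of pandas); it returns NotImplemented for operands it doesn't understand and leaves it to Python to raise the TypeError:

```python
class Meters:
    """Hypothetical value type used only to illustrate NotImplemented."""

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        if isinstance(other, Meters):
            return Meters(self.value + other.value)
        # Not a type we know: let Python try other.__radd__(self).
        return NotImplemented

    def __radd__(self, other):
        # Called when the left operand's __add__ returned NotImplemented.
        if other == 0:  # lets sum([...]) start from its default of 0
            return Meters(self.value)
        return NotImplemented


# 0 + Meters(1.5) falls through int.__add__ to Meters.__radd__.
total = sum([Meters(1.5), Meters(2.0)])

try:
    Meters(1.0) + "two"
except TypeError as err:
    # Python builds the "unsupported operand type(s)" message for us.
    message = str(err)
```

Because neither side raises TypeError itself, another type that does know how to add to Meters can still take part through its own reflected method.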

AssertionErrors and assert

  • AssertionError and assert statements should be used only for testing that internal contracts hold. Specifically, use them only to check that you haven't reached an inconsistent state (like one internal function passing the wrong arguments to another). If you need to validate input to a function in the external API, you should never raise an AssertionError or use an assert statement to check that. For example, let's say you have a Cython function that will segfault if you pass it the wrong type or the wrong length. It makes sense to do something like:
assert type(inpt) is dict, "Wrong input type for dict!"
my_cythonized_function(inpt)
# or...
assert len(obj) >= n, "Need size of at least %d" % n
my_cythonized_function(obj)
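
One way to picture the split between the external API and internal contracts is sketched below. Both function names are hypothetical, and _sum_first_n is a pure-Python stand-in for a Cython routine: the public function validates with builtin exceptions, and the internal one only asserts.

```python
def _sum_first_n(values, n):
    # Internal helper (hypothetical stand-in for a Cython routine).
    # Callers inside the library are expected to have validated already,
    # so a failure here means a bug in our own code.
    assert isinstance(values, list), "internal: values must be a list"
    assert len(values) >= n, "internal: need at least %d values" % n
    return sum(values[:n])


def sum_first_n(values, n):
    # Public entry point: validate user input with builtin exceptions.
    if not isinstance(n, int):
        raise TypeError("n must be an int, got %s" % type(n).__name__)
    values = list(values)
    if n < 0 or n > len(values):
        raise ValueError("n must be between 0 and %d, got %d" % (len(values), n))
    return _sum_first_n(values, n)
```

A user passing bad input sees TypeError or ValueError; an AssertionError from _sum_first_n would point at a mistake inside the library.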

You'll notice there are a number of places where we have

if <condition>:
    raise AssertionError(<msg>)

instead of assert <condition>, <msg>. My assumption is that this is because, in those places, we believe that those should only be internal errors, but we're not sure if the external API can trigger them. Since assert statements are stripped out when you run python -O, a decision was made to change them all to AssertionErrors. Hopefully as we expand the test suite, we can decide to go back to assert statements which more clearly indicate which things are part of the internal contract and which are actually things that don't get checked anywhere else (and therefore should be a different kind of error).
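
The difference under python -O can be reproduced in-process with the builtin compile(), whose optimize argument selects the same optimization level. This is only a small sketch; raises_assertion is a hypothetical helper name:

```python
source = "assert False, 'inconsistent state'"
explicit = "if True:\n    raise AssertionError('inconsistent state')"


def raises_assertion(src, optimize):
    # optimize=0 keeps assert statements; optimize=1 mirrors python -O.
    code = compile(src, "<example>", "exec", optimize=optimize)
    try:
        exec(code, {})
    except AssertionError:
        return True
    return False


assert_kept = raises_assertion(source, 0)        # assert runs normally
assert_stripped = raises_assertion(source, 1)    # assert removed by -O
explicit_kept = raises_assertion(explicit, 1)    # explicit raise survives -O
```

Here assert_kept and explicit_kept are True while assert_stripped is False, which is why an explicit raise is the safer choice wherever the external API might still reach the check.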

A note on testing

If you add something that can raise an Exception of any kind, make sure you've added a test case for it. (The exception to this is AssertionErrors, which shouldn't be reachable with the public API.) If you test for TypeError, be sure to use pandas.util.testing.assertRaisesRegexp to check that you're getting the TypeError you expect (it's very easy for TypeError tests to actually catch a wrong number of required arguments, incorrect keyword arguments, or other changes to signature).
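
To show how a TypeError test can pass for the wrong reason, here is a sketch using the standard library's unittest.TestCase.assertRaisesRegex as a stand-in for the pandas helper. The scale function and the test names are hypothetical.

```python
import unittest


def scale(values, factor):
    # Hypothetical function under test.
    if not isinstance(factor, (int, float)):
        raise TypeError("factor must be a number, got %s" % type(factor).__name__)
    return [v * factor for v in values]


class TestScale(unittest.TestCase):
    def test_loose_check_passes_for_the_wrong_reason(self):
        # The call below is missing an argument, so Python itself raises
        # TypeError; the test passes without ever reaching our check.
        with self.assertRaises(TypeError):
            scale([1, 2])

    def test_message_is_checked(self):
        # Matching on the message ties the test to the error we raise.
        with self.assertRaisesRegex(TypeError, "factor must be a number"):
            scale([1, 2], "3")
```

Both tests pass, but only the second would start failing if the type check in scale were removed.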

Using Pandas-provided Exceptions

These should almost never be used. Unless you see a very specific need (like using an exception as a signal, or an error that just doesn't fit in any other way), you should use builtin exceptions.