inspect.getsource() on sourceless dataclass raises undocumented exception #98239

Closed
mthuurne opened this issue Oct 13, 2022 · 8 comments
Labels
stdlib (Python modules in the Lib dir), type-bug (An unexpected behavior, bug, or error)

Comments

@mthuurne

mthuurne commented Oct 13, 2022

Bug report

If I run the following program in Python 3.10:

from dataclasses import dataclass
from inspect import getsource

defs = {}
exec(
    """
@dataclass
class C:
    "The source for this class cannot be located."
""",
    {"dataclass": dataclass},
    defs,
)

try:
    getsource(defs["C"])
except OSError:
    print("Got the documented exception.")

The output is:

$ python sourceless_dataclass.py 
Traceback (most recent call last):
  File "<path>/sourceless_dataclass.py", line 16, in <module>
    getsource(defs["C"])
  File "/usr/lib/python3.10/inspect.py", line 1147, in getsource
    lines, lnum = getsourcelines(object)
  File "/usr/lib/python3.10/inspect.py", line 1129, in getsourcelines
    lines, lnum = findsource(object)
  File "/usr/lib/python3.10/inspect.py", line 940, in findsource
    file = getsourcefile(object)
  File "/usr/lib/python3.10/inspect.py", line 817, in getsourcefile
    filename = getfile(object)
  File "/usr/lib/python3.10/inspect.py", line 786, in getfile
    raise TypeError('{!r} is a built-in class'.format(object))
TypeError: <class 'C'> is a built-in class

The documentation states that OSError can be raised but does not mention TypeError.

The implementation of inspect.getsource() assumes that if a class has no __module__ attribute, it must be a built-in class, but a sourceless dataclass doesn't have a __module__ attribute either. I don't know whether this is a bug in getsource() or whether the generation of the dataclass should set __module__ to '__main__', but in any case the behavior is not as documented.

Your environment

  • CPython versions tested on: Python 3.10.6
  • Operating system and architecture: Ubuntu Linux 18.04


@mthuurne mthuurne added the type-bug An unexpected behavior, bug, or error label Oct 13, 2022
@mthuurne
Author

Note that inspect.getmodule() returns the builtins module when passed a sourceless dataclass instead of returning None.
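
A minimal sketch of that observation, reusing the reproducer from the report. The variable name `C` and the `print` calls are added here for illustration only:

```python
import builtins
import inspect
from dataclasses import dataclass

defs = {}
# The globals dict has no "__name__", so the class body falls back to
# builtins.__name__ when it records its defining module.
exec(
    """
@dataclass
class C:
    "The source for this class cannot be located."
""",
    {"dataclass": dataclass},
    defs,
)

C = defs["C"]
print(C.__module__)                      # 'builtins'
print(inspect.getmodule(C) is builtins)  # True
```

So the class does carry a `__module__` attribute; it just names the `builtins` module, which is why `inspect` treats the class as built-in.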

@ericvsmith
Member

ericvsmith commented Oct 15, 2022

There's a slightly different error with namedtuples (this is with 3.12.0a0):

>>> from collections import namedtuple
>>> from inspect import getsource
>>>
>>> defs = {}
>>> exec("""
... T = namedtuple("T", [])
... """,
... {'namedtuple': namedtuple},
... defs,
... )
>>> getsource(defs["T"])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "...\cpython\Lib\inspect.py", line 1255, in getsource
    lines, lnum = getsourcelines(object)
                  ^^^^^^^^^^^^^^^^^^^^^^
  File "...\cpython\Lib\inspect.py", line 1237, in getsourcelines
    lines, lnum = findsource(object)
                  ^^^^^^^^^^^^^^^^^^
  File "...\cpython\Lib\inspect.py", line 1048, in findsource
    file = getsourcefile(object)
           ^^^^^^^^^^^^^^^^^^^^^
  File "...\cpython\Lib\inspect.py", line 925, in getsourcefile
    filename = getfile(object)
               ^^^^^^^^^^^^^^^
  File "...\cpython\Lib\inspect.py", line 893, in getfile
    raise OSError('source code not available')
OSError: source code not available
>>> from inspect import getmodule
>>> getmodule(defs["T"])
<module '__main__' (<class '_frozen_importlib.BuiltinImporter'>)>
>>>

Maybe that will help point someone in the right direction.

@mthuurne
Author

OSError is the documented exception if the source cannot be found, so I think that for namedtuple it is working as intended. For the namedtuple class, __module__ is set to '__main__', so for consistency it might be good for the dataclass creation to do the same.
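
The difference in `__module__` can be checked side by side. This is only a sketch: the class name `Plain` is made up for the comparison, and the values in the comments assume the exec globals contain no `__name__`, as in the examples above:

```python
from collections import namedtuple

defs = {}
exec(
    """
T = namedtuple("T", [])

class Plain:
    pass
""",
    {"namedtuple": namedtuple},
    defs,
)

# namedtuple() looks up the caller's module name itself and falls back
# to '__main__' when none is available.
print(defs["T"].__module__)      # '__main__'
# An ordinary class statement resolves __name__ through builtins instead.
print(defs["Plain"].__module__)  # 'builtins'
```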

@sobolevn
Member

sobolevn commented Feb 8, 2023

I doubt that these two cases are the same:

  • namedtuple is a function that creates a new class
  • dataclass is a decorator that works on an existing class

Classes are created according to the regular Python rules. When you exec source code without providing __name__ in the globals, the __module__ of any class defined there defaults to 'builtins'. And inspect does not know how to get sources of builtins.

This happens without @dataclass as well:

from inspect import getsource

defs = {}
exec(
    """
class C:
    "The source for this class cannot be located."
""",
    defs,
)

print(defs["C"].__module__)  # 'builtins'

try:
    getsource(defs["C"])
except OSError:
    print("Got the documented exception.")
# TypeError: <class 'C'> is a built-in class

This is why you get an exception.

Fix:

from dataclasses import dataclass

defs = {}
exec(
    """
@dataclass
class C:
    "The source for this class cannot be located."
""",
    {"dataclass": dataclass, "__name__": "__main__"},  # pass `__name__`!
    defs,
)
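
The effect of the extra `__name__` entry can be checked directly. The helper `module_of_exec_class` and the constant `SOURCE` below are hypothetical names used only for this sketch. It shows the resulting `__module__` and nothing else; which exception `getsource()` then raises also depends on whether the named module has a `__file__`:

```python
from dataclasses import dataclass

SOURCE = '''
@dataclass
class C:
    "The source for this class cannot be located."
'''


def module_of_exec_class(extra_globals):
    """exec SOURCE with the given extra globals and return C.__module__."""
    defs = {}
    exec(SOURCE, {"dataclass": dataclass, **extra_globals}, defs)
    return defs["C"].__module__


print(module_of_exec_class({}))                        # 'builtins'
print(module_of_exec_class({"__name__": "__main__"}))  # '__main__'
```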

So:

  1. I propose to leave dataclasses alone
  2. I think we must document TypeError in inspect calls. It is explicit and has been there for quite a long time: there is no reason to keep it secret :)

I will send a PR for this.
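
Until callers can rely on a single exception type, one defensive option is to catch both. This is a sketch rather than anything proposed in the PR, and `safe_getsource` is a hypothetical helper name:

```python
import inspect


def safe_getsource(obj):
    """Return the source of obj, or None when it cannot be retrieved."""
    try:
        return inspect.getsource(obj)
    except (OSError, TypeError):
        # OSError: the source cannot be found.
        # TypeError: inspect considers the object built-in.
        return None


defs = {}
exec('class C:\n    "The source for this class cannot be located."\n', defs)

print(safe_getsource(defs["C"]))  # None (TypeError path: __module__ is 'builtins')
print(safe_getsource(len))        # None (TypeError path: built-in function)
```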

@leofang

leofang commented Mar 20, 2023

It appears to me that this is a kind of introspection limitation for sourceless dataclasses.

There are other ways to generate a sourceless dataclass, such as the dataclasses.make_dataclass() API. However, in that case __module__ is 'types', as can be seen using the documented example:

>>> from dataclasses import *
>>> C = make_dataclass('C',
...                    [('x', int),
...                      'y',
...                     ('z', int, field(default=5))],
...                    namespace={'add_one': lambda self: self.x + 1})
>>> 
>>> C
<class 'types.C'>
>>> C.__module__
'types'

Then in this case getsource() would raise the right error type (OSError):

>>> import inspect
>>> inspect.getsource(C)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/leof/miniforge3/envs/opt_einsum_dev/lib/python3.8/inspect.py", line 997, in getsource
    lines, lnum = getsourcelines(object)
  File "/home/leof/miniforge3/envs/opt_einsum_dev/lib/python3.8/inspect.py", line 979, in getsourcelines
    lines, lnum = findsource(object)
  File "/home/leof/miniforge3/envs/opt_einsum_dev/lib/python3.8/inspect.py", line 824, in findsource
    raise OSError('could not find class definition')
OSError: could not find class definition

and this time the error makes better sense, because the actual codegen is buried somewhere else, and it's not possible (to my knowledge, at least) to infer the lines & line number.

Ideally, in the case of sourceless dataclasses, one would likely want to overwrite __module__ to the call site (of make_dataclass(), for example). Then somehow inspect should be able to make use of this knowledge and determine the appropriate source file & line number. But I am not sure if there's a robust solution to handle all corner cases, hence I think of it as a limitation.

@sobolevn
Member

make_dataclass now supports a module argument, please see #102104

@AlexWaygood AlexWaygood added the stdlib Python modules in the Lib dir label Mar 20, 2023
@leofang

leofang commented Mar 20, 2023

Thank you, @sobolevn, it's nice to see this new module argument added.

But as I said above, even after __module__ is added/overwritten, inspect.getsource() would still raise (in the OP's case __module__ is builtins, in my case it's a user-supplied module location, so the error type differs). It'd be cool for getsource() to point to the line where the dataclass is created:

C = make_dataclass(...)

but I don't have a good suggestion for how to make it 100% robust.

miss-islington pushed a commit to miss-islington/cpython that referenced this issue Mar 23, 2023
…ror` (pythonGH-101689)

(cherry picked from commit b613208)

Co-authored-by: Nikita Sobolev <mail@sobolevn.me>
miss-islington added a commit that referenced this issue Mar 23, 2023
…H-101689)

(cherry picked from commit b613208)

Co-authored-by: Nikita Sobolev <mail@sobolevn.me>
@hauntsaninja
Contributor

I see we made the documentation change and sobolevn's #102104 helps here too. I'm not seeing a concrete additional suggestion (and maybe any such suggestion should be its own issue), so closing. Thanks all! :-)

Fidget-Spinner pushed a commit to Fidget-Spinner/cpython that referenced this issue Mar 27, 2023
warsaw pushed a commit to warsaw/cpython that referenced this issue Apr 11, 2023