Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(gazelle): __init__.py in per-file targets #1582

Merged
merged 9 commits into from
Dec 20, 2023
Merged

fix(gazelle): __init__.py in per-file targets #1582

merged 9 commits into from
Dec 20, 2023

Conversation

siddharthab
Copy link
Contributor

@siddharthab siddharthab commented Nov 29, 2023

As per Python spec, __init__.py files are depended upon by every file
in the package, so let's make sure that our generated targets also
understand this implicit dependency. Note that because Python module
dependencies are not a DAG, we can not depend on the Bazel target for
__init__.py files (to avoid cycles in Bazel), and hence a non-empty
__init__.py file is added to the srcs attribute of every
py_library target.

The language spec also says that each package depends on the parent
package, but that is a less commonly used feature, and can make things
more complex.

From importlib docs:

Changed in version 3.3: Parent packages are automatically imported.

From import language reference:

Importing parent.one will implicitly execute parent/init.py and parent/one/init.py.

As per Python spec, `__init__.py` files are depended upon by every file
in the package, so let's make sure that our generated targets also
understand this implicit dependency. Note that because Python module
dependencies are not a DAG, we can not depend on the Bazel target for
`__init__.py` files (to avoid cycles in Bazel), and hence a non-empty
`__init__.py` file is added to the `srcs` attribute of every
`py_library` target.

The language spec also says that each package depends on the parent
package, but that is a less commonly used feature, and can make things
more complex.
@siddharthab
Copy link
Contributor Author

Test is failing because of a timeout in requirements_test. All other tests are passing.

@siddharthab siddharthab marked this pull request as ready for review November 29, 2023 19:31
@siddharthab siddharthab requested a review from f0rmiga as a code owner November 29, 2023 19:31
Copy link
Collaborator

@aignas aignas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

gazelle/python/__main__.py Outdated Show resolved Hide resolved
Copy link
Contributor Author

@siddharthab siddharthab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review.

@philsc
Copy link
Contributor

philsc commented Dec 7, 2023

As per Python spec, init.py files are depended upon by every file
in the package

From a Bazel point of view, would it not be more accurate to say that the __init__.py file depends on every other file in the package? I.e. you really shouldn't access __init__.py without having access to all the other files too. I.e. there should only be one public library in the package. And that's the __init__.py one. That one public library then pulls in all the other libraries.

@rickeylev
Copy link
Collaborator

would it not be more accurate to say init.py depends on every file in the package

No. But also Yes.

No in that, if an init.py only imports one sub module among many, init.py doesn't depend on the others. In theory, you could have a module that isn't part of the whole package and people opt-into pulling it in. e.g.

py_library(name = "__init__", srcs=[__init__.py, cheap.py])
py_library(name = "expensive", srcs=[expensive.py], deps=[__init__])

With the above, the expensive file only gets included if you depend on the expensive target. Yay, faster builds! But...

Yes in that, in practice, maintaining such a distinction is hard once any code or imports go into __init__.py. People love putting code in __init__.py to make their apis "cleaner", even though it often turns into importing unused stuff. Python's support for circular imports also exacerbates the issue. (This is a wide-spread issue with Python IMO, not specific to Bazel. Bazel just makes it more obvious because you begin to clearly see what you pay for every build). Anyways, yeah, every sub-module depends on init, and then the init has to depend on anything it imports, and then it snowballs from there, and you basically end up with a single target for the entire directory tree that looks like srcs=glob(**.py).

@rickeylev rickeylev requested a review from aignas December 7, 2023 21:44
@siddharthab
Copy link
Contributor Author

I think people are often confused about the spec of the __init__.py file. Python language spec says that each module depends on its package (the word "package" is sometimes used to refer to the directory and sometimes used for just the __init__.py module within the directory), and each package depends on the parent package. We have not attempted the dependency on the parent package in this change.

What people have done in practice is used the package to re-export symbols from the modules in the package, so they can refactor code in between the modules in a package without breaking downstream uses. This is just a level of indirection that people opt into and they have chosen the __init__.py file for no special reason other than the import path looks cleaner; but it was not the purpose of the __init__.py file. This choice leads to circular dependencies (modules depend on package and package depends on module).

If a package is supplying a public interface through the __init__.py file (and keeping the modules private), then it makes sense to have a single target for the entire package. However, there are codebases that do not do this as often, especially codebases that are using Bazel. There, you would expect the individual modules to be exposed to the users as their own targets. But there might still be some initialization code common to the entire package in __init__.py files, like setting environment variables, checking the user's installed Python wheels, etc., which is what this change is trying to address. I think these kinds of initializations were the primary reason for the existence of __init__.py in the language spec.

PS: I am new to Python and in trying to understand the language spec, and what Gazelle can do for it, I tried to experiment with different things in https://github.com/siddharthab/bazel-gazelle-python including transitive closures of module groups, but I think simply repeating the source file is as good a thing as any (with perhaps the label name used to resolve ambiguities for import resolution when multiple targets have the same file in srcs).

@aignas
Copy link
Collaborator

aignas commented Dec 8, 2023

@siddharthab, what do you think about making this behaviour optional via the gazelle directives?

@siddharthab
Copy link
Contributor Author

Sounds acceptable to me.

But note that this is the correct behavior as per the language spec. It will be surprising if it does not happen by default.

@aignas
Copy link
Collaborator

aignas commented Dec 8, 2023

My thinking is that this change may be a breaking change given that this makes the structure of the bazel dep graph more rigid and in certain cases the users may want to have time to migrate - first this feature becomes available, but is not the default, then in the subsequent version it becomes available as a default but users can still switch back and then the version after that we can remove the old behaviour.

@siddharthab
Copy link
Contributor Author

Sounds good. I have now put this feature behind a directive. Let me know if you would like to see more tests. Also happy to address any nits. If you would rather make changes to my branch directly, that is also OK.

@philsc
Copy link
Contributor

philsc commented Dec 12, 2023

I know I'm piling on to the pile, so feel free to ignore this.

As per Python spec, init.py files are depended upon by every file
in the package,

WDYT about amending the Summary with a link to the relevant part of the spec?

@siddharthab
Copy link
Contributor Author

WDYT about amending the Summary with a link to the relevant part of the spec?

Done.

Copy link
Collaborator

@aignas aignas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, could you add a CHANGELOG.md for the fix please?

@siddharthab
Copy link
Contributor Author

LGTM, could you add a CHANGELOG.md for the fix please?

Done.

CHANGELOG.md Outdated Show resolved Hide resolved
@aignas aignas enabled auto-merge December 20, 2023 10:01
@aignas aignas added this pull request to the merge queue Dec 20, 2023
Merged via the queue into bazelbuild:main with commit 5ba63a8 Dec 20, 2023
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants