Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support PEP 420 -- Implicit Namespace Packages #13293

Closed
jiasli opened this issue May 1, 2020 · 19 comments · Fixed by #14372
Closed

Support PEP 420 -- Implicit Namespace Packages #13293

jiasli opened this issue May 1, 2020 · 19 comments · Fixed by #14372
Assignees
Milestone

Comments

@jiasli
Copy link
Member

jiasli commented May 1, 2020

As Python 2 has been deprecated, we should move away from the pkg_resources approach and use PEP 420 -- Implicit Namespace Packages.

Importing pkg_resources will add a 100ms~200ms delay to CLI's loading time.

$ time python -c "import pkg_resources"

real    0m0.153s
user    0m0.137s
sys     0m0.016s
time python -X importtime -m azure.cli version -h 2>perf.log
tuna perf.log

image

This is discussed at pypa/setuptools#510 which affects many projects but still doesn't have a proper fix.

Moreover, on Linux, the wrapper ~/env38/bin/az created by Easy Install also requires pkg_resources:

#!/home/admin/env38/bin/python
# EASY-INSTALL-DEV-SCRIPT: 'azure-cli==2.5.1','az'
__requires__ = 'azure-cli==2.5.1'
__import__('pkg_resources').require('azure-cli==2.5.1')
__file__ = '/home/admin/azure-cli/src/azure-cli/az'
with open(__file__) as f:
    exec(compile(f.read(), __file__, 'exec'))

This makes az installed via azdev even slower.

For comparison, directly run azure.cli:

$ time python -m azure.cli version -h

real    0m0.261s
user    0m0.238s
sys     0m0.024s

Run the Easy Install wrapper:

$ time python ~/env38/bin/az version -h

real    0m0.822s
user    0m0.779s
sys     0m0.044s
python -X importtime ~/env38/bin/az version -h 2>perf.log
tuna perf.log

image

In our Linux release, we package our own wrapper src/azure-cli/az, so it doesn't suffer from the slowness of this Easy Install wrapper.

@yonzhan
Copy link
Collaborator

yonzhan commented May 2, 2020

add to S169

@lmazuel
Copy link
Member

lmazuel commented May 5, 2020

See also #8164

Since you don't need Python 2, your situation is way simpler than it was when I created that issue, but might give you some insights. Feel free to contact me if you have questions on packaging.

@jiasli
Copy link
Member Author

jiasli commented May 13, 2020

#2951 aimed to remove pkg_resources/declare_namespace from the final wheel, but in my test with pip and Windows MSI installer, the __init__.py or __init__.pyc file is still there. Why?

This also brings up another question: what is the purpose of azure-cli-nspkg and azure-cli-command_modules-nspkg. Are we free to remove them?

@lmazuel
Copy link
Member

lmazuel commented May 16, 2020

Yes, you can kill them now, their only purpose is Python 2, that you don't handle anymore. Remove them all from everywhere.

@haroldrandom
Copy link
Contributor

I think we are going to onboard this, have to change the way to find Python Package in setup.py from find_packages to find_namespace_packages.
And a lot of continuous intgeration job need to change accordingly as we use find_package a lot in CLI Extension Azure/azure-cli-extensions#1608

@zooba
Copy link
Member

zooba commented May 18, 2020

Yes, you can kill them now, their only purpose is Python 2, that you don't handle anymore. Remove them all from everywhere.

Not quite true.

Python resolves namespace packages by searching every sys.path entry for the name (in this case, azure), and returning the first one that contains an __init__.py file. If no __init__.py file is found, it well return all the azure directories that were found as a namespace package (essentially, nothing on disk, and if you import any subpackages from it then it'll search all the matching directories to find it).

By adding the nspkg packages, you get an empty __init__.py file, which prevents this lookup mechanism. Your package "wins", and nobody else gets to merge with you. (Or someone else wins, but then everything breaks immediately because they aren't the real library.)

But the point is that the __init__.py file should be empty. You only need the pkg_resources code when installing each package into a separate directory, which pip has never done (that was an old setuptools idea that even setuptools deprecated eventually).

You also need them if you put the __init__.py files in your development environment, where each package in is a separate part of the repo, because then Python won't merge them together. Given the choice of deleting the __init__.py file or adding the pkg_resources hack (or adding a far more reliable __path__ update), for some reason everyone keeps adding the pkg_resources hack back. I really can't believe how many years I've been saying to stop doing that, and yet it keeps happening 😆

(Minor aside - the reason for the nspkg packages is that pip doesn't reference count individual files. So while you could put the __init__.py file in every package, the first one that was uninstalled/updated would likely delete it and break the rest. The safest approach is to give it its own package, as we have.)

So my recommendations are (completely unchanged from when I first designed the current SDK structure):

  • no azure\__init__.py files in the repository
  • no azure\__init__.py files in sdists/wheels
  • empty azure\__init__.py file in azure-nspkg package
  • unpinned (no version) dependency on azure-nspkg from every package that puts a subpackage under azure\

(Find/replace for azure/mgmt and azure/storage and any others that behave similarly.)

@lmazuel
Copy link
Member

lmazuel commented May 19, 2020

The current recommendation I push is:

  • azure\__init__.py with pkgutil line in the repository. Could do no file if Python 2 dev is not required.
  • azure\__init__.py with pkgutil line in the sdist. Could do no file if Python 2 dev is not required.
  • no azure\__init__.py file in wheels
  • empty azure\__init__.py in azure-nspkg package
  • unpinned (no version) dependency on azure-nspkg from every package that puts a subpackage under azure\
  • BUT, make azure-nspkg a Python 2 only requirement (use PEP420 when you can)

pkg_resources is dead and is not part of any contributions guidelines (in favor of pkgutil when Python 2 matters).

So I think the only contentious part is: should we install azure-nspkg on Python 3, or not (and then use PEP420)? I was advocate of PEP420.

Adding @brettcannon and @johanste for two cents

@lmazuel
Copy link
Member

lmazuel commented May 19, 2020

Side note @jiasli that I didn't realize before @zooba comment, in real life CLI installation, you shouldn't have the pkg_resources line anyway, since you're not supposed to rely on nspkg packages today already. So I think you did you measurement on a dev install of the CLI. In order to get numbers that represents reality, I would encourage you do them on a real install.

If you do have pkg_resources on any real CLI installation, that's a bug and that's parallel to the discussion if we should use PEP420 (no __init__.py) or regular package (empty __init__.py file)

@brettcannon
Copy link
Member

I don't have time for 2 cents (plus I live in a country without a penny so I couldn't give you 2 cents even if I wanted to 😉 ).

@jiasli
Copy link
Member Author

jiasli commented May 20, 2020

Regarding real-life CLI installation's __init__.py

If you do have pkg_resources on any real CLI installation, that's a bug and that's parallel to the discussion if we should use PEP420 (no __init__.py) or regular package (empty __init__.py file)

Unfortunately, the real-life CLI installation does have __init__.py with pkg_resources 🤔. @fengzhou-msft, please help check this issue.

> pip install uncompyle6

> uncompyle6 "C:\Program Files (x86)\Microsoft SDKs\Azure\CLI2\Lib\site-packages\azure\__init__.pyc"
# uncompyle6 version 3.7.0
# Python bytecode 3.6 (3379)
# Decompiled from: Python 3.8.2 (tags/v3.8.2:7b3ab59, Feb 25 2020, 23:03:10) [MSC v.1916 64 bit (AMD64)]
# Embedded file name: C:\Users\VSSADM~1\AppData\Local\Temp\pip-install-7cl6xofh\azure-cli\azure\__init__.py
# Compiled at: 2020-04-30 14:19:00
# Size of source mod 2**32: 414 bytes
import pkg_resources
pkg_resources.declare_namespace(__name__)
# okay decompiling C:\Program Files (x86)\Microsoft SDKs\Azure\CLI2\Lib\site-packages\azure\__init__.pyc

> uncompyle6 "C:\Program Files (x86)\Microsoft SDKs\Azure\CLI2\Lib\site-packages\azure\cli\__init__.pyc"
# uncompyle6 version 3.7.0
# Python bytecode 3.6 (3379)
# Decompiled from: Python 3.8.2 (tags/v3.8.2:7b3ab59, Feb 25 2020, 23:03:10) [MSC v.1916 64 bit (AMD64)]
# Embedded file name: C:\Users\VSSADM~1\AppData\Local\Temp\pip-install-7cl6xofh\azure-cli\azure\cli\__init__.py
# Compiled at: 2020-04-30 14:19:00
# Size of source mod 2**32: 627 bytes
__doc__ = "The Azure Command-line tool.\n\nThis tools provides a command-line interface to Azure's management and storage\nAPIs.\n"
import pkg_resources
pkg_resources.declare_namespace(__name__)
__author__ = 'Microsoft Corporation <python@microsoft.com>'
__version__ = '2.5.1'
# okay decompiling C:\Program Files (x86)\Microsoft SDKs\Azure\CLI2\Lib\site-packages\azure\cli\__init__.pyc

@jiasli
Copy link
Member Author

jiasli commented May 20, 2020

Regarding the empty __init__.py from azure-cli-nspkg

There are 2 conflicting azure/cli/__init__.py files.

# src/azure-cli-nspkg/azure/cli/__init__.py
<Empty>

# src/azure-cli/azure/cli/__init__.py
import pkg_resources
pkg_resources.declare_namespace(__name__)

src/azure-cli-nspkg/azure doesn't contain a __init__.py with pkg_resources. I think that's why src/azure-cli-nspkg/azure/cli/__init__.py is not discovered during import azure.cli.

>>> import azure
>>> azure.__path__
[
    'd:\\cli\\env38\\lib\\site-packages\\azure',
    'd:\\cli\\azure-cli\\src\\azure-cli-telemetry\\azure', 
    'd:\\cli\\azure-cli\\src\\azure-cli-core\\azure', 
    'd:\\cli\\azure-cli\\src\\azure-cli\\azure', 
    'd:\\cli\\azure-cli\\src\\azure-cli-testsdk\\azure'
]

>>> import azure.cli
>>> azure.cli.__path__
[
    'd:\\cli\\azure-cli\\src\\azure-cli-telemetry\\azure\\cli', 
    'd:\\cli\\azure-cli\\src\\azure-cli-core\\azure\\cli', 
    'd:\\cli\\azure-cli\\src\\azure-cli\\azure\\cli', 
    'd:\\cli\\azure-cli\\src\\azure-cli-testsdk\\azure\\cli'
]

If src/azure-cli-nspkg/azure/cli/__init__.py is empty, it is indicating a regular package, instead of a namespace package. We perhaps shouldn't even use the misleading name azure-cli-nspkg.

@lmazuel
Copy link
Member

lmazuel commented May 22, 2020

@jiasli

  • On dev environment, you must NOT install any nspkg. For dev, there is no question that you want to leverage PEP420. The nspkg will indeed create trouble to import, at least if you venv is in editable mode.
  • Your production wheel must NOT contains azure\cli\__init__.py,
  • Your github repo should not contain any azure\cli\__init__.py. This file shouldn't exist.

The debate we have with @zooba is solely about this line, and nothing else, the rest of the recommended guidelines still applies. Once everything else is fixed, and we're down the decide this, we'll settle the debate :). In the meantime, keep this line since it's there, but there is many things to still fix it seems.

@lmazuel
Copy link
Member

lmazuel commented May 22, 2020

@jiasli @fengzhou-msft I created a dedicated issue for the content of the wheels file
#13635

@jiasli
Copy link
Member Author

jiasli commented May 26, 2020

Thanks @lmazuel. One thing I noticed is that azure-nspkg's py2 wheel does contain an azure/__init__.py with pkg_resources in it (Yes, this is what we want). However, azure-cli-nspkg's azure/cli/__init__.py is empty. I am thinking: is the naming of this package wrong from the beginning, which caused all these confusions?

@johanste
Copy link
Member

No, it contains: __path__ = __import__('pkgutil').extend_path(__path__, __name__)

No reference to pkg_resources

@jiasli
Copy link
Member Author

jiasli commented May 27, 2020

Sorry, I mean https://pypi.org/project/azure-nspkg/1.0.0/, which Azure CLI uses.

image

https://github.com/Azure/azure-cli-dev-tools/blob/9fd92e46fb3f7bbae605b3972c30e56afbe66c49/azdev/operations/setup.py#L110

pip_cmd("install -q -I azure-nspkg==1.0.0", "Installing `azure-nspkg`...")

@johanste do you mind sharing more context about #2951? I am a little bit confused about why azure-cli-nspkg is called nspkg when it only contains an empty azure/cli/__init__.py.

We can schedule a meeting some time around 5pm Redmond time for a quick sync.

@zooba
Copy link
Member

zooba commented May 28, 2020

The nspkg is short for "NameSpace PacKaGe", which arguably is backwards because it removes the namespace package by turning it into a real one, but ultimately the name doesn't matter much as it has no functionality.

Its sole purpose is to get around the fact that pip does not reference count individual files on install, but it will ensure there is only a single version of an entire package installed at once. If we put the __init__.py in every package, they'd stomp on each other and the first uninstall/upgrade would delete it. By putting it in its own package, it only gets installed once and won't be touched at all during operations on other packages.

None of this particularly matters for CLI when you have a "static" install (i.e. anything besides pip), since nobody is adding or removing packages to your SDK install. As long as you end up with the empty __init__.py file somehow, it will make your azure module importable, and everything will work.

@jiasli
Copy link
Member Author

jiasli commented May 29, 2020

In my mind, the name does matter because it indicates the purpose of this package:

  1. azure-nspkg and azure-mgmt-nspkg are used to turn azure and azure.mgmt into pkg_resource or pkgutil style namespace packages, while
  2. azure-cli-nspkg is used to turn azure.cli into a normal package with an empty src/azure-cli-nspkg/azure/cli/__init__.py (However, this doesn't work as expected in the released package/real installation as I discovered in Support PEP 420 -- Implicit Namespace Packages #13293 (comment))
  3. azure-cli-command_modules-nspkg is used to turn azure.cli.command_modules into a normal package with an empty src/azure-cli-command_modules-nspkg/azure/cli/command_modules/__init__.py. However, azure-cli already has an empty src/azure-cli/azure/cli/command_modules/__init__.py.

I think we need more clarification on the initial naming motivation, and also an investigation on why azure-cli-nspkg doesn't work as expected in the real installation.

@yonzhan yonzhan modified the milestones: S170, S171 May 31, 2020
@yonzhan yonzhan self-assigned this Jun 9, 2020
@yonzhan yonzhan modified the milestones: S171, S172 Jun 19, 2020
@jiasli jiasli modified the milestones: S172, S173 Jul 9, 2020
@jiasli jiasli modified the milestones: S173, S174 Jul 29, 2020
@arrownj
Copy link
Contributor

arrownj commented Aug 20, 2020

Close this for CLI will support this in 2.11.0 (#14372).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment