Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

convert underscore to dash in pip package names #18126

Closed
wants to merge 2 commits into from

Conversation

mikaelarguedas
Copy link
Member

@mikaelarguedas mikaelarguedas commented Jun 6, 2018

related to #18116

@@ -237,13 +237,13 @@ python:
python-adafruit-bno055-pip:
debian:
pip:
packages: [adafruit_bno055]
packages: [adafruit-bno055]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to pip freeze it is Adafruit-BNO055

@dirk-thomas
Copy link
Member

I find it rather confusing to use a different name in the rosdep database then what the actual package name on PyPI is (e.g. https://pypi.org/project/catkin_tools/). I think it would be "cleaner" if the rosdep value matches exactly the package name on PyPI. Any necessary transformation should be done by the tool imo.

@mathias-luedtke
Copy link
Contributor

mathias-luedtke commented Jun 6, 2018

@dirk-thomas: the updated names are the PyPi packages names (as reported by pip).
Even catkin_pkg is listed as catkin-pkg in python.yaml.
https://pypi.org/project/catkin-tools/ works as well, but redirects..
Even https://pypi.org/search/?q=catkin links to the dash URLs.

As an alternative approach, rosdep could use a canonical name to match against the pip freeze output. For now it seems to be lower-case and underscores converted to dashes, but there might be more rules.

@dirk-thomas
Copy link
Member

catkin_tools is the official package name (see https://github.com/catkin/catkin_tools/blob/2cae17f8f32b0193384d2c7734afee1c60c4add2/setup.py#L110). Everything else is mangled and imo that should happen in the code - not in the rosdep values.

@dirk-thomas
Copy link
Member

Even https://pypi.org/search/?q=catkin links to the dash URLs.

But it shows as catkin_pkg / catkin_tools in the result list which are the actual names.

@mathias-luedtke
Copy link
Contributor

catkin_tools is the official package name

But the debian package name is python-catkin-tools. ;)
And pip is using catkin-tools.

Don't get me wrong, I see your point, this will confuse users at first.
However, currently the rosdep system is broken for these packages, so it would be good to find a solution.

@mathias-luedtke
Copy link
Contributor

@tfoote, @mikaelarguedas, @dirk-thomas: How to proceed here?
A fix in rosdep is possible, but will be rather ugly because the ouput of pip freeze must be converted as well as the pip package name that was provided by python.yaml.

Copy link
Member

@tfoote tfoote left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me as it stands.

As I see it:

  • The pip rules should use the canonical "pip" name, as reported by pip freeze. There's not a programmatic mapping from any project name to the pip key that I know of we can implement in code.

The main key for consistency sake should follow the predicted debian naming convention with the -pip extension if pip only.

There is a "canonicalize_name" method in the packaging module: https://github.com/pypa/packaging/blob/master/docs/utils.rst

That implements PEP 503's normalize names, but pip freeze doesn't follow that entirely. Exceptions I found on my system include both . and capitals.

  • pyOpenSSL==0.15.1
  • zope.interface==4.1.3

@tfoote
Copy link
Member

tfoote commented Jun 11, 2018

Also note that there's actually bugs related to using the different names within pip: pypa/pip#5021 they're not even equivalent.

@dirk-thomas
Copy link
Member

I don't agree with the above statement:

The pip rules should use the canonical "pip" name, as reported by pip freeze

Why are the actual package names declared in the packages source code not "canonical"?

@mathias-luedtke
Copy link
Contributor

Why are the actual package names declared in the packages source code not "canonical"?

That's a very good question (there are some open issues discussing this..).
For now, pip is reporting different names, e.g.

$ pip show catkin_tools
Name: catkin-tools
...

(needs pip 9 or newer)

@fmessmer
Copy link
Contributor

Any chance this gets rebased and merged (as a temporary solution) until a final solution is agreed upon and implemented?

@fmessmer
Copy link
Contributor

a month later....any chance this gets rebased and merged?
or what's the current status?

@tfoote
Copy link
Member

tfoote commented Jul 25, 2018

@dirk-thomas Since this is how pip works, will it work for you if we just follow what they're doing since we don't have control over it? If so we'd just need a rebase of this commit.

@dirk-thomas
Copy link
Member

dirk-thomas commented Jul 25, 2018

My opinion hasn't changed. I think the canonical name is the name a package declares in its manifest and the rosdep database should use that exact name.

When interacting with pip rosdep can apply any transformation necessary to match the behavior of pip (likely just replacing underscores with dashes?). That seems to be very little effort to do as far as I can see.

@mathias-luedtke
Copy link
Contributor

(likely just replacing underscores with dashes?). That seems to be very little effort to do as far as I can see

Unfortunately, the names are case-sensitive as well.
In addtion pip does not track these variations properly, so it would install the same package again and again.

@dirk-thomas
Copy link
Member

Unfortunately, the names are case-sensitive as well.
In addtion pip does not track these variations properly, so it would install the same package again and again.

The underscore / dash replacement was just an example. Whatever transformation pip applies can be applied in rosdep to match the behavior of pip.

@mathias-luedtke
Copy link
Contributor

Whatever transformation pip applies can be applied in rosdep to match the behavior of pip.

This seems to vary with each pip version.
Only the name reported by pip show seems to be supported throughout all versions.

@dirk-thomas
Copy link
Member

This seems to vary with each pip version.

If that is the case that is an even stronger reason why we shouldn't manipulate the entries in the rosdep database. If that mangling changes we have to way to get back to the canonical / "upstream" name of the packages. And a single database wouldn't work with different pip versions.

@mathias-luedtke
Copy link
Contributor

@dirk-thomas, that's why I added:

Only the name reported by pip show seems to be supported throughout all versions.

These names work with all pip versions and are easy to verfiy/maintain.

@dirk-thomas
Copy link
Member

This conversation is not making much progress.

I have stated a clear reason why I think the canonical names of the packages should be kept in the rosdep database without any mangling.

What is the advantage of mangling the package names in place (over applying the name mangling at runtime by code in rosdep)? Is it only avoiding to touch rosdep code? Or the concern that the mangling function might be difficult to write? Or that the mangling function can't be implemented since there is no "logic" behind it (which would be surprising since pip does the transformation somehow)?

@mathias-luedtke
Copy link
Contributor

canonical names of the packages should be kept in the rosdep database

The canonical name (w.r.t pip) is the name reported by pip.

Is it only avoiding to touch rosdep code? Or the concern that the mangling function might be difficult to write? Or that the mangling function can't be implemented since there is no "logic" behind it (which would be surprising since pip does the transformation somehow)?

I don't think that it would be hard to write (after the actual logic was identified). But it would introduce effort for the rosdep maintainers to keep up with the rules implemented by each pip version:
trusty: 1.5.4
xenial: 8.1.1
bionic: 9.0.1
pip upstream: 18.0.0

@dirk-thomas
Copy link
Member

I don't think that it would be hard to write (after the actual logic was identified). But it would introduce effort for the rosdep maintainers to keep up with the rules implemented by each pip version:

If each PIP version mangles differently which variation of the mangled package names would you propose to store in the rosdep database then?

@mathias-luedtke
Copy link
Contributor

If each PIP version mangles differently which variation of the mangled package names would you propose to store in the rosdep database then?

pip does not really mangle the name. There is one name that is used by pip internally (the one that is reported by pip show, as provided in this PR).
However, different pip version have different fallback strategies.
pip accepts the wheel name for install as well, but pip does not really relate the wheel name to the pip name in pip show and pip list.
This results in a situation in which rosdep works for installing pip packages, but fails to detect them on some/most systems.

I see two alternatives:

  1. if we use the canonocal name in the rosdep database, then the pip issues will be fixed for all users without need to upgrade, after rosdep update (without admin rights).
  2. We should at least store the case-sensitive pip name (which matches the name in setup.py) and could implement the character replacement (underscore->dash only?) in rosdep.

While the latter is consistent with setup.py, it still has at least one (minor) issue:
If a user resolve a rosdep key to a pip key (rosdep resolve), this resolved won't work with pip show for some pip versions.

@dirk-thomas
Copy link
Member

dirk-thomas commented Jul 26, 2018

pip does not really mangle the name. There is one name that is used by pip internally (the one that is reported by pip show, as provided in this PR).

A package named e.g. catkin_pkg in its setup.py is being mangled by pip (or anything in the pipeline until then) as it reports it as catkin-pkg in e.g. pip freeze. That is a transformation of the package name.

if we use the canonocal name in the rosdep database, then the pip issues will be fixed for all users without need to upgrade, after rosdep update (without admin rights).

Please use fixed terminology for the conversation. "Canonical" seems to be not agreed on - I defined it above as the name in the upstream source. In this context I think you mean the name reported by pip. If we can't get a consensus on the term we shouldn't use it at all to avoid confusion.

While the latter is consistent with setup.py, it still has at least one (minor) issue:
If a user resolve a rosdep key to a pip key (rosdep resolve), this resolved won't work with pip show for some pip versions.

A straight forward solution to this case would be to apply the transformation in the rosdep resolve logic then. The only thing I advocate for is to not change the input data since the same can be achieved by transforming the input data by the code which uses it.

@ipa-mdl Do you agree that if you can manually rewrite rosdep keys in the database [your option 1] that the very same effect can be achieved by keeping the rosdep database untouched and transforming the value in some code (wherever that logic would be) [your option 2]?

@mathias-luedtke
Copy link
Contributor

mathias-luedtke commented Jul 26, 2018

TL;DR: This PR should at least get refactored to fix the case of the pip entries. It would be great to find any solution for the "dash problem" as well.

Do you agree that if you can manually rewrite rosdep keys in the database [your option 1] that the very same effect can be achieved by keeping the rosdep database untouched and transforming the value in some code (wherever that logic would be) [your option 2]?

I don't agree. The manual process is case specific (-> for each package) and can be verified with pip show.
The automatic transformation must be generic and is harder to proof.
In addition the input data is harder to verify. (pypi.org resolves both versions, with dash and without)

A package named e.g. catkin_pkg in its setup.py is being mangled by pip (or anything in the pipeline until then) as it reports it as catkin-pkg in e.g. pip freeze. That is a transformation of the package name.

The rosdep database does not contain the Python/PyPi package name, but whatever is the internal name in the respective installer (e.g. pip). That's what rosdep was created for (?).

The only thing I advocate for is to not change the input data since the same can be achieved by transforming the input data by the code which uses it.

This would require every data user to know about the transformation which is buried in some code.

@dirk-thomas
Copy link
Member

This PR should at least get refactored to fix the case of the pip entries.

I absolutely agree with you on this part. 👍 e.g. adafruit-pca9685 should be changed to Adafruit_PCA9685 (actually putting the actual "upstream" package name into the rosdep database).

In addition the input data is harder to verify.

All of the following URLs resolve to a single page (the first one which is the name of the :

Same for catkin_pkg:

So updating the existing rosdep entries to the actual package name seems to be fairly straight forward and scriptable. Also for instructions it is easy to recommend using the name which the PyPI page actually resolves to aka the "upstream" package name.

@tfoote
Copy link
Member

tfoote commented Jul 26, 2018

At some level I understand the desire to use the declared name. But pip clearly prefers and basically changes strongly recommends to enforces that there's no underscores.

Some commands work with either the underscored version or the dashed version. But some only work with the dashed version.

And pip always outputs the dashed version in it's listings.

Underscore cannot be used for pip show

$ pip show catkin-pkg
---
Metadata-Version: 1.1
Name: catkin-pkg
Version: 0.4.6
Summary: catkin package library
Home-page: http://wiki.ros.org/catkin_pkg
Author: Dirk Thomas
Author-email: dthomas@osrfoundation.org
License: BSD
Location: /usr/lib/python2.7/dist-packages
Requires: argparse, docutils, python-dateutil, pyparsing
Classifiers:
  Programming Language :: Python
  License :: OSI Approved :: BSD License
Entry-points:
  [console_scripts]
  catkin_create_pkg = catkin_pkg.cli.create_pkg:main
  catkin_find_pkg = catkin_pkg.cli.find_pkg:main
  catkin_generate_changelog = catkin_pkg.cli.generate_changelog:main_catching_runtime_error
  catkin_package_version = catkin_pkg.cli.package_version:main
  catkin_prepare_release = catkin_pkg.cli.prepare_release:main
  catkin_tag_changelog = catkin_pkg.cli.tag_changelog:main
  catkin_test_changelog = catkin_pkg.cli.test_changelog:main
You are using pip version 8.1.1, however version 18.0 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
$ pip show catkin_pkg
You are using pip version 8.1.1, however version 18.0 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.

pip freeze only reports the dashed version.

$ pip freeze | grep catkin
You are using pip version 8.1.1, however version 18.0 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
catkin-pkg==0.4.6
catkin-pkg-modules==0.4.6
catkin-tools==0.4.4

I agree that it would be relatively easy to write the logic to mangle a rule for catkin_pkg into catkin-pkg inside rosdep before checking for the output. And it would also be relatively easy to make sure we always do that before using any other internal function to rosdep. But what that doesn't account for is other use cases where there's a person in the loop. If I'm looking at the database it it says there's a pip package 'catkin_pkg' I might check if it's installed manually by running pip freeze | grep catkin_pkg because I saw that rule, and it will find nothing. And the user must know that under the hood there's a translation going on. This is definitely not obvious nor is it easy to discover.

pip effectively enforces the PEP8 packages shouldn't use underscore and makes sure that there's only one representation. Since this is specifically the pip package manager and pip reports the packages using the dashed version, I'd suggest that we use the pip reported name as the canonical name for our rules. This will avoid any requirement to mangle the names, it will be completely clear what is going on when you install something. And the key will work for all pip commands

@wjwwood
Copy link
Member

wjwwood commented Jul 26, 2018

For me, the - isn't clearly better, because there are other cases where the - is not preferred over the upstream name, e.g.:

So if they're even close in terms of correctness, my preference would be for _, since you can never have 100% consistency with - as you cannot have that as the module name:

>>> import catkin-pkg
  File "<stdin>", line 1
    import catkin-pkg
                 ^
SyntaxError: invalid syntax

🤷‍♂️

@wjwwood
Copy link
Member

wjwwood commented Jul 26, 2018

If I'm looking at the database it it says there's a pip package 'catkin_pkg' I might check if it's installed manually by running pip freeze | grep catkin_pkg because I saw that rule, and it will find nothing. And the user must know that under the hood there's a translation going on. This is definitely not obvious nor is it easy to discover.

I agree that's not good, but maybe the right answer is to give hints for people (maybe when using rosdep from the command line a hint comes up that swapping _ and - might be required).

In the end, I feel that this is a problem that PyPI has invented (by choosing a convention that's contrary what the community is doing ergonomically), so I feel we're not to blame if someone confuses catkin_pkg with catkin-pkg.

@tfoote tfoote added the rosdep Issue/PR is for a rosdep key label Jan 16, 2019
@clalancette
Copy link
Contributor

At the risk of wading into something that I don't fully understand...

This has been open for 2 years, with no real movement in almost 2 years. The title doesn't really reflect what it does. There is a merge conflict. And there seems to be no agreement here. So I'm going to tentatively close this PR. If you feel that we should keep this open, please feel free to reopen.

@clalancette clalancette closed this Jun 2, 2020
@33thou
Copy link

33thou commented Oct 26, 2020

For what its worth, the dash version is called the "normalized name" as per this PEP:

There are many implementations of a Python package repository and many tools that consume them. Of these, the canonical implementation that defines what the "simple" repository API looks like is the implementation that powers PyPI. This document will specify that API, documenting what the correct behavior for any implementation of the simple repository API.
...
This PEP references the concept of a "normalized" project name. ... The name should be lowercased with all runs of the characters ., -, or _ replaced with a single - character

Also, interesting to notes that now it seems https://pypi.org/project/catkin_pkg/ redirects to https://pypi.org/project/catkin-pkg/

It is indeed unfortunate that the normalized name is not usable in an actual python import statement. I fell down this rabbit hole while trying to decide what to name my own packages: repos, directories, package name, etc.

@clalancette clalancette deleted the pip_underscore branch January 19, 2021 14:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
rosdep Issue/PR is for a rosdep key
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants