pypidb #674

Closed
jayvdb opened this issue Mar 29, 2020 · 15 comments

@jayvdb

jayvdb commented Mar 29, 2020

Hi,

It looks like this project is almost complete, which is great to see.
A heads up: I am using fedora.json as a dataset listing Fedora Python packages for testing https://github.com/jayvdb/pypidb , which locates the SCM for PyPI packages. The methodology is to not rely on the links provided in the PyPI metadata, and it also doesn't include explicit mappings, which would be brittle.

There are some packages I couldn't/didn't map to PyPI names in https://github.com/jayvdb/pypidb/blob/ba7740e/tests/datasets.py#L46 , and I am currently only looking at packages prefixed python[23]-* and py*. There is a mapping of Fedora names to PyPI names directly above that, which may be incomplete.

The list of those PyPI packages in Fedora that I do not yet find an SCM for is at https://github.com/jayvdb/pypidb/blob/master/tests/test_fedora.py#L50 . I suspect some of those will be missing mappings from Fedora names to PyPI names, which the team here might be able to spot quickly.

The resulting mappings to SCM might be helpful for this project in various ways, e.g.

  • doing additional analysis (an area I would love to go into is checking whether downstream projects are running CI, esp on Python 3.8),
  • remediation downstream (the obvious is updating setup.py/setup.cfg/etc to emit Python 3.x metadata), and
  • linking directly to the SCM for packages which haven't been ported yet.
  • also checking whether .spec files are using current URLs (the logic in pypidb is very often identifying readthedocs websites, but that isn't exposed via the API yet; c.f. Split homepage, scm and repo links jayvdb/pypidb#29).
@encukou
Member

encukou commented Mar 30, 2020

Yes, this project is winding down; with <150 packages remaining it's getting to the point where having a database is unnecessary. And I already update it less and less often.

I'm not sure what you are asking here. Do you want help mapping Fedora packages to SCM URLs, or Fedora packages to PyPI packages? Or do you want some help with one of the possible projects you mention?

@jayvdb
Author

jayvdb commented Mar 30, 2020

Hi @encukou ,

Help mapping Fedora packages to PyPI packages would be very helpful, if the team here are able to use their knowledge or tools to address that, and find it beneficial to have that mapping.

Besides that, I created the issue to see if there is interest here in using pypidb to improve the linkage from Fedora packages to the source repositories for any reason that might be on the roadmap for Fedora. While this project is nearing completion, it surely has spawned other side-projects for improving Fedora Python that would also require systematic tasks that this project has facilitated.

@encukou
Member

encukou commented Apr 1, 2020

Help mapping Fedora packages to PyPI packages would be very helpful, if the team here are able to use their knowledge or tools to address that, and find it beneficial to have that mapping.

Binary RPMs provide python3dist(NAME), where NAME is the name the project would have if uploaded to PyPI. That's the most reliable way to get the name.
Of course, not every project is uploaded to PyPI.
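
As a rough illustration of reading names out of those virtual provides, here is a minimal sketch. It assumes the provides are already available as plain strings (for example, the lines printed by `rpm -q --provides`); the function name and the sample values are hypothetical, not taken from portingdb or pypidb.

```python
import re

# Matches "python3dist(name)" and versioned forms such as "python3.8dist(name)".
_DIST_PROVIDE = re.compile(r"^python3(?:\.\d+)?dist\(([^)]+)\)")


def pypi_names_from_provides(provides):
    """Return the sorted, de-duplicated dist names found in RPM Provides strings."""
    names = set()
    for line in provides:
        match = _DIST_PROVIDE.match(line.strip())
        if match:
            names.add(match.group(1))
    return sorted(names)


# Hypothetical sample, shaped like RPM Provides lines for a python3-ldap package.
sample = [
    "python3-ldap = 3.1.0-9.fc32",
    "python3-ldap(x86-64) = 3.1.0-9.fc32",
    "python3.8dist(python-ldap) = 3.1",
    "python3dist(python-ldap) = 3.1",
]
print(pypi_names_from_provides(sample))  # ['python-ldap']
```

Packages that ship no .dist-info or .egg-info metadata produce no such provide, so an empty result does not by itself mean the project is absent from PyPI.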

side-projects for improving Fedora Python that would also require systematic tasks that this project has facilitated.

Currently don't really have a task where a mapping to PyPI names or SCM URLs would help.
Usually you need the SCM when making a pull request upstream, but finding the repo is trivial compared to understanding the code, reading contribution guidelines, etc.

@jayvdb
Author

jayvdb commented Apr 1, 2020

Binary RPMs provide python3dist(NAME), where NAME is the name the project would have if uploaded to PyPI. That's the most reliable way to get the name.

Seems you haven't looked at the list I provided yet, and don't seem to be interested, so I'll close it.

@jayvdb jayvdb closed this as completed Apr 1, 2020
@encukou
Member

encukou commented Apr 2, 2020

Seems you haven't looked at the list I provided yet

Do you mean https://github.com/jayvdb/pypidb/blob/ba7740e/tests/datasets.py#L46 and https://github.com/jayvdb/pypidb/blob/master/tests/test_fedora.py#L50 ?
I've looked at them, but I don't know what I should do with them. Use them? Help maintain them? Generate a better version? (I don't know how they're currently generated.)

don't seem to be interested

Linking PyPI names or SCM URLs doesn't sound useful to me, currently. But I'll keep pypidb in mind if I find a project where it's needed.

@jayvdb
Author

jayvdb commented Apr 2, 2020

Help maintain them? Generate a better version? (I don't know how they're currently generated.)

They are manually maintained because they are names which can't be automatically matched using the semi-automated matching algorithms in get_pypi_name, or the three fully-automated matching algorithms in https://github.com/jayvdb/pypidb/blob/0baa777/tests/test_fedora.py#L124 .

What would be really great is if Fedora undertook a project to rename their packages to match PyPI name so there is a consistent naming convention, or even only the subset of cases where the Fedora name clashes with a different PyPI project with the same name, which is very confusing.

Also working with upstream projects to have them published on PyPI. I have been doing that with openSUSE projects which were on the equivalent openSUSE list, and most upstream projects are happy to publish onto PyPI but often would like someone to help fix/test the setup.py changes needed.

@encukou
Member

encukou commented Apr 2, 2020

Thanks for explaining! I see more clearly what you're trying to say. Even though my answer doesn't change, I can at least explain a bit more.

Both projects you mention would be nice. I'll certainly encourage packagers to use predictable component names. But I can't commit to driving a project to rename/publish them all. The renaming especially would be a much bigger project than it might seem.

Some name clashes come with backwards compatibility issues. Keep in mind that Fedora is older than PyPI.
I don't think the two namespaces can realistically be joined; there'll always be some exceptions.

Two examples to think about:

  • Should the Ansible RPM be named python-ansible? I don't think it should. Same for all other tools – text editors, command-line helpers, anything that's not primarily an importable library.
  • The python-ldap Fedora package is python-ldap on PyPI. Should it be released as ldap on PyPI? (It would be nice but it turns out pip can't handle such a name change when updating.)

While systems with ad-hoc rules and exceptions might be useful to get general overviews (like portingdb), it would be hard to build on top of them in the long term, and it'll be hard to maintain them. (portingdb sure is pretty bad, but at least it's going away after Python 2 is gone.)
So instead of guessing names, automated tools around Fedora can use python3dist(NAME) virtual provides. We're working to make that reliable and useful, because it can become a solid standard to base other things on.

@jayvdb
Author

jayvdb commented Apr 3, 2020

@encukou , one of the automated matching algorithms I use is adding/removing python- prefixes. The package names I linked to are not such simple cases.
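
To make the "adding/removing python- prefixes" idea concrete, here is a minimal sketch of such a matcher. It is not pypidb's actual code; the function name and the exact prefix list are assumptions made for illustration.

```python
def candidate_pypi_names(fedora_name):
    """Return PyPI name guesses for a Fedora package name, most literal first."""
    prefixes = ("python3-", "python2-", "python-")
    candidates = [fedora_name]

    # Guess 1: drop a leading python prefix, if there is one.
    stripped = fedora_name
    for prefix in prefixes:
        if fedora_name.startswith(prefix):
            stripped = fedora_name[len(prefix):]
            break
    if stripped != fedora_name:
        candidates.append(stripped)

    # Guess 2: the bare name with a "python-" prefix added.
    with_prefix = "python-" + stripped
    if with_prefix not in candidates:
        candidates.append(with_prefix)
    return candidates


print(candidate_pypi_names("python3-ldap"))  # ['python3-ldap', 'ldap', 'python-ldap']
print(candidate_pypi_names("ansible"))       # ['ansible', 'python-ansible']
```

Each candidate would still need to be checked against PyPI, and the names on the linked lists are the ones where no guess of this kind finds the right project.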

Red Hat is older than PyPI, but Fedora isn't :P

While systems with ad-hoc rules and exceptions might be useful to get general overviews (like portingdb), it would be hard to build on top of them in the long term, and it'll be hard to maintain them.

This highlights another way this can help. After portingdb, where might I obtain a good list of Fedora Python package names? I originally fetched all RPM specs and obtained the names from those; however, that is a huge download for the purposes of obtaining a list of 3000 names.

I have built the list of ad hoc data about Fedora package name mismatches; however, that is not where I expect it to end, and this issue is part of trying to plot a path forward. But before finding where to maintain the data, the data needs to be checked. You can see in my ad hoc data lists that I often mention where the Fedora .spec has incorrectly used a %define pypi_name .. or similar bad data in the spec files. Obviously, fixing those would make it possible to maintain mismatches in the .spec files, which might be the best way to maintain them: it would be easy for rpm macros or rpmlint to verify that the pypi name is correct, or that the spec explicitly informs the checker that it doesn't have a pypi name.

Where packages can't be published on PyPI in a reasonable timeframe, Fedora might like to create dummy packages on PyPI with appropriate names, to prevent joe-jobs. This is the rationale I used with great effect to encourage package maintainers to get onto PyPI, as the adverse security implications are quite understandable and motivating.

So instead of guessing names, automated tools around Fedora can use python3dist(NAME) virtual provides. We're working to make that reliable and useful, because it can become a solid standard to base other things on.

So we do have shared objectives. All of the names on my list will almost certainly be wrong data emitted by python3dist(NAME). I've already provided a list of very high probability problems that need further analysis - or, python3dist(NAME) will identify a few cases where my mapping data is missing entries.

@encukou
Member

encukou commented Apr 3, 2020

After portingdb, where might I obtain a good list of Fedora Python package names? I originally fetched all RPM specs and obtained the names from those; however, that is a huge download for the purposes of obtaining a list of 3000 names.

That depends on what you consider a Python package. Something that is written entirely/mostly in Python? Has some Python script? Depends on Python directly/transitively? Installs an importable Python module? Has PyPA metadata? Needs Python always/optionally to build/run?

One of these commands might do the job:

$ dnf repoquery --repo rawhide --whatprovides '/usr/lib*/python*/site-packages/*'
$ dnf repoquery --repo rawhide --whatprovides 'python3dist(*'
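
Either command prints one package per line. To turn that into a plain list of package names, the output can be post-processed along these lines. This is a sketch that assumes the lines are in the name-epoch:version-release.arch form; the helper names and the sample lines are made up for illustration.

```python
def name_from_nevra(nevra):
    """Extract the package name from a name-[epoch:]version-release.arch string."""
    rest, _, _arch = nevra.rpartition(".")      # drop ".arch"
    rest, _, _release = rest.rpartition("-")    # drop "-release"
    name, _, _version = rest.rpartition("-")    # drop "-[epoch:]version"
    if not name:
        raise ValueError(f"not a NEVRA string: {nevra!r}")
    return name


def package_names(repoquery_output):
    """Return sorted unique package names from multi-line repoquery output."""
    lines = (line.strip() for line in repoquery_output.splitlines())
    return sorted({name_from_nevra(line) for line in lines if line})


# Hypothetical output; real versions and releases will differ.
sample_output = """\
python3-requests-0:2.23.0-1.fc33.noarch
python3-ldap-0:3.1.0-9.fc33.x86_64
python3-ldap-0:3.1.0-9.fc33.i686
"""
print(package_names(sample_output))  # ['python3-ldap', 'python3-requests']
```

Splitting from the right matters because package names themselves contain dashes, while the version and release fields do not.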

You can see in my ad hoc data lists that I often mention where the Fedora .spec has incorrectly used a %define pypi_name .. or similar bad data in the spec files.

For collectd_systemd, evic, hwdata and possibly others: While pypi_name is a badly chosen name in this case, it's not actually used to generate python3dist(NAME). That's taken from actual metadata (e.g. setup.py).


But you're right that the python3dist name should match the PyPI package, or be blocked on PyPI. Otherwise we'll get mismatches between what the RPM and pip metadata means.
@hroncok, do you have an opinion? IMO this should be at least a SHOULD in the new packaging guidelines.

@hroncok
Member

hroncok commented Apr 3, 2020

Huh. There is absolutely no way for the packaging automation to know the Python package name is not registered on PyPI or that it is clashing with another software.

What do you propose the guidelines should say? Something like:

A packager should always check if the generated python3dist(...) name corresponds to the appropriate PyPI package and <do X> if that's not the case.

Where do X can be something like:

  • talk to upstream about hosting the project on PyPI or at least namesquat the name there
  • talk to upstream to rename their package if the PyPI package is a different software
  • disable the provides generator if this conflicts with another Fedora package

Also note that when the RPM is installed, pip considers the package installed as well regardless of whether it actually is the same as on PyPI.
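
The first half of the proposed check (is the generated name registered on PyPI at all?) could be scripted roughly as follows. This is a sketch, not existing Fedora tooling: it relies on PyPI's JSON endpoint, which answers 200 for a registered project and 404 otherwise, and the function names are hypothetical. It cannot tell whether the PyPI project is the same software; that part still needs a human.

```python
import urllib.error
import urllib.parse
import urllib.request

# PyPI's JSON API: 200 for a registered project, 404 for an unknown one.
PYPI_JSON_URL = "https://pypi.org/pypi/{name}/json"


def fetch_status(url):
    """Return the HTTP status code for a GET request to url."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def is_registered_on_pypi(dist_name, fetch_status=fetch_status):
    """Return True if dist_name is registered on PyPI, False if it is not.

    fetch_status is injectable so the logic can be exercised offline.
    """
    url = PYPI_JSON_URL.format(name=urllib.parse.quote(dist_name, safe=""))
    status = fetch_status(url)
    if status == 200:
        return True
    if status == 404:
        return False
    # Anything else (rate limiting, outage) is not an answer either way.
    raise RuntimeError(f"unexpected HTTP status {status} for {url}")
```

Calling `is_registered_on_pypi("python-ldap")` performs a real network request; passing a stub such as `fetch_status=lambda url: 404` keeps it offline.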

@encukou
Member

encukou commented Apr 3, 2020

A packager should always check if the generated python3dist(...) name corresponds to the appropriate PyPI package and <do X> if that's not the case.

Yes. I was thinking about the first two Xs.

Also note that when the RPM is installed, pip considers the package installed as well regardless of whether it actually is the same as on PyPI.

Yes, and that's a problem: the names in setup.py will mean different packages for pip and for Fedora.

@hroncok
Member

hroncok commented Apr 3, 2020

https://docs.fedoraproject.org/en-US/packaging-guidelines/Python/#_automatic_provides_with_a_standardized_name

When building a Python package, RPM looks for .dist-info and .egg-info files or directories in the %files sections of all packages. If one or more are found, RPM parses them to find the standardized name (i.e. dist name, name on PyPI) of the packaged software, and then automatically creates two Provides: tags in the following format:

Provides: python3.Ydist(CANONICAL_STANDARDIZED_NAME)
Provides: python3dist(CANONICAL_STANDARDIZED_NAME)

The 3.Y is the Python version used (usually 3.6 and higher), and between the parentheses is the name of the software in a canonical format used by Python tools and services such as setuptools, pip and PyPI. The canonical name is obtained by switching the standardized name to lower case and converting all runs of non-alphanumeric characters to single “-” characters. Example: “The $$$ Tree” becomes “the-tree”.
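
The canonicalization rule in the paragraph above can be sketched in a few lines. This follows the wording of the quoted guideline (lower-case, then collapse runs of non-alphanumeric characters into one dash); it is not the code of the actual RPM provides generator, and it assumes "alphanumeric" means ASCII letters and digits.

```python
import re


def canonical_name(standardized_name):
    """Lower-case the name and turn each run of non-alphanumerics into '-'."""
    return re.sub(r"[^a-z0-9]+", "-", standardized_name.lower())


print(canonical_name("The $$$ Tree"))  # the-tree
print("python3dist(%s)" % canonical_name("djangorestframework"))
```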

New paragraph:

The name is derived from the Python distribution package name (e.g. the name argument of the setup() function in setup.py). It means, it does not necessarily correspond with the provided module name (how the package is imported) -- for example, the djangorestframework Python package would provide python3dist(djangorestframework) even when imported via import rest_framework. For packages from PyPI, this is the same name as used there. Packages hosted somewhere else sometimes may have names clashing with different packages from PyPI or names missing from PyPI entirely. In that case, packagers SHOULD contact upstream in order to resolve this situation (by adding the package to PyPI and renaming it if necessary).


PS Should we reopen this or take it elsewhere?

PS2 I've noticed the guidelines also say "Using a fictional module named 'example', the subpackage containing the Python 3 version must provide python3-example." Which obviously is not followed at all (see pkg_resources, rest_framework, etc...).

@hroncok
Member

hroncok commented Apr 3, 2020

I've noticed the guidelines also say "Using a fictional module named 'example', the subpackage containing the Python 3 version must provide python3-example." Which obviously is not followed at all (see pkg_resources, rest_framework, etc...).

https://pagure.io/packaging-committee/issue/965

@jayvdb
Author

jayvdb commented Apr 4, 2020

I am primarily interested in importable Python modules, as they are dependencies that other packages may need.

$ dnf repoquery --repo rawhide --whatprovides '/usr/lib*/python*/site-packages/*'

Perfect; thanks.

fwiw, the openSUSE naming policy does enforce the PyPI name, with some exceptions that might help guide any Fedora policy around certain possible problems. However, I find it doesn't answer the hairy naming issues like . vs -, and it makes exceptions that are problematic, such as Jupyter kernels, which are often dependencies, if only in test suites of other packages. Worth a quick read.
https://en.opensuse.org/openSUSE:Packaging_Python#Naming_policy

@encukou
Member

encukou commented Apr 30, 2020

Hello again! We had some private discussions on topics like this. Sorry for the silence.
Today we proposed a draft of new Python Packaging Guidelines for Fedora, which will try to synchronize the "PyPI name" between PyPI and Fedora (python3dist(...))
https://hackmd.io/XzJe-sHUQvWK7cSrEH_aKg?both#PyPI-parity
We plan that Python dependencies will be specified using that, rather than importable module names.

As for dots vs. dashes, those can be put into a canonical form, which PyPI uses, using an algorithm described in PEP 503. All automated tools should convert to that form.
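
For reference, the PEP 503 normalization is a one-line regular expression: runs of ".", "-" and "_" become a single "-", and the result is lower-cased. The `same_project` helper below is a hypothetical convenience added for illustration, not part of the PEP.

```python
import re


def normalize(name):
    """Normalize a project name as specified in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def same_project(name_a, name_b):
    """Return True if two spellings refer to the same PyPI project name."""
    return normalize(name_a) == normalize(name_b)


print(normalize("Zope.Interface"))                       # zope-interface
print(same_project("zope.interface", "zope_interface"))  # True
```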
