This repository has been archived by the owner on Feb 22, 2022. It is now read-only.

pypidb #26

Closed
jayvdb opened this issue Mar 29, 2020 · 7 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@jayvdb

jayvdb commented Mar 29, 2020

If I understand correctly, this project would be where it would be useful to add https://github.com/jayvdb/pypidb , which can resolve the PyPI->SCM link very reliably.

I plan to identify the SCM type (jayvdb/pypidb#29), which might be helpful here but wouldn't be a blocker, I guess, since filtering by URL prefix is the existing strategy here and works well when only a few SCMs are supported.
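
To make the prefix-filtering strategy concrete, here is a minimal sketch. The prefix table and the function name `scm_host_for` are assumptions made up for illustration; they are not taken from this project or from pypidb.

```python
from typing import Optional

# Hypothetical prefix table (an assumption for illustration); a real
# deployment would list whichever SCM hosts it actually supports.
SCM_PREFIXES = {
    "https://github.com/": "github",
    "https://gitlab.com/": "gitlab",
    "https://bitbucket.org/": "bitbucket",
}


def scm_host_for(url: str) -> Optional[str]:
    """Return a label for the SCM host of `url`, or None if no prefix matches."""
    normalized = url.strip().lower()
    # Treat http:// and a leading "www." as equivalent to the https:// form.
    if normalized.startswith("http://"):
        normalized = "https://" + normalized[len("http://"):]
    if normalized.startswith("https://www."):
        normalized = "https://" + normalized[len("https://www."):]
    for prefix, label in SCM_PREFIXES.items():
        if normalized.startswith(prefix):
            return label
    return None


print(scm_host_for("https://github.com/jayvdb/pypidb"))
```

This stays simple as long as the table is short, which is the trade-off mentioned above: every additional SCM host needs its own prefix entry.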

@fridex
Contributor

fridex commented Mar 30, 2020

Hi @jayvdb!

Thanks for reaching out! The project seems very interesting. However, it's not clear to me from the README how the detection is performed. Does it follow PyPI's Homepage or other links present on PyPI?

@jayvdb
Author

jayvdb commented Mar 30, 2020

Yes, it uses the URLs in the PyPI metadata, taken from the URL fields in the JSON and also from the text/markup fields in the JSON. It finds the best one and validates it.
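
As a rough illustration of collecting candidate URLs, the sketch below reads the `info` object that PyPI's JSON API returns (`home_page`, `download_url`, `project_urls`, `description`). The function name `candidate_urls`, the regular expression, and the sample data are assumptions for illustration, not pypidb's code. Ranking and validating the candidates is left out.

```python
import re
from typing import Any, Dict, List

# Loose pattern for URLs embedded in free text (an assumption for this sketch).
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")


def candidate_urls(metadata: Dict[str, Any]) -> List[str]:
    """Collect candidate URLs from a PyPI JSON metadata document."""
    info = metadata.get("info") or {}
    found: List[str] = []
    # Dedicated URL fields first.
    for key in ("home_page", "download_url"):
        value = info.get(key)
        if value:
            found.append(value)
    found.extend(v for v in (info.get("project_urls") or {}).values() if v)
    # Then URLs embedded in the text/markup of the description.
    found.extend(URL_PATTERN.findall(info.get("description") or ""))
    # De-duplicate while preserving order; drop trailing punctuation.
    seen = set()
    unique: List[str] = []
    for url in found:
        url = url.rstrip(".,;")
        if url not in seen:
            seen.add(url)
            unique.append(url)
    return unique


# Made-up sample metadata, shaped like the "info" section of the PyPI JSON API.
sample = {
    "info": {
        "home_page": "https://example.org/",
        "download_url": None,
        "project_urls": {"Source": "https://github.com/example/demo"},
        "description": "Docs at https://demo.example.org/docs. "
                       "Code: https://github.com/example/demo",
    }
}
print(candidate_urls(sample))
```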

The selection process is augmented with a set of rules in https://github.com/jayvdb/pypidb/blob/master/pypidb/_rules.py that guide the engine through the available URLs and also map PyPI email addresses and package namespaces (e.g. zope.foo) to GitHub orgs or repositories if all URLs have been rejected. In addition, a commit SHA can be added per project to capture URL changes in the source repository's setup.py/pyproject.toml/etc. that haven't been released to PyPI yet. This means PRs in the source repo can be used immediately in pypidb, and we don't need to push the project to do a release, which usually aggravates the maintainers.

The objective is always to use good URLs provided in the metadata and validate them; the email/namespace mappings are only a backup if the metadata doesn't provide good URLs.
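
The "metadata first, mappings as backup" ordering could be sketched roughly as follows. Everything here is hypothetical: the mapping tables, the `resolve_repository` name, and the `is_valid` callback are illustrative assumptions and do not reflect the contents of `_rules.py`.

```python
from typing import Callable, Dict, Iterable, Optional

# Hypothetical fallback tables (assumptions for illustration only).
NAMESPACE_FALLBACKS: Dict[str, str] = {
    "zope": "https://github.com/zopefoundation",
}
EMAIL_FALLBACKS: Dict[str, str] = {
    "maintainer@example.org": "https://github.com/example-org",
}


def resolve_repository(
    name: str,
    urls: Iterable[str],
    author_email: Optional[str],
    is_valid: Callable[[str], bool],
) -> Optional[str]:
    """Prefer a validated metadata URL; fall back to mappings only if none pass."""
    # 1. Metadata URLs always win when one of them validates.
    for url in urls:
        if is_valid(url):
            return url
    # 2. Namespace mapping, e.g. "zope.foo" -> the org mapped to "zope".
    namespace = name.split(".", 1)[0] if "." in name else None
    if namespace in NAMESPACE_FALLBACKS:
        return f"{NAMESPACE_FALLBACKS[namespace]}/{name}"
    # 3. Email mapping as the last resort.
    if author_email and author_email.lower() in EMAIL_FALLBACKS:
        return EMAIL_FALLBACKS[author_email.lower()]
    return None


# Stand-in validator: a real check would involve fetching the URL.
looks_like_github = lambda url: url.startswith("https://github.com/")
print(resolve_repository("zope.foo", ["https://example.org/"], None, looks_like_github))
```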

@jayvdb
Author

jayvdb commented Mar 30, 2020

One black-box way to get a feel for it is to look at one test dataset, a sample of explicit mappings in https://github.com/jayvdb/pypidb/blob/master/tests/data.py , where:

  1. "exact_*" vs "mismatch_*" is whether the URL is "similar" to the PyPI project name; similarity is measured in _compute_similarity, which is currently not well optimised but is cheaper than extra network traffic.
  2. "*_fetched" vs "*_metadata" is whether fetches (maximum of 5) were required to arrive at the correct URL, or whether the URL decision required only the PyPI metadata.
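
For a feel of what a name-to-URL similarity check can look like, here is a sketch built on the standard library's `difflib.SequenceMatcher`. It is not pypidb's `_compute_similarity`; the function names and the 0.8 threshold are assumptions chosen for illustration.

```python
import re
from difflib import SequenceMatcher
from urllib.parse import urlparse


def normalize(name: str) -> str:
    """Lower-case and collapse runs of '-', '_' and '.' into a single '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def name_url_similarity(project: str, url: str) -> float:
    """Compare the project name with the last path segment of the URL."""
    path = urlparse(url).path.strip("/")
    slug = path.rsplit("/", 1)[-1] if path else ""
    if slug.endswith(".git"):
        slug = slug[: -len(".git")]
    return SequenceMatcher(None, normalize(project), normalize(slug)).ratio()


def classify(project: str, url: str, threshold: float = 0.8) -> str:
    """Label a pair "exact" or "mismatch"; the threshold is an assumed value."""
    return "exact" if name_url_similarity(project, url) >= threshold else "mismatch"


print(classify("requests", "https://github.com/psf/requests"))
```

A purely local comparison like this costs no network requests, which matches the point above about preferring it to extra fetches.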

@sesheta
Member

sesheta commented Dec 9, 2021

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@sesheta sesheta added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 9, 2021
@sesheta
Member

sesheta commented Jan 8, 2022

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle rotten

@sesheta sesheta added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 8, 2022
@sesheta
Member

sesheta commented Feb 7, 2022

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

/close

@sesheta
Member

sesheta commented Feb 7, 2022

@sesheta: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sesheta sesheta closed this as completed Feb 7, 2022