✨ Increase PyPI parsing flexibility #3423
Note, you'll need to fix DCO on the commits you already have. Another repo has written a nice guide on fixing it here: I've enabled the rest of the CI to run, where the linter may pick up some things. I'll review the rest of the change when I have some time. |
Codecov Report
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3423      +/-   ##
==========================================
+ Coverage   63.80%   66.54%   +2.73%
==========================================
  Files         183      183
  Lines       12951    12989      +38
==========================================
+ Hits         8263     8643     +380
+ Misses       4228     3861     -367
- Partials      460      485      +25
… hope its unique Signed-off-by: Josh Cogan <joshgc@google.com>
…t to support gitlab better Signed-off-by: Josh Cogan <joshgc@google.com>
…t check there is a single valid url Signed-off-by: Josh Cogan <joshgc@google.com>
|
These just look like capitalization issues. I think it's safe to normalize for your deduplication logic with
Agree, unless we have a list of tags we prefer ("source code", "repository", etc.) to try first, before falling back to the "parse everything" approach.
Some sort of counting approach could help? Of course this may introduce issues for other repos.
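The preferred-tag idea above could look roughly like the following Python sketch. It is illustrative only and is not the code in this PR: the function name `pick_preferred_url` and the list `PREFERRED_LABELS` are hypothetical, and only "source code" and "repository" (plus the existing "Source" key) are taken from this thread.

```python
# Hypothetical preference order. Only "source code" and "repository" are named
# in the thread; "source" mirrors the key the current parser already reads.
PREFERRED_LABELS = ["source", "source code", "repository"]


def pick_preferred_url(project_urls):
    """Return the URL under the highest-ranked label, or None.

    None signals that the caller should fall back to the
    "parse everything" approach.
    """
    # Compare labels case-insensitively, since PyPI labels are free-form.
    by_label = {label.strip().lower(): url for label, url in project_urls.items()}
    for label in PREFERRED_LABELS:
        if label in by_label:
            return by_label[label]
    return None
```

Returning `None` instead of raising keeps the fallback decision with the caller, so the ranked lookup can be layered on top of the broader scan without changing it.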
|
…fix filter to remove .git Signed-off-by: Josh Cogan <joshgc@google.com>
Yes, I made this case insensitive. Even without the ranked key lookup we have 5x the hit rate! I can also add the list lookup method. I ended up having ~10 items in it before |
LGTM thanks, just the small question. And the PR will need to be marked ready for review.
Just need DCO fixed again for your very last commit |
Done. Sorry/Thanks!! |
Thanks for the PR!
* Make PyPI parsing more flexible to find any github or gitlab url, and hope its unique
  Signed-off-by: Josh Cogan <joshgc@google.com>
* Refactor the addRepo to not pass around a mutable object. Tweak a test to support gitlab better
  Signed-off-by: Josh Cogan <joshgc@google.com>
* Ignore users called sponsors for github repos. Remove the set and just check there is a single valid url
  Signed-off-by: Josh Cogan <joshgc@google.com>
* Remove unneeded variables and code
  Signed-off-by: Josh Cogan <joshgc@google.com>
* Reducing indentation
  Signed-off-by: Josh Cogan <joshgc@google.com>
* Make github url path parts case insensitive and use more explicit suffix filter to remove .git
  Signed-off-by: Josh Cogan <joshgc@google.com>
* Appease the linter--may its wisdown never wane.
  Signed-off-by: Josh Cogan <joshgc@google.com>
* CamelCase -> camelCase to prevent export
  Signed-off-by: Josh Cogan <joshgc@google.com>
* Add test and allowance for gitlab to also be case insensitive
  Signed-off-by: Josh Cogan <joshgc@google.com>
* hub vs lab typo
  Signed-off-by: Josh Cogan <joshgc@google.com>

---------

Signed-off-by: Josh Cogan <joshgc@google.com>
Signed-off-by: Allen Shearin <allen.p.shearin@gmail.com>
What kind of change does this PR introduce?
This is likely a non-breaking change, but because PyPI metadata is so diverse I can't prove it doesn't break on any real-world package. I analyzed 396 packages (those I'm using in my current project). Of those, only 63 were parsed correctly by the current parsing (before this PR), and this PR breaks none of them.
After this PR, however, 329 of the 396 packages are parsed correctly.
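For reference, the counts above work out to the following rates. This is a small Python check of the arithmetic only; the variable names are illustrative.

```python
total = 396   # packages analyzed
before = 63   # parsed correctly before this PR
after = 329   # parsed correctly after this PR

before_rate = before / total
after_rate = after / total
improvement = after / before

print(f"before: {before_rate:.1%}")      # before: 15.9%
print(f"after:  {after_rate:.1%}")       # after:  83.1%
print(f"ratio:  {improvement:.1f}x")     # ratio:  5.2x
```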
Improvement opportunities:
What is the current behavior?
Only ["Info"]["project_urls"]["Source"] is used to find a GitHub URL.
What is the new behavior (if this is a feature change)?
Look through all the ["Info"]["project_urls"] and ["Info"]["project_url"] to find all possible repos. Fail if that number is anything but 1.
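A rough Python sketch of this behavior, for illustration only (it is not the implementation in this PR). The names `normalize_repo`, `find_repo`, and `REPO_HOSTS` are hypothetical. The sketch reads the lowercase `info` key, which is how the PyPI JSON API names it, and it assumes the normalization described in the commits: case-insensitive paths, a stripped `.git` suffix, and GitHub `sponsors` pages ignored.

```python
from urllib.parse import urlparse

# Hosts treated as source repositories in this sketch (an assumption).
REPO_HOSTS = {"github.com", "gitlab.com"}


def normalize_repo(url):
    """Return 'host/owner/repo' in lowercase, or None if url is not a repo URL."""
    parsed = urlparse(url.strip())
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    if host not in REPO_HOSTS:
        return None
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        return None
    owner, repo = parts[0].lower(), parts[1].lower()
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    # github.com/sponsors/<user> is a funding page, not a repository.
    if host == "github.com" and owner == "sponsors":
        return None
    if not repo:
        return None
    return f"{host}/{owner}/{repo}"


def find_repo(pypi_json):
    """Collect every candidate URL and require exactly one distinct repo."""
    info = pypi_json.get("info") or {}
    candidates = list((info.get("project_urls") or {}).values())
    if info.get("project_url"):
        candidates.append(info["project_url"])
    # A set deduplicates URLs that differ only in case or a .git suffix.
    repos = {r for r in (normalize_repo(u) for u in candidates if u) if r}
    if len(repos) != 1:
        raise ValueError(f"expected exactly 1 repository, found {len(repos)}")
    return repos.pop()
```

Normalizing before counting means that two links to the same repository with different capitalization still count as one, while two genuinely different repositories cause a failure instead of a guess.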
Which issue(s) this PR fixes
Fixes #3249
Special notes for your reviewer
None
Does this PR introduce a user-facing change?