Split out a LinkCollector class from PackageFinder #6910

cjerdonek · 2019-08-23T05:33:43Z

This PR splits a new LinkCollector class out of the PackageFinder class. LinkCollector is responsible solely for gathering all the links, without filtering them. This reduces the PackageFinder class to ~360 lines, and the rest of PackageFinder no longer depends on PipSession or on making network requests (it just filters and sorts strings).

The code in this PR should be logically equivalent to what was there before (so in particular no behavior change).

This PR makes progress e.g. on issue #6430 because the LinkCollector class's main collect_links() method provides the "output all possible links" functionality discussed in that issue. The PR also updates the index.py architecture section created in PR #6787 to reflect the new class.

A follow-up PR can move LinkCollector and the index.py functions it depends on to a separate module.
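To make the split concrete, here is a minimal sketch of the separation described above. It is not pip's actual code: the `ToyLinkCollector` and `ToyPackageFinder` names, the `fetch` callable, and the filename-based filtering rule are all assumptions made for illustration.

```python
from typing import Callable, Dict, List


class ToyLinkCollector:
    """Hypothetical stand-in: gathers links and does no filtering."""

    def __init__(self, fetch: Callable[[str], List[str]], index_urls: List[str]):
        # `fetch` is the only piece that would touch the network.
        self._fetch = fetch
        self._index_urls = index_urls

    def collect_links(self, project_name: str) -> Dict[str, List[str]]:
        # Map each page URL to every link found on that page, unfiltered.
        pages = {}
        for index_url in self._index_urls:
            page_url = "{}/{}/".format(index_url.rstrip("/"), project_name)
            pages[page_url] = self._fetch(page_url)
        return pages


class ToyPackageFinder:
    """Hypothetical stand-in: only filters and sorts strings."""

    def __init__(self, link_collector: ToyLinkCollector):
        self._link_collector = link_collector

    def find_all_candidates(self, project_name: str) -> List[str]:
        collected = self._link_collector.collect_links(project_name)
        # Assumed filtering rule, for illustration only.
        return sorted(
            link
            for links in collected.values()
            for link in links
            if link.endswith((".whl", ".tar.gz"))
        )
```

With this shape, the finder can be exercised with a fake `fetch` function and never needs a session object of its own.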

src/pip/_internal/index.py

cjerdonek · 2019-08-23T07:45:30Z

PR updated to use a setter, as @uranusjr requested. (Thanks for taking a look, @uranusjr!)

pradyunsg

I've not looked at the tests yet and haven't 100% reviewed this code. This PR is just slightly too big to review comfortably for me. Or... I'm just tired after writing 2 exams today. 🙃

Anyway, posting my review eagerly because shorter feedback loops are better.

src/pip/_internal/index.py

pradyunsg · 2019-08-23T10:52:08Z

src/pip/_internal/index.py

+    # (1) links from file locations,
+    # (2) links from find_links, and
+    # (3) a dict mapping HTML page url to links from that page.
+    CollectedLinks = Tuple[List[Link], List[Link], Dict[str, List[Link]]]


A good follow-up might be to convert this to a TypedDict, so that the key names give some hint about what the different values are for.

Or a namedtuple, you mean?

+1 for namedtuple (and I prefer it over typeddict)
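For reference, a namedtuple version of the alias above might look roughly like the following. This is only a sketch: the field names `files`, `find_links`, and `pages` are assumptions chosen to mirror the three comments in the diff, and `Link` is replaced by a plain `str` stand-in so the snippet is self-contained.

```python
from typing import Dict, List, NamedTuple

# Stand-in for pip's Link class, so this sketch runs on its own.
Link = str


class CollectedLinks(NamedTuple):
    # Field names are hypothetical; they follow comments (1)-(3) in the diff.
    files: List[Link]             # (1) links from file locations
    find_links: List[Link]        # (2) links from find_links
    pages: Dict[str, List[Link]]  # (3) HTML page url -> links from that page


collected = CollectedLinks(
    files=["file:///tmp/pkg-1.0.tar.gz"],
    find_links=["https://example.invalid/pkg-1.1.tar.gz"],
    pages={"https://example.invalid/simple/pkg/": ["pkg-1.2.tar.gz"]},
)

# Existing code that unpacks the plain tuple keeps working.
files, find_links, pages = collected
```

Callers can then write `collected.pages` instead of `collected[2]`, while positional unpacking still behaves as it did with the plain `Tuple` alias.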

cjerdonek · 2019-08-23T14:00:26Z

This PR is just slightly too big to review comfortably for me.

Anyway, posting my review eagerly because shorter feedback loops are better.

@pradyunsg Thanks! I can try breaking this up into two or three PRs so the number of lines will be smaller in each. Many of the lines are due to moving methods with no change in content, which gives the appearance of more happening than there really is.

cjerdonek · 2019-08-23T14:40:43Z

Okay, I created PR #6913 to do the method moves separately, which will make this PR a lot smaller afterwards.

Move some PackageFinder methods (preparation for PR #6910)

pradyunsg · 2019-08-24T02:49:58Z

Thanks for breaking this up! Even separate commits would've worked, and this works well too. 🙃

Many of the lines are due to moving methods with no change in content, which gives the appearance of more happening than there really is.

Yea, it's difficult to "see" what's a functionally equivalent change vs what's a moved method.

cjerdonek · 2019-08-24T02:56:57Z

No problem!

Yea, it's difficult to "see" what's a functionally equivalent change vs what's a moved method.

I agree! I'm always more than happy to break commits and/or PRs up for you or others if it helps make it easier to review. So don't ever hesitate to ask.

BrownTruck · 2019-08-24T06:00:03Z

Hello!

I am an automated bot and I have noticed that this pull request is not currently able to be merged. If you are able to either merge the master branch into this pull request or rebase this pull request against master then it will be eligible for code review and hopefully merging!

cjerdonek · 2019-09-10T09:02:06Z

Okay, this PR is ready to review again. The main hold-up was that I wanted to add at least one non-trivial unit test of the LinkCollector class's main collect_links() method. I did that, and it's a unit test (no network requests) since I mocked _get_html_response() for it, which is the underlying function responsible for making PackageFinder's network requests.

130 lines of this PR are for that unit test, and another 50 lines are for updating the package-finding.rst architecture document, so the PR isn't as big as it seems. The main change is to PackageFinder.find_all_candidates(), along with moving PackageFinder._get_pages() to LinkCollector.

I also addressed the previous review comments.
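The mocking approach described above can be sketched as follows. This is not the test added in the PR: `ToyCollector` is a hypothetical stand-in, and `_get_html_response()` is modeled here as a method (rather than the function in index.py) purely to keep the example self-contained.

```python
from unittest import mock


class ToyCollector:
    """Hypothetical stand-in for a class that fetches pages over the network."""

    def _get_html_response(self, url):
        # In real code this would make an HTTP request.
        raise RuntimeError("unit tests must not touch the network")

    def collect_links(self, urls):
        # Map each URL to whatever the fetch step returned for it.
        return {url: self._get_html_response(url) for url in urls}


def test_collect_links_uses_mocked_response():
    url = "https://example.invalid/simple/pkg/"
    fake_html = '<a href="pkg-1.0.tar.gz">pkg-1.0.tar.gz</a>'
    # Replace the network-facing method for the duration of the test.
    with mock.patch.object(
        ToyCollector, "_get_html_response", return_value=fake_html
    ) as mocked:
        result = ToyCollector().collect_links([url])
    mocked.assert_called_once_with(url)
    assert result == {url: fake_html}
```

Because the only network-facing call is patched, the test exercises the collection logic without making any requests.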

cjerdonek · 2019-09-10T09:06:46Z

Also, with this PR, LinkCollector is just 86 lines, and PackageFinder is down to 367 lines.

cjerdonek · 2019-09-10T17:11:14Z

I just did some commit history hygiene so it's easier for people to see what's happening in the commit that adds the LinkCollector class, FYI.

chrahunt

Looks good! Just 2 minor comments.

chrahunt · 2019-09-10T23:27:55Z

tests/unit/test_index.py

+    given names, a link whose URL has a base name matching that name.
+    """
+    for name in names:
+        for link in links:


I would do something like

assert any(
    link.url.endswith(name) for link in links
), message_from_exception_below

I wouldn’t use assert here; pytest can emit perplexing errors for asserts in functions. Personally I’d write a custom assertion, but Chris’s version would work well enough for me.

Maybe a middle of the road alternative would be satisfactory:

for name in names:
    if not any(link.url.endswith(name) for link in links):
        raise RuntimeError(...)

Pytest can emit perplexing errors for asserts in functions

Do you have an example? I've used asserts in helper functions, classes, and fixtures and haven't noticed any issues so far.

Thanks for the feedback, both of you. I've updated the PR. I didn't feel strongly either way here, so I just went with @chrahunt's original suggestion. I do agree with @uranusjr that pytest often includes extra info that looks confusing and isn't especially helpful, e.g. the following in this case:

E   assert False
E    +  where False = any(<generator object check_links_include.<locals>.<genexpr> at 0x106edb518>)

but because we're including a custom error message, we can just ignore those other portions and just look at the custom message.
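A helper along the lines discussed in this thread might look like the sketch below. The function name `check_links_include` is taken from the pytest output quoted above, but the message wording and the `Link` stand-in are assumptions, not the code that was merged.

```python
from collections import namedtuple

# Minimal stand-in for pip's Link class: only the `url` attribute is needed.
Link = namedtuple("Link", ["url"])


def check_links_include(links, names):
    """
    Assert that the given links include, for each of the given names,
    a link whose URL ends with that name.
    """
    for name in names:
        # The custom message (hypothetical wording) is what a reader should
        # look at when the assertion fails.
        assert any(link.url.endswith(name) for link in links), (
            "name {!r} not among links: {}".format(name, links)
        )
```

When the assertion fails, pytest still prints its introspection lines, but the custom message names the missing entry directly.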

chrahunt · 2019-09-10T23:36:16Z

docs/html/development/architecture/package-finding.rst

 2. Constructs a ``CandidateEvaluator`` object and uses that to determine
   the best candidate. It does this by calling the ``CandidateEvaluator``
   class's ``compute_best_candidate()`` method on the return value of
   ``find_all_candidates()``. This corresponds to steps 4-5 of the Overview.


+.. _link-collector-class:
+
+The ``LinkCollector`` class


I don't know if this discussion was had before, but would it be better to keep these kinds of docs in the code itself? Then we could just include it here in any number of ways and I think it would be more likely to be kept up to date.

Thanks for taking a look! I don't really have a preference for how this type of documentation is done. The reason I added it in the first place is that there was a request from a number of people for me to document the various classes in a separate rst file, so I did that. And I was just updating it here. (I will note that one advantage of doing it directly in the rst is that it lets you use the various reStructuredText formatting, links, etc, whereas having that in a docstring would be a bit weird. I'm also not sure the extent to which it would carry over.)

The original discussion was here: #6787

That's true, it would look kind of strange to have the rst in the docstring, given that we don't currently do that anywhere else.

cjerdonek added the C: finder, skip news, and type: refactor labels Aug 23, 2019

cjerdonek force-pushed the link-collector branch 2 times, most recently from f9120d3 to b9a24b2 Compare August 23, 2019 05:55

cjerdonek mentioned this pull request Aug 23, 2019

Command to know what file would be downloaded for a requirement #6430

Closed

uranusjr reviewed Aug 23, 2019

src/pip/_internal/index.py

cjerdonek force-pushed the link-collector branch from b9a24b2 to d2af421 Compare August 23, 2019 07:44

pradyunsg reviewed Aug 23, 2019

cjerdonek mentioned this pull request Aug 23, 2019

Move some PackageFinder methods (preparation for PR #6910) #6913

Merged

cjerdonek added a commit that referenced this pull request Aug 24, 2019

Merge pull request #6913 from cjerdonek/link-collector-1

2985efe

Move some PackageFinder methods (preparation for PR #6910)

BrownTruck added the needs rebase or merge label Aug 24, 2019

cjerdonek force-pushed the link-collector branch from d2af421 to 263ce4b Compare September 10, 2019 08:22

pypa-bot removed the needs rebase or merge label Sep 10, 2019

cjerdonek force-pushed the link-collector branch 4 times, most recently from adccd7c to 96fd8c1 Compare September 10, 2019 08:53

cjerdonek force-pushed the link-collector branch 2 times, most recently from e65a3e5 to 2464559 Compare September 10, 2019 16:57

Rename PackageFinder's _package_versions() to evaluate_links().

9ae5f1a

Add LinkCollector class to index.py.

ed55cde

cjerdonek force-pushed the link-collector branch from 2464559 to 6e2ad4e Compare September 10, 2019 17:13

chrahunt reviewed Sep 10, 2019

cjerdonek added 2 commits September 12, 2019 02:31

Add a couple tests.

12a27d0

Update architecture/package-finding.rst.

ca4fc9e

cjerdonek force-pushed the link-collector branch from 6e2ad4e to ca4fc9e Compare September 12, 2019 09:32

cjerdonek closed this Sep 12, 2019

cjerdonek reopened this Sep 12, 2019

chrahunt approved these changes Sep 12, 2019

cjerdonek merged commit 084d797 into pypa:master Sep 13, 2019

cjerdonek deleted the link-collector branch September 13, 2019 15:50

lock bot added the auto-locked label Oct 13, 2019

lock bot locked as resolved and limited conversation to collaborators Oct 13, 2019
