Split out a LinkCollector class from PackageFinder #6910

Merged (4 commits), Sep 13, 2019
47 changes: 40 additions & 7 deletions docs/html/development/architecture/package-finding.rst
@@ -24,6 +24,8 @@ file to download for a package, given a requirement:
   is an HTML page of anchor links.
2. Collect together all of the links (e.g. by parsing the anchor links
   from the HTML pages) and create ``Link`` objects from each of these.
   The :ref:`LinkCollector <link-collector-class>` class is responsible
   for both this step and the previous.
3. Determine which of the links are minimally relevant, using the
   :ref:`LinkEvaluator <link-evaluator-class>` class. Create an
   ``InstallationCandidate`` object (aka candidate for install) for each
@@ -39,6 +41,7 @@ The remainder of this section is organized by documenting some of the
classes inside ``index.py``, in the following order:

* the main :ref:`PackageFinder <package-finder-class>` class,
* the :ref:`LinkCollector <link-collector-class>` class,
* the :ref:`LinkEvaluator <link-evaluator-class>` class,
* the :ref:`CandidateEvaluator <candidate-evaluator-class>` class,
* the :ref:`CandidatePreferences <candidate-preferences-class>` class, and
@@ -95,18 +98,47 @@ links.
One of ``PackageFinder``'s main top-level methods is
``find_best_candidate()``. This method does the following two things:

1. Calls its ``find_all_candidates()`` method, which reads and parses all the
   index URL's provided by the user, constructs a :ref:`LinkEvaluator
   <link-evaluator-class>` object to filter out some of those links, and then
   returns a list of ``InstallationCandidates`` (aka candidates for install).
   This corresponds to steps 1-3 of the :ref:`Overview <index-py-overview>`
   above.
1. Calls its ``find_all_candidates()`` method, which gathers all
   possible package links by reading and parsing the index URL's and
   locations provided by the user (the :ref:`LinkCollector
   <link-collector-class>` class's ``collect_links()`` method), constructs a
   :ref:`LinkEvaluator <link-evaluator-class>` object to filter out some of
   those links, and then returns a list of ``InstallationCandidates`` (aka
   candidates for install). This corresponds to steps 1-3 of the
   :ref:`Overview <index-py-overview>` above.
2. Constructs a ``CandidateEvaluator`` object and uses that to determine
   the best candidate. It does this by calling the ``CandidateEvaluator``
   class's ``compute_best_candidate()`` method on the return value of
   ``find_all_candidates()``. This corresponds to steps 4-5 of the Overview.
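
The two-step flow of ``find_best_candidate()`` described above can be sketched roughly as follows. This is a simplified illustration and not pip's actual code: the constructor argument, the tuple-based versions, and the "highest version wins" rule are assumptions made for the example, and the real ``compute_best_candidate()`` returns a ``BestCandidateResult`` rather than a bare candidate.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class InstallationCandidate:
    """Hypothetical stand-in for pip's candidate object."""
    name: str
    version: Tuple[int, ...]  # assumption: pip uses parsed version objects


class CandidateEvaluator:
    """Toy evaluator that simply prefers the highest version."""

    def compute_best_candidate(
        self, candidates: List[InstallationCandidate]
    ) -> Optional[InstallationCandidate]:
        # Returns None when there is nothing to choose from.
        return max(candidates, key=lambda c: c.version, default=None)


class PackageFinder:
    def __init__(self, collected: List[InstallationCandidate]) -> None:
        # `collected` stands in for the output of link collection and
        # link evaluation (steps 1-3 of the Overview).
        self._collected = collected

    def find_all_candidates(self, project_name: str) -> List[InstallationCandidate]:
        # Steps 1-3: here reduced to a filter over pre-built candidates.
        return [c for c in self._collected if c.name == project_name]

    def find_best_candidate(self, project_name: str) -> Optional[InstallationCandidate]:
        # Step 1 of the method: gather every candidate for the project.
        candidates = self.find_all_candidates(project_name)
        # Step 2 of the method: delegate the choice to a CandidateEvaluator.
        evaluator = CandidateEvaluator()
        return evaluator.compute_best_candidate(candidates)
```

The point of the split is that ``PackageFinder`` only orchestrates, while the ranking policy lives entirely in the evaluator.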


.. _link-collector-class:

The ``LinkCollector`` class

Member

I don't know if this discussion was had before, but would it be better to keep these kinds of docs in the code itself? Then we could just include it here in any number of ways and I think it would be more likely to be kept up to date.

Member Author

Thanks for taking a look! I don't really have a preference for how this type of documentation is done. The reason I added it in the first place is that there was a request from a number of people for me to document the various classes in a separate rst file, so I did that. And I was just updating it here. (I will note that one advantage of doing it directly in the rst is that it lets you use the various reStructuredText formatting, links, etc, whereas having that in a docstring would be a bit weird. I'm also not sure the extent to which it would carry over.)

Member Author

The original discussion was here: #6787

Member

That's true, it would look kind of strange to have the rst in the docstring, given that we don't currently do that anywhere else.

***************************

The :ref:`LinkCollector <link-collector-class>` class is the class
responsible for collecting the raw list of "links" to package files
(represented as ``Link`` objects). An instance of the class accesses the
various `PEP 503`_ HTML "simple repository" pages, parses their HTML,
extracts the links from the anchor elements, and creates ``Link`` objects
from that information. The ``LinkCollector`` class is "unintelligent" in that
it doesn't do any evaluation of whether the links are relevant to the
original requirement; it just collects them.
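
As a rough illustration of the "parse the HTML, extract the anchors, create ``Link`` objects" step, here is a minimal sketch that uses only the standard library. The ``Link`` dataclass, the ``collect_links_from_html()`` helper, and the example URL are hypothetical and do not reflect pip's own parsing code. The ``data-requires-python`` attribute is the one defined for anchors in PEP 503.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import List, Optional
from urllib.parse import urljoin


@dataclass(frozen=True)
class Link:
    """Hypothetical, stripped-down stand-in for pip's ``Link`` class."""
    url: str
    requires_python: Optional[str] = None


class _AnchorParser(HTMLParser):
    """Collects one Link per <a href=...> element, with no filtering."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[Link] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if not href:
            return  # anchors without an href carry no file location
        self.links.append(
            Link(
                # Resolve relative hrefs against the page URL.
                url=urljoin(self.base_url, href),
                requires_python=attr_map.get("data-requires-python"),
            )
        )


def collect_links_from_html(html: str, base_url: str) -> List[Link]:
    """Return every anchor link on the page, relevant or not."""
    parser = _AnchorParser(base_url)
    parser.feed(html)
    return parser.links
```

Note that the helper makes no attempt to decide whether a link matches the requirement. That judgement is left to a later stage, which mirrors the "unintelligent" role described above.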

The ``LinkCollector`` class takes into account the user's :ref:`--find-links
<--find-links>`, :ref:`--extra-index-url <--extra-index-url>`, and related
options when deciding which locations to collect links from. The class's main
method is the ``collect_links()`` method. The :ref:`PackageFinder
<package-finder-class>` class invokes this method as the first step of its
``find_all_candidates()`` method.

The ``LinkCollector`` class is the only class in the ``index.py`` module that
makes network requests and is the only class in the module that depends
directly on ``PipSession``, which stores pip's configuration options and
state for making requests.
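
One way to picture the last two paragraphs is a collector that receives its session and its locations from the outside, so that all network access goes through a single object. The sketch below is an assumption-laden illustration: ``FakeSession``, the constructor parameters, the ``locations_for()`` helper, and the ordering of find-links before index pages are all invented for the example and are not pip's API.

```python
from typing import Dict, List


class FakeSession:
    """Hypothetical stand-in for a session object; performs no real I/O."""

    def __init__(self, pages: Dict[str, str]) -> None:
        self.pages = pages
        self.requested: List[str] = []

    def get(self, url: str) -> str:
        # Record the request so callers can see which URLs were fetched.
        self.requested.append(url)
        return self.pages.get(url, "")


class LinkCollector:
    def __init__(self, session, index_urls: List[str], find_links: List[str]) -> None:
        # The session is injected: nothing else needs to know how to fetch.
        self.session = session
        self.index_urls = index_urls
        self.find_links = find_links

    def locations_for(self, project_name: str) -> List[str]:
        # Each index contributes one project page; find-links locations
        # are used as given. The ordering here is an assumption.
        index_pages = [
            url.rstrip("/") + "/" + project_name + "/" for url in self.index_urls
        ]
        return list(self.find_links) + index_pages

    def collect_links(self, project_name: str) -> Dict[str, str]:
        # Fetch every location; parsing the pages is omitted in this sketch.
        return {
            location: self.session.get(location)
            for location in self.locations_for(project_name)
        }
```

Because the session is passed in, a test can substitute a fake one and check exactly which locations were requested, without touching the network.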


.. _link-evaluator-class:

The ``LinkEvaluator`` class
@@ -191,7 +223,8 @@ The ``BestCandidateResult`` class
The ``BestCandidateResult`` class is a convenience "container" class that
encapsulates the result of finding the best candidate for a requirement.
(By "container" we mean an object that simply contains data and has no
business logic or state-changing methods of its own.)
business logic or state-changing methods of its own.) It stores not just the
final result but also intermediate values used to determine the result.
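
A "container" in this sense could look like the frozen dataclass below. This is only a sketch: the field names and the use of plain strings for candidates are assumptions chosen for illustration, and pip's actual class may store and expose these values differently.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class BestCandidateResult:
    """Illustrative data-only container with no business logic."""
    # Intermediate value: everything that was found for the project.
    candidates: Tuple[str, ...]
    # Intermediate value: the subset that satisfied the requirement.
    applicable_candidates: Tuple[str, ...]
    # Final result: the chosen candidate, or None if nothing applied.
    best_candidate: Optional[str]


result = BestCandidateResult(
    candidates=("foo-1.0", "foo-2.0", "foo-3.0rc1"),
    applicable_candidates=("foo-1.0", "foo-2.0"),
    best_candidate="foo-2.0",
)
```

Keeping the intermediate values alongside the final answer lets a caller explain a result, for example by listing which versions were found but did not apply.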

The class is the return type of both the ``CandidateEvaluator`` class's
``compute_best_candidate()`` method and the ``PackageFinder`` class's