Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allowing more external dependencies for stubs #90

Open
hauntsaninja opened this issue Feb 16, 2023 · 8 comments
Open

Allowing more external dependencies for stubs #90

hauntsaninja opened this issue Feb 16, 2023 · 8 comments

Comments

@hauntsaninja
Copy link
Contributor

hauntsaninja commented Feb 16, 2023

Currently, we're quite strict about what external dependencies stubs can use. We have a small allowlist here:

EXTERNAL_REQ_ALLOWLIST = {

Briefly, the reason for this is a) Python packages can execute arbitrary code at install time, b) type checkers sometimes install stub packages automatically, c) stub packages are quite popular, d) users likely expect stub packages to be inert

So what would it take to remove the allowlist? We already have an important additional check in stub uploader: we ensure that external dependencies of a stub package must be a dependency of the upstream package. However, there's still a hole, in that stub dependencies of stub packages are not currently checked.

To spell things out, this is the scenario that we're concerned about:

  • Package foo exists, that depends on bar
  • typeshed adds types-foo, add dependency on bar
  • In order to make types-foo useful, we need bar, so we do some security review of bar / the bar maintainer and add it to the allowlist
  • bar (gets hacked and) adds a dependency on evil
  • types-foo users are affected, while this isn't good we're okay with it, since those are assumed to be a subset of foo users
  • Someone sneaks past typeshed maintainers and adds a dependency on types-foo to types-requests (wouldn't be too hard to do)
  • types-requests users are affected, this is really bad

Once we plug this hole, we could maybe get rid of the allowlist or have much more relaxed criteria.

This has been discussed in a couple places, mainly #61 (comment). I'm writing this up here as a way to easily communicate the current status quo to typeshed contributors.

Plugging the hole is a pretty easy change to make to stub_uploader, see diff here:

diff --git a/stub_uploader/metadata.py b/stub_uploader/metadata.py
index 85d0643..56c5448 100644
--- a/stub_uploader/metadata.py
+++ b/stub_uploader/metadata.py
@@ -61,7 +61,7 @@ class Metadata:
     def requires_typeshed(self) -> list[Requirement]:
         reqs = self._unvalidated_requires_typeshed
         for req in reqs:
-            verify_typeshed_req(req)
+            verify_typeshed_req(req, self.upstream_distribution)
         return reqs
 
     @property
@@ -145,7 +145,7 @@ def strip_types_prefix(dependency: str) -> str:
     return dependency.removeprefix(TYPES_PREFIX)
 
 
-def verify_typeshed_req(req: Requirement) -> None:
+def verify_typeshed_req(req: Requirement, upstream_distribution: Optional[str]) -> None:
     if not req.name.startswith(TYPES_PREFIX):
         raise InvalidRequires(f"Expected dependency {req} to start with {TYPES_PREFIX}")
 
@@ -154,9 +154,26 @@ def verify_typeshed_req(req: Requirement) -> None:
             f"Expected dependency {req} to be uploaded from stub_uploader"
         )
 
-    # TODO: make sure that if a typeshed distribution depends on other typeshed stubs,
-    # the upstream depends on the upstreams corresponding to those stubs.
-    # See https://github.com/typeshed-internal/stub_uploader/pull/61#discussion_r979327370
+    if upstream_distribution is None:
+        raise InvalidRequires(
+            f"There is no upstream distribution on PyPI, so cannot verify {req}"
+        )
+
+    resp = requests.get(f"https://pypi.org/pypi/{upstream_distribution}/json")
+    if resp.status_code != 200:
+        raise InvalidRequires(
+            f"Expected dependency {req} to be accessible on PyPI, but got {resp.status_code}"
+        )
+
+    data = resp.json()
+
+    if strip_types_prefix(req.name) not in [
+        Requirement(r).name for r in (data["info"].get("requires_dist") or [])
+    ]:
+        raise InvalidRequires(
+            f"Expected dependency {strip_types_prefix(req.name)} to be listed in "
+            f"{upstream_distribution}'s requires_dist"
+        )

However, typeshed currently has twelve stubs that fail this test:

test_recursive_verify[Pygments] - stub_uploader.metadata.InvalidRequires: Expected dependency docutils to be listed in Pygments's requires_dist
test_recursive_verify[redis] - stub_uploader.metadata.InvalidRequires: Expected dependency pyOpenSSL to be listed in redis's requires_dist
test_recursive_verify[PyScreeze] - stub_uploader.metadata.InvalidRequires: Expected dependency Pillow to be listed in PyScreeze's requires_dist
test_recursive_verify[pyinstaller] - stub_uploader.metadata.InvalidRequires: Expected dependency docutils to be listed in setuptools's requires_dist
test_recursive_verify[pysftp] - stub_uploader.metadata.InvalidRequires: Expected dependency paramiko to be listed in pysftp's requires_dist
test_recursive_verify[slumber] - stub_uploader.metadata.InvalidRequires: Expected dependency requests to be listed in slumber's requires_dist
test_recursive_verify[python-xlib] - stub_uploader.metadata.InvalidRequires: Expected dependency Pillow to be listed in python-xlib's requires_dist
test_recursive_verify[JACK-Client] - stub_uploader.metadata.InvalidRequires: Expected dependency cffi to be listed in JACK-Client's requires_dist
test_recursive_verify[setuptools] - stub_uploader.metadata.InvalidRequires: Expected dependency docutils to be listed in setuptools's requires_dist
test_recursive_verify[PyAutoGUI] - stub_uploader.metadata.InvalidRequires: Expected dependency PyScreeze to be listed in PyAutoGUI's requires_dist
test_recursive_verify[tzlocal] - stub_uploader.metadata.InvalidRequires: Expected dependency pytz to be listed in tzlocal's requires_dist
test_recursive_verify[D3DShot] - stub_uploader.metadata.InvalidRequires: Expected dependency Pillow to be listed in D3DShot's requires_dist

So it may be that there's some reasonable improvement that could be made to the check that works for these twelve packages. Or maybe we can just require these specific stubs to have individualised exceptions committed to stub_uploader. Or maybe it's not super viable to plug this hole and we need to keep the allowlist forever.

(Also note that the implementation of the check in that diff^ isn't perfect, since it only works for projects that are on PyPI and have wheels, and only checks the dependencies of the latest version)

@Avasam
Copy link
Contributor

Avasam commented Feb 20, 2023

I feel like this is related. In python/typeshed#9511 I tried to use lxml-stubs, but can't because it's not explicitly mentioned as a dependency of openpyxl. Only et-xmlfile is (which itself is based on the is based upon the xmlfile module from lxml, but that's besides the point).

Seeing python/mypy#14737 makes me wanna request allowing those for stub_uploader as well, but that's only growing the list. (unless you wanna directly depend on, and trust, mypy's non_bundled_packages)


On the note of more relaxed criteria and catching more non-wheel declared dependencies without running arbitrary code, maybe #88 would help ?
It still won't catch all cases (including mine mentioned above), but I already know of at least two cases in typeshed were it would be beneficial.

@srittau
Copy link
Contributor

srittau commented Jul 15, 2024

Currently, EXTERNAL_REQ_ALLOWLIST is a free-for-all. I.e. once a package is on the allowlist, every other package can depend on it. It would be safer to specify exactly which stub packages can depend on which other packages. This would also allow us to be a bit looser in some cases. For example, the implications of types-obscure-pkg depending on other-obscure-pkg are lower than types-requests being able to depend on obscure-pkg.

We should also consider moved the allowlist out of the metadata.py file into a data file to separate code from data.

@srittau
Copy link
Contributor

srittau commented Jul 15, 2024

For example, the implications of types-obsure-pkg depending on other-obscure-pkg are lower than types-requests being able to depend on obscure-pkg.

Although there is the danger of a hacked maintainer account adding a dependency on types-obscure-pkg to types-requests. This case needs to be considered.

@srittau
Copy link
Contributor

srittau commented Jul 16, 2024

Okay, here is a suggestion. We allow:

  • A dependency on the upstream package if this pulls in required dependencies. This way we don't need to manage these dependencies ourselves.
  • A dependency on other typeshed packages if those only have (recursively) no outside dependencies.
  • An allowlist – per package – for the remaining cases. The allowlist of a typeshed package depending on another typeshed package with an allowlist, must include the items from the latter allowlist explicitly. Edit 3: If the latter package depends on its upstream package, this must also be allowlisted in the dependent package.

Edit 2: Also, on upload of a package with outside dependencies, we reverse check whether there are other typeshed packages that depend on this that don't have the required outside dependencies in their allowlist.

@AlexWaygood
Copy link
Contributor

A dependency on other typeshed packages if those only have (recursively) no outside dependencies.

To illustrate the implications of this proposal: currently 6 typeshed packages depend on types-requests:

  • types-caldav
  • types-docker
  • types-hvac
  • types-requests-oauthlib
  • types-slumber
  • types-tensorflow

All of these currently pull in urllib3 due to types-requests depending on urllib3, but only types-docker explicitly lists urllib3 as an external dependency in its METADATA.toml file.

To clarify the proposal here: would we still allow these typeshed packages to depend on types-requests if (and only if) they explicitly also listed urllib3 in their requirements, and we had urllib3 in their per-package allowlists at stub-uploader?

@srittau
Copy link
Contributor

srittau commented Jul 16, 2024

To clarify the proposal here: would we still allow these typeshed packages to depend on types-requests if (and only if) they explicitly also listed urllib3 in their requirements, and we had urllib3 in their per-package allowlists at stub-uploader?

At least the latter, yes. They don't necessarily need to list it in their METADATA requirements, as there's no necessity to make it into their dependency field in the package.

@AlexWaygood
Copy link
Contributor

I can see the attraction of making allowlists per-package. The way that urllib3 has become an indirect dependency of so many stubs packages shows how easy it is for packages to become significant attack vectors (without us realising) once they've been added to the allowlist. For the specific case of urllib3, the ecosystem is probably going to have much bigger problems than the security of typeshed packages if that package is compromised in some way -- but it's a useful example. And as you say, if allowlists are per-package, then adding an entry to the allowlist doesn't have nearly the same implications in terms of security.

It will probably mean more busywork for us to maintain per-package allowlists in a separate repo, but I think it's probably worth it.

@srittau
Copy link
Contributor

srittau commented Jul 16, 2024

We could also add back a general allowlist for very popular packages in addition to the per-package allowlist if this proves to be too much busywork at a later date. If we put the allowlist into an external file, we should keep that possibility in mind when deciding on a format.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants