Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecate non-authoritative information in the registry Publish API #14492

Open
kornelski opened this issue Sep 4, 2024 · 5 comments
Open

Deprecate non-authoritative information in the registry Publish API #14492

kornelski opened this issue Sep 4, 2024 · 5 comments
Labels
A-documenting-cargo-itself Area: Cargo's documentation C-bug Category: bug S-triage Status: This issue is waiting on initial triage.

Comments

@kornelski
Copy link
Contributor

kornelski commented Sep 4, 2024

Problem

The registry protocol takes a JSON with metadata when publishing crates: https://doc.rust-lang.org/cargo/reference/registry-web-api.html#publish

This metadata is meant to contain a summary of Cargo.toml metadata. The problem is that nothing stops rogue clients from sending different metadata that doesn't match the Cargo.toml. If the registry uses this JSON, and doesn't validate Cargo.toml in the tarball, this can be used to bypass policy checks done at publishing time, e.g. sneak in dependencies from other registries that wouldn't be allowed otherwise.

Fortunately, crates.io has already fixed this vulnerability: rust-lang/crates.io#7238

The only potentially useful not-entirely-redundant field in this metadata was the readme, but that has been fixed in Cargo: #5911

Since a robust implementation of a registry has to extract the Cargo.toml and validate it anyway, the redundant metadata in the publish call is just a footgun. It's not an authoritative source of data. Validating that it matches Cargo.toml is just an unnecessary work for a registry, since doing that requires obtaining the authoritative data from Cargo.toml anyway, and at that point it's easiest to just ignore the uploaded JSON. Generating and sending redundant data that should be ignored is also wasteful for the clients. The data is also in a different format than Cargo.toml with bug-prone quirks like explicit_name_in_toml that has its logic backwards compared to the package field in the manifest.

Possible Solution(s)

For backwards compatibility with other registries Cargo will probably have to continue sending this JSON for a while, but at least the documentation can be updated to instruct registry implementers to ignore the submitted JSON and read the Cargo.toml from the tarball instead (there are crates for this, so it's easy).

https://doc.rust-lang.org/cargo/reference/registry-web-api.html#publish

@kornelski kornelski added C-bug Category: bug S-triage Status: This issue is waiting on initial triage. labels Sep 4, 2024
@epage epage added the A-documenting-cargo-itself Area: Cargo's documentation label Sep 4, 2024
@epage
Copy link
Contributor

epage commented Sep 4, 2024

Might be good for us to also provide conversion functions in a low level crate between the relevant types. This would require pulling our IndexPackage out from cargo into cargo-util-schemas (which we would like to do along with all of our other schemas, just need someone to do it).

@kornelski
Copy link
Contributor Author

kornelski commented Sep 4, 2024

Helping with conversion would suggest that's legitimate useful data. I don't think anybody should be using it, because it's a bug or vulnerability waiting to happen. I've discovered it by messing up data in my own registry implementation, and realized I've wasted so much effort on generating and parsing data that I must not use, and which has zero value.

I'd change the documentation to tell registry implementers to skip over the JSON blob entirely, and not even bother decoding it. After some deprecation time and/or some future edition Cargo could send 0-length field there.

@epage
Copy link
Contributor

epage commented Sep 4, 2024

Helping with conversion would suggest that's legitimate useful data. I don't think anybody should be using it, because it's a bug or vulnerability waiting to happen. I've discovered it by messing up data in my own registry implementation, and realized I've wasted so much effort on generating and parsing data that I must not use, and which has zero value.

Sorry, I meant conversion from TomlManifest to IndexPackage which is what we are asking them to do with the proposed documentation update and involves more work than NewCrate to IndexPackage (even though that has some gotchas).

@kornelski
Copy link
Contributor Author

kornelski commented Sep 5, 2024

edit: nevermind, registry URLs are in Cargo.toml too.

@Eh2406
Copy link
Contributor

Eh2406 commented Sep 5, 2024

I agree that we should maintain a library for converting from a crate file to the index row. Certainly by the time we've done that, if not before, the documentation should recommend reading the crate file directly.

If your registry is not written in Rust, then the official crate doesn't do you much good. Similarly, if militias upload is not part of your threat model, then the metadata blob can be trusted. Not including this in your threat model is reasonable for private registries, if the attacker can publish malicious code to the registry there is much worse they can do. If both apply, the metadata is a reasonable choice over trying to reimplement the crate file to index row logic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-documenting-cargo-itself Area: Cargo's documentation C-bug Category: bug S-triage Status: This issue is waiting on initial triage.
Projects
None yet
Development

No branches or pull requests

3 participants