Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cargo should reference git dependencies by commit hash instead of url #7497

Open
brenzi opened this issue Oct 9, 2019 · 12 comments
Open

Cargo should reference git dependencies by commit hash instead of url #7497

brenzi opened this issue Oct 9, 2019 · 12 comments
Labels
A-crate-dependencies Area: [dependencies] of any kind A-git Area: anything dealing with git C-feature-request Category: proposal for a feature. Before PR, ping rust-lang/cargo if this is not `Feature accepted` S-needs-design Status: Needs someone to work further on the design for the feature or fix. NOT YET accepted.

Comments

@brenzi
Copy link

brenzi commented Oct 9, 2019

cargo thinks the following dependencies are different although they are exactly the same:

[dependencies]
a = {git = "https://github.com/aaa/bbb"}
a = {git = "https://github.com/aaa/bbb.git"}
a = {git = "https://github.com/aaa///bbb"}
a = {git = "https://github.com/aaa/bbb", rev ="0123456789abcdef"}
a = {git = "https://github.com/aaa/bbb", branch = "ccc"}
a = {git = "https://github.com/aaa/bbb", tag = "ddd"}
a = {git = "https://github.com/fff/bbb", tag = "ddd"}

assuming that branch ccc HEAD as well as tag ddd points to rev = "0123456789abcdef", being the HEAD of master. Moreover, fff/bbb being a fresh fork of repository bbbby user fff

I would suggest that all these references are being treated as equal because they are.

The fact that cargo doesn't know that all these dependencies are equal causes a lot of trouble:

The great thing about git is that a commit hash is a unique fingerprint of a repository. It doesn't matter on which branch, tag or even in which organization the reference is located. The commit hash is the same, so is the code.

actually, just writing

[dependencies]
a = { git-rev = "0123456789abcdef" }

might not be practical, but it would be unique across the entirety of all git repositories in existence. The only problem being to retrieve the corresponding code if you don't know the url, so I'm not suggesting this syntax. I just want to make the point that - behind the scenes - cargo could very well use a git commit hash without any url, branch or tag as an identifier for a specific version of a crate.

@brenzi brenzi added the C-feature-request Category: proposal for a feature. Before PR, ping rust-lang/cargo if this is not `Feature accepted` label Oct 9, 2019
@ehuss ehuss added A-git Area: anything dealing with git A-crate-dependencies Area: [dependencies] of any kind labels Oct 10, 2019
@alexcrichton
Copy link
Member

Thanks for the report! Unfortunately though this isn't a particularly actionable issue as-is and as-written is likely to basically be ignored for the rest of time. It's not really the most productive to off-hand propose large-scale changes to the foundations of Cargo (such as how git sources are handled). I agree there are bugs and improvements to be made with Cargo, but if you're curious to see actual change and see bugs fixed, proposing that Cargo completely rethinks how it handles git repositories is unlikely to do that. (this also ignores context for why Cargo does what it does today!)

@brenzi
Copy link
Author

brenzi commented Oct 17, 2019

@alexcrichton: I'm aware that my suggestion goes beyond a quick fix. I may also have neglected valid reasons to do it like cargo does it today.

However: May I ask for your opinion on the approach suggested from a design point of view rather than the likelihood of it being done?

@dingelish
Copy link

Hi @alexcrichton , we are suffering from this for a period of time. Our rust-sgx-sdk has been transferred to the apache organization and renamed twice. In a foreseeable future, there'll be at least two more renaming. It's painful to modify hundreds of forked crates for each of the renaming action. We really need it.

A temporary workaround is to simulate the HTTP 302 redirection to help git find the correct repo. The result is that we can have a URL like https://sdk.sgx.rs which 302 to https://github.com/apache/incubator-mesatee-sgx. But this requires the bypass of certificate verification (sdk.sgx.rs != github.com), which cannot be disabled by the built-in libgit2. Fortunately, we find that the git cli can bypass it. So the solution is to enable this in .cargo/config:

[net]
git-fetch-with-cli = true

@brenzi 's comment is very reasonable

I'm not sure a custom domain will resolve the underlying problem. It's just a workaround. I could live with changing the global setting but I don't think this is elegant because it might change other behaviour as well as a side effect

It'll be the best if I can get some comments from you on the following questions:
(1) Do you think the hash-based deduplication is do-able in cargo? If so, could we have an estimated timeline of it?
(2) Do you think the 302 solution is appropriate in this circumstance? or is there any better way to solve it?

Thank you very much!

@alexcrichton
Copy link
Member

I do not personally have time to prioritize and implement this at this time, but if you're willing to put in the work to design and implement this that'd be great!

@PiDelport
Copy link

Related: #7670

@derekdreery
Copy link
Contributor

derekdreery commented Jun 18, 2021

It looks like you might be able to do a quick fix by resolving all the different forms of git revs in SourceIdInner::canonical_url (in src/cargo/core/source/source_id.rs). I haven't figured out if this is feasible yet.

EDIT So this solves my case (repo?rev=xxxxxx#xxxxxx == repo#xxxxxx), but not the general case. The way to solve the general case is to reference git dependencies by rev hash rather than url. It also requires some kind of concept of underspecified git dep (so that during dep resolution cargo can merge git deps if possible). Quite a lot of work: I'll think about how it might look.

@derekdreery
Copy link
Contributor

Would a possible solution be to wait until feature resolution is complete, then compare all git revisions to see if they share hash (the hash is 20 bytes long and 2 ** 160 is a very very large number). This would be n! comparisons. The questions here are: could there possibly be collisions?

The thing is, I don't think there is any other way to know if two repos share code in general. They might have completely different names, so if you can't use the Oids (hashes) then I'm stumped.

@derekdreery
Copy link
Contributor

If we didn't want to go down the route of unifying all gits based on revision hashes, we could do it for some obvious cases, for example we could identify commits based on a (location, rev) pair, where location is some sort of canonicalised repo url, and rev is the hash of the revision/commit. This might be the best safest option. If users are pulling in 2 different repos from different places, can they really expect us to dedup them?

@derekdreery
Copy link
Contributor

Note:

https://github.com/aaa/bbb.git

is already unified with

https://github.com/aaa/bbb

(https://github.com/rust-lang/cargo/blob/master/src/cargo/util/canonical_url.rs#L50)

@derekdreery
Copy link
Contributor

derekdreery commented Jun 18, 2021

There are two solutions I see

  1. Download repos as you are constructing the SourceId, and always resolve the given branch/tag/rev to a specific revision. Use this when comparing packages for equality during building the dependency graph.
  2. Have a separate step after you've gathered all the SourceIds to add in the revs.

Both of these require downloading/accessing the git repo earlier I think. I don't think you should try to unify 2 different repositories, only revisions within a single repository. I also think the https://github.com/aaa///bbb case is pathological and we shouldn't try to support it.

What are others' thoughts? And if you like the idea can you point me in the direction of how to implement it, because I've been looking at the codebase and I can't figure out how the resolution bit works.

sunshowers added a commit to oxidecomputer/omicron that referenced this issue Dec 5, 2022
This is PR 3 of N in this series. For the earlier ones, see:

* oxidecomputer/crucible#552
* oxidecomputer/propolis#262

Currently, we specify the progenitor dependency as both:

```toml
progenitor-client = { git = "https://github.com/oxidecomputer/progenitor", branch = "main" }
```

And

```toml
progenitor-client = { git = "https://github.com/oxidecomputer/progenitor" }
```

Cargo (due to rust-lang/cargo#7497) doesn't
dedup these dependencies even
though they resolve to the same branch and hash. This causes issues with
`cargo doc` colliding with itself, as it tries to document both:

```
Documenting progenitor-impl v0.2.1-dev (https://github.com/oxidecomputer/progenitor#4b5cef4b)
Documenting progenitor-impl v0.2.1-dev (https://github.com/oxidecomputer/progenitor?branch=main#4b5cef4b)
```

Fix this by always using the branch name. (We choose this option over
not specifying the branch so to maintain uniformity in cases where a
non-default branch must be used.)

This also requires a crucible update, since that didn't specify a branch
either. And the crucible update necessitates a propolis update to get
the versions aligned.

In the future it would be nice to have a lint which ensures that git
dependencies always have the branch name in them.
@graydon
Copy link

graydon commented Jan 10, 2023

See also #10256

@epage
Copy link
Contributor

epage commented Nov 1, 2023

#6921 is almost a duplicate of this. The slight difference is that this includes unifying two dependency specs to the same dependency, even if tag vs branch was used. #6921 is focusing on the user problem of wanting to keep git dependency specs in-sync with each other.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-crate-dependencies Area: [dependencies] of any kind A-git Area: anything dealing with git C-feature-request Category: proposal for a feature. Before PR, ping rust-lang/cargo if this is not `Feature accepted` S-needs-design Status: Needs someone to work further on the design for the feature or fix. NOT YET accepted.
Projects
None yet
Development

No branches or pull requests

9 participants