This repository has been archived by the owner on Feb 3, 2018. It is now read-only.

Expand ProjectName concept to encompass URI and URL concepts #10

Closed
sdboyer opened this issue Apr 14, 2016 · 4 comments

sdboyer commented Apr 14, 2016

Right now, ProjectName - which is assumed to be a universal-style import path - is a URI that is assumed to be independently sufficient for locating the underlying repository resource. Of course, that's a bad assumption...it's a URI, not a URL. Now, we have sort-of-passable algorithms that infer a URL from the URI, but that's not really the problem. Actually, there are a couple of problems.

First and foremost, we have to allow the user to specify the URL. Inference can still be applied if the user doesn't supply the URL, of course. But in situations where that simple inference would be wrong (e.g., the inference will always try git over https, but you want/need to specify ssh), an override must be possible. Allowing such overrides, though, creates new problems:

  • Different projects in the depgraph could specify different but equivalent forms of the same URL (e.g., https vs. ssh access to a git(hub?) repository) for a given URI.
  • Different projects in the depgraph could specify different and non-equivalent URLs for a given URI.

The first problem is easily addressed: logic can be written that normalizes all equivalent URLs into a single, canonical form.
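As a rough sketch of that kind of normalization: the helper name `canonicalURL`, the choice of `host/path` as the canonical form, and the example repository are all assumptions for illustration, not existing code. It only handles scheme-style URLs and scp-like git addresses.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
	"strings"
)

// scpSyntax matches scp-like git addresses such as git@host:user/repo.git.
var scpSyntax = regexp.MustCompile(`^([a-zA-Z0-9_.-]+@)?([a-zA-Z0-9.-]+):(.+)$`)

// canonicalURL is a hypothetical helper that reduces equivalent forms of a
// repository URL to "host/path". The scheme, any user info, and a trailing
// ".git" are dropped, so https and ssh access to the same repo compare equal.
func canonicalURL(raw string) (string, error) {
	var host, path string
	if !strings.Contains(raw, "://") {
		// No scheme: treat it as an scp-like address.
		m := scpSyntax.FindStringSubmatch(raw)
		if m == nil {
			return "", fmt.Errorf("unrecognized repository URL: %q", raw)
		}
		host, path = m[2], m[3]
	} else {
		u, err := url.Parse(raw)
		if err != nil {
			return "", err
		}
		host, path = u.Hostname(), u.Path
	}
	path = strings.TrimSuffix(strings.Trim(path, "/"), ".git")
	return strings.ToLower(host) + "/" + path, nil
}

func main() {
	a, _ := canonicalURL("https://github.com/example/project.git")
	b, _ := canonicalURL("git@github.com:example/project.git")
	fmt.Println(a == b) // both forms reduce to the same canonical string
}
```

Whether dropping the scheme entirely is the right notion of "equivalent" is a judgment call; a real implementation might want to keep enough information to reconstruct a fetchable URL.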

The second problem is a bit trickier because of its adjacency to another issue: aliasing. First, though, what we do know is that on any given solver run, we encounter URI/URL combos describing projects. If, at any point, we re-encounter a URI and the new normalized URL is not the same as the existing normalized URL, then we have incompatible sources, which fails one of the solver's satisfiability conditions (aka, we'll enforce this in *solver.satisfiable()).

Conceptually, this means that all projects involved in the solution (root and non-root) have equal "say" on what a URI is - and they all must agree.
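A minimal sketch of that agreement check, assuming URLs have already been normalized. The type and method names (`sourceRegistry`, `agree`) are hypothetical and are not the solver's actual API.

```go
package main

import "fmt"

// sourceRegistry is a hypothetical record of the normalized URL first seen
// for each URI during a single solver run.
type sourceRegistry map[string]string

// agree records normURL for uri. If a different normalized URL was already
// recorded for that URI, it returns an error: the sources are incompatible
// and the satisfiability condition fails.
func (r sourceRegistry) agree(uri, normURL string) error {
	if prev, ok := r[uri]; ok && prev != normURL {
		return fmt.Errorf("incompatible sources for %s: %s vs %s", uri, prev, normURL)
	}
	r[uri] = normURL
	return nil
}

func main() {
	reg := sourceRegistry{}
	uri := "github.com/example/project"
	fmt.Println(reg.agree(uri, "github.com/example/project"))  // first sighting: accepted
	fmt.Println(reg.agree(uri, "github.com/example/project"))  // same URL again: accepted
	fmt.Println(reg.agree(uri, "github.com/someone/project"))  // different URL: rejected
}
```

Note that the registry treats every caller identically, which is the "equal say" property: it makes no difference whether the root or a transitive dependency declared the URL first.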

Aliasing - where we transparently swap in one repository source at another's URI/import path - needs to function differently: it should be global. That is, rather than being something on which all projects can express a viewpoint and the solver must maintain satisfiability, aliases are a transformation that is allowed by only one project (the root), and is applied globally.

Aliases (necessarily) supersede URL declarations. But, because they both deal with the URL property, the significant question is whether having an alias applied to a particular URI/ProjectName is reason to suspend the satisfiability condition that all participant projects must agree on the URL for that URI.

The easy answer is "yes, suspend that satisfiability condition" - because we know the URL is going to end up being the same for that URI, it doesn't matter if everyone agrees. The problem with that, though, is that it's really just masking a potentially significant disagreement within your depgraph, and reconciling disagreement is not the purpose of what I think of as an alias. Aliases are about changing a name, GIVEN agreement on what that name refers to. The practical effect of this would be that aliases would be used to force agreement on URLs...and we'd see absurd things like projects aliased to themselves in order to make things work, and aliases propagating up dependency chains.

Instead - and since we know we need a concept of overrides in general, anyway - I'd prefer to see that forcing of agreement done via an override. Overrides also operate globally (and thus are used root-only), but their intent is clearer in the name.
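To make the override idea concrete, here is a small sketch. The `overrides` type, its `resolve` method, and the example hosts are hypothetical; the only properties taken from the description above are that overrides come from the root alone and apply to every project in the depgraph.

```go
package main

import "fmt"

// overrides maps a URI to the URL the root project forces for it.
// Hypothetical type: only the root manifest would be allowed to populate it.
type overrides map[string]string

// resolve returns the URL to use for uri: the root's override if one exists,
// otherwise whatever URL the depending project declared.
func (o overrides) resolve(uri, declared string) string {
	if forced, ok := o[uri]; ok {
		return forced
	}
	return declared
}

func main() {
	root := overrides{
		"github.com/example/project": "git.internal.example/mirror/project",
	}
	// A dependency's own declaration is ignored when the root overrides it.
	fmt.Println(root.resolve("github.com/example/project", "github.com/fork/project"))
	// URIs without an override keep the declared URL.
	fmt.Println(root.resolve("github.com/example/other", "github.com/example/other"))
}
```

Because every declaration for an overridden URI resolves to the same URL, disagreement is settled explicitly by the root rather than hidden behind an alias.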

sdboyer added this to the MVP milestone Apr 14, 2016

sdboyer commented Apr 25, 2016

I've reflected on this more...and what I initially outlined is probably mostly wrong. Will update when I return to it and figure out the better approach.

sdboyer commented Apr 26, 2016

So, seems like my initial thoughts were sorta-right, sorta-wrong. I jotted down some more things:

Import path is an expression of intent to rely on something that exists at that address - which must ultimately be a local filesystem address, but may also be a network address.

  • We start from the unambiguous statement, in the source, that that's a network URI of the desired code
    • There is a subcase here, however - some of these URIs will only ever exist on disk. That is, they aren't pointers to a retrievable network location. But they're also not in stdlib. We have to know how to ignore these.
  • There is a procedure by which we can work from a full import path upwards, to discover the root of the project (which should correspond to a repository). This mostly just follows go get's own magic:
    • Certain code hosts, like github, get magical support
    • Non-magical hosts have to append a .git/.bzr, etc. in order to indicate where the repo is rooted
    • Hosts can make themselves magical with the use of an HTML meta tag....uuuuughhhh
    • Relative import paths are allowed, they're a tad of a weird case, but not awful
    • Everything else is either stdlib, or it's something we can't work with
  • Now, we also may fulfill that import statement using a resource from a different URI. But doing so carries risks
    • A different URI typically means a different source repository, which potentially means a wholly different history, different branches and tags
    • We can only really assume that the manifest author wrote their constraints with the URI they gave in mind. Thus, it is unsafe and an error to allow a solve to continue if we encounter two manifests that say a given import path should be fulfilled by two different URIs. THERE CAN BE ONLY ONE URI.
      • (We need a satisfiability check that ensures all deps introduced by a new package have the same URI).
    • IF a non-standard URI is swapped in for the URI given by an import path, then we're making the assumption that whatever's being swapped in is already internally set up correctly to act as a swap. (That is: no import path rewriting should be necessary.)
    • Note - this may just be a bad idea in general, as it's a giant coordination problem begging to happen. How could everyone possibly agree, simultaneously, to switch URIs for a given pkg/project?
  • As long as a URI is established and agreed upon by all solver participants (be it the default or an alternate), then it is possible for the root project to declare an alias that should be used for a given project.
    • Aliases can be used to override stdlib, if desired
    • With an alias, there can be an expectation that import paths within the project may need to be rewritten internally
    • So really, there are two things here - physically placing a project into a directory that corresponds to the alias (and not its 'rightful' place); and rewriting any internal imports within the project to reflect its new location
      • ...rewriting gets really icky, really fast. For example, how does rewriting of aliases interact with static analysis in satisfiability checks? Might be best to just not allow it
  • Once the URI is established, we have to move on to the URL. Modifications here are allowed, but they have to ultimately point to the right URI.
    • Because we're enforcing that the URLs resolve to the same URI, it's OK to do things like colocate the different URLs under the same cache repo as different e.g. git remotes
    • Individual modifications declared in the manifest itself should mostly be about putting in appropriate credentials, etc. Probably shouldn't be too common
  • However, there's also a general rewriting system that we need to allow, so that people can do things like inject a proxy.
    • This is not subject to satisfiability validation, as it's not something which is under the control of any one project participant in the solver; rather, it needs to be an environmental setting
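The root-discovery procedure described in the list above can be roughly sketched as follows. This covers only the two cases that need no network access: one "magical" host (github.com) and VCS-suffixed paths on non-magical hosts. The function name `deduceRoot`, the sentinel error, and the exact suffix list are assumptions for illustration; the HTML meta tag lookup, relative imports, and stdlib detection are deliberately left out.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errNeedsLookup signals that the root cannot be deduced from the path alone
// (for example, a host that advertises its root via an HTML meta tag).
var errNeedsLookup = errors.New("root not deducible without a network lookup")

// vcsSuffixes are the markers a non-magical host uses to show where the
// repository is rooted. The list is an assumption, not exhaustive.
var vcsSuffixes = []string{".git", ".bzr", ".hg", ".svn"}

// deduceRoot is a hypothetical helper that works upwards from a full import
// path to the project root.
func deduceRoot(importPath string) (string, error) {
	parts := strings.Split(importPath, "/")
	// Magical host: github.com/<user>/<repo> is always the root.
	if parts[0] == "github.com" {
		if len(parts) < 3 {
			return "", fmt.Errorf("incomplete github.com import path: %q", importPath)
		}
		return strings.Join(parts[:3], "/"), nil
	}
	// Non-magical host: the first element carrying a VCS suffix ends the root.
	for i, p := range parts {
		for _, s := range vcsSuffixes {
			if strings.HasSuffix(p, s) && len(p) > len(s) {
				return strings.Join(parts[:i+1], "/"), nil
			}
		}
	}
	return "", errNeedsLookup
}

func main() {
	for _, ip := range []string{
		"github.com/example/project/sub/pkg",
		"example.org/user/repo.git/sub",
		"example.org/pkg",
	} {
		root, err := deduceRoot(ip)
		fmt.Println(ip, "->", root, err)
	}
}
```

Paths that fall through to `errNeedsLookup` are where the meta tag request (or a decision to treat the path as local-only) would have to happen.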

sdboyer commented May 4, 2016

#19 takes care of the basic local/network naming distinction. Next up is global/root-level aliasing...though we can leave that out for a bit, as it's less urgent.

sdboyer commented Jul 19, 2016

This kinda just doesn't apply anymore. I don't have the energy to type up another analogous issue right now, but the issues here have migrated a bit since firming up static analysis and the notion of ProjectRoot.
