External Repositories

Summary

This is a WIP summary of approaches to external dependencies. The text isn't structured at this point. It's a raw capture of current understanding so we can collaborate on the issues and approaches. When we close on that, we can turn this into real user documentation.

Every scala build is going to have external dependencies. There are toolchain dependencies like the scala compiler and runtime library. There are dependencies used by the rules_scala infrastructure like commons-io in the scalac driver. There are dependencies in user code like com.fasterxml.jackson.core.

Some of these dependencies are java and so can be imported fairly straightforwardly.

Scala dependencies are more complicated in a multiscala environment, where each dependency is potentially a different dependency per scala version. For example, abstractly scalatest is a single dependency. In a multiscala environment, this can actually mean multiple jars, e.g., org.scalatest:scalatest_2.11 and org.scalatest:scalatest_2.12. We do not want any manual per-version repetition.
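To make the per-version expansion concrete, here is a minimal sketch of how one abstract coordinate could be fanned out into one Maven coordinate per scala version. It is plain Python standing in for Starlark. The function name `expand_scala_coordinate` and the scalatest and scala version numbers are illustrative assumptions, and it only handles artifacts that follow the common `_<major.minor>` suffix pattern.

```python
# Hypothetical helper (not part of rules_scala): expand one abstract scala
# dependency into one Maven coordinate per scala version.
def expand_scala_coordinate(coordinate, scala_versions):
    parts = coordinate.split(":")
    org, artifact = parts[0], parts[1]
    rest = parts[2:]  # optional dependency version, passed through unchanged
    expanded = {}
    for scala_version in scala_versions:
        # "2.12.10" -> "2.12": the suffix uses the scala binary version.
        binary = ".".join(scala_version.split(".")[:2])
        expanded[scala_version] = ":".join([org, artifact + "_" + binary] + rest)
    return expanded
```

A build would list scalatest once and let a helper like this produce the per-version jars.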

There are also multiple mechanisms for downloading external dependencies. All these mechanisms end up downloading the desired contents into one or more external bazel repositories.

The mapping from dependency to label depends on the loader used: e.g., jvm.bzl, rules_jvm_external, etc. Each of these mechanisms creates labels in incompatible ways. Moreover, in many cases, the exact labels used also depend on arguments provided by the user.

Mandating a particular loading mechanism with particular arguments would make it possible to map scala version to label uniquely, but this restriction is not acceptable.

There appear to be two options:

- use pairs of helper macros to produce and consume structured labels
- use bind to create labels in a canonical pattern

Both of these approaches would rely on a canonical representation of a dependency. Presumably this would be something like {org}:{artifact}:{version}. At rule time the version would be dropped, e.g., {org}:{artifact}. The scala version is not included in the canonical representation since it's assumed to follow the common scala pattern. (This is actually a problem for the org.scala-lang artifacts that don't follow the standard pattern.)
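As a small illustration of the canonical representation, the sketch below splits a coordinate into its parts and derives the version-less form used at rule time. The function names are hypothetical and the code is plain Python standing in for Starlark. It does nothing special for org.scala-lang artifacts.

```python
# Hypothetical helpers for the canonical {org}:{artifact}:{version} form.
def parse_coordinate(coordinate):
    parts = coordinate.split(":")
    version = parts[2] if len(parts) > 2 else None
    return {"org": parts[0], "artifact": parts[1], "version": version}

def rule_time_coordinate(coordinate):
    # Drop the version, leaving {org}:{artifact}.
    parsed = parse_coordinate(coordinate)
    return parsed["org"] + ":" + parsed["artifact"]
```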

Using helper macros

In essence, for each loader type (and with the user being able to extend), a pair of macros would be defined, each of which takes the canonical format. The loader macro would translate the canonical coordinate to an external repo request, e.g.,

"commons-io:commons-io:2.6"

could translate to a macro that calls, approximately,

    _scala_maven_import_external(
        name = "scalac_rules_commons_io",
        artifact = "commons-io:commons-io:2.6",
        artifact_sha256 = "f877d304660ac2a142f3865badfc971dec7ed73c747c7f8d5d2f5139ca736513",
        licenses = ["notice"],
        server_urls = maven_servers,
    )

or one that calls

    maven_install(
        artifacts = [
            "commons-io:commons-io:2.6",
        ],
        repositories = [
            # Private repositories are supported through HTTP Basic auth
            "http://username:password@localhost:8081/artifactory/my-repository",
            "https://jcenter.bintray.com/",
            "https://maven.google.com",
            "https://repo1.maven.org/maven2",
        ],
    )

These calls aren't literal, but they are accurate in direction.

In the case of a scala dep, the loader macros must make a loader call for each coordinate for each scala version. (FWIW, bazel is smart enough to only download the versions you ask for at run time ... but will download all if you use bazel fetch.)

The rule-time reference takes the coordinate and returns the label the loader produces. In the cases above, these would be @scalac_rules_commons_io//jar and @maven//:commons_io_commons_io.

In both cases the macros need to know whether the jar is scala-versioned and react accordingly.
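The rule-time half of a macro pair could look roughly like the sketch below, which maps a coordinate to the label each loader would produce. The function names, the `scalac_rules` prefix default, and the `maven` repository default are assumptions modelled on the two example calls in the text, not an existing API. Real label schemes depend on the arguments the user passed to the loader.

```python
# Hypothetical rule-time label helpers, one per loader type.
def _sanitize(name):
    # Both example labels replace "-" and "." with "_".
    return name.replace("-", "_").replace(".", "_")

def jvm_bzl_label(coordinate, prefix="scalac_rules"):
    # Assumes the repository was named "<prefix>_<artifact>" at load time.
    artifact = coordinate.split(":")[1]
    return "@%s_%s//jar" % (prefix, _sanitize(artifact))

def rules_jvm_external_label(coordinate, repo="maven"):
    # Assumes a single maven_install repository named `repo`.
    org, artifact = coordinate.split(":")[:2]
    return "@%s//:%s_%s" % (repo, _sanitize(org), _sanitize(artifact))

LABEL_FUNCTIONS = {
    "jvm_bzl": jvm_bzl_label,
    "rules_jvm_external": rules_jvm_external_label,
}

def dep_label(coordinate, loader):
    # The caller has to say which loader fetched the dependency.
    return LABEL_FUNCTIONS[loader](coordinate)
```

Note that `dep_label` needs the loader name as an argument, which is exactly the information a build file would have to carry if loaders are mixed.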

This all seems very doable and can, to some extent, be built into the library so it doesn't have to be reflected in every build file.

Since loading and rule-time reference are so far apart in the structure of bazel, I believe the only way to customize this behavior is by injecting the user configuration into the synthetically-created configuration repository but this is not particularly objectionable.

The bigger concern is if multiple loading techniques are used, e.g., some jvm.bzl and some rules_jvm_external. It'd be possible to do this but the simplest approach would require each build file to know which mechanism was used for which dependencies. It's possible to imagine keeping track of this in the configuration repo but that's a bit scary ...

Using `bind`

A potentially straightforward alternative approach is to simply require binding whatever label is created by the loader to a canonical label in the external namespace. This separates the loading mechanism from the consumption mechanism. Although it's assumed we'd want to automate the loading of repos, that would not be necessary, and ad hoc corner cases could be handled relatively easily, more easily than with the previous mechanism. It's also amenable to using multiple loading mechanisms without having to reflect the loader chosen in build files.

It's assumed that the external path is canonically paired with the maven coordinate, e.g., commons-io:commons-io would end up at //external:maven/commons-io/commons-io.
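A minimal sketch of that pairing follows, written as plain Python that computes the arguments a WORKSPACE macro could hand to `bind`. The helper names are hypothetical. The `maven/{org}/{artifact}` pattern is the one suggested above.

```python
# Hypothetical helpers for the canonical //external pattern.
def canonical_bind_name(coordinate):
    org, artifact = coordinate.split(":")[:2]
    return "maven/%s/%s" % (org, artifact)

def bind_args(coordinate, actual):
    # `actual` is whatever label the chosen loader produced.
    # In a WORKSPACE macro these would be passed as native.bind(**args).
    return {"name": canonical_bind_name(coordinate), "actual": actual}

def external_label(coordinate):
    # What build files would put in deps, regardless of loader.
    return "//external:" + canonical_bind_name(coordinate)
```

With this shape, only the load-time side knows which loader was used. Build files refer to `external_label(...)` alone.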

Which to use?

I'm not sure. I don't know if we need to support mixed loaders in a single workspace. If we did, I think I'd tend to lean towards bind.

bind is considered bad in many cases for good reason. Where patterns aren't strongly enforced and/or where it's not actually adding any value, it's a significant increase in complexity. However, here we'd be using a very strict target pattern which, among other things, would make understanding meaning and searching for references straightforward.

The alternative, when multiple mechanisms are in play, is to reflect the mechanism in each build file, which I find a significant complexity for build file writers.

The alternative of keeping a map of the necessary information in the configuration workspace is potentially viable but I haven't really investigated it.

Open issues

Handling different versions across different targets

Do we need this? I suspect we might, for instance if the version of commons-io we need for building the toolchain is different than the one a user wants. I think the answer to this is the idea of a scope. rules_jvm_external has this and it's just part of the name in jvm.bzl. This is easy to handle at load time but we have to figure out how to reflect it in rule dependencies since it's not reflected in the normal maven coordinate.

Handling different shas for tools that want to pass the shas inline

rules_jvm_external doesn't use this (it puts the shas in a separate out-of-band file in a way that shouldn't affect this work). Other tools like jvm.bzl do. Maybe we factor out the shas into a dict and add them at repo call time.
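One way the dict idea could look is sketched below. The dict and function names are hypothetical, the code is plain Python standing in for Starlark, and the sha is the commons-io 2.6 value from the example earlier in this document. The keyword names mirror the `_scala_maven_import_external` call shown above.

```python
# Hypothetical sha table, keyed by full canonical coordinate.
ARTIFACT_SHA256 = {
    "commons-io:commons-io:2.6": "f877d304660ac2a142f3865badfc971dec7ed73c747c7f8d5d2f5139ca736513",
}

def import_external_args(name, coordinate, shas):
    # Build the keyword arguments for a loader that takes the sha inline.
    args = {"name": name, "artifact": coordinate}
    sha = shas.get(coordinate)
    if sha:
        args["artifact_sha256"] = sha
    return args
```

Loaders that keep shas out of band would simply ignore the table.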

Handling multiple references to the same object

This is essentially the if not native.existing_rule issue, and I still don't have a handle on what happens (or should happen) when you have reconvergence: where two paths want the same dependency and give it the same label but specify different versions, e.g., protobufs. IIUC, it's possible you could get non-deterministic builds because I think the results would depend on the execution order of loads, which I think can run concurrently and therefore non-deterministically.
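To make the reconvergence case concrete, the sketch below models the guard with a plain dict standing in for the set of repositories that `native.existing_rule` would query. The function name and the protobuf versions are illustrative assumptions, and this is a model of the problem rather than a proposed fix.

```python
# Hypothetical model of the "if not native.existing_rule(name)" guard.
# `existing` maps repository name -> coordinate already declared.
def declare_repo(existing, name, coordinate):
    if name not in existing:
        existing[name] = coordinate
        return "declared"
    if existing[name] == coordinate:
        return "duplicate"
    # Same label, different version: the first declaration silently wins
    # under a plain existing_rule guard, so report it instead.
    return "conflict: %s already declared as %s" % (name, existing[name])
```

Whichever path runs first decides the version, which is why the result can depend on load order.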
