
Multiple scala version builds within the same bazel run #962

Closed
1 of 6 tasks

smparkes opened this issue Jan 27, 2020 · 28 comments
Comments

@smparkes
Contributor

smparkes commented Jan 27, 2020

PRs

There are a number of PRs in flight. Because they interact, they're sometimes based on one another, so a diff against master will show all changes up to that point. That means it's probably easiest to review them in order.

Core PRs: these are cleanup/refactoring PRs for core that are blockers for multiscala

Multiscala PRs: these actually implement multiscala support. They generally require the PRs above.

tl;dr

I'm creating a new issue for this.

Many similar issues exist and I'm happy to close this in favor of another, but there are (see below) so many of them, with overlapping scope and long comment histories, that I personally prefer to start with a clean slate.

I'm actively working on this.

I'd like to get buy-in early that we want to do this and on my current approach. If there is buy-in, I'd like to get feedback as I go. There are lots of choices and I'd love to get feedback/reviews early so there's not a lot of rework.

If there's not sufficient consensus, I'm happy to close this (we don't need yet another issue) and I'll do the work independently.

Requirements:

  • Build/test targets against multiple versions of scala in the same bazel invocation
  • Minimal user overhead/incompatibility if they are using a single scala version
  • Maintain current API compatibility
  • Opt-in until stable

Stretch goals:

  • Don't require users to enumerate targets, manually or with starlark; e.g., one scala_library call will, by default, build (suffixed) targets for each configured version (with an optional alias for the default scala version)
  • Some support for 2.13, e.g., support version-specific source code variants

Key implementation choices

  • Use configured toolchains rather than label dependencies
  • Shared configuration available at loading phase via a repository constructed by a repository rule (rough sketch below)
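As a rough sketch of the second choice (all names here, e.g. multiscala_config and SCALA_VERSIONS, are hypothetical), a repository rule can materialize the shared configuration as a .bzl file that macros can load at loading phase:

# Hypothetical sketch: a repository rule that writes the shared
# configuration out as a loadable .bzl file.
def _multiscala_config_impl(repository_ctx):
    repository_ctx.file("BUILD", "")  # empty package so the repo is loadable
    repository_ctx.file(
        "config.bzl",
        "SCALA_VERSIONS = %r\n" % repository_ctx.attr.scala_versions,
    )

multiscala_config = repository_rule(
    implementation = _multiscala_config_impl,
    attrs = {
        "scala_versions": attr.string_list(default = ["2.11.12", "2.12.10"]),
    },
)

# In WORKSPACE:
#   multiscala_config(name = "multiscala", scala_versions = ["2.11.12", "2.12.10"])
# In macros/BUILD files:
#   load("@multiscala//:config.bzl", "SCALA_VERSIONS")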

Dev Approach:

  • Small, incremental PRs, each including the fewest changes possible, for as long as possible. None break tests.

Example:

  • Will be in unstable/multiscala/private/example

Related Issues

The hope is that this work will address these issues, completely or in large part.

#14: Support different versions of scala
#80: Cross building
#147: repeating downloads of scala library
#173: How to make rules_scala use scala-2.10?
#250: enable_dependency_analyzer attr is a feature toggle that needs to be removed at an opportune time
#252: use new inliner flag in 2.12.3
#263: Reasonable scala
#275: scalatest 3.x.x support?
#302: 'io_bazel_rules_scala/dependency/thrift/scrooge_core' not declared in package 'external'
#393: Project with multiple scala versions
#400: create a scala_test toolchain
#401: create a specs2 toolchain
#402: create a scrooge toolchain
#403: create a scala protobuf toolchain
#432: Insulate toolchain options from breaking rules_scala bootstrap code
#546: scala_maven_import_external and scala_version
#679: Dependency checkers do not detect scala macro dependencies
#703: Default Scala version should be 2.12 (instead of 2.11)
#713: Upstream toolchain fixes mean we can remove some crud around the scalac provider
#768: no matching toolchains found for types @io_bazel_rules_scala//scala_proto:toolchain_type
#809: Support Scala 2.13
#940: Tracking issue- move from deps to toolchains

Status

I've done a good portion of this work once but wasn't happy with the result. Among other things, it wasn't sufficiently "opt-in". I'm planning to take a more incremental approach and focus on making all the extensions opt-in from the start. I also want to get feedback on choices early on.

FAQ

Why not wait/use bazel platforms/toolchains/transitions?

The fundamental toolchain work is entirely usable. There may be some issues with toolchain resolution but they can be worked around.

The bazel concepts that potentially overlap with this are platforms and transitions. There are at least a few broad issues: time, fit, and functionality.

Time: That work is not complete and looks to be pretty early, still. It's been on the roadmap for a while but there are lots of open issues. For example, AFAICT, they haven't resolved how file paths are going to be handled. I suspect this work is going to take a fair amount of time and my team can't wait. The current approach should not conflict with any of that work. Migration to that work when it's complete remains to be clarified (next).

Fit: It's not really possible to judge how that work is going to fit this use case since so much is yet to be defined. Most of the relevant work in that effort is around different target platforms but scala versions are not exactly different target platforms: from a bazel perspective, our platform is the jvm, not the scala runtime. I haven't seen any examples extending platforms in bazel that match the scala use case.

Functionality: it's not actually clear how we would use this support for scala versions. Transitions in bazel are applied at the rule level, so it's unclear how we would use them when we generally want to vary the version at the target level. It's also not clear how the scala pattern of version suffixes in artifact names would map onto the work going on in bazel.

rules_jvm_external?

This is obviously debatable. We use rules_jvm_external extensively, including artifact pinning. This has been very useful in tracking common dependency versions and noticing conflicts. I'm sort of hoping we can standardize on this but I'm not sure what open issues may exist.

@ittaiz
Member

ittaiz commented Jan 28, 2020

  1. I want this capability.
  2. I appreciate the level of detail and up front info, it will help us in our collaboration.
  3. I’ll respond tonight or tomorrow at the latest

@andyscott
Contributor

This is a great issue! I won't have guaranteed time to read through and respond to this until the end of this week or early next week.

@ittaiz
Member

ittaiz commented Jan 29, 2020

@katre I'd love your thoughts on this as I know you're always on the lookout for good/bad usages of platforms and toolchains as well as cases where people avoid it. Interested if @smparkes's analysis is correct in your opinion.

@katre
Member

katre commented Jan 29, 2020

My first goal is always to help the community: if pushing for toolchains doesn't help actual users of rules_scala, it's the wrong approach and I won't do it. I want to help you write the most useful rules you can.

One approach we initially used internally for Java rules was to embed language/runtime level as constraints and create platforms that contain those. This has turned out to be the wrong choice: there are too many overlapping platforms and it's too confusing, so I'm working on how to migrate the system to a place where there are separate language and runtime flags. I think what will happen is that the toolchain targets will point to aliases for the actual java_toolchain targets, checking the flags to see which specific java_toolchain should be used. Or possibly the implementation of java_toolchain will check the flags directly (or a third option, like selects in the java_toolchain's attributes). Eventually we're going to want to specify the language or runtime version directly on targets via an attribute, which would just be a transition.
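For illustration only (hypothetical target names, not actual rules_java or rules_scala code), the alias-checks-a-flag pattern might look roughly like:

config_setting(
    name = "scala_2_11",
    values = {"define": "scala_version=2.11"},
)

config_setting(
    name = "scala_2_12",
    values = {"define": "scala_version=2.12"},
)

# The toolchain registration points here; the alias picks the concrete
# toolchain implementation based on the flag's value.
alias(
    name = "scalac_toolchain_impl",
    actual = select({
        ":scala_2_11": ":scalac_toolchain_2_11",
        ":scala_2_12": ":scalac_toolchain_2_12",
    }),
)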

This is pretty long-winded, and I don't know if it's useful to rules_scala. My initial thought would be that a split transition would be the right way to have a single target that generates multiple versions, but I don't know how that would work through the process of depending on libraries.

Unfortunately I haven't followed this issue (or PR #964) closely enough to understand exactly what you're proposing to implement. I do want to support the choices that you are making as rules authors and make it as easy as possible.

@ittaiz
Member

ittaiz commented Jan 29, 2020

Thanks!
@smparkes do you have any concrete questions for @katre or do you feel you understand what you should and shouldn’t use right now?

@smparkes
Contributor Author

Typing as we speak :-)

@ittaiz
Member

ittaiz commented Jan 29, 2020

Re rules_jvm_external- are you planning to make it part of the contract or an implementation choice of rules_scala? Right now rules_scala uses bind (and I’d like us to move to toolchains) to allow indirection and independence for users to specify the needed dependencies.

We (Wix) have our own mechanism which has significant benefits compared to bazel-deps and rules_jvm_external for our use case. We need to be able to keep using it.

The main reason I’m concerned is that as opposed to “regular” features this one is looking at the core of dependencies and rules_scala.

I’ll take a look later at your PR but if you can comment that could really help quell my concerns.

@smparkes
Contributor Author

Thanks, @katre. I for one (and I'm sure there are others) appreciate the help. I know there are significant gaps still in my understanding of how things fit together best.

If you have time to comment:

There's more description of the user-level use case in README

AFAICT, for our use case we need to create what are in essence parallel subgraphs. At this point I think we need to use macros to wrap rules/targets, e.g., scala_library which can call scala_library_rule multiple times for each target, where each target, e.g., my-library-2.11 and my-library-2.12 is configured to depend on its version-specific dep targets in its version-specific subgraph.
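A minimal sketch of that fan-out, assuming a hypothetical underlying rule scala_library_rule and the SCALA_VERSIONS list from the config repository:

# Hypothetical macro sketch: one suffixed target per configured version,
# with deps rewritten so each version's subgraph only references
# same-version targets.
load("@multiscala//:config.bzl", "SCALA_VERSIONS")  # e.g. ["2.11", "2.12"]

def scala_library(name, deps = [], **kwargs):
    for version in SCALA_VERSIONS:
        suffix = "-" + version
        scala_library_rule(
            name = name + suffix,
            # Naive dep rewriting; a real implementation would need to
            # distinguish scala deps from non-scala deps.
            deps = [dep + suffix for dep in deps],
            **kwargs
        )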

I think what will happen is that the toolchain targets will point to aliases for the actual java_toolchain targets, checking the flags to see which specific java_toolchain should be used.

Not sure if this is what you're thinking, but in the in-progress POC, I look at arguments to the macro and use that to synthesize the label for the toolchains to pass to the rule to create each target.

I'm okay with this, though I don't like the fact that it is an end-run around toolchain resolution. It's based on label string manipulation of toolchains, which avoids the type-based resolution in toolchains.

One alternative is to create toolchain types for each version and then create different rules for every toolchain type. This is still label string manipulation, but it's at the type level, and we don't expect/support changing toolchain types. That would be less restrictive than not letting users define their own toolchains of the standard toolchain types. It requires different toolchain types for each version but could be resolved with toolchain resolution.
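A rough sketch of that alternative, with hypothetical names:

# One toolchain_type per scala version, so standard (type-based)
# toolchain resolution selects the implementation.
toolchain_type(name = "scalac_2_11_toolchain_type")
toolchain_type(name = "scalac_2_12_toolchain_type")

toolchain(
    name = "scalac_2_12",
    toolchain = ":scalac_2_12_impl",  # target carrying the scalac provider
    toolchain_type = ":scalac_2_12_toolchain_type",
)

# The 2.12 variant of each rule then declares, e.g.:
#   toolchains = [":scalac_2_12_toolchain_type"]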

Even without version-specific toolchains, we can support user-defined toolchains, we just can't provide multiscala support out of the box in that case. It's a limitation but it's a corner-case. Users could still make things work, it would just require more work on their part, e.g., adding their own macros. I think this falls under "make easy things simple and hard things possible."

I'm pretty sure that both approaches (single toolchain types and version-specific toolchain types) will work, but I'm not sure which feels more in the spirit of the design. In some ways multiple toolchain types, e.g., per-version scalac-2.11 and scalac-2.12 types, feel wrong as what a scalac type should represent: in almost every case, there's going to be a single toolchain for each version. On the other hand, while it's not terribly common, there are different scalac compilers for the same version, so in that corner case there are potentially multiple implementations of a scalac-2.11 type.

On the good side, since all this is behind a supplied macro, we could change the approach without users needing to change build files.

I've tried to work out how transitions could be used and I haven't gotten too far ... but my intuition about transitions is pretty weak. Since we need scala_library to generate multiple targets, would we create a target that depends on a set of version-dependent targets? But then it doesn't really exist itself ... there's nothing to build for the non-version-specific target. And we don't actually want any omnibus targets: we don't want any real target that depends on multiple versions; in particular, we need to make sure my-binary-2.11 depends on my-library-2.11 but not my-library-2.12. Virtual targets might be okay, but I don't think I've actually seen anything like that in bazel examples. Is it a thing?

Eventually we're going to want to specify the language or runtime version directly on targets via an attribute, which would just be a transition.

IIUC, I looked at this to start and didn't make much progress, but again, we're at the edge of my understanding when we talk about transitions. This is partially because I don't think we want a target that depends out to multiple versions; we want parallel, disconnected subgraphs. They only "meet" via target wildcards, e.g., command-line ":all"s.

I think I win the long-winded contest. I appreciate anyone reading through all of this. I wouldn't be surprised if it's confusing, so let me know if I can answer. But also don't feel compelled to school me ...

Cheers!

@smparkes
Contributor Author

Re rules_jvm_external- are you planning to make it part of the contract or an implementation choice of rules_scala?

I mainly added that text to spur this discussion :-) I figure I don't really understand the space of use cases. I need to learn (though it feels like a big space ...)

I’ll take a look later at your PR but if you can comment that could really help quell my concerns

It's absolutely not a blocker. We can make things work. It takes more effort, which is fine. I just wanted to be sure it was a requirement. Now I know. :-)

I use rules_jvm_external in the WIP POC, but

  1. It's still very early, so it doesn't actually do the right thing, and the example doesn't use non-compiler external dependencies
  2. It uses bind to stay compatible with the stable code

This part of the WIP is not meant to be definitive.

Right now rules_scala uses bind (and I’d like us to move to toolchains) to allow indirection and independence for users to specify the needed dependencies.

I think we definitely need to get rid of bind when determining the scala version to compile against. I don't think there's any way to get around that.

But that does just kick the can down the road: how does the toolchain depend on the jars?

This is a subcategory of the overall dependency issue, i.e.,

  • toolchain-rooted dependencies
  • user-rooted dependencies

These don't necessarily have to be the same. I can imagine that the latter is harder to constrain than the former. Do you all need to control how the toolchain-rooted dependencies are loaded? I'm just wondering. I suspect if we need to support multiple loading mechanisms for user code, we might as well for compiler code, too.

The core issue is how we enable a macro to do version-specific dependency label manipulation, e.g., how do I depend on com.fasterxml.jackson.module:jackson-module-scala_*? There are at least two issues here:

  1. I really don't want to have to enumerate rule calls for all versions of the target. This implies a macro needs to be able to change the label for each scala dependency as it runs rules to create targets. Actually, now that I think of it, it would be possible to do this with toolchains, too, but that would imply moving all your deps to the toolchain, including the user-defined deps. Were you thinking that at all? I suspect not? I can't really imagine how that would be a good thing (but I haven't really tried ...)
  2. The format of the label varies radically depending on which import mechanism you're using

I have two observations, but this remains a WIP:

  1. I hate boilerplate and I hate manually mangling maven coordinates in build files. I want to get as close as I can to a dependency of {org}:{artifact}:{version} at load time and {org}:{artifact} at macro call time. Macros then add the version suffix in both cases (while staying at this abstract level, prior to the artifact-to-label mapping).
  2. We can configure/use helpers that use loader-specific information to transform coordinate to loader-specific labels

An alternative to the second choice is to use bind to map every external dependency to a single, standard label in the external namespace, independent of the real label and so independent of the loader. This requires adding bind-generation code for each loader, which I'm not crazy about, but it makes the dependence-time transform uniform. I'm also not sure how hard it is to do the helpers I mentioned above. The two approaches get implemented at different points in the overall bazel lifecycle, and one may be easier to implement. This is such a well-understood and well-constrained use of bind that I'm less against it than I might be otherwise. That said, I do think it's the less desirable choice.
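As a sketch of the second observation, assuming the rules_jvm_external naming convention and a hypothetical helper name:

# Hypothetical loader-specific helper: map an unsuffixed "{org}:{artifact}"
# coordinate plus a scala version to a label.
def maven_scala_dep(coordinate, scala_version):
    org, artifact = coordinate.split(":")
    suffixed = "%s_%s" % (artifact, scala_version)
    # rules_jvm_external flattens coordinates to @maven//:org_artifact labels,
    # with dots and dashes replaced by underscores.
    return "@maven//:%s_%s" % (
        org.replace(".", "_").replace("-", "_"),
        suffixed.replace(".", "_").replace("-", "_"),
    )

# maven_scala_dep("com.fasterxml.jackson.module:jackson-module-scala", "2.12")
#   => "@maven//:com_fasterxml_jackson_module_jackson_module_scala_2_12"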


@or-shachar
Contributor

Hey @smparkes. Or from Wix here, trying to catch up on the amazing work you did here.

So I just did a very shallow reading and I also cloned your branch and looked at the diffs.

I hope I understood sufficiently.

In general - I understand why you want to remove the bind and move to toolchains. It seems like a good candidate for a PR of its own. And I think we should strive to allow users to configure those toolchains manually (not necessarily macro-generated using rules_jvm_external).

I see all your points regarding how to translate external scala deps to the right version. I need to keep thinking about it... I had a lot of naive thoughts, but it seems like your insights are more advanced at this point.

Also - I wonder how this would behave in the IDE (with single / multiple scala versions).

Will do some research and hopefully post some more questions / insights throughout the weekend.

Cheers! 🌷

@smparkes
Contributor Author

Or from Wix trying to catch up on the amazing work you did here.

👋

In general - I understand why you want to remove the bind and move to toolchains. It seems like a good candidate for a PR of its own.

Agreed. I actually started working on an independent PR to factor out/clean up the stuff related to toolchains. I can add the binding into that easily. I'll have a draft of that this morning.

Also - I wonder how this would behave in the IDE (with single / multiple scala versions).

I haven't explored this. It's a good question. We use IJ but I haven't looked at how IJ hooks into bazel to know what this affects.

Will do some research and hopefully post some more questions / insights throughout the weekend.

Thanks!

@smparkes
Contributor Author

Also - I wonder how this would behave in the IDE (with single / multiple scala versions).

FWIW, ran into this today. This is a little bit challenging for multiscala but I'm pretty sure it's feasible. I understand the concept of attr._scala_toolchain.

I'm not sure how the plugin works. I assume some combination of parsing build files and using the bazel query command ...

@katre
Member

katre commented Jan 31, 2020

AFAICT, for our use case we need to create what are in essence parallel subgraphs. At this point I think we need to use macros to wrap rules/targets, e.g., scala_library which can call scala_library_rule multiple times for each target, where each target, e.g., my-library-2.11 and my-library-2.12 is configured to depend on its version-specific dep targets in its version-specific subgraph.

There are, basically, two approaches here. One is to use macros (or clever rules) to take the single target graph and split it into the parallel subgraphs, one per version. The other option is to use starlark configuration with a split transition:

scala_library(
    name = "top",
    scala_versions = ["2.11", "2.12", "2.13"],
    srcs = ...,
    deps = [":dep_a", ...],
)

scala_library(name = "dep_a", ...)

With the split transition, "top", "dep_a" (and other dependencies) can be built multiple times, once per scala_version, with the graphs automatically fanning out as needed. This can be confusing with queries, however, since the targets will be shown multiple times with different configs. Also, this will cause deeper non-scala targets to be duplicated (although actions will still be cached), until we can get configuration trimming to work.
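A rough sketch of what such a split transition might look like in starlark (the //config:scala_version build setting and all names here are hypothetical):

# Split transition: fan a dependency edge out to one configuration per
# requested scala version; the dict keys name the split branches.
def _scala_versions_transition_impl(settings, attr):
    return {
        version: {"//config:scala_version": version}
        for version in attr.scala_versions
    }

scala_versions_transition = transition(
    implementation = _scala_versions_transition_impl,
    inputs = [],
    outputs = ["//config:scala_version"],
)

# Attached to a rule attribute, e.g.:
#   "deps": attr.label_list(cfg = scala_versions_transition),
# plus the standard function-transition allowlist attribute on the rule.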

Ultimately, which approach to take depends on the needs of your users. We're happy to consult and assist whichever way you want to go.

@smparkes
Contributor Author

smparkes commented Jan 31, 2020 via email

@katre
Member

katre commented Jan 31, 2020

You can do self-transitions as well as dependency transitions. So it looks like:

top (target config)
top (config with scala_version=2.11)
top (config with scala_version=2.12)
top (config with scala_version=2.13)
dep_a (config with scala_version=2.11)
dep_a (config with scala_version=2.12)
dep_a (config with scala_version=2.13)
etc etc

@katre
Member

katre commented Jan 31, 2020

The targets will keep the same name, but outputs end up in separate output roots under blaze-out, one per configuration.

@ittaiz
Member

ittaiz commented Feb 2, 2020

Hi @smparkes,
I'm sorry to say I can’t mentor this effort.
It is complex and big in scope, and to do the job well I need more capacity than I have.
Since this is a significant change I suggest devs from companies which care about this (@andyscott , @ianoc , @borkaehw, @liucijus from Wix) review these PRs and reach an agreement.
I can say that Vaidas, Wix's focal point, will start diving into this tomorrow, but due to other constraints I estimate he'll be able to be 100% on this in 2 weeks and help drive it through to completion.

@smparkes
Contributor Author

smparkes commented Feb 2, 2020

Understood. Thank you.

@andyscott
Contributor

I'm pretty interested in seeing how a solution using transitions would work. It seems like it'd be a cutting edge solution, and consequently something worth exploring but not necessarily guaranteed mergeable to rules_scala. That being said, I'd be happy to review these efforts and follow along during development.

For context, supporting multiple versions of Scala is not a priority for Stripe. It is a nice-to-have as it would potentially help future Scala migrations. I'd be participating out of my own interests simply because I believe this feature makes rules_scala more appealing for building large Scala projects.

@smparkes
Contributor Author

smparkes commented Feb 5, 2020

cc @katre and others:

My original idea on how the toolchains attr worked was wrong.

Followed up in https://groups.google.com/d/msg/bazel-discuss/aDABPYoywWs/glGBA5XSAwAJ

@liucijus
Collaborator

Thanks @smparkes for extensive deep dive into the problem and implementation.

I want to share a few items I found essential for us here at Wix:

  1. After going over the work done, macros seem to be the right pragmatic choice as long as they provide a stable interface for potential future refactoring into transitions. I think it can unblock us now, and research toward transition based implementation can be done separately.

  2. At Wix we have optimized loaders for our use case, and migrating to rules_jvm_external is still not possible, as it is missing some features and may cause a performance penalty for us. Also, if we decide to migrate our codebase to another type of loading, it will take significant effort/time to complete. For a migration of that scale, it would be great to have a stable decision regarding how rules_scala supports loading.
    Overall it looks like a good idea to defer the decision regarding deps loading for as long as possible and unblock the rest of the effort.
    If you could rebase/squash the current PRs on top of recent master, it would help us do some internal testing to understand how much we are affected.

  3. Another constraint comes from the tools. The success of Bazel adoption is strongly tied to existing tooling, especially Intellij IDEA support. Unfortunately, its model is very limited, and it's easy to structure projects in ways not supported by the plugin. I think having the ability to specify which version of Scala to use in IDEA could be a solution (build flags or something similar). Having a way to use the same macros as the single-Scala implementation would also be a good idea for the regular use case (instead of splitting the implementation in the code).

@stefanobaghino

Thank you so much for working on this. Is there some way in which I can be of help without overlapping with existing work?

@smparkes
Contributor Author

smparkes commented Apr 7, 2020

At this point, for us, this is on hold. We've had to switch to other efforts for the time being. So for now, I have no pointers.

For what it's worth, for the short term, we've gone the direction that others have mentioned of using a symlink to configure the bazel build (primarily WORKSPACE, a bzl file that defines the parameters to rules_jvm_external, and the rules_jvm_external pin file). We change that link at the beginning of our CI/CD scripts and use different/separate build phases for different artifacts. It's less than ideal but is working for us right now.

We'll re-evaluate when we have more time. It's unclear to us right now whether we can allocate the time to do a clean approach that can accommodate the complexity of all the variations out there, and if we can't, that becomes a blocker to a merge. If we don't think we can make something acceptable for merging and as a result would have to maintain a fork, the approach likely changes, because the priority shifts from something that is clean to something that is least likely to incur merge conflicts going forward.

@smparkes
Contributor Author

At this point, there's not enough need/support for this (including myself) to continue forward.

@ittaiz
Member

ittaiz commented May 15, 2020

I’m sorry we were unable to converge here and I want to thank you for all your effort.

I hope that we will tackle this issue when the bazel infra is much more friendly to it.

@smparkes
Contributor Author

No worries at all. Hope my wording didn't sound negative. Didn't mean it to be. I agree with the outcome at this point in time.

@ittaiz
Member

ittaiz commented May 15, 2020

👍🏽 your wording had a minor effect (if any). It's mostly my desire to have this rule set be as inviting as possible, and when an effort doesn't converge I feel a bit bad about it.
Good luck, and 🤞🏽 that this feature will land in the (not too distant) future.
