Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Amend the way Mill invokes Zinc to enable benefits from remote caching #2153

Open
Baccata opened this issue Nov 28, 2022 · 5 comments
Open

Comments

@Baccata
Copy link
Contributor

Baccata commented Nov 28, 2022

From what I gather, Zinc is currently invoked by mill in a way that prevents the compilation cache to be shared across machines (the fact that mill doesn't expose any utility tasks to do so is irrelevant to this issue). The following happens:

  • start with by cloning the same mill project in two distinct locations A and B (either on the same or different machines)
  • run mill _.compile from clone A of a project
  • copy out folder from clone A to clone B
  • run mill _.compile from clone B
  • incremental compilation (Zinc) does not kick in and rather compilation starts from scratch

To be clear : mill's task graph is not really relevant to what I'm saying here : even if the mill cache files would be invalidated, I'd still expect Zinc to benefit from finding pre-computed .class files and a zinc file at the correct location in the out folder, and that currently doesn't seem to be the case.

I've dug some discussions from last year on this topic. This thread (and your comment) could be relevant : #1094 (comment)

@Baccata
Copy link
Contributor Author

Baccata commented Nov 28, 2022

@adpi2 you might be able to help or offer guidance ?

@lefou
Copy link
Member

lefou commented Nov 28, 2022

I think, disregarding any potential invalidation reasons, Mill is currently not abstracting away the absolute locations of the source files. Although we use VirtualFile of zinc API, we simply encode the absolute path of a source file. Instead, we should try to encode a more stable location by introducing some virtual source roots. The most relevant source root might be workspace, so we can refer all source files relative to that location. After that change, different checkouts should be able to use the same source location for the same copy of the source file, enabling it to find pre-computed .class files.

For context, we discussed the "virtual source roots" idea in the context of PathRefs, so we should probably try to use the same roots and the same (new) API code to work with it.

@Baccata
Copy link
Contributor Author

Baccata commented Nov 28, 2022

Any chance the two changes can happen in parallel one another ? Theoretically, the assumption that zinc uses the workspace as virtual root could be implemented concurrently/earlier to a change in PathRef (setting in place some PathRef => VirtualRoot indirection, the implementation of which could change later).

A change in the Zinc invocation would offer a significant improvement in some usecases, and intuitively it feels like it could happen without impacting the API surface of mill (I may be mistaken though)

@lefou
Copy link
Member

lefou commented Nov 28, 2022

I agree with you. I just wanted to point out that any API and general design should take both usages into account. I think an internal conversion of sources located under T.workspace to some workspace-relative VirtualFile (zinc API) can be applied immediately in the implementation (ZincWorkerImpl).

A change to PathRef will most likely introduce a binary incompatible change. We can start developing it now, but it needs to target Mill 0.11. After the release of Mill 0.10.10, I will tag the main branch with 0.11.0-M0 and we can start to merge stuff that changes API and behavior. Keeping some source compatibility (paired with @deprecations) would be nice.

@adpi2
Copy link
Contributor

adpi2 commented Nov 28, 2022

@Baccata, thanks for pinging me. I don't have a lot of experience with Mill or remote caching. But I fixed a few things around remote caching in sbt so, at least, I can share what I know about it.

The main idea, is to create a single and local instance of MappedFileConverter that can convert, back and forth, any machine-local Path into a Zinc VirtualFile, based on a list of rootPaths.

The list of rootPaths in sbt is defined here. It is very important that this list contains all possible root paths of any source file, class folder, jar file etc... Otherwise the MappedFileConverter will create PlainVirtualFiles that are not machine-independent and the cache won't be shareable.

The single instance of MappedFileConverter must then be used in various places:

  • to create the Zinc Inputs: classpath, sources
  • In the Zinc stamper: we need to use machine-independent stamps
  • In the Zinc reporters: we want to report full paths back to the client

Here is the PR that introduced remote caching in sbt. Beware though that there were some bugs after this PR so it would also be useful to compare this first implementation with the current one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants