New approach to precompiled cache? #3860
Without thinking much about the details, this makes sense to me. I do think it's worth discussing:
A common complaint is that switching between profiling enabled and disabled causes too many recompiles. With this proposal, at least hackage/git extra-deps won't have to be recompiled, but if your project has a lot of local packages there would still be a lot of recompilation. Some kind of solution to that would be great, but not sure if it should be part of this proposal or not. |
I think the profiling issue is just a bug in Stack right now. While switching from no profiling to profiling should cause a recompile, the other way around shouldn't. That should reduce at least some of the friction in going back and forth with profiling builds. We could in principle have the same kind of logic at this cache layer. |
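A tiny sketch of that reuse rule (the helper here is hypothetical, not Stack's actual code): since a profiling-enabled build also produces the vanilla (non-profiled) libraries, a cached profiling build can satisfy a later non-profiling request, but not the reverse.

```haskell
-- Hypothetical helper: decide whether a cached build can satisfy a
-- requested build, given only the profiling flag. A profiling build
-- also contains the vanilla artifacts, so it satisfies either kind
-- of request; a vanilla build satisfies only vanilla requests.
cachedBuildSatisfies
  :: Bool  -- ^ cached build had profiling enabled
  -> Bool  -- ^ requested build wants profiling
  -> Bool
cachedBuildSatisfies cachedProf wantProf = cachedProf || not wantProf
```

The same one-directional check could in principle apply at the new cache layer discussed here.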
Seems good to me! This seems very similar to how I was thinking implicit snapshots would work. Is the main difference that initially they get installed to their own package DB? This way they aren't stored under the path of a particular snapshot. If so, then I grok this and am 👍 on it. |
I guess you could look at it that way, sure. Sounds like this should move ahead, no one's raising any concerns. If I get a chance to get started, I'll comment here. If someone else wants to take a stab at it instead, I'll be very happy to advise :). |
This has really nice implications for intero's binary. 👍 |
I think overall this makes sense.
Hmm. For completeness, here is how tinc handles this:
(https://github.com/sol/tinc/blob/8f8007e62d9953eca7cf1e3c6d9aefb988eda47a/src/Util.hs#L49)
Not sure about the registration cost, but I didn't have the feeling that this was particularly slow. Isolated, per-project databases would certainly be neat (if that is what "local databases" means). |
Numbers: registering 92 packages takes 195ms on my system. |
Registering all transitive dependencies for |
To clarify: the goal of this proposal is to make extra-deps cacheable. What we don't want to allow is a local filepath (as opposed to, say, a package from Hackage or a Git repo) being cacheable. It's possible, by taking a hash of the entire directory contents, but it's rife with difficulties (like making sure we find all of the files that actually impact the build) and would likely end up with large amounts of garbage in the cache. It could be that the

Your comments on |
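To make the difficulty concrete, here is a hedged sketch of what hashing a local directory for cache purposes would involve (the function names are invented for illustration; `hashlazy` is from the cryptonite package):

```haskell
import           Crypto.Hash (Digest, SHA256, hashlazy)
import qualified Data.ByteString.Lazy as BL
import           Data.List (sort)

-- Sketch: derive a cache key for a local package from the full
-- directory contents. Sorting gives a stable ordering, so identical
-- contents always hash the same way.
hashDirectory :: FilePath -> IO (Digest SHA256)
hashDirectory dir = do
  files    <- sort <$> listAllFiles dir
  contents <- mapM BL.readFile files
  pure (hashlazy (BL.concat contents))

-- Deliberately left abstract: deciding exactly which files "count"
-- (ignoring .stack-work, editor backups, generated files, etc.) is
-- the rife-with-difficulties part mentioned above, and missing a
-- file means silently reusing a stale cached build.
listAllFiles :: FilePath -> IO [FilePath]
listAllFiles = undefined
```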
I'm a Haskell beginner, but if possible, would like to contribute to implementing this solution. @snoyberg mentioned there's mentorship and I think I would be able to contribute with some help. This looks like a really neat feature! |
I'm in the same boat as @DebugSteven - Haskell beginner that thinks they can contribute w/ some help. Since @DebugSteven "got here first", I'll assume they will work on the code but happy to contribute if I can or if plans change. |
As I see it, there are going to be (at least) four pieces of work here worth doing:
Step (4) is the easiest to parallelize: once (1) is done, multiple people (myself included) can start using the new branch and report on issues. (2) is probably close to trivial. (3) is something I'm volunteering to do regularly, and encourage others to do as well. So that leaves (1). I'd recommend only one person at a time work on it, and if that's @DebugSteven, cool. If it looks like you're stuck, or won't have time to continue, please ping this issue so others can jump in if desired.

All that said, how the hell do we approach this issue? ;) I've pushed a massively incomplete commit to the 3860-better-cache branch (d767905), which shows the parts of the code base that I think need to be touched. Essentially, here's what's going on:
It's a bit hairy getting all of this right, but the prior art of the precompiled cache should be good guidance on moving forward here. Also, there's very little "smart" Haskell code here, this is basically a big ol' imperative mess :) Does this give enough info to get started on? And feel free to ask questions at any point. |
Yeah I think that's enough information to get started. I'll get started on this today and probably have some questions posted here later today. |
How does Stack determine right now which packages need to be recompiled? The way I understand it, some packages are stored in the cache. Before Stack recompiles any package, it checks the cache to see if it's already been compiled. Anything that has been changed in the project (a different version of a package, a new dependency, etc.) is recompiled (and the immutable dependencies are added to the cache).

Build.hs has configure options for how to build projects. What are the configure options, and what, if anything, needs to be added for immutable packages? My guess is nothing. If we determine a package is immutable we don't need to build it, which happens in ConstructPlan, and those packages are added to the cache. It seems like the majority of the changes should be made in ConstructPlan.hs, specifically

What is the package index in "look up in the package index and see if there's a recommendation available", and what does that mean in terms of adding code to accomplish seeing if there's a recommendation? The first part of the question might lead me to answer the second part. |
Great questions. One from me in return: when you say

Alright, some background. A package database is a concept from GHC itself. It contains information on installed libraries. Every project in Stack has three package databases it cares about:
When we get to
In
Today, with the precompiled cache: in the

The goal here would be that, when reaching this phase, Stack will instead:
I think your intuition is correct, and most changes will occur in

I hope this helps; let me know if I can further clarify. |
I just had a call with @DebugSteven to go over things. One takeaway we had is that, instead of focusing on the code in

One other thing that popped up was
This is handled today by the signature of the precompiled cache functions, which as an example look like:

```haskell
readPrecompiledCache :: forall env. HasEnvConfig env
                     => PackageLocationIndex FilePath -- ^ target package
                     -> ConfigureOpts
                     -> Set GhcPkgId -- ^ dependencies
                     -> RIO env (Maybe PrecompiledCache)
```

Note that the three arguments represent the three things mentioned above. One thing that I think may need revisiting is the definition of `ConfigureOpts`:

```haskell
data ConfigureOpts = ConfigureOpts
  { coDirs :: ![String]
  -- ^ Options related to various paths. We separate these out since they do
  -- not have an impact on the contents of the compiled binary for checking
  -- if we can use an existing precompiled cache.
  , coNoDirs :: ![String]
  }
```

We separate out configure options which refer to specific file paths, since they shouldn't invalidate the cache. For example, if a package was installed into an LTS 8.12-specific directory, that shouldn't preclude it from being used for LTS 8.13.

With the new approach being discussed here, we may have a better method. Instead of using paths which refer to the snapshot, we can use paths which install into cache-specific directories. This will avoid the need for some awkward configure-option splits, and provide an easy way to ensure that all packages that end up in the cache are fully installed inside

@DebugSteven If this idea is still hazy, don't worry about it too much. We should be able to get things working by just attacking the |
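One way to picture the hash-based layout this enables (a sketch only: the function and helper names are invented, and `ConfigureOpts`/`GhcPkgId` are the Stack types from the signatures above): the cache directory is derived from exactly the three `readPrecompiledCache` inputs, with only the path-independent `coNoDirs` options feeding the key.

```haskell
import           Data.Hashable (hash)  -- stand-in; a real key would use SHA256
import           Data.Set (Set)
import qualified Data.Set as Set
import           System.FilePath ((</>))

-- Sketch: compute a ~/.stack/immutable subdirectory from the three
-- cache inputs: package source, non-path configure options, and the
-- exact set of dependency package ids.
immutableCacheDir
  :: FilePath       -- ^ stack root, e.g. ~/.stack
  -> String         -- ^ hash identifying the immutable package source
  -> ConfigureOpts
  -> Set GhcPkgId   -- ^ resolved dependencies
  -> FilePath
immutableCacheDir stackRoot srcHash copts deps =
  stackRoot </> "immutable" </> keyOf
    ( srcHash
    , coNoDirs copts      -- coDirs excluded: path-specific options
                          -- must not invalidate the cache
    , Set.toAscList deps  -- stable ordering for a stable key
    )
  where
    keyOf :: Show a => a -> FilePath
    keyOf = show . abs . hash . show
```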
@snoyberg @DebugSteven Glad you all are syncing up; I've been looking around and decided I'd try my hand at incorporating the information about the immutable directory into
Trying to figure out integration with the lenses by way of adding one or more of the polymorphic
The current head: https://github.com/elrikdante/stack/commit/e1f4fd04d41d04dc2d79fc9bd2e93566b6454879. Also, please ping me next time you all are doing a call; glad to help! |
@snoyberg

> Note that the three arguments represent the three things mentioned above.

```haskell
readPrecompiledCache :: forall env. HasEnvConfig env
                     => PackageLocationIndex FilePath -- ^ target package
                     -> ConfigureOpts
                     -> Set GhcPkgId -- ^ dependencies
                     -> RIO env (Maybe PrecompiledCache)
```

Via ./src/Stack/Build/ConstructPlan.hs:368:

```haskell
packageIsImmutable :: PackageLocationIndex FilePath -> AllImmutable
```

We could have a situation like this (toplevel or within the body doesn't matter; I don't need to change signatures to feel useful =)

```haskell
readPrecompiledCache :: forall env. HasEnvConfig env
                     => AllImmutable
                     -> ConfigureOpts
                     -> Set GhcPkgId -- ^ dependencies
                     -> RIO env (Maybe PrecompiledCache)
```
|
Just asking a clarifying question: is it possible to get to a point where we're better than

However,

A midpoint between the two is if

A lot of the compilation in a normal devel cycle happens at the leaves ( |
If I understand the question correctly, you're talking about bypassing the need to fully recompile local packages in a project when dependencies change. Unfortunately, with the current build system as it is, we have very little control over that. We essentially just shell out to I think we're overall better set up for this than Nix, because we at least have a chance of reusing previously built object files, but I have to admit a large level of ignorance in how the Nix build process works, and what options there are for modifying it. |
The nix process is quite transparent to me - they basically start an isolated workspace with all dependencies available, run some command (similar to
Does |
This is probably getting off topic for this issue, so I'd rather not go into too much detail here (the topic at hand is already complex enough). The basic answer is: for packages which are located on disk already, there is no isolated workspace created; everything is built in the local directory. For packages which come from Hackage, it will unpack into a temporary directory. For Git repos, it will clone into an isolated location. So in theory, object files from Git repos and local packages can be reused, but not for Hackage packages. |
I just came across this and noticed pier is not mentioned. It seems to be a great start at a more nix-like caching approach: http://hackage.haskell.org/package/pier (there's also a useful 15m intro). |
Wow, |
Closing in favor of #3922. |
Pinging @mgsloan, @borsboom, and @sol. This is a discussion issue related to the current precompiled cache, nix-style builds, and implicit snapshots.
Goal: maximize the amount of binary caching Stack is able to do.
Two approaches have come up before:

1. Implicit snapshots: convert the `extra-deps` in a project's `stack.yaml` file into an implicit snapshot, so that even an `extra-dep` package at the base of a large dependency hierarchy can be cached. This will promote many more packages in the configuration from local to snapshot packages. Besides implementation complexity, the biggest downside is disk space usage: each time you tweak `extra-deps` and rebuild, you'll get binary builds in `~/.stack` which last forever.

2. nix-style builds/package databases: GHC 8.0 (IIRC) added the ability for package databases to allow more than one copy of a package/version combo in a database. `cabal new-build` uses this to get greater sharing of binary builds than Stack. I'm not familiar with some of the details of how this is implemented (such as how this would work with a Git-based package installation), but there are at least three downsides, one being that we can no longer simply use `runghc` like we can with Stack, since a single database is no longer a consistent view.

Now I'm going to offer a new approach, approach 3. We have both immutable and mutable package sources. Local file paths are mutable. We first state that anything mutable, and anything that depends on a mutable package source, cannot be cached. (We could get into arguments about caching based on the file hashes, but I think it's overall not worth it.) That leaves us with immutable package sources, where the package contents come from Hackage, some archive, or a Git/Mercurial repository.
We no longer care if these packages are in snapshots or local databases; that detail is totally irrelevant. Every time we build one of these packages, we install its library and other data files into a directory under `~/.stack`, let's say `~/.stack/immutable`, with a directory structure that uses hashes to fully encapsulate all of the settings for this package (package source itself, dependencies, flags, etc.). In addition, we register the package into its own package database inside that new directory.

We keep snapshot database and local database logic the same. But now, instead of rebuilding packages in local databases, or having special logic for the precompiled cache, we have a simple algorithm we follow every time we build an immutable package:

1. If the package is not already present in `~/.stack/immutable`, build it and install it there.
2. Copy the registration from the `~/.stack/immutable` package database to the appropriate snapshot or local database.

There's also some bikeshedding around whether we would even need a snapshot database after this, or if we can simply always use local databases. I'm not too terribly invested in that, but I have a feeling that due to the slowness of `ghc-pkg`'s registering capabilities we'd want to keep the snapshot database concept.

The motivation for this came up when I was planning how to rewrite the Stackage build process to maximize sharing by switching over to Stack as the build tool, and realized this change would greatly help the Stackage build process.
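The per-immutable-package flow described here could be sketched as follows (every function name below is hypothetical; the real code would live alongside the existing precompiled-cache logic):

```haskell
-- Sketch of the build flow for an immutable package: build into
-- ~/.stack/immutable only if it's missing, then copy the registration
-- into whichever database the current build plan needs.
buildImmutable :: PackageLocationIndex FilePath -> RIO env ()
buildImmutable loc = do
  cacheDir <- immutableCacheDirFor loc   -- hash of source, flags, deps
  cached   <- doesDirectoryExist cacheDir
  unless cached $
    buildAndInstallInto cacheDir loc     -- compile once; install and
                                         -- register into cacheDir's own
                                         -- package database
  targetDb <- targetDatabaseFor loc      -- snapshot DB or local DB
  copyRegistration cacheDir targetDb     -- cheap ghc-pkg re-registration
```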