Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache is not stored/reused from previous builds #3042

Closed
nordlow opened this issue May 24, 2017 · 41 comments
Closed

Cache is not stored/reused from previous builds #3042

nordlow opened this issue May 24, 2017 · 41 comments
Labels
P3 We're not considering working on this, but happy to review a PR. (No assignee) team-Local-Exec Issues and PRs for the Execution (Local) team type: feature request

Comments

@nordlow
Copy link

nordlow commented May 24, 2017

Bazel currently doesn't use caching in step 5 of

  1. Build library X containing a.c
  2. Modify contents of a.c
  3. Build library X again
  4. Revert changes of a.c
  5. Build library X again (caching is not reused)

Test case (that generates 10000 files that are built to a static library) is here: https://github.com/nordlow/build-system-benchmark/blob/master/test_bazel.sh

This is a severe limitation in continous integration server clusters that continously switch between different features branches that differ only in a very small percentage of the code.

Buck doesn't have this limitation.

Is this a known issue?

I'm using bazel-0.4.5 on Ubuntu 17.04.

@nordlow nordlow changed the title Cache is not reused from previous builds Cache is not stored/reused from previous builds May 24, 2017
@nordlow
Copy link
Author

nordlow commented May 30, 2017

Is this, perchance, an expected behaviour and supposed to be solved in version 0.7 (P2) according to

https://bazel.build/roadmap.html

P2. All external repositories can use the local cache

?

@hlopko hlopko added category: performance P3 We're not considering working on this, but happy to review a PR. (No assignee) type: feature request labels Jun 26, 2017
@ulfjack
Copy link
Contributor

ulfjack commented Jun 26, 2017

No, it doesn't have anything to do with external repositories.

The reason is that the Bazel action cache does not store file contents, so we can't recover the content from the first step - it gets overwritten in step 3. @damienmg actually wants to get rid of the action cache (as it currently exists) and use a 'spawn cache' instead, which is more like the remote caching we're working on. However, it's unclear how we can avoid making copies of all the output files and generally preventing unbounded growth of the cache.

Put differently, if you use the remote cache, you will not see this behavior.

@jgavris
Copy link
Contributor

jgavris commented Jun 26, 2017

Can't the local cache look like the remote cache (content-addressable)? I imagine most remote cache implementations are using a bounded LRU / LFU.

@damienmg
Copy link
Contributor

That is my prototype to do a LRU disk based cache for spawn action. it is unclear if it would replace the action cache.

@ulfjack
Copy link
Contributor

ulfjack commented Jun 26, 2017

@jgavris We could certainly make changes here, and that's exactly what @damienmg is proposing. And certainly, we'd want an appropriate eviction strategy. However, it'd have to be larger than the size of your output tree to be useful, so it'd at least double the disk space needed for Bazel, so this isn't a change to make lightly.

@rahul-malik
Copy link
Contributor

@ulfjack @damienmg - Is there a version of this LRU cache behind an experiment flag? I was thinking of rolling my own solution but would rather use one built into bazel itself.

@damienmg
Copy link
Contributor

damienmg commented Sep 7, 2017 via email

@davido
Copy link
Contributor

davido commented Sep 13, 2017

@ulfjack @damienmg The status of the pending change is not clear to me. Does it work as is? Any comments if it could be used already today for earlier adopters.

I also disagree, with the priority here. From the user perspective, I would just expect sane local disk cache behaviour (across all workspaces and branches on local machine). That is exactly what Buck does per default here: dir (default): Use a directory-based cache on the local filesystem:

[cache]
  mode = dir
  dir = buck-out/cache

Moreover, with my Buck patch: "Add support for user home in cache directory name" the cache directory can be generically assigned to user home directory:

[cache]
  mode = dir
  dir = ~/.gerritcodereview/buck-cache

Now, for all workspaces/clones, for all branches, you would get cache hits with Buck.

However, it'd have to be larger than the size of your output tree to be useful, so it'd at least double the disk space needed for Bazel, so this isn't a change to make lightly.

That's what cache.dir_max_size is for in Buck: The maximum cache size for directory-based caching (cache.mode must contain dir). The default size is unlimited.

[cache]
  dir_max_size = 10GB

To make my POV even more clear: If I would be on Bazel team, I would stop working on anything else, until this bug is fixed and released.

@rahul-malik
Copy link
Contributor

I'd agree but it sounds like the discussion internally hasn't reached consensus?

I'm running a remote cache locally to get around this issue but I expected this behavior from the start and it was a shock to myself and my team that it didn't work this way. The local remote cache is not an ideal solution since I'm also setting up a true remote cache on another cluster and I can't configure more than one cache per bazel invocation

@ulfjack
Copy link
Contributor

ulfjack commented Sep 14, 2017

I don't think at this point anyone is opposing adding support for a local cache, but we're not actively working on it. It should be fairly straightforward to add support for a local cache at this point. @damienmg's change is outdated - most of the classes have been refactored.

I'd go about it by moving the LruCache to a more generic location, and possibly refactoring its API. The most straightforward way to do it seems to be to reuse RemoteActionCache, and implement SimpleBlobStore. Ideally, we'd move any code that's independent of remote / local into a generic class (AbstractSpawnCache?), although it might use some protos from the remote protocol to represent action keys and whatnot.

@rahul-malik
Copy link
Contributor

@ulfjack - While not completely familiar with the code, it seems like we might be able to use OnDiskBlobStore and make SimpleBlobStoreActionCache conform to ActionCache? Please correct if I'm wrong in my interpretation

@ulfjack
Copy link
Contributor

ulfjack commented Sep 15, 2017

We have two interfaces: ActionCache and RemoteActionCache

Implementing ActionCache is a lot more complicated and a lot more difficult to get right, so I do not recommend doing that. Implementing RemoteActionCache is much simpler and easier to get right. Ignore the "Remote" prefix, it doesn't have any meaning at this point. Similarly, the "Remote" prefix of RemoteSpawnCache has no meaning (sorry about that).

The SimpleBlobStoreActionCache implements RemoteActionCache. You can use a RemoteSpawnCache with a SimpleBlobStoreActionCache with a OnDiskBlobStore to implement local caching - prototyping this should be trivial.

There are two problems with doing it that way:

  1. the RemoteSpawnCache is hard-coded to use some of the --remote_* options - I'd suggest to inject the two necessary flags in the constructor - and
  2. the OnDiskBlobStore has no eviction strategy - I'd suggest to generalize the existing LruCache, but if you don't care about eviction, the OnDiskBlobStore would work just fine, too.

@rahul-malik
Copy link
Contributor

Thanks for the tips. I'll look into prototyping something next week. Would it be reasonable to say that the usage of ActionCache would be replaced with SpawnCache or are there reasons to keep ActionCache around?

@davido
Copy link
Contributor

davido commented Sep 16, 2017

[...] You can use a RemoteSpawnCache with a SimpleBlobStoreActionCache with a OnDiskBlobStore to implement local caching - prototyping this should be trivial.

Indeed: https://bazel-review.googlesource.com/16810.

@rahul-malik
Copy link
Contributor

rahul-malik commented Sep 16, 2017

@davido - Thanks for putting this together! That is a lot more straightforward than what I was thinking of doing (replacing all usage of ActionCache with SpawnCache).

I think this is a good start and will likely benefit from a few additions:

  • Use LRUDiskCache vs OnDiskBlobStore to support eviction. Will require adding a new option similar to the dir_max_size option in Buck. The default max_size in @damienmg PR is -1 which implies an unbounded cache which is the same as the behavior with OnDiskBlobStore.
  • Since this effectively replaces the default local action cache, introduce a composite cache that allows for falling back from the local disk cache to a REST / Hazelcast cache. The composite cache would just be the composition of existing caches. The way this is implemented currently means that you can only have a local disk cache OR a rest cache rather than utilizing both.

Thoughts? Happy to help build these changes on top of your PR.

@ulfjack - Given this is behind an experimental flag, will this be able to land before having eviction implemented or is that required for this to merge?

@davido
Copy link
Contributor

davido commented Sep 17, 2017

@rahul-malik Yeah, eviction strategy is definitely something that would need to be added at some point. I'm not sure i understand your second point about replacement of action cache? There is no action cache right now for non remote execution strategy.

I've thoroughly tested my CL And while I see cache hits, when building in the same workspace and switching across different branches, i still see cache misses when switching across different workspaces. Steps to reproduce. Apply my CL and do something like that:

  $ git clone https://gerrit.googlesource.com/gerrit gerrit
  $ cd gerrit
  $ bazel build gerrit --experimental_local_disk_cache_path=/home/davido/.gerritcodereview/bazel-cache/cas --experimental_local_disk_cache
  $ cd ..
  $ git clone https://gerrit.googlesource.com/gerrit gerrit2
  $ cd gerrit2
  $ bazel build gerrit --experimental_local_disk_cache_path=/home/davido/.gerritcodereview/bazel-cache/cas --experimental_local_disk_cache

I would expect the cache hits in the last step, but I'm seeing that everythinng gets recompiled and the cache directory content was expanded with new cache entries.

Just a WAG: this is because execution root contributes to the digest algorithm and is the part of they cache key, so that this requirement can't work atm? Can we tweak key computation here to make this work?

@tawanjairew tawanjairew mentioned this issue Sep 17, 2017
@rahul-malik
Copy link
Contributor

We have two interfaces: ActionCache and RemoteActionCache
The reason is that the Bazel action cache does not store file contents, so we can't recover the content from the first step - it gets overwritten in step 3. @damienmg actually wants to get rid of the action cache (as it currently exists) and use a 'spawn cache' instead, which is more like the remote caching we're working on.

@davido - My second point was mainly around the fact that this PR provides the behavior we expected by default for the local cache. To enable this behavior you have to use the remote strategy but given the code snippet below, you would be unable to utilize this local disk cache in combination with a REST / Hazelcast cache.

  public static SimpleBlobStore create(RemoteOptions options) throws IOException {
    if (isHazelcastOptions(options)) {
      return createHazelcast(options);
    }
    if (isRestUrlOptions(options)) {
      return createRest(options);
    }
    if (isLocalDiskCache(options)) {
      return createLocalDisk(options);
    }
    throw new IllegalArgumentException(
        "Unrecognized concurrent map RemoteOptions: must specify "
            + "either Hazelcast or Rest URL options or local cache.");
  }

@ulfjack
Copy link
Contributor

ulfjack commented Sep 18, 2017

@davido Are you aware of #2574? Try setting --experimental_strict_action_env.

@davido
Copy link
Contributor

davido commented Sep 18, 2017

@ulfjack Ah, right, i saw that, but forgot. Thanks, for pointing this out. I will re-test it later today.

@rahul-malik
Copy link
Contributor

@davido - Another thought also: Does the cache file path need to be absolute? It would be easier to share the change with other developers if we were able to put it under the home directory for instance.

@davido
Copy link
Contributor

davido commented Sep 18, 2017

@rahul-malik Yeah, exactly! Right now it's not possible, but I'm planning to add another change on top of this CL, something similar, to what I did 4 years ago in Buck, as Gerrit Code Review was using Buck. We must support "~/.gerritcodereview/bazel-cache/cas" and commit it under GIT in <gerrit_project>/tools/bazel.rc.

@ittaiz
Copy link
Member

ittaiz commented Sep 18, 2017 via email

@davido
Copy link
Contributor

davido commented Sep 19, 2017

@ulfjack

Are you aware of #2574? Try setting --experimental_strict_action_env.

Passing --experimental_strict_action_env fixed cache misses problem across different workspaces. It works like a charm now!

@davido
Copy link
Contributor

davido commented Sep 19, 2017

Right now it's not possible, but I'm planning to add another change on top of this CL, something similar, to what I did 4 years ago in Buck, as Gerrit Code Review was using Buck. We must support "~/.gerritcodereview/bazel-cache/cas" and commit it under GIT in <gerrit_project>/tools/bazel.rc

This was fixed in the latest patch set of the CL. Now you can pass in generic CAS path using home directory ~ charachter. With this command you would get cache hits for gerrit code review project for both:

  • repeated builds in different branches in the same workspace and
  • repeated builds across different workspaces
  $ bazel build gerrit \
  --experimental_local_disk_cache_path=~/.gerritcodereview/bazel-cache/cas \
  --experimental_local_disk_cache \
  --experimental_strict_action_env

Also note, that these options could (and should) be committed into the GIT tree in the tools/bazel.rc file, so that the sane development environment with local action cache behaviour could be achieved per default and all contributors and maintainers would benefit from it after the first clone without setting up anything else:

  $ cat tools/bazel.rc
  build --experimental_local_disk_cache_path=~/.gerritcodereview/bazel-cache/cas --experimental_local_disk_cache --experimental_strict_action_env

@nordlow
Copy link
Author

nordlow commented Sep 19, 2017

@davido : what do you mean with latest patchset of the CL? Is this on Bazel git master?

@rahul-malik
Copy link
Contributor

@nordlow - The CL he's referring to is on Gerrit right now. You can see the patch here: https://bazel-review.googlesource.com/c/bazel/+/16810

@davido davido mentioned this issue Sep 19, 2017
@davido
Copy link
Contributor

davido commented Sep 19, 2017

@nordlow Can you build custom version of Bazel with this CL, and repeat your benchmark? Note, that you would need to pass these three experimental options:

 $ bazel build foo --experimental_local_disk_cache_path=~/bazel-cas --experimental_local_disk_cache --experimental_strict_action_env

@rahul-malik
Copy link
Contributor

@davido - did this end up merging? Looks like the PR is approved?

@davido
Copy link
Contributor

davido commented Sep 22, 2017

@rahul-malik I think it should be merged in the next days.

@damienmg Any comment what is the ETA?

@damienmg
Copy link
Contributor

I send the CR for internal review but @ulfjack is away today and there seems to be internal test failure (which should not) so I Monday afternoon probably, tuesday at worse.

@davido
Copy link
Contributor

davido commented Sep 28, 2017

@damienmg this can be closed now with 82859b0, right?

@damienmg
Copy link
Contributor

damienmg commented Sep 28, 2017 via email

@rahul-malik
Copy link
Contributor

@damienmg @davido - Did this not make it into 0.6.1? I just updated and was hoping to have this feature.

@damienmg
Copy link
Contributor

damienmg commented Oct 12, 2017 via email

@davido
Copy link
Contributor

davido commented Oct 15, 2017

Confirmed, that this feature is included in 0.7, e.g.: https://storage.googleapis.com/bazel/0.7.0/rc2/index.html .

@Globegitter
Copy link

@davido just tried
build --experimental_local_disk_cache_path=~/.gerritcodereview/bazel-cache/cas --experimental_local_disk_cache --experimental_strict_action_env with bazel 0.10.0 and it seems the home directory is not resolved. Is home directory resolution now supposed to work, or is that still something coming?

@davido
Copy link
Contributor

davido commented Feb 12, 2018

@Globegitter Unfortunately, home directory resolution (~) was undone in the latest versions of my patch. So you can only say:

$ cat ~/.bazelrc | grep cas
build --experimental_local_disk_cache_path=/home/davido/.gerritcodereview/bazel-cache/cas

Note, that's why we cannot add this line to Gerrit Code Review's own tools/bazelr.c file, so that this will be enabled per default:

$ cat tools/bazel.rc
build --experimental_local_disk_cache_path=~/.gerritcodereview/bazel-cache/cas

@davido
Copy link
Contributor

davido commented Mar 14, 2018

@ittaiz @Globegitter home directory resolution (~) should be supported now, see #4852.

@davido
Copy link
Contributor

davido commented Mar 14, 2018

@ulfjack

I'd go about it by moving the LruCache to a more generic location, and possibly refactoring its API.

What LruCache are you referencing? Buck triggers cache directory clean up of its DirArtifactCache implementation after checking on each store operation if dir cache max size was configured and whether or not the current directory size exceeds this limit.

@ulfjack
Copy link
Contributor

ulfjack commented Mar 15, 2018

Apparently I misremembered - I thought we had a LruCache class for the repository cache. Sorry for the confusion.

@ulfjack ulfjack removed their assignment Nov 19, 2018
@meisterT meisterT added team-Performance Issues for Performance teams team-Execution and removed category: performance labels Nov 29, 2018
@jin jin added team-Local-Exec Issues and PRs for the Execution (Local) team and removed team-Execution labels Jan 14, 2019
@dslomov dslomov removed the team-Performance Issues for Performance teams label Feb 15, 2019
@jmmv
Copy link
Contributor

jmmv commented May 11, 2020

I think this feature is now available via --disk_cache? At least the behavior I observe experimentally is what was being asked for here.

@jmmv jmmv closed this as completed May 11, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P3 We're not considering working on this, but happy to review a PR. (No assignee) team-Local-Exec Issues and PRs for the Execution (Local) team type: feature request
Projects
None yet
Development

No branches or pull requests