Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow repository_rules to run on stale state #23720

Open
dgoldstein0 opened this issue Sep 23, 2024 · 5 comments
Open

allow repository_rules to run on stale state #23720

dgoldstein0 opened this issue Sep 23, 2024 · 5 comments
Labels
help wanted Someone outside the Bazel team could own this P2 We'll consider working on this in future. (Assignee optional) team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. type: feature request

Comments

@dgoldstein0
Copy link

dgoldstein0 commented Sep 23, 2024

Description of the feature request:

repository_rules are a way to run arbitrary imperative operations during bazel's loading phase. The default is that bazel clears their output folder before each execution of the repository rule - a good default, but it makes some use cases challenging. I propose adding a parameter to repository_rule: clear_output_folder_before_fetch which should be True by default (maintaining current behavior), but could be set to False for use cases that trust their rules to properly handle stale state.

Which category does this issue belong to?

Rules API

What underlying problem are you trying to solve with this feature?

I've written a repository rule using yarn to install node_modules/ which we've integrated with our custom js library rules. Yarn itself has a ton of logic to handle incremental updating of the generated node_modules/, so that yarn install after a small change to the yarn.lock can take a few seconds rather than a minute or more. Since bazel clears the repository rule output folder, either we end up with quite bad performance, or we need quite a bit of extra complexity to put the outputs in a different folder and symlink them to bazel's output folder for the repository rule, so that we only have to create and rewrite symlinks on each repeat invocation and let yarn take care of the rest, as opposed to starting from scratch; (the symlink hacks cost us a few seconds of execution time - and also have caused us to encounter bugs in yarn - so aren't free either, but much better than no workaround)

Likely other "use a package manager or other build system in a repository rule" use cases have this problem as well.

Which operating system are you running Bazel on?

linux - ubuntu 22.04

What is the output of bazel info release?

release 6.1.0-1c2df03733503215490047c398d25fe5b5553a0e

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse HEAD ?

No response

Have you found anything relevant by searching the web?

No response

Any other information, logs, or outputs that you want to share?

No response

@github-actions github-actions bot added the team-Rules-API API for writing rules/aspects: providers, runfiles, actions, artifacts label Sep 23, 2024
@mwoehlke-kitware
Copy link

Likely other "use a package manager or other build system in a repository rule" use cases have this problem as well.

Can confirm; ran into this same problem with a repository that creates a Python virtual environment.

Also, repository rules sometimes run even when there has been no change. Why? (In particular, bazel shutdown triggers a re-run.)

@Wyverald Wyverald added team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. and removed team-Rules-API API for writing rules/aspects: providers, runfiles, actions, artifacts labels Oct 9, 2024
@Wyverald
Copy link
Member

Wyverald commented Oct 9, 2024

Also, repository rules sometimes run even when there has been no change. Why? (In particular, bazel shutdown triggers a re-run.)

This sounds like a bug. If you could pin it down in a minimal repro case and file an issue, we'd appreciate it!

@jwnimmer-tri
Copy link
Contributor

In @mwoehlke-kitware's case we were purposefully setting local = True for other reasons, in which case I believe re-running after a shutdown is the desired / specified behavior and there is no bug.

@mwoehlke-kitware
Copy link

In @mwoehlke-kitware's case we were purposefully setting local = True ...

Ah, I missed that note in the docs. Regardless, something like the proposed feature would still be useful for us!

I propose adding a parameter to repository_rule: clear_output_folder_before_fetch ...

I'd like to propose an alternative: allow specific files and/or trees to be marked such that they aren't purged when a refetch needs to occur. (Naturally, bazel clean or at least bazel clean --expunge should still nuke these.) This would be more flexible than the original proposal, but marking the whole output tree thusly should be equivalent to the original proposal.

@meteorcloudy meteorcloudy added P2 We'll consider working on this in future. (Assignee optional) help wanted Someone outside the Bazel team could own this and removed untriaged labels Oct 29, 2024
@meteorcloudy
Copy link
Member

Sounds like interesting idea, @dgoldstein0 if you want to contribute, a simple design doc would be nice!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Someone outside the Bazel team could own this P2 We'll consider working on this in future. (Assignee optional) team-ExternalDeps External dependency handling, remote repositiories, WORKSPACE file. type: feature request
Projects
None yet
Development

No branches or pull requests

8 participants