Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Break cache if Gemfile.lock or .rubocop.yml change. #300

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 3 additions & 1 deletion lib/erb_lint/cache.rb
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,8 @@ def initialize(config, cache_dir = nil)
@cache_dir = cache_dir || CACHE_DIRECTORY
@hits = []
@new_results = []
@gemfile_lock = File.read("Gemfile.lock") if File.exist?("Gemfile.lock")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I think this should effectively cache the File.read for all calls of checksum.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this will also bust the cache whenever any project dependencies change, even if the rubocop version did not. So, while I think this will work to achieve the desired effect not reusing the cache between different rubocop versions, it will also make running the erblint slower than it could/should be by not utilizing the cache when it could.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. I was aware and okay with that tradeoff, erring on the side of caution.

Alternatively, you would have to hash the versions for rubocop and all rubocop related plugins in Gemfile.lock. Doable but I didn't want to spend any more time on it.

I find this simple and effective, with the caveat that non-rubocop updates will break the cache but if you want to write up the alternative solution that would be great.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @joshuapinter. Makes sense; just wanted to make sure it was known.

I agree with the tradeoffs here and that the priority should be addressing the issue over complete optimization. (I came here after hitting this same issue in our own CI workflow and being surprised by failures.)

Just for comparison / completeness, it looks like rubocop itself uses the whole Gemfile.lock in one case as a checksum, and does a plugin-aware checksum of the rubocop scour in another:

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know what information is available here, but is it worth doing this only if the Rubocop linter is enabled? (If that's possible, it seems like a meaningful improvement.)

@rubocop_config = File.read(".rubocop.yml") if File.exist?(".rubocop.yml")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally, if there's a way to fetch this from linters.Rubocop.rubocop_config.inherit_from, that seems worthwhile, since it is parameterized in the first place: https://github.com/goatapp/rubocop/blob/97e4ffc8a71e9e5239a927c6a534dfc1e0da917f/manual/configuration.md?plain=1#L81

Also, I don't know how common it is, but it looks like this library does allow for URIs there also: https://github.com/goatapp/rubocop/blob/97e4ffc8a71e9e5239a927c6a534dfc1e0da917f/manual/configuration.md?plain=1#L86

Actually, given the complexity / variability there, if there's a way to access an instance of the Rubocop linter here, it may be worth removing dependence on the source of that configuration.

Copy link

@oehlschl oehlschl Sep 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, pardon the commentary. I think this is a good PR, and I don't want to let solving for all the cases get in the way of anything; just reading through the code and thinking aloud.

It looks like runner context is available in run_using_cache, so maybe some "runner checksum" could be passed an optional arg to cache.get() and cache.set()? That solves for:

  • through the runner, putting linters in charge of their own checksum logic
  • not being coupled to / duplicating awareness of how linters are configured

I'll give this some thought over the weekend and see if I come up with anything useful.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put up something small for feedback here: #373

puts "Cache mode is on"
end

Expand Down Expand Up @@ -76,7 +78,7 @@ def checksum(filename, file_content)
mode = File.stat(filename).mode

digester.update(
"#{mode}#{config.to_hash}#{ERBLint::VERSION}#{file_content}",
"#{mode}#{config.to_hash}#{ERBLint::VERSION}#{@rubocop_config}#{@gemfile_lock}#{file_content}",
)
digester.hexdigest
rescue Errno::ENOENT
Expand Down
Loading