Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TargetFinder performance #1427

Closed
bquorning opened this issue Nov 4, 2014 · 2 comments
Closed

TargetFinder performance #1427

bquorning opened this issue Nov 4, 2014 · 2 comments

Comments

@bquorning
Copy link
Contributor

I work on a rather large repository, and running Rubocop without explicit file arguments is becoming quite slow.

A few relevant numbers from my repository:

>> Dir.glob("vendor/**/*").size
=> 145517
>> Dir.glob(".git/**/*").size
=> 10344

Obviously, my .rubocop.yml includes

AllCops:
  Exclude:
    - vendor/**/*

Using be1f420 (latest master), I added these lines at the bottom on TargetFinder#target_files_in_dir:

       target_files
+
+      puts all_files.size
+      puts hidden_files.size
+      puts target_files.size
+
+      exit
     end

Which gives me these numbers:

time bin/rubocop
153874
21660
1901
bin/rubocop  12.84s user 2.58s system 99% cpu 15.568 total

That's 15 seconds spent even before the cops start running.

It looks like half of the time is spent by hidden_files.include?(file) inside TargetFinder#to_inspect?.

I’m thinking: would there be a way to entirely exclude my rather excessive vendor folder from Rubocop (besides moving it elsewhere)? Adding something like all_files -= all_files.grep(/\/vendor\//) to #target_files_in_dir might be a solution…

@jonas054
Copy link
Collaborator

jonas054 commented Nov 4, 2014

A patch that seems to help a bit is:

diff --git a/lib/rubocop/target_finder.rb b/lib/rubocop/target_finder.rb
index 730248f..fcfbeb5 100644
--- a/lib/rubocop/target_finder.rb
+++ b/lib/rubocop/target_finder.rb
@@ -1,5 +1,7 @@
 # encoding: utf-8

+require 'set'
+
 module RuboCop
   # This class finds target files to inspect by scanning the directory tree
   # and picking ruby files.
@@ -56,7 +58,7 @@ module RuboCop
         base_dir.gsub!(File::ALT_SEPARATOR, File::SEPARATOR)
       end
       all_files = find_files(base_dir, File::FNM_DOTMATCH)
-      hidden_files = all_files - find_files(base_dir, 0)
+      hidden_files = Set.new(all_files - find_files(base_dir, 0))
       base_dir_config = @config_store.for(base_dir)

       target_files = all_files.select do |file|

but there are some problems remaining related to finding hidden files, and it's possible that after fixing those the situation will be better and this optimization will not be necessary. I want to work on the errors first (mainly #1425), and then look at performance.

@bquorning
Copy link
Contributor Author

With this patch (still with the exit at the bottom of TargetFinder#target_files_in_dir) running rubocop without arguments now takes 1.4 seconds instead of 7.6 seconds.

diff --git a/lib/rubocop/target_finder.rb b/lib/rubocop/target_finder.rb
index 9a10173..c64f14c 100644
--- a/lib/rubocop/target_finder.rb
+++ b/lib/rubocop/target_finder.rb
@@ -80,8 +80,11 @@ module RuboCop
       base_dir_config.file_to_include?(file)
     end

-    def find_files(base_dir, flags)
-      Dir.glob("#{base_dir}/**/*", flags).select { |path| FileTest.file?(path) }
+    def find_files(base_dir, flags, excluded_toplevel_dirs = %w(.git vendor))
+      toplevel_dirs = Dir.glob("*", flags) - %w(. ..)
+      wanted_toplevel_dirs = toplevel_dirs - excluded_toplevel_dirs
+
+      Dir.glob("{#{wanted_toplevel_dirs.join(',')}}/**/*", flags).select { |path| FileTest.file?(path) }
     end

It would be nice if the base config Exclude statements could be subtracted from the Dir glob like I do here. Those 140.000 files in vendor somehow slow everything down ;-)

bbatsov added a commit that referenced this issue Nov 15, 2014
[Fix #1427] Exclude top level directories early
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants