Performance improvement #16

Open
cg110 opened this issue Jun 18, 2020 · 2 comments

cg110 commented Jun 18, 2020

Hi,

I was using this tool to fix up some directory casing problems, i.e.:
git-unite -d

I found that progress was very slow. This appears to be due to the code that computes indexEntries, since we have 40-50k files (and their paths) in the index.

I did a bit of an optimization, and altered the code here:
https://github.com/tawman/git-unite/blob/master/src/LibGitUnite/UniteRepository.cs#L156

to:

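var indexEntries =
    _gitRepository.Index.Where(f =>
    {
        // Find the directory part of the path once per index entry.
        var lastSeparator = f.Path.LastIndexOf(CharSeparator);

        // No separator means the path has no directory component, so skip it.
        if (lastSeparator == -1)
            return false;

        var directoryPath = f.Path.Substring(0, lastSeparator);

        // Keep the entry if no folder path contains its directory.
        return !foldersFullPathMap.Any(s => s.Contains(directoryPath));

        // Original version, which redid LastIndexOf/Substring inside Any():
        //return f.Path.LastIndexOf(Separator, StringComparison.Ordinal) != -1
        //       &&
        //       !foldersFullPathMap.Any(s =>
        //           s.Contains(f.Path.Substring(0,
        //               f.Path.LastIndexOf(Separator, StringComparison.Ordinal))));
    });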

Note that I also added a CharSeparator:
private const char CharSeparator = '\\';

Testing on an i9-9900K with an SSD, git-unite 2.1 takes 6m25s.

With the above optimizations it takes 2m26s. I did wonder if more can be done, e.g. making foldersFullPathMap hold only the git paths rather than the full paths.

Still, that's a reduction of more than 50% (385s down to 146s, roughly 62%).
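
One further step in the same direction might be to run the Contains() scan once per distinct directory rather than once per file, since a 40-50k file index typically has far fewer directories than files. The following is only a sketch under assumptions: it works on plain strings instead of LibGit2Sharp index entries, and IndexFilterSketch, FilterPaths and verdictByDirectory are hypothetical names that do not exist in git-unite. It keeps the same Contains() test as the snippet above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IndexFilterSketch
{
    private const char CharSeparator = '\\';

    // Returns the index paths whose directory is not contained in any folder path.
    // Assumption: paths are plain strings here; the real code filters index entries.
    public static List<string> FilterPaths(
        IEnumerable<string> indexPaths,
        IReadOnlyCollection<string> foldersFullPathMap)
    {
        // Remembers, per directory, whether it was missing from the folder list.
        var verdictByDirectory = new Dictionary<string, bool>(StringComparer.Ordinal);
        var result = new List<string>();

        foreach (var path in indexPaths)
        {
            var lastSeparator = path.LastIndexOf(CharSeparator);
            if (lastSeparator == -1)
                continue; // no directory component, nothing to check

            var directoryPath = path.Substring(0, lastSeparator);

            // Scan the folder list only the first time a directory is seen.
            if (!verdictByDirectory.TryGetValue(directoryPath, out var missing))
            {
                missing = !foldersFullPathMap.Any(s => s.Contains(directoryPath));
                verdictByDirectory[directoryPath] = missing;
            }

            if (missing)
                result.Add(path);
        }

        return result;
    }
}
```

Whether this helps in practice depends on the ratio of files to directories in the repository, so it would need measuring on a real index.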

Owner

tawman commented Jun 18, 2020

@cg110 thanks for running some metrics; I will take another look at optimizing. It has been a long time since this code was written, but I know I was checking LastIndexOf() for a specific directory naming structure to avoid false positives. A simple Contains() might match on a deeper tree.
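
To make the false-positive concern concrete, here is a small sketch comparing a substring test with an exact comparison against the repository root. It is an illustration only, not what git-unite does: ContainsPitfallSketch, LooseMatch, ExactMatch and the repoRoot parameter are hypothetical, and the paths are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ContainsPitfallSketch
{
    private const char CharSeparator = '\\';

    // Loose test: the directory may appear anywhere inside a folder's full path.
    public static bool LooseMatch(IEnumerable<string> folderFullPaths, string directoryPath)
    {
        return folderFullPaths.Any(s => s.Contains(directoryPath));
    }

    // Stricter test (hypothetical): a folder must equal repoRoot + '\' + directoryPath.
    public static bool ExactMatch(
        IEnumerable<string> folderFullPaths, string repoRoot, string directoryPath)
    {
        var expected = repoRoot.TrimEnd(CharSeparator) + CharSeparator + directoryPath;
        return folderFullPaths.Any(s => string.Equals(s, expected, StringComparison.Ordinal));
    }
}
```

With a single on-disk folder `C:\repo\src\Docs`, `LooseMatch(folders, "Docs")` returns true even though no top-level `Docs` folder exists, while `ExactMatch(folders, @"C:\repo", "Docs")` returns false and `ExactMatch(folders, @"C:\repo", @"src\Docs")` returns true.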

I will try to generate a 50k file test repo when I get a chance.

Author

cg110 commented Jun 19, 2020

No probs, it's probably unusual to have such large repos, but now and again someone has wrongly cased directories, and this tool is great for finding that (generally because someone has an older clone and the name was re-cased).

I don't think I've changed the logic; I've just broken the bits down so that the .Any() doesn't redo the LastIndexOf and Substring. It was doing s.Contains before.

I'm actually not sure which bit was the main speed-up: switching to char, or trying to do each piece of work once.
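
One way to separate the two effects might be to time three variants on the same data: the original shape, the hoisted version with the string separator, and the hoisted version with the char separator. The sketch below assumes plain string paths and synthetic data; SeparatorTimingSketch and its members are hypothetical names, and a Stopwatch loop like this is only a rough guide compared with a proper benchmark harness.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class SeparatorTimingSketch
{
    private const string Separator = "\\";
    private const char CharSeparator = '\\';

    // Original shape: LastIndexOf and Substring are redone for every folder.
    public static int CountOriginal(List<string> paths, List<string> folders) =>
        paths.Count(p =>
            p.LastIndexOf(Separator, StringComparison.Ordinal) != -1 &&
            !folders.Any(s => s.Contains(
                p.Substring(0, p.LastIndexOf(Separator, StringComparison.Ordinal)))));

    // Work hoisted out of Any(), still using the string separator.
    public static int CountHoistedString(List<string> paths, List<string> folders) =>
        paths.Count(p =>
        {
            var i = p.LastIndexOf(Separator, StringComparison.Ordinal);
            if (i == -1) return false;
            var dir = p.Substring(0, i);
            return !folders.Any(s => s.Contains(dir));
        });

    // Work hoisted out of Any(), using the char separator.
    public static int CountHoistedChar(List<string> paths, List<string> folders) =>
        paths.Count(p =>
        {
            var i = p.LastIndexOf(CharSeparator);
            if (i == -1) return false;
            var dir = p.Substring(0, i);
            return !folders.Any(s => s.Contains(dir));
        });

    // Runs the work once and returns the elapsed wall-clock time.
    public static TimeSpan Time(Func<int> work)
    {
        var stopwatch = Stopwatch.StartNew();
        work();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    // Synthetic comparison; sizes and path shapes are made up for illustration.
    public static void Run()
    {
        var folders = Enumerable.Range(0, 500).Select(d => $@"C:\repo\dir{d}").ToList();
        var paths = Enumerable.Range(0, 5000).Select(f => $@"dir{f % 1000}\file{f}.txt").ToList();

        Console.WriteLine($"original:       {Time(() => CountOriginal(paths, folders))}");
        Console.WriteLine($"hoisted string: {Time(() => CountHoistedString(paths, folders))}");
        Console.WriteLine($"hoisted char:   {Time(() => CountHoistedChar(paths, folders))}");
    }
}
```

All three variants apply the same filter, so they should return the same count; only the timings differ. Running each variant several times and discarding the first run would reduce JIT warm-up noise.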
