Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add woke ignore traversal #117

Merged

Conversation

armanrahman22
Copy link
Contributor

@armanrahman22 armanrahman22 commented Jul 30, 2021

Please check if the PR fulfills these requirements

  • The commit message follows our guidelines
  • Tests for the changes have been added (for bug fixes / features)
  • Docs have been added / updated (for bug fixes / features)

What kind of change does this PR introduce? (Bug fix, feature, docs update, ...)
Feature

What is the current behavior? (You can also link to an open issue here)
Woke currently only looks at .gitignore and .wokeignore files in the current directory.

What is the new behavior (if this is a feature change)?
Woke will now traverse a repo and build a priority ranked list of ignore rules in exactly the same way that git does for .gitignore files at different directory levels. The tool will look for the root git directory (where the .git folder is) and use that for the root. If no root git directory is found then woke will just run with the current directory as root (still traversing children folders).

Does this PR introduce a breaking change? (What changes might users need to make due to this PR?)
No

Other information:
Addresses #98

@codecov
Copy link

codecov bot commented Jul 30, 2021

Codecov Report

Merging #117 (7d1f505) into main (a79c304) will decrease coverage by 1.36%.
The diff coverage is 77.77%.

@@            Coverage Diff             @@
##             main     #117      +/-   ##
==========================================
- Coverage   95.85%   94.49%   -1.37%     
==========================================
  Files          22       22              
  Lines         748      781      +33     
==========================================
+ Hits          717      738      +21     
- Misses         19       25       +6     
- Partials       12       18       +6     
Impacted Files Coverage Δ
cmd/root.go 82.05% <33.33%> (-6.53%) ⬇️
pkg/ignore/ignore.go 86.04% <83.78%> (-13.96%) ⬇️
pkg/parser/parser.go 100.00% <100.00%> (ø)
pkg/util/string.go 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1b32c2b...7d1f505. Read the comment docs.

Copy link
Member

@caitlinelfring caitlinelfring left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excited about this contribution! I have a few concerns that I'd love to hear your thoughts on.

  • Inclusion of two new dependencies seems to have increased the final binary size by 6MB (from 8 to 14)
  • Duplicating of code from go-git repo -- are there any licensing concerns? Should there be a contribution to their repo to make ignore files configurable instead of copying and modifying the code in this repo?
  • Performance impact - can you provide regression benchmarking for these changes on a large repo (ie large number of files and directory traversals, not large filesizes)? I'm mostly interested in memory utilization and time.
  • Code coverage. There are a lot of uncovered lines added that should be tested for. You should be able to see it in codecov or the vscode golang plugin
  • I haven't looked at the details, but windows tests appear to be failing

pkg/util/string.go Show resolved Hide resolved
pkg/ignore/ignore.go Outdated Show resolved Hide resolved
pkg/ignore/ignore.go Outdated Show resolved Hide resolved
pkg/ignore/ignore.go Outdated Show resolved Hide resolved
pkg/ignore/ignore_test.go Outdated Show resolved Hide resolved
pkg/ignore/dir.go Outdated Show resolved Hide resolved
pkg/ignore/ignore.go Outdated Show resolved Hide resolved
go.mod Show resolved Hide resolved
pkg/ignore/ignore.go Outdated Show resolved Hide resolved
pkg/ignore/ignore.go Outdated Show resolved Hide resolved
Copy link
Member

@caitlinelfring caitlinelfring left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking much better, I just have a few comments

(from my original comment)

Performance impact - can you provide regression benchmarking for these changes on a large repo (ie large number of files and directory traversals, not large filesizes)? I'm mostly interested in memory utilization and time.

Alternatively, if you could make this opt-in instead of default, this would reduce the risk of a negative impact on repos that have no use case for ignore traversal. Maybe a cli argument like --enable-ignore-traversal?

I also see code coverage is reduced, specifically pkg/ignore/ignore.go and I'd love for tests to be added to bring that back up before merging.

Lastly, this is a pretty nice feature that I'd love to have documented, a quick summary of this feature would be awesome in https://github.com/get-woke/woke/blob/main/docs/ignore.md

go.mod Show resolved Hide resolved
gopkg.in/yaml.v3 v3.0.0-20210107192922-496545a6307b // indirect
)

replace github.com/go-git/go-git/v5 => github.com/inclusive-dev-tools/go-git/v5 v5.4.4
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why use replace and not just import from the fork directly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I chose replace because if go-git makes changes to allow for more flexibility with ignore file types we can just switch back to using the upstream repo instead of the fork and we only have to change this one line vs the imports on each file that uses go-git. But I can also change to using direct imports if that works better!

@kevinleung23
Copy link
Contributor

Joining/subscribing to thread to stay updated :)

@armanrahman22
Copy link
Contributor Author

@caitlinelfring, @KSLHacks will be taking over any updates to this PR till I get back from vacation. I've added a section to the readme explaining the new functionality, and from the Regression Benchmarking CI Run it looks like it's not adding any time complexity. Added a new benchmark to ignore_test.go which simulates a large repo (10,000 files and 1,000 folders) and it seems like it runs in a reasonable time. @KSLHacks will add more about the memory usage. And @KSLHacks is also looking into improving code coverage, but I think the only uncovered lines are error-catching from OS operations (not sure how to reach that in a test). Let us know if you have any more questions/concerns (and sorry this PR is taking so long haha)!

@jeremydelacruz
Copy link
Contributor

Hi @caitlinelfring, I chatted with @KSLHacks and took a little bit of a look at this as well. With the benchmarks, I'm wondering what we'd like to see in terms of memory utilization metrics. My understanding is that the current benchmarks via cob are measuring and comparing memory allocation in AllocedBytesPerOp. Is this insufficient for how we'd like to measure it? If so, would love to hear your thoughts on what we'd like to see instead. Here's an example screenshot from a CI run to show what I'm referring to:
image

As for code coverage, I agree on @armanrahman22's point about the uncovered lines being mainly around OS operation error-catching. I looked into a couple potential approaches to mock these OS operations to cover error catching and increase coverage in ignore.go. One such approach is creating some sort of IgnoreFactory that can store these dependency function references as properties/fields so that they could be more easily mocked/substituted during unit testing. The IgnoreFactory would also be used to call the GetRootGitDir and NewIgnore methods off of. Interested in what your thoughts are on this!

@caitlinelfring
Copy link
Member

Sincere apologies for taking so long to sync back on this. I'm not too worries about the benchmarking, if there's a real negative impact from users, we can revert this change and re-evaluate. And agreed on your points of code coverage. I don't think it's worth increasing the complexity of the tests just to get better coverage numbers in this case.

I'm good to merge this once merge conflicts have been resolved.

Again, apologies for the delay in response. Happy to get this published soon! Thanks for your patience and all your contributions!

@jeremydelacruz
Copy link
Contributor

No worries @caitlinelfring and thank you for your getting back to us! That all sounds good to me. I've just pushed up a commit to resolve that last merge conflict.

Please let me know if anything else is needed to get this merged in!

@caitlinelfring caitlinelfring merged commit 6531f50 into get-woke:main Mar 18, 2022
@caitlinelfring
Copy link
Member

This is now available in v0.18.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants