Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(controller): support filtering Git subscriptions using globs and path prefixes #1778

Merged
merged 5 commits into from
Apr 18, 2024

Conversation

maksimstankevic
Copy link
Contributor

…e Git Subscription

@maksimstankevic maksimstankevic requested a review from a team as a code owner April 5, 2024 16:45
Copy link

netlify bot commented Apr 5, 2024

Deploy Preview for docs-kargo-akuity-io ready!

Name Link
🔨 Latest commit 2ac11c4
🔍 Latest deploy log https://app.netlify.com/sites/docs-kargo-akuity-io/deploys/662021184e9d7e00096d9509
😎 Deploy Preview https://deploy-preview-1778.kargo.akuity.io
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

Copy link

codecov bot commented Apr 5, 2024

Codecov Report

Attention: Patch coverage is 91.83673% with 4 lines in your changes are missing coverage. Please review.

Project coverage is 46.14%. Comparing base (255578c) to head (2ac11c4).
Report is 17 commits behind head on main.

Files Patch % Lines
internal/controller/warehouses/git.go 91.83% 2 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1778      +/-   ##
==========================================
+ Coverage   44.89%   46.14%   +1.25%     
==========================================
  Files         213      213              
  Lines       13624    13868     +244     
==========================================
+ Hits         6116     6399     +283     
+ Misses       7244     7183      -61     
- Partials      264      286      +22     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@maksimstankevic
Copy link
Contributor Author

@hiddeco @krancour as this is a continuation of includePaths/excludePaths story, tagging you both to take a look at my changes for added glob matching when you have a second

@krancour
Copy link
Member

krancour commented Apr 5, 2024

@maksimstankevic thanks for this!

This looks pretty good. I've got just a few miscellaneous observations and questions.

  • I think the comments on IncludePaths and ExcludePaths fields need to be updated to reflect the new options (and the codegen re-run).

  • I'm not sure whether this really should be based only on globs. As it stands currently, I think people would be quite surprised to find that foo/bar isn't a match for the pattern foo/. @maksimstankevic and @hiddeco what are each of your thoughts about this?

  • I only just noticed this now and didn't notice in feat(api/controller): support filtering rules Git subscriptions #1477 -- does this path filtering only work with the NewestFromBranch commit selection strategy? That appears to be the case and I'm now wondering why that is. (We can open a separate issue for this if we determine this was a mistake.)

@maksimstankevic
Copy link
Contributor Author

maksimstankevic commented Apr 7, 2024

  • I think the comments on IncludePaths and ExcludePaths fields need to be updated to reflect the new options (and the codegen re-run).
    -----My bad, will work on it after next point is finalised.
  • I'm not sure whether this really should be based only on globs. As it stands currently, I think people would be quite surprised to find that foo/bar isn't a match for the pattern foo/. @maksimstankevic and @hiddeco what are each of your thoughts about this?
    -----I agree, but do we combine prefix matching and glob matching under the same unprefixed string array (that holds only globs currently)? Or we introduce new "regex/p:"-like prefix for either "prefix:" or "glob:" and leave the other one unprefixed?
  • I only just noticed this now and didn't notice in feat(api/controller): support filtering rules Git subscriptions #1477 -- does this path filtering only work with the NewestFromBranch commit selection strategy? That appears to be the case and I'm now wondering why that is. (We can open a separate issue for this if we determine this was a mistake.)
    -----I somehow thought back then that tag filtering should not interfere with commit filtering. Now I have less confidence it was a correct approach but I agree if it's decided to add includePaths/excludePaths filtering to CommitSelectionStrategies involving tags - that needs to be done in a separate issue
    @krancour @hiddeco

@krancour
Copy link
Member

krancour commented Apr 7, 2024

Ok. So second point is the one most in need of addressing. imho, yes... prefix and glob match should go together, but I am willing to hear arguments against that. My argument in favor of it is that anyone accustomed to working in a shell environment is likely to find that to be the least surprising behavior. I contend most people would never think to do foo/* and would assume that foo/ suffices.

@hiddeco
Copy link
Contributor

hiddeco commented Apr 8, 2024

One argument against combining them would be that there could be potential conflicts or overlaps in certain scenarios, such as when dealing with literals versus special characters, nested patterns, or character escaping.

To solve this, one would have to start defining (and documenting) rules around precedence. Which in the end may make it easier from both a technical and user perspective to create a separation between the two.

@maksimstankevic
Copy link
Contributor Author

Yeah, I agree, in my eyes that would be an overcomplication for a simple topic. I would go with "regex:/regexp:" prefix for regexps, "glob:" prefix for globs, these first 2 being the advanced options for folks wishing to use them (like me 😄 ) and strings without prefix would be for exact matches and string prefixes like "foo/" to match "foo/bar".

@krancour
Copy link
Member

krancour commented Apr 8, 2024

I'm a little confused because I feel as if we all probably use prefixes and globs on a regular basis in our shells without ever having to specify which we're using.

We also use both in .gitignore files.

@hiddeco
Copy link
Contributor

hiddeco commented Apr 8, 2024

There are small details to take into account here:

  1. Globbing in the shell is a two-step feature. The globbing happens before feeding the input to the command, and it's not a feature of the CLI you are working with.
  2. .gitignore is a special format, and not 1:1 equal to glob https://git-scm.com/docs/gitignore#_pattern_format.

@krancour
Copy link
Member

krancour commented Apr 8, 2024

Globbing in the shell is a two-step feature. The globbing happens before feeding the input to the command, and it's not a feature of the CLI you are working with.

Right... but that's an implementation detail. You don't think about that when you write commands like mv foo/ bar/ or mv foo/*.go bar/ you just expect either way of doing things to work.

@hiddeco
Copy link
Contributor

hiddeco commented Apr 8, 2024

But that implementation detail means that globbing > prefix matches. Which means there is an order of precedence.

@krancour
Copy link
Member

krancour commented Apr 8, 2024

Which means there is an order of precedence.

That may technically be so, but how often do you stop and think about this?

That is to say, don't we all possess an abstract intuition about how patterns are matched that doesn't require us to think so deeply about precedence? Would you not find it surprising if foo/ and foo/* (for example) didn't both work with our default matching strategy?

@hiddeco
Copy link
Contributor

hiddeco commented Apr 8, 2024

That may technically be so, but how often do you stop and think about this?

Not very often, but the shell also present me with different things to e.g. do literal string interpretations. Assuming I have a directory called foo?, I can do ls "foo?" to ensure it wouldn't be parsed as a glob. Using a declarative API, this would probably have to become foo\? as the input is a literal string already, and it needs to be escaped somehow.

These kind of details make me worry about edge cases for directory structures and (not so) great people do while naming things. In addition to it creating technical hurdles over simply using an identifier, as you can't e.g. say "if it contains one of these special characters, it's a glob".

Would you not find it surprising if foo/ and foo/* (for example) didn't both work with our default matching strategy?

It would depend on what the API tells me the features are, if the default is just simple prefix matching and there also is a regular expression feature with an identifier. I wouldn't, I think.

@krancour
Copy link
Member

krancour commented Apr 8, 2024

It would depend on what the API tells me the features are

This is a fair point, but I also want to default to the least surprising behavior for the benefit of those who didn't read the manual.

That is the main thing I care about.

If we think prefix matching would be the least surprising default, let's go with that and use the glob: prefix for glob-matching.

@maksimstankevic
Copy link
Contributor Author

Thanks @hiddeco @krancour, I'll do it with "glob:" for globs and no prefix for prefix and exact string matching.

@krancour
Copy link
Member

krancour commented Apr 8, 2024

@maksimstankevic as always, thank you for your patience and your efforts!

…e Git Subscription

Signed-off-by: Maksim Stankevic <maksim.stankevic1@gmail.com>
Signed-off-by: Maksim Stankevic <maksim.stankevic1@gmail.com>
Signed-off-by: Maksim Stankevic <maksim.stankevic1@gmail.com>
@maksimstankevic
Copy link
Contributor Author

@hiddeco @krancour pushed it, please review when you can

@@ -303,19 +306,19 @@ func ignores(tagName string, ignore []string) bool {
return false
}

// pathsFilterPositive returns true when IncludePaths and/or ExcludePaths
// matchesPathsFilters returns true when IncludePaths and/or ExcludePaths
// filters match one or more commit diffs and new Freight is
// to be produced. It returns false otherwise.
func matchesPathsFilters(includePaths []string, excludePaths []string, diffs []string) (bool, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function looks like it could be written more simply and more efficiently as well.

Something like the following pseudocode would probably be easier to understand and perform better:

for diff in diffs:
  if !match(diff, includePrefixes) && !match(diff, includeGlobs) && !match(diff, includeRegexes):
    // Diff not selected by any of the patterns
    continue
  // If we get to here, the diff was selected by one of the patterns. Check if it's un-selected.
  if match(diff, excludePrefixes) || match(dif, excludeGlobs) || match(diff, excludeRegexes):
    // Diff was unselected by one of the patterns
    continue
  // If we get to here, we have found one path that was selected by the include patterns
  // and not un-selected by the exclude patterns. That's enough to say we found a change
  // that should be allowed to produce Freight.
  return true

The benefits of this implementation are that it returns from the entire function as soon as it finds even one path they is selected and not un-selected. It also leverages short-circuiting in the boolean logic so that we don't ever check against more patterns than we need to.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@krancour absolutely, change pushed, I need to improve on my code-owner spirit :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think in addition to this, it may be good to make this interface based. I outlined an approach like this in hiddeco@43573b7.

The advantage of this is that include and exclude rules can maintain their declared order of precedence.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maksimstankevic it looks like these changes don't avoid as many unnecessary comparison as I had hoped.

What I had in mind was more like this:

	for _, diffPath := range diffs {
		if len(includePathsRegexps) > 0 || len(includePathsGlobs) > 0 ||
			len(includePathsPrefixes) > 0 {
			// If none of the above had been non-empty, the path is implicitly included
			matches, err := matchesGlobs(diffPath, includePathsGlobs)
			if err != nil {
				return false, fmt.Errorf(
					"syntax error in include glob patterns: %w",
					err,
				)
			}
			if !matches && !matchesPrefix(diffPath, includePathsPrefixes) &&
				!matchesRegexpList(diffPath, includePathsRegexps) {
				// diffPath didn't match any of the include filters
				continue
			}
		}
		// If we get to here, the path was either implicitly or explicitly included
		matches, err := matchesGlobs(diffPath, excludePathsGlobs)
		if err != nil {
			return false, fmt.Errorf(
				"syntax error in exclude glob patterns: %w",
				err,
			)
		}
		if matches || matchesPrefix(diffPath, excludePathsPrefixes) ||
			matchesRegexpList(diffPath, excludePathsRegexps) {
			// diffPath matched an exclude filter
			continue
		}
		// If we get to here, the path was not explicitly excluded
		return true, nil
	}
	return false, nil

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually... hold off on that... @hiddeco and I are discussing another alternative.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After some back and forth, @hiddeco and I agreed something along these lines will check both boxes of avoiding unnecessary comparisons and making the code easier to follow overall.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After some back and forth, @hiddeco and I agreed something along these lines will check both boxes of avoiding unnecessary comparisons and making the code easier to follow overall.

@hiddeco @krancour I will refactor and test it out this week

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @maksimstankevic. We didn't thoroughly test it. It was a "something like this should work." I did run the existing tests though and saw only minor issues that are probably fixed easily. Didn't go too deep.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@krancour @hiddeco along these lines worked out of the box, no work for me other than testing, changes pushed. Elegant and scalable design.

Signed-off-by: Maksim Stankevic <maksim.stankevic1@gmail.com>
Signed-off-by: Maksim Stankevic <maksim.stankevic1@gmail.com>
@krancour
Copy link
Member

@maksimstankevic I think this looks pretty great. Thanks for the hard work.

At a later date, when documenting the changes, I may adjust the comments on the IncludePaths and ExcludePaths fields a bit, but this functionality all looks 👍 to me.

LGTM unless @hiddeco has anything more to say about it. (@hiddeco go ahead and merge if you're good.)

@krancour krancour assigned hiddeco and unassigned krancour Apr 18, 2024
@hiddeco hiddeco changed the title feat(controller): adding glob matching in includePaths/excludePaths filters of Warehous… feat(controller): support filtering Git subscription paths using globs and path prefies Apr 18, 2024
@hiddeco hiddeco changed the title feat(controller): support filtering Git subscription paths using globs and path prefies feat(controller): support filtering Git subscription paths using globs and path prefixes Apr 18, 2024
@hiddeco hiddeco changed the title feat(controller): support filtering Git subscription paths using globs and path prefixes feat(controller): support filtering Git subscriptions using globs and path prefixes Apr 18, 2024
Copy link
Contributor

@hiddeco hiddeco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also happy with how this looks, thanks @maksimstankevic! 🍒

As another potential follow up, I think we could improve the tests a bit to provide some guarantees when people mix-and-match patterns. But that's absolutely non-blocking for now.

@hiddeco hiddeco added this pull request to the merge queue Apr 18, 2024
@maksimstankevic
Copy link
Contributor Author

Also happy with how this looks, thanks @maksimstankevic! 🍒

As another potential follow up, I think we could improve the tests a bit to provide some guarantees when people mix-and-match patterns. But that's absolutely non-blocking for now.

Just filed a new issue for this: #1862, will take it and add thorough testing

thanks @hiddeco !

Merged via the queue into akuity:main with commit 592ba2d Apr 18, 2024
19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants