Added Include/Exclude filtering capability to Unzip Task (#5169) #6018
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
LGTM!
src/Tasks/Resources/Strings.resx
Outdated
@@ -2792,6 +2792,9 @@
  <data name="Unzip.DidNotUnzipBecauseOfFileMatch">
    <value>Did not unzip from file "{0}" to file "{1}" because the "{2}" parameter was set to "{3}" in the project and the files' sizes and timestamps match.</value>
  </data>
  <data name="Unzip.DidNotUnzipBecauseOfFilter">
    <value>Did not unzip file "{0}" because it didn't match the include or matched the exclude filter.</value>
nit:
- <value>Did not unzip file "{0}" because it didn't match the include or matched the exclude filter.</value>
+ <value>Did not unzip file "{0}" because it didn't match the include or because it matched the exclude filter.</value>
No problem, I have applied it in the new commits
using (TestEnvironment testEnvironment = TestEnvironment.Create())
{
    TransientTestFolder source = testEnvironment.CreateFolder(createFolder: true);
    TransientTestFolder destination = testEnvironment.CreateFolder(createFolder: false);
Would you mind modifying this test to use wildcards such that two files are included, one of which is also excluded; a third is only excluded; and a fourth is neither included nor excluded?
Not at all. I hope this new version is closer to what you were looking for.
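For reference, the four-file scenario requested above can be sketched roughly as follows. This is only an illustration under stated assumptions: EntryFilter, GlobMatches, and ShouldExtract are hypothetical names, the glob handling is a simplified stand-in (it is not MSBuild's FileMatcher and does not handle recursive wildcards), and treating an empty include list as "include everything" is an assumption rather than something confirmed in this thread.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class EntryFilter
{
    // Simplified glob (assumption): '*' matches any run of characters,
    // '?' matches exactly one character. Not MSBuild's real glob semantics.
    public static bool GlobMatches(string glob, string path)
    {
        string regex = "^" + Regex.Escape(glob).Replace("\\*", ".*").Replace("\\?", ".") + "$";
        return Regex.IsMatch(path, regex, RegexOptions.IgnoreCase);
    }

    // An entry is extracted when it matches an include pattern (or no include
    // patterns were given) and matches none of the exclude patterns.
    public static bool ShouldExtract(string path, string[] includes, string[] excludes)
    {
        bool included = includes.Length == 0 || includes.Any(g => GlobMatches(g, path));
        bool excluded = excludes.Any(g => GlobMatches(g, path));
        return included && !excluded;
    }
}
```

With includes of *.txt and excludes of b.* and c.*, an archive holding a.txt, b.txt, c.log, and d.bin would yield only a.txt: b.txt is included but also excluded, c.log is only excluded, and d.bin matches neither filter.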
@@ -1217,6 +1217,8 @@ public sealed partial class Unzip : Microsoft.Build.Tasks.TaskExtension, Microso
    public Unzip() { }
    [Microsoft.Build.Framework.RequiredAttribute]
    public Microsoft.Build.Framework.ITaskItem DestinationFolder { get { throw null; } set { } }
    public string Exclude { get { throw null; } set { } }
    public string Include { get { throw null; } set { } }
If we stick with regular expressions for this, we should change the names of Exclude and Include, as they are "default" MSBuild names; i.e., items are added via Include, and the patterns used there differ from regular expressions. The names suggested during PR review were IncludePattern and ExcludePattern.
This makes sense, the functionality differs indeed. I have pushed the change.
@benvillalobos, I thought we'd agreed globs were more MSBuild-y than regex? Confused by your comment.
We did. I posted this while we were talking about it, hence the "If we stick with regular expressions." The current implementation is still regex; is this how we normally process includes and excludes?
No; we use globs.
@benvillalobos OK, thank you for clearing that up. Is there a good example in MSBuild of how you'd like this to function that I can use as a guide for the implementation? I just want to make sure that it feels native, without any quirks.
Check out Expander.cs; ExpandIntoItemsLeaveEscaped may have what you're looking for.
ItemSpec.cs is also relevant here.
Thank you, @benvillalobos. I studied Expander and ItemSpec, but they are out of reach for the Tasks assembly and I did not want to introduce dependencies, so I leveraged FileMatcher to handle the normalization and verification of the globs and paths. Because of this, it does not support property references. I hope this matches your expected behavior.
MSBuildGlob would have been great here, but unfortunately it's not visible to tasks. :(
…per Review Suggestion
The PR overall looks good, pending the switch from regex to globs.
@@ -1217,6 +1217,8 @@ public sealed partial class Unzip : Microsoft.Build.Tasks.TaskExtension, Microso
    public Unzip() { }
    [Microsoft.Build.Framework.RequiredAttribute]
    public Microsoft.Build.Framework.ITaskItem DestinationFolder { get { throw null; } set { } }
    public string Exclude { get { throw null; } set { } }
    public string Include { get { throw null; } set { } }
So we're in agreement 🙂
@IvanLieckens the to-do list here would be:
- Rename ExcludePattern and IncludePattern back to Exclude and Include (sorry)
  - We wanted ExcludePattern if we were sticking with regex, but it turns out we don't want to use regex here
- Change the implementation to expect and parse a glob pattern
…o regular glob matching instead of RegEx
src/Shared/FileMatcher.cs
Outdated
@@ -31,6 +31,8 @@ internal class FileMatcher
    private static readonly char[] s_wildcardCharacters = { '*', '?' };
    private static readonly char[] s_wildcardAndSemicolonCharacters = { '*', '?', ';' };

    private static readonly string[] s_propertyReferences = { "$(", "@(" };
- private static readonly string[] s_propertyReferences = { "$(", "@(" };
+ private static readonly string[] s_propertyAndItemReferences = { "$(", "@(" };
Items are referred to with an @ and properties with $.
Thanks, I just copied from the old name of the method. I adjusted the naming now to these.
src/Shared/FileMatcher.cs
Outdated
/// <summary>
/// Determines whether the given path has any property references.
/// </summary>
internal static bool HasPropertyReferences(string filespec)
- internal static bool HasPropertyReferences(string filespec)
+ internal static bool HasPropertyOrItemReferences(string filespec)
Another option would be to create a HasPropertyReferences that checks for $( and a separate HasItemReferences that checks for @(. Though I don't feel too strongly about that extra suggestion, since FileMatcher already has something like s_wildcardAndSemicolonCharacters and HasWildcardsOrSemicolon.
Yeah, I split up the existing method that was already there to allow more granular calling and identification. But I didn't want to introduce too many new methods if they weren't needed.
/// </summary>
internal static bool HasWildcardsOrSemicolon(string filespec)
{
    return -1 != filespec.LastIndexOfAny(s_wildcardAndSemicolonCharacters);
Is there any significant perf difference between using Aggregate and -1 != filespec.LastIndexOfAny(s_wildcardAndSemicolonCharacters)?
Were you thinking Aggregate with a function like Aggregate(false, (acc, ch) => acc || s_wildcardAndSemicolonCharacters.Contains(ch))? That would almost certainly be slower than this.
I was thinking of the difference between s_propertyReferences.Aggregate(false, (current, propertyReference) => current | filespec.Contains(propertyReference)); and -1 != filespec.LastIndexOfAny(s_propertyReferences);
If the former is more efficient, we can change HasWildcardsOrSemicolon to do the same.
I believe LastIndexOf is going to be faster, but I had to use Aggregate for the other because s_wildcardAndSemicolonCharacters is a char[] and s_propertyReferences is a string[]. I would need to set up a microbenchmark to validate.
Ah! I didn't realize that we couldn't replicate what was already there. This looks fine to me 👍
Forgind improved it further now to use Any() :)
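To make the comparison in this thread concrete, here is a minimal sketch of the three variants. The class and field names are stand-ins for the FileMatcher members discussed above, not the actual MSBuild source. LastIndexOfAny only accepts a char[], which is why the string[] case needs a LINQ call; Any stops at the first match, whereas Aggregate with the non-short-circuiting | operator visits every element. No timings are claimed here.

```csharp
using System;
using System.Linq;

public static class ReferenceChecks
{
    // Stand-ins (assumption) for the FileMatcher fields named in the review.
    private static readonly char[] WildcardAndSemicolonCharacters = { '*', '?', ';' };
    private static readonly string[] PropertyAndItemReferences = { "$(", "@(" };

    // char[] case: a single scan of the string via LastIndexOfAny.
    public static bool HasWildcardsOrSemicolon(string filespec) =>
        filespec.LastIndexOfAny(WildcardAndSemicolonCharacters) != -1;

    // string[] case, first version: folds over every reference, no early exit.
    public static bool HasReferencesAggregate(string filespec) =>
        PropertyAndItemReferences.Aggregate(
            false, (current, reference) => current | filespec.Contains(reference));

    // string[] case, later version: returns as soon as one reference is found.
    public static bool HasReferencesAny(string filespec) =>
        PropertyAndItemReferences.Any(filespec.Contains);
}
```

Both string[] variants return the same result for any input; the difference is only in how much work they do once a match has been found.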
src/Tasks/Unzip.cs
Outdated
patterns = pattern.Contains(';')
    ? pattern.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries).Select(FileMatcher.Normalize).ToArray()
    : new[] { pattern };
if (patterns.Any(p => p.IndexOfAny(Path.GetInvalidPathChars()) != -1))
nit:
Move this before the split?
This is very strange... the docs for Path.GetInvalidPathChars explicitly say that "on Windows-based desktop platforms, invalid path characters might include...less than (<), greater than (>), pipe (|),..." yet when I tried that out on my Windows-based desktop platform, it didn't. I submitted an issue about it: dotnet/dotnet-api-docs#5292
I'd use FileUtilities.InvalidPathChars and modify the tests to target | or a character 1-31 and make it the same across all platforms.
That's very interesting. I switched to FileUtilities.InvalidPathChars and added a pipe to the name in the test, while removing the platform-specific flag.
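The two suggestions in this thread (validate before splitting, and use a fixed character set instead of the platform-dependent Path.GetInvalidPathChars) could be combined along these lines. This is a sketch only: PatternValidation and TryParsePatterns are hypothetical names, and the character set below (pipe plus characters 0 through 31) is an assumption based on the "| or a character 1-31" remark above, not a copy of FileUtilities.InvalidPathChars.

```csharp
using System;
using System.Linq;

public static class PatternValidation
{
    // Assumed fixed set: '|' plus control characters 0-31, identical on
    // every platform. Wildcards '*' and '?' are deliberately not included.
    private static readonly char[] AssumedInvalidPathChars =
        new[] { '|' }.Concat(Enumerable.Range(0, 32).Select(i => (char)i)).ToArray();

    // Validates the whole pattern string once, then splits it on ';'.
    // Returns false (and an empty array) when an invalid character is found.
    public static bool TryParsePatterns(string pattern, out string[] patterns)
    {
        patterns = Array.Empty<string>();
        if (string.IsNullOrWhiteSpace(pattern))
        {
            return true; // nothing to filter on
        }

        if (pattern.IndexOfAny(AssumedInvalidPathChars) != -1)
        {
            return false;
        }

        patterns = pattern.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries);
        return true;
    }
}
```

Checking the unsplit string first means a single IndexOfAny scan covers every pattern, and a test that puts a pipe in the pattern behaves the same on Windows and Unix.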
src/Tasks/Unzip.cs
Outdated
@@ -212,5 +275,41 @@ private bool ShouldSkipEntry(ZipArchiveEntry zipArchiveEntry, FileInfo fileInfo)
           && zipArchiveEntry.LastWriteTime == fileInfo.LastWriteTimeUtc
           && zipArchiveEntry.Length == fileInfo.Length;
}

private bool ParseIncludeExclude()
Why does this return anything? Looking at ParsePattern below, it either throws an error or returns true. That means that this either throws an error or returns true. I was momentarily confused when I thought it was skipping everything if there were no include/exclude present, so I'd just remove that bit.
OK, I removed the return value and switched to using Log.HasLoggedErrors to exit early, preventing further execution when the Task is misconfigured.
Co-authored-by: Forgind <Forgind@users.noreply.github.com>
…n Unzip unit tests
Co-authored-by: Forgind <Forgind@users.noreply.github.com>
LGTM! Thanks for bearing with us on this!
@@ -204,7 +204,7 @@ internal static bool HasWildcardsSemicolonItemOrPropertyReferences(string filesp
  /// </summary>
  internal static bool HasPropertyOrItemReferences(string filespec)
  {
-     return s_propertyAndItemReferences.Any(ref=> filespec.Contains(ref));
+     return s_propertyAndItemReferences.Any(filespec.Contains);
👍 Nice!
Co-authored-by: Mihai Codoban <micodoba@microsoft.com>
Thanks @IvanLieckens!
Fixes #5169
Context
See #5169
Changes Made
Unzip Task now has "Include" and "Exclude" optional properties to pass a pattern to filter archive entries to be unzipped.
Testing
Added the following tests:
These 3 test the ability to include/exclude files from the archive unzip.
Notes
Unable to translate the resources to all languages. Can someone provide guidance/translations?