Introduce public interfaces for per-project and per-evaluation disk I/O callbacks #6728

Merged (23 commits) on Oct 21, 2021

Member

ladipro commented Aug 3, 2021

Fixes #6068

Context

We're looking to use the CPS FS cache for file/directory enumeration and file/directory existence checks done by MSBuild during evaluation. The currently exposed MSBuildFileSystemBase base class does not work well for this purpose:

  1. Due to its use of IEnumerable<string> with full paths. Even if we currently end up constructing full paths internally, the interface should be future-proof. We would like to use something more efficient, preferably close to the modern .NET Core file enumeration API, which we're switching to in another related effort.
  2. Due to the scope. MSBuildFileSystemBase is applied across all evaluations of all projects sharing the same EvaluationContext. CPS wants to know which project is being evaluated and which evaluation (as in evaluation ID) is running. CPS also prefers to set the callbacks for the lifetime of the Project object as opposed to having to pass them in EvaluationContext for each evaluation. This is because Project objects are exposed to other components and anyone can reevaluate them without going through CPS.

Changes Made

  1. Refactoring
    • Made EngineFileUtilities static. This class was wrapping a FileMatcher, making the code less readable and slightly more complex. A static class is a better fit for the suffix "Utilities" and FileMatcher is explicitly passed to methods that need it.
    • Made a few evaluation-related classes and methods take EvaluationContext rather than IFileSystem. The former also has FileMatcher on it and it's more efficient to reuse it instead of creating a new one. It is generally more future-proof to be passing around a reference to "context" rather than to only a single part of it.
    • Made FileMatcher.IsMatch take ReadOnlySpan<char> instead of string but left the string overload in the code for now. Now that we can use Microsoft.IO.Redist, several call sites were converted to use the span. The common pattern is to extract the file name from the full path, which can be a non-allocating operation on .NET Core and on Framework with Microsoft.IO.Redist.
  2. Interfaces
    • Added IDirectoryCache with efficient file/directory existence and file/directory enumeration API. The interface is to be implemented by hosts that have their file system caches and want them to be used during evaluation instead of having MSBuild hit disk.
    • Added IDirectoryCacheFactory with a factory method to create IDirectoryCache for an evaluation ID. The host can choose to return null to make MSBuild fall back to the existing behavior of hitting disk. IDirectoryCacheFactory is passed as a field on ProjectOptions. If the host specifies this interface and also an evaluation context with an MSBuildFileSystemBase-derived class, this new interface will be used for file/directory existence checks and file/directory enumeration while the rest of the operations (are there any such operations used during evaluation?) would go to MSBuildFileSystemBase.
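
The null-return fallback described in the last bullet can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: IDirectoryCacheSketch, IDirectoryCacheFactorySketch, SnapshotCache, HostCacheFactory and EvaluatorSketch are hypothetical stand-ins, and the real interfaces in Microsoft.Build.FileSystem have more members and may use different method names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical, simplified stand-in for IDirectoryCache (existence check only).
public interface IDirectoryCacheSketch
{
    bool FileExists(string path);
}

// Hypothetical stand-in for IDirectoryCacheFactory.
public interface IDirectoryCacheFactorySketch
{
    // Returning null means "no cache for this evaluation, use the disk".
    IDirectoryCacheSketch GetDirectoryCacheForEvaluation(int evaluationId);
}

// Host-side cache backed by a fixed snapshot of known file paths.
public sealed class SnapshotCache : IDirectoryCacheSketch
{
    private readonly HashSet<string> _files;

    public SnapshotCache(IEnumerable<string> files) =>
        _files = new HashSet<string>(files, StringComparer.OrdinalIgnoreCase);

    public bool FileExists(string path) => _files.Contains(path);
}

// Host-side factory that only knows about evaluations it has registered.
public sealed class HostCacheFactory : IDirectoryCacheFactorySketch
{
    private readonly Dictionary<int, IDirectoryCacheSketch> _byEvaluation =
        new Dictionary<int, IDirectoryCacheSketch>();

    public void Register(int evaluationId, IDirectoryCacheSketch cache) =>
        _byEvaluation[evaluationId] = cache;

    public IDirectoryCacheSketch GetDirectoryCacheForEvaluation(int evaluationId) =>
        _byEvaluation.TryGetValue(evaluationId, out var cache) ? cache : null;
}

// Consumer side: prefer the host cache, fall back to the disk when there is none.
public static class EvaluatorSketch
{
    public static bool FileExists(IDirectoryCacheFactorySketch factory, int evaluationId, string path)
    {
        IDirectoryCacheSketch cache = factory?.GetDirectoryCacheForEvaluation(evaluationId);
        return cache != null ? cache.FileExists(path) : File.Exists(path);
    }
}
```

In this sketch the fallback decision is made once per lookup for brevity; a real evaluator would presumably obtain the cache once per evaluation and reuse it.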

Testing

Existing and new unit tests, experimental insertion, validation by the CPS team.

Notes

Member

arkalyanms commented Aug 3, 2021

  • I believe CPS will also need the particular project evaluation asking for the enumeration, not just the project path. It's a 1:1 mapping from MSBuild.Project to ConfiguredProject that is made unique by a tuple of (ProjectPath, ProjectConfiguration(2 or 3 dimensional configuration that includes configuration, platform and targetframework in the case of netcore)). Depending on how simple you want to keep the contract you can choose between those 2 params. The 3rd dimension is project system specific.
    • CPS uses that to map globbing/file matching. This exists via GetAllGlobs today. But that doesn't cover file probing paths where if a user adds a new directory.props file we don't know if it needs an evaluation.
  • It looks like you take 1 predicate for the inclusion. Do you also need another for the recursion separately?

@arkalyanms
Member

https://github.com/lifengl is Lifeng if you want to add him to the review.

@ladipro ladipro marked this pull request as draft August 3, 2021 20:28
Member Author

ladipro commented Aug 3, 2021

Thank you! I will make the change to better identify the project evaluation for which the interface is requested.

MSBuild handles recursion internally and doesn't ask the filesystem for recursive file enumeration so I omitted this from the interface.

Also adding @lifengl to reviewers as suggested.
Edit: It looks like his handle cannot be added, hope he'll see the mention.

Member Author

ladipro commented Aug 4, 2021

I believe CPS will also need the particular project evaluation asking for the enumeration, not just the project path. It's a 1:1 mapping from MSBuild.Project to ConfiguredProject that is made unique by a tuple of (ProjectPath, ProjectConfiguration(2 or 3 dimensional configuration that includes configuration, platform and targetframework in the case of netcore)). Depending on how simple you want to keep the contract you can choose between those 2 params. The 3rd dimension is project system specific.

@arkalyanms I have a hard time parsing this. If there's a 1:1 mapping between Project and ConfiguredProject, then it should be enough to pass the Project being evaluated, is that not true? Or are you saying that there's a 1:N mapping between Project and ConfiguredProject and it has to be disambiguated with an additional parameter?

@arkalyanms
Member

I believe CPS will also need the particular project evaluation asking for the enumeration, not just the project path. It's a 1:1 mapping from MSBuild.Project to ConfiguredProject that is made unique by a tuple of (ProjectPath, ProjectConfiguration(2 or 3 dimensional configuration that includes configuration, platform and targetframework in the case of netcore)). Depending on how simple you want to keep the contract you can choose between those 2 params. The 3rd dimension is project system specific.

@arkalyanms I have a hard time parsing this. If there's a 1:1 mapping between Project and ConfiguredProject, then it should be enough to pass the Project being evaluated, is that not true? Or are you saying that there's a 1:N mapping between Project and ConfiguredProject and it has to be disambiguated with an additional parameter?

Yes Project will work. It's a 1:1 from Project to ConfiguredProject.

I was just saying that what makes a ConfiguredProject unique is the combination of Project path and ProjectConfiguration, which in turn is 3-dimensional (config, platform, targetframework).

Member Author

ladipro commented Aug 4, 2021

Got it, thank you for bearing with me. I have updated GetDirectoryCacheForProject to take Project rather than project path.

Member Author

ladipro commented Aug 5, 2021

I have updated the interface. The predicate and transform callbacks now pass only the file name as ReadOnlySpan<char> instead of the full FileSystemEntry because the latter:

  • Is not CLS compliant,
  • Exposes no construction API,
  • Contains data that CPS does not have and MSBuild does not need. MSBuild needs just names.
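
A minimal sketch of how name-only callbacks might be consumed on the host side. The two delegate shapes follow the public API diff in this PR; CachedDirectory and CallbackDemo are hypothetical illustrations and not part of MSBuild.

```csharp
using System;
using System.Collections.Generic;

// Delegate shapes as they appear in this PR's public API diff.
public delegate bool FindPredicate(ref ReadOnlySpan<char> fileName);
public delegate TResult FindTransform<TResult>(ref ReadOnlySpan<char> fileName);

// Hypothetical host-side cache holding the file names of one directory.
public sealed class CachedDirectory
{
    private readonly string[] _fileNames;

    public CachedDirectory(params string[] fileNames) => _fileNames = fileNames;

    // Builds a list instead of using yield: spans are ref structs and
    // cannot be kept as locals inside an iterator block.
    public List<TResult> EnumerateFiles<TResult>(FindPredicate predicate, FindTransform<TResult> transform)
    {
        var results = new List<TResult>();
        foreach (string name in _fileNames)
        {
            ReadOnlySpan<char> span = name.AsSpan();
            if (predicate(ref span))
            {
                // Only names accepted by the predicate are transformed.
                results.Add(transform(ref span));
            }
        }
        return results;
    }
}

public static class CallbackDemo
{
    // Selects *.cs names (case-insensitive) and materializes them as strings.
    public static List<string> CSharpFiles()
    {
        var dir = new CachedDirectory("a.cs", "b.txt", "C.CS");
        return dir.EnumerateFiles<string>(
            (ref ReadOnlySpan<char> n) => n.EndsWith(".cs".AsSpan(), StringComparison.OrdinalIgnoreCase),
            (ref ReadOnlySpan<char> n) => n.ToString());
    }
}
```

Because the callbacks see only the name, the cache needs nothing beyond the names it already stores, which matches the third bullet above.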

Member Author

ladipro commented Aug 10, 2021

Notes to self:

Member Author

ladipro commented Aug 10, 2021

The test failure is caused by the new code having no restrictions on file path length as filtering is always done in FileMatcher and not in file I/O routines as before. Looks like another reason for changing FileSystem. We could keep passing pattern down to the OS for full backward compatibility and filter entries MSBuild-side only if IDirectoryCache is provided by the host.

@ladipro ladipro linked an issue Aug 12, 2021 that may be closed by this pull request
5 tasks
}
public partial interface IDirectoryCacheFactory
{
Microsoft.Build.FileSystem.IDirectoryCache GetDirectoryCacheForProject(Microsoft.Build.Evaluation.Project project);
Copy link

@ladipro: there are some gaps between this contract and the CPS design:
1. The CPS cache is directory based, not project based, and not complete. So it is more like IDirectoryCache GetCacheForDirectory(string directoryPath). If we don't have a cache for a directory, can we fall back to the MSBuild ones? (I am not sure how often this will hit SDK folders or parent folder chains.) CPS basically caches the project tree itself and some glob cones, but it is not a complete list. Otherwise, we will repeat the same logic your team has built for other folders.

2. One key scenario where we use this is the initial evaluation of a project or a new configuration (configurations will share the directory cone most of the time, which we always have after evaluating the first configuration). However, that happens during the construction of the Project, and receiving the Project instance back in the middle of its construction would be problematic: we don't know the instance yet, and when we access its internal structure, we don't know what is valid and what is not. It is also dangerous to leak the instance in the middle of construction through public contracts.

Copy link

maybe CPS can always provide IDirectoryCache, and use our cached folder if it is possible, otherwise just fall back to the file system. It will allow us to capture all probing paths during evaluation, and use them in the project evaluation cache in the future (@arkalyanms). In that case, maybe the design is fine, except that we will get the Project before the constructor finishes, which is still a design issue.

Copy link
Member Author

maybe CPS can always provide IDirectoryCache, and use our cached folder if it is possible, otherwise just fall back to the file system.

@lifengl when we discussed this a few months ago, I remember mentioning the fact that CPS wants to be made aware of directories outside of the project cone - so it can cache and monitor them. So the API has no "I don't know" answer and the implementation is expected to handle all paths. Let me know if I misunderstood.

Thank you for bringing up the issue with Project being constructed when these callbacks would fire. What else would work for you, other than passing IDirectoryCache to Project during construction as described in the other comment? Earlier @arkalyanms also mentioned using the tuple of (ProjectPath, ProjectConfiguration) to identify the configured project. Still an option?

Copy link

Maybe some information similar to what is in ProjectEvaluationStartedEventArgs: ProjectFile and some kind of context, especially the EvaluationId. See my other comments: it is important for us to know the exact evaluation each request comes from. We already monitor evaluation start/end, which will allow us to collect what belongs to a single evaluation.

@@ -872,6 +872,7 @@ public partial class EvaluationContext
{
internal EvaluationContext() { }
public static Microsoft.Build.Evaluation.Context.EvaluationContext Create(Microsoft.Build.Evaluation.Context.EvaluationContext.SharingPolicy policy) { throw null; }
public static Microsoft.Build.Evaluation.Context.EvaluationContext Create(Microsoft.Build.Evaluation.Context.EvaluationContext.SharingPolicy policy, Microsoft.Build.FileSystem.IDirectoryCacheFactory directoryCacheFactory) { throw null; }
Copy link

A problem with this design is that we need to change every piece of code that calls into RevaluateProjectIfNecessary, which can be very difficult. Any chance we can provide the CacheFactory to the project, so it is used for the lifetime of the project? If an evaluation should not use the cache, maybe that can be controlled on the CPS side?

Copy link
Member Author

If there are no scenarios where you would want to evaluate the same Project with different backing caches, then I guess we could add IDirectoryCache as a Project constructor parameter. It's not as clean, though, because this conceptually belongs in evaluation context. Which I thought CPS wanted to use for (all?) evaluations anyway.

}
public partial interface IDirectoryCacheFactory
{
Microsoft.Build.FileSystem.IDirectoryCache GetDirectoryCacheForProject(Microsoft.Build.Evaluation.Project project);
Copy link

One problem is that we want to keep using the same disk cache for a single project evaluation, so the result can be consistent (if the project has two globs that match the same file, both should match it or neither should). So, maybe we should pass in the evaluationId here?

Copy link
Contributor

cdmihai commented Aug 17, 2021

Consistency is controlled by the lifetime of the EvaluationContext instance. When you want MSBuild evaluations to read a different state of the disk, you create a new EvaluationContext and start using that one. At least that's the assumption on which the evaluation context code is built; there's no separation based on the evaluation ID. Also, the evaluation ID isn't propagated to all the evaluation components, so it's hard for an arbitrary point of evaluation code to know what evaluation ID is using it.

@@ -1495,6 +1496,19 @@ public partial class ProxyTargets
}
namespace Microsoft.Build.FileSystem
{
public delegate bool FindPredicate(ref System.ReadOnlySpan<char> fileName);
public delegate TResult FindTransform<TResult>(ref System.ReadOnlySpan<char> fileName);
public partial interface IDirectoryCache
Copy link
Contributor

Wouldn't it be better to augment MSBuildFileSystemBase with these new methods instead of creating a new interface? With a new interface, the APIs will get harder to use. Users will have to think whether to implement one or the other, or both. Maybe a new generic MSBuildFileSystemBase that extends MSBuildFileSystemBase.

Copy link
Member Author

I pushed an update where the interface is now provided via Project and not EvaluationContext so having a separate one should be easier to justify. I would argue that having multiple methods providing the same functionality in MSBuildFileSystemBase would also be confusing for users because it wouldn't be clear which one is used when.


// Create a FileMatcher for the given combination of EvaluationContext and the project being evaluated.
IFileSystem fileSystem = _evaluationContext.GetFileSystemForProject(project);
_fileMatcher = new FileMatcher(fileSystem, evaluationContext.FileEntryExpansionCache);
Copy link
Contributor

cdmihai commented Aug 17, 2021

Why does the file matcher have to be sensitive to the Project, as opposed to having one for the entire evaluation context? The purpose of the evaluation context was to share common state between projects, not to keep per-project state. The Project/Evaluator can keep per-Project/evaluation state, maybe building it on top of common state from the evaluation context.

Copy link
Member Author

My understanding is that CPS maintains its cache per configured project so its content can be different depending on which project is being evaluated, even if they share one evaluation context. I'll let @lifengl / @arkalyanms confirm this requirement.

I have just pushed an update based on yesterday's sync (too bad we did not record it) where I was asked to couple IDirectoryCacheFactory with the Project rather than with EvaluationContext. I do agree that having the cache be part of EvaluationContext would be cleaner but unfortunately it wouldn't work well for the intended use here.

Copy link
Contributor

An alternative is for CPS to maintain multiple evaluation contexts, one for each cache "universe". I'm assuming that they don't have a unique cache per project (what's the point of having a unique cache per project?), but instead a few larger buckets, maybe based on different FS states?

Copy link

It is primarily not for that reason, but for a couple of issues we discussed.

1. We want to use this to capture project dependencies, so when things change, we can trigger reevaluation. For example, we can capture that the project evaluation tries to load a Directory.Build.Props file in the folder even though it does not exist, so when the user adds the file, we can trigger a new evaluation. Without something connecting the request to a specific project evaluation, we cannot do that.

2. For each project, we want to use IDirectoryCache beyond one evaluation context, because ReevaluateIfNecessary() can be called from any random place in CPS extensions. If the DirectoryCache needs to be passed in each of those calls (as an evaluationContext), then a call made outside of CPS may not pass the evaluationContext and will hit the disk, while other calls use the cache state. This can lead to lots of problems because the cache might be slightly behind the real disk, and we can get unpredictable behavior in product scenarios.

However, I noticed a potential problem later in the FindPredicate contract. Because the file/folder patterns are not passed to the CPS side, and it always scans all files/folders under the folder, it might cause excessive invalidation of the evaluation result. I am not sure whether it would become a real problem or not; that mostly depends on how the SDK uses those glob patterns.

Copy link
Member Author

Great point about the missing file/folder pattern. It's intentionally designed like this (i.e. that filtering is done MSBuild-side) so the logic stays in one place and we don't risk running into slightly different behavior depending on how it's implemented. Also, I don't believe that all patterns supported by MSBuild are handled by System.IO or the underlying filesystem calls so there would still be cases where invalidation would not be precise.

Can you use the result of FindPredicate to mark files/folders that are actually used by the project and should cause invalidation? In other words, items for which the predicate returns false would still be cached but if the file watcher fires for them the project will not be re-evaluated.
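
A rough sketch of that idea, under the assumption that the host can observe the predicate result for each cached entry. DependencyTrackingDirectory is a hypothetical host-side type, not something in this PR; only the FindPredicate shape is taken from the API diff.

```csharp
using System;
using System.Collections.Generic;

// Delegate shape as it appears in this PR's public API diff.
public delegate bool FindPredicate(ref ReadOnlySpan<char> fileName);

// Hypothetical host-side directory cache that remembers which entries
// the evaluation actually accepted.
public sealed class DependencyTrackingDirectory
{
    private readonly string[] _cachedNames;
    private readonly HashSet<string> _dependencies =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase);

    public DependencyTrackingDirectory(params string[] cachedNames) => _cachedNames = cachedNames;

    // Every cached name is offered to the predicate; only accepted names
    // are returned and recorded as dependencies of the evaluation.
    public List<string> EnumerateFiles(FindPredicate predicate)
    {
        var matches = new List<string>();
        foreach (string name in _cachedNames)
        {
            ReadOnlySpan<char> span = name.AsSpan();
            if (predicate(ref span))
            {
                matches.Add(name);
                _dependencies.Add(name);
            }
        }
        return matches;
    }

    // Intended to be called from the host's file watcher: a change to an
    // entry the predicate rejected does not trigger re-evaluation.
    public bool ShouldReevaluate(string changedFileName) => _dependencies.Contains(changedFileName);
}
```

This only covers changes to entries that were already cached. A newly created file that would match the pattern is not in the recorded set, so the host would need separate handling for additions.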

@ladipro ladipro changed the title from "Introduce IDirectoryCache & IDirectoryCacheFactory" to "Introduce IDirectoryCache - a public interfaces for per-project and per-evaluation disk I/O callbacks" Aug 19, 2021
@ladipro ladipro changed the title from "Introduce IDirectoryCache - a public interfaces for per-project and per-evaluation disk I/O callbacks" to "Introduce a public interfaces for per-project and per-evaluation disk I/O callbacks" Aug 19, 2021
@ladipro ladipro marked this pull request as ready for review August 19, 2021 20:49
@ladipro ladipro changed the title from "Introduce a public interfaces for per-project and per-evaluation disk I/O callbacks" to "Introduce public interfaces for per-project and per-evaluation disk I/O callbacks" Aug 19, 2021
@ladipro ladipro marked this pull request as draft August 31, 2021 09:24
Member Author

ladipro commented Aug 31, 2021

Back to draft as we're still discussing the design. This work is now targeting 17.1.

@ladipro ladipro marked this pull request as ready for review October 15, 2021 15:12
Member Author

ladipro commented Oct 15, 2021

We're investigating a code path where CPS is seeing MSBuild hit the disk despite IDirectoryCacheFactory having been passed to the Project. It may be a pre-existing issue of not propagating IFileSystem to all places where file I/O is happening. The fix will likely be a targeted change so I am marking this PR ready for review.

Copy link
Member

rainersigwald left a comment

LGTM with a few nits.

@ladipro ladipro merged commit ef2ecfd into dotnet:main Oct 21, 2021
@ladipro ladipro deleted the 6068-disk-scan-interface branch October 21, 2021 12:29
Development

Successfully merging this pull request may close these issues.

Avoid disk scans for projects by relying on DirectoryTree from CPS
6 participants