-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce public interfaces for per-project and per-evaluation disk I/O callbacks #6728
Conversation
|
https://github.com/lifengl is Lifeng if you want to add him to the review. |
Thank you! I will make the change to better identify the project evaluation for which the interface is requested. MSBuild handles recursion internally and doesn't ask the filesystem for recursive file enumeration so I omitted this from the interface. Also adding @lifengl to reviewers as suggested. |
@arkalyanms I have a hard time parsing this. If there's a 1:1 mapping between Project and ConfiguredProject, then it should be enough to pass the Project being evaluated, is that not true? Or are you saying that that there's a 1:N mapping between Project and ConfiguredProject and it has to be disambiguated with an additional parameter? |
Yes Project will work. It's a 1:1 from Project to ConfiguredProject. I was just saying what makes a ConfiguredProject unique is the combination of Project path and ProjectConfiguration which in turn is 3 dimensional(config, platform, targetframework). |
Got it, thank you for bearing with me. I have updated |
I have updated the interface. The predicate and transform callbacks now pass only the file name as
|
Notes to self:
|
The test failure is caused by the new code having no restrictions on file path length as filtering is always done in |
} | ||
public partial interface IDirectoryCacheFactory | ||
{ | ||
Microsoft.Build.FileSystem.IDirectoryCache GetDirectoryCacheForProject(Microsoft.Build.Evaluation.Project project); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ladipro : there are some gaps between this contract & CPS design:
1, CPS cache is directory based, not project based, and not complete. So it is more like IDirectoryCache GetCacheForDirectory(string directoryPath). If we don't have cache for a directory, can we fall back to msbuild ones? (I am not sure how often this will hit SDK folders, or parent folder chains. CPS basically caches the project tree itself, and some glob cones, but it is not a complete list. Otherwise, we will repeat the same logic your team has built for other folders.
2, one key scenario we use this is the initial evaluation of a project or a new configuration (configurations will share the directory cone most of the time, which we always have after evaluating the first configuration). However, that happens in the construction of the Project, and when we receive the instance of the project back in the middle of the construction, it would be problematic -- we don't know the instance yet, and when we access its internal structure, we don't know what is valid and what is not. It is also a dangerous to leak the instance in the middle of the construction through public contracts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe CPS can always provide IDirectoryCache, and use our cached folder if it is possible, otherwise just fall back to the file system. It will allow us to capture all probing path during evaluation, and use it in project evaluation cache in the future (@arkalyanms). In that case, maybe the design is fine, except that we will get Project before the constructor finishes, which is still a design issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe CPS can always provide IDirectoryCache, and use our cached folder if it is possible, otherwise just fall back to the file system.
@lifengl when we discussed this a few months ago, I remember mentioning the fact that CPS wants to be made aware of directories outside of the project cone - so it can cache and monitor them. So the API has no "I don't know" answer and the implementation is expected to handle all paths. Let me know if I misunderstood.
Thank you for bringing up the issue with Project
being constructed when these callbacks would fire. What else would work for you, other than passing IDirectoryCache
to Project
during construction as described in the other comment? Earlier @arkalyanms also mentioned using the tuple of (ProjectPath, ProjectConfiguration) to identify the configured project. Still an option?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe some information similar to what is in the ProjectEvaluationStartedEventArgs, ProjectFile, some kind of context, especially the EvaluationId. See my other comments, it is important for us to know the exact evaluation each request comes from, we have already monitor evaluation start/end, which will allow us to collect what belongs a single evaluation.
@@ -872,6 +872,7 @@ public partial class EvaluationContext | |||
{ | |||
internal EvaluationContext() { } | |||
public static Microsoft.Build.Evaluation.Context.EvaluationContext Create(Microsoft.Build.Evaluation.Context.EvaluationContext.SharingPolicy policy) { throw null; } | |||
public static Microsoft.Build.Evaluation.Context.EvaluationContext Create(Microsoft.Build.Evaluation.Context.EvaluationContext.SharingPolicy policy, Microsoft.Build.FileSystem.IDirectoryCacheFactory directoryCacheFactory) { throw null; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A problem of this design is that we need change every code calls into RevaluateProjectIfNecessary, which can be very difficult. Any chance we can provide the CacheFactory to the project, which is used in the lifetime of the project? If the evaluation should not use cache, maybe it can be controlled on CPS side?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there are no scenarios where you would want to evaluate the same Project
with different backing caches, then I guess we could add IDirectoryCache
as a Project
constructor parameter. It's not as clean, though, because this conceptually belongs in evaluation context. Which I thought CPS wanted to use for (all?) evaluations anyway.
} | ||
public partial interface IDirectoryCacheFactory | ||
{ | ||
Microsoft.Build.FileSystem.IDirectoryCache GetDirectoryCacheForProject(Microsoft.Build.Evaluation.Project project); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One problem is that we want to keep using the same disk cache for a single project evaluation, so the result can be consistent (if the project has two globs to match the same file, we should match both them together or none to keep it consistent.) So, maybe we should pass in the evaluationId here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consistency is controlled by the lifetime of the EvaluationContext instance. When you want msbuild evaluations to read a different state of the disk, you create a new EvaluationContext and start using that one. At least that's the assumption on which the evaluation context code is built on, there's no separation based on the evaluation ID. Also, the evaluation ID isn't propagated to all all the evaluation components, so it's hard for an arbitrary point of evaluation code to know what evaluation ID is using it.
@@ -1495,6 +1496,19 @@ public partial class ProxyTargets | |||
} | |||
namespace Microsoft.Build.FileSystem | |||
{ | |||
public delegate bool FindPredicate(ref System.ReadOnlySpan<char> fileName); | |||
public delegate TResult FindTransform<TResult>(ref System.ReadOnlySpan<char> fileName); | |||
public partial interface IDirectoryCache |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it be better to augment MSBuildFileSystemBase with these new methods instead of creating a new interface?. With a new interface, the APIs will get harded to use. Users will have to think whether to implement one or the other, or both, Maybe a new generic MSBuildFileSystemBase that extends MSBuildFileSystemBase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed an update where the interface is now provided via Project
and not EvaluationContext
so having a separate one should be easier to justify. I would argue that having multiple methods providing the same functionality in MSBuildFileSystemBase
would also be confusing for users because it wouldn't be clear which one is used when.
src/Build/Evaluation/Evaluator.cs
Outdated
|
||
// Create a FileMatcher for the given combination of EvaluationContext and the project being evaluated. | ||
IFileSystem fileSystem = _evaluationContext.GetFileSystemForProject(project); | ||
_fileMatcher = new FileMatcher(fileSystem, evaluationContext.FileEntryExpansionCache); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does the file matcher have to be sensitive to the Project VS having one for the entire evaluation context? The purpose of the evaluation context was to share common state between projects, not keep per-project state. The Project/Evaluator can keep per Project/evaluation state, maybe building it on top of common state from the evaluation context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that CPS maintains its cache per configured project so its content can be different depending on which project is being evaluated, even if they share one evaluation context. I'll let @lifengl / @arkalyanms confirm this requirement.
I have just pushed an updated based on yesterday's sync (too bad we did not record it) where I was asked to couple IDirectoryCacheFactory
with the Project
rather than with EvaluationContext
. I do agree that having the cache be part of EvaluationContext
would be cleaner but unfortunately it wouldn't work well for the intended use here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative is for CPS to maintain multiple evaluation contexts, one for each cache "universe". I'm assuming that they don't have a unique cache per project (what's the point of having a unique cache per project?), but instead a few larger buckets, maybe based on different FS states?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is primary not for that reason, but couple issues we discussed.
1, we want to use this to capture project dependencies, so when things are changed, we can trigger reevaluation. For example, we can capture the project evaluation tries to load Directory.Build.Props file in the folder, although it does not exist, so when the user adds the file, we can trigger a new evaluation. Without something connecting to a specific project evaluation, we cannot do that.
2, for each project, we want to use IDirectoryCache beyond one evaluation context, because ReevaluateIfNecessary() can be called in any random place in CPS extensions. If DirectoryCache need be passed in each of them (as evaluationContext), so if it is called outside of CPS, it may not pass in evaluationContext, and it will hit disk on those calls, then others using the cache state. This can lead into lots of problems because the cache might be slightly behind the real disk, and we can unpredictable behavior in the product scenarios.
However, i noticed a potential problem later in the FindPredicate contract. Because the file/folder patterns are not passed to the CPS side, but it always scan all files/folders under the folder, it might cause excessive invalidation of the evaluation result. I am not sure whether it would become a real problem or not, mostly depending how SDK uses those glob patterns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great point about the missing file/folder pattern. It's intentionally designed like this (i.e. that filtering is done MSBuild-side) so the logic stays in one place and we don't risk running into slightly different behavior depending on how it's implemented. Also, I don't believe that all patterns supported by MSBuild are handled by System.IO
or the underlying filesystem calls so there would still be cases where invalidation would not be precise.
Can you use the result of FindPredicate
to mark files/folders that are actually used by the project and should cause invalidation? In other words, items for which the predicate returns false would still be cached but if the file watcher fires for them the project will not be re-evaluated.
Back to draft as we're still discussing the design. This work is now targeting 17.1. |
179f0fa
to
e800fa8
Compare
We're investigating a code path where CPS is seeing MSBuild hit the disk despite |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with a few nits.
Fixes #6068
Context
We're looking to use the CPS FS cache for file/directory enumeration and file/directory existence checks done by MSBuild during evaluation. The currently exposed
MSBuildFileSystemBase
base class does not work well for this purpose:IEnumerable<string>
with full paths. Even if we currently end up constructing full paths internally, the interface should be future-proof. We would like to use something more efficient, preferably close to the modern .NET Core file enumeration API, which we're switching to in another related effort.MSBuildFileSystemBase
is applied across all evaluations of all projects sharing the sameEvaluationContext
. CPS wants to know which project is being evaluated and which evaluation (as in evaluation ID) is running. CPS also prefers to set the callbacks for the lifetime of theProject
object as opposed to having to pass them inEvaluationContext
for each evaluation. This is becauseProject
objects are exposed to other components and anyone can reevaluate them without going through CPS.Changes Made
EngineFileUtilities
static. This class was wrapping aFileMatcher
making the code less readable and slightly more complex. A static class is a better fit for the suffix "Utilitites" andFileMatcher
is explicitly passed to methods that need it.EvaluationContext
rather thanIFileSystem
. The former also hasFileMatcher
on it and it's more efficient to reuse it instead of creating a new one. It is generally more future-proof to be passing around a reference to "context" rather than to only a single part of it.FileMatcher.IsMatch
takeReadOnlySpan<char>
instead of string but left the string overload in the code for now. Now that we can useMicrosoft.IO.Redist
several callsites were converted to use the span - the common pattern is to extract the filename from full path and this can be a non-allocating operation on .NET Core and on Framework withMicrosoft.IO.Redist
.IDirectoryCache
with efficient file/directory existence and file/directory enumeration API. The interface is to be implemented by hosts that have their file system caches and want them to be used during evaluation instead of having MSBuild hit disk.IDirectoryCacheFactory
with a factory method to createIDirectoryCache
for an evaluation ID. The host can choose to return null to make MSBuild fall back to the existing behavior of hitting disk.IDirectoryCacheFactory
is passed as a field onProjectOptions
. If the host specifies this interface and also an evaluation context with aMSBuildFileSystemBase
-derived class, this new interface will be used for file/directory existence checks and file/directory enumeration while the rest of the operations (are there any such operations used during evaluation?) would go toMSBuildFileSystemBase
.Testing
Existing and new unit tests, experimental insertion, validation by the CPS team.
Notes