Cache container elements in ODataUriResolver model elements cache #2623

Merged
merged 6 commits into from
Feb 28, 2023

Conversation

Contributor

@habbes habbes commented Feb 22, 2023

Issues

This pull request fixes #2534 .

Description

This is a follow-up to PR #2610. It adds container elements (operation imports and navigation sources) to the case-insensitive cache used during case-insensitive URI resolution.

Checklist (Uncheck if it is not completed)

  • Test cases added
  • Build and test with one-click build and test script passed

Additional work necessary

If documentation update is needed, please add "Docs Needed" label to the issue and provide details about the required document change in the issue.

xuzhg previously approved these changes Feb 22, 2023

/// <summary>
/// Builds a case-insensitive cache of schema elements from
/// the specified <paramref name="model"/>.
/// </summary>
/// <param name="model">The model whose schema elements to cache. This model should be immutable. See <see cref="ExtensionMethods.MarkAsImmutable(IEdmModel)"/>.</param>
- public NormalizedSchemaElementsCache(IEdmModel model)
+ public NormalizedModelElementsCache(IEdmModel model)
Member

Why can't we do the cache in the 'EdmModel'?

Contributor Author

What do you mean by cache in the EdmModel? Does the EdmModel have a cache?

Also, in this case we're dealing with the IEdmModel interface, so we can't guarantee whether it's an EdmModel, a CsdlSemanticsModel or some other implementation.

I've checked other usages in the codebase and in WebAPI, and any time some metadata is added to the model, it's added as an "annotation". In this case, the cache will also be an annotation that's bound to the model. It's not a static cache.

Have I answered your question?

/// <returns>A list of matching navigation sources, or null if no navigation source matches the name.</returns>
public List<IEdmNavigationSource> FindNavigationSources(string name)
{
if (navigationSourcesCache.TryGetValue(name, out List<IEdmNavigationSource> results))
Member

Is the name the simple identifier of the navigation source, not the qualified name?

How about searching for a navigation source in a referenced model, with the same name but a different namespace?

Member

Or is it only searching the top-level model's entity container?

Contributor Author

Yes, it's the simple identifier of the navigation source. And yes, it's only searching the top-level model entity container.

The reason I implemented it this way is because I wanted to retain the same behaviour as the existing implementation which performs the search as follows:

IEdmEntityContainer container = model.EntityContainer;
if (container == null)
{
    return null;
}

var result = container.Elements.OfType<IEdmNavigationSource>()
    .Where(source => string.Equals(identifier, source.Name, StringComparison.OrdinalIgnoreCase));

see: https://github.com/OData/odata.net/blob/master/src/Microsoft.OData.Core/UriParser/Resolver/ODataUriResolver.cs#L94-L102

As you can see the existing implementation only searches the top-level model's container and it only compares the source.Name, not the fully qualified name (including the container name).

If I change the behaviour of the cache to include referenced models or fully qualified identifier, then there would be an inconsistency between the cache and non-cache code paths.

On the other hand, if the expected behaviour is that the URI resolver should search referenced models and/or fully qualified name, then the current implementation is a bug. And I think that should be a separate discussion since that change of behaviour could be observable to the customer.

Contributor Author

The doc comment of the ResolveOperationImports describes the identifier parameter as follows:

The qualified name of the operation import which may or may not include the container name.

However, the existing implementation does not consider the qualified name in its search:

IEdmEntityContainer container = model.EntityContainer;
if (container == null)
{
    return Enumerable.Empty<IEdmOperationImport>();
}

return container.Elements.OfType<IEdmOperationImport>()
    .Where(source => string.Equals(identifier, source.Name, StringComparison.OrdinalIgnoreCase));

Contributor

If we are specifying the navigation source identifier and not searching in referenced models, is there a scenario where what gets returned is a list and not just a single navigation source?

Contributor

For example, can we have several navigation sources with the same identifier in the same model?

Contributor Author

Since we're searching a single container, I don't think that's possible. Maybe if we were searching in multiple containers that could be possible, but that would arguably be a new feature, extending the existing behaviour, which is beyond the scope of this PR.

Contributor Author

However, because we are dealing with a case-insensitive scenario, we can have multiple navigation sources in the same container whose names differ only by case. That's why we return a list.

Contributor

Can we have navigation sources in the same container whose names differ only by case? Or can we access a navigation source using different cases?

Contributor Author

The EDM model is case-sensitive by default, so it is possible to have navigation sources which differ only by case, because they are different names as far as the IEdmModel is concerned. If the IEdmModel were case-insensitive, then this cache would not be necessary.

Since this cache is case-insensitive, we might have multiple matching entries for the same key. That's why we store the values in lists.
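To make the point above concrete, here is a minimal sketch, not the PR's actual code. It assumes a dictionary keyed with StringComparer.OrdinalIgnoreCase whose values are lists, so names that differ only by case share a bucket. The NameCache class and the string items are hypothetical stand-ins for the real cache and for IEdmNavigationSource.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the cache discussed in this thread.
public sealed class NameCache<T>
{
    // Case-insensitive keys: "People" and "peoPle" map to the same bucket.
    private readonly Dictionary<string, List<T>> buckets =
        new Dictionary<string, List<T>>(StringComparer.OrdinalIgnoreCase);

    public void Add(string name, T item)
    {
        if (!buckets.TryGetValue(name, out List<T> bucket))
        {
            bucket = new List<T>();
            buckets[name] = bucket;
        }

        bucket.Add(item);
    }

    // Returns null when nothing matches, as the find methods in this thread do.
    public List<T> Find(string name)
    {
        return buckets.TryGetValue(name, out List<T> bucket) ? bucket : null;
    }
}

public static class NameCacheDemo
{
    public static void Main()
    {
        var cache = new NameCache<string>();
        cache.Add("People", "entity set People");
        cache.Add("peoPle", "singleton peoPle");
        cache.Add("Persons", "entity set Persons");

        Console.WriteLine(cache.Find("people").Count);   // 2
        Console.WriteLine(cache.Find("PERSONS").Count);  // 1
        Console.WriteLine(cache.Find("Orders") == null); // True
    }
}
```

The lookup cost no longer depends on the number of elements in the container, which is the motivation for replacing the linear OfType/Where scan on immutable models.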


private void PopulateContainerElements(IEdmModel model)
{
if (model.EntityContainer is null)
Contributor

Can the model be null here?

Contributor Author

It's the responsibility of the caller to ensure that the model is not null here. Since this is an internal class, I placed a Debug.Assert(model != null) in the constructor.


return null;
}

private void PopulateSchemaElements(IEdmModel model)
{
foreach (IEdmSchemaElement element in model.SchemaElements)
Contributor

Can the model be null here, or is the check done by the caller?

Contributor Author

That is the responsibility of the caller. In this class I only placed a Debug.Assert(model != null) in the constructor.

That said, I checked the public methods in the public ODataUriResolver class and they're not performing any null-checks on the args.

Contributor Author

I have added null argument checks to the public methods of ODataUriResolver.


if (cachedResults.Count > 1)
{
throw new ODataException(Strings.UriParserMetadata_MultipleMatchingNavigationSourcesFound(identifier));
Contributor

@ElizabethOkerio ElizabethOkerio Feb 28, 2023

Why throw an exception if cachedResults contains more than one item, when you said this is a possibility? What does this mean?

Contributor Author

This cache-based code path replicates the existing behaviour of the ResolveNavigationSource method, which is to throw an exception if multiple matches are found.

Notice that this method returns a single IEdmNavigationSource, not a collection. If there are multiple matches, then it means the URI resolver doesn't know which one to use and throws an exception.
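The rule described above can be sketched as follows. This is an illustration under stated assumptions rather than the resolver's real code: ResolveSingle is a hypothetical helper, and InvalidOperationException stands in for the ODataException that the actual code path throws.

```csharp
using System;
using System.Collections.Generic;

public static class SingleMatchSketch
{
    // Hypothetical helper: returns the only match, null when there is no match,
    // and throws when the case-insensitive lookup is ambiguous.
    public static T ResolveSingle<T>(IList<T> matches, string identifier) where T : class
    {
        if (matches == null || matches.Count == 0)
        {
            return null;
        }

        if (matches.Count > 1)
        {
            // Stand-in for ODataException in the real code.
            throw new InvalidOperationException(
                $"More than one navigation source matches '{identifier}'.");
        }

        return matches[0];
    }

    public static void Main()
    {
        var one = new List<string> { "People" };
        var two = new List<string> { "People", "peoPle" };

        Console.WriteLine(ResolveSingle(one, "people")); // People

        try
        {
            ResolveSingle(two, "people");
        }
        catch (InvalidOperationException ex)
        {
            Console.WriteLine(ex.Message);
        }
    }
}
```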


if (cachedResults != null)
{
return cachedResults;
Contributor

Why don't you do the checks for multiple operations here like you're doing with navigation sources?

Contributor Author

Because this cache-based code path aims to replicate the existing behaviour of the ResolveOperationImports method, which is to return all the matches that were found.

Unlike the ResolveNavigationSource method, this method returns a collection. Keep in mind that the IEdmModel can have multiple operation overloads with the same name but different parameters. So this is handled differently from navigation sources.

Contributor

How do you check the params? When two operations with the same name but different params are returned, how do you know which is which?

Contributor Author

I don't know exactly. That is outside the scope of the cache, and possibly resolved elsewhere. The cache is only concerned with replicating the existing behaviour in a more efficient manner.

Here's a snippet of the existing implementation (this code path will still be used when the model is not immutable):

return container.Elements.OfType<IEdmOperationImport>()
                .Where(source => string.Equals(identifier, source.Name, StringComparison.OrdinalIgnoreCase));

It would be a bug if the cache returned something different from the non-cached version.

@@ -545,11 +579,11 @@ internal static ODataUriResolver GetUriResolver(IServiceProvider container)
}
}

- private static NormalizedSchemaElementsCache GetNormalizedSchemaElementsCache(IEdmModel model)
+ private static NormalizedModelElementsCache GetNormalizedModelElementsCache(IEdmModel model)
{
Debug.Assert(model != null);
Contributor

@ElizabethOkerio ElizabethOkerio Feb 28, 2023

Why are you doing these checks here if you already did them in the constructor?

Contributor Author

I could remove it. I think someone else suggested having it here just to be sure. But I don't think it makes a major difference to have it here or not.

Contributor Author

I removed it. Added argument null checks to public methods in this class instead.


var matches = cache.FindNavigationSources(name);

Assert.Equal(2, matches.Count);
Contributor

Shouldn't this be 1 match? The entity set for person is Persons and not People, but you are using People to find the navigation sources. Or is there no need to have Persons in the test?

Contributor Author

It's two matches because the test is searching for "People" (and also "people"), and there are two navigation sources that match these keywords: the "People" entity set and the "peoPle" singleton.

The reason for having "Persons" in the test is to verify that navigation sources that do not match the key are not included in the result.

var container = model.AddEntityContainer("NS", "Container");
var entitySet1 = container.AddEntitySet("People", person);
var entitySet2 = container.AddSingleton("peoPle", person);
container.AddEntitySet("Persons", person);
Contributor

This line, I don't think it is necessary here.

Contributor Author

@habbes habbes Feb 28, 2023

I added this line to ensure that the method returns only the items that match the keyword.

To explain the rationale, let's say I had not added "Persons" to the container and only had "People" and "peoPle". Now let's say that in the implementation of cache.FindNavigationSources I had a bug that caused every navigation source in the container to be returned, as opposed to only returning those that match. Such a bug would not be caught by this test: the test would pass because everything in the container happens to match the keyword. So, to be able to catch such bugs, I added something that does not match to the container to verify that such items are not included in the result.
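A small sketch of that rationale, using hypothetical Find and FindBuggy helpers over plain strings instead of the real cache: without a non-matching entry, a lookup that ignores its key passes the same count assertion as a correct one.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DecoyTestSketch
{
    // Correct lookup: case-insensitive match on the name.
    private static List<string> Find(IEnumerable<string> names, string key) =>
        names.Where(n => string.Equals(n, key, StringComparison.OrdinalIgnoreCase)).ToList();

    // Buggy lookup: ignores the key and returns every element.
    private static List<string> FindBuggy(IEnumerable<string> names, string key) =>
        names.ToList();

    public static void Main()
    {
        var withoutDecoy = new[] { "People", "peoPle" };
        var withDecoy = new[] { "People", "peoPle", "Persons" };

        // Without a decoy, the buggy lookup still satisfies "count == 2".
        Console.WriteLine(FindBuggy(withoutDecoy, "people").Count == 2); // True

        // With the decoy, only the correct lookup satisfies it.
        Console.WriteLine(FindBuggy(withDecoy, "people").Count == 2);    // False
        Console.WriteLine(Find(withDecoy, "people").Count == 2);         // True
    }
}
```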

@pull-request-quantifier-deprecated

This PR has 264 quantified lines of changes. In general, a change size of up to 200 lines is ideal for the best PR experience!


Quantification details

Label      : Large
Size       : +235 -29
Percentile : 66.4%

Total files changed: 4

Change summary by file extension:
.cs : +235 -29

Change counts above are quantified counts, based on the PullRequestQuantifier customizations.


NormalizedModelElementsCache cache = GetNormalizedModelElementsCache(model);
IList<IEdmNavigationSource> cachedResults = cache.FindNavigationSources(identifier);

if (cachedResults != null)
Contributor

This is the reason why it's preferable to return an empty collection rather than null from a function that has a collection return type.

Contributor Author

@habbes habbes Feb 28, 2023

While I agree that it's generally preferable, returning an empty collection from the cache would introduce more complexity to the code than this single null check, which is not exposed to the user. Most of the null checks against the result occur in code paths where we already have an if statement checking whether the collection count is 0, so they didn't result in much uglier code, in my opinion.

To return empty collections efficiently, I would have to create a static empty list for each element type that the cache supports:

static readonly List<IEdmSchemaType> emptySchemaTypesList = new List<IEdmSchemaType>();
static readonly List<IEdmOperation> emptyOperationsList = new List<IEdmOperation>();
static readonly List<IEdmTerm> emptyTermsList = new List<IEdmTerm>();
static readonly List<IEdmNavigationSource> emptyNavigationSourcesList = new List<IEdmNavigationSource>();
static readonly List<IEdmOperationImport> emptyOperationImportsList = new List<IEdmOperationImport>();

// then in the find methods:
if (navigationSources.TryGetValue(name, out List<IEdmNavigationSource> results))
{
    return results;
}

return emptyNavigationSourcesList;

The runtime overhead is not that bad since these collections are allocated only once, but it didn't feel worthwhile since the cache is an internal implementation detail used in a very limited scope.

That said, I don't hold this opinion strongly. If you still believe that I should return an empty collection despite the explanation above, I can go ahead and make the change.
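For comparison, here is one possible shape of the empty-collection alternative, offered as a sketch only. It assumes the find methods could return IReadOnlyList<T> instead of List<T>, which is not what this PR does; in that case Array.Empty<T>() provides a shared empty instance per element type without declaring a static field for each one. The ReadOnlyNameCache class is hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical variant of the cache that never returns null.
public sealed class ReadOnlyNameCache<T>
{
    private readonly Dictionary<string, List<T>> buckets =
        new Dictionary<string, List<T>>(StringComparer.OrdinalIgnoreCase);

    public void Add(string name, T item)
    {
        if (!buckets.TryGetValue(name, out List<T> bucket))
        {
            bucket = new List<T>();
            buckets[name] = bucket;
        }

        bucket.Add(item);
    }

    public IReadOnlyList<T> Find(string name)
    {
        if (buckets.TryGetValue(name, out List<T> bucket))
        {
            return bucket;
        }

        // Array.Empty<T>() returns a cached, zero-length array for each T.
        return Array.Empty<T>();
    }
}

public static class ReadOnlyNameCacheDemo
{
    public static void Main()
    {
        var cache = new ReadOnlyNameCache<string>();
        cache.Add("People", "entity set People");

        Console.WriteLine(cache.Find("PEOPLE").Count); // 1
        Console.WriteLine(cache.Find("Orders").Count); // 0
    }
}
```

Callers could then branch on Count alone. The trade-off is a change to the return type of the find methods, so whether it is worthwhile for an internal cache remains the judgment call described above.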

Contributor

@KenitoInc KenitoInc left a comment

LGTM

Contributor

@gathogojr gathogojr left a comment

:shipit:

@habbes habbes merged commit e00449e into OData:master Feb 28, 2023
@habbes habbes deleted the perf/cache-container-elements branch February 28, 2023 18:05
Successfully merging this pull request may close these issues.

Microsoft.OData.UriParser.ODataUriResolver!FindSchemaElements running hot on CPU
5 participants