Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Collapse subject resource mappings before building sql query #1579

Merged
merged 9 commits into from
Dec 12, 2024

Conversation

elsand
Copy link
Member

@elsand elsand commented Dec 10, 2024

Description

This implements a few optimizations for search authorization, which particularly improves performance for users with access to a large number of parties.

  • SubjectsByParties gets collapsed and merged into ResourcesByParties. This lets us avoid doing subselects in the OR-clauses added to the search SQL
  • The entire SubjectResources-table is cached, which lets us do the above without an additional database hit.
  • When constructing the SQL, parties with identical resource lists are grouped, further collapsing multiple OR clauses into single ones with two ANY checks. This vastly reduces the number of clauses for large number of reportees, which in turn reduces the number of (identical) query parameters being sent over the network.
  • Minor improvements to data structures (mostly replacing List with HashSet)

Related Issue(s)

  • N/A

Verification

  • Your code builds clean without any errors or warnings
  • Manual testing done (required)
  • Relevant automated test added

Copy link
Contributor

coderabbitai bot commented Dec 10, 2024

📝 Walkthrough
📝 Walkthrough

Walkthrough

This pull request introduces significant changes to the authorization logic within the dialog port application. Key modifications include the restructuring of the PrefilterAuthorizedDialogs method in DbSetExtensions, which now groups resources by parties using a HashSetEqualityComparer. The DialogSearchAuthorizationResult class has been simplified by changing the type of ResourcesByParties and removing SubjectsByParties. Additionally, the AltinnAuthorizationClient class has been updated to enhance resource handling and streamline methods. Overall, these changes focus on improving the efficiency and clarity of resource authorization.

Changes

File Change Summary
src/Digdir.Domain.Dialogporten.Application/Common/Extensions/DbSetExtensions.cs - Updated PrefilterAuthorizedDialogs method to group resources by parties using HashSetEqualityComparer.
- Removed previous logic handling subjects by parties.
- Added HashSetEqualityComparer<T> class.
src/Digdir.Domain.Dialogporten.Application/Externals/AltinnAuthorization/DialogSearchAuthorizationResult.cs - Removed SubjectsByParties property.
- Changed ResourcesByParties from Dictionary<string, List<string>> to Dictionary<string, HashSet<string>>.
- Updated HasNoAuthorizations property accordingly.
src/Digdir.Domain.Dialogporten.Infrastructure/Altinn/Authorization/AltinnAuthorizationClient.cs - Added dependencies for _subjectResourcesCache and _dialogDbContext.
- Updated GetAuthorizedResourcesForSearch and HasListAuthorizationForDialog methods.
- Added GetAllSubjectResources method.
src/Digdir.Domain.Dialogporten.Infrastructure/Altinn/Authorization/LocalDevelopmentAltinnAuthorization.cs - Updated GetAuthorizedResourcesForSearch to use HashSet for allResources.
- Removed retrieval of allRoles, simplifying the logic.
src/Digdir.Domain.Dialogporten.Infrastructure/InfrastructureExtensions.cs - Added new cache configuration for SubjectResource with a duration of 20 minutes.
src/Digdir.Domain.Dialogporten.Infrastructure/Altinn/Authorization/AuthorizationHelper.cs - Introduced AuthorizationHelper class with CollapseSubjectResources method for processing authorization results.
tests/Digdir.Domain.Dialogporten.Application.Unit.Tests/DbSetExtensionsTests.cs - Added unit tests for DbSetExtensions, specifically for SQL generation logic in PrefilterAuthorizedDialogs.
tests/Digdir.Domain.Dialogporten.Infrastructure.Unit.Tests/AuthorizationHelperTests.cs - Introduced unit tests for AuthorizationHelper, validating the functionality of CollapseSubjectResources.

Possibly related PRs

Suggested reviewers

  • MagnusSandgren
  • oskogstad

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Experiment)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@elsand elsand marked this pull request as ready for review December 11, 2024 08:33
@elsand elsand requested a review from a team as a code owner December 11, 2024 08:33
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (1)
src/Digdir.Domain.Dialogporten.Infrastructure/Altinn/Authorization/AltinnAuthorizationClient.cs (1)

179-227: Optimize Loop Efficiency in CollapseSubjectResources

The CollapseSubjectResources method iterates over parties and authorized roles. Consider optimizing the loops to reduce computational complexity, especially if the number of parties or roles is large.

For example, you might parallelize the loop or optimize data structures for faster access if performance becomes an issue.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between f9c2c9d and 01b0436.

📒 Files selected for processing (5)
  • src/Digdir.Domain.Dialogporten.Application/Common/Extensions/DbSetExtensions.cs (1 hunks)
  • src/Digdir.Domain.Dialogporten.Application/Externals/AltinnAuthorization/DialogSearchAuthorizationResult.cs (1 hunks)
  • src/Digdir.Domain.Dialogporten.Infrastructure/Altinn/Authorization/AltinnAuthorizationClient.cs (6 hunks)
  • src/Digdir.Domain.Dialogporten.Infrastructure/Altinn/Authorization/LocalDevelopmentAltinnAuthorization.cs (1 hunks)
  • src/Digdir.Domain.Dialogporten.Infrastructure/InfrastructureExtensions.cs (2 hunks)
🔇 Additional comments (8)
src/Digdir.Domain.Dialogporten.Application/Common/Extensions/DbSetExtensions.cs (1)

23-30: Verify Correctness of Grouping Logic in PrefilterAuthorizedDialogs

The grouping operation in PrefilterAuthorizedDialogs relies on HashSetEqualityComparer<T>. Ensure that the equality comparer correctly compares hash sets to avoid incorrect grouping of parties and resources.

Please confirm that HashSetEqualityComparer<T> provides consistent and accurate comparisons for grouping, as any issues here could lead to improperly constructed SQL queries.

src/Digdir.Domain.Dialogporten.Infrastructure/Altinn/Authorization/AltinnAuthorizationClient.cs (1)

Line range hint 149-175: Ensure Correct Integration of CollapseSubjectResources Method

The method PerformDialogSearchAuthorization now calls CollapseSubjectResources. Verify that this integration correctly consolidates subject resources and that all authorized resources are accounted for.

Please ensure that the CollapseSubjectResources method accurately updates dialogSearchAuthorizationResult and that there are no regressions in authorization logic.

src/Digdir.Domain.Dialogporten.Application/Externals/AltinnAuthorization/DialogSearchAuthorizationResult.cs (2)

7-7: Update ResourcesByParties to Use HashSet<string>

Changing ResourcesByParties to use HashSet<string> ensures uniqueness of resources and improves lookup performance.


12-12: Verify Logic of HasNoAuthorizations Property

With the removal of SubjectsByParties, ensure that the HasNoAuthorizations property still correctly reflects authorization status based on ResourcesByParties and DialogIds.

Please confirm that this change does not introduce any unintended side effects in authorization checks.

src/Digdir.Domain.Dialogporten.Infrastructure/Altinn/Authorization/LocalDevelopmentAltinnAuthorization.cs (2)

43-43: Use of HashSet for Resource Collection

Converting allResources to a HashSet<string> ensures resource uniqueness and may improve performance when adding to ResourcesByParties.


47-47: Consistent Use of HashSet<string> in ResourcesByParties

The update to initialize ResourcesByParties with HashSet<string> aligns with changes elsewhere in the codebase and ensures consistent data types.

src/Digdir.Domain.Dialogporten.Infrastructure/InfrastructureExtensions.cs (2)

19-19: LGTM!

The new import is correctly placed and necessary for the SubjectResource cache configuration.


164-168: Verify cache configuration consistency and memory impact

Let's verify if similar cache configurations exist elsewhere in the codebase and assess the potential memory impact.

✅ Verification successful

Based on the search results, I can now make an informed assessment about the cache configuration:

  1. The SubjectResource entity is a relatively small entity with just basic fields (Id, Subject, Resource, UpdatedAt).
  2. The cache is used in AltinnAuthorizationClient to store the entire list of subject resources, which is queried frequently for authorization checks.
  3. The data is updated through a sync process that merges changes from the resource registry.
  4. The table has a unique index on (Resource, Subject) columns, suggesting a controlled growth pattern.

Let's generate the final response:

Cache duration of 20 minutes is appropriate for SubjectResource

The 20-minute cache duration is well-balanced considering:

  • The data is synced periodically from an external source (resource registry)
  • The entity structure is lightweight
  • The cache is used for frequent authorization checks
  • The table has controlled growth through unique constraints
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Search for similar cache configurations and estimate memory impact

# Check for other FusionCache configurations
echo "Searching for other cache configurations..."
rg -A 5 "ConfigureFusionCache.*SubjectResource" 

# Check the size of SubjectResource table/collection
echo "Checking SubjectResource usage..."
rg -A 5 "SubjectResource" --type cs

Length of output: 67226

@elsand
Copy link
Member Author

elsand commented Dec 11, 2024

e2e tests work as expected, but as this introduces quite a bit of logic it would be appropiate with some more unit/integration tests here. I'm looking into it.

Edit: refactored and added test for CollapseSubjectResources and DbSetExtensions

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (5)
tests/Digdir.Domain.Dialogporten.Application.Unit.Tests/DbSetExtensionsTests.cs (1)

68-71: Optimize Whitespace Removal in Test Helper Method

The RemoveWhitespace method could be optimized for better performance by using a StringBuilder instead of concatenating strings, which can be inefficient in loops.

Apply this diff to improve the method:

private static string RemoveWhitespace(string input)
{
-   return string.Concat(input.Where(c => !char.IsWhiteSpace(c)));
+   var sb = new StringBuilder(input.Length);
+   foreach (var c in input)
+   {
+       if (!char.IsWhiteSpace(c))
+       {
+           sb.Append(c);
+       }
+   }
+   return sb.ToString();
}
tests/Digdir.Domain.Dialogporten.Infrastructure.Unit.Tests/AuthorizationHelperTests.cs (4)

10-50: Consider enhancing test coverage and maintainability.

While the test data setup is comprehensive, consider the following improvements:

  1. Make the test name more specific to describe the exact scenario being tested (e.g., CollapseSubjectResources_WithMultipleRolesAndParties_ShouldCollapseToUniqueResources).
  2. Add more test methods to cover edge cases:
    • Empty resource lists
    • Null inputs
    • No matching resources
    • All resources filtered out by constraints
  3. Consider using test data builders or constants for better maintainability.

Example test data builder approach:

private static class TestData
{
    public static class Roles
    {
        public const string Role1 = "role1";
        public const string Role2 = "role2";
        public const string Role3 = "role3";
    }

    public static class Resources
    {
        public const string Resource1 = "resource1";
        public const string Resource2 = "resource2";
        // ...
    }

    public static AuthorizedPartiesResult CreateAuthorizedParties() =>
        new()
        {
            AuthorizedParties = new List<AuthorizedParty>
            {
                new() { Party = "party1", AuthorizedRoles = new[] { Roles.Role1, Roles.Role2 } },
                // ...
            }
        };
}

57-64: Consider adding performance validation.

Given that this PR aims to optimize search authorization performance, consider:

  1. Adding performance benchmarks or assertions
  2. Extracting the act part into a reusable method for multiple test cases

Example implementation:

private static async Task<TimeSpan> MeasureExecutionTime(Func<Task> action)
{
    var sw = System.Diagnostics.Stopwatch.StartNew();
    await action();
    sw.Stop();
    return sw.Elapsed;
}

[Fact]
public async Task CollapseSubjectResources_ShouldCompleteWithinTimeLimit()
{
    // Arrange
    var maxDuration = TimeSpan.FromMilliseconds(100);
    
    // Act
    var duration = await MeasureExecutionTime(() =>
        AuthorizationHelper.CollapseSubjectResources(
            // ... parameters ...
        ));
    
    // Assert
    Assert.True(duration <= maxDuration, 
        $"Execution took {duration.TotalMilliseconds}ms, " +
        $"which exceeds the limit of {maxDuration.TotalMilliseconds}ms");
}

65-77: Enhance assertion coverage and readability.

The current assertions could be improved by:

  1. Adding negative case validation (e.g., party3 should not be present)
  2. Using more fluent assertions for better readability
  3. Ensuring order-independent collection comparisons

Example improvements:

// Assert
var expected = new Dictionary<string, HashSet<string>>
{
    ["party1"] = new() { "resource1", "resource2", "resource4" },
    ["party2"] = new() { "resource2", "resource4" }
};

Assert.Equal(expected.Keys, dialogSearchAuthorizationResult.ResourcesByParties.Keys);
Assert.All(expected, kvp =>
{
    Assert.True(dialogSearchAuthorizationResult.ResourcesByParties.ContainsKey(kvp.Key));
    Assert.Equal(kvp.Value, dialogSearchAuthorizationResult.ResourcesByParties[kvp.Key]);
});

// Verify party3 is not present (negative case)
Assert.DoesNotContain("party3", dialogSearchAuthorizationResult.ResourcesByParties.Keys);

1-78: Add more test scenarios for comprehensive coverage.

Given the significant changes to authorization logic mentioned in the PR objectives, consider adding these test scenarios:

  1. Large-scale tests:

    • Test with a high number of parties (100+)
    • Test with many resources per party
    • Validate performance improvements
  2. Edge cases:

    • Empty or null inputs
    • Duplicate resources across roles
    • Invalid party/role combinations
  3. Integration scenarios:

    • End-to-end flow with actual database calls
    • Caching behavior
    • SQL query generation validation

Would you like assistance in generating these additional test scenarios?

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 01b0436 and d5e418c.

📒 Files selected for processing (5)
  • src/Digdir.Domain.Dialogporten.Application/Common/Extensions/DbSetExtensions.cs (2 hunks)
  • src/Digdir.Domain.Dialogporten.Infrastructure/Altinn/Authorization/AltinnAuthorizationClient.cs (6 hunks)
  • src/Digdir.Domain.Dialogporten.Infrastructure/Altinn/Authorization/AuthorizationHelper.cs (1 hunks)
  • tests/Digdir.Domain.Dialogporten.Application.Unit.Tests/DbSetExtensionsTests.cs (1 hunks)
  • tests/Digdir.Domain.Dialogporten.Infrastructure.Unit.Tests/AuthorizationHelperTests.cs (1 hunks)
🔇 Additional comments (5)
src/Digdir.Domain.Dialogporten.Infrastructure/Altinn/Authorization/AuthorizationHelper.cs (1)

23-24: ⚠️ Potential issue

Filter Subject Resources at the Database Level

The method getAllSubjectResources retrieves all SubjectResource entries and filters them in memory. This could lead to performance issues with large datasets.

Consider modifying getAllSubjectResources to accept parameters for filtering, so that the filters can be applied directly at the database level, retrieving only the necessary records.

Apply this diff to fix the issue:

public static async Task CollapseSubjectResources(
    DialogSearchAuthorizationResult dialogSearchAuthorizationResult,
    AuthorizedPartiesResult authorizedParties,
    List<string> constraintResources,
-   Func<CancellationToken, Task<List<SubjectResource>>> getAllSubjectResources,
+   Func<HashSet<string>, List<string>, CancellationToken, Task<List<SubjectResource>>> getFilteredSubjectResources,
    CancellationToken cancellationToken)
{
    var authorizedPartiesWithRoles = authorizedParties.AuthorizedParties
        .Where(p => p.AuthorizedRoles.Count != 0)
        .ToList();

    var uniqueSubjects = authorizedPartiesWithRoles
        .SelectMany(p => p.AuthorizedRoles)
        .ToHashSet();

-   var subjectResources = (await getAllSubjectResources(cancellationToken))
-       .Where(x => uniqueSubjects.Contains(x.Subject) && (constraintResources.Count == 0 || constraintResources.Contains(x.Resource)))
-       .ToList();
+   var subjectResources = await getFilteredSubjectResources(uniqueSubjects, constraintResources, cancellationToken);

    // ... rest of the code ...
}
src/Digdir.Domain.Dialogporten.Infrastructure/Altinn/Authorization/AltinnAuthorizationClient.cs (1)

184-187: ⚠️ Potential issue

Filter Subject Resources at the Database Level

The GetAllSubjectResources method retrieves all SubjectResource entries and filters them in memory within CollapseSubjectResources. This could lead to performance issues with large datasets.

Consider modifying GetAllSubjectResources to accept parameters for filtering, so that filters can be applied directly in the database query, retrieving only the necessary records.

Apply this diff to fix the issue:

- private async Task<List<SubjectResource>> GetAllSubjectResources(CancellationToken cancellationToken) =>
-     await _subjectResourcesCache.GetOrSetAsync(nameof(SubjectResource), async ct
-             => await _dialogDbContext.SubjectResources.ToListAsync(cancellationToken: ct),
-         token: cancellationToken);
+ private async Task<List<SubjectResource>> GetFilteredSubjectResources(
+     HashSet<string> subjects,
+     List<string> constraintResources,
+     CancellationToken cancellationToken) =>
+     await _subjectResourcesCache.GetOrSetAsync(nameof(SubjectResource), async ct
+         => await _dialogDbContext.SubjectResources
+             .Where(x => subjects.Contains(x.Subject) &&
+                 (constraintResources.Count == 0 || constraintResources.Contains(x.Resource)))
+             .ToListAsync(cancellationToken: ct),
+         token: cancellationToken);

And update the call to CollapseSubjectResources:

await AuthorizationHelper.CollapseSubjectResources(
    dialogSearchAuthorizationResult,
    authorizedParties,
    request.ConstraintServiceResources,
-   GetAllSubjectResources,
+   GetFilteredSubjectResources,
    cancellationToken);
src/Digdir.Domain.Dialogporten.Application/Common/Extensions/DbSetExtensions.cs (2)

30-40: ⚠️ Potential issue

Ensure Parameter Indices Match in SQL Construction

Using parameters.Count and parameters.Count + 1 within the query string before adding new parameters to parameters may lead to index mismatches if parameters is modified elsewhere.

Consider capturing the current index before appending to ensure consistency.

Apply this diff to fix the issue:

foreach (var (resources, parties) in groupedResult)
{
+   var index = parameters.Count;
    sb.AppendLine(CultureInfo.InvariantCulture, $"""
         OR (
-            "{nameof(DialogEntity.Party)}" = ANY(@p{parameters.Count}) 
-            AND "{nameof(DialogEntity.ServiceResource)}" = ANY(@p{parameters.Count + 1})
+            "{nameof(DialogEntity.Party)}" = ANY(@p{index}) 
+            AND "{nameof(DialogEntity.ServiceResource)}" = ANY(@p{index + 1})
         )
         """);
    parameters.Add(parties);
    parameters.Add(resources);
}

61-65: ⚠️ Potential issue

Ensure Consistent Hash Codes in HashSetEqualityComparer<T>

The GetHashCode method may produce inconsistent results due to the unordered nature of HashSet<T>. This inconsistency can lead to incorrect grouping behavior in dictionaries or hash-based collections.

To ensure consistent hash codes for equal sets, consider ordering the elements before computing the hash code.

Apply this diff to fix the issue:

public int GetHashCode(HashSet<T> obj)
{
    ArgumentNullException.ThrowIfNull(obj);
    unchecked
    {
-       return obj.Aggregate(0, (hash, item) => hash ^ (item?.GetHashCode() ?? 0));
+       int hash = 17;
+       foreach (var item in obj.OrderBy(e => e))
+       {
+           hash = hash * 31 + (item?.GetHashCode() ?? 0);
+       }
+       return hash;
    }
}
tests/Digdir.Domain.Dialogporten.Infrastructure.Unit.Tests/AuthorizationHelperTests.cs (1)

1-9: LGTM! Clean test structure and appropriate imports.

The test class follows good practices with proper namespace organization and relevant imports.

@elsand elsand merged commit b39c376 into main Dec 12, 2024
25 checks passed
@elsand elsand deleted the fix/high-reportee-count-performance-improvement branch December 12, 2024 14:43
arealmaas pushed a commit that referenced this pull request Dec 13, 2024
🤖 I have created a release *beep* *boop*
---


##
[1.41.3](v1.41.2...v1.41.3)
(2024-12-13)


### Bug Fixes

* **azure:** adjust SKU and storage for staging
([#1601](#1601))
([3fb9f95](3fb9f95))
* Collapse subject resource mappings before building sql query
([#1579](#1579))
([b39c376](b39c376))
* **webapi:** Explicit null on non-nullable lists no longer causes 500
INTERNAL SERVER ERROR
([#1602](#1602))
([2e8b3e6](2e8b3e6))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants