Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds resourceUuid to the multi-column index in ReferenceIndexEntity #2745

Open
wants to merge 14 commits into
base: master
Choose a base branch
from

Conversation

LZRS
Copy link
Collaborator

@LZRS LZRS commented Nov 28, 2024

IMPORTANT: All PRs must be linked to an issue (except for extremely trivial and straightforward changes).

Fixes #[issue number]

Description
Adds resourceUuid to the multi-column index in ReferenceIndexEntity table to make it a covering index

Alternative(s) considered
Have you considered any alternatives? And if so, why have you chosen the approach in this PR?

Type
Choose one: (Bug fix | Feature | Documentation | Testing | Code health | Builds | Releases | Other)

Screenshots (if applicable)

Checklist

  • I have read and acknowledged the Code of conduct.
  • I have read the Contributing page.
  • I have signed the Google Individual CLA, or I am covered by my company's Corporate CLA.
  • I have discussed my proposed solution with code owners in the linked issue(s) and we have agreed upon the general approach.
  • I have run ./gradlew spotlessApply and ./gradlew spotlessCheck to check my code follows the style guide of this project.
  • I have run ./gradlew check and ./gradlew connectedCheck to test my changes locally.
  • I have built and run the demo app(s) to verify my change fixes the issue and/or does not break the demo app(s).

@LZRS LZRS force-pushed the update-reference-indexes branch from e464ce9 to bafa004 Compare November 29, 2024 11:11
LZRS added 2 commits December 5, 2024 13:58
The order of results in include/revInclude seems to be no longer predictable when searching using resourceUuid index
the resourceUuids are randomly generated, also saved in the db as blob
hence ordered by byte representation of the resourceUuid
@LZRS LZRS marked this pull request as ready for review December 5, 2024 14:14
@LZRS LZRS requested a review from a team as a code owner December 5, 2024 14:14
@LZRS LZRS requested a review from ktarasenko December 5, 2024 14:14
@LZRS LZRS requested a review from jingtang10 December 17, 2024 09:14
@LZRS
Copy link
Collaborator Author

LZRS commented Dec 18, 2024

Previously, the sql query

SELECT *
FROM (SELECT rie.index_name, rie.index_value, re.serializedResource
      FROM ResourceEntity re
               JOIN ReferenceIndexEntity rie
                    ON re.resourceUuid = rie.resourceUuid
      WHERE rie.resourceType = 'Encounter'
        AND rie.index_name = 'service-provider'
        AND rie.index_value IN ('Organization/2c29c69f-c2d1-463f-a4b2-d90a5c2fd05d')
        AND re.resourceType = 'Encounter')

generated the query plan

QUERY PLAN
|--SEARCH rie USING INDEX index_ReferenceIndexEntity_resourceType_index_name_index_value (resourceType=? AND index_name=? AND index_value=?)
`--SEARCH re USING INDEX index_ResourceEntity_resourceUuid (resourceUuid=?)

Testing with a database that has 166293 resources and 137517 encounters, the above query could take

Run Time: real 6.171 user 0.472104 sys 0.761056

returning 137517 rows

The changes in this PR, would generate query plan

QUERY PLAN
|--SEARCH rie USING COVERING INDEX index_ReferenceIndexEntity_resourceType_index_name_index_value_resourceUuid (resourceType=? AND index_name=? AND index_value=?)
`--SEARCH re USING INDEX index_ResourceEntity_resourceUuid (resourceUuid=?)

using a covering index

Testing with a database that has 166293 resources and 137517 encounters, the above query takes around

Run Time: real 5.623 user 0.455778 sys 0.804132

returning 137517 rows

@MJ1998
Copy link
Collaborator

MJ1998 commented Dec 20, 2024

I thought adding a non-filter key to the multi-column index would not improve the performance. This is a surprise for me.

I have more questions around why our query is like this

SELECT *
FROM (SELECT rie.index_name, rie.index_value, re.serializedResource
      FROM ResourceEntity re
               JOIN ReferenceIndexEntity rie
                    ON re.resourceUuid = rie.resourceUuid
      WHERE rie.resourceType = 'Encounter'
        AND rie.index_name = 'service-provider'
        AND rie.index_value IN ('Organization/2c29c69f-c2d1-463f-a4b2-d90a5c2fd05d')
        AND re.resourceType = 'Encounter')

Since our resourceUuid is unique across all ResourceEntity do we really need resourceType filter ?

@LZRS
Copy link
Collaborator Author

LZRS commented Dec 20, 2024

I thought adding a non-filter key to the multi-column index would not improve the performance. This is a surprise for me.

The non-filter column resourceUuid added to the multi-column index would make the index covering, and would return resouceUuid column directly from the index, and thus avoiding the step of going back to the actual rows to get the resourceUuid which slightly improved the performance

SELECT *
FROM (SELECT rie.index_name, rie.index_value, re.serializedResource
      FROM ResourceEntity re
               JOIN ReferenceIndexEntity rie
                    ON re.resourceUuid = rie.resourceUuid
      WHERE rie.resourceType = 'Encounter'
        AND rie.index_name = 'service-provider'
        AND rie.index_value IN ('Organization/2c29c69f-c2d1-463f-a4b2-d90a5c2fd05d')
        AND re.resourceType = 'Encounter')

Since our resourceUuid is unique across all ResourceEntity do we really need resourceType filter ?

Yeah, I agree. For this type of query, it makes sense to remove the re.resourceType filter

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants