Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dedupe ids when resolving batches of related items #9047

Merged
merged 1 commit into from
Feb 27, 2024

Conversation

molomby
Copy link
Member

@molomby molomby commented Feb 27, 2024

This reduces the amount of data sent to the DB in SQL queries and reduces the number of parameters needed.

I've seen Keystone generate queries over over 300 KB in size, where 10,000 parameters are passed with identical values, all to return a single record with a dozen columns. After this change the same query would be well under 1 KB. There should also be some marginal reduction in DB CPU load – assuming that dealing with 1 parameter is easier that 1000's, even if done so efficiently.

The DB effectively dedupes ids on it's side already (ie. select * from "Things" where id in (1,1,1,1,1,1) will only return a single record) so this code should be functionally identical. Being a data loader batch function, this fetchRelatedItems() function returns an element for each id provided, regardless of duplicates in the input.

Copy link

This pull request is automatically built and testable in CodeSandbox.

To see build info of the built libraries, click here or the icon next to each commit SHA.

Latest deployment of this branch, based on commit 29b8b9c:

Sandbox Source
@keystone-6/sandbox Configuration

Copy link
Member

@dcousens dcousens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I might add some tests for things like this soon

@dcousens dcousens merged commit f071682 into main Feb 27, 2024
43 checks passed
@dcousens dcousens deleted the molomby/remove-duplicate-ids-from-data-loader branch February 27, 2024 22:29
@dcousens
Copy link
Member

We were de-duping the return result by using new Map a few lines down, so I expect no change in output behaviour here

@molomby
Copy link
Member Author

molomby commented Feb 28, 2024

Your half right! This change won't alter the output of the function but the map your talking about is actually used to create duplicates in the return array when needed (as well as ensure the correct order). This is a data loader batch function so it's contract insists that:

• The Array of values must be the same length as the Array of keys.
• Each index in the Array of values must correspond to the same index in the Array of keys.

Ie. if you ask for keys [1, 1, 1] you'll get back an array with three identical elements. The results returned from the DB, that are being added to the map, won't contain any duplicates regardless of this code change.

The issue here is deduping the keys. Previously this was being done but the DB engine, this PR just moves that to node.

@dcousens dcousens mentioned this pull request Mar 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants