Dedupe ids when resolving batches of related items #9047

molomby · 2024-02-27T05:55:48Z

This reduces the amount of data sent to the DB in SQL queries and reduces the number of parameters needed.

I've seen Keystone generate queries over 300 KB in size, where 10,000 parameters are passed with identical values, all to return a single record with a dozen columns. After this change the same query would be well under 1 KB. There should also be some marginal reduction in DB CPU load – assuming that dealing with 1 parameter is easier than dealing with thousands, even if the latter is done efficiently.

The DB effectively dedupes ids on its side already (i.e. select * from "Things" where id in (1,1,1,1,1,1) will only return a single record) so this code should be functionally identical. Being a data loader batch function, this fetchRelatedItems() function returns an element for each id provided, regardless of duplicates in the input.
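As a rough illustration of the parameter reduction described above, here is a minimal sketch of deduping ids before they are turned into query parameters. The helper name `uniqueIds` and the sample data are hypothetical and are not taken from Keystone's source.

```typescript
// Hypothetical helper (not Keystone's actual code): collapse repeated ids
// before they become SQL parameters.
function uniqueIds<T extends string | number>(ids: readonly T[]): T[] {
  // A Set keeps one copy of each value, in first-seen order.
  return Array.from(new Set(ids));
}

// 10,000 requests that all point at the same related item.
const requested: number[] = Array.from({ length: 10000 }, () => 1);
const params = uniqueIds(requested);

console.log(requested.length); // 10000
console.log(params.length); // 1
```

With the deduped list, the `in (...)` clause carries one parameter instead of 10,000, while the set of rows the DB returns is unchanged.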

codesandbox-ci · 2024-02-27T05:59:03Z

This pull request is automatically built and testable in CodeSandbox.

To see build info of the built libraries, click here or the icon next to each commit SHA.

Latest deployment of this branch, based on commit 29b8b9c:

Sandbox	Source
@keystone-6/sandbox	Configuration

dcousens

LGTM, I might add some tests for things like this soon

dcousens · 2024-02-27T22:30:44Z

We were de-duping the return result by using new Map a few lines down, so I expect no change in output behaviour here

molomby · 2024-02-28T00:23:25Z

You're half right! This change won't alter the output of the function, but the map you're talking about is actually used to create duplicates in the return array when needed (as well as to ensure the correct order). This is a data loader batch function, so its contract insists that:

• The Array of values must be the same length as the Array of keys.
• Each index in the Array of values must correspond to the same index in the Array of keys.

I.e. if you ask for keys [1, 1, 1] you'll get back an array with three identical elements. The results returned from the DB, which are being added to the map, won't contain any duplicates regardless of this code change.
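The contract above can be sketched as follows. This is an illustrative stand-in, not the real fetchRelatedItems() implementation: `Item`, `table`, `findManyByIds`, `lastParamCount` and `batchLoad` are all hypothetical names, and returning `null` for a missing id is an assumption made for the example.

```typescript
type Item = { id: number; name: string };

// Hypothetical stand-in for the DB: returns at most one row per id, as
// `where id in (...)` would, and records how many parameters it received.
const table: Item[] = [
  { id: 1, name: "one" },
  { id: 2, name: "two" },
];
let lastParamCount = 0;

async function findManyByIds(ids: readonly number[]): Promise<Item[]> {
  lastParamCount = ids.length;
  return table.filter((row) => ids.indexOf(row.id) !== -1);
}

// Sketch of a batch function that honours the loader contract.
async function batchLoad(keys: readonly number[]): Promise<(Item | null)[]> {
  // Dedupe the keys in Node so the query only carries unique ids.
  const unique = Array.from(new Set(keys));
  const rows = await findManyByIds(unique);

  // Index the (already duplicate-free) rows by id.
  const byId = new Map<number, Item>();
  for (const row of rows) byId.set(row.id, row);

  // One value per input key, in input order; duplicates are re-created here.
  return keys.map((key) => byId.get(key) ?? null);
}

batchLoad([1, 1, 2, 1]).then((values) => {
  console.log(values.map((v) => (v ? v.name : null))); // [ 'one', 'one', 'two', 'one' ]
  console.log(lastParamCount); // 2
});
```

The map lookup in the final step is what restores both the length and the ordering of the input keys, which is why deduping the query parameters does not change what the function returns.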

The issue here is deduping the keys. Previously this was being done by the DB engine; this PR just moves that to Node.

When resolving batches of related items, only retrieve unique Ids

29b8b9c

dcousens approved these changes Feb 27, 2024


dcousens merged commit f071682 into main Feb 27, 2024
43 checks passed

dcousens deleted the molomby/remove-duplicate-ids-from-data-loader branch February 27, 2024 22:29

dcousens mentioned this pull request Mar 28, 2024

Version Packages #9071

Merged
