Use the "skippableUpdate" optimization during bulk import #2285

lmsurpre · 2021-04-24T21:37:11Z

Is your feature request related to a problem? Please describe.
In #2263 we added support for a server-side update optimization.
This feature request is to start using that optimization when we process a bulk import.
My theory is that this will be much faster for "repeated import" scenarios.

Describe the solution you'd like
We should use the server optimization by default.
Optionally, we might want to introduce a config parameter to opt out of it.

Describe alternatives you've considered

Acceptance Criteria
1.
GIVEN an existing resource in the server
WHEN we get an import request that contains a copy of that same resource (same type, id, and contents)
THEN avoid performing the update when the contents match

Additional context

Signed-off-by: Paul Bastide <pbastide@us.ibm.com>

Use the "skippableUpdate" optimization during bulk import #2285

d0roppe · 2021-06-22T18:47:50Z

This does not seem to be quite right. I set the fhirServer/bulkdata/core/enableSkippableUpdates to false and it did not import the data, I set it to true and it imported the data.
With set to false:

[INFO    ] Operation Type: $import
[INFO    ] Resource Type         failures              success               processed             totalRead             totalSkip             totalValidation       totalWrite            fileSize              Resource Size         
[INFO    ] Patient               200                   0                     0                     2031                  0                     0                     412                   145284                726.0                 
[INFO    ]  ---- Total: 0 ImportRate: 0.00 ----

With set to true:

[INFO    ] Operation Type: $import
[INFO    ] Resource Type         failures              success               processed             totalRead             totalSkip             totalValidation       totalWrite            fileSize              Resource Size         
[INFO    ] Patient               0                     200                   200                   2279                  0                     0                     18218                 145284                726.0                 
[INFO    ]  ---- Total: 200 ImportRate: 8.19 ----

d0roppe · 2021-06-22T19:55:58Z

Also note I did a second import with the value set to True right after the first import with the value set to True, and this was the output:

[INFO    ] Operation Type: $import
[INFO    ] Resource Type         failures              success               processed             totalRead             totalSkip             totalValidation       totalWrite            fileSize              Resource Size         
[INFO    ] Patient               0                     200                   200                   310                   0                     0                     9071                  145284                726.0                 
[INFO    ]  ---- Total: 200 ImportRate: 17.12 ----

was wondering if the total skip should still be 0

prb112 · 2021-06-24T12:06:15Z

I had flipped the logic in a downstream, and fixed the logic in current azure provider

d0roppe · 2021-06-26T22:44:47Z

Moving this back to in progress, as importing a patient that was deleted was working earlier.

[WARNING ] Failed to import 'sniff-test--BDE-check-P100100' due to error: Resource 'Patient/sniff-test--BDE-check-P100100' is deleted.
[INFO    ] collected partition data: ImportCheckPointData [importPartitionWorkitem=input201/Patient_1.ndjson, numOfProcessedResources=200, matrixWorkItem=input201/Patient_1.ndjson, numOfImportedResources=0, numOfImportFailures=200, totalReadMilliSeconds=266, totalWriteMilliSeconds=489, totalValidationMilliSeconds=0, importFileSize=145284, currentBytes=0, inFlyRateBeginMilliSeconds=1624743725228, numOfToBeImported=20, numOfParseFailures=0, importPartitionResourceType=Patient, uniqueIDForImportOperationOutcomes=input201/Patient_1.ndjson_oo_success.ndjson, partNumForOperationOutcomes=0, uploadIdForOperationOutcomes=null, dataPacksForOperationOutcomes=null, uniqueIDForImportFailureOperationOutcomes=input201/Patient_1.ndjson_oo_errors.ndjson, partNumForFailureOperationOutcomes=0, uploadIdForFailureOperationOutcomes=null, dataPacksForFailureOperationOutcomes=null]
[INFO    ] collected partition data: ImportCheckPointData [importPartitionWorkitem=input201/Patient_1.ndjson, numOfProcessedResources=200, matrixWorkItem=input201/Patient_1.ndjson, numOfImportedResources=0, numOfImportFailures=200, totalReadMilliSeconds=266, totalWriteMilliSeconds=489, totalValidationMilliSeconds=0, importFileSize=145284, currentBytes=0, inFlyRateBeginMilliSeconds=1624743725228, numOfToBeImported=0, numOfParseFailures=0, importPartitionResourceType=Patient, uniqueIDForImportOperationOutcomes=input201/Patient_1.ndjson_oo_success.ndjson, partNumForOperationOutcomes=0, uploadIdForOperationOutcomes=null, dataPacksForOperationOutcomes=null, uniqueIDForImportFailureOperationOutcomes=input201/Patient_1.ndjson_oo_errors.ndjson, partNumForFailureOperationOutcomes=0, uploadIdForFailureOperationOutcomes=null, dataPacksForFailureOperationOutcomes=null]
[INFO    ] Operation Type: $import
[INFO    ] Resource Type         failures              success               processed             totalRead             totalSkip             totalValidation       totalWrite            fileSize              Resource Size         
[INFO    ] Patient               200                   0                     0                     266                   0                     0                     489                   145284                726.0                 
[INFO    ]  ---- Total: 0 ImportRate: 0.00 ----

prb112 · 2021-06-26T23:33:50Z

This is a conflict with the big merge. It was working as I merged it, and the behavior was changed.

d0roppe · 2021-07-07T17:34:44Z

Moving back to in progress due to issue with part of the import needing and update and other parts to be able to skip.

- Added an Integration Test for re-import which generated an NPE - Add test-import-skip.ndjson - Added an Update to include the fhirpersistenceevent object with the previous resource - Update copy-server-config.sh Signed-off-by: Paul Bastide <pbastide@us.ibm.com>

prb112 · 2021-07-08T12:51:36Z

PR #2585

d0roppe · 2021-07-08T17:34:05Z

Finished testing this issue, now works with import of deleted resource, and when importing several resoureces, and having some of them be changed, or deleted. it works fine. Closing issue

lmsurpre added the bulk-data label Apr 24, 2021

prb112 added a commit that referenced this issue May 20, 2021

Use the "skippableUpdate" optimization during bulk import #2285

0a4992c

Signed-off-by: Paul Bastide <pbastide@us.ibm.com>

prb112 added this to the Sprint 2021-07 milestone May 20, 2021

prb112 self-assigned this May 20, 2021

lmsurpre mentioned this issue Jun 2, 2021

Use the "skippableUpdate" optimization during bulk import #2285 #2398

Merged

prb112 added a commit that referenced this issue Jun 7, 2021

Merge pull request #2398 from IBM/issue-2285

acc1028

Use the "skippableUpdate" optimization during bulk import #2285

lmsurpre modified the milestones: Sprint 2021-07, Sprint 2021-08 Jun 7, 2021

prb112 modified the milestones: Sprint 2021-08, Sprint 2021-09 Jun 29, 2021

d0roppe closed this as completed Jul 8, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use the "skippableUpdate" optimization during bulk import #2285

Use the "skippableUpdate" optimization during bulk import #2285

lmsurpre commented Apr 24, 2021

d0roppe commented Jun 22, 2021

d0roppe commented Jun 22, 2021

prb112 commented Jun 24, 2021

d0roppe commented Jun 26, 2021

prb112 commented Jun 26, 2021

d0roppe commented Jul 7, 2021

prb112 commented Jul 8, 2021

d0roppe commented Jul 8, 2021

Use the "skippableUpdate" optimization during bulk import #2285

Use the "skippableUpdate" optimization during bulk import #2285

Comments

lmsurpre commented Apr 24, 2021

d0roppe commented Jun 22, 2021

d0roppe commented Jun 22, 2021

prb112 commented Jun 24, 2021

d0roppe commented Jun 26, 2021

prb112 commented Jun 26, 2021

d0roppe commented Jul 7, 2021

prb112 commented Jul 8, 2021

d0roppe commented Jul 8, 2021