Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use the "skippableUpdate" optimization during bulk import #2285

Closed
lmsurpre opened this issue Apr 24, 2021 · 8 comments
Closed

Use the "skippableUpdate" optimization during bulk import #2285

lmsurpre opened this issue Apr 24, 2021 · 8 comments
Assignees

Comments

@lmsurpre
Copy link
Member

Is your feature request related to a problem? Please describe.
In #2263 we added support for a server-side update optimization.
This feature request is to start using that optimization when we process a bulk import.
My theory is that this will be much faster for "repeated import" scenarios.

Describe the solution you'd like
We should use the server optimization by default.
Optionally, we might want to introduce a config parameter to opt out of it.

Describe alternatives you've considered

Acceptance Criteria
1.
GIVEN an existing resource in the server
WHEN we get an import request that contains a copy of that same resource (same type, id, and contents)
THEN avoid performing the update when the contents match

Additional context

prb112 added a commit that referenced this issue May 20, 2021
Signed-off-by: Paul Bastide <pbastide@us.ibm.com>
@prb112 prb112 added this to the Sprint 2021-07 milestone May 20, 2021
@prb112 prb112 self-assigned this May 20, 2021
prb112 added a commit that referenced this issue Jun 7, 2021
Use the "skippableUpdate" optimization during bulk import #2285
@d0roppe
Copy link
Collaborator

d0roppe commented Jun 22, 2021

This does not seem to be quite right. I set the fhirServer/bulkdata/core/enableSkippableUpdates to false and it did not import the data, I set it to true and it imported the data.
With set to false:

[INFO    ] Operation Type: $import
[INFO    ] Resource Type         failures              success               processed             totalRead             totalSkip             totalValidation       totalWrite            fileSize              Resource Size         
[INFO    ] Patient               200                   0                     0                     2031                  0                     0                     412                   145284                726.0                 
[INFO    ]  ---- Total: 0 ImportRate: 0.00 ----

With set to true:

[INFO    ] Operation Type: $import
[INFO    ] Resource Type         failures              success               processed             totalRead             totalSkip             totalValidation       totalWrite            fileSize              Resource Size         
[INFO    ] Patient               0                     200                   200                   2279                  0                     0                     18218                 145284                726.0                 
[INFO    ]  ---- Total: 200 ImportRate: 8.19 ----

@d0roppe
Copy link
Collaborator

d0roppe commented Jun 22, 2021

Also note I did a second import with the value set to True right after the first import with the value set to True, and this was the output:

[INFO    ] Operation Type: $import
[INFO    ] Resource Type         failures              success               processed             totalRead             totalSkip             totalValidation       totalWrite            fileSize              Resource Size         
[INFO    ] Patient               0                     200                   200                   310                   0                     0                     9071                  145284                726.0                 
[INFO    ]  ---- Total: 200 ImportRate: 17.12 ----

was wondering if the total skip should still be 0

@prb112
Copy link
Contributor

prb112 commented Jun 24, 2021

I had flipped the logic in a downstream, and fixed the logic in current azure provider

@d0roppe
Copy link
Collaborator

d0roppe commented Jun 26, 2021

Moving this back to in progress, as importing a patient that was deleted was working earlier.

[WARNING ] Failed to import 'sniff-test--BDE-check-P100100' due to error: Resource 'Patient/sniff-test--BDE-check-P100100' is deleted.
[INFO    ] collected partition data: ImportCheckPointData [importPartitionWorkitem=input201/Patient_1.ndjson, numOfProcessedResources=200, matrixWorkItem=input201/Patient_1.ndjson, numOfImportedResources=0, numOfImportFailures=200, totalReadMilliSeconds=266, totalWriteMilliSeconds=489, totalValidationMilliSeconds=0, importFileSize=145284, currentBytes=0, inFlyRateBeginMilliSeconds=1624743725228, numOfToBeImported=20, numOfParseFailures=0, importPartitionResourceType=Patient, uniqueIDForImportOperationOutcomes=input201/Patient_1.ndjson_oo_success.ndjson, partNumForOperationOutcomes=0, uploadIdForOperationOutcomes=null, dataPacksForOperationOutcomes=null, uniqueIDForImportFailureOperationOutcomes=input201/Patient_1.ndjson_oo_errors.ndjson, partNumForFailureOperationOutcomes=0, uploadIdForFailureOperationOutcomes=null, dataPacksForFailureOperationOutcomes=null]
[INFO    ] collected partition data: ImportCheckPointData [importPartitionWorkitem=input201/Patient_1.ndjson, numOfProcessedResources=200, matrixWorkItem=input201/Patient_1.ndjson, numOfImportedResources=0, numOfImportFailures=200, totalReadMilliSeconds=266, totalWriteMilliSeconds=489, totalValidationMilliSeconds=0, importFileSize=145284, currentBytes=0, inFlyRateBeginMilliSeconds=1624743725228, numOfToBeImported=0, numOfParseFailures=0, importPartitionResourceType=Patient, uniqueIDForImportOperationOutcomes=input201/Patient_1.ndjson_oo_success.ndjson, partNumForOperationOutcomes=0, uploadIdForOperationOutcomes=null, dataPacksForOperationOutcomes=null, uniqueIDForImportFailureOperationOutcomes=input201/Patient_1.ndjson_oo_errors.ndjson, partNumForFailureOperationOutcomes=0, uploadIdForFailureOperationOutcomes=null, dataPacksForFailureOperationOutcomes=null]
[INFO    ] Operation Type: $import
[INFO    ] Resource Type         failures              success               processed             totalRead             totalSkip             totalValidation       totalWrite            fileSize              Resource Size         
[INFO    ] Patient               200                   0                     0                     266                   0                     0                     489                   145284                726.0                 
[INFO    ]  ---- Total: 0 ImportRate: 0.00 ----

@prb112
Copy link
Contributor

prb112 commented Jun 26, 2021

This is a conflict with the big merge. It was working as I merged it, and the behavior was changed.

@prb112 prb112 modified the milestones: Sprint 2021-08, Sprint 2021-09 Jun 29, 2021
@d0roppe
Copy link
Collaborator

d0roppe commented Jul 7, 2021

Moving back to in progress due to issue with part of the import needing and update and other parts to be able to skip.

prb112 added a commit that referenced this issue Jul 8, 2021
- Added an Integration Test for re-import which generated an NPE
- Add test-import-skip.ndjson
- Added an Update to include the fhirpersistenceevent object with the
previous resource
- Update copy-server-config.sh

Signed-off-by: Paul Bastide <pbastide@us.ibm.com>
@prb112
Copy link
Contributor

prb112 commented Jul 8, 2021

PR #2585

@d0roppe
Copy link
Collaborator

d0roppe commented Jul 8, 2021

Finished testing this issue, now works with import of deleted resource, and when importing several resoureces, and having some of them be changed, or deleted. it works fine. Closing issue

@d0roppe d0roppe closed this as completed Jul 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants