Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reflect current code state for $import in the doc #3894

Merged
merged 10 commits into from
Jun 4, 2024

Conversation

SergeyGaluzo
Copy link
Contributor

Several corrections/clarifications

@SergeyGaluzo SergeyGaluzo added the Azure Healthcare APIs Label denotes that the issue or PR is relevant to the FHIR service in the Azure Healthcare APIs label Jun 3, 2024
@SergeyGaluzo SergeyGaluzo requested a review from a team as a code owner June 3, 2024 19:26
@@ -3,7 +3,7 @@
There are two modes of $import supported today-

1. Initial mode is intended to load FHIR resources into an empty FHIR server. Initial mode only supports CREATE operations and, when enabled, blocks API writes to the FHIR server.
1. Incremental mode is optimized to load data into FHIR server periodically and doesn't block writes via API. It also allows you to load lastUpdated and versionId from resource Meta (if present in resource JSON).
1. Incremental mode is optimized to load data into FHIR server periodically and doesn't block writes via API. It also allows you to load lastUpdated, or both lastUpdated and versionId, from resource Meta (if present in resource JSON). There are no performance differences between incremental and initial modes.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Xoriant in their run identified performance difference. Lets double click offline

Copy link
Contributor Author

@SergeyGaluzo SergeyGaluzo Jun 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my tests on hospital data with ~1B resources performed on correctly scaled FHIR server and database I saw zero performance differences. This is as expected, because slight differences in compute on FHIR server related to processing modes do not change the database t-log bottleneck, and only this bottleneck identifies the performance of import.

* Conditional references in resources are not supported for initial mode import.
* If multiple resources share the same resource ID, then only one of those resources will be imported at random and an error will be logged corresponding to the remaining resources sharing the ID.
* In initial mode, resources with conditional references will be logged as errors. In incremental mode, resources with conditional references will be loaded but references will not be resolved.
* If multiple resources share the same resource ID, in initial mode, only one of those resources will be imported at random and an error will be logged corresponding to the remaining resources sharing the ID. In incremental mode, if lastUpdated is not provided, behavior is the same as in the initial. If lastUpdated is provided, then resource versions are deduplicated by lastUpdated, and only duplicates are reported as errors, the rest is loaded.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If lastUpdated is provided, then resource versions are deduplicated by lastUpdated, and only duplicates are reported as errors, the rest is loaded. --- is this true with allowed version flag ? In case there are multiple resources with same version id and different lastUpdated, then error will be reported?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. This is "universally" true irrespective of allow negative versions flag.
  2. If there are multiple records for the same resource with different lastUpdated and the same version, then version conflict will be reported.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For statement "In incremental mode, if lastUpdated is not provided, behavior is the same as in the initial." - Initial mode import creates resources and not updates. So if lastUpdated is not provided but incremental version ids are provided, then the resource is updated.

Should we update the statement?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"If there are multiple records for the same resource with different lastUpdated and the same version, then version conflict will be reported." - Version conflict will be reported on the randomly selected resource ?

Copy link
Contributor Author

@SergeyGaluzo SergeyGaluzo Jun 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which record is rejected depends on order of processing across batches. If records are processed in different batches, there is no way to predict which record is loaded and which is rejected. If import is very small, and only single batch is created, then record with max version will be loaded.

Copy link
Contributor Author

@SergeyGaluzo SergeyGaluzo Jun 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For statement "In incremental mode, if lastUpdated is not provided, behavior is the same as in the initial." - Initial mode import creates resources and not updates. So if lastUpdated is not provided but incremental version ids are provided, then the resource is updated.

Should we update the statement?

Input versions are ignored if lastUpdated is not provided. This has always been incremental mode behavior. Statement in question refers only to deduplication logic, which deals with case of more than one record per resource. This statement is not targeting main difference between incremental update versus initial insert. I think current statement is correct.
I added some clarifications, please look.

@SergeyGaluzo SergeyGaluzo changed the title Reflect current code state Reflect current code state for $import in the doc Jun 4, 2024
@SergeyGaluzo SergeyGaluzo merged commit c6917f9 into main Jun 4, 2024
43 of 44 checks passed
@SergeyGaluzo SergeyGaluzo deleted the users/sergal/importdocagain branch June 4, 2024 18:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Azure Healthcare APIs Label denotes that the issue or PR is relevant to the FHIR service in the Azure Healthcare APIs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants