Make exported resource IDs consistent #55

bashir2 · 2020-11-06T19:00:51Z

I noticed that in the generated Parquet files, the resource IDs include the base URL of the original OpenMRS server, e.g.,
id = http://localhost:9021/openmrs/ws/fhir2/R3/Person/bee471c4-7e08-4a31-b9d8-a0c0bd2ab103
while in the resources exported to BigQuery they do not, e.g.,
id = bee471c4-7e08-4a31-b9d8-a0c0bd2ab103
This matters when we join resources on their cross references. The reason for the latter is here, where we explicitly use getIdPart(), while in the former case the ID comes from IdType.getValue().
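To make the mismatch concrete, here is a minimal Python sketch (not the pipeline's Java code). The helper `id_part` is hypothetical: it only approximates what getIdPart() returns by plain string splitting, and it assumes the ID is the last path segment once any `/_history/<version>` suffix is dropped.

```python
def id_part(resource_id: str) -> str:
    """Return the logical ID from a bare ID, a relative reference, or a full URL."""
    # Drop a version suffix such as ".../_history/3" if present.
    base = resource_id.split("/_history/")[0]
    # The logical ID is assumed to be the last path segment.
    return base.rstrip("/").rsplit("/", 1)[-1]


parquet_id = "http://localhost:9021/openmrs/ws/fhir2/R3/Person/bee471c4-7e08-4a31-b9d8-a0c0bd2ab103"
bigquery_id = "bee471c4-7e08-4a31-b9d8-a0c0bd2ab103"

# The raw values differ, so a join on them finds no match.
print(parquet_id == bigquery_id)
# After normalizing both sides to the ID part, they agree.
print(id_part(parquet_id) == id_part(bigquery_id))
```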

We need to make these consistent. This will probably be fixed once PR #30 is submitted but I am making a note here to make sure we check and then close this.

bashir2 · 2020-11-06T19:23:31Z

Actually, on second thought, it seems that having the full URL in the id is wrong according to the FHIR documentation. @ibacher and/or @pmanko, can you please confirm this? In other words, the exported Parquet files are incorrect and we should only include getIdPart() as the id.

The problem is that the call that converts an IdType to a string is inside Bunsen here, which eventually ends up calling getValue(), so I am not sure if my understanding is correct.

ibacher · 2020-11-06T20:12:07Z

@bashir2 Yeah. So objects being passed through different FHIR-enabled systems aren't expected to maintain the same id or at least that's not guaranteed. The "correct" way to handle this in FHIR would be to:

1. Create the resource itself.
2. Create a Provenance resource that provides a link between that resource and where the resource was extracted from. It might look something like this:

{
    "resourceType": "Provenance",
    "id": "d3c0bf40-956e-4450-a956-3aba87369eb7",
    "target": [{ "reference": "Person/bee471c4-7e08-4a31-b9d8-a0c0bd2ab103" }],
    "recorded": "2020-11-06T15:08:24-04:00",
    "agent": [{ "who": { "display": "OpenMRS Analytics Engine" } }],
    "entity": [{
        "role": "source",
        "what": { "reference": "http://localhost:9021/openmrs/ws/fhir2/R3/Person/bee471c4-7e08-4a31-b9d8-a0c0bd2ab103" }
    }]
}
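As a rough sketch of how such a Provenance could be generated per exported resource, the Python below builds the same shape as the example above. The function name `make_provenance` and its parameters are hypothetical, and the field names simply mirror the example rather than any specific FHIR version's schema.

```python
import uuid
from datetime import datetime, timezone


def make_provenance(resource_type: str, logical_id: str, source_base_url: str) -> dict:
    """Build a Provenance dict linking a stored resource to its source URL."""
    relative_ref = f"{resource_type}/{logical_id}"
    return {
        "resourceType": "Provenance",
        "id": str(uuid.uuid4()),
        # The resource as it exists in the target store.
        "target": [{"reference": relative_ref}],
        # Timestamp with an explicit UTC offset, e.g. 2020-11-06T19:08:24+00:00.
        "recorded": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "agent": [{"who": {"display": "OpenMRS Analytics Engine"}}],
        # The full URL of the resource on the source server.
        "entity": [{
            "role": "source",
            "what": {"reference": f"{source_base_url.rstrip('/')}/{relative_ref}"},
        }],
    }


provenance = make_provenance(
    "Person",
    "bee471c4-7e08-4a31-b9d8-a0c0bd2ab103",
    "http://localhost:9021/openmrs/ws/fhir2/R3",
)
print(provenance["entity"][0]["what"]["reference"])
```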

bashir2 · 2020-11-06T21:45:45Z

Thanks @ibacher for the Provenance note. Yes, we can use that, but I am afraid that in a data-warehouse scenario it will make things more complicated (e.g., matching the patient id in an Observation.subject needs an extra Provenance join).
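A rough illustration of that extra join, using made-up in-memory rows in place of the Parquet tables. All IDs and the `ref_id` helper are hypothetical; the point is only the additional lookup step when the store has reassigned IDs.

```python
def ref_id(reference: str) -> str:
    """Return the last path segment of a relative or absolute reference."""
    return reference.rstrip("/").rsplit("/", 1)[-1]


observation = {"id": "obs-1", "subject": {"reference": "Patient/orig-1"}}
subject_id = ref_id(observation["subject"]["reference"])

# Case 1: original IDs are preserved, so the subject resolves with one lookup.
patients_preserved = {"orig-1": {"id": "orig-1"}}
direct = patients_preserved[subject_id]

# Case 2: the store assigned new IDs, and a Provenance row maps new to source.
patients_reassigned = {"new-9": {"id": "new-9"}}
provenances = [{
    "target": [{"reference": "Patient/new-9"}],
    "entity": [{
        "role": "source",
        "what": {"reference": "http://localhost:9021/openmrs/ws/fhir2/R3/Patient/orig-1"},
    }],
}]
# Extra step: build a source-ID to new-ID map from Provenance before joining.
source_to_new = {
    ref_id(p["entity"][0]["what"]["reference"]): ref_id(p["target"][0]["reference"])
    for p in provenances
}
via_provenance = patients_reassigned[source_to_new[subject_id]]

print(direct["id"], via_provenance["id"])
```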

When we set up the GCP FHIR store for our use case, we set enableUpdateCreate to avoid this problem. I feel that for the local data-warehouse/Parquet output use case, we should try to do something similar and preserve original IDs, both for the above reason and in general to make debugging/comparing with source data easier. WDYT?

ibacher · 2020-11-06T23:51:24Z

Oh yeah! I'm totally in favour of preserving ids for the analytics engine! It makes everything simpler and cleaner. The only case in which it might become necessary to do something like this is if we're combining data from multiple OpenMRS instances and need a way to segregate the data by source instance.
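For the multi-instance case, one possible approach (an assumption for illustration, not something decided in this thread) is to keep the original ID and add the source instance as a second key column. The instance URLs and row layout below are made up.

```python
# Rows from two hypothetical OpenMRS instances that happen to share a logical ID.
rows = [
    {"source": "http://site-a.example/openmrs", "id": "bee471c4", "resourceType": "Person"},
    {"source": "http://site-b.example/openmrs", "id": "bee471c4", "resourceType": "Person"},
]

# Keying on the ID alone would silently overwrite one row with the other.
by_id = {row["id"]: row for row in rows}

# Keying on (source, id) keeps the rows apart while preserving the original ID.
by_source_and_id = {(row["source"], row["id"]): row for row in rows}

print(len(by_id), len(by_source_and_id))
```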

bashir2 added the bug and P1:must labels Nov 6, 2020

bashir2 self-assigned this Nov 6, 2020

bashir2 mentioned this issue Feb 8, 2021

Change Post To PUT When Uploading to a Generic Fhir Store #109

Merged

6 tasks

bashir2 added this to the AMPATH Deployment milestone Feb 10, 2021

bashir2 mentioned this issue Aug 13, 2021

Forked Bunsen and reorganized directories. #198

Merged

6 tasks

bashir2 mentioned this issue Oct 29, 2021

fixed ID conversion to keep IdPart only #229

Merged

6 tasks

bashir2 closed this as completed in 3db04b8 Nov 2, 2021

bashir2 mentioned this issue Jun 7, 2022

BigQuery Engine and Spark lib restructuring #269

Merged

1 task

bashir2 mentioned this issue Sep 30, 2022

Fixed the ID conversion issue for R4 #355

Merged

7 tasks

chandrashekar-s mentioned this issue Mar 27, 2024

Keep the resource IDs consistent when the objects are converted from HAPI->Avro->HAPI. #1003

Closed
