Make exported resource IDs consistent #55

Closed
bashir2 opened this issue Nov 6, 2020 · 4 comments
Labels: bug (Something isn't working), P1:must (An issue that definitely needs to be implemented in the near future)

Comments

@bashir2
Collaborator

bashir2 commented Nov 6, 2020

I noticed that in the generated Parquet files, the resource IDs include the base URL of the original OpenMRS server, e.g.,
id = http://localhost:9021/openmrs/ws/fhir2/R3/Person/bee471c4-7e08-4a31-b9d8-a0c0bd2ab103
while in the resources exported to BigQuery they do not, e.g.,
id = bee471c4-7e08-4a31-b9d8-a0c0bd2ab103
This matters when we join resources on their cross-references. The reason for the latter is here, where we explicitly use getIdPart(), while in the former case the ID comes from IdType.getValue().

We need to make these consistent. This will probably be fixed once PR #30 is submitted but I am making a note here to make sure we check and then close this.
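To make the mismatch concrete, here is a minimal Python sketch (standard library only). `logical_id` is a hypothetical helper, not part of HAPI FHIR or Bunsen; it only approximates what `getIdPart()` would return for the simple ID shapes shown above.

```python
def logical_id(resource_id: str) -> str:
    """Return the logical ID portion of a FHIR resource ID or reference.

    Handles bare IDs, relative references ("Person/<id>"), absolute URLs,
    and versioned forms ("Person/<id>/_history/<version>").
    """
    parts = [p for p in resource_id.split("/") if p]
    if "_history" in parts:
        # Drop the version suffix so only "<...>/<id>" remains.
        parts = parts[: parts.index("_history")]
    return parts[-1] if parts else ""


# The two forms reported in this issue.
parquet_id = (
    "http://localhost:9021/openmrs/ws/fhir2/R3/Person/"
    "bee471c4-7e08-4a31-b9d8-a0c0bd2ab103"
)
bigquery_id = "bee471c4-7e08-4a31-b9d8-a0c0bd2ab103"

# A naive equality join between the two outputs would miss this row,
# while comparing the normalized IDs matches it.
naive_match = parquet_id == bigquery_id
normalized_match = logical_id(parquet_id) == logical_id(bigquery_id)
```

With the values above, the raw strings differ while the normalized IDs agree, which is why the two outputs cannot be joined on `id` as they are.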

@bashir2 bashir2 added the bug and P1:must labels Nov 6, 2020
@bashir2 bashir2 self-assigned this Nov 6, 2020
@bashir2
Collaborator Author

bashir2 commented Nov 6, 2020

Actually, on second thought, it seems that having the full URL in the id is wrong according to the FHIR documentation. @ibacher and/or @pmanko, can you please confirm this? In other words, the exported Parquet files are incorrect and we should only include getIdPart() as the id.

The problem is that the call that converts an IdType to a string is inside Bunsen here, which eventually ends up calling getValue(), so I am not sure if my understanding is correct.

@ibacher
Collaborator

ibacher commented Nov 6, 2020

@bashir2 Yeah. So objects being passed through different FHIR-enabled systems aren't expected to maintain the same id or at least that's not guaranteed. The "correct" way to handle this in FHIR would be to:

  1. Create the resource itself
  2. Create a Provenance resource that provides a link between that resource and where the resource was extracted from. It might look something like this:
{
    "resourceType": "Provenance",
    "id": "d3c0bf40-956e-4450-a956-3aba87369eb7",
    "target": [{ "reference": "Person/bee471c4-7e08-4a31-b9d8-a0c0bd2ab103" }],
    "recorded": "2020-11-06T15:08:24-04:00",
    "agent": [{ "who": { "display": "OpenMRS Analytics Engine" } }],
    "entity": [{
        "role": "source",
        "what": { "reference": "http://localhost:9021/openmrs/ws/fhir2/R3/Person/bee471c4-7e08-4a31-b9d8-a0c0bd2ab103" }
    }]
}
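As a rough illustration of step 2, the following Python sketch assembles a Provenance of the same shape from a source URL. `make_provenance` and its parameters are hypothetical names chosen for this example; it builds a plain dictionary and does not use any FHIR library.

```python
import json
import uuid
from datetime import datetime, timezone


def make_provenance(source_url, provenance_id=None, recorded=None):
    """Build a Provenance dict linking a local resource to its source URL.

    Assumes the source URL ends in "<ResourceType>/<id>".
    """
    resource_type, resource_id = source_url.rstrip("/").split("/")[-2:]
    return {
        "resourceType": "Provenance",
        # Random UUID unless the caller supplies an ID.
        "id": provenance_id or str(uuid.uuid4()),
        "target": [{"reference": f"{resource_type}/{resource_id}"}],
        # Current UTC time with an explicit offset unless supplied.
        "recorded": recorded
        or datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "agent": [{"who": {"display": "OpenMRS Analytics Engine"}}],
        "entity": [{"role": "source", "what": {"reference": source_url}}],
    }


provenance = make_provenance(
    "http://localhost:9021/openmrs/ws/fhir2/R3/Person/"
    "bee471c4-7e08-4a31-b9d8-a0c0bd2ab103"
)
provenance_json = json.dumps(provenance, indent=4)
```

The target keeps the relative `Person/<id>` reference while `entity.what` records the full source URL, mirroring the example above.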

@bashir2
Collaborator Author

bashir2 commented Nov 6, 2020

Thanks @ibacher for the Provenance note. Yes, we can use that, but I am afraid that in a data-warehouse scenario it will make things more complicated (e.g., matching the patient id in an Observation.subject needs an extra Provenance join).

When we set up the GCP FHIR store for our use case, we set enableUpdateCreate to avoid this problem. I feel that for the local data-warehouse/Parquet output use case, we should try to do something similar and preserve original IDs, both for the above reason and in general to make debugging/comparing with source data easier. WDYT?
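The extra join can be sketched in Python with in-memory dictionaries standing in for warehouse tables. All rows here are made up for illustration, including the reassigned ID `wh-1` and the helper `ref_id`; a real pipeline would do the same lookups as Parquet or BigQuery joins.

```python
PERSON_ID = "bee471c4-7e08-4a31-b9d8-a0c0bd2ab103"
observation = {"id": "obs-1", "subject": {"reference": f"Person/{PERSON_ID}"}}


def ref_id(reference: str) -> str:
    """Last path segment of a relative or absolute reference."""
    return reference.rstrip("/").split("/")[-1]


# Case 1: original IDs preserved, so the subject reference joins directly.
people_preserved = {PERSON_ID: {"id": PERSON_ID}}
direct = people_preserved[ref_id(observation["subject"]["reference"])]

# Case 2: the store assigned a new ID, so the lookup has to go through
# Provenance to map the source ID to the local one.
people_reassigned = {"wh-1": {"id": "wh-1"}}
provenances = [{
    "target": [{"reference": "Person/wh-1"}],
    "entity": [{
        "role": "source",
        "what": {"reference":
                 f"http://localhost:9021/openmrs/ws/fhir2/R3/Person/{PERSON_ID}"},
    }],
}]
source_to_local = {
    ref_id(p["entity"][0]["what"]["reference"]): ref_id(p["target"][0]["reference"])
    for p in provenances
}
via_provenance = people_reassigned[
    source_to_local[ref_id(observation["subject"]["reference"])]
]
```

Case 1 is a single lookup; case 2 needs the intermediate `source_to_local` mapping, which is the additional join that preserving IDs avoids.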

@ibacher
Collaborator

ibacher commented Nov 6, 2020

Oh yeah! I'm totally in favour of preserving ids for the analytics engine! It makes everything simpler and cleaner. The only case in which it might become necessary to do something like this is if we're combining data from multiple OpenMRS instances and need a way to segregate the data by source instance.
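For the multi-instance case, one possible approach (an assumption for illustration, not something decided in this thread) is to keep the preserved ID and add the source base URL as a second key component. `ResourceKey` and `make_key` are hypothetical names, and the second server URL below is invented.

```python
from typing import NamedTuple


class ResourceKey(NamedTuple):
    source: str       # base URL of the originating OpenMRS instance
    resource_id: str  # preserved logical ID


def make_key(full_url: str) -> ResourceKey:
    """Split "<base>/<ResourceType>/<id>" into a (source, id) key."""
    base, _resource_type, resource_id = full_url.rstrip("/").rsplit("/", 2)
    return ResourceKey(source=base, resource_id=resource_id)


key_a = make_key(
    "http://localhost:9021/openmrs/ws/fhir2/R3/Person/"
    "bee471c4-7e08-4a31-b9d8-a0c0bd2ab103"
)
# Hypothetical second instance that happens to hold the same logical ID.
key_b = make_key(
    "http://localhost:9022/openmrs/ws/fhir2/R3/Person/"
    "bee471c4-7e08-4a31-b9d8-a0c0bd2ab103"
)
```

The two keys share a `resource_id` but differ in `source`, so rows from different instances stay distinguishable while single-instance joins can still use the plain ID.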
