refactor: use deterministic provider id #81
- Upstream OHDSI example can have non-deterministic cases: use `patient_id` to make it reproducible
- Order by state > city > zip > id (I think this is the most logical form)
```diff
@@ -1,5 +1,5 @@
 SELECT
-row_number() OVER (ORDER BY (SELECT null)) AS provider_id
+row_number() OVER (ORDER BY provider_state, provider_city, provider_zip, provider_id) AS provider_id
```
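A minimal sketch (not part of this PR) of why the new `ORDER BY` makes the numbering reproducible. `ORDER BY (SELECT null)` leaves row order up to the engine, so the same input can be numbered differently between runs; naming every sort key and ending with the unique `provider_id` pins it down. This assumes Python's built-in `sqlite3` module as a stand-in for the ETL's real database (with an SQLite build of 3.25+ for window functions). The table layout, sample rows, and the `number_providers` helper are hypothetical; only the column names come from the diff. Unlike the diff, the sketch aliases the result as `new_id` so the source `provider_id` stays visible.

```python
import random
import sqlite3

# Hypothetical sample rows using the column names from the diff:
# (provider_id, provider_state, provider_city, provider_zip).
# The short UUID-like ids are made up for illustration.
PROVIDERS = [
    ("c3f1", "MA", "Boston", "02118"),
    ("0a9e", "MA", "Boston", "02118"),
    ("7b22", "IL", "Springfield", "62701"),
    ("e410", "MA", "Springfield", "01103"),
]

# Every sort key is named, and provider_id (unique in the source table)
# breaks any ties left by state/city/zip, so the numbering is a pure
# function of the data rather than of physical row order.
DETERMINISTIC_SQL = """
SELECT
    row_number() OVER (
        ORDER BY provider_state, provider_city, provider_zip, provider_id
    ) AS new_id,
    provider_id
FROM providers
ORDER BY new_id
"""


def number_providers(rows):
    """Load rows into an in-memory SQLite table; return (new_id, provider_id) pairs."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE providers (provider_id TEXT PRIMARY KEY, "
            "provider_state TEXT, provider_city TEXT, provider_zip TEXT)"
        )
        con.executemany("INSERT INTO providers VALUES (?, ?, ?, ?)", rows)
        return con.execute(DETERMINISTIC_SQL).fetchall()
    finally:
        con.close()


# Insert the same rows in several different orders; the numbering must not change.
baseline = number_providers(PROVIDERS)
for seed in range(5):
    shuffled = PROVIDERS[:]
    random.Random(seed).shuffle(shuffled)
    assert number_providers(shuffled) == baseline

print(baseline)  # [(1, '7b22'), (2, '0a9e'), (3, 'c3f1'), (4, 'e410')]
```

Note that the two Boston rows share state, city, and zip; without the trailing `provider_id` key their relative order (and therefore their IDs) would again be left to the engine.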
Could we just use `provider_id`? It should be the PK from the source Synthea table, right?
Yeah, we could. I was going for something that would order somewhat logically (rather than by UUIDv4s), but again, that's not necessary.
Ahh, I see! Just for readability/neatness. I'm fine keeping it this way then :)
I'll clean up this branch once #83 settles on the best approach :)
As a result of #83, this only affects `provider` now, I think.
Agree! Let's go with this for now!
This PR cleans up code imported from the upstream OHDSI/SyntheaETL SQL, where models can produce non-deterministic output for the `location` and `provider` tables. For both tables, output is now ordered by: state > city > zip > _id
Perhaps this is too complicated; I'm open to other ideas! I think it is somewhat more logical than ordering by city > state > zip > _id, though it may not match US-centric norms.
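To make the trade-off between the two orderings concrete, here is a small hypothetical comparison, again using Python's `sqlite3` as a stand-in database with made-up rows; `numbered` is an illustrative helper, not project code. State-first numbering groups providers by state, while city-first numbering places same-named cities from different states next to each other.

```python
import sqlite3

# Hypothetical rows: (provider_id, state, city, zip). Values are illustrative only.
ROWS = [
    ("a1", "MA", "Springfield", "01103"),
    ("b2", "IL", "Springfield", "62701"),
    ("c3", "MA", "Boston", "02118"),
]


def numbered(order_by):
    """Return [(new_id, city, state)] numbered by row_number() over `order_by`.

    `order_by` is interpolated into the SQL, so it must be a trusted,
    hard-coded column list (as it is below), never user input.
    """
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE providers (provider_id TEXT, provider_state TEXT, "
            "provider_city TEXT, provider_zip TEXT)"
        )
        con.executemany("INSERT INTO providers VALUES (?, ?, ?, ?)", ROWS)
        sql = (
            f"SELECT row_number() OVER (ORDER BY {order_by}) AS new_id, "
            "provider_city, provider_state FROM providers ORDER BY new_id"
        )
        return con.execute(sql).fetchall()
    finally:
        con.close()


# This PR's ordering: state > city > zip > _id (rows grouped by state).
state_first = numbered("provider_state, provider_city, provider_zip, provider_id")
# The alternative: city > state > zip > _id (same-named cities end up adjacent).
city_first = numbered("provider_city, provider_state, provider_zip, provider_id")

print(state_first)  # [(1, 'Springfield', 'IL'), (2, 'Boston', 'MA'), (3, 'Springfield', 'MA')]
print(city_first)   # [(1, 'Boston', 'MA'), (2, 'Springfield', 'IL'), (3, 'Springfield', 'MA')]
```

Both orderings are equally deterministic because each ends with the unique id; the choice only affects which rows receive neighbouring IDs.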