refactor: build location model properly using patient and organization as a source #83
… a source - uses locations from patients and organizations and runs a distinct on top of them - uses a mixture of address and city as a natural key to join on in downstream models to have a valid `organization_id`
what if we created a hash of the address info and then joined on that? so, for person, something like:
- in int_person make the hash and store it in a location_hash column
- in int_location, union/dedupe all addresses, make the hash, and store it in location_source_value
- join person to location final models on the hash
i don't want to overengineer this but i'm trying to think of a nice way to do this that'll be somewhat future-proof.
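A rough sketch of the three steps above, in plain Python rather than dbt SQL so it can be run on its own. Everything here is an assumption for illustration: the `location_hash` helper, the choice of MD5, the `|` separator, the lower-casing/trimming, and the sample rows are all hypothetical and not taken from the project.

```python
import hashlib


def location_hash(address, city):
    """Hypothetical helper: deterministic key built from the address parts."""
    # Treat missing values as '' and ignore case/whitespace differences,
    # so the same address always produces the same hash.
    parts = [(part or "").strip().lower() for part in (address, city)]
    return hashlib.md5("|".join(parts).encode("utf-8")).hexdigest()


# Hypothetical rows standing in for person and organization addresses.
persons = [
    {"person_id": 1, "address": "1 Main St", "city": "Boston"},
    {"person_id": 2, "address": "1 main st ", "city": "BOSTON"},
    {"person_id": 3, "address": "9 Elm Rd", "city": None},
]
organizations = [{"organization_id": 10, "address": "9 Elm Rd", "city": None}]

# Step 1 (int_person): store the hash in a location_hash column.
for person in persons:
    person["location_hash"] = location_hash(person["address"], person["city"])

# Step 2 (int_location): union all addresses, dedupe on the hash,
# and keep the hash as location_source_value.
locations = {}
for row in persons + organizations:
    key = location_hash(row["address"], row["city"])
    if key not in locations:
        locations[key] = {
            "location_id": len(locations) + 1,
            "location_source_value": key,
        }

# Step 3: join person to location on the hash.
person_location = {
    person["person_id"]: locations[person["location_hash"]]["location_id"]
    for person in persons
}
```

Because the hash is computed the same way in both places, the join no longer depends on comparing the individual address columns directly.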
Fair point - I considered this but thought we could get away without creating an extra model, but it's probably more elegant/scalable doing so. I can experiment with this! Will see if the hashing approach is neater
Thanks! I'm leaning towards creating the extra model either way because I like the idea of a unidirectional flow from stg-->int-->mart models. (In reality it might not be possible/practical to do this in all cases, but would like to give it a shot 😃 )
Agree! Best to keep it uniform
- used for address joining - requires `coalesce` as joins with NULLs will not work
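To illustrate why the `coalesce` is needed, here is a small self-contained sketch using an in-memory SQLite database from Python. The table layouts and sample rows are hypothetical, and the project's actual warehouse and models may differ, but the NULL comparison behaviour shown is standard SQL: `NULL = NULL` is not true, so rows with a missing city drop out of a plain equality join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (person_id INTEGER, address TEXT, city TEXT);
    CREATE TABLE location (location_id INTEGER, address TEXT, city TEXT);
    INSERT INTO person VALUES (1, '1 Main St', 'Boston'), (2, '9 Elm Rd', NULL);
    INSERT INTO location VALUES (100, '1 Main St', 'Boston'), (200, '9 Elm Rd', NULL);
""")

# Plain equality join: person 2 has a NULL city, so it never matches.
plain_join = conn.execute("""
    SELECT p.person_id, l.location_id
    FROM person p
    JOIN location l
      ON p.address = l.address
     AND p.city = l.city
    ORDER BY p.person_id
""").fetchall()

# Coalescing NULLs to '' on both sides lets the NULL-city rows match.
coalesced_join = conn.execute("""
    SELECT p.person_id, l.location_id
    FROM person p
    JOIN location l
      ON COALESCE(p.address, '') = COALESCE(l.address, '')
     AND COALESCE(p.city, '') = COALESCE(l.city, '')
    ORDER BY p.person_id
""").fetchall()

conn.close()
```

Here `plain_join` only contains person 1, while `coalesced_join` contains both people.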
Looks great! Thanks! A couple more minor comments. (And sorry for the delay - I was traveling for work last week.)
@katy-sadowski I think that's all done now! Happy to pick apart further - we're breaking fresh ground! P.S. I hope the presentation went well!
Awesome! Thanks! The demo went great - lots of interest 😄
This PR refactors the existing location model to have a valid PK to allow referencing from the `patient` and `care_site` tables. Previously the `location` table was orphaned and had duplicate entries.

The new referencing works by using a variation of address and city as a natural key to join on from other models.

I have some ongoing thoughts/issues:
- `care_site_types` are `0` and I think we can do better! Perhaps we can use some simple heuristics like presence of `PCP` to set this as `38004247 | Ambulatory Primary Care Clinic / Center|`, but likely too much for this PR/for now!
- `location_source_value` back to being NULL - this should probably be a concatenation of the available fields

There may be a far slicker approach to this but this is the best I can see for now! Happy to hear others' thoughts