Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: build location model properly using patient and organization as a source #83

Merged
merged 5 commits into from
Oct 24, 2024

Conversation

lawrenceadams
Copy link
Collaborator

@lawrenceadams lawrenceadams commented Oct 11, 2024

This PR refactors the existing location model to have a valid PK to allow referencing from the patient and care_site tables. Previously the location table was orphaned and had duplicate entries.

The new referencing works by using a variation of address and city as a natural key to join on from other models.


I have some ongoing thoughts/issues:

  • Unsure how best to join patient/care_site to location. I was going to use address and ZIP - but not all patients have ZIP codes. This may not matter at the moment, but want a setup that won't lead to duplicate joins if someone was to have a lot more synthea data input
    • We might want to join on more to prevent this in the future
  • I've gone for the somewhat arbitrary ordering of row numbers here which is done for neatness/organization but may not be best
  • At the moment all care_site_types are 0 and I think we can do better! Perhaps we can use some simple heuristics like presence of PCP to set this as 38004247 | Ambulatory Primary Care Clinic / Center|, but likely too much for this PR/for now!
  • I've pushed location_source_value back to being NULL - this should probably be a concatenation of the available fields

There may be a far slicker approach to this but this is the best I can see for now! Happy to hear other's thoughts

… a source

 - uses locations from patients and organizations and runs a distinct ontop of them
 - uses a mixture of address and city as a natural key to join on in downstream models to have a valid `organization_id`
@lawrenceadams lawrenceadams linked an issue Oct 11, 2024 that may be closed by this pull request
@lawrenceadams lawrenceadams changed the title feat: build location model properly using patient and organization as a source refactor: build location model properly using patient and organization as a source Oct 11, 2024
@lawrenceadams lawrenceadams marked this pull request as ready for review October 11, 2024 15:23
Copy link
Collaborator

@katy-sadowski katy-sadowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if we created a hash of the address info and then joined on that? so, for person, something like:

  • in int_person make the hash and store it in a location_hash column
  • in int_location, union/dedupe all addresses, make the hash, and store it in location_source_value
  • join person to location final models on the hash

i don't want to overengineer this but i'm trying to think of a nice way to do this that'll be somewhat future-proof.

models/omop/location.sql Outdated Show resolved Hide resolved
@lawrenceadams
Copy link
Collaborator Author

Fair point - I considered this but thought we could get away without creating an extra model, but it's probably more elegant/scalable doing so

I can experiment with this! Will see if the hashing approach is neater

@katy-sadowski
Copy link
Collaborator

Thanks! I'm leaning towards creating the extra model either way because I like the idea of a unidirectional flow from stg-->int-->mart models. (In reality it might not be possible/practical to do this in all cases, but would like to give it a shot 😃 )

@lawrenceadams
Copy link
Collaborator Author

Agree! Best to keep it uniform

Copy link
Collaborator

@katy-sadowski katy-sadowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Thanks! A couple more minor comments. (And sorry for the delay - I was traveling for work last week.)

models/intermediate/int__person.sql Outdated Show resolved Hide resolved
models/omop/location.sql Outdated Show resolved Hide resolved
macros/safe_hash.sql Outdated Show resolved Hide resolved
@lawrenceadams
Copy link
Collaborator Author

@katy-sadowski I think that's all done now! Happy to pick apart further - we're breaking fresh ground!

P.S. I hope the presentation went well!

Copy link
Collaborator

@katy-sadowski katy-sadowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! Thanks! The demo went great - lots of interest 😄

@lawrenceadams lawrenceadams merged commit 8ff8911 into main Oct 24, 2024
@lawrenceadams lawrenceadams deleted the ladams-location-fix-attempt branch October 24, 2024 18:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Location table is orphaned
2 participants