Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Static Measurements #12

Closed
mmcdermott opened this issue Feb 13, 2024 · 8 comments · Fixed by #13
Closed

Static Measurements #12

mmcdermott opened this issue Feb 13, 2024 · 8 comments · Fixed by #13

Comments

@mmcdermott
Copy link
Contributor

mmcdermott commented Feb 13, 2024

I recommend we move static measurements as a separate measurements list within patients, rather than relying on them within events.

This would make the schema look more like it did originally, like this:
image

This static_measurements field would reflect variables recorded at a per-patient level in the data without a timestamp. This makes it easier to do any temporal operations on the data, better reflects the conceptual division of data in the dataset, and it is trivial to transform the data to put static measurements into an event if that is preferred by a modeler.

@rvandewater
Copy link
Contributor

I find this a good suggestion as it allows users to have it "both ways".

@tompollard
Copy link
Contributor

tompollard commented Feb 15, 2024

It would be good to document the [edit: other] options that we've considered here. I think these include:

1. Events with a null timestamp are considered "static events".

  • Allows unified structure for all events, regardless of whether they are static or dynamic.
  • Requires filtering of events to identify static measurements.
  • Not especially clear for users.
measurement = pa.struct([
    ("code", pa.string()),
    ("numeric_value", pa.float32()),
    ("text_value", pa.string()),
    ("datetime_value", pa.timestamp("us")),
])

event = pa.struct([
    ("time", pa.timestamp("us")),
    ("measurements", pa.list_(measurement)),
])

patient = pa.schema([
    ("patient_id", pa.int64()),
    ("events", pa.list_(event)),
])

2. Add is_static flag to the event schema:

  • Allows unified structure for all events, regardless of whether they are static or dynamic.
  • Requires filtering of events to identify static measurements.
  • Typically there will only be a few static measurements, so lot of redundancy.
measurement = pa.struct([
    ("code", pa.string()),
    ("numeric_value", pa.float32()),
    ("text_value", pa.string()),
    ("datetime_value", pa.timestamp("us")),
])

event = pa.struct([
    ("time", pa.timestamp("us")),
    ("is_static", pa.bool_()), 
    ("measurements", pa.list_(measurement)),
])

patient = pa.schema([
    ("patient_id", pa.int64()),
    ("events", pa.list_(event)),
])

3. Add metadata field to the patient schema that supports key-value pairs:

  • Similar to the static_measurements approach in the original post
  • ?
metadata_value = pa.struct([
    ("text_value", pa.string()),
    ("numeric_value", pa.float32()),
    ("datetime_value", pa.timestamp("us")),
])

metadata = pa.map_(
    pa.string(),
    metadata_value
)

patient = pa.schema([
    ("patient_id", pa.int64()),
    ("metadata", metadata),
    ("events", pa.list_(event)),
])

4. Define the kind of static_measurements we support in the data structure

  • Simple for the user to understand
  • Inflexible

e.g. if static_measurements just means demographics, then:

demographics = pa.struct([
    ("gender", pa.string()),
    ("race", pa.string()),
    ("birth_date", pa.date32()),
])

patient = pa.schema([
    ("patient_id", pa.int64()),
    ("demographics", demographics),
    ("events", pa.list_(event)),
])

@mmcdermott
Copy link
Contributor Author

What about the approach in the former screenshot; just have static_measurements or some othe name just be a list of measurements, not a separate typed struct?

@tompollard
Copy link
Contributor

Sorry, the options that I listed were intended to be "other options".

@mmcdermott
Copy link
Contributor Author

mmcdermott commented Feb 15, 2024 via email

@tompollard
Copy link
Contributor

I kind of like option 3 (the metadata option), though I think at some point there was a metadata field on the measurements schema? If this is still there, it would get confusing.

@mmcdermott
Copy link
Contributor Author

For simplicity I think the prior approach (just have static_measurements) makes the most sense -- static data generally is also a set of codes and values, it just lacks timestamps, so this reflects that without introducing more schema bloat. We also do have metadata within the measurements that can be defined on a per-dataset basis (or at least that is my understanding) so I don't think we want to go that route for static data too.

@tompollard
Copy link
Contributor

Ok, vote cast on Slack!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants