Static Measurements #12

mmcdermott · 2024-02-13T16:04:26Z

I recommend we move static measurements into a separate measurements list within patients, rather than storing them within events.

This would make the schema look more like it did originally, like this:

This static_measurements field would reflect variables recorded at a per-patient level in the data without a timestamp. This makes it easier to do any temporal operations on the data, better reflects the conceptual division of data in the dataset, and it is trivial to transform the data to put static measurements into an event if that is preferred by a modeler.

rvandewater · 2024-02-15T08:53:04Z

I find this a good suggestion as it allows users to have it "both ways".

tompollard · 2024-02-15T16:57:04Z

It would be good to document the [edit: other] options that we've considered here. I think these include:

1. Events with a null timestamp are considered "static events".

Allows unified structure for all events, regardless of whether they are static or dynamic.
Requires filtering of events to identify static measurements.
Not especially clear for users.

```python
import pyarrow as pa

measurement = pa.struct([
    ("code", pa.string()),
    ("numeric_value", pa.float32()),
    ("text_value", pa.string()),
    ("datetime_value", pa.timestamp("us")),
])

event = pa.struct([
    ("time", pa.timestamp("us")),  # null for static events
    ("measurements", pa.list_(measurement)),
])

patient = pa.schema([
    ("patient_id", pa.int64()),
    ("events", pa.list_(event)),
])
```
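To illustrate the filtering that option 1 requires, here is a minimal sketch over plain Python dicts shaped like the `event` struct. The function name `split_static` and the codes `GENDER` and `HR` are made up for the example.

```python
from datetime import datetime


def split_static(events):
    """Partition events into (static, dynamic) by whether `time` is null."""
    static = [e for e in events if e["time"] is None]
    dynamic = [e for e in events if e["time"] is not None]
    return static, dynamic


events = [
    # A "static event": no timestamp.
    {"time": None, "measurements": [{"code": "GENDER", "text_value": "F"}]},
    # A regular, timestamped event.
    {"time": datetime(2024, 2, 13, 16, 4), "measurements": [{"code": "HR", "numeric_value": 72.0}]},
]

static, dynamic = split_static(events)
```

Every consumer that performs temporal operations would need a step like this first, which is the cost noted above.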

2. Add `is_static` flag to the event schema:

Allows unified structure for all events, regardless of whether they are static or dynamic.
Requires filtering of events to identify static measurements.
Typically there will only be a few static measurements, so the flag adds a lot of redundancy.

```python
measurement = pa.struct([
    ("code", pa.string()),
    ("numeric_value", pa.float32()),
    ("text_value", pa.string()),
    ("datetime_value", pa.timestamp("us")),
])

event = pa.struct([
    ("time", pa.timestamp("us")),
    ("is_static", pa.bool_()),
    ("measurements", pa.list_(measurement)),
])

patient = pa.schema([
    ("patient_id", pa.int64()),
    ("events", pa.list_(event)),
])
```

3. Add `metadata` field to the patient schema that supports key-value pairs:

Similar to the static_measurements approach in the original post
?

```python
metadata_value = pa.struct([
    ("text_value", pa.string()),
    ("numeric_value", pa.float32()),
    ("datetime_value", pa.timestamp("us")),
])

metadata = pa.map_(
    pa.string(),
    metadata_value,
)

patient = pa.schema([
    ("patient_id", pa.int64()),
    ("metadata", metadata),
    ("events", pa.list_(event)),
])
```

4. Define the kind of `static_measurements` we support in the data structure

Simple for the user to understand
Inflexible

e.g. if static_measurements just means demographics, then:

```python
demographics = pa.struct([
    ("gender", pa.string()),
    ("race", pa.string()),
    ("birth_date", pa.date32()),
])

patient = pa.schema([
    ("patient_id", pa.int64()),
    ("demographics", demographics),
    ("events", pa.list_(event)),
])
```

mmcdermott · 2024-02-15T17:01:48Z

What about the approach in the former screenshot: just have static_measurements (or some other name) be a list of measurements, not a separate typed struct?

tompollard · 2024-02-15T17:08:25Z

Sorry, the options that I listed were intended to be "other options".

mmcdermott · 2024-02-15T17:11:02Z

Ahh, makes more sense, sounds good.

tompollard · 2024-02-15T17:16:31Z

I kind of like option 3 (the metadata option), though I think at some point there was a metadata field on the measurements schema? If this is still there, it would get confusing.

mmcdermott · 2024-02-15T17:19:07Z

For simplicity I think the prior approach (just have static_measurements) makes the most sense -- static data generally is also a set of codes and values, it just lacks timestamps, so this reflects that without introducing more schema bloat. We also do have metadata within the measurements that can be defined on a per-dataset basis (or at least that is my understanding) so I don't think we want to go that route for static data too.

tompollard · 2024-02-15T17:21:09Z

Ok, vote cast on Slack!

mmcdermott mentioned this issue Feb 15, 2024

Re-add static measurements #13

Merged

mmcdermott closed this as completed in #13 Feb 19, 2024
