MEDS 0.3 Release Candidate #32

mmcdermott · 2024-07-19T16:33:51Z

No description provided.

mmcdermott · 2024-07-30T02:51:41Z

For the label schema, I think we need a bit more clarity. In particular,

Is prediction_time the time at which the prediction should be made? The last timestamp at which point data is allowed to be read in (and if so, inclusive or exclusive)? The inclusivity/exclusivity point we may need to add another column to specify.
Are we comfortable with this omitting the possibility of specifying tasks that occlude some parts of an event from the input window but not others? E.g., given everything up to a visit and the data of that visit that aren't meds, predict the meds prescribed.
Do we maybe want to replace "_value" with "_label"?

mmcdermott · 2024-07-30T03:00:18Z

I assume there is no reason we don't want a TypedDict version of the split schema and will add one.
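A TypedDict for the split schema could look roughly like the sketch below. The thread does not spell out the split schema's columns, so the field names `patient_id` and `split`, the example split names, and the helper `patients_in_split` are all hypothetical.

```python
from typing import List, TypedDict


class PatientSplit(TypedDict):
    """One row of a hypothetical split file: which split a patient belongs to."""

    patient_id: int  # assumed field name for the patient identifier
    split: str  # assumed field name; e.g. "train", "tuning", "held_out"


def patients_in_split(rows: List[PatientSplit], split: str) -> List[int]:
    """Return the patient IDs assigned to `split`, preserving input order."""
    return [row["patient_id"] for row in rows if row["split"] == split]


rows: List[PatientSplit] = [
    {"patient_id": 1, "split": "train"},
    {"patient_id": 2, "split": "held_out"},
    {"patient_id": 3, "split": "train"},
]
```

A TypedDict only adds static type information. At runtime these rows are plain dicts, so it can sit alongside an Arrow schema without changing how files are read.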

EthanSteinberg · 2024-07-30T03:10:54Z

Is prediction_time the time at which the prediction should be made? The last timestamp at which point data is allowed to be read in (and if so, inclusive or exclusive)?

Yep. The idea is that it's legal to use all features up to and including the prediction time (i.e., the bound is inclusive). In my experience, an inclusive bound makes things a bit easier.
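Under the inclusive convention described here, selecting a model's input window reduces to a `<=` comparison against prediction_time. The following is a minimal sketch that assumes events are represented as plain `datetime` values. The function name `input_window` is hypothetical.

```python
from datetime import datetime
from typing import List


def input_window(event_times: List[datetime], prediction_time: datetime) -> List[datetime]:
    """Events a model may read: everything at or before prediction_time (inclusive)."""
    return [t for t in event_times if t <= prediction_time]


prediction_time = datetime(2024, 1, 1, 12, 0)
events = [
    datetime(2024, 1, 1, 9, 0),
    datetime(2024, 1, 1, 12, 0),  # exactly at prediction_time: included
    datetime(2024, 1, 2, 8, 0),  # after prediction_time: excluded
]
```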

The documentation for this could perhaps be improved.

The inclusivity/exclusivity point we may need to add another column to specify.

I think that's unnecessary complexity at this point.

Are we comfortable with this omitting the possibility of specifying tasks that occlude some parts of an event from the input window but not others? E.g., given everything up to a visit and the data of that visit that aren't meds, predict the meds prescribed.

I think we should avoid trying to cover those sorts of complex labels with this schema.

Do we maybe want to replace "_value" with "_label"?

I don't think that's a good idea; it would lead to confusion because the word "label" would then be overloaded. I think we want "label" to refer to the pair of a value and a prediction_time.
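In this proposal, a "label" is a whole record that pairs a prediction_time with a value, while "_value" names the column holding that value. The sketch below makes this terminology concrete. It assumes a boolean task and uses hypothetical field names (`patient_id`, `boolean_value`) modeled on the "_value" suffix discussed above. It is not the actual schema definition.

```python
from datetime import datetime
from typing import Optional, TypedDict


class Label(TypedDict):
    """A hypothetical label record: one prediction_time paired with one value."""

    patient_id: int  # assumed identifier field
    prediction_time: datetime  # data at or before this time may be used
    boolean_value: Optional[bool]  # the value being predicted ("_value" suffix)


def make_label(patient_id: int, prediction_time: datetime, value: bool) -> Label:
    """Build a label record for a binary task."""
    return {
        "patient_id": patient_id,
        "prediction_time": prediction_time,
        "boolean_value": value,
    }


label = make_label(42, datetime(2024, 7, 30, 3, 10), True)
```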

mmcdermott · 2024-07-30T03:20:52Z

I have a slight concern about omitting an explicit quantifier for inclusive or exclusive. If we don't allow people to specify it, and someone wants to use that, the next default would be to add or subtract a small amount from the prediction time, and that feels brittle to me if timestamps are binned or the granularity is reduced at all for a dataset.

I also think there are reasonable settings where inclusivity or exclusivity matters. For example, suppose you are trying to predict something about a patient at a particular time without observing that value. The natural way to express this is an exclusive prediction time set to the time at which you want the prediction. This lets models leverage both the observations up to that prediction time (without including the answer) and the knowledge of when the target value is projected to occur. That is much harder to express without a notion of exclusivity.
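A toy example illustrates the leakage concern. If the target measurement is recorded exactly at prediction_time, an inclusive window gives the model the answer, while an exclusive window does not. The `inclusive` flag below is hypothetical. It stands in for the extra column proposed in this thread, which the schema does not include. The event codes are also illustrative.

```python
from datetime import datetime
from typing import List, Tuple

Event = Tuple[datetime, str]  # (time, code)


def window(events: List[Event], prediction_time: datetime, inclusive: bool) -> List[Event]:
    """Select input events; `inclusive` controls whether events at prediction_time count."""
    if inclusive:
        return [e for e in events if e[0] <= prediction_time]
    return [e for e in events if e[0] < prediction_time]


target_time = datetime(2024, 3, 1, 10, 0)
events: List[Event] = [
    (datetime(2024, 3, 1, 8, 0), "VITALS//HR"),
    (target_time, "LAB//CREATININE"),  # the measurement we want to predict
]

leaky = window(events, target_time, inclusive=True)  # contains the target
safe = window(events, target_time, inclusive=False)  # target excluded, its time still known
```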

EthanSteinberg · 2024-07-30T13:28:25Z

I think the simplicity gains of only having one type of label outweigh the minor annoyance of people having to subtract 1 microsecond every now and then for labels that need exclusivity.

I have a slight concern about omitting an explicit quantifier for inclusive or exclusive. If we don't allow people to specify it, and someone wants to use that, the next default would be to add or subtract a small amount from the prediction time, and that feels brittle to me if timestamps are binned or the granularity is reduced at all for a dataset.

Our datasets have a specified timestamp granularity of microseconds ("[us]"), so brittleness shouldn't be a problem: subtracting one microsecond is always correct.
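The microsecond workaround can be checked directly. Assume timestamps are stored at microsecond resolution, as Python's `datetime` is. Then `t <= p - 1us` selects exactly the same events as `t < p`, because no representable timestamp falls strictly between `p - 1us` and `p`. The following is a minimal sketch. The function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List

ONE_MICROSECOND = timedelta(microseconds=1)


def strict_window(times: List[datetime], p: datetime) -> List[datetime]:
    """Exclusive bound: events strictly before p."""
    return [t for t in times if t < p]


def shifted_inclusive_window(times: List[datetime], p: datetime) -> List[datetime]:
    """Inclusive bound applied to p minus one microsecond."""
    shifted = p - ONE_MICROSECOND
    return [t for t in times if t <= shifted]


p = datetime(2024, 7, 30, 13, 28, 25)
times = [
    p - timedelta(days=1),
    p - ONE_MICROSECOND,
    p,
    p + ONE_MICROSECOND,
]
```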

Exclusivity does matter, but I think it's easier to just let labelers subtract the microsecond when they need to rather than force a lot more complexity throughout the entire setup.


mmcdermott · 2024-07-30T18:30:07Z

I suspect we'll eventually want to revisit this, but for now I'm fine dropping the issue of label exclusivity. With the documentation changes in #35, and once tests are made to pass, I'm happy for this to go in.


EthanSteinberg added 8 commits June 20, 2024 08:27

Update schema.py

5952213

Update schema.py

ca6ab08

Update schema.py

4ca8f6a

Update schema.py

52d560f

Update schema.py

421df19

Update schema.py

28b21b0

Update schema.py

6544c3d

Update schema.py

3f7c441

Update schema.py

c922b44

mmcdermott added 2 commits July 29, 2024 23:55

Started updating README

3ed25ab

Updating to mandatory file formats.

ed9cb91

mmcdermott added 3 commits July 30, 2024 14:14

Removed controversial or unneeded terms

5985629

Updated schemas and documentation with consensus terms and deduplicated file path instructions.

1da2ec0

Removed unneeded python object format.

e10c22d

mmcdermott added 7 commits July 30, 2024 14:40

Merge pull request #35 from Medical-Event-Data-Standard/mmd-meds-3-rc-documentation

Documentation updates

e0cd043

Adding missing close paren

dd4ca79

Fixed another test error to do with imports.

986b296

Standardized schema naming convention and fixed another typo.

34465fe

Maybe fixed tests

94d4ce9

Fixed a typo in the code metadata schema and the tests

d962dac

Adjust naming convention to minimize import and variable name conflicts.

ae6d269

mmcdermott marked this pull request as ready for review July 30, 2024 19:21

mmcdermott mentioned this pull request Jul 30, 2024

We need to add a finalize_data.py stage that just makes the data match the MEDS schema, much like finalize_metadata.py mmcdermott/MEDS_transforms#62

Closed


mmcdermott merged commit 8ba7cb1 into main Jul 30, 2024
5 checks passed

mmcdermott deleted the ethan-meds-3-rc branch July 30, 2024 20:10

mmcdermott mentioned this pull request Jul 31, 2024

Match MEDS label schema as per #72 justin13601/ACES#80

Merged
