MEDS 0.3 Release Candidate #32
For the label schema, I think we need a bit more clarity. In particular,
|
I assume there is no reason we don't want a |
Yep. The idea is that it's legal to use all features up to and including the timepoint of the prediction time (inclusive). In my experience inclusive makes things a bit easier. The documentation for this could perhaps be improved.
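To make the inclusive convention concrete, here is a minimal sketch of what "all features up to and including the prediction time" could look like. The event layout (dicts with `time` and `code` keys) and the helper name `visible_events` are assumptions for illustration only, not part of the schema under review.

```python
from datetime import datetime

def visible_events(events, prediction_time):
    """Keep events whose time is <= prediction_time (inclusive cutoff)."""
    return [e for e in events if e["time"] <= prediction_time]

# Hypothetical patient events, for illustration only.
events = [
    {"time": datetime(2024, 1, 1, 8, 0), "code": "HR"},
    {"time": datetime(2024, 1, 1, 9, 0), "code": "LAB"},
    {"time": datetime(2024, 1, 1, 10, 0), "code": "DX"},
]

# The event stamped exactly at the prediction time is kept.
usable = visible_events(events, datetime(2024, 1, 1, 9, 0))
usable_codes = [e["code"] for e in usable]
```

With an inclusive cutoff, the 09:00 event is a legal feature for a 09:00 prediction time; only the 10:00 event is dropped.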
I think that's unnecessary complexity at this point.
I think we should avoid trying to cover those sorts of complex labels with this schema.
I don't think that's a good idea. I think that would lead to confusion since then the word "label" would be overloaded a bit. I think we want "label" to be the pair of a value + prediction_time. |
I have a slight concern about omitting an explicit qualifier for inclusive or exclusive. If we don't allow people to specify it and someone needs exclusivity, the next default would be to add or subtract a small amount from the prediction time. That feels brittle to me if timestamps are binned or the granularity is reduced at all for a dataset. I also think there are reasonable settings where inclusivity or exclusivity matters. E.g., if you are trying to predict something about a patient at a particular time without observing that value, the natural way to express that is an exclusive prediction time equal to the time at which you want the thing predicted. That lets models use both the observations up to that prediction time, without including the answer, and the knowledge of when the target value being predicted is projected to happen. This is much harder to express without a notion of exclusivity. |
I think the simplicity gains of only having one type of label outweigh the minor annoyance of people having to subtract 1 microsecond every now and then for labels that need exclusivity.
Our datasets have a specified granularity of "[us]" so brittleness shouldn't be a problem. Subtracting one microsecond is always correct. Exclusivity does matter, but I think it's easier to just let labelers subtract the microsecond when they need to rather than force a lot more complexity throughout the entire setup. |
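A rough sketch of the workaround described here, assuming timestamps are stored at microsecond resolution (Python's `datetime` also uses microsecond resolution, which is why it serves as a stand-in). The helper name `exclusive_to_inclusive` is hypothetical.

```python
from datetime import datetime, timedelta

ONE_US = timedelta(microseconds=1)

def exclusive_to_inclusive(cutoff):
    """Turn an exclusive cutoff into the equivalent inclusive prediction time.

    Only valid when timestamps have microsecond granularity (an assumption).
    """
    return cutoff - ONE_US

# The labeler wants to predict a value observed at target_time without seeing it.
target_time = datetime(2024, 1, 1, 9, 0)
prediction_time = exclusive_to_inclusive(target_time)

# Timestamps just before, at, and just after the target time.
times = [target_time - ONE_US, target_time, target_time + ONE_US]

inclusive_view = [t for t in times if t <= prediction_time]  # inclusive schema
exclusive_view = [t for t in times if t < target_time]       # intended semantics
```

Under the microsecond assumption the two views select the same events. If a dataset bins timestamps more coarsely, the subtraction no longer guarantees this, which is the brittleness concern raised earlier in the thread.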
I suspect we'll eventually want to revisit this, but for now I'm fine dropping the issue of label exclusivity. With the documentation changes in #35, and once the tests pass, I'm happy for this to go in. |
…-documentation Documentation updates