Sample uncertainty #1163
base: master
As a proof-of-principle, this uses the simple yet useful case of pressing the "c" key to loop through the available color-bys.
This commit is a proof-of-principle for resampling color-by and/or geo-resolution values where the dataset provides confidence data. New values for discrete traits are picked for each node independently, based on the discrete probabilities supplied by the dataset. For continuous values we take a uniform sample from the bounds provided by the dataset (often the 95% CI). This is achieved via the following keyboard shortcuts: "s" + "c" resamples the currently selected color-by; "s" + "r" resamples the currently selected geo-resolution. Using capital letters returns to the starting values rather than resampling. A number of to-dos remain before this can be merged.
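The per-node sampling described here can be sketched in a few lines of Python. This is only an illustration, not the code in this PR: the function names, the node names, and the layout of the confidence data (a dict of state to probability for discrete traits, a lower/upper pair for continuous ones) are assumptions made for the example.

```python
import random


def resample_value(confidence, rng):
    """Draw one new value for a single node from its confidence data.

    Assumed (hypothetical) shapes:
      - discrete trait: dict mapping state -> probability
      - continuous trait: (lower, upper) bounds, e.g. a 95% CI
    """
    if isinstance(confidence, dict):
        states = list(confidence)
        weights = [confidence[s] for s in states]
        # random.choices normalises the weights, so they need not sum to 1.
        return rng.choices(states, weights=weights, k=1)[0]
    lower, upper = confidence
    return rng.uniform(lower, upper)


def resample_nodes(nodes, trait, rng):
    """Resample `trait` independently for every node that has confidence data."""
    return {
        name: resample_value(attrs[trait]["confidence"], rng)
        for name, attrs in nodes.items()
        if "confidence" in attrs.get(trait, {})
    }


rng = random.Random(0)
# Hypothetical two-node dataset; values are made up for illustration.
nodes = {
    "NODE_1": {
        "country": {"value": "Thailand",
                    "confidence": {"Thailand": 0.6, "French Polynesia": 0.4}},
        "num_date": {"value": 2013.5, "confidence": (2013.1, 2013.9)},
    },
    "NODE_2": {"country": {"value": "French Polynesia"}},  # no confidence: skipped
}
countries = resample_nodes(nodes, "country", rng)
dates = resample_nodes(nodes, "num_date", rng)
```

Nodes without confidence data are simply left out of the result, so the caller can keep their original values.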
Pressing `x` or `e` starts an animation in which each frame resamples the selected color-by or geo-resolution, respectively. The UI implementation should be considered temporary.
Very cool idea! Quick thoughts: Claus Wilke has argued that the most intuitive way to represent uncertainty for most audiences is exactly what you're doing here, animating across realizations: https://docs.google.com/presentation/d/1zMuBSADaxdFnosOPWJNA10DaxGEheW6gDxqEPYAuado/edit#slide=id.p40
Separately, we could think about adding opacity to the transmission lines proportional to uncertainty (I guess this would need to be some function of the uncertainty of the parent and child states). Just running the animation here for Zika made it clear that some transmission lines are much better supported than others.
The provided CIs represent marginal distributions. If you draw a particular value for one node, it will change the distributions at other nodes. In this example from Zika, the marginal for these two branches is Thailand or French Polynesia; it's unclear when the transition occurs. But if you sample the parent as French Polynesia, the child should definitely be French Polynesia as well. If you sample independently you risk distorting inference. In this case you could easily sample the unsupported French Polynesia to Thailand to French Polynesia reconstruction. Thus the animation will be appropriate for individual transmission lines but not for their collection.
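To make the concern concrete, here is a small exact calculation in Python. The joint probabilities are invented for illustration (they are not taken from the Zika dataset); the only property borrowed from the example is that a French Polynesia parent with a Thailand child has zero support.

```python
from itertools import product

TH, FP = "Thailand", "French Polynesia"

# Hypothetical joint distribution over (parent state, child state).
# The FP -> TH transition has zero support.
joint = {(TH, TH): 0.3, (TH, FP): 0.3, (FP, FP): 0.4, (FP, TH): 0.0}


def marginal(joint, index):
    """Sum the joint over the other node to get one node's marginal."""
    out = {}
    for states, p in joint.items():
        out[states[index]] = out.get(states[index], 0.0) + p
    return out


parent = marginal(joint, 0)  # roughly {TH: 0.6, FP: 0.4}
child = marginal(joint, 1)   # roughly {TH: 0.3, FP: 0.7}

# Probability of each (parent, child) pair when the two nodes are
# sampled independently from their marginals.
independent = {(a, b): parent[a] * child[b] for a, b in product(parent, child)}

print("joint P(FP -> TH):      ", joint[(FP, TH)])
print("independent P(FP -> TH):", independent[(FP, TH)])
```

With these made-up numbers, independent sampling draws the unsupported French Polynesia to Thailand pair about 12% of the time, even though each node's marginal is reproduced correctly.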
How does one trigger resampling of timings in the tree? Regarding the marginal distributions: these are the probabilities at a node after summing over all states at all other nodes. Using their maxima as transitions is not technically correct, but will in almost all cases correspond to the most likely transition (within the model), and this is what we do now. The joint ML inference would be guaranteed consistent, but giving confidence for it is trickier. Resampling the marginal distributions is indeed problematic. We could calculate the joint distribution of adjacent nodes in augur and add them as branch attributes.
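If joint distributions of adjacent nodes were available as branch attributes, a consistent sample could be drawn from the root downwards, each child conditioned on the state already sampled for its parent. The sketch below assumes a hypothetical layout (a `confidence` marginal on the root, a `joint` dict keyed by `(parent_state, child_state)` on every other node) and made-up probabilities; augur does not currently export such an attribute. It also assumes the pairwise joints are mutually consistent and that a child depends on the rest of the tree only through its parent.

```python
import random

TH, FP = "Thailand", "French Polynesia"


def sample_state(dist, rng):
    """Draw one state from a dict mapping state -> weight."""
    states = list(dist)
    return rng.choices(states, weights=[dist[s] for s in states], k=1)[0]


def sample_tree(node, rng, parent_state=None, out=None):
    """Sample states from the root down, each child conditioned on its parent."""
    out = {} if out is None else out
    if parent_state is None:
        dist = node["confidence"]  # root: plain marginal
    else:
        # Keep only the joint entries matching the sampled parent state;
        # random.choices renormalises them, giving P(child | parent).
        dist = {c: p for (par, c), p in node["joint"].items()
                if par == parent_state and p > 0}
    out[node["name"]] = sample_state(dist, rng)
    for child in node.get("children", []):
        sample_tree(child, rng, out[node["name"]], out)
    return out


# Hypothetical three-node chain: root -> A -> B.
tree = {
    "name": "root",
    "confidence": {TH: 0.6, FP: 0.4},
    "children": [{
        "name": "A",
        "joint": {(TH, TH): 0.3, (TH, FP): 0.3, (FP, FP): 0.4, (FP, TH): 0.0},
        "children": [{
            "name": "B",
            "joint": {(TH, TH): 0.1, (TH, FP): 0.2, (FP, FP): 0.7, (FP, TH): 0.0},
        }],
    }],
}

sample = sample_tree(tree, random.Random(0))
```

Because a zero-probability transition is never drawn, the French Polynesia to Thailand to French Polynesia reconstruction cannot appear in any sample from this toy tree.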
As part of thinking about ways we can express uncertainty, I've been experimenting with resampling trait values. This PR allows you to resample the currently selected color-by or geo-resolution from the confidence data made available in dataset JSONs. The UI is via keyboard shortcuts, as it's an experiment / proof-of-concept at this stage.
- `s` + `c`: resample color-by trait
- `S` + `C`: reset color-by values to original state
- `x`: animate the sampling of color-by values. Press a second time to end animation.
- `s` + `r`: resample geo-res trait
- `S` + `R`: reset geo-res values to original state
- `e`: animate the sampling of geo-res values. Press a second time to end animation.

@trvrb @rneher is sampling of each node independently from their provided confidences valid? For discrete traits it's a simple sampling from a discrete distribution; for continuous values (e.g. date) I'm sampling uniformly from the provided interval.
Zika dataset resampling the country (across internal nodes)
nCoV (global) dataset, resampling dates of internal nodes. Note that the axis is not being held constant (todo)
I'll say now that I don't think this is the best way for us to represent uncertainty (what the best way is remains unknown), but it has been useful for me in interpreting aspects of datasets and, if the appropriate UI can be achieved, may be useful for others.