Sample uncertainty #1163

Draft
wants to merge 3 commits into
base: master

jameshadfield
Member

@jameshadfield jameshadfield commented Jun 10, 2020

As part of thinking about ways we can express uncertainty, I've been experimenting with resampling trait values. This PR allows you to resample the currently selected color-by or geo-resolution from the confidence data made available in dataset JSONs. The UI is via keyboard shortcuts, as it's an experiment / proof-of-concept at this stage.

s + c: resample color-by trait
S + C: reset color-by values to original state
x: animate the sampling of color-by values. Press a second time to end animation.
s + r: resample geo-res trait
S + R: reset geo-res values to original state
e: animate the sampling of geo-res values. Press a second time to end animation.

@trvrb @rneher is sampling of each node independently from their provided confidences valid? For discrete traits it's a simple sampling from a discrete distribution, for continuous values (e.g. date) I'm sampling uniformly from the provided interval.
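
To make the question concrete, here is a minimal sketch of the independent per-node sampling described above. It is written in Python for illustration and is not the PR's code. It assumes each node stores a trait as a `value` with an optional `confidence`, which is either a mapping of state to probability (discrete traits) or a `[lower, upper]` pair (continuous traits). The function names are hypothetical.

```python
import random


def resample_discrete(confidence, rng):
    """Draw one state from a {state: probability} mapping."""
    states = list(confidence)
    weights = [confidence[s] for s in states]
    return rng.choices(states, weights=weights, k=1)[0]


def resample_continuous(bounds, rng):
    """Draw uniformly from a [lower, upper] interval (often a 95% CI)."""
    lower, upper = bounds
    return rng.uniform(lower, upper)


def resample_nodes(nodes, trait, rng=None):
    """Resample `trait` for every node independently of all other nodes.

    `nodes` maps node name -> {trait: {"value": ..., "confidence": ...}}.
    Nodes without confidence data keep their original value.
    """
    rng = rng or random.Random()
    sampled = {}
    for name, attrs in nodes.items():
        entry = attrs.get(trait, {})
        confidence = entry.get("confidence")
        if confidence is None:
            sampled[name] = entry.get("value")
        elif isinstance(confidence, dict):
            sampled[name] = resample_discrete(confidence, rng)
        else:
            sampled[name] = resample_continuous(confidence, rng)
    return sampled
```

Each call to `resample_nodes` corresponds to one keypress (or one animation frame). Because every node is drawn without reference to its parent or children, the result need not be a consistent reconstruction across the tree.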

[Animation: animate_map]
Zika dataset resampling the country (across internal nodes)

[Animation: animate_tree]
nCoV (global) dataset, resampling dates of internal nodes. Note that the axis is not being held constant (todo)

I'll say now that I don't think this is the best way for us to represent uncertainty (the best way is still unknown), but it has been useful for me in interpreting aspects of datasets and, if an appropriate UI can be achieved, may be useful for others.

As a proof-of-principle, this uses the simple yet useful case of pressing the "c" key to loop through available color-bys.
This commit is a proof-of-principle to resample color-by and/or geo-resolution values if available.

New values for discrete traits are picked for each node independently based on the discrete probabilities supplied by the dataset. For continuous values we take a uniform sample from the bounds provided by the dataset (often 95% CI).

This is achieved via the following keyboard shortcuts:
"s" + "c": resample currently selected color-by
"s" + "r": resample currently selected geo-resolution.
Using capital letters returns to the starting values rather than resampling.

There remain a number of "to-do"s before this can be merged.
Pressing `x` or `e` starts an animation where each frame resamples the selected color-by or geo-resolution, respectively.

The UI implementation should be considered temporary.
@jameshadfield jameshadfield requested review from trvrb and rneher June 10, 2020 03:41
@jameshadfield jameshadfield temporarily deployed to auspice-sample-uncertai-obw0yn June 10, 2020 03:41 Inactive
@trvrb
Member

trvrb commented Jun 10, 2020

Very cool idea! Quick thoughts:

Claus Wilke has argued that the most intuitive way to represent uncertainty for most audiences is exactly what you're doing here. Animate across realizations. https://docs.google.com/presentation/d/1zMuBSADaxdFnosOPWJNA10DaxGEheW6gDxqEPYAuado/edit#slide=id.p40

Separately, we could think about adding opacity to the transmission lines proportional to uncertainty (I guess this would need to be some function of uncertainty of parent and child states). Just doing the animation here for Zika made it clear that some transmission lines are much more supported than others.
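
One possible form for that function, sketched in Python purely for illustration: take the product of the parent's and child's marginal confidence in the states the line is drawn between, clamped to a visible minimum. The function name, the `floor` parameter and the choice of a product are all assumptions; the product treats the two nodes as independent, so it is only a rough proxy for the support of the transition.

```python
def branch_opacity(parent_conf, child_conf, parent_state, child_state, floor=0.1):
    """Opacity in [floor, 1] for a transmission line.

    parent_conf / child_conf are {state: probability} marginals for the
    two nodes; parent_state / child_state are the states being displayed.
    """
    support = parent_conf.get(parent_state, 0.0) * child_conf.get(child_state, 0.0)
    # Clamp so that poorly supported lines stay faintly visible.
    return max(floor, min(1.0, support))
```

A well-supported line (both marginals near 1) renders almost opaque, while a line whose endpoints are each a coin flip renders at 0.25.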

is sampling of each node independently from their provided confidences valid?

The provided CIs represent marginal distributions. If you draw a particular value for one node, it changes the distributions at other nodes. In this example from Zika:

[Image: zika]

The marginal for these two branches is Thailand or French Polynesia. It's unclear where the transition occurs. But if you sample the parent as French Polynesia, the child should definitely be French Polynesia as well. If you sample independently, you risk distorting inference. In this case you could easily sample the unsupported French Polynesia to Thailand to French Polynesia reconstruction.

Thus the animation will be appropriate for individual transmission lines but not their collection.
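
The size of this effect can be illustrated with a small enumeration (a Python sketch with made-up numbers, not values from the Zika dataset). Assume three nodes along a lineage, each with a 50/50 marginal between the two countries, and compute how much probability independent sampling puts on paths that switch country more than once.

```python
from itertools import product


def independent_path_probabilities(marginals):
    """Probability of every joint assignment when nodes are sampled independently.

    `marginals` is a list of {state: probability} dicts, ordered from
    ancestor to descendant along one lineage.
    """
    probabilities = {}
    for path in product(*(list(m) for m in marginals)):
        p = 1.0
        for marginal, state in zip(marginals, path):
            p *= marginal[state]
        probabilities[path] = p
    return probabilities


def count_transitions(path):
    """Number of state changes between consecutive nodes on the path."""
    return sum(1 for a, b in zip(path, path[1:]) if a != b)


# Hypothetical marginals for three consecutive nodes.
marginals = [{"French Polynesia": 0.5, "Thailand": 0.5}] * 3
probabilities = independent_path_probabilities(marginals)
back_and_forth = sum(
    p for path, p in probabilities.items() if count_transitions(path) >= 2
)
```

With these assumed marginals, a quarter of independently sampled frames show a back-and-forth reconstruction such as French Polynesia to Thailand to French Polynesia, even if the underlying model gives such paths essentially no support.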

@rneher
Member

rneher commented Jun 10, 2020

How does one trigger resampling of timings in the tree?

Regarding the marginal distributions: these are the probabilities at a node after summing over all states at all other nodes. Using their maxima as transitions is not technically correct, but will in almost all cases correspond to the most likely transition (within the model), and this is what we do now. The joint ML inference would be guaranteed consistent, but giving confidence for it is trickier.

Resampling the marginal distributions is indeed problematic. We could calculate the joint distribution of adjacent nodes in augur and add them as branch attributes.
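
If such per-branch joint distributions were available, a consistent sample could be drawn top-down: sample the root from its marginal, then sample each child conditional on the state already drawn for its parent. The following is a minimal Python sketch under that assumption. The data layout (`branch_joint[child]` mapping `(parent_state, child_state)` to a probability) and all names are hypothetical and not an existing augur or auspice format.

```python
import random


def sample_child_given_parent(joint, parent_state, rng):
    """Sample a child state from the joint, conditioned on the parent's state.

    `joint` maps (parent_state, child_state) -> probability.
    """
    row = {c: p for (par, c), p in joint.items() if par == parent_state and p > 0}
    if not row:
        raise ValueError(f"no joint support for parent state {parent_state!r}")
    states = list(row)
    # rng.choices normalises the weights, giving P(child | parent).
    return rng.choices(states, weights=[row[s] for s in states], k=1)[0]


def sample_tree(root, root_marginal, children, branch_joint, rng=None):
    """Draw one consistent assignment of states, root to tips.

    `children` maps node -> list of child nodes; `branch_joint[child]` is
    the joint distribution on the branch leading to `child`.
    """
    rng = rng or random.Random()
    states = list(root_marginal)
    weights = [root_marginal[s] for s in states]
    sampled = {root: rng.choices(states, weights=weights, k=1)[0]}
    stack = [root]
    while stack:
        parent = stack.pop()
        for child in children.get(parent, []):
            sampled[child] = sample_child_given_parent(
                branch_joint[child], sampled[parent], rng
            )
            stack.append(child)
    return sampled
```

With a joint that puts no mass on (French Polynesia, Thailand) for a branch, this sampler can never produce a French Polynesia parent with a Thailand child, unlike independent draws from the marginals.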

@trvrb trvrb added the proposal Proposals that warrant further discussion label Jan 22, 2021
@jameshadfield jameshadfield added experiment PRs which may never be merged and removed proposal Proposals that warrant further discussion labels Jun 13, 2023
@jameshadfield jameshadfield marked this pull request as draft June 13, 2023 20:09
Labels
experiment PRs which may never be merged

3 participants