flexibility in estimate_delays #4

Open
smwindecker opened this issue Oct 4, 2023 · 3 comments
Labels
time-varying delay dev features to implement relating to time-varying delay (convolution) mass function

Comments

@smwindecker
Contributor

Currently the function estimates delays from the data alone. We should have the flexibility to use the linelist for certain dates/states, while specifying other dates/states that should instead use a national average, a disease literature average, or another source.

It should not be too easy to default to using bad data.
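
To make the request above concrete, here is a minimal sketch of the selection logic, not the package's actual code. Everything in it is an assumption for illustration: the name `choose_delay_pmf`, the `(state, date)` key, the `min_obs` threshold, and the idea that the fallback (national or literature average) must be passed explicitly so that nothing silently defaults to sparse data.

```python
from typing import Dict, List, Optional, Tuple

Pmf = List[float]          # Pmf[d] = probability of a delay of d days
Key = Tuple[str, str]      # hypothetical (state, date) key, e.g. ("NSW", "2023-10-04")


def choose_delay_pmf(
    key: Key,
    linelist_pmfs: Dict[Key, Pmf],
    n_obs: Dict[Key, int],
    fallback: Optional[Pmf] = None,
    min_obs: int = 50,
) -> Tuple[Pmf, str]:
    """Return (pmf, source) for one state/date combination.

    The linelist estimate is used only when it is backed by at least
    `min_obs` paired observations. Otherwise the caller-supplied fallback
    is used; if no fallback was supplied we raise rather than quietly
    falling back to a poorly supported estimate.
    """
    if key in linelist_pmfs and n_obs.get(key, 0) >= min_obs:
        return linelist_pmfs[key], "linelist"
    if fallback is None:
        raise ValueError(
            f"too few linelist observations for {key} and no fallback given"
        )
    return fallback, "fallback"
```

Returning the source alongside the distribution makes it easy to audit afterwards which dates/states relied on the fallback.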

@smwindecker
Contributor Author

After further discussion: implement multiple imputation for this task instead.

@AugustHao AugustHao added the time-varying delay dev features to implement relating to time-varying delay (convolution) mass function label Oct 26, 2023
smwindecker pushed a commit that referenced this issue Nov 16, 2023
@AugustHao
Contributor

AugustHao commented Jan 10, 2024

We need to estimate time-varying delays from paired-date data. The current approach constructs a rolling window over the paired-date delay data and then computes the CDF within each window. This is computationally expensive, so a long-term goal is to find a better way to implement it, but note that we have something that works in the meantime.
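
For readers unfamiliar with the rolling-window approach described above, a rough sketch follows. It is an illustration only, not the code in this repository: the function name `rolling_delay_cdf`, the centred window, and the `half_window` and `max_delay` parameters are all assumptions.

```python
from bisect import bisect_right
from datetime import date, timedelta
from typing import Dict, List, Optional, Tuple


def rolling_delay_cdf(
    pairs: List[Tuple[date, date]],   # (first_date, second_date) for each case
    target_dates: List[date],
    half_window: int = 7,
    max_delay: int = 28,
) -> Dict[date, Optional[List[float]]]:
    """Empirical delay CDF in a centred window around each target date.

    A case belongs to a window if its first date falls inside it.
    Windows with no paired observations map to None (a gap).
    """
    out: Dict[date, Optional[List[float]]] = {}
    for t in target_dates:
        lo = t - timedelta(days=half_window)
        hi = t + timedelta(days=half_window)
        # Every window rescans all pairs, which is where the cost comes from.
        delays = sorted(
            (second - first).days for first, second in pairs if lo <= first <= hi
        )
        if not delays:
            out[t] = None
            continue
        n = len(delays)
        # cdf[d] = share of windowed cases with delay <= d days
        out[t] = [bisect_right(delays, d) / n for d in range(max_delay + 1)]
    return out
```

Each target date rescans the full set of pairs, so the cost grows with (number of dates) × (number of pairs), which is one way to see why this is expensive on long time series.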

Key points to consider:

The goal is to estimate the delay over a continuous time period, but paired-date data does not necessarily cover every date in that period, i.e. there are gaps in the time series where we observe no paired delays because one of the two dates is missing. This means we necessarily have to interpolate the delay distribution across some date ranges.
If we can define a parametric form for the delay distribution, with the distribution parameters as time-varying variables, we can learn them from data using a modelling approach. But this relies on very strong assumptions about the shape of the delay distributions, which is undesirable.
There may be a way to mix parametric and non-parametric densities in an informative way?
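
Two of the points above can be sketched in a few lines. Neither is an agreed design: `interpolate_cdfs` fills gaps by linear interpolation between the nearest dates that do have an estimate, and `blend_pmfs` is one possible way to mix a parametric and an empirical distribution, weighting the empirical one by sample size. The function names, the day-indexed list layout, and the `prior_strength` parameter are all assumptions.

```python
from typing import List, Optional

Cdf = List[float]


def interpolate_cdfs(cdfs: List[Optional[Cdf]]) -> List[Optional[Cdf]]:
    """Fill None entries in a day-indexed list of CDFs.

    Interior gaps are linearly interpolated between the nearest observed
    days; a convex combination of two CDFs is itself a valid CDF.
    Leading and trailing gaps copy the nearest observed CDF.
    """
    known = [i for i, c in enumerate(cdfs) if c is not None]
    out = list(cdfs)
    if not known:
        return out  # nothing to interpolate from
    for i, c in enumerate(cdfs):
        if c is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = list(cdfs[right])
        elif right is None:
            out[i] = list(cdfs[left])
        else:
            w = (i - left) / (right - left)
            out[i] = [(1 - w) * a + w * b for a, b in zip(cdfs[left], cdfs[right])]
    return out


def blend_pmfs(
    empirical: List[float],
    parametric: List[float],
    n_obs: int,
    prior_strength: float = 20.0,
) -> List[float]:
    """Shrink the empirical PMF towards a parametric one.

    With few observations the parametric shape dominates; with many,
    the empirical PMF does. `prior_strength` is a hypothetical tuning knob.
    """
    w = n_obs / (n_obs + prior_strength)
    return [w * e + (1 - w) * p for e, p in zip(empirical, parametric)]
```

The blend keeps the non-parametric estimate where data is plentiful and only leans on the distributional assumption where it is sparse, which may soften the concern about strong shape assumptions.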

@AugustHao
Contributor

We need a way to filter out recent days from the calculation of delays.

See Appendix A of this paper for a similar approach and justification: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-023-05428-4#appendices

In summary, because not all recent infections have been observed yet in the latest reported cases, those that have been observed will have shorter delays than average. If we computed time-varying delays from these shorter observed delays, we would erroneously underestimate the delay for the most recent time period. We should therefore ignore delay information from the most recent days and hold the delay distribution constant from about one maximum delay range before the present, as they have done in the paper.
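
A minimal sketch of that clamping step, assuming per-date CDFs have already been estimated. The function name `clamp_recent_delays` and its arguments are hypothetical, and the cutoff of exactly one `max_delay` before the latest date is a simplification of "about 1 max delay range".

```python
from datetime import date, timedelta
from typing import Dict, List


def clamp_recent_delays(
    cdf_by_date: Dict[date, List[float]],
    latest: date,
    max_delay: int,
) -> Dict[date, List[float]]:
    """Replace estimates for the most recent dates with the last trusted one.

    Dates later than `latest - max_delay` days are affected by right
    truncation (only the short delays have been observed so far), so
    their CDFs are overwritten with the CDF from the last date at or
    before the cutoff.
    """
    cutoff = latest - timedelta(days=max_delay)
    trusted = [d for d in cdf_by_date if d <= cutoff]
    if not trusted:
        raise ValueError("no delay estimate is old enough to be trusted")
    anchor = cdf_by_date[max(trusted)]
    return {
        d: (cdf if d <= cutoff else list(anchor))
        for d, cdf in cdf_by_date.items()
    }
```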
