flexibility in estimate_delays #4
Comments
Further discussion: implement multiple imputation for this task instead.
We need to estimate time-varying delays from paired-date data. The current approach constructs a rolling window over the paired-date delay data and then computes a CDF within each window. This is computationally expensive, so a long-term goal is to find a better implementation, but for now we have something that works. Key point to consider: the goal is to estimate the delay over a continuous time period, but the paired-date data do not necessarily cover every date in that period, i.e. there are gaps in the time series where no paired delays are observed because one of the two dates is missing. This means we necessarily have to interpolate the delay distribution across some date ranges.
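A rough sketch of the rolling-window idea with gap filling, for discussion only. This is not the current `estimate_delays` code: the function names, the `(onset, report)` tuple input, the trailing-window definition, and linear interpolation between neighbouring CDFs are all assumptions made for illustration.

```python
from datetime import date, timedelta


def rolling_delay_cdfs(pairs, start, end, window_days=14, max_delay=30):
    """Empirical delay CDF per day, computed from a trailing window.

    pairs: list of (onset_date, report_date) tuples (hypothetical input format).
    Returns {day: [P(delay <= 0), ..., P(delay <= max_delay)]}, with None for
    days whose window contains no paired dates.
    """
    cdfs = {}
    day = start
    while day <= end:
        window_start = day - timedelta(days=window_days - 1)
        delays = [(r - o).days for o, r in pairs if window_start <= o <= day]
        delays = [d for d in delays if 0 <= d <= max_delay]
        if delays:
            n = len(delays)
            cdfs[day] = [sum(d <= k for d in delays) / n
                         for k in range(max_delay + 1)]
        else:
            cdfs[day] = None  # gap: no paired delays observed in this window
        day += timedelta(days=1)
    return cdfs


def fill_gaps(cdfs):
    """Linearly interpolate CDFs across gap days.

    Gaps at either edge reuse the nearest available CDF. A weighted average
    of two CDFs is itself a valid CDF, so the result stays monotone.
    """
    days = sorted(cdfs)
    known = [i for i, d in enumerate(days) if cdfs[d] is not None]
    if not known:
        raise ValueError("no window contains any paired dates")
    filled = {}
    for i, d in enumerate(days):
        if cdfs[d] is not None:
            filled[d] = cdfs[d]
            continue
        before = [k for k in known if k < i]
        after = [k for k in known if k > i]
        if before and after:
            a, b = before[-1], after[0]
            w = (i - a) / (b - a)
            filled[d] = [(1 - w) * x + w * y
                         for x, y in zip(cdfs[days[a]], cdfs[days[b]])]
        else:
            nearest = before[-1] if before else after[0]
            filled[d] = cdfs[days[nearest]]
    return filled
```

Each day rescans the full list of pairs, so the cost grows with days times pairs, which is the expense noted above.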
We need a way to filter out recent days from the delay calculation. See Appendix A of this paper for a similar approach and justification: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-023-05428-4#appendices In summary, not all recent infections have been observed yet in the latest reported cases, and those that have been observed have shorter delays than average. If we computed time-varying delays from these observations, we would erroneously underestimate the delay for the most recent period. We should therefore ignore delay information from the most recent days and hold the delay distribution constant from about one maximum delay range before the present, as done in the paper.
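A minimal sketch of the clamping step, assuming per-day CDFs are held in a dict keyed by date. `clamp_recent` and its arguments are hypothetical names, and this is one reading of the suggestion above, not the paper's exact procedure.

```python
from datetime import date, timedelta


def clamp_recent(cdfs, present, max_delay):
    """Hold the delay CDF constant over the most recent `max_delay` days.

    cdfs: {day: list of CDF values} (hypothetical structure).
    Days after `present - max_delay` are right-truncated, so their own
    estimates are discarded and replaced by the last estimate at or
    before that cutoff.
    """
    cutoff = present - timedelta(days=max_delay)
    eligible = [d for d in cdfs if d <= cutoff]
    if not eligible:
        raise ValueError("no estimates at or before the cutoff date")
    anchor = cdfs[max(eligible)]
    return {d: (cdf if d <= cutoff else list(anchor))
            for d, cdf in cdfs.items()}
```

Ideally the windows feeding the anchor estimate would also exclude pairs from after the cutoff; that is left out here to keep the sketch short.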
Currently the function uses the data to estimate delays. We should have the flexibility to use the linelist for certain dates/states, while specifying other dates/states for which we use a national average, a disease literature average, or another source.
We should not make it too easy to default to using bad data.
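One possible shape for this, as a sketch only: the caller lists explicit rules per state and date range, and the lookup fails loudly when nothing matches, so there is no silent fallback to the linelist. `DelaySource`, `resolve_delay_source`, and the rule format are hypothetical and not part of the existing function.

```python
from datetime import date
from enum import Enum


class DelaySource(Enum):
    LINELIST = "linelist"
    NATIONAL = "national_average"
    LITERATURE = "literature_average"


def resolve_delay_source(state, day, rules):
    """Return the delay source configured for (state, day).

    rules: list of dicts with keys "state", "start", "end", "source"
    (hypothetical format). The first matching rule wins. There is
    deliberately no default: an unmatched state/date raises instead of
    quietly using whatever data happens to be available.
    """
    for rule in rules:
        if rule["state"] == state and rule["start"] <= day <= rule["end"]:
            return rule["source"]
    raise KeyError(f"no delay source specified for {state} on {day}")
```

Requiring a rule for every state and date range is more verbose for the caller, but it makes the choice of data source visible rather than implied.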