Model needs to correct for anomalies in testing reporting #8

Open
k-sys opened this issue Jul 3, 2020 · 1 comment


k-sys commented Jul 3, 2020

Increasingly, states are reporting 100% positive tests on a given day (e.g., 215 of 215 tests came back positive). This throws the model off because it assumes that positive tests are roughly proportional to the actual number of tests. If a state reports 100% positive tests, Rt increases too quickly because of the faulty data point.

For instance, Ohio has a handful of days when clearly total tests have not been reported correctly and positive % shoots up to 100%:

image

And in some cases, tests are withheld one day, only to be reported together with the next day's results:

image

In this case, drops in data are often followed by 2x the number of tests the following day.
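
One possible correction for this batching pattern is to spread the catch-up report evenly over the days it appears to cover. The sketch below is only an illustration: `split_batched_reports` is a hypothetical helper, and it assumes that a reported zero always means "withheld" and that the next non-zero figure covers the whole gap, which will not hold for every state.

```python
def split_batched_reports(totals):
    """Spread a report that follows zero-report days evenly across the gap.

    Assumption: a zero means the figure was withheld, and the next non-zero
    value includes the withheld days. Trailing zeros are left untouched.
    """
    fixed = [float(t) for t in totals]
    i = 0
    while i < len(fixed):
        if fixed[i] == 0:
            # Find the first non-zero day after the run of zeros.
            j = i
            while j < len(fixed) and fixed[j] == 0:
                j += 1
            if j < len(fixed):
                # Share the catch-up total equally over days i..j inclusive.
                share = fixed[j] / (j - i + 1)
                for k in range(i, j + 1):
                    fixed[k] = share
            i = j + 1
        else:
            i += 1
    return fixed


daily_tests = [1000, 0, 2100, 1050]
print(split_batched_reports(daily_tests))
```

An even split keeps the total number of tests unchanged while removing the artificial drop and spike; it does not recover the true day-by-day values.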

In either case, an unstable positive % confuses the model significantly, so we need to figure out a solution to either:

  • Remove these anomalies and let the model infer the true hidden value
  • Correct these anomalies using some kind of algorithm
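
As a rough illustration of the first option, suspect days could be replaced with a missing-value marker before the data reaches the model. This is a minimal sketch, not the project's code: `mask_suspect_days` and the `max_positive_rate` cutoff of 0.95 are assumptions chosen for the example, and how the model treats missing observations is left open.

```python
def mask_suspect_days(positives, totals, max_positive_rate=0.95):
    """Replace suspect (positive, total) pairs with None.

    A day is treated as suspect when no tests were reported, when positives
    exceed total tests, or when the positive rate reaches the cutoff.
    The cutoff is an illustrative assumption, not a calibrated value.
    """
    masked = []
    for pos, tot in zip(positives, totals):
        suspect = tot <= 0 or pos > tot or pos / tot >= max_positive_rate
        masked.append(None if suspect else (pos, tot))
    return masked


positives = [40, 215, 38]
totals = [800, 215, 760]
print(mask_suspect_days(positives, totals))
```

The middle day (215 of 215) is dropped, and the surrounding days are passed through unchanged.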

Currently @tvladeck and I have looked at Gaussian Processes and Kalman Filters as ways of detecting and perhaps correcting these issues. Other ideas are welcome too.

@gkossakowski

I ran into the same issue with 0 followed by 2x cases when processing data from ECDC. After checking out both Gaussian Processes and Kalman Filters, I settled on using a Hampel filter: https://nbviewer.jupyter.org/github/gkossakowski/covid-19/blob/master/Realtime%20Rt%20mcmc.ipynb#Hampel-filter-for-all-countries

It tends to catch ~90% of anomalies in reported case numbers. For the remaining 10%, I haven't found anything better than fixing up the data manually. It happens rarely enough that it's not a big issue. The advantage of the Hampel filter is that its behaviour is straightforward to understand.
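
For readers unfamiliar with it, a Hampel filter compares each point with the median of a sliding window and replaces it when it deviates by more than a few scaled median absolute deviations (MAD). The following is a generic, standard-library sketch and not the code from the linked notebook; `hampel_filter`, `half_window`, and `n_sigmas` are names and defaults assumed for illustration.

```python
from statistics import median


def hampel_filter(values, half_window=3, n_sigmas=3.0):
    """Return (cleaned_values, outlier_indices) for a 1-D sequence.

    Windows are truncated at the edges of the series. Outliers are replaced
    by the window median; windows are always taken from the original values.
    """
    scale = 1.4826  # makes the MAD comparable to a std dev for Gaussian data
    cleaned = list(values)
    outliers = []
    for i in range(len(values)):
        lo = max(0, i - half_window)
        hi = min(len(values), i + half_window + 1)
        window = values[lo:hi]
        med = median(window)
        mad = median(abs(v - med) for v in window)
        if abs(values[i] - med) > n_sigmas * scale * mad:
            cleaned[i] = med
            outliers.append(i)
    return cleaned, outliers


# A withheld day (0) followed by a doubled report (230).
cases = [100, 110, 105, 0, 230, 115, 120, 118, 125]
cleaned, flagged = hampel_filter(cases)
print(flagged)
print(cleaned)
```

One caveat of this simple form: when more than half of a window holds the same value, the MAD is zero and any deviation in that window is flagged, so a minimum MAD may be worth adding for very flat series.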
