Increasingly, states are reporting 100% positive tests on a given day (e.g. 215 of 215 tests came back positive). This throws the model off because it assumes the number of positive tests is roughly proportional to the actual number of tests performed. If a state reports 100% positive tests, Rt increases too quickly because of the faulty data point.
For instance, Ohio has a handful of days when total tests have clearly not been reported correctly and the positive % shoots up to 100%:
And in some cases, tests are withheld one day, only to be reported together with the next day's results:
In this case, drops in data are often followed by 2x the number of tests the following day.
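To make the two patterns above concrete, here is a rough sketch of how such days could be flagged. It is not what the model currently does; the function name, the 7-day window, and the `drop`/`spike` thresholds are all assumptions picked for illustration.

```python
from statistics import median

def flag_reporting_anomalies(positives, totals, window=7, drop=0.5, spike=1.5):
    """Label each day 'ok', 'all_positive', 'drop', or 'catch_up'.

    Thresholds are illustrative assumptions, not tuned values.
    """
    labels = ["ok"] * len(totals)
    for i, (pos, tot) in enumerate(zip(positives, totals)):
        # Positives equal to (or above) total tests: the total was
        # probably not reported correctly.
        if tot > 0 and pos >= tot:
            labels[i] = "all_positive"
            continue
        history = totals[max(0, i - window):i]
        if len(history) < 3:
            continue  # not enough context to judge this day
        typical = median(history)
        if tot < drop * typical:
            # Far fewer tests than usual: results may have been withheld.
            labels[i] = "drop"
        elif tot > spike * typical and labels[i - 1] == "drop":
            # Unusually many tests right after a drop: likely a catch-up day.
            labels[i] = "catch_up"
    return labels

# Made-up numbers: a withheld day, a catch-up day, and a 215-of-215 day.
totals = [1000, 1100, 950, 1050, 200, 2000, 1000, 215]
positives = [100, 110, 90, 105, 20, 200, 100, 215]
labels = flag_reporting_anomalies(positives, totals)
```

The trailing median is used instead of a mean so that a single bad day in the history does not shift what counts as "typical".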
In either case, an unstable positive % confuses the model significantly, so we need to figure out a solution that does one of the following:
- Remove these anomalies and let the model infer the true hidden value
- Correct these anomalies using some kind of algorithm
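As a minimal sketch of what the two options could look like in practice (both helper names are hypothetical, and the even split in the second one is just one simple assumption about how withheld tests should be redistributed):

```python
def mask_anomalies(values, flagged):
    """Option 1: replace flagged days with None so they are treated as missing."""
    return [None if bad else v for v, bad in zip(values, flagged)]

def pool_withheld_pair(values, drop_index):
    """Option 2: split the combined count of a low day and the following
    catch-up day evenly across the two days (the overall total is preserved)."""
    fixed = list(values)
    combined = fixed[drop_index] + fixed[drop_index + 1]
    fixed[drop_index] = combined // 2
    fixed[drop_index + 1] = combined - combined // 2
    return fixed

# Made-up daily test counts: day 1 is low, day 2 reports both days at once.
totals = [1000, 200, 2000, 1000]
masked = mask_anomalies(totals, [False, True, True, False])
pooled = pool_withheld_pair(totals, 1)
```

Masking leaves the model to infer the hidden values itself, while pooling keeps the data dense at the cost of assuming the tests were spread evenly over the two days.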
Currently @tvladeck and I have looked at Gaussian Processes and Kalman Filters as ways of detecting and perhaps correcting these issues. Other ideas are welcome too.
A Hampel filter tends to catch ~90% of anomalies in reported case numbers. For the remaining 10% I haven't found anything better than fixing up the data manually. That happens rarely enough that it's not a big issue. The advantage of the Hampel filter is that its behaviour is straightforward to understand.
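For anyone unfamiliar with it, here is a minimal pure-Python sketch of a Hampel filter. The comment above doesn't give its settings, so `half_window=3` and `n_sigmas=3.0` are assumptions; 1.4826 is the usual constant that scales the median absolute deviation (MAD) to a standard deviation for Gaussian data.

```python
from statistics import median

def hampel_filter(values, half_window=3, n_sigmas=3.0):
    """Return (cleaned_values, outlier_indices).

    A point is an outlier when it deviates from the median of its
    centred window by more than n_sigmas * 1.4826 * MAD; outliers are
    replaced by that window median.
    """
    k = 1.4826  # MAD-to-standard-deviation scale factor for Gaussian data
    cleaned = list(values)
    outliers = []
    for i in range(len(values)):
        lo = max(0, i - half_window)
        hi = min(len(values), i + half_window + 1)
        window = values[lo:hi]  # always taken from the original series
        med = median(window)
        mad = median([abs(v - med) for v in window])
        # Note: if MAD is 0 (a flat window), any deviation gets flagged.
        if abs(values[i] - med) > n_sigmas * k * mad:
            cleaned[i] = med
            outliers.append(i)
    return cleaned, outliers

# Made-up daily case counts with one obvious reporting spike.
cases = [100, 102, 98, 101, 500, 99, 103, 100, 97]
cleaned, outliers = hampel_filter(cases)
```

With only two knobs (window size and threshold), it is easy to see why a given point was or wasn't flagged, which is the appeal mentioned above.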