Hyperparameters for season in the Hill model (in R demo only) #117
I have updated the R demo to incorporate hyperparameters into the Hill model that allow the final uptake and the half-maximal date to vary by season. Because this is just in the R demo, which is for brainstorming and will not be part of the final codebase, it doesn't require a detailed review. This is just to demonstrate a few points, using the flu data.
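For readers who want the idea in concrete form, here is a minimal sketch of what "hyperparameters that let the final uptake and half-maximal date vary by season" can mean. It is not the R demo's code. The Hill parameterization `A * t^n / (H^n + t^n)`, the normal season-level distributions, the names `hill` and `draw_season_params`, the numeric values, and the season labels other than 2024/25 are all assumptions chosen for illustration.

```python
import random


def hill(t, A, H, n):
    """Cumulative uptake t days into the season under an assumed Hill curve.

    A is the final uptake, H is the half-maximal day (uptake equals A / 2
    when t == H), and n controls the steepness.
    """
    if t <= 0:
        return 0.0
    return A * t**n / (H**n + t**n)


def draw_season_params(rng, mu_A, sigma_A, mu_H, sigma_H):
    """Draw one season's (A, H) around shared means.

    mu_A, sigma_A, mu_H, and sigma_H play the role of hyperparameters:
    every season gets its own A and H, but all seasons are tied to the
    same underlying distribution. Draws are clamped to plausible ranges.
    """
    A = min(max(rng.gauss(mu_A, sigma_A), 0.0), 1.0)
    H = max(rng.gauss(mu_H, sigma_H), 1.0)
    return A, H


rng = random.Random(0)
seasons = ["2022/23", "2023/24", "2024/25"]  # illustrative labels
params = {s: draw_season_params(rng, 0.5, 0.05, 90.0, 10.0) for s in seasons}

for season, (A, H) in params.items():
    # Same curve family for every season, different shape per season.
    print(season, round(hill(120, A, H, n=3.0), 3))
```

In an actual fit the season-level values and the hyperparameters would be inferred jointly from the data rather than simulated, so a partially observed season can pull its own `A` and `H` toward what has been seen so far.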
As a reminder, here is what the entirety of the available flu data looks like:
![image](https://private-user-images.githubusercontent.com/67654180/410226130-7e72d57a-8b86-488b-901c-015c901c27ed.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzg5MTE4MjEsIm5iZiI6MTczODkxMTUyMSwicGF0aCI6Ii82NzY1NDE4MC80MTAyMjYxMzAtN2U3MmQ1N2EtOGI4Ni00ODhiLTkwMWMtMDE1YzkwMWMyN2VkLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMDclMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjA3VDA2NTg0MVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTk1OWExMjIxYjhjMzY5NWZhZWM2NDExNDBkNzc1MWUzMjhlYzdiOGNjNzc3ZDkxZTFiMDNmMWFiYThlNmRhMGQmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.O72QDZKcQif-_zegN2H7wJDV2LXJHpVtcWcehVKOHdw)
If a Hill model is fit to these data, and then a forecast for the 2024/25 season is made from late October 2024 onward, we get:
![image](https://private-user-images.githubusercontent.com/67654180/410227366-70fc645c-c718-4265-83d4-e5f65b58fc74.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzg5MTE4MjEsIm5iZiI6MTczODkxMTUyMSwicGF0aCI6Ii82NzY1NDE4MC80MTAyMjczNjYtNzBmYzY0NWMtYzcxOC00MjY1LTgzZDQtZTVmNjViNThmYzc0LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMDclMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjA3VDA2NTg0MVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTQyNTYyYjczMjY4Y2Q5ODc0ZDMzYjU0Mzk0NWYzZjUyNTYyNTBhYTIyOWUyYzEzM2IzMDlhNjgzMWMxOGNiMjYmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.64pCvvAEJQzJZOYRVY4MAemrvYEIn22Uf4-y2Dk0BWM)
Notice that (1) the forecast does not align with the data on the forecast date, and (2) the 95% credible interval starts off wide and barely widens further into the future.
Now we do the same thing, but including hyperparameters to allow the exact shape of the Hill curve to vary by season:
![image](https://private-user-images.githubusercontent.com/67654180/410227886-e75094f7-f199-4b66-804f-73964dc3e33b.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzg5MTE4MjEsIm5iZiI6MTczODkxMTUyMSwicGF0aCI6Ii82NzY1NDE4MC80MTAyMjc4ODYtZTc1MDk0ZjctZjE5OS00YjY2LTgwNGYtNzM5NjRkYzNlMzNiLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMDclMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjA3VDA2NTg0MVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTNhOTNhZjBiOGY1YzMyMTE5NzM2Yjc1Y2IxYzA1ZTBjYTNiNzFlNmVhOTdkM2Q3NTQyOTk0NWZhMzM0YTk3OWUmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.t6dY9S72k1ZafBiD4QeQ6xtoEBFhj7Ils-Sdhwtz6Zg)
Both problems are nearly solved! The forecast aligns closely (though not perfectly) with the data on the forecast date, and the credible interval starts off much narrower (though it still encompasses some values lower than the observed cumulative uptake on the forecast date).
We can probably ignore the last vestiges of these two problems, because we are moving toward a refactor in which we no longer trust the observed data to be synonymous with the true latent uptake. In that case, some uncertainty about what the true uptake is, even on dates that have already been observed, is totally acceptable.
I think this is a nice validation of @swo's intuition for how prediction uncertainty should be greater at the tail of a sigmoid curve as opposed to the middle, and a nice validation of our collective intuition that grouping the data on season is important.