
Change Getting started documentation to point users to faster models #730

Open
jamesmbaazam opened this issue Jul 31, 2024 · 5 comments · May be fixed by #695

jamesmbaazam (Contributor) commented Jul 31, 2024

This issue can be solved after merging the benchmarking vignette in #695.

I was curious about how the paper "Real-time estimation of the epidemic reproduction number: Scoping review of the applications and challenges" measured run times of the various R packages and came across this section in the supplementary material where more details are given (page 3):

Computational speed has been assessed by each author of this study with different computer specifications (see Table B). The main function in each package/tool was taken from provided examples (if available) and wrapped in the system.time() R function to measure the execution time. The main function estimated the reproduction number for each R package/tool except for epicontacts, which estimates the serial interval. We chose the following classifications: <10 seconds = very good, 10 seconds – 5 minutes = good, >5 minutes = poor. The classification allocated to each package was based on the agreement of at least 2 out of the 3 computers. We note that such direct comparisons of the runtimes of the different models may not be fair, as the examples provided by each package which we have used to assess speed vary in terms of the dataset used, model complexity, and dimensionality of the reproduction number to estimate. Nevertheless, we assume that examples will always be relatively simple and therefore their computational speed may be a good overall indicator of speed of reproduction number estimation in general using a given package.
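
As a side note for the vignette, the measurement boils down to something like the minimal sketch below (here `run_example()` is a hypothetical placeholder for whichever package's documented example call is being timed, not a real function):

```r
# Minimal sketch of the timing approach described above: wrap a package's
# documented example call in system.time() and read off the elapsed
# wall-clock time. run_example() is a hypothetical placeholder.
timing <- system.time({
  result <- run_example()
})

# Elapsed seconds, scored as: <10 s = very good, 10 s to 5 min = good,
# >5 min = poor.
timing[["elapsed"]]
```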

This has got me thinking about whether we should change our docs to use the quicker models that sacrifice some accuracy, since those examples are the first thing users will interact with (copy and paste to try out), with a clear caveat. We can then signpost the slower but more accurate models for real-world use cases.

I'm also making a note here to raise an issue in EpiEstim to re-assess the speed score in that table using the faster and relatively accurate models for which there is evidence in #695.

jamesmbaazam changed the title from "Change Getting started documentation to point use to faster models" to "Change Getting started documentation to point users to faster models" on Jul 31, 2024
jamesmbaazam linked a pull request on Jul 31, 2024 that will close this issue
seabbs (Contributor) commented Jul 31, 2024

Do we want users to use models that are less accurate? That is the implicit trade-off of showcasing faster, more approximate models as the first thing people see (as a lot of people will then just use those).

> I was curious about how the paper "Real-time estimation of the epidemic reproduction number: Scoping review of the applications and challenges" measured run times of the various R packages and came across this section in the supplementary material where more details are given (page 3):

I'm not sure this really represents what a real user would do when assessing a package, or, more generally, a very credible package review, so I am not super keen to make decisions that optimise for it.

All that being said, I really don't feel that strongly. I think the minimum we should do is clearly point people to the fact that there are different model formulations they could use that have different properties.
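
To make that concrete, the signposting could look roughly like the sketch below. This assumes we are talking about the EpiNow2 defaults and that the option names still match the current docs; they are from memory, not verified, so treat them as assumptions:

```r
# Rough sketch only; the option names are assumed and should be checked
# against the current documentation before going into the guide.
library(EpiNow2)

# Default Getting started configuration: full MCMC with a Gaussian process
# on Rt. More accurate, but slow.

# Faster, more approximate configurations we could signpost, each with a
# caveat about reduced accuracy:
# - approximate inference instead of MCMC:  stan = stan_opts(method = "vb")
# - drop the Gaussian process on Rt:        gp = NULL
# - non-mechanistic back-calculation:       rt = NULL
```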

jamesmbaazam (Contributor, Author):

All good points. I think more generally though, new users will use tools they have seen others use and may avoid ones that had a bad or unfair review. So, if people see elsewhere that our models are slow, they may not even try them. Moreover, I may be wrong in saying this, but users often do not take the time to try out different packages before making a choice.

seabbs (Contributor) commented Jul 31, 2024

> if people see elsewhere that our models are slow

To be honest, my view is that we need better multi-model evaluations run across groups, rather than optimising for the current status quo.

jamesmbaazam (Contributor, Author):

Alright. I'll close this issue.

seabbs (Contributor) commented Aug 1, 2024

Do we want to reopen this and, instead of changing the default, improve the signposting to faster model configurations?

jamesmbaazam reopened this on Aug 1, 2024
jamesmbaazam self-assigned this on Aug 7, 2024