Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make the dummy data timeout configurable #2258

Merged
merged 2 commits into from
Nov 28, 2024
Merged

Conversation

evansd
Copy link
Contributor

@evansd evansd commented Nov 28, 2024

We were deliberately being slow in adding configuration options to avoid locking ourselves in to any regrettable APIs. However this option now has a demonstrable use case, and also doesn't really lock us in to much of anything.

Docs now look like:

configure_dummy_data(population_size=10, legacy=False, timeout=60)
Configure the dummy data to be generated.

population_size
Maximum number of patients to generate.

Note that you may get fewer patients than this if the generator runs out of time – see timeout below.

legacy
Use legacy dummy data.

timeout
Maximum time in seconds to spend generating dummy data.

Related threads:
https://bennettoxford.slack.com/archives/C01D7H9LYKB/p1732791626217789
https://bennettoxford.slack.com/archives/C07MZ5WCJV6/p1732800801777969
https://bennettoxford.slack.com/archives/C07MZ5WCJV6/p1732801544406129

We're seeing cases where the generator is making progress, but slow
enough that the default timeout doesn't generate enough patients. Being
able to bump up the timeout would unblock users in this particular case,
although it's obviously not a great solution.

As part of this we use the default values from the config class to
define the keyword argument defaults, and we remove the default values
from the docstring (as they're already shown in the docs in the function
signature). These changes reduce the possibility for these values to go
out of sync.
Copy link
Contributor

@DRMacIver DRMacIver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As well as two nitpicky review comments inline, two more:

  • I wonder if it would be helpful somewhere to have a test that we actually respect the configured timeout. I think the code is correct and we do pass it through correctly, but this seems like the sort of thing that would be easy to get wrong.
  • It might be worth logging how to change the timeout in the dummy data generators. We already do this for the population size.

None of this is merge-blocking though, these are just my slightly nitpicky thoughts in the course of reading through it. Please feel free to do any or none of them.


_legacy_<br>
Use legacy dummy data.

_timeout_<br>
Maximum time in seconds to spend generating dummy data.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be too pedantic to be worth noting but this isn't technically right. The timeout is the time after which the generator will stop trying to generate more dummy data. In the event you hit the timeout it will typically take longer than the timeout.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is true. But I can't think of a way of writing this which makes it more correct without simultaneously making it harder to understand, and to some extend defeating the point of documenting it in the first place.

Unless we just say:

Maximum time in seconds (approximately) to spend generating dummy data.

@@ -102,7 +102,8 @@ def test_configured_population_size(legacy):
intervals=years(1).starting_on("2020-01-01"),
)

measures.configure_dummy_data(population_size=10, legacy=legacy)
measures.configure_dummy_data(population_size=99, legacy=legacy, timeout=123)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The answer to this doesn't really matter, but for my curiousity, why did the population size change here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just noticed that it was setting it to the default value (or what is now the default, maybe it wasn't when the test was written) and therefore this would pass even if the argument was completely ignored.

@evansd
Copy link
Contributor Author

evansd commented Nov 28, 2024

Yes, I agree with both of these, thanks. I think I might merge now though so as to unblock the user (because I need to head out in a bit) and then follow these up later. Both are worth doing though.

@evansd evansd merged commit 1552455 into main Nov 28, 2024
9 checks passed
@evansd evansd deleted the evansd/dummy-data-timeout branch November 28, 2024 14:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants