Add explicit tests to check Hermes can deal with arbitrarily large description terms #66

Closed
wardle opened this issue May 19, 2024 · 0 comments

wardle (Owner) commented May 19, 2024

As per https://confluence.snomedtools.org/mag/community-consultations/snomed-international-proposal-to-increase-description-length-limit, SNOMED International is consulting on whether to increase the description length limit to 4096. In this proposal, users are reminded that:

"The RF2 Specification for SNOMED CT Descriptions states that the overall length limit for a description is 32Kb (understood to be Kilobits), equating to 4096 single byte characters."
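Spelled out, the conversion in the quoted passage (reading Kb as kilobits of 1024 bits each) is:

$$32\ \text{Kb} = 32 \times 1024\ \text{bits} = 32768\ \text{bits} = \frac{32768}{8}\ \text{bytes} = 4096\ \text{bytes}$$

This corresponds to 4096 characters only when every character occupies a single byte.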

The Hermes tests should therefore already check that descriptions of arbitrary length can be stored, retrieved and searched.

A quick examination of the synthetic unit tests shows that, while generative testing is good at exercising these functions, the generators start with small strings and only increase their size as the number of tests grows. A cursory report of the string lengths actually generated over a reasonable number of iterations shows that strings rarely exceed 50 characters. The current generative tests are therefore insufficient to show that Hermes behaves correctly with very long descriptions.
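A length report of this kind can be reproduced with a few lines at the REPL. This is only a sketch: it assumes the description term is built from something like test.check's `gen/string-alphanumeric`, and `sampled-lengths` is a hypothetical helper, not part of Hermes.

```clojure
(require '[clojure.test.check.generators :as gen])

;; gen/sample draws successive values at a growing `size` (0, 1, 2, ...).
;; The default string generators pick a length between 0 and the current
;; size, so early samples are always short.
(defn sampled-lengths
  "Returns the lengths of n strings sampled from string generator g.
  Hypothetical helper for illustration only."
  [g n]
  (map count (gen/sample g n)))

(def lengths (sampled-lengths gen/string-alphanumeric 100))

;; Summarise what was actually generated.
(println {:max  (apply max lengths)
          :mean (double (/ (reduce + lengths) (count lengths)))})
```

Because the length is bounded by the size parameter, 100 samples cannot contain a string longer than 99 characters, and most are far shorter.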

The synthetic test generators should therefore be changed to create very large strings. This is worthwhile even if SNOMED International does not increase the description length limit, because the implementation of store and search within Hermes uses no fixed-size buffers and so should cope with descriptions of any length. To keep the synthetic data realistic, large strings should be synthesised for only a small proportion of generated descriptions. Fortunately, test.check, which is based on Haskell's QuickCheck, makes this easy: gen/frequency chooses between generators according to specified weights.
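A minimal sketch of the gen/frequency approach follows. The generator names, the 19:1 weighting and the 1000 to 4096 character bounds are illustrative assumptions, not the values used in the Hermes test suite.

```clojure
(require '[clojure.string :as str]
         '[clojure.test.check.generators :as gen])

;; Ordinary terms: length grows with test.check's size parameter.
(def gen-short-term
  gen/string-alphanumeric)

;; Very long terms: between 1000 and 4096 characters (assumed bounds),
;; regardless of the current size parameter.
(def gen-long-term
  (gen/fmap str/join (gen/vector gen/char-alphanumeric 1000 4096)))

;; Weighted choice: on average 19 in 20 terms are short, 1 in 20 is long.
(def gen-term
  (gen/frequency [[19 gen-short-term]
                  [1 gen-long-term]]))

;; Count how many of 200 sampled terms came from the long generator.
(println (frequencies (map #(>= (count %) 1000) (gen/sample gen-term 200))))
```

The weights are relative, so `[[19 a] [1 b]]` selects `b` about 5% of the time. This keeps most synthetic descriptions realistic while still exercising long terms on most test runs.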

To resolve this issue, we need to do the following:

  • Alter the generator for RF2 descriptions to potentially generate very large descriptions
  • Improve the synthetic tests to check that descriptions of any length are correctly stored, retrieved, indexed and found
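The second step could take the shape of a round-trip property such as the one sketched below. The store here is a hypothetical in-memory stand-in (a map in an atom) so that the example is self-contained. `make-store`, `store!`, `fetch` and `search` are invented for illustration and are not the Hermes API; a real test would call Hermes's own import, lookup and search functions instead.

```clojure
(require '[clojure.string :as str]
         '[clojure.test.check :as tc]
         '[clojure.test.check.generators :as gen]
         '[clojure.test.check.properties :as prop])

;; HYPOTHETICAL stand-in store: id -> term, held in an atom.
(defn make-store [] (atom {}))
(defn store! [st id term] (swap! st assoc id term))
(defn fetch [st id] (get @st id))
(defn search
  "Returns the set of ids whose term contains the string s."
  [st s]
  (set (for [[id term] @st :when (str/includes? term s)] id)))

;; Mostly short terms, occasionally very long ones (assumed weights/bounds).
(def gen-term
  (gen/frequency
    [[19 gen/string-alphanumeric]
     [1 (gen/fmap str/join (gen/vector gen/char-alphanumeric 1000 4096))]]))

;; Property: a stored term is retrieved unchanged and found by search,
;; whatever its length.
(def round-trip
  (prop/for-all [id gen/nat
                 term gen-term]
    (let [st (make-store)]
      (store! st id term)
      (and (= term (fetch st id))
           (contains? (search st term) id)))))

(def result (tc/quick-check 100 round-trip))
(println (select-keys result [:result :num-tests]))
```

If the property fails, test.check shrinks the failing term, which helps show whether the failure depends on length at all.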
wardle added a commit that referenced this issue May 19, 2024
wardle closed this as completed in c278f3b May 19, 2024