Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[KPIS] Fix test for KPI45 sporadically failing #352

Merged
merged 6 commits into from
Nov 8, 2024

Conversation

anchit-chandran
Copy link
Contributor

@anchit-chandran anchit-chandran commented Nov 7, 2024

Signed-off-by: anchit-chandran anchit97123@gmail.com

Overview

A few things were going on here, and was a fun rabbit hole!

TL/DR : main things contributing to the random failing were:

  1. Floating point precision
  2. SQL median is weird
  3. Accidentally using random values for Patient.diagnosis_date

All have been fixed. Run the test many times as it should now be fine (importantly, no random values are being used now, so one pass = deterministic!).

Fun details on above:

1. Floating point precision

We have our Visit.hba1c field defined as a DecimalField. The actual kpi calculation method querying this outputs a python Decimal( ) object. My test was just using float( ) objects. Good article on difference: https://www.laac.dev/blog/float-vs-decimal-python/ but generally:

  • floats are good for speed & convenience, use the classic IEEE 754 binary representation, can lead to rounding precision errors
  • Decimal, somehow uses base-10 representation`, but means it can accurately represent decimal numbers without the traditional floating point errors

This was partly leading to errors as tests are comparing different types of values. Fixed by converting expected values into Decimal. The output of the calculate_kpi_45_median_hba1c still returns a float though for convenience

2. SQL median is weird

There is no standard implementation of median in Django's ORM because it's more complicated to calculate. I created a custom SQL class for it but that ended up being harder to reason with. Instead, did a simpler approach of just using ORM to retrieve a list of hba1c values, and then calculate the median in Python. Much more straightforward.

3. Accidentally using random values for Patient.diagnosis_date

Rookie error! In the tests, just forgot to set PatientFactory.diagnosis_date. Expected behavior of the PatientFactory is to generate random valid values for any not given (to make it easier to create Patient objects).

Fixed by just defining.

Code changes

  • tests updated to use Decimal() values for hba1c instead of float
  • tests updated to specify diagnosis_date, thus not relying on the PatientFactory() generating a random diagnosis_date
  • tests updated to be a little more clear for the invalid case
  • calculate_median implementation moved into Python instead of SQL, simplifying and circumventing harder-to-reason precision errors
  • calculate_kpi_45_median_hba1c method more straightforward because of using a Python way of calculating median
  • also update type annotations of returns from calculate_kpi_ methods to KPIResult

Related Issues

https://github.com/orgs/rcpch/projects/13?pane=issue&itemId=85211949&issue=rcpch%7Cnational-paediatric-diabetes-audit%7C338

Mentions

@mentions of the person or team responsible for reviewing proposed changes.

Signed-off-by: anchit-chandran <anchit97123@gmail.com>
Signed-off-by: anchit-chandran <anchit97123@gmail.com>
Signed-off-by: anchit-chandran <anchit97123@gmail.com>
Signed-off-by: anchit-chandran <anchit97123@gmail.com>
Signed-off-by: anchit-chandran <anchit97123@gmail.com>
Signed-off-by: anchit-chandran <anchit97123@gmail.com>
@anchit-chandran anchit-chandran marked this pull request as ready for review November 7, 2024 10:04
@anchit-chandran anchit-chandran merged commit 383e607 into live Nov 8, 2024
1 check passed
@anchit-chandran anchit-chandran deleted the kpis/test_kpi_45-sporadic-fails branch November 8, 2024 10:24
@mbarton
Copy link
Member

mbarton commented Nov 8, 2024

Seen on STAGING (merged by @anchit-chandran 5 minutes and 24 seconds ago) Please check your changes!

anchit-chandran added a commit that referenced this pull request Nov 16, 2024
Signed-off-by: anchit-chandran <anchit97123@gmail.com>
anchit-chandran added a commit that referenced this pull request Nov 18, 2024
* add indiv pt kpi types

Signed-off-by: anchit-chandran <anchit97123@gmail.com>

* update vscode ignore to just settings.json

Signed-off-by: anchit-chandran <anchit97123@gmail.com>

* implements vscode debugger

Signed-off-by: anchit-chandran <anchit97123@gmail.com>

* refactor to new individual pts kpi results

Signed-off-by: anchit-chandran <anchit97123@gmail.com>

* very basic refactor to implement new results obj

Signed-off-by: anchit-chandran <anchit97123@gmail.com>

* separate debugpy into completely separate override so we don't accidentally mix anything with prod. need slightly different env vars + server run cmds

Signed-off-by: anchit-chandran <anchit97123@gmail.com>

* add fix for csv heading pdu field no longer having model

Signed-off-by: anchit-chandran <anchit97123@gmail.com>

* fix csv fn

Signed-off-by: anchit-chandran <anchit97123@gmail.com>

* skip single pt test for now

Signed-off-by: anchit-chandran <anchit97123@gmail.com>

* unused imports

Signed-off-by: anchit-chandran <anchit97123@gmail.com>

* apply same fix from #352 for test kpi 44

Signed-off-by: anchit-chandran <anchit97123@gmail.com>

* forgot to remove old Median class, instead calc using same fix as calc 45

Signed-off-by: anchit-chandran <anchit97123@gmail.com>

* `string` is the dtype in pandas https://pandas.pydata.org/docs/user_guide/text.html

Signed-off-by: anchit-chandran <anchit97123@gmail.com>

* handle converting to right dtype with null vals

Signed-off-by: anchit-chandran <anchit97123@gmail.com>

* bp vals will exceed bounds of Int8 (signed bounds -128 to 127)

Signed-off-by: anchit-chandran <anchit97123@gmail.com>

* working csv fn

Signed-off-by: anchit-chandran <anchit97123@gmail.com>

* refactor mutex check to enable test

Signed-off-by: anchit-chandran <anchit97123@gmail.com>

* adds csv generator upload integration test

Signed-off-by: anchit-chandran <anchit97123@gmail.com>

* mock remote services

Signed-off-by: anchit-chandran <anchit97123@gmail.com>

* adds doc strings

Signed-off-by: anchit-chandran <anchit97123@gmail.com>

* refactor single pt kpi outcomes table

Signed-off-by: anchit-chandran <anchit97123@gmail.com>

* dont actually need nhs number here as for this object to have been made, requires a Patient object. so just use that Patient object input

Signed-off-by: anchit-chandran <anchit97123@gmail.com>

* rm nhs number

Signed-off-by: anchit-chandran <anchit97123@gmail.com>

* single pt table tweaks

Signed-off-by: anchit-chandran <anchit97123@gmail.com>

* update types

Signed-off-by: anchit-chandran <anchit97123@gmail.com>

* ASSERT TYPES

Signed-off-by: anchit-chandran <anchit97123@gmail.com>

* refactor into own module

Signed-off-by: anchit-chandran <anchit97123@gmail.com>

* adds test calc kpis single pt

Signed-off-by: anchit-chandran <anchit97123@gmail.com>

* rename test

Signed-off-by: anchit-chandran <anchit97123@gmail.com>

* adds docs re debugger

Signed-off-by: anchit-chandran <anchit97123@gmail.com>

* rm breakpoint

Signed-off-by: anchit-chandran <anchit97123@gmail.com>

---------

Signed-off-by: anchit-chandran <anchit97123@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants