Create data generator for csv #328

Closed
anchit-chandran opened this issue Oct 29, 2024 · 1 comment

@anchit-chandran
Contributor

anchit-chandran commented Oct 29, 2024

The following CSV values in dummy_sheet_invalid.csv differ from the data model. How should we map data model values to CSV values, @eatyourpeas? In particular, how should null values be handled (they will be blank in the resulting CSV)?

True / False fields

LEFT = our data model's value (i.e. the enum value the data generator outputs into the CSV)
RIGHT = the current value in dummy_sheet_invalid.csv (generally just true/false)

glucose_monitoring

thyroid_treatment_status

gluten_free_diet

psychological_additional_support_status
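
A minimal sketch of one possible mapping, going from the legacy true/false cells in dummy_sheet_invalid.csv to integer enum keys and back. The lookup table, the key values 1 and 2, and the function names are all hypothetical placeholders rather than the project's real choices; a blank cell is kept as None so that null survives the round trip.

```python
from typing import Optional

# Hypothetical lookup: legacy true/false strings in the dummy CSV -> integer
# enum keys in the data model. These keys are placeholders, not the real choices.
LEGACY_BOOL_TO_KEY = {
    "glucose_monitoring": {"true": 1, "false": 2},
    "gluten_free_diet": {"true": 1, "false": 2},
}


def legacy_to_key(field: str, raw: str) -> Optional[int]:
    """Map a legacy CSV cell to an enum key; a blank cell stays None (null).

    Any other unexpected string raises KeyError, which surfaces bad values early.
    """
    text = raw.strip().lower()
    if not text:
        return None
    return LEGACY_BOOL_TO_KEY[field][text]


def key_to_cell(key: Optional[int]) -> str:
    """Write an enum key to a CSV cell; None becomes a blank cell."""
    return "" if key is None else str(key)
```

Keeping the lookup per field means each true/false column can later be pointed at its own enum without changing the conversion logic.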

All of the following use YES_NO_UNKNOWN:

smoking_status
Does the patient smoke?

dietician_additional_appointment_offered
Was the patient offered an additional appointment with a paediatric dietitian?

verbose_name
Was the patient using (or trained to use) blood ketone testing equipment at time of visit?
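
For the YES_NO_UNKNOWN fields, a data generator could draw a random choice key for each row and occasionally emit None for a measure that was not captured. This is a sketch only: the keys 1, 2 and 99 are assumed from the [1,2,99] enum example in this thread, and YES_NO_UNKNOWN_FIELDS, generate_yes_no_unknown, generate_row and null_rate are hypothetical names.

```python
import random
from typing import Dict, Optional

# Hypothetical YES_NO_UNKNOWN choices as (key, label) pairs; the keys are
# assumed from the [1, 2, 99] example, and the real definition may differ.
YES_NO_UNKNOWN = ((1, "Yes"), (2, "No"), (99, "Unknown"))

# Hypothetical subset of fields that use these choices.
YES_NO_UNKNOWN_FIELDS = ("smoking_status", "dietician_additional_appointment_offered")


def generate_yes_no_unknown(rng: random.Random, null_rate: float = 0.2) -> Optional[int]:
    """Return a random choice key, or None to mimic a measure not captured."""
    if rng.random() < null_rate:
        return None
    return rng.choice([key for key, _label in YES_NO_UNKNOWN])


def generate_row(rng: random.Random) -> Dict[str, Optional[int]]:
    """Generate one value per YES_NO_UNKNOWN field."""
    return {field: generate_yes_no_unknown(rng) for field in YES_NO_UNKNOWN_FIELDS}
```

Passing a seeded random.Random makes the generated CSV reproducible between runs.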

NaN becomes float

Pandas casts an integer column that contains NaN to float. This means that CSV columns containing NaNs can either:

  1. Have blank values for the NaNs, but the existing values are written as floats, so e.g. an enum of [1, 2, 99] becomes [1.0, 2.0, 99.0]; or
  2. Use an integer value we specify to fill in for NaN, so that every row has a value.

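
The two options can be reproduced with a small pandas sketch. The column names, values and the -1 sentinel are hypothetical; the point is that a single missing value turns the whole column into float64.

```python
import pandas as pd

# Hypothetical generated rows: smoking_status holds integer enum keys,
# with None where the measure was not captured.
rows = {"row_id": [1, 2, 3, 4], "smoking_status": [1, 2, None, 99]}
df = pd.DataFrame(rows)

# The missing value becomes NaN, which forces the whole column to float64.
status_dtype = df["smoking_status"].dtype

# Option 1: keep the blank, but every other key is written as a float (1.0, 99.0).
option1_csv = df.to_csv(index=False)

# Option 2: fill NaN with an integer sentinel, then cast the column back to int.
MISSING = -1  # hypothetical sentinel, not a value defined by the project
option2 = df.assign(smoking_status=df["smoking_status"].fillna(MISSING).astype(int))
option2_csv = option2.to_csv(index=False)
```

Option 2 keeps every cell an integer, but the sentinel then has to be translated back to null on import, otherwise it could be mistaken for a real key.
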
@eatyourpeas
Member

Pandas is doing something we do not want. In the choices, the numbers are the keys and the values are the text passed to us by NPDA. The models use these choices to store the results. The keys are all integers, but null is allowed, and blank is allowed in forms, because some measures are only captured once a year (e.g. coeliac).
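
One way to keep integer keys while still allowing null, assuming the generator builds its output as a pandas DataFrame, is pandas' nullable integer dtype "Int64" (capital I), which stores missing values as pd.NA instead of forcing a float column. A sketch with hypothetical column names and values:

```python
import pandas as pd

# Nullable integer dtype: integer keys stay integers, missing values become pd.NA.
smoking_status = pd.Series([1, 2, None, 99], dtype="Int64")
df = pd.DataFrame({"row_id": [1, 2, 3, 4], "smoking_status": smoking_status})

# pd.NA is written using na_rep (default ""), so null stays a blank cell
# and the remaining keys are written without a trailing ".0".
csv_text = df.to_csv(index=False)
```

When reading such a file back, passing dtype={"smoking_status": "Int64"} to pd.read_csv should keep the column as nullable integers rather than float64.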
