[Seeding] Create dummy csv generator #341
Signed-off-by: anchit-chandran <anchit97123@gmail.com>
This all looks good. A few small suggestions. For the seed options, I think possibly the My guess is that when we ask people to test, they will want to upload spreadsheets to try it out. We should give them a choice of different pregenerated ones, some with errors and some without, and the parameters above will probably be the kind of thing they are interested in. So for example, we might generate a csv with 150 patients, 50% boys/50% girls, where 90% have T1DM, 9% have T2DM and the rest have a mix of MODY/CFRD, with a range of 10-15 visits each. Of the 150, 15% might have above-target HbA1c, 5% well above, and the rest in range. In the visits, we could say 20% have an error of some kind, so that testers could then correct the error in the csv, reupload, and see that the submission is overwritten and the error is corrected. I hope that makes sense. Also, when I try to upload the generated spreadsheet I get a bunch of parse errors. These may all be related to the heading issues with extra spaces etc. that @dc2007git is working on at the moment.
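As a rough illustration of the kind of parameterised generation suggested above, here is a minimal sketch using only the Python standard library. It is not the generator in this PR: the function name, field names, and the even MODY/CFRD split of the remaining 1% are all assumptions made for the example.

```python
import random

# Hypothetical weights taken from the example in the comment above.
# The even split of the remaining 1% between MODY and CFRD is an assumption.
DIABETES_TYPE_WEIGHTS = {"T1DM": 90, "T2DM": 9, "MODY": 0.5, "CFRD": 0.5}
HBA1C_BAND_WEIGHTS = {"in_range": 80, "above_target": 15, "well_above_target": 5}


def weighted_choice(rng, weights):
    """Pick one key from `weights`, with probability proportional to its value."""
    options = list(weights)
    return rng.choices(options, weights=[weights[o] for o in options], k=1)[0]


def generate_visit_rows(n_patients=150, visits_range=(10, 15),
                        visit_error_rate=0.20, seed=0):
    """Return one dict per visit; all field names here are hypothetical."""
    rng = random.Random(seed)  # seeded so a pregenerated file is reproducible
    rows = []
    for patient_id in range(1, n_patients + 1):
        # Alternate sexes to get an exact 50/50 split rather than a random one.
        sex = "male" if patient_id % 2 else "female"
        diabetes_type = weighted_choice(rng, DIABETES_TYPE_WEIGHTS)
        hba1c_band = weighted_choice(rng, HBA1C_BAND_WEIGHTS)
        n_visits = rng.randint(*visits_range)  # inclusive at both ends
        for visit_number in range(1, n_visits + 1):
            rows.append({
                "patient_id": patient_id,
                "sex": sex,
                "diabetes_type": diabetes_type,
                "hba1c_band": hba1c_band,
                "visit_number": visit_number,
                # Flag roughly 20% of visits as ones that should carry an error.
                "has_error": rng.random() < visit_error_rate,
            })
    return rows
```

The rows could then be written out with `csv.DictWriter`; which concrete invalid values to inject for the flagged visits is left open, since that is exactly the question raised in the replies.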
Definitely agreed! That's WIP: #329
Currently no way to specify errors through generator (all visit types result in valid measures). Would you be able to advise classic errors values for current I'll update
What do errors look like? I'm renaming columns to have those weird extra spaces here https://github.com/rcpch/national-paediatric-diabetes-audit/pull/341/files#diff-ae3c6e15c8db6e9b82c8e395e9afb5b7ce7c4ff9878b6567caf41a222b1b2949R248
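For context on the extra-spaces heading issue, one common way for a reader to tolerate such headers is to normalise whitespace before matching column names. The sketch below uses only the standard library and hypothetical column names; it is not the project's parsing code.

```python
import csv
import io


def normalise_header(name):
    """Strip leading/trailing whitespace and collapse internal runs to one space."""
    return " ".join(name.split())


def read_rows(csv_text):
    """Parse CSV text into dicts keyed by normalised header names."""
    reader = csv.reader(io.StringIO(csv_text))
    headers = [normalise_header(h) for h in next(reader)]
    return [dict(zip(headers, row)) for row in reader]
```

With this, a header such as `" Date of  Birth"` would be looked up as `"Date of Birth"`.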
The first error I got was a typo in the clean method in the The new error looks to be getting a
I'll take a look at that second exception, seems like it's barfing trying to return unexpected errors. Raised #345
After this PR you should see what field and row are causing the issue: #346
…rse_dates': 'Observation Date: Thyroid Function'`) Signed-off-by: anchit-chandran <anchit97123@gmail.com>
project/npda/dummy_sheets/local_generated_data/npda_seed_data-5-CDCDDHPCACDCCDCD.csv
e986368 to ed8fab9
I tried this out again against
@eatyourpeas I've pulled that commit out to its own PR and added a unit test #360
OK now #360 is merged I'm able to upload a CSV file generated from this PR:
docker compose exec django python manage.py create_csv --pts 2 --visits "CDCD DHPC ACDC CDCD" --hb_target A
I get some validation errors. They might be incorrect, or rules we want to remove, but let's address that in future PRs off live as this one has got quite big.
@anchit-chandran happy to merge?
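For readers unfamiliar with how flags like these are declared: Django management commands define their options on an `argparse` parser in `add_arguments`. The sketch below uses plain `argparse` to show the shape of the invocation above. Only the flag names `--pts`, `--visits` and `--hb_target` come from the command; the help texts and the whitespace-splitting helper are assumptions, and the meaning of the individual visit letters is not shown here.

```python
import argparse


def build_parser():
    """Build a parser mirroring the flags used in the command above."""
    parser = argparse.ArgumentParser(prog="create_csv")
    parser.add_argument("--pts", type=int, required=True,
                        help="number of dummy patients (assumed meaning)")
    parser.add_argument("--visits", type=str, required=True,
                        help="space-separated groups of visit codes (assumed meaning)")
    parser.add_argument("--hb_target", type=str, required=True,
                        help="HbA1c target option (assumed meaning)")
    return parser


def split_visit_groups(visits):
    """Split the quoted --visits string into its whitespace-separated groups."""
    return visits.split()
```

Parsing `["--pts", "2", "--visits", "CDCD DHPC ACDC CDCD", "--hb_target", "A"]` with this parser yields `pts=2` and four visit groups.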
Yes I think the values for I've raised #362, which gives you back the column name and row index, making debugging much easier
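To illustrate why having the column name and row index helps, here is a minimal sketch of collecting validation failures per cell. It is not the code in #362: the validators, column names and error dict shape are hypothetical.

```python
from datetime import datetime


def parse_date(value):
    """Parse a dd/mm/YYYY date string; raises ValueError on any other format."""
    return datetime.strptime(value, "%d/%m/%Y").date()


# Hypothetical column -> validator mapping, for illustration only.
VALIDATORS = {
    "Date of Birth": parse_date,
    "Height (cm)": float,
}


def collect_errors(rows):
    """Return one entry per failing cell, recording its row index and column."""
    errors = []
    for row_index, row in enumerate(rows):
        for column, validator in VALIDATORS.items():
            try:
                validator(row[column])
            except (KeyError, ValueError) as exc:
                errors.append({"row": row_index, "column": column,
                               "message": str(exc)})
    return errors
```

Collecting every failure instead of stopping at the first means a tester can fix all the bad cells in a csv before reuploading.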
I have hopefully fixed the datatype clashes we have had by explicitly defining what the datatype should be for each column in the Pandas
This PR is now passing all tests and it is possible to upload a csv generated by the generator. I have done this (hope this is ok Anchit) by introducing some dtype constraints at the time of dataframe creation. I also discovered that the patient_generator_extended was randomizing booleans for closed_loop rather than the constants, so that was generating an error on import. I am finding that the generator does not seem to be randomizing sexes equally, but is returning lots of Nones or unknowns.
Finally, none of the visit data seems to be saving, but no errors are being raised.
This PR has become kinda complicated now so I am going to approve this and raise 2 new issues:
- randomization of sex in generator - hopefully a simple fix
- visit data not saving on upload
Seen on STAGING (created by @anchit-chandran and merged by @eatyourpeas 5 minutes and 39 seconds ago) Please check your changes!
Signed-off-by: anchit-chandran anchit97123@gmail.com
Overview
Adds create_csv manage.py cmd to create csv of dummy pts, saving in a folder whose contents are gitignored (only takes <5secs to generate >10k rows).
Docstrings have documentation, snippet:
Code changes
create_csv file
Documentation changes (done or required as a result of this PR)
In docstrings
Related Issues
https://github.com/orgs/rcpch/projects/13/views/1?pane=issue&itemId=85172188&issue=rcpch%7Cnational-paediatric-diabetes-audit%7C328