This repository has been archived by the owner on Jul 24, 2024. It is now read-only.

Report devel #208

Merged
merged 11 commits into from
Apr 19, 2020

Conversation

jkamins7
Contributor

No description provided.

@jkamins7 jkamins7 merged commit 9a21517 into dataseed Apr 19, 2020
jkamins7 added a commit that referenced this pull request Apr 21, 2020
* tidied up code for build_hospdeath_geoid_fixedIFR_par(); made output file faster and smaller

* Add python argument to make_makefile.R

* Remove file autogenerated by importation

* Try to print a fake secret to see if env is working

* Update ci.yml

* Update ci.yml

* to all caps

* Update README.md

* Updated to use optimize branch of covidImportation

* Getting the version of covidImportation that the script works with

* Run if not there instead of the opposite

* Print error if config not specified

* Need parallel back in here for detectCores

* Fix missing comma

* adding option to switch hosp script version

* Bad merge

* Modify report generation targets in Makefile

* Remove unused include_in_report column from test1 geodata

* Add integration test for report generation

* making hosp_run backwards compatible, adding toggle in config to run age adjusted or non age adjusted hosp function

* clarifying which time to death distribution should be used for each hosp script. Minor fixes to ensure backwards compatibility

* removing hard coded p_death_rate

* Fixed a few issues introduced in the updated state report template.

* Update README.md

* Create seeding.csv

* Fixed testing config

* minor text changes

* Removing comments to help refresh repo

* Trying to specify a specific commit for integration testing

* Remove packages customer in favor of local_install for containerization

* Change if clause for dockerhub deployment

* Fix typo in dockerfile

* Add awscli in dockerfile

* Add report::continue_on_error config. Use notebooks directory instead of report.

And runs importation for report generation test.

* add check models R script

* Changed state usps to be a parameter and added better error handling.

* added time from ventilation to discharge and number of people currently using ventilators to hospitalization

* Add state_usps to report

* state_usps and changing assert

* export vent data

* add vent figures

* comment out vents figs

* Improve hospitalization performance

* Removes unnecessary casting and copying of data frames and simplifies data frame merge in a critical section
* New age-adjusted hospitalization is 50% faster with 35% less memory pressure
* Old legacy hospitalization is 68% faster with ~50% less memory pressure

* create filterupdate for other data

* fixing NAs in curr hosp, ICU, vent

* Add CLI args for check_model_reports.R to config it like our other scripts

* Let us control the number of simulations from the CLI for importation

* bug fix

* so bad at coding

* incidVent exported in reportloaddata

* export vents, breaks in table

* Remove this invalid config checking pattern from our scripts

* changing default vent_dur if not specified in old configs

* fixing leading 0s on read in issue

* new age specific parameter estimates with DC, PR

* Remove extra comma on check_report CLI

* update figs to compare data & model

* fig updates for CA report

* Modified base load functions to take advantage of a parallel back end using foreach.

* Add ability for filtering helpers to take arguments.

* Fixed bug of missing ... with scenarios filter.

* Update make_makefile.R

* changing hosp load function to include ventilation outcomes

* adding table function with ventilator outcomes, fixing peak ICU occupancy from table

* updating namespace for new vent function

* fixing threshold axis ordering

* adding function for scenario specific table with ventilations

* function to format current ICU, ventilator data for heatmap figures

* Updated covidcommon to have lognormal distribution.  Added a new function to hospitalization package that (slowly) includes both updated config and uses random distributions for delay/duration.  Needs to be optimized before use, but at least this way people can look at it.

* import only one death rate in data/model comparison

* update methods and static chunk text

* change limits for model-obs comparison

* uncomment current vent usage

* include only necc geoids in check script

* Documentation first draft for R script runners

* reporting rate of 20% for modeled "cases"

* Fixed some italics

* plot mean and .2*infections

* Create create_pipeline_docs.R

* Add warning for special R comments

* Update .gitignore

* Some formatting cleanup

* Update hosp_run.R

* separate random distr tests & test mn/var

* pre or post process aren't fxns test

* Fixed issues around new function arguments (e.g. num_files) breaking existing code

* export vent_curr where appropriate

* mean incident deaths figure

* change back silly caps name

* Updated all scripts and packages (except importation) to support feather

* Fixing seed problems in all hospitalization scripts

* fix bug when incl_geoids=NULL

* Change documentation from knitr to Doxygen

* Add integration test_hosp_age_adjust and rename test1 to test_simple

* Updated loading code

* Altered feather to parquet and updated the skeleton

* Update requirements.txt to include the dvc[s3] dependencies

* Updated makefile stuff, and also everything should now support feather

* Updated docker requirements

* Write parameters and npi from SEIR model

* Add model_parameters to integration test

* Clean up some configs

* Version with pyarrow included

* Dependencies for arrow in R as well

* Fixed check_model script

* Updated for feather integration

* Updated test cases since `n` is reserved in yml

* adding make_excess_heatmap function for hosp outcomes

* Fixing parallelization mistake

* Minor fixes

- Use the "optimize" covidImportation version
- Always upgrade local packages if upgrade available (vs silently ignore)
- check_model_reports should ensure axis are dates

* new figure relative to threshold heatmap

* Update importation.R to match covidImportation package updates

* Updated model code to use the new covidImportation package, and also seed to E instead of I (and keep population fixed)

* Fixed typo

* Final fix to avoid numba

* Fixed path to install_local script

* Added package

* Fixed seeding creation

* rm NAs and fix create_seeding.R

* add new cum hosp/deaths check to check_models scr

* update indexes in check model script

* long form mobility

* Update reference to geoid-params.csv inside of hosp_run.R

* 10x seeding file

* Write the npi when writing parquet output

* template

* report after simulation

* Removed geodata read from hosp_run.R since it's not being used

* Updated things that feed into mobility

* Updated build_US_setup.R to account for the move

* These files got removed in a previous commit

* Removing unused (as far as I can tell anyway) data

* Fix bug when the places are also a number

* Changing back test cases to use size/prob instead of n/p

* Updated name to pass checks on case sensitive OS

* Updated to use file_extension argument

* Fix broken tests, though I recommend we eliminate the mean and var checks since they'll be flaky

* Updated build_US_setup.R to work with the current setup

* Renamed parameters to avoid confusion; print out simid as 9 digits

SEIR and hospitalization phases have more standardized file format
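
As a rough illustration of the nine-digit simulation id mentioned above, the sketch below zero-pads an id when building an output file name. The helper name `sim_output_path`, the directory layout, and the default extension are assumptions made for illustration; the pipeline's actual naming scheme may differ.

```python
from pathlib import PurePosixPath


def sim_output_path(prefix: str, sim_id: int, extension: str = "parquet") -> str:
    """Build an output path whose simulation id is zero-padded to 9 digits.

    Hypothetical helper: the real pipeline's layout may differ.
    """
    if not 0 <= sim_id < 10**9:
        raise ValueError("sim_id must fit in 9 digits")
    # A fixed width keeps lexicographic file order equal to numeric order.
    return str(PurePosixPath(prefix) / f"{sim_id:09d}.{extension}")


if __name__ == "__main__":
    print(sim_output_path("model_output/seir", 42))
```

With a fixed width, a plain directory listing sorts simulation 10 after simulation 9, which is presumably part of what makes the file format easier to share between the SEIR and hospitalization phases.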

* read parquet file times correctly

* Revert "read parquet file times correctly"

This reverts commit 521dd25.

* parquet date fixes (#207)

Co-authored-by: hrmeredith12 <hrmeredith12@gmail.com>

* Report devel (#208)

* fix unit test code

* fix unit test for real

* fix unit tests

* adding ability to filter geoids in relative heatmap function

* adding template for county-specific report for a given state

* lower tolerance for distribution tests

* planning_models chunk

* planning scenario chunk

* add names to dev team

Co-authored-by: eclee25 <eclee25@gmail.com>
Co-authored-by: Kyra Grantz <kyragrantz@gmail.com>
Co-authored-by: hrmeredith12 <hrmeredith12@gmail.com>

* Adding Javier (#210)

Co-authored-by: hrmeredith12 <hrmeredith12@gmail.com>
Co-authored-by: Elizabeth Lee <eclee25@gmail.com>

* Delete build-model-input.R (#217)

* Dataseed merge (#215)

* Adding Javier

* Adding commute data back in

* rm fixed param and comment out bad plot

* commit namespace report gen

* fix NVentCurr name

* formatting changes to county report template, removing defaults that should be modified for each report

* adding references for county report template

* change importation seeding

* table formatting

* limitations chunk considering age specific hosp calculations

* removing build_hospdeath_geoid_par - old version not used in hosprun.R

* removing legacy hospitalization scripts. everything runs through hosp_run.R now

* using current default durations to minimize confusion

Co-authored-by: hrmeredith12 <hrmeredith12@gmail.com>
Co-authored-by: Elizabeth Lee <eclee25@gmail.com>
Co-authored-by: Kyra Grantz <kyragrantz@gmail.com>

* Removing config.yml and changing the variable name in create_seeding to be truthful. (#219)

* Fixed the low in followup issue (#224)

* Fixed the low in followup issue

* Adding initial ^

* adding county report template yaml (#221)

Co-authored-by: jkamins7 <jkaminsky@jhu.edu>

* Fix load-bearing typo (#225)

* Fix load-bearing typo

* pretty sure it's supposed to be this

Co-authored-by: Josh Wills <jwills@apache.org>
Co-authored-by: kkintaro <katkintaro@gmail.com>

Co-authored-by: salauer <stephenalauer@gmail.com>
Co-authored-by: kkintaro <katkintaro@gmail.com>
Co-authored-by: Josh Wills <jwills@apache.org>
Co-authored-by: Kyra Grantz <kyragrantz@gmail.com>
Co-authored-by: jlessler <jlessler@jhsph.edu>
Co-authored-by: eclee25 <eclee25@gmail.com>
Co-authored-by: Pierre-Yves Aquilanti <pierreya@amazon.com>
Co-authored-by: Sam Shah <sam@skipflag.com>
Co-authored-by: Stephen A Lauer <salauer@users.noreply.github.com>
Co-authored-by: shauntruelove <satruelove@gmail.com>
Co-authored-by: chadi <joseph.lemaitre@epfl.ch>
Co-authored-by: hrmeredith12 <hrmeredith12@gmail.com>
Co-authored-by: Josh Wills <josh.wills@gmail.com>
@jkamins7 jkamins7 mentioned this pull request Apr 22, 2020
jkamins7 added a commit that referenced this pull request Apr 22, 2020
eclee25 added a commit that referenced this pull request Jul 13, 2020
* Version with pyarrow included

* Dependencies for arrow in R as well

* Fixed check_model script

* Updated for feather integration

* Updated test cases since `n` is reserved in yml

* adding make_excess_heatmap function for hosp outcomes

* Fixing parallelization mistake

* Minor fixes

- Use the "optimize" covidImportation version
- Always upgrade local packages if upgrade available (vs silently ignore)
- check_model_reports should ensure axis are dates

* new figure relative to threshold heatmap

* Update importation.R to match covidImportation package updates

* Updated model code to use the new covidImportation package, and also seed to E instead of I (and keep population fixed)

* Fixed typo

* Final fix to avoid numba

* Fixed path to install_local script

* Added package

* Fixed seeding creation

* rm NAs and fix create_seeding.R

* add new cum hosp/deaths check to check_models scr

* update indexes in check model script

* long form mobility

* Update reference to geoid-params.csv inside of hosp_run.R

* 10x seeding file

* Write the npi when writing parquet output

* template

* report after simulation

* Removed geodata read from hosp_run.R since it's not being used

* Updated things that feed into mobility

* Updated build_US_setup.R to account for the move

* These files got removed in a previous commit

* Removing unused (as far as I can tell anyway) data

* Fix bug when the places are also a number

* Changing back test cases to use size/prob instead of n/p

* Updated name to pass checks on case sensitive OS

* Updated to use file_extension argument

* Fix broken tests, though I recommend we eliminate the mean and var checks since they'll be flaky

* Updated build_US_setup.R to work with the current setup

* Renamed parameters to avoid confusion; print out simid as 9 digits

SEIR and hospitalization phases have more standardized file format

* read parquet file times correctly

* Revert "read parquet file times correctly"

This reverts commit 521dd25.

* parquet date fixes (#207)

Co-authored-by: hrmeredith12 <hrmeredith12@gmail.com>

* Report devel (#208)

* fix unit test code

* fix unit test for real

* fix unit tests

* adding ability to filter geoids in relative heatmap function

* adding template for county-specific report for a given state

* lower tolerance for distribution tests

* planning_models chunk

* planning scenario chunk

* add names to dev team

Co-authored-by: eclee25 <eclee25@gmail.com>
Co-authored-by: Kyra Grantz <kyragrantz@gmail.com>
Co-authored-by: hrmeredith12 <hrmeredith12@gmail.com>

* Adding Javier (#210)

Co-authored-by: hrmeredith12 <hrmeredith12@gmail.com>
Co-authored-by: Elizabeth Lee <eclee25@gmail.com>

* Delete build-model-input.R (#217)

* Dataseed merge (#215)

* Adding Javier

* Adding commute data back in

* rm fixed param and comment out bad plot

* commit namespace report gen

* fix NVentCurr name

* formatting changes to county report template, removing defaults that should be modified for each report

* adding references for county report template

* change importation seeding

* table formatting

* limitations chunk considering age specific hosp calculations

* removing build_hospdeath_geoid_par - old version not used in hosprun.R

* removing legacy hospitalization scripts. everything runs through hosp_run.R now

* using current default durations to minimize confusion

Co-authored-by: hrmeredith12 <hrmeredith12@gmail.com>
Co-authored-by: Elizabeth Lee <eclee25@gmail.com>
Co-authored-by: Kyra Grantz <kyragrantz@gmail.com>

* Removing config.yml and changing the variable name in create_seeding to be truthful. (#219)

* Fixed the low in followup issue (#224)

* Fixed the low in followup issue

* Adding initial ^

* adding county report template yaml (#221)

Co-authored-by: jkamins7 <jkaminsky@jhu.edu>

* Fix load-bearing typo (#225)

* Fix load-bearing typo

* pretty sure it's supposed to be this

Co-authored-by: Josh Wills <jwills@apache.org>
Co-authored-by: kkintaro <katkintaro@gmail.com>

* Add an environment variable that can be used for writing uniquely named output files across blocks of jobs from AWS batch
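
One way such a variable could be used is sketched below. The commit does not name the variable, so `JOB_BLOCK_INDEX` is a hypothetical placeholder, as are the helper name and the file name pattern.

```python
import os


def unique_output_name(base: str, sim_id: int, env=os.environ) -> str:
    """Return a file name that is unique across blocks of batch jobs.

    JOB_BLOCK_INDEX is a hypothetical variable name used for illustration;
    it is not taken from the repository.
    """
    # Fall back to block 0 when the variable is unset (e.g. a local run).
    block = int(env.get("JOB_BLOCK_INDEX", "0"))
    return f"{base}_block{block:04d}_sim{sim_id:09d}.parquet"


if __name__ == "__main__":
    print(unique_output_name("hosp", 7, env={"JOB_BLOCK_INDEX": "3"}))
```

Passing the environment as an argument keeps the helper easy to test without touching the real process environment.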

* fix for 1 scenario (#230)

Co-authored-by: Elizabeth Lee <eclee25@gmail.com>

* RStudio in the Docker container

RStudio is now available in the Docker container, which allows development and EDA with the same set of packages as is run in production.

* Update covidImportation package to v1.6 (#10)

* Update covidImportation package to v1.6 (#250)

* Adding final form of previous packrat + docker setup after merging weirdness

* Switching .so to git lfs

* Updated indexing in simulations and hospitalization

* Added better indexing for hospitalization

* Add ability to reduce alpha, sigma, and gamma (#241)

* Add the ability to reduce multiple parameters

* Add Reduce scenario template to test_simple and documentation

* minor bug test fix

* Minor bugs

Co-authored-by: Joseph Lemaitre <joseph.lemaitre@epfl.ch>

* Move the spatial setup outside of the scenarios loop since it's expensive to load and doesn't change per scenario.

* Removing source for packages installable from cran

* Updated the python rules for reticulate (tests still pass)

* Removing source for packages installable from cran

* Updated the python rules for reticulate (tests still pass)

* Updated based on review

* Fixed filter issues with makefile setup in case dynfilter isn't provided in config

* Updated makefile

* Reduce hospitalization memory pressure

Switch a critical split-apply-combine away from `do.call()`, which results in a 45% reduction in memory usage and a 35% speedup in execution time in my testing.

* Packrat (#253)

* Adding final form of previous packrat + docker setup after merging weirdness

* Switching .so to git lfs

* Removing source for packages installable from cran

* Updated the python rules for reticulate (tests still pass)

* Updated based on review

* Updated to use dev's docker instead of dataseed's

* Added reticulate zoo and xts

* Updated docker with git-lfs

* Packrat (#267)

* Adding final form of previous packrat + docker setup after merging weirdness

* Switching .so to git lfs

* Removing source for packages installable from cran

* Updated the python rules for reticulate (tests still pass)

* Updated based on review

* Updated to use dev's docker instead of dataseed's

* Added reticulate zoo and xts

* Updated docker with git-lfs

* Updating docker to install current versions of local packages

* Update .Rprofile

* Update dockerhub.yaml

* Update aws.yaml

* Yet another packrat attempt

* Update ci.yml

* Generic version of the batch job launcher/runner (#257)

* Generic version of batch from the union of jwills_dfU_run and dataseed_batch2

* Fixes from running stuff on some test jobs

* Add a vcpu CLI option and update sims_per_job to refer to slots per job

Co-authored-by: jkamins7 <jkaminsky@jhu.edu>

* changing covidImportation tag to 1.6.1

* Reduce SEIR startup costs (#273)

* 60% speedup in one run SEIR performance

The biggest cost in a single sim SEIR run was importation of Numba and the JIT compilation. Change this to compile ahead of time, which results in a nice 60% lift in one run SEIR performance by saving these startup costs---which will be valuable for our large inference runs.

Minor performance benefit when running many simulations as JIT costs are amortized away.

```
Benchmark #1: single sim JIT compilation (current)
  Time (mean ± σ):     13.429 s ±  0.537 s
  Range (min … max):   12.973 s … 14.867 s    100 runs

Benchmark #2: single sim AOT compilation (new)
  Time (mean ± σ):      5.129 s ±  0.125 s
  Range (min … max):    4.901 s …  5.364 s    100 runs
```
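
The "60%" figure can be checked against the benchmark means quoted above. The short sketch below only redoes that arithmetic; the function names are illustrative and nothing here is taken from the repository.

```python
def relative_time_saved(before_s: float, after_s: float) -> float:
    """Fraction of wall-clock time removed by the change."""
    return (before_s - after_s) / before_s


def speedup_factor(before_s: float, after_s: float) -> float:
    """How many times faster the new run is."""
    return before_s / after_s


# Mean times copied from the benchmark output above.
JIT_MEAN_S = 13.429
AOT_MEAN_S = 5.129

if __name__ == "__main__":
    print(round(relative_time_saved(JIT_MEAN_S, AOT_MEAN_S), 3))  # 0.618
    print(round(speedup_factor(JIT_MEAN_S, AOT_MEAN_S), 2))       # 2.62
```

Read this way, the change removes roughly 62% of the single-simulation run time, which is the same as a run that is about 2.6 times faster.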

* Add Python build directory to .gitignore

* Integrate build_US_setup into pipeline and... (#271)

* Add hard-coded territory data to build_US_setup

* Create csv of island area census data since it cannot be accessed by API

* Change the report targets to follow the conventions of make_makefile

* Integrate build_US_setup into pipeline

* Some bug fixes

* git lfs pull of commute_data.csv and switch docker image

* Update ci.yml

* Update ci.yml

* Remove generated files

* Update make_makefile.R

* Update run_tests.py

* pull census year from config

* Use census year from config to build_US_setup

* Update build_US_setup.R

Co-authored-by: eclee25 <eclee25@gmail.com>

* Add check to hospitalization that geodata geoids are in geoid-params.csv (#283)
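
A check of this kind might look like the following sketch. It is written in Python for illustration rather than taken from `hosp_run.R`, and the column name `geoid` and the helper name are assumptions.

```python
import csv
import io


def missing_geoids(geodata_geoids, params_csv_text, column="geoid"):
    """Return geodata geoids that have no row in the parameter CSV.

    Geoids are compared as strings so that leading zeros are preserved.
    The column name is an assumption about the CSV layout.
    """
    reader = csv.DictReader(io.StringIO(params_csv_text))
    known = {row[column] for row in reader}
    return sorted(set(geodata_geoids) - known)


if __name__ == "__main__":
    params = "geoid,p_hosp\n06037,0.1\n06059,0.2\n"
    missing = missing_geoids(["06037", "06075"], params)
    if missing:
        print(f"geoids missing from parameter file: {missing}")
```

Failing early with the list of missing geoids is usually easier to debug than a join that silently drops or blanks rows later on.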

* added state level script for creating csv reporting out quantiles

* Fixed a slight bug with static dates and added full geographic extent version of the quantile generation script

* Added countylevel script

* Various fixes and updates to post run summarization scripts.

* Integrate QuantileSummarizeGeoExtent.R into pipeline (untested)

* Integrate QuantileSummarizeGeoExtent.R into pipeline

* Create QuantileSummarizeGeoidLevel.py

* Working on the python script

* Integrate quantile scripts into Makefile

* Delete QuantileSummarizeGeoidLevel.py

* perf fix for quantile_report_script

* QuantileSummarizeGeoidLevel on Apache Spark

This commit includes a Python implementation of `QuantileSummarizeGeoidLevel.R` running on Apache Spark. The job essentially computes quantiles grouped by geoid and time whereby Spark provides the shuffle and quantile estimation mechanism to perform this aggregation efficiently. The job can be run locally within the container (fine for USA run but takes ~45mins on a r5.24xlarge) or distributed on Amazon EMR. This commit adds Spark and consequently Java inside the container.
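
The aggregation described above (quantiles grouped by geoid and time) can be sketched in a few lines. This sketch deliberately swaps Spark for the Python standard library so it runs anywhere, so it shows the grouping logic only, not the distributed shuffle or Spark's quantile estimation. The row layout `(geoid, time, value)` and the function name are assumptions.

```python
from collections import defaultdict
from statistics import quantiles


def summarize_by_geoid_time(rows, n=4):
    """Group values by (geoid, time) and compute n-quantile cut points.

    `rows` is an iterable of (geoid, time, value) tuples, one per simulation.
    Standard-library stand-in for the Spark job; the row layout is assumed.
    """
    groups = defaultdict(list)
    for geoid, time, value in rows:
        groups[(geoid, time)].append(value)
    # method="inclusive" treats the observed simulations as the full sample.
    return {
        key: quantiles(values, n=n, method="inclusive")
        for key, values in sorted(groups.items())
    }


if __name__ == "__main__":
    sims = [("06037", "2020-05-01", v) for v in (10, 20, 30, 40, 50)]
    print(summarize_by_geoid_time(sims))
```

Each group needs every simulation's value for one geoid and date in the same place, which is the step a Spark shuffle performs at scale.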

* add `--name_filter` to quantile_summarize_geoid_level as per feedback

* Adjust quantile scripts so they all have the same interface

- Fixed bug in both R scripts where `num_files` was set incorrectly
- Adjust quantile_summarize_geoid_level.py to take scenarios (+ config file) versus path names as input to mimic the interface of the other scripts

* Revert make_makefile.R to dev branch version

* setup file for international countries

* Fatiguing NPI

* tested MVP

* other implementation, maybe cleaner

* update to hosp_run to take specified geoid-params

* Added mild infections as output of hospitalization

* minor

* Hospitalization package update

* dev setup

* fixed rate

* adding apl deployment to ecr

* international seeding and setup files created

* Update to report template docs for country reports

* update to non-US scripts

* update to international branch country setup

* non-US setup Rmd and other scripts finished.

* update

* minor print edit

* updates to script to make international functional with master

* minor update to report and setup scripts

* setup fix

* non-us update

* dev setup relative min

* relative min ready

* 1. Added integration tests for US and non-US create_seeding.R and build_US_setup.R/build_nonUS_setup.R

2. create_seeding.R now has the option to choose "CSSE" or "USAFacts" for a US run.

* Delete jhucsse_case_data_crude.csv

accidental data commit

* vignette fix

* Removed man folders from packages

* fixes in the international branch before the merge

* Do not update packages

* Update covidImportation to v1.6.1

* minor fix

* fix non-US setup

* Update local_install.R

* Fix merge error

* Reload covidImportation v1.6.1 to fix tidyverse dependency

* seeding update with inputted incidence multiplier

* minor names fix

* Minor fixes to build_US and build_nonUS integration tests

* deleted a comma

* minor bug fix

* Fix reversed international tag

* fixed error message

* fixed python error

* minor

* Adding updated severity parameters

* fixing US seeding

* adding print message

* Update covidImportation with bug fix

* minor update

* Fix filter issue

* integration testing fixes

* Non-US makefile added. This should actually work fine for US as well. It also adds the ability to use the setup_name from the config to add a file prefix to model outputs, and then only clean those model outputs when running "make clean".
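
A minimal sketch of the prefix-scoped cleanup described above, assuming model outputs are written as `<setup_name>_...` files in a single directory. The `clean_outputs` helper and the file names are hypothetical; the actual Makefile rule generated by `make_makefile.R` is not shown in this message.

```python
import tempfile
from pathlib import Path


def clean_outputs(output_dir, setup_name):
    """Delete only files whose names start with '<setup_name>_'."""
    removed = []
    # sorted() materializes the listing before any file is deleted.
    for path in sorted(Path(output_dir).iterdir()):
        if path.is_file() and path.name.startswith(f"{setup_name}_"):
            path.unlink()
            removed.append(path.name)
    return removed


# Demonstration in a throwaway directory with outputs from two setups.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("usa_hosp.csv", "usa_seir.csv", "france_hosp.csv"):
        (Path(tmp) / name).touch()
    removed = clean_outputs(tmp, "usa")
    remaining = sorted(p.name for p in Path(tmp).iterdir())
```

Scoping the cleanup to the prefix lets several setups share one output directory without one setup's clean step deleting another's results.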

* make_makefile.R now includes both US and non-US functionality

* make_makefile white space fix

* Add tictoc package to dev docker

* Updated to fix a docker bug

Co-authored-by: Josh Wills <jwills@apache.org>
Co-authored-by: jkamins7 <jkaminsky@jhu.edu>
Co-authored-by: kkintaro <katkintaro@gmail.com>
Co-authored-by: Kyra Grantz <kyragrantz@gmail.com>
Co-authored-by: Sam Shah <sam@skipflag.com>
Co-authored-by: shauntruelove <satruelove@gmail.com>
Co-authored-by: chadi <joseph.lemaitre@epfl.ch>
Co-authored-by: hrmeredith12 <hrmeredith12@gmail.com>
Co-authored-by: Josh Wills <josh.wills@gmail.com>
Co-authored-by: Sam Shah <shahsam@umich.edu>
Co-authored-by: Dave <David.Witman@jhuapl.edu>
Co-authored-by: Shaun Truelove <shauntruelove@users.noreply.github.com>
eclee25 added a commit that referenced this pull request Jul 15, 2020
* updates to state template

* fix load_cum_inf_geounit_dates to use hosp only

* add hosp method chunks from report_devel

* adding generic mapping function

* removing grouping by time for appropriate cumsum in load_cum_inf

* fixing error in load_cum_inf

* add ventilator to scenario tbl

* add warning about loading infections from hosp data

* deprecate old functions, integration testing temp

* recreating clean NAMESPACE to remove export of setup_testing_environment preventing pkg install

* adding sim_num before post_process in load_hosp_sims_filtered for output that does not contain sim_num but requires it for post-processing

* adding warning about variable name to load_hosp_geounit_threshold

* moving make_excess_heatmap to deprecated functions

* prep report_devel2 for dev merge (#351)

* Version with pyarrow included

* Dependencies for arrow in R as well

* Fixed check_model script

* Updated for feather integration

* Updated test cases since `n` is reserved in yml

* adding make_excess_heatmap function for hosp outcomes

* Fixing parallelization mistake

* Minor fixes

- Use the "optimize" covidImportation version
- Always upgrade local packages if upgrade available (vs silently ignore)
- check_model_reports should ensure axis are dates

* new figure relative to threshold heatmap

* Update importation.R to match covidImportation package updates

* Updated model code to use the new covidImportation package, and also seed to E instead of I (and keep population fixed)

* Fixed typo

* Final fix to avoid numba

* Fixed path to install_local script

* Added package

* Fixed seeding creation

* rm NAs and fix create_seeding.R

* add new cum hosp/deaths check to check_models scr

* update indexes in check model script

* long form mobility

* Update reference to geoid-params.csv inside of hosp_run.R

* 10x seeding file

* Write the npi when writing parquet output

* template

* report after simulation

* Removed geodata read from hosp_run.R since it's not being used

* Updated things that feed into mobility

* Updated build_US_setup.R to account for the move

* These files got removed in a previous commit

* Removing unused (as far as I can tell anyway) data

* Fix bug when the places are also a number

* Changing back test cases to use size/prob instead of n/p

* Updated name to pass checks on case sensitive OS

* Updated to use file_extension argument

* Fix broken tests, though I recommend we eliminate the mean and var checks since they'll be flaky

* Updated build_US_setup.R to work with the current setup

* Renamed parameters to avoid confusion; print out simid as 9 digits

SEIR and hospitalization phases have more standardized file format

* read parquet file times correctly

* Revert "read parquet file times correctly"

This reverts commit 521dd25.

* parquet date fixes (#207)

Co-authored-by: hrmeredith12 <hrmeredith12@gmail.com>

* Report devel (#208)

* fix unit test code

* fix unit test for real

* fix unit tests

* adding ability to filter geoids in relative heatmap function

* adding template for county-specific report for a given state

* lower tolerance for distribution tests

* planning_models chunk

* planning scenario chunk

* add names to dev team

Co-authored-by: eclee25 <eclee25@gmail.com>
Co-authored-by: Kyra Grantz <kyragrantz@gmail.com>
Co-authored-by: hrmeredith12 <hrmeredith12@gmail.com>

* Adding Javier (#210)

Co-authored-by: hrmeredith12 <hrmeredith12@gmail.com>
Co-authored-by: Elizabeth Lee <eclee25@gmail.com>

* Delete build-model-input.R (#217)

* Dataseed merge (#215)

* Adding Javier

* Adding commute data back in

* rm fixed param and comment out bad plot

* commit namespace report gen

* fix NVentCurr name

* formatting changes to county report template, removing defaults that should be modified for each report

* adding references for county report template

* change importation seeding

* table formatting

* limitations chunk considering age specific hosp calculations

* removing build_hospdeath_geoid_par - old version not used in hosprun.R

* removing legacy hospitalization scripts. everything runs through hosp_run.R now

* using current default durations to minimize confusion

Co-authored-by: hrmeredith12 <hrmeredith12@gmail.com>
Co-authored-by: Elizabeth Lee <eclee25@gmail.com>
Co-authored-by: Kyra Grantz <kyragrantz@gmail.com>

* Removing config.yml and changing the variable name in create_seeding to be truthful. (#219)

* Fixed the low in followup issue (#224)

* Fixed the low in followup issue

* Adding initial ^

* adding county report template yaml (#221)

Co-authored-by: jkamins7 <jkaminsky@jhu.edu>

* Fix load-bearing typo (#225)

* Fix load-bearing typo

* pretty sure it's supposed to be this

Co-authored-by: Josh Wills <jwills@apache.org>
Co-authored-by: kkintaro <katkintaro@gmail.com>

* Add an environment variable that can be used for writing uniquely named output files across blocks of jobs from AWS batch
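
One way such a variable might be used is sketched below. The variable name `JOB_BLOCK_INDEX`, the file-name pattern, and the `output_filename` helper are all assumptions for illustration; the message above does not name the real variable. The nine-digit simulation id follows the "print out simid as 9 digits" change noted elsewhere in this history.

```python
import os


def output_filename(prefix, sim_id, env=os.environ):
    """Build an output name that is unique across blocks of batch jobs.

    JOB_BLOCK_INDEX is a hypothetical variable name; it defaults to 0 when unset.
    """
    block = int(env.get("JOB_BLOCK_INDEX", "0"))
    return f"{prefix}_{block:03d}_{sim_id:09d}.parquet"


# Two jobs in different blocks can reuse the same simulation id safely.
first = output_filename("hosp", 7, env={"JOB_BLOCK_INDEX": "1"})
second = output_filename("hosp", 7, env={"JOB_BLOCK_INDEX": "2"})
```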

* fix for 1 scenario (#230)

Co-authored-by: Elizabeth Lee <eclee25@gmail.com>

* RStudio in the Docker container

RStudio is now available in the Docker container, which allows development and EDA with the same set of packages as is run in production.

* Update covidImportation package to v1.6 (#10)

* Update covidImportation package to v1.6 (#250)

* Adding final form of previous packrat + docker setup after merging weirdness

* Switching .so to git lfs

* Updated indexing in simulations and hospitalization

* Added better indexing for hospitalization

* Add ability to reduce alpha, sigma, and gamma (#241)

* Add the ability to reduce multiple parameters

* Add Reduce scenario template to test_simple and documentation

* minor bug test fix

* Minor bugs

Co-authored-by: Joseph Lemaitre <joseph.lemaitre@epfl.ch>

* Move the spatial setup outside of the scenarios loop since it's expensive to load and doesn't change per scenario.

* Removing source for packages installable from cran

* Updated the python rules for reticulate (tests still pass)

* Removing source for packages installable from cran

* Updated the python rules for reticulate (tests still pass)

* Updated based on review

* Fixed filter issues with makefile setup in case dynfilter isn't provided in config

* Updated makefile

* Reduce hospitalization memory pressure

Switch a critical split-apply-combine away from `do.call()`, which results in a 45% reduction in memory usage and a 35% speedup in execution time in my testing.

* Packrat (#253)

* Adding final form of previous packrat + docker setup after merging weirdness

* Switching .so to git lfs

* Removing source for packages installable from cran

* Updated the python rules for reticulate (tests still pass)

* Updated based on review

* Updated to use dev's docker instead of dataseed's

* Added reticulate zoo and xts

* Updated docker with git-lfs

* Packrat (#267)

* Adding final form of previous packrat + docker setup after merging weirdness

* Switching .so to git lfs

* Removing source for packages installable from cran

* Updated the python rules for reticulate (tests still pass)

* Updated based on review

* Updated to use dev's docker instead of dataseed's

* Added reticulate zoo and xts

* Updated docker with git-lfs

* Updating docker to install current versions of local packages

* Update .Rprofile

* Update dockerhub.yaml

* Update aws.yaml

* Yet another packrat attempt

* Update ci.yml

* Generic version of the batch job launcher/runner (#257)

* Generic version of batch from the union of jwills_dfU_run and dataseed_batch2

* Fixes from running stuff on some test jobs

* Add a vcpu CLI option and update sims_per_job to refer to slots per job

Co-authored-by: jkamins7 <jkaminsky@jhu.edu>

* changing covidImportation tag to 1.6.1

* Reduce SEIR startup costs (#273)

* 60% speedup in one run SEIR performance

The biggest cost in a single sim SEIR run was importation of Numba and the JIT compilation. Change this to compile ahead of time, which results in a nice 60% lift in one run SEIR performance by saving these startup costs---which will be valuable for our large inference runs.

Minor performance benefit when running many simulations as JIT costs are amortized away.

```
Benchmark #1: single sim JIT compilation (current)
  Time (mean ± σ):     13.429 s ±  0.537 s
  Range (min … max):   12.973 s … 14.867 s    100 runs

Benchmark #2: single sim AOT compilation (new)
  Time (mean ± σ):      5.129 s ±  0.125 s
  Range (min … max):    4.901 s …  5.364 s    100 runs
```
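
As a quick check of the headline figure, the "60%" can be recomputed from the two mean times in the benchmark above. The variable names are illustrative; only the two means come from the benchmark output.

```python
# Mean wall times taken from the benchmark output above, in seconds.
jit_mean_s = 13.429  # single sim, JIT compilation (current)
aot_mean_s = 5.129   # single sim, AOT compilation (new)

# Fraction of wall time saved, and the equivalent speedup factor.
time_reduction = (jit_mean_s - aot_mean_s) / jit_mean_s
speedup_factor = jit_mean_s / aot_mean_s

print(f"{time_reduction:.1%} less wall time, {speedup_factor:.2f}x faster")
```

The reduction works out to roughly 62% of single-run wall time, consistent with the rounded 60% quoted in the commit title.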

* Add Python build directory to .gitignore

* Integrate build_US_setup into pipeline and... (#271)

* Add hard-coded territory data to build_US_setup

* Create csv of island area census data since it cannot be accessed by API

* Change the report targets to follow the conventions of make_makefile

* Integrate build_US_setup into pipeline

* Some bug fixes

* git lfs pull of commute_data.csv and switch docker image

* Update ci.yml

* Update ci.yml

* Remove generated files

* Update make_makefile.R

* Update run_tests.py

* pull census year from config

* Use census year from config to build_US_setup

* Update build_US_setup.R

Co-authored-by: eclee25 <eclee25@gmail.com>

* Add check to hospitalization that geodata geoids are in geoid-params.csv (#283)

* added state level script for creating csv reporting out quantiles

* Fixed a slight bug with static dates and added full geographic extent version of the quantile generation script

* Added countylevel script

* Various fixes and updates to post run summarization scripts.

* Integrate QuantileSummarizeGeoExtent.R into pipeline (untested)

* Integrate QuantileSummarizeGeoExtent.R into pipeline

* Create QuantileSummarizeGeoidLevel.py

* Working on the python script

* Integrate quantile scripts into Makefile

* Delete QuantileSummarizeGeoidLevel.py

* perf fix for quantile_report_script

* QuantileSummarizeGeoidLevel on Apache Spark

This commit includes a Python implementation of `QuantileSummarizeGeoidLevel.R` running on Apache Spark. The job computes quantiles grouped by geoid and time; Spark provides the shuffle and quantile estimation mechanism to perform this aggregation efficiently. The job can be run locally within the container (fine for a USA run, but it takes ~45 minutes on an r5.24xlarge) or distributed on Amazon EMR. This commit adds Spark, and consequently Java, inside the container.

* add `--name_filter` to quantile_summarize_geoid_level as per feedback

* Adjust quantile scripts so they all have the same interface

- Fixed bug in both R scripts where `num_files` was set incorrectly
- Adjust quantile_summarize_geoid_level.py to take scenarios (+ config file) versus path names as input to mimic the interface of the other scripts

* Revert make_makefile.R to dev branch version

* setup file for international countries

* Fatiguing NPI

* tested MVP

* other implementation, maybe cleaner

* update to hosp_run to take specified geoid-params

* Added mild infections as output of hospitalization

* minor

* Hospitalization package update

* dev setup

* fixed rate

* adding apl deployment to ecr

* international seeding and setup files created

* Update to report template docs for country reports

* update to non-US scripts

* update to international branch country setup

* non-US setup Rmd and other scripts finished.

* update

* minor print edit

* updates to script to make international functional with master

* minor update to report and setup scripts

* setup fix

* non-us update

* dev setup relative min

* relative min ready

* 1. Added integration tests for US and non-US create_seeding.R and build_US_setup.R/build_nonUS_setup.R

2. create_seeding.R now has the option to choose "CSSE" or "USAFacts" for a US run.

* Delete jhucsse_case_data_crude.csv

accidental data commit

* vignette fix

* Removed man folders from packages

* fixes in the international branch before the merge

* Do not update packages

* Update covidImportation to v1.6.1

* minor fix

* fix non-US setup

* Update local_install.R

* Fix merge error

* Reload covidImportation v1.6.1 to fix tidyverse dependency

* seeding update with inputted incidence multiplier

* minor names fix

* Minor fixes to build_US and build_nonUS integration tests

* deleted a comma

* minor bug fix

* Fix reversed international tag

* fixed error message

* fixed python error

* minor

* Adding updated severity parameters

* fixing US seeding

* adding print message

* Update covidImportation with bug fix

* minor update

* Fix filter issue

* integration testing fixes

* Non-US makefile added. This should actually work fine for US as well. It also adds the ability to use the setup_name from the config to add a file prefix to model outputs, and then only clean those model outputs when running "make clean".

* make_makefile.R now includes both US and non-US functionality

* make_makefile white space fix

* Add tictoc package to dev docker

* Updated to fix a docker bug

Co-authored-by: Josh Wills <jwills@apache.org>
Co-authored-by: jkamins7 <jkaminsky@jhu.edu>
Co-authored-by: kkintaro <katkintaro@gmail.com>
Co-authored-by: Kyra Grantz <kyragrantz@gmail.com>
Co-authored-by: Sam Shah <sam@skipflag.com>
Co-authored-by: shauntruelove <satruelove@gmail.com>
Co-authored-by: chadi <joseph.lemaitre@epfl.ch>
Co-authored-by: hrmeredith12 <hrmeredith12@gmail.com>
Co-authored-by: Josh Wills <josh.wills@gmail.com>
Co-authored-by: Sam Shah <shahsam@umich.edu>
Co-authored-by: Dave <David.Witman@jhuapl.edu>
Co-authored-by: Shaun Truelove <shauntruelove@users.noreply.github.com>

* rename report.generation folder

* update report.generation path in workflow test

Co-authored-by: Kyra Grantz <kyragrantz@gmail.com>
Co-authored-by: juanderone <57634493+juanderone@users.noreply.github.com>
Co-authored-by: Josh Wills <jwills@apache.org>
Co-authored-by: jkamins7 <jkaminsky@jhu.edu>
Co-authored-by: kkintaro <katkintaro@gmail.com>
Co-authored-by: Sam Shah <sam@skipflag.com>
Co-authored-by: shauntruelove <satruelove@gmail.com>
Co-authored-by: chadi <joseph.lemaitre@epfl.ch>
Co-authored-by: hrmeredith12 <hrmeredith12@gmail.com>
Co-authored-by: Josh Wills <josh.wills@gmail.com>
Co-authored-by: Sam Shah <shahsam@umich.edu>
Co-authored-by: Dave <David.Witman@jhuapl.edu>
Co-authored-by: Shaun Truelove <shauntruelove@users.noreply.github.com>
eclee25 added a commit that referenced this pull request Sep 9, 2020
* Add an environment variable that can be used for writing uniquely named output files across blocks of jobs from AWS batch

* RStudio in the Docker container

RStudio is now available in the Docker container, which allows development and EDA with the same set of packages as is run in production.

* Update covidImportation package to v1.6 (#10)

* Update covidImportation package to v1.6 (#250)

* Adding final form of previous packrat + docker setup after merging weirdness

* Switching .so to git lfs

* Updated indexing in simulations and hospitalization

* Added better indexing for hospitalization

* Add ability to reduce alpha, sigma, and gamma (#241)

* Add the ability to reduce multiple parameters

* Add Reduce scenario template to test_simple and documentation

* minor bug test fix

* Minor bugs

Co-authored-by: Joseph Lemaitre <joseph.lemaitre@epfl.ch>

* Move the spatial setup outside of the scenarios loop since it's expensive to load and doesn't change per scenario.

* Removing source for packages installable from cran

* Updated the python rules for reticulate (tests still pass)

* Removing source for packages installable from cran

* Updated the python rules for reticulate (tests still pass)

* Updated based on review

* Fixed filter issues with makefile setup in case dynfilter isn't provided in config

* Updated makefile

* Reduce hospitalization memory pressure

Switch a critical split-apply-combine away from `do.call()`, which results in a 45% reduction in memory usage and a 35% speedup in execution time in my testing.

* Packrat (#253)

* Adding final form of previous packrat + docker setup after merging weirdness

* Switching .so to git lfs

* Removing source for packages installable from cran

* Updated the python rules for reticulate (tests still pass)

* Updated based on review

* Updated to use dev's docker instead of dataseed's

* Added reticulate zoo and xts

* Updated docker with git-lfs

* Packrat (#267)

* Adding final form of previous packrat + docker setup after merging weirdness

* Switching .so to git lfs

* Removing source for packages installable from cran

* Updated the python rules for reticulate (tests still pass)

* Updated based on review

* Updated to use dev's docker instead of dataseed's

* Added reticulate zoo and xts

* Updated docker with git-lfs

* Updating docker to install current versions of local packages

* Update .Rprofile

* Update dockerhub.yaml

* Update aws.yaml

* Yet another packrat attempt

* Update ci.yml

* Generic version of the batch job launcher/runner (#257)

* Generic version of batch from the union of jwills_dfU_run and dataseed_batch2

* Fixes from running stuff on some test jobs

* Add a vcpu CLI option and update sims_per_job to refer to slots per job

Co-authored-by: jkamins7 <jkaminsky@jhu.edu>

* Reduce SEIR startup costs (#273)

* 60% speedup in one run SEIR performance

The biggest cost in a single sim SEIR run was importation of Numba and the JIT compilation. Change this to compile ahead of time, which results in a nice 60% lift in one run SEIR performance by saving these startup costs---which will be valuable for our large inference runs.

Minor performance benefit when running many simulations as JIT costs are amortized away.

```
Benchmark #1: single sim JIT compilation (current)
  Time (mean ± σ):     13.429 s ±  0.537 s
  Range (min … max):   12.973 s … 14.867 s    100 runs

Benchmark #2: single sim AOT compilation (new)
  Time (mean ± σ):      5.129 s ±  0.125 s
  Range (min … max):    4.901 s …  5.364 s    100 runs
```

* Add Python build directory to .gitignore

* Integrate build_US_setup into pipeline and... (#271)

* Add hard-coded territory data to build_US_setup

* Create csv of island area census data since it cannot be accessed by API

* Change the report targets to follow the conventions of make_makefile

* Integrate build_US_setup into pipeline

* Some bug fixes

* git lfs pull of commute_data.csv and switch docker image

* Update ci.yml

* Update ci.yml

* Remove generated files

* Update make_makefile.R

* Update run_tests.py

* pull census year from config

* Use census year from config to build_US_setup

* Update build_US_setup.R

Co-authored-by: eclee25 <eclee25@gmail.com>

* Add check to hospitalization that geodata geoids are in geoid-params.csv (#283)

* added state level script for creating csv reporting out quantiles

* Fixed a slight bug with static dates and added full geographic extent version of the quantile generation script

* Added countylevel script

* Various fixes and updates to post run summarization scripts.

* Integrate QuantileSummarizeGeoExtent.R into pipeline (untested)

* Integrate QuantileSummarizeGeoExtent.R into pipeline

* Create QuantileSummarizeGeoidLevel.py

* Working on the python script

* Integrate quantile scripts into Makefile

* Delete QuantileSummarizeGeoidLevel.py

* perf fix for quantile_report_script

* QuantileSummarizeGeoidLevel on Apache Spark

This commit includes a Python implementation of `QuantileSummarizeGeoidLevel.R` running on Apache Spark. The job computes quantiles grouped by geoid and time; Spark provides the shuffle and quantile estimation mechanism to perform this aggregation efficiently. The job can be run locally within the container (fine for a USA run, but it takes ~45 minutes on an r5.24xlarge) or distributed on Amazon EMR. This commit adds Spark, and consequently Java, inside the container.

* add `--name_filter` to quantile_summarize_geoid_level as per feedback

* Adjust quantile scripts so they all have the same interface

- Fixed bug in both R scripts where `num_files` was set incorrectly
- Adjust quantile_summarize_geoid_level.py to take scenarios (+ config file) versus path names as input to mimic the interface of the other scripts

* Revert make_makefile.R to dev branch version

* setup file for international countries

* Fatiguing NPI

* tested MVP

* other implementation, maybe cleaner

* update to hosp_run to take specified geoid-params

* Added mild infections as output of hospitalization

* minor

* Hospitalization package update

* dev setup

* fixed rate

* adding apl deployment to ecr

* international seeding and setup files created

* Update to report template docs for country reports

* update to non-US scripts

* update to international branch country setup

* non-US setup Rmd and other scripts finished.

* update

* minor print edit

* updates to script to make international functional with master

* minor update to report and setup scripts

* setup fix

* non-us update

* dev setup relative min

* relative min ready

* 1. Added integration tests for US and non-US create_seeding.R and build_US_setup.R/build_nonUS_setup.R

2. create_seeding.R now has the option to choose "CSSE" or "USAFacts" for a US run.

* Delete jhucsse_case_data_crude.csv

accidental data commit

* vignette fix

* Removed man folders from packages

* fixes in the international branch before the merge

* Do not update packages

* Update covidImportation to v1.6.1

* minor fix

* fix non-US setup

* Update local_install.R

* Fix merge error

* Reload covidImportation v1.6.1 to fix tidyverse dependency

* seeding update with inputted incidence multiplier

* minor names fix

* Minor fixes to build_US and build_nonUS integration tests

* deleted a comma

* minor bug fix

* Fix reversed international tag

* fixed error message

* fixed python error

* minor

* Adding updated severity parameters

* fixing US seeding

* adding print message

* Update covidImportation with bug fix

* minor update

* Fix filter issue

* integration testing fixes

* Non-US makefile added. This should actually work fine for US as well. It also adds the ability to use the setup_name from the config to add a file prefix to model outputs, and then only clean those model outputs when running "make clean".

* make_makefile.R now includes both US and non-US functionality

* make_makefile white space fix

* Add tictoc package to dev docker

* Updated to fix a docker bug

* Report devel2 into dev (#352)

* updates to state template

* fix load_cum_inf_geounit_dates to use hosp only

* add hosp method chunks from report_devel

* adding generic mapping function

* removing grouping by time for appropriate cumsum in load_cum_inf

* fixing error in load_cum_inf

* add ventilator to scenario tbl

* add warning about loading infections from hosp data

* deprecate old functions, integration testing temp

* recreating clean NAMESPACE to remove export of setup_testing_environment preventing pkg install

* adding sim_num before post_process in load_hosp_sims_filtered for output that does not contain sim_num but requires it for post-processing

* adding warning about variable name to load_hosp_geounit_threshold

* moving make_excess_heatmap to deprecated functions

* prep report_devel2 for dev merge (#351)

* Version with pyarrow included

* Dependencies for arrow in R as well

* Fixed check_model script

* Updated for feather integration

* Updated test cases since `n` is reserved in yml

* adding make_excess_heatmap function for hosp outcomes

* Fixing parallelization mistake

* Minor fixes

- Use the "optimize" covidImportation version
- Always upgrade local packages if upgrade available (vs silently ignore)
- check_model_reports should ensure axis are dates

* new figure relative to threshold heatmap

* Update importation.R to match covidImportation package updates

* Updated model code to use the new covidImportation package, and also seed to E instead of I (and keep population fixed)

* Fixed typo

* Final fix to avoid numba

* Fixed path to install_local script

* Added package

* Fixed seeding creation

* rm NAs and fix create_seeding.R

* add new cum hosp/deaths check to check_models scr

* update indexes in check model script

* long form mobility

* Update reference to geoid-params.csv inside of hosp_run.R

* 10x seeding file

* Write the npi when writing parquet output

* template

* report after simulation

* Removed geodata read from hosp_run.R since it's not being used

* Updated things that feed into mobility

* Updated build_US_setup.R to account for the move

* These files got removed in a previous commit

* Removing unused (as far as I can tell anyway) data

* Fix bug when the places are also a number

* Changing back test cases to use size/prob instead of n/p

* Updated name to pass checks on case sensitive OS

* Updated to use file_extension argument

* Fix broken tests, though I recommend we eliminate the mean and var checks since they'll be flaky

* Updated build_US_setup.R to work with the current setup

* Renamed parameters to avoid confusion; print out simid as 9 digits

SEIR and hospitalization phases have more standardized file format

* read parquet file times correctly

* Revert "read parquet file times correctly"

This reverts commit 521dd25.

* parquet date fixes (#207)

Co-authored-by: hrmeredith12 <hrmeredith12@gmail.com>

* Report devel (#208)

* fix unit test code

* fix unit test for real

* fix unit tests

* adding ability to filter geoids in relative heatmap function

* adding template for county-specific report for a given state

* lower tolerance for distribution tests

* planning_models chunk

* planning scenario chunk

* add names to dev team

Co-authored-by: eclee25 <eclee25@gmail.com>
Co-authored-by: Kyra Grantz <kyragrantz@gmail.com>
Co-authored-by: hrmeredith12 <hrmeredith12@gmail.com>

* Adding Javier (#210)

Co-authored-by: hrmeredith12 <hrmeredith12@gmail.com>
Co-authored-by: Elizabeth Lee <eclee25@gmail.com>

* Delete build-model-input.R (#217)

* Dataseed merge (#215)

* Adding Javier

* Adding commute data back in

* rm fixed param and comment out bad plot

* commit namespace report gen

* fix NVentCurr name

* formatting changes to county report template, removing defaults that should be modified for each report

* adding references for county report template

* change importation seeding

* table formatting

* limitations chunk considering age specific hosp calculations

* removing build_hospdeath_geoid_par - old version not used in hosprun.R

* removing legacy hospitalization scripts. everything runs through hosp_run.R now

* using current default durations to minimize confusion

Co-authored-by: hrmeredith12 <hrmeredith12@gmail.com>
Co-authored-by: Elizabeth Lee <eclee25@gmail.com>
Co-authored-by: Kyra Grantz <kyragrantz@gmail.com>

* Removing config.yml and changing the variable name in create_seeding to be truthful. (#219)

* Fixed the low in followup issue (#224)

* Fixed the low in followup issue

* Adding initial ^

* adding county report template yaml (#221)

Co-authored-by: jkamins7 <jkaminsky@jhu.edu>

* Fix load-bearing typo (#225)

* Fix load-bearing typo

* pretty sure it's supposed to be this

Co-authored-by: Josh Wills <jwills@apache.org>
Co-authored-by: kkintaro <katkintaro@gmail.com>

* Add an environment variable that can be used for writing uniquely named output files across blocks of jobs from AWS batch

* fix for 1 scenario (#230)

Co-authored-by: Elizabeth Lee <eclee25@gmail.com>

* RStudio in the Docker container

RStudio is now available in the Docker container, which allows development and EDA with the same set of packages as is run in production.

* Update covidImportation package to v1.6 (#10)

* Update covidImportation package to v1.6 (#250)

* Adding final form of previous packrat + docker setup after merging weirdness

* Switching .so to git lfs

* Updated indexing in simulations and hospitalization

* Added better indexing for hospitalization

* Add ability to reduce alpha, sigma, and gamma (#241)

* Add the ability to reduce multiple parameters

* Add Reduce scenario template to test_simple and documentation

* minor bug test fix

* Minor bugs

Co-authored-by: Joseph Lemaitre <joseph.lemaitre@epfl.ch>

* Move the spatial setup outside of the scenarios loop since it's expensive to load and doesn't change per scenario.

* Removing source for packages installable from cran

* Updated the python rules for reticulate (tests still pass)

* Removing source for packages installable from cran

* Updated the python rules for reticulate (tests still pass)

* Updated based on review

* Fixed filter issues with makefile setup in case dynfilter isn't provided in config

* Updated makefile

* Reduce hospitalization memory pressure

Switch a critical split-apply-combine away from `do.call()`, which results in a 45% reduction in memory usage and a 35% speedup in execution time in my testing.

* Packrat (#253)

* Adding final form of previous packrat + docker setup after merging weirdness

* Switching .so to git lfs

* Removing source for packages installable from cran

* Updated the python rules for reticulate (tests still pass)

* Updated based on review

* Updated to use dev's docker instead of dataseed's

* Added reticulate zoo and xts

* Updated docker with git-lfs

* Packrat (#267)

* Adding final form of previous packrat + docker setup after merging weirdness

* Switching .so to git lfs

* Removing source for packages installable from cran

* Updated the python rules for reticulate (tests still pass)

* Updated based on review

* Updated to use dev's docker instead of dataseed's

* Added reticulate zoo and xts

* Updated docker with git-lfs

* Updating docker to install current versions of local packages

* Update .Rprofile

* Update dockerhub.yaml

* Update aws.yaml

* Yet another packrat attempt

* Update ci.yml

* Generic version of the batch job launcher/runner (#257)

* Generic version of batch from the union of jwills_dfU_run and dataseed_batch2

* Fixes from running stuff on some test jobs

* Add a vcpu CLI option and update sims_per_job to refer to slots per job

Co-authored-by: jkamins7 <jkaminsky@jhu.edu>

* changing covidImportation tag to 1.6.1

* Reduce SEIR startup costs (#273)

* 60% speedup in one run SEIR performance

The biggest cost in a single sim SEIR run was importation of Numba and the JIT compilation. Change this to compile ahead of time, which results in a nice 60% lift in one run SEIR performance by saving these startup costs---which will be valuable for our large inference runs.

Minor performance benefit when running many simulations as JIT costs are amortized away.

```
Benchmark #1: single sim JIT compilation (current)
  Time (mean ± σ):     13.429 s ±  0.537 s
  Range (min … max):   12.973 s … 14.867 s    100 runs

Benchmark #2: single sim AOT compilation (new)
  Time (mean ± σ):      5.129 s ±  0.125 s
  Range (min … max):    4.901 s …  5.364 s    100 runs
```
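
For reference, the headline figure can be re-derived from the two benchmark means above. This is plain arithmetic on the reported numbers. The amortized figure assumes the saving is a one-time startup cost per process, and `n_sims` is an illustrative value rather than a measured one:

```python
# Means reported in the benchmark above, in seconds.
jit_mean_s = 13.429
aot_mean_s = 5.129

# Fraction of wall time removed, and the equivalent speedup factor.
time_reduction = 1 - aot_mean_s / jit_mean_s
speedup_factor = jit_mean_s / aot_mean_s
saved_s = jit_mean_s - aot_mean_s

# Assumption: the saving is paid once per process, so it shrinks per simulation
# as more simulations share one process. n_sims is illustrative only.
n_sims = 100
saved_per_sim_s = saved_s / n_sims

print(f"{time_reduction:.1%} less wall time, {speedup_factor:.2f}x faster")
print(f"{saved_s:.1f} s saved per cold start, {saved_per_sim_s:.3f} s per sim over {n_sims} sims")
```

So the "60%" is a reduction in wall time for a single cold-start run (about 2.6x faster), and the per-simulation benefit falls off as the startup cost is spread over more simulations.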

* Add Python build directory to .gitignore

* Integrate build_US_setup into pipeline and... (#271)

* Add hard-coded territory data to build_US_setup

* Create csv of island area census data since it cannot be accessed by API

* Change the report targets to follow the conventions of make_makefile

* Integrate build_US_setup into pipeline

* Some bug fixes

* git lfs pull of commute_data.csv and switch docker image

* Update ci.yml

* Update ci.yml

* Remove generated files

* Update make_makefile.R

* Update run_tests.py

* pull census year from config

* Use census year from config to build_US_setup

* Update build_US_setup.R

Co-authored-by: eclee25 <eclee25@gmail.com>

* Add check to hospitalization that geodata geoids are in geoid-params.csv (#283)
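
A minimal sketch of such a check, assuming both inputs reduce to lists of geoid strings. The function name and error wording are hypothetical and are not taken from the hospitalization package:

```python
def check_geoids_have_params(geodata_geoids, param_geoids):
    """Raise if any geoid in the geodata has no row in the parameter file.

    Hypothetical helper: the real check and its message are not shown in this log.
    """
    missing = sorted(set(geodata_geoids) - set(param_geoids))
    if missing:
        raise ValueError(
            f"{len(missing)} geoid(s) in geodata are missing from geoid-params.csv: "
            + ", ".join(missing)
        )


# Passes silently: every geodata geoid has parameters.
check_geoids_have_params(["06037", "06059"], ["06037", "06059", "06073"])

# Fails early with the offending geoids listed.
try:
    check_geoids_have_params(["06037", "99999"], ["06037"])
    error_message = None
except ValueError as err:
    error_message = str(err)
```

Failing before the run starts, with the missing geoids named, is cheaper than a join that silently drops rows.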

* added state level script for creating csv reporting out quantiles

* Fixed a slight bug with static dates and added full geographic extent version of the quantile generation script

* Added countylevel script

* Various fixes and updates to post run summarization scripts.

* Integrate QuantileSummarizeGeoExtent.R into pipeline (untested)

* Integrate QuantileSummarizeGeoExtent.R into pipeline

* Create QuantileSummarizeGeoidLevel.py

* Working on the python script

* Integrate quantile scripts into Makefile

* Delete QuantileSummarizeGeoidLevel.py

* perf fix for quantile_report_script

* QuantileSummarizeGeoidLevel on Apache Spark

This commit includes a Python implementation of `QuantileSummarizeGeoidLevel.R` running on Apache Spark. The job computes quantiles grouped by geoid and time, with Spark providing the shuffle and quantile estimation needed to perform this aggregation efficiently. The job can be run locally within the container (fine for a USA run, but takes ~45 mins on an r5.24xlarge) or distributed on Amazon EMR. This commit adds Spark, and consequently Java, inside the container.
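
The aggregation itself can be illustrated without Spark. The following is a plain-Python sketch, not the Spark job: it computes exact, linearly interpolated quantiles in memory, whereas the job described above relies on Spark's own quantile estimation. The row layout `(geoid, time, value)` and the function names are assumptions made for illustration:

```python
from collections import defaultdict


def interpolated_quantile(sorted_vals, p):
    """Quantile of an already-sorted list, with linear interpolation between neighbours."""
    pos = p * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def quantiles_by_geoid_time(rows, probs=(0.25, 0.5, 0.75)):
    """Group (geoid, time, value) rows and return the quantiles for each group."""
    groups = defaultdict(list)
    for geoid, time, value in rows:
        groups[(geoid, time)].append(value)
    return {
        key: {p: interpolated_quantile(sorted(vals), p) for p in probs}
        for key, vals in groups.items()
    }


# Illustrative rows: one value per simulation for each (geoid, time) pair.
rows = [("06037", "2020-04-01", v) for v in (10, 50, 30, 20, 40)]
rows += [("06037", "2020-04-02", v) for v in (1, 3)]
summary = quantiles_by_geoid_time(rows)
```

Grouping everything in one process stops being practical once the simulation output no longer fits in memory, which is the case a distributed shuffle is meant to handle.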

* add `--name_filter` to quantile_summarize_geoid_level as per feedback

* Adjust quantile scripts so they all have the same interface

- Fixed bug in both R scripts where `num_files` was set incorrectly
- Adjust quantile_summarize_geoid_level.py to take scenarios (+ config file) versus path names as input to mimic the interface of the other scripts

* Revert make_makefile.R to dev branch version

* setup file for international countries

* Fatiguing NPI

* tested MVP

* other implementation, maybe cleaner

* update to hosp_run to take specified geoid-params

* Added mild infections as output of hospitalization

* minor

* Hospitalization package update

* dev setup

* fixed rate

* adding apl deployment to ecr

* international seeding and setup files created

* Update to report template docs for country reports

* update to non-US scripts

* update to international branch country setup

* non-US setup Rmd and other scripts finished.

* update

* minor print edit

* updates to script to make international functional with master

* minor update to report and setup scripts

* setup fix

* non-us update

* dev setup relative min

* relative min ready

* 1. Added integration tests for US and non-US create_seeding.R and build_US_setup.R/build_nonUS_setup.R

2. create_seeding.R now has the option to choose "CSSE" or "USAFacts" for a US run.

* Delete jhucsse_case_data_crude.csv

accidental data commit

* vignette fix

* Removed man folders from packages

* fixes in the international branch before the merge

* Do not update packages

* Update covidImportation to v1.6.1

* minor fix

* fix non-US setup

* Update local_install.R

* Fix merge error

* Reload covidImportation v1.6.1 to fix tidyverse dependency

* seeding update with inputted incidence multiplier

* minor names fix

* Minor fixes to build_US and build_nonUS integration tests

* deleted a comma

* minor bug fix

* Fix reversed international tag

* fixed error message

* fixed python error

* minor

* Adding updated severity parameters

* fixing US seeding

* adding print message

* Update covidImportation with bug fix

* minor update

* Fix filter issue

* integration testing fixes

* Non-US makefile added. This should actually work fine for US as well. It also adds the ability to use the setup_name from the config to add a file prefix to model outputs, and then only clean those model outputs when running "make clean".

* make_makefile.R now includes both US and non-US functionality

* make_makefile white space fix

* Add tictoc package to dev docker

* Updated to fix a docker bug

Co-authored-by: Josh Wills <jwills@apache.org>
Co-authored-by: jkamins7 <jkaminsky@jhu.edu>
Co-authored-by: kkintaro <katkintaro@gmail.com>
Co-authored-by: Kyra Grantz <kyragrantz@gmail.com>
Co-authored-by: Sam Shah <sam@skipflag.com>
Co-authored-by: shauntruelove <satruelove@gmail.com>
Co-authored-by: chadi <joseph.lemaitre@epfl.ch>
Co-authored-by: hrmeredith12 <hrmeredith12@gmail.com>
Co-authored-by: Josh Wills <josh.wills@gmail.com>
Co-authored-by: Sam Shah <shahsam@umich.edu>
Co-authored-by: Dave <David.Witman@jhuapl.edu>
Co-authored-by: Shaun Truelove <shauntruelove@users.noreply.github.com>

* rename report.generation folder

* update report.generation path in workflow test

Co-authored-by: Kyra Grantz <kyragrantz@gmail.com>
Co-authored-by: juanderone <57634493+juanderone@users.noreply.github.com>
Co-authored-by: Josh Wills <jwills@apache.org>
Co-authored-by: jkamins7 <jkaminsky@jhu.edu>
Co-authored-by: kkintaro <katkintaro@gmail.com>
Co-authored-by: Sam Shah <sam@skipflag.com>
Co-authored-by: shauntruelove <satruelove@gmail.com>
Co-authored-by: chadi <joseph.lemaitre@epfl.ch>
Co-authored-by: hrmeredith12 <hrmeredith12@gmail.com>
Co-authored-by: Josh Wills <josh.wills@gmail.com>
Co-authored-by: Sam Shah <shahsam@umich.edu>
Co-authored-by: Dave <David.Witman@jhuapl.edu>
Co-authored-by: Shaun Truelove <shauntruelove@users.noreply.github.com>

* configurable delay and ratio for seeding

* seeding file extra comma

* change path to report.generation

* rm double parens

* Dev make (#358)

* make_makefile - rm filter and add seeding & intl

* add parens

* typo

* Removed filter from tests

* fix parens issue

* fixes #338 by raising an error

* fixes #339 by raising an error

* better and correct message

* bugfixes

* better presentation

* consistency across messages

* accidentally deleted some tests, putting them back

* newlines

* Updated make_makefile.R to pass tests multiple times in a row

* integ test 2x, update local install

* try to fix 2x integ test

* rm unnecessary chdir

* fix typo in aws apl workflow

Co-authored-by: jkamins7 <jkaminsky@jhu.edu>
Co-authored-by: chadi <joseph.lemaitre@epfl.ch>

* readme file changes

* change to latest docker image

* dev image

* make sensible load_config err + test

* Updated docker file

* Removed failing workflow

* Removed more rstudio config from docker file

* Removed more rstudio config from docker file

* Removed outdated vignettes

* Updated covidImportation version in docker

* Updated packrat

Co-authored-by: Josh Wills <jwills@apache.org>
Co-authored-by: kkintaro <katkintaro@gmail.com>
Co-authored-by: Sam Shah <sam@skipflag.com>
Co-authored-by: jkamins7 <jkaminsky@jhu.edu>
Co-authored-by: Joseph Lemaitre <joseph.lemaitre@epfl.ch>
Co-authored-by: Josh Wills <josh.wills@gmail.com>
Co-authored-by: Sam Shah <shahsam@umich.edu>
Co-authored-by: shauntruelove <satruelove@gmail.com>
Co-authored-by: Dave <David.Witman@jhuapl.edu>
Co-authored-by: Shaun Truelove <shauntruelove@users.noreply.github.com>
Co-authored-by: Kyra Grantz <kyragrantz@gmail.com>
Co-authored-by: juanderone <57634493+juanderone@users.noreply.github.com>
Co-authored-by: hrmeredith12 <hrmeredith12@gmail.com>