Refactor ratio stats for build speed increase (#521)
* Move functions from assesspy into file and run Spark groupby

* Try to add numba as dependency

* Add numba dependency

* Revert added packages

* Confirm NumPy speed-up

* Confirm NumPy strategy without Spark

* Try importing dask

* Add multiprocessing code

* Move parallel functions inside boot_ci function

* Fix linter errors

* Remove duplicate function

* Remove comment

* Clean up

* Change random seeds and add logs

* Add checkpoint for functioning reduced report_summarise()

* Add checks and successfully add column to final df

* Working Spark code with pandas sampling

* Update column names

* Test PySpark applyInPandas

* Fix column order and types

* Add working mostly-Spark implementation

* Bump max DPUs

* Get only the first value from each group col

* Bump nboot to 1000

* Refactor ratio_stats for assesspy 2.0.0

* Update types

* Update median ratio column names

* Check that median sample is gte 2

* Fix sample constants

* Add Athena logging

* Condense Spark ratio_stats code

* Reduce nboot to 300

* Add sales chasing check

* Swap bool to Spark data type

* Add sample size check for is_sales_chased

* Replace calculated ratio column

* Ignore E402 only for Spark python models

---------

Co-authored-by: Dan Snow <daniel.snow@cookcountyil.gov>
Co-authored-by: Dan Snow <dan@sno.ws>
3 people authored Nov 26, 2024
1 parent d38a825 commit 8c6633d
Showing 3 changed files with 152 additions and 235 deletions.