Refactor ratio stats for build speed increase (#521) · ccao-data/data-architecture@8c6633d

* Move functions from assesspy into file and run spark groupby

* Try to add numba as dependency

* Add numba dependency

* Revert added packages

* Confirm numpy speed up

* Confirm numpy strategy without spark

* Try importing dask

* Add multiprocessing code

* Move parallel functions inside boot_ci function

* Fix linter errors

* Fix linter errors

* Fix linter errors

* Fix linter errors

* Fix linter errors

* Remove duplicate function

* Remove comment

* Clean up

* Change random seeds and add logs

* Add checkpoint for functioning reduced report_summarise()

* Add checks and successfully add column to final df

* Working spark code with pd sampling

* Update column names

* Test PySpark applyInPandas

* Fix col orders and types

* Add working mostly-Spark implementation

* Bump max DPUs

* Get only the first value from each group col

* Bump nboot to 1000

* Refactor ratio_stats for assesspy 2.0.0

* Update types

* Update med ratio col names

* Check that median sample is gte 2

* Fix sample constants

* Add Athena logging

* Condense Spark ratio_stats code

* Reduce nboot to 300

* Add sales chasing check

* Swap bool to Spark data type

* Add sample size check for is_sales_chased

* Replace calculated ratio column

* Ignore E402 only for Spark python models

---------

Co-authored-by: Dan Snow <daniel.snow@cookcountyil.gov>
Co-authored-by: Dan Snow <dan@sno.ws>
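
The commit list mentions a `boot_ci` function, an `nboot` setting, and a check that the median sample is at least 2, but this page does not show the code itself. As a rough illustration only, a percentile bootstrap for a median ratio might look like the sketch below. It uses the Python standard library rather than numpy or Spark, and the name `boot_ci_median`, its parameters, and the sample ratios are all hypothetical, not taken from the repository or from assesspy.

```python
import random
import statistics


def boot_ci_median(ratios, nboot=300, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the median of `ratios`.

    Hypothetical helper for illustration; not the repository's boot_ci.
    """
    n = len(ratios)
    if n < 2:
        # Assumed guard, loosely mirroring "Check that median sample is gte 2"
        return None
    rng = random.Random(seed)  # fixed seed keeps the result reproducible
    # Resample with replacement nboot times and record each resample's median
    medians = sorted(
        statistics.median(rng.choices(ratios, k=n)) for _ in range(nboot)
    )
    # Read the lower and upper percentile cut points off the sorted medians
    lo_idx = int((alpha / 2) * nboot)
    hi_idx = min(nboot - 1, int((1 - alpha / 2) * nboot))
    return medians[lo_idx], medians[hi_idx]


# Made-up assessment-to-sale ratios, for demonstration only
ratios = [0.82, 0.91, 0.95, 0.97, 1.00, 1.02, 1.04, 1.08, 1.15, 1.21]
ci = boot_ci_median(ratios)
```

Raising `nboot` tightens the estimate of the interval endpoints at a roughly linear cost in runtime, which is the kind of trade-off the "Bump nboot to 1000" and "Reduce nboot to 300" entries appear to reflect.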


3 people authored Nov 26, 2024

1 parent d38a825 commit 8c6633d
