Commit
Refactor ratio stats for build speed increase (#521)
* Move functions from assesspy into file and run spark groupby
* Try to add numba as dependency
* Add numba dependency
* Revert added packages
* Confirm numpy speed up
* Confirm numpy strat without spark
* Try importing dask
* Add multiprocessing code
* Move parallel functions inside boot_ci function
* Fix linter errors
* Fix linter errors
* Fix linter errors
* Fix linter errors
* Fix linter errors
* Remove duplicate function
* Remove comment
* Clean up
* Change random seeds and add logs
* Add checkpoint for functioning reduced report_summarise()
* Add checks and successfully add column to final df
* Working spark code with pd sampling
* Update column names
* Test PySpark applyInPandas
* Fix col orders and types
* Add working mostly-Spark implementation
* Bump max DPUs
* Get only the first value from each group col
* Bump nboot to 1000
* Refactor ratio_stats for assesspy 2.0.0
* Update types
* Update med ratio col names
* Check that median sample is gte 2
* Fix sample constants
* Add Athena logging
* Condense Spark ratio_stats code
* Reduce nboot to 300
* Add sales chasing check
* Swap bool to Spark data type
* Add sample size check for is_sales_chased
* Replace calculated ratio column
* Ignore E402 only for Spark python models

---------

Co-authored-by: Dan Snow <daniel.snow@cookcountyil.gov>
Co-authored-by: Dan Snow <dan@sno.ws>
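
For readers unfamiliar with the technique the commits refer to (`boot_ci`, `nboot`, the numpy speed-up), here is a minimal sketch of a vectorized percentile bootstrap for a median ratio. It is not the code from this commit or from assesspy: the function name `boot_ci_median`, its parameters, and the `alpha` and `seed` defaults are hypothetical, chosen for illustration. The default `nboot=300` and the early return for fewer than two observations loosely echo the "Reduce nboot to 300" and "Check that median sample is gte 2" commits.

```python
import numpy as np


def boot_ci_median(ratios, nboot=300, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the median of `ratios`.

    Hypothetical helper for illustration only; not the assesspy API.
    """
    ratios = np.asarray(ratios, dtype=float)

    # Assumption: with fewer than two observations a CI is not meaningful,
    # so return NaNs instead of raising.
    if ratios.size < 2:
        return (float("nan"), float("nan"))

    rng = np.random.default_rng(seed)

    # Draw every resample in one call: an (nboot, n) matrix of row indices.
    # This avoids a Python-level loop over bootstrap iterations.
    idx = rng.integers(0, ratios.size, size=(nboot, ratios.size))

    # One median per bootstrap resample.
    medians = np.median(ratios[idx], axis=1)

    # Percentile interval from the bootstrap distribution.
    lo, hi = np.quantile(medians, [alpha / 2, 1 - alpha / 2])
    return (float(lo), float(hi))
```

A function of this shape takes one group's ratios and returns a small fixed-size result, which is the kind of per-group computation that PySpark's `applyInPandas` is designed to distribute. Whether the commit structures its Spark code this way is not shown here.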