Refactor ratio stats for build speed increase #521

Merged 42 commits on Nov 26, 2024
Commits
401c7fa
Move functions from assesspy into file and run spark groupby
wagnerlmichael Jun 20, 2024
e5e43e6
Try to add numba as dependency
wagnerlmichael Jun 24, 2024
18abede
Add numba dependency
wagnerlmichael Jun 24, 2024
4ab840e
Revert added packages
wagnerlmichael Jun 24, 2024
989be95
Confirm numpy speed up
wagnerlmichael Jun 24, 2024
b55bcd4
Confirm numpy strat without spark
wagnerlmichael Jun 24, 2024
b0053ea
Try importing dask
wagnerlmichael Jun 24, 2024
4f21886
Add multiprocessing code
wagnerlmichael Jun 24, 2024
bf02f5b
Move parallel functions inside boot_ci function
wagnerlmichael Jun 25, 2024
e46e22d
Fix linter errors
wagnerlmichael Jun 25, 2024
24a5239
Fix linter errors
wagnerlmichael Jun 25, 2024
d4e3f3f
Fix linter errors
wagnerlmichael Jun 25, 2024
c561a1b
Fix linter errors
wagnerlmichael Jun 25, 2024
62b0d15
Fix linter errors
wagnerlmichael Jun 25, 2024
8077f30
Remove duplicate function
wagnerlmichael Jun 25, 2024
2acaa9f
Remove comment
wagnerlmichael Jun 25, 2024
0410a52
Clean up
wagnerlmichael Jun 25, 2024
894b7db
Change random seeds and add logs
wagnerlmichael Jul 1, 2024
f53f311
Add checkpoint for functioning reduced report_summarise()
wagnerlmichael Jul 19, 2024
29647d3
Add checks and successfully add column to final df
wagnerlmichael Aug 6, 2024
d22f8bf
Working spark code with pd sampling
wagnerlmichael Aug 14, 2024
ae4e23f
Merge branch 'master' into 436-refactor-ratio_stats-job-to-use-pyspark
dfsnow Nov 19, 2024
c2f0c35
Update column names
dfsnow Nov 19, 2024
fa8ff6d
Test PySpark applyInPandas
dfsnow Nov 19, 2024
e55970d
Fix col orders and types
dfsnow Nov 19, 2024
c10e043
Add working mostly-Spark implementation
dfsnow Nov 20, 2024
757a47c
Bump max DPUs
dfsnow Nov 20, 2024
d99972a
Get only the first value from each group col
dfsnow Nov 20, 2024
d489299
Bump nboot to 1000
dfsnow Nov 20, 2024
a125dda
Refactor ratio_stats for assesspy 2.0.0
dfsnow Nov 25, 2024
4ae9362
Update types
dfsnow Nov 25, 2024
b8d860f
Update med ratio col names
dfsnow Nov 25, 2024
66f107f
Check that median sample is gte 2
dfsnow Nov 25, 2024
3838e80
Fix sample constants
dfsnow Nov 25, 2024
d1e2eb8
Add Athena logging
dfsnow Nov 25, 2024
e881cd7
Condense Spark ratio_stats code
dfsnow Nov 26, 2024
062eddf
Reduce nboot to 300
dfsnow Nov 26, 2024
c77fae3
Add sales chasing check
dfsnow Nov 26, 2024
b36bb46
Swap bool to Spark data type
dfsnow Nov 26, 2024
50f6c60
Add sample size check for is_sales_chased
dfsnow Nov 26, 2024
ab7f2fa
Replace calced ratio column
dfsnow Nov 26, 2024
5f4711b
Ignore E402 only for Spark python models
dfsnow Nov 26, 2024