-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/port to polars 4 #107
Conversation
Updated the test code in https://github.com/paulf81/flasc_metrics and still seeing a 10x reduction in time taken in computation |
@Bartdoekemeijer At this point you can go ahead and review! The artificial data examples are not yet updated (I'll be working on those tomorrow), but the Smarteole examples are ready to roll and we don't expect any further fundamental changes to the energy_ratio/ code. |
@Bartdoekemeijer ok, I think we're ready to merge on our end. If you want to have a look around, all examples updated to new syntax. We've also added this notebook as quicker tour: https://github.com/paulf81/flasc/blob/feature/port_to_polars_4/examples_artificial_data/energy_ratio/00_demo_energy_ratio_syntax.ipynb |
Port to polars (4th try)
Feature or improvement description
This is a final, final, reboot of the port_to_polars efforts, initially begin in Pull Request #81 and updated in Pull Request #97, (and then pull request #102) here following the decision made in discussion #98 that we keep the assumption that pandas dataframe defined in the current methods (with columns pow_000, pow_001, and time) will be kept the same, and conversion to polars can happen within certain functions. Here I also adjust to discussion Issue #104
Unlike the previous versions I also split some things apart I think in a way that adds more clarity.
energy_ratio_utilities.py contains helper functions other modules could need
energy_ratio.py includes the main working functions of energy ratio calculation
energy_ratio_result.py defines a class meant to hold the energy ratio results in a convenient way and provide clear plotting functions.
Unlike before I've included methods for overlapping bins and now you can the tests energy_ratio_test.py that I've defined for polars passes giving identical results. In general I've confirmed that the energy ratio result values match all previous results except a small but intentional change in how rebalancing is done. This change only manifests when using noisy data and more than one set of data (for example baseline and wakesteering) and is still generally small. But for this reason the tests pass with precision of 4 decimal places.
I'm temporary leaving the polars implementations split out during development but hope to move quickly to deleting the old energy_ratio codes once all surrounding codes are edited to use the new functions. Should hopefully go quick.
Finally you'll also see I'm playing with default visualizations, happy for any and all suggestions here too!
In connection with this effort is a small repo https://github.com/paulf81/flasc_metrics (based on https://github.com/rafmudaf/floris_metrics) which can be used to confirm that even with a pandas->polars->pandas round trip the functions are still sped up. The new folder timing_tests provides functions to that repo.
A first result is promising, converting the energy ratio with bootstrapping to the polars-based method reduces the average time to compute an N=20 bootstrap energy ratio over the artificial data set from 11.3344 seconds to 1.0762 seconds.
Related issue, if one exists
Issue #80
Impacted areas of the software
energy_ratio
(this list may grow)