init: m5 forecasting FE benchmark #136
base: main
Conversation
def q2_polars(df):
    return df.with_columns(
Can we use the select + explode mapping here?
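For context, one possible reading of this suggestion, sketched on a toy frame with assumed column names (`id`, `d`, `sales`); the real q2 logic is whatever the excerpt above defines, so this is illustrative only:

```python
import polars as pl

# Toy frame standing in for the M5 sales data; column names are assumptions.
df = pl.DataFrame(
    {
        "id": ["a", "a", "a", "b", "b", "b"],
        "d": [1, 2, 3, 1, 2, 3],
        "sales": [3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    }
)

# with_columns, as in the q2 excerpt above, keeps every existing column and
# appends the derived one; the 1-step lag per id here is illustrative.
out_with_columns = df.with_columns(
    pl.col("sales").shift(1).over("id").alias("lag_1"),
)

# A select/agg + explode formulation: collect the needed columns into lists
# per id, derive the feature inside the group, then explode back to one
# row per observation.
out_select_explode = (
    df.group_by("id", maintain_order=True)
    .agg(
        pl.col("d"),
        pl.col("sales"),
        pl.col("sales").shift(1).alias("lag_1"),
    )
    .explode("d", "sales", "lag_1")
)

# Up to row order, the two frames contain the same data.
assert out_with_columns.sort("id", "d").equals(out_select_explode.sort("id", "d"))
```

Whether the explode formulation is any faster will depend on the query; it mainly changes how the per-group computation is expressed.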
Participants typically used pandas (Polars was only just getting started at the time), so here we benchmark how long it would have taken to do the same feature engineering with Polars (and, coming soon, DuckDB).
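To make the task concrete, here is a minimal sketch of the kind of per-series feature engineering this refers to, with assumed column names (`id`, `sales`) and assumed lag/window sizes; the actual benchmarked queries are the ones defined in this PR:

```python
import polars as pl


def lag_and_rolling_features(df: pl.DataFrame) -> pl.DataFrame:
    # Illustrative only: lag and rolling-mean features per series were a
    # common ingredient of participants' pandas pipelines; the exact lags
    # and windows here are assumptions, not the benchmark's definition.
    return df.with_columns(
        pl.col("sales").shift(28).over("id").alias("lag_28"),
        pl.col("sales")
        .shift(28)
        .rolling_mean(window_size=7)
        .over("id")
        .alias("rmean_28_7"),
    )
```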
We believe this to be a useful task to benchmark, because:
I think we can remove L9-L12.
I think this can serve as a basis for more time-series related benchmarks on this dataset. I don't think we have to strictly limit ourselves to what was used in the Kaggle competition.
Just got back to this. Running locally, I'm seeing very good results for Polars.
Some results: https://www.kaggle.com/code/marcogorelli/m5-forecasting-feature-engineering-benchmark