This repository has been archived by the owner on Jul 3, 2023. It is now read-only.

Adds m5 forecasting kaggle example #280

Merged: skrawcz merged 5 commits into main from add_kaggle_example on Jan 24, 2023

skrawcz (Collaborator) commented Jan 21, 2023

I copied the code almost verbatim from [this notebook](https://www.kaggle.com/code/ratan123/m5-forecasting-lightgbm-with-timeseries-splits). It has a few odd quirks: where I understood what was going on I removed them, otherwise I left them in. E.g. `test_2` is not used...

This is an example that shows:

  1. loading data
  2. creating a base set.
  3. adding more features.
  4. creating a training set.
  5. fitting model & predicting with it.
  6. saving the result.
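
As a rough illustration of how the six steps above can be laid out declaratively, here is a minimal stdlib-only sketch. It assumes the style where each function is named after the output it produces and each parameter names an upstream output. Every function name, the hard-coded rows, the naive "model", and the tiny `execute` helper are hypothetical stand-ins; this is not the code from the PR, and `execute` is not the Hamilton driver API.

```python
import inspect
from typing import Optional


def raw_sales() -> list:
    """Step 1: load data (hard-coded rows stand in for the Kaggle CSV files)."""
    return [
        {"day": 1, "units": 3},
        {"day": 2, "units": 5},
        {"day": 3, "units": 4},
        {"day": 4, "units": 6},
    ]


def base_set(raw_sales: list) -> list:
    """Step 2: create a base set (copy rows so later steps do not mutate the input)."""
    return [dict(row) for row in raw_sales]


def featurized_set(base_set: list) -> list:
    """Step 3: add a feature, here the previous day's units (a lag-1 feature)."""
    rows, previous = [], None
    for row in base_set:
        rows.append({**row, "lag_1": previous})
        previous = row["units"]
    return rows


def training_set(featurized_set: list) -> list:
    """Step 4: keep only the rows where the lag feature is defined."""
    return [row for row in featurized_set if row["lag_1"] is not None]


def predictions(training_set: list) -> list:
    """Step 5: stand-in for fit and predict; the 'model' just echoes the lag value."""
    return [row["lag_1"] for row in training_set]


def saved_result(predictions: list) -> str:
    """Step 6: 'save' by serialising to a string instead of writing a file."""
    return ",".join(str(p) for p in predictions)


def execute(target: str, functions: dict, cache: Optional[dict] = None):
    """Compute `target` by first computing every output its parameters name."""
    cache = {} if cache is None else cache
    if target not in cache:
        fn = functions[target]
        kwargs = {
            name: execute(name, functions, cache)
            for name in inspect.signature(fn).parameters
        }
        cache[target] = fn(**kwargs)
    return cache[target]


STAGES = {
    fn.__name__: fn
    for fn in (raw_sales, base_set, featurized_set, training_set, predictions, saved_result)
}
result = execute("saved_result", STAGES)
```

Requesting `"saved_result"` pulls in the five upstream steps automatically, which is the property that makes each step easy to test and reuse on its own.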

There are other ways to write some of this code, but as a first pass this seems fine. E.g. we could split up the model training function a bit more, or use the `@parameterize` decorator.
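
To make the parameterization idea concrete, here is a hedged, stdlib-only sketch of the underlying pattern: one function definition stamped out into several named feature functions. It deliberately does not use Hamilton's decorator, and `make_lag_feature`, the `units_lag_*` names, and the chosen lags are assumptions made up for illustration, not features taken from the notebook.

```python
from typing import Callable, Dict, List, Optional

Series = List[Optional[float]]


def make_lag_feature(lag: int) -> Callable[[List[float]], Series]:
    """Build a function that shifts a series back by `lag` positions."""

    def lag_feature(units: List[float]) -> Series:
        # Pad the front with None, then drop the values that fall off the end.
        padding = [None] * min(lag, len(units))
        return padding + units[: max(len(units) - lag, 0)]

    lag_feature.__name__ = f"units_lag_{lag}"
    return lag_feature


# One definition, several named outputs (hypothetical lags of 1 and 7 days).
LAG_FEATURES: Dict[str, Callable[[List[float]], Series]] = {
    f"units_lag_{lag}": make_lag_feature(lag) for lag in (1, 7)
}

units = [3.0, 5.0, 4.0, 6.0, 2.0, 7.0, 8.0, 9.0]
features = {name: fn(units) for name, fn in LAG_FEATURES.items()}
```

The trade-off is the usual one: less copy-pasted code, at the cost of feature definitions that are slightly less obvious when reading the module top to bottom.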

No unit tests or data quality checks have been added.

Changes

  • adds time-series kaggle example

How I tested this

  • runs locally

Notes

  • brings in warts from notebook that it comes from

Checklist

  • PR has an informative and human-readable title (this will be pulled into the release notes)
  • Changes are limited to a single goal (no scope creep)
  • Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future TODOs are captured in comments
  • Project documentation has been updated if adding/changing functionality.

skrawcz force-pushed the add_kaggle_example branch 2 times, most recently from f6c9456 to 1fdf6ec (January 23, 2023 07:24)
skrawcz marked this pull request as ready for review (January 23, 2023 07:30)


@extract_fields({"x": pd.DataFrame, "y": pd.Series, "test": pd.DataFrame})
def data_sets(training_set: pd.DataFrame, cut_off_date: str = "2016-04-24") -> dict:
skrawcz (Collaborator, Author) commented on these lines:

Add a docstring.
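
For illustration only, a documented version of such a function might look like the sketch below. The body of the real `data_sets` is not shown in this thread, so the split logic here is an assumption: rows on or before `cut_off_date` are used for fitting and later rows are held out. The sketch uses plain lists of dicts instead of pandas, omits the `@extract_fields` decorator (which in the PR exposes the `x`, `y` and `test` keys of the returned dict as separate outputs), and the `date` and `demand` field names are hypothetical.

```python
from datetime import date
from typing import Dict, List


def data_sets(training_set: List[dict], cut_off_date: str = "2016-04-24") -> Dict[str, list]:
    """Split the training set into model inputs at a cut-off date.

    :param training_set: rows with an ISO "date" string, a "demand" target, and feature fields.
    :param cut_off_date: ISO date; rows on or before it are used for fitting, later rows are held out.
    :return: dict with "x" (training features), "y" (training targets) and "test" (held-out features).
    """
    cut_off = date.fromisoformat(cut_off_date)

    def features_only(row: dict) -> dict:
        # Drop the target so it cannot leak into the feature rows.
        return {key: value for key, value in row.items() if key != "demand"}

    train_rows = [r for r in training_set if date.fromisoformat(r["date"]) <= cut_off]
    test_rows = [r for r in training_set if date.fromisoformat(r["date"]) > cut_off]
    return {
        "x": [features_only(r) for r in train_rows],
        "y": [r["demand"] for r in train_rows],
        "test": [features_only(r) for r in test_rows],
    }


rows = [
    {"date": "2016-04-23", "item_id": "A", "demand": 2.0},
    {"date": "2016-04-24", "item_id": "A", "demand": 4.0},
    {"date": "2016-04-25", "item_id": "A", "demand": 5.0},
]
splits = data_sets(rows)
```

The docstring states the inputs, the meaning of the cut-off, and the keys of the returned dict, which is the information a reader needs before wiring the outputs into downstream functions.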

I copied the code almost verbatim from [this notebook](https://www.kaggle.com/code/ratan123/m5-forecasting-lightgbm-with-timeseries-splits).
It has a few odd quirks: where I understood what was going on I removed them, otherwise I left them in. E.g. `test_2` is not used...

This is an example that shows:

1. loading data
2. creating a base set.
3. adding more features.
4. creating a training set.
5. fitting model & predicting with it.
6. saving the result.

There are other ways to write some of this code, but as a first pass
this seems fine. E.g. we could split up the model training function
a bit more, or use the `@parameterize` decorator.

No unit tests or data quality checks have been added.
This commit should really be several, but I'm trying to move quickly...

Anyway:

1. Added docstrings so people get a feel for documenting functions.
2. Refactored some functions to look more "production like", with better reuse,
   smaller functions, and better naming. E.g. the fit and predict functions
   should themselves be nodes in the DAG that another function can call out to.
3. Swapped the image for a better-named one to go with the production story.
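
As a sketch of what point 2 can look like, fitting and predicting are written below as two separate functions, so that the fitted model is its own output that other functions can depend on. This is stdlib-only and purely illustrative: the per-item mean "model" is a stand-in for the LightGBM model used in the example, and `fitted_model`, `predicted_demand` and `item_id` are hypothetical names.

```python
from statistics import mean
from typing import Dict, List


def fitted_model(x: List[dict], y: List[float]) -> Dict[str, float]:
    """Fit step: learn the mean demand per item id (stand-in for a real model)."""
    targets_by_item: Dict[str, List[float]] = {}
    for row, target in zip(x, y):
        targets_by_item.setdefault(row["item_id"], []).append(target)
    return {item: mean(values) for item, values in targets_by_item.items()}


def predicted_demand(fitted_model: Dict[str, float], test: List[dict]) -> List[float]:
    """Predict step: look up each item's mean, falling back to the overall mean.

    Assumes the model was fitted on at least one row.
    """
    fallback = mean(list(fitted_model.values()))
    return [fitted_model.get(row["item_id"], fallback) for row in test]


x = [{"item_id": "A"}, {"item_id": "A"}, {"item_id": "B"}]
y = [2.0, 4.0, 10.0]
test = [{"item_id": "A"}, {"item_id": "C"}]

model = fitted_model(x, y)
forecast = predicted_demand(model, test)
```

Keeping the two steps apart means the fitted model can be inspected, saved, or reused for a different prediction set without retraining.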
skrawcz force-pushed the add_kaggle_example branch from 1fdf6ec to 14b602f (January 23, 2023 19:10)
elijahbenizzy self-requested a review (January 24, 2023 17:02)
skrawcz merged commit dc677ae into main (Jan 24, 2023)
skrawcz deleted the add_kaggle_example branch (January 24, 2023 20:49)

2 participants