Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Add support for exogenous variables in utils.aggregate #294

Closed
wants to merge 2 commits into from

Conversation

KuriaMaingi
Copy link
Contributor

This change to the utility function will assist in instances where you need to generate your summation and Y_df but also want to retain any exogenous vars required for your forecast.

You will need to pass in a dictionary containing your exogenous vars and the Pandas agg functions you want applied against them.

I have currently hardcoded the list of acceptable agg_funcs but open to hear if there's a better way

@KuriaMaingi KuriaMaingi changed the title Add support for exogenous variables Add support for exogenous variables in utils.aggregate Oct 9, 2024
@KuriaMaingi KuriaMaingi changed the title Add support for exogenous variables in utils.aggregate ENH: Add support for exogenous variables in utils.aggregate Oct 9, 2024
@elephaint
Copy link
Contributor

@KuriaMaingi Thanks for your work! I'll happily take a look :)

We use nbdev, which means changes to the code should be made in source notebooks - in your case nbs\utils.ipynb.

To set your environment best up to work on this, I'd advise to:

  1. Clone the repository and go to the root directory in a Terminal window
  2. Create a conda environment hierarchicalforecast: conda create -n hierarchicalforecast python=3.10
  3. Activate the envionrment: conda activate hierarchicalforecast
  4. Install the required packages: conda env update -f environment.yml
  5. Install the locally cloned library editable: pip install -e ".[dev]"
  6. Install git hooks: nbdev_install_hooks
  7. Install pre-commit: pre-commit install

Now make your changes to the notebook, in your case nbs\utils.ipynb. Make sure to clean the notebook before exporting it (Edit -> Clear All Outputs in your IDE)

  1. Build the library: nbdev_export
  2. Use git add, commit and push commands to push the branch and create the PR onwards.

@christophertitchen
Copy link
Contributor

@KuriaMaingi Thanks for your work! I'll happily take a look :)

We use nbdev, which means changes to the code should be made in source notebooks - in your case nbs\utils.ipynb.

To set your environment best up to work on this, I'd advise to:

  1. Clone the repository and go to the root directory in a Terminal window
  2. Create a conda environment hierarchicalforecast: conda create -n hierarchicalforecast python=3.10
  3. Activate the envionrment: conda activate hierarchicalforecast
  4. Install the required packages: conda env update -f environment.yml
  5. Install the locally cloned library editable: pip install -e ".[dev]"
  6. Install git hooks: nbdev_install_hooks
  7. Install pre-commit: pre-commit install

Now make your changes to the notebook, in your case nbs\utils.ipynb. Make sure to clean the notebook before exporting it (Edit -> Clear All Outputs in your IDE)

  1. Build the library: nbdev_export
  2. Use git add, commit and push commands to push the branch and create the PR onwards.

@KuriaMaingi to add to the great summary above, you can also use the commands below before exporting.

  • nbdev_clean --clear_all to double-check that all of the metadata and cell outputs are removed in the notebooks to avoid any merge conflicts.
  • nbdev_test --n_workers 1 --do_print --timing to execute tests in the notebooks sequentially and report on the timings.

Copy link
Contributor

@christophertitchen christophertitchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for helping with the project, it is nice to have more contributors!

I left a few thoughts before you make the changes in the notebooks and export them.

Comment on lines +223 to +225
# Add exog_vars to the aggregation dictionary if it is not None
if exog_vars is not None:
agg_dict.update({key: (key, exog_vars[key]) for key in exog_vars.keys()})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please give an example usage of exog_vars in this context?

I have not used pandas much lately, but I think that given your type signature of Dict[str, str], you intend to have exog_vars = {"col_a": "sum", "col_b": "sum"}. However, this does not support multiple functions to aggregate a particular column as you are going down the named aggregation route.

A way around this will be either exog_vars = {"col_a": ("sum", "mean")} which will create a MultiIndex, or alternatively something like exog_vars = {"col_a_sum": ("col_a", "sum"), "col_a_mean": ("col_a", "mean")}. Either way, the distinction between the output column name for the aggregation and the column name to be aggregated will need to be made when inserting into agg_dict to avoid overwriting anything in this case.

Comment on lines +198 to +201
# Define acceptable aggregation functions
acceptable_aggregations = {
'sum', 'mean', 'median', 'min', 'max', 'count', 'std', 'var', 'first', 'last'
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think this is needed—we can just let pandas raise an AttributeError when aggregating rather than raising our own ValueError.

Plus, this gives us the flexibility to use custom (anonymous) functions rather than just string function names.

@KuriaMaingi KuriaMaingi closed this by deleting the head repository Oct 10, 2024
@KuriaMaingi
Copy link
Contributor Author

Thanks all for the comments, I will close this and replace with a new PR following the preferred approach. Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants