ENH: Add support for exogenous variables in utils.aggregate #294

KuriaMaingi · 2024-10-09T12:54:59Z

This change to the utility function helps in cases where you need to generate your summation dataframe and Y_df but also want to retain any exogenous variables required for your forecast.

To use it, pass in a dictionary that maps each exogenous variable to the pandas aggregation function you want applied to it.

I have currently hardcoded the list of acceptable agg_funcs, but I'm open to hearing if there's a better way.
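For illustration, here is a minimal pandas-only sketch of the idea. It does not call utils.aggregate; the dataframe, the column names (country, state, ds, y, price) and the choice of "mean" are all made up for the example. It only shows how a dictionary of exogenous variables could be folded into the same groupby aggregation that sums the target.

```python
import pandas as pd

# Hypothetical bottom-level data: one target column (y) and one exogenous column (price).
df = pd.DataFrame({
    "country": ["AU", "AU", "AU", "AU"],
    "state": ["NSW", "NSW", "VIC", "VIC"],
    "ds": ["2024-01", "2024-02", "2024-01", "2024-02"],
    "y": [10.0, 12.0, 7.0, 9.0],
    "price": [2.0, 4.0, 3.0, 5.0],
})

# Exogenous column name -> pandas aggregation function name (assumed format).
exog_vars = {"price": "mean"}

# The target is always summed; each exogenous column uses its own function.
agg_dict = {"y": ("y", "sum")}
agg_dict.update({col: (col, func) for col, func in exog_vars.items()})

# Aggregate the state-level rows up to the country level.
country_level = df.groupby(["country", "ds"], as_index=False).agg(**agg_dict)
print(country_level)
```

In this sketch y is summed across states while price is averaged, so the exogenous column survives the aggregation alongside the target.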

…riables in creation of Y and Summation dataframes

elephaint · 2024-10-10T10:08:14Z

@KuriaMaingi Thanks for your work! I'll happily take a look :)

We use nbdev, which means changes to the code should be made in source notebooks - in your case nbs\utils.ipynb.

To set up your environment for this, I'd advise the following:

Clone the repository and go to the root directory in a Terminal window
Create a conda environment hierarchicalforecast: conda create -n hierarchicalforecast python=3.10
Activate the environment: conda activate hierarchicalforecast
Install the required packages: conda env update -f environment.yml
Install the locally cloned library editable: pip install -e ".[dev]"
Install git hooks: nbdev_install_hooks
Install pre-commit: pre-commit install

Now make your changes to the notebook, in your case nbs\utils.ipynb. Make sure to clean the notebook before exporting it (Edit -> Clear All Outputs in your IDE)

Build the library: nbdev_export
Then use git add, git commit and git push to push the branch and create the PR.

christophertitchen · 2024-10-10T10:35:05Z

@KuriaMaingi to add to the great summary above, you can also use the commands below before exporting.

nbdev_clean --clear_all to double-check that all of the metadata and cell outputs are removed in the notebooks to avoid any merge conflicts.
nbdev_test --n_workers 1 --do_print --timing to execute tests in the notebooks sequentially and report on the timings.

christophertitchen

Thank you for helping with the project, it is nice to have more contributors!

I left a few thoughts before you make the changes in the notebooks and export them.

christophertitchen · 2024-10-10T11:17:15Z

hierarchicalforecast/utils.py

+    # Add exog_vars to the aggregation dictionary if it is not None
+    if exog_vars is not None:
+        agg_dict.update({key: (key, exog_vars[key]) for key in exog_vars.keys()})


Could you please give an example usage of exog_vars in this context?

I have not used pandas much lately, but I think that given your type signature of Dict[str, str], you intend to have exog_vars = {"col_a": "sum", "col_b": "sum"}. However, this does not support multiple functions to aggregate a particular column as you are going down the named aggregation route.

A way around this will be either exog_vars = {"col_a": ("sum", "mean")} which will create a MultiIndex, or alternatively something like exog_vars = {"col_a_sum": ("col_a", "sum"), "col_a_mean": ("col_a", "mean")}. Either way, the distinction between the output column name for the aggregation and the column name to be aggregated will need to be made when inserting into agg_dict to avoid overwriting anything in this case.
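The two alternatives can be sketched with plain pandas as below. The dataframe and the unique_id grouping column are made up for the example, and the first option uses a list rather than a tuple of function names, which is the form I am sure pandas accepts.

```python
import pandas as pd

# Hypothetical data with a single exogenous column.
df = pd.DataFrame({
    "unique_id": ["a", "a", "b", "b"],
    "col_a": [10.0, 20.0, 30.0, 40.0],
})

# Option 1: several functions for one column. The result has MultiIndex
# columns: ("col_a", "sum") and ("col_a", "mean").
multi = df.groupby("unique_id").agg({"col_a": ["sum", "mean"]})

# Option 2: named aggregation. Each key is the OUTPUT column name and each
# value is (input column, function), so two aggregations of col_a do not
# overwrite each other and the columns stay flat.
exog_vars = {
    "col_a_sum": ("col_a", "sum"),
    "col_a_mean": ("col_a", "mean"),
}
flat = df.groupby("unique_id").agg(**exog_vars)

print(multi)
print(flat)
```

Option 2 keeps the output column name separate from the column being aggregated, which is the distinction that matters when the entries are merged into agg_dict.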

christophertitchen · 2024-10-10T11:17:36Z

hierarchicalforecast/utils.py

+    # Define acceptable aggregation functions
+    acceptable_aggregations = {
+        'sum', 'mean', 'median', 'min', 'max', 'count', 'std', 'var', 'first', 'last'
+    }    


I do not think this is needed—we can just let pandas raise an AttributeError when aggregating rather than raising our own ValueError.

Plus, this gives us the flexibility to use custom (anonymous) functions rather than just string function names.
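As a rough sketch of both points, with made-up data and hypothetical names (value_range, col_a_range, col_a_last): callables pass straight through named aggregation, and an unknown string name already fails inside pandas without any whitelist. The exact exception type is left unchecked here, since it may depend on the pandas version.

```python
import pandas as pd

df = pd.DataFrame({
    "unique_id": ["a", "a", "b", "b"],
    "col_a": [10.0, 20.0, 30.0, 40.0],
})

def value_range(s: pd.Series) -> float:
    """Custom aggregation: spread between the largest and smallest value."""
    return s.max() - s.min()

# Callables (named or anonymous) work wherever a string function name would.
custom = df.groupby("unique_id").agg(
    col_a_range=("col_a", value_range),
    col_a_last=("col_a", lambda s: s.iloc[-1]),
)

# An unknown string name is rejected by pandas itself at aggregation time.
error = None
try:
    df.groupby("unique_id").agg(col_a_bad=("col_a", "not_a_real_function"))
except Exception as exc:  # exception type deliberately not pinned down
    error = exc

print(custom)
print(type(error).__name__)
```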

KuriaMaingi · 2024-10-10T19:19:44Z

Thanks all for the comments. I will close this and replace it with a new PR following the preferred approach. Thanks

KuriaMaingi added 2 commits October 9, 2024 15:45

Hierarchical forecast Utils - Aggregate: Add support for exogenous va…

114def6

…riables in creation of Y and Summation dataframes

Minor fix to handle cases where exog_vars is None

fd340a5

KuriaMaingi changed the title ~~Add support for exogenous variables~~ Add support for exogenous variables in utils.aggregate Oct 9, 2024

KuriaMaingi changed the title ~~Add support for exogenous variables in utils.aggregate~~ ENH: Add support for exogenous variables in utils.aggregate Oct 9, 2024

christophertitchen reviewed Oct 10, 2024

KuriaMaingi closed this by deleting the head repository Oct 10, 2024

elephaint mentioned this pull request Nov 14, 2024

from hierarchicalforecast.utils import aggregate #202

Closed
