
Feat/scalar with window #2529

Open — wants to merge 33 commits into base: master

Changes from 28 commits

Commits (33 commits)
2a84226
add basic scalar window support
JanFidor Aug 25, 2023
3617cad
merge
JanFidor Oct 6, 2023
bc8a20c
Merge branch 'master' into feature/scalar-with-window
dennisbader Oct 13, 2023
d1583bb
Merge remote-tracking branch 'upstream/master' into feature/scalar-wi…
JanFidor Feb 5, 2024
1a0bf70
change scaler to a more robust generalisation
JanFidor Feb 5, 2024
a3cefae
delete unused functions and add util function
JanFidor Feb 5, 2024
42ca47c
delete unused functions and add util function
JanFidor Feb 5, 2024
c615c1c
add transformers to optimized historical forecasts
JanFidor Feb 5, 2024
6004391
add covariate transformers, refactor + docstring update
JanFidor Feb 5, 2024
de237c7
delete util to avoid circular import
JanFidor Feb 5, 2024
55068e4
delete unused param and add transforms to torch models
JanFidor Feb 5, 2024
f5e0041
fix param name
JanFidor Feb 7, 2024
7ae25c9
Merge remote-tracking branch 'upstream/master' into feature/scalar-wi…
JanFidor Jul 17, 2024
5237fea
move all series and covariates fitting into one place, allow data tra…
JanFidor Jul 17, 2024
dce57f6
update readme and data types
JanFidor Jul 17, 2024
77461dd
optimized forecasts only support invertible data transform
JanFidor Jul 17, 2024
c13889f
Merge branch 'master' into feature/scalar-with-window
madtoinou Sep 4, 2024
804c914
feat: harmonize application of scaler in hf, support for Pipeline
madtoinou Sep 10, 2024
4fa2186
feat: adding basic test for regression models
madtoinou Sep 10, 2024
3543982
fix: using an util method to reduce code duplication
madtoinou Sep 12, 2024
654c44b
fix: simplify the tests
madtoinou Sep 12, 2024
52118e4
fix: makes things faster if no data transformer are passed
madtoinou Sep 12, 2024
d0d2c98
feat: add test for the optimized hf
madtoinou Sep 12, 2024
781ddb0
fix: using util method in gridsearch as well
madtoinou Sep 13, 2024
e830d0f
fix: reverting some changes
madtoinou Sep 13, 2024
db301b4
Merge branch 'master' into feat/scalar-with-window
madtoinou Sep 13, 2024
be1065f
fix: make sure the series have a range that require scaling
madtoinou Sep 13, 2024
b521150
update changelog
madtoinou Sep 13, 2024
c44596a
feat: adding small example about how to use scaler in historical fore…
madtoinou Sep 13, 2024
0ec7f6d
fix: adress review comments
madtoinou Sep 13, 2024
5e8e304
fix: adapting the tests
madtoinou Sep 17, 2024
585f45b
Merge branch 'master' into feat/scalar-with-window
madtoinou Sep 17, 2024
912e0d2
Merge branch 'master' into feat/scalar-with-window
madtoinou Sep 24, 2024
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -14,6 +14,7 @@ but cannot always guarantee backwards compatibility. Changes that may **break co
- Added `IQRDetector`, that allows detecting anomalies using the interquartile range algorithm. [#2441] by [Igor Urbanik](https://github.com/u8-igor).
- Added hyperparameters controlling the hidden layer sizes for the feature encoders in `TiDEModel`. [#2408](https://github.com/unit8co/darts/issues/2408) by [eschibli](https://github.com/eschibli).
- Added support for broadcasting to TimeSeries on component and sample level. [#2476](https://github.com/unit8co/darts/pull/2476) by [Joel L.](https://github.com/Joelius300).
- Added `data_transformers` argument to `historical_forecasts`, `backtest` and `gridsearch` that allows scaling of the series without data leakage. [#2529](https://github.com/unit8co/darts/pull/2529) by [Antoine Madrona](https://github.com/madtoinou) and [Jan Fidor](https://github.com/JanFidor)
- Various improvements in the documentation:
- Made README's forecasting model support table more colorblind-friendly. [#2433](https://github.com/unit8co/darts/pull/2433)
- Updated the Ray Tune Hyperparameter Optimization example in the [user guide](https://unit8co.github.io/darts/userguide/hyperparameter_optimization.html) to work with the latest `ray` versions (`>=2.31.0`). [#2459](https://github.com/unit8co/darts/pull/2459) by [He Weilin](https://github.com/cnhwl).
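For context, the new `data_transformers` argument automates the pattern below: at each historical forecast step, the scaler is re-fitted on the training window only, so the region being forecast never leaks into the fit. A minimal, darts-free sketch (helper names are illustrative, not part of the darts API):

```python
def min_max_fit(window):
    # fit scaling parameters on the training window only
    lo, hi = min(window), max(window)
    return lo, (hi - lo) or 1.0

def min_max_transform(values, lo, span):
    return [(v - lo) / span for v in values]

series = [10, 12, 15, 14, 18, 21, 25, 24, 30, 33]
scaled_windows = []
for cutoff in range(6, len(series)):
    train = series[:cutoff]        # only the data available at this step
    lo, span = min_max_fit(train)  # re-fit at every forecast step
    scaled_windows.append(min_max_transform(train, lo, span))

# the first window never saw values beyond series[:6], so its max is 1.0
assert max(scaled_windows[0]) == 1.0
```

Fitting a single scaler on the full series up front would instead use future values, which is exactly the leakage this PR avoids.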
30 changes: 30 additions & 0 deletions darts/dataprocessing/pipeline.py
@@ -89,6 +89,10 @@ def __init__(
isinstance(t, InvertibleDataTransformer) for t in self._transformers
)

self._fittable = any(
isinstance(t, FittableDataTransformer) for t in self._transformers
)

if verbose is not None:
for transformer in self._transformers:
transformer.set_verbose(verbose)
@@ -217,6 +221,32 @@ def invertible(self) -> bool:
"""
return self._invertible

def fittable(self) -> bool:
"""
Returns whether the pipeline is fittable or not.
A pipeline is fittable if at least one of the transformers in the pipeline is fittable.

Returns
-------
bool
`True` if the pipeline is fittable, `False` otherwise
"""
return self._fittable

def _fit_called(self) -> bool:
"""
Returns whether all the transformers in the pipeline were fitted (when applicable).

Returns
-------
bool
`True` if all the fittable transformers are fitted, `False` otherwise
"""
return all(
(not isinstance(t, FittableDataTransformer)) or t._fit_called
for t in self._transformers
)

def __getitem__(self, key: Union[int, slice]) -> "Pipeline":
"""
Gets subset of Pipeline based either on index or slice with indexes.
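The two additions to `Pipeline` follow the same shape: `fittable` is true if any transformer needs fitting, while `_fit_called` is true only once every fittable transformer has been fitted. A stand-alone sketch with stand-in classes (not the darts implementations):

```python
class Transformer:
    """Stateless transformer; never needs fitting."""

class FittableTransformer(Transformer):
    def __init__(self):
        self._fit_called = False

    def fit(self, data):
        self._fit_called = True
        return self

class MiniPipeline:
    def __init__(self, transformers):
        self._transformers = list(transformers)
        # fittable if at least one transformer is fittable
        self._fittable = any(
            isinstance(t, FittableTransformer) for t in self._transformers
        )

    def fittable(self):
        return self._fittable

    def _fit_called(self):
        # True only when every fittable transformer has been fitted;
        # non-fittable transformers are vacuously "fitted"
        return all(
            (not isinstance(t, FittableTransformer)) or t._fit_called
            for t in self._transformers
        )

p = MiniPipeline([Transformer(), FittableTransformer()])
assert p.fittable() and not p._fit_called()
p._transformers[1].fit(None)
assert p._fit_called()
```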
100 changes: 94 additions & 6 deletions darts/models/forecasting/forecasting_model.py
@@ -35,12 +35,17 @@

from darts import metrics
from darts.dataprocessing.encoders import SequentialEncoder
from darts.dataprocessing.pipeline import Pipeline
from darts.dataprocessing.transformers import (
BaseDataTransformer,
)
from darts.logging import get_logger, raise_if, raise_if_not, raise_log
from darts.metrics.metrics import METRIC_TYPE
from darts.timeseries import TimeSeries
from darts.utils import _build_tqdm_iterator, _parallel_apply, _with_sanity_checks
from darts.utils.historical_forecasts.utils import (
_adjust_historical_forecasts_time_index,
_apply_data_transformers,
_get_historical_forecast_predict_index,
_get_historical_forecast_train_index,
_historical_forecasts_general_checks,
@@ -645,6 +650,9 @@ def historical_forecasts(
show_warnings: bool = True,
predict_likelihood_parameters: bool = False,
enable_optimization: bool = True,
data_transformers: Optional[
Dict[str, Union[BaseDataTransformer, Pipeline]]
] = None,
fit_kwargs: Optional[Dict[str, Any]] = None,
predict_kwargs: Optional[Dict[str, Any]] = None,
sample_weight: Optional[Union[TimeSeries, Sequence[TimeSeries], str]] = None,
@@ -754,6 +762,11 @@
Default: ``False``
enable_optimization
Whether to use the optimized version of historical_forecasts when supported and available.
data_transformers
If the model is retrained and the transformer is fittable, the data transformer is re-fitted on the
training data at each historical forecast step.
The fitted transformer is then used to transform the input during both training and prediction.
If the transformation is invertible, the forecasts are transformed back to the original scale.
fit_kwargs
Additional arguments passed to the model `fit()` method.
predict_kwargs
@@ -885,6 +898,16 @@ def retrain_func(
logger,
)

if data_transformers is None:
data_transformers = dict()
else:
data_transformers = {
key_: val_
if isinstance(val_, Pipeline)
else Pipeline(transformers=[val_], copy=True)
for key_, val_ in data_transformers.items()
}
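The dict-comprehension above normalizes the argument so downstream code only ever handles `Pipeline` objects: a bare transformer is wrapped into a single-step pipeline, and `None` becomes an empty dict. A generic sketch of the pattern, with stand-in classes rather than the darts ones:

```python
class Pipeline:
    def __init__(self, transformers, copy=False):
        self.transformers = list(transformers)

class Scaler:
    """Stands in for a bare BaseDataTransformer."""

def normalize(data_transformers):
    # None -> empty mapping, so callers can iterate unconditionally
    if data_transformers is None:
        return {}
    # bare transformers -> single-step pipelines; pipelines pass through
    return {
        key: val if isinstance(val, Pipeline) else Pipeline([val], copy=True)
        for key, val in data_transformers.items()
    }

norm = normalize({"target": Scaler(), "past_covariates": Pipeline([Scaler()])})
assert all(isinstance(v, Pipeline) for v in norm.values())
assert normalize(None) == {}
```

Normalizing once at the entry point keeps every later call site (`fit`, `transform`, `inverse_transform`) free of `isinstance` checks.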

# remove unsupported arguments, raise exception if interference with historical forecasts logic
fit_kwargs, predict_kwargs = _historical_forecasts_sanitize_kwargs(
model=model,
@@ -917,6 +940,7 @@ def retrain_func(
verbose=verbose,
show_warnings=show_warnings,
predict_likelihood_parameters=predict_likelihood_parameters,
data_transformers=data_transformers,
**predict_kwargs,
)

@@ -944,6 +968,7 @@ def retrain_func(
for idx, series_ in enumerate(outer_iterator):
past_covariates_ = past_covariates[idx] if past_covariates else None
future_covariates_ = future_covariates[idx] if future_covariates else None

if isinstance(sample_weight, str):
sample_weight_ = sample_weight
else:
@@ -1035,6 +1060,8 @@ def retrain_func(
else:
iterator = historical_forecasts_time_index[::stride]

# TODO: if not retrain, scale all the series in one go by fitting the transformer on data before "start"?

# Either store the whole forecasts or only the last points of each forecast, depending on last_points_only
forecasts = []
last_points_times = []
@@ -1053,6 +1080,19 @@ def retrain_func(
if train_length_ and len(train_series) > train_length_:
train_series = train_series[-train_length_:]

if len(data_transformers) > 0:
# data transformers are retrained between iterations to avoid data-leakage
train_series, past_covariates_, future_covariates_ = (
_apply_data_transformers(
series=train_series,
past_covariates=past_covariates_,
future_covariates=future_covariates_,
data_transformers=data_transformers,
max_future_cov_lag=model.extreme_lags[5],
fit_transformers=True,
)
)

# testing `retrain` to exclude `False` and `0`
if (
retrain
@@ -1130,6 +1170,14 @@ def retrain_func(
show_warnings=show_predict_warnings,
**predict_kwargs,
)

# target transformer is either already fitted or fitted during the retraining
if (
"target" in data_transformers
and data_transformers["target"].invertible()
):
forecast = data_transformers["target"].inverse_transform(forecast)

show_predict_warnings = False

if forecast_components is None:
@@ -1193,6 +1241,9 @@ def backtest(
reduction: Union[Callable[..., float], None] = np.mean,
verbose: bool = False,
show_warnings: bool = True,
data_transformers: Optional[
Dict[str, Union[BaseDataTransformer, Pipeline]]
] = None,
metric_kwargs: Optional[Union[Dict[str, Any], List[Dict[str, Any]]]] = None,
fit_kwargs: Optional[Dict[str, Any]] = None,
predict_kwargs: Optional[Dict[str, Any]] = None,
@@ -1391,6 +1442,7 @@ def backtest(
last_points_only=last_points_only,
verbose=verbose,
show_warnings=show_warnings,
data_transformers=data_transformers,
fit_kwargs=fit_kwargs,
predict_kwargs=predict_kwargs,
sample_weight=sample_weight,
@@ -1533,6 +1585,9 @@ def gridsearch(
verbose=False,
n_jobs: int = 1,
n_random_samples: Optional[Union[int, float]] = None,
data_transformers: Optional[
Dict[str, Union[BaseDataTransformer, Pipeline]]
] = None,
fit_kwargs: Optional[Dict[str, Any]] = None,
predict_kwargs: Optional[Dict[str, Any]] = None,
sample_weight: Optional[Union[TimeSeries, str]] = None,
@@ -1716,6 +1771,16 @@ def gridsearch(
logger,
)

if data_transformers is None:
data_transformers = dict()
else:
data_transformers = {
key_: val_
if isinstance(val_, Pipeline)
else Pipeline(transformers=[val_], copy=True)
for key_, val_ in data_transformers.items()
}

if fit_kwargs is None:
fit_kwargs = dict()
if predict_kwargs is None:
@@ -1773,27 +1838,49 @@ def _evaluate_combination(param_combination) -> float:
last_points_only=last_points_only,
verbose=verbose,
show_warnings=show_warnings,
data_transformers=data_transformers,
fit_kwargs=fit_kwargs,
predict_kwargs=predict_kwargs,
sample_weight=sample_weight,
)
else: # split mode
if len(data_transformers) > 0:
series_, past_covariates_, future_covariates_ = (
_apply_data_transformers(
series=series,
past_covariates=past_covariates,
future_covariates=future_covariates,
data_transformers=data_transformers,
max_future_cov_lag=model.extreme_lags[5],
fit_transformers=True,
)
)
else:
series_ = series
past_covariates_ = past_covariates
future_covariates_ = future_covariates

model._fit_wrapper(
series=series,
past_covariates=past_covariates,
future_covariates=future_covariates,
series=series_,
past_covariates=past_covariates_,
future_covariates=future_covariates_,
sample_weight=sample_weight,
**fit_kwargs,
)
pred = model._predict_wrapper(
n=len(val_series),
series=series,
past_covariates=past_covariates,
future_covariates=future_covariates,
series=series_,
past_covariates=past_covariates_,
future_covariates=future_covariates_,
num_samples=1,
verbose=verbose,
**predict_kwargs,
)
if (
"target" in data_transformers
and data_transformers["target"].invertible()
):
pred = data_transformers["target"].inverse_transform(pred)
error = metric(val_series, pred)

return float(error)
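In split mode, both the series and the predictions live in transformed space, so the forecast is mapped back through `inverse_transform` before the metric is computed against the untransformed validation series. A self-contained sketch of that inverse step with a simple affine scaler (illustrative, not the darts `Scaler`):

```python
class AffineScaler:
    """Min-max scaler with an explicit inverse."""

    def fit(self, xs):
        self.lo, self.hi = min(xs), max(xs)
        return self

    def transform(self, xs):
        span = (self.hi - self.lo) or 1.0
        return [(x - self.lo) / span for x in xs]

    def invertible(self):
        return True

    def inverse_transform(self, xs):
        # exact inverse of transform: undo the scaling before scoring
        span = (self.hi - self.lo) or 1.0
        return [x * span + self.lo for x in xs]

scaler = AffineScaler().fit([10.0, 20.0, 30.0])
pred_scaled = [0.5, 1.0]
# only invert when the transformation supports it, mirroring the check above
pred = scaler.inverse_transform(pred_scaled) if scaler.invertible() else pred_scaled
assert pred == [20.0, 30.0]
```

Skipping this step would score scaled predictions against unscaled ground truth and make the gridsearch error meaningless.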
@@ -2560,6 +2647,7 @@ def _optimized_historical_forecasts(
verbose: bool = False,
show_warnings: bool = True,
predict_likelihood_parameters: bool = False,
data_transformers: Optional[Dict[str, BaseDataTransformer]] = None,
) -> Union[TimeSeries, Sequence[TimeSeries], Sequence[Sequence[TimeSeries]]]:
logger.warning(
"`optimized_historical_forecasts` is not available for this model, use `historical_forecasts` instead."
5 changes: 5 additions & 0 deletions darts/models/forecasting/regression_model.py
@@ -30,6 +30,8 @@
from collections import OrderedDict
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple, Union

from darts.dataprocessing.pipeline import Pipeline

try:
from typing import Literal
except ImportError:
@@ -1320,6 +1322,7 @@ def _optimized_historical_forecasts(
verbose: bool = False,
show_warnings: bool = True,
predict_likelihood_parameters: bool = False,
data_transformers: Optional[Dict[str, Pipeline]] = None,
**kwargs,
) -> Union[TimeSeries, Sequence[TimeSeries], Sequence[Sequence[TimeSeries]]]:
"""
@@ -1357,6 +1360,7 @@
show_warnings=show_warnings,
verbose=verbose,
predict_likelihood_parameters=predict_likelihood_parameters,
data_transformers=data_transformers,
**kwargs,
)
else:
@@ -1374,6 +1378,7 @@ def _optimized_historical_forecasts(
show_warnings=show_warnings,
verbose=verbose,
predict_likelihood_parameters=predict_likelihood_parameters,
data_transformers=data_transformers,
**kwargs,
)
return series2seq(hfc, seq_type_out=series_seq_type)
4 changes: 4 additions & 0 deletions darts/models/forecasting/torch_forecasting_model.py
@@ -30,6 +30,8 @@
from glob import glob
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple, Union

from darts.dataprocessing.pipeline import Pipeline

try:
from typing import Literal
except ImportError:
@@ -2167,6 +2169,7 @@ def _optimized_historical_forecasts(
verbose: bool = False,
show_warnings: bool = True,
predict_likelihood_parameters: bool = False,
data_transformers: Optional[Dict[str, Pipeline]] = None,
**kwargs,
) -> Union[TimeSeries, Sequence[TimeSeries], Sequence[Sequence[TimeSeries]]]:
"""
@@ -2198,6 +2201,7 @@
show_warnings=show_warnings,
verbose=verbose,
predict_likelihood_parameters=predict_likelihood_parameters,
data_transformers=data_transformers,
**kwargs,
)
return series2seq(forecasts_list, seq_type_out=series_seq_type)