[PoC] AutoGluon TimeSeries Prototype #494
Conversation
@sebhrusen I'd appreciate it if you could have a look; I have very limited availability due to a paper deadline.
@PGijsbers sunny holidays right now: will look at it when I'm back next week.
* fixed loading test & train, changed pred.-l. 5->30
* ignore launch.json of vscode
* ensuring timestamp parsing
* pass config, save pred, add results
* remove unused code
* add readability, remove slice from timer
* ensure autogluonts has required info
* add comments for readability
* setting defaults for timeseries task
* remove outer context manipulation
* corrected spelling error for quantiles
* adding mape, correct available metrics
* beautify config options
* fixed config for public access
@sebhrusen Sorry to ping, but would you be interested in reviewing this PR? A large chunk of the logic was written by @limpbot, who is currently interning with us, and it would be great if he received feedback so as not to block his time-series benchmarking efforts.
@Innixma I'm looking at it now and will make a full review before Monday.
Sounds good! I agree that we should make sure the input/output/scoring definitions are generic and not AG-specific. Perhaps the AutoPyTorch-TimeSeries folks (@dengdifan) would be interested in reviewing / trying to add their AutoML system as a framework extension to this logic?
Thanks for this contribution; this is a first round of feedback.
I'm good with most of the file loading logic, and the added metrics.
For the "middle" layers like `benchmark` and `results` transformations, I'd like to avoid changes as much as possible there, as they look more ad hoc.
Also, please use the `AutoGluon` framework instead of this new one; they don't seem to be different enough to require a completely different setup.
frameworks/AutoGluonTS/__init__.py
Outdated
if hasattr(dataset, 'timestamp_column') is False:
    dataset.timestamp_column = None
For this one and below:

Suggested change:
- if hasattr(dataset, 'timestamp_column') is False:
-     dataset.timestamp_column = None
+ if not hasattr(dataset, 'timestamp_column'):
+     dataset.timestamp_column = None
amlb/results.py
Outdated
@@ -255,7 +259,8 @@
 def save_predictions(dataset: Dataset, output_file: str,
                      predictions: Union[A, DF, S] = None, truth: Union[A, DF, S] = None,
                      probabilities: Union[A, DF] = None, probabilities_labels: Union[list, A] = None,
                      target_is_encoded: bool = False,
-                     preview: bool = True):
+                     preview: bool = True,
+                     quantiles: Union[A, DF] = None):
Nitpick: let's try to group the params functionally; it makes them easier to read and understand. Here `quantiles` has a function similar to `probabilities`.
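For illustration, a minimal sketch of the regrouping (the ordering here is only a suggestion, not the final API):

def save_predictions(dataset, output_file,
                     predictions=None, truth=None,
                     probabilities=None, probabilities_labels=None,
                     quantiles=None,  # prediction-like output, so it sits next to probabilities
                     target_is_encoded=False,
                     preview=True):
    ...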
got it
if 'y_past_period_error' in df.columns:
    return TimeSeriesResult(df)
Please don't bypass test mode by adding your own test block: it should remain the first check and also be applied for time series. I'm not asking you to add the test dataset to our workflow right now, but we will need to add this soon after your changes.
got it
resources/benchmarks/ts.yaml
Outdated
@@ -0,0 +1,15 @@
---
Please rename the file to `timeseries.yaml`: explicit is good.
got it
resources/benchmarks/ts.yaml
Outdated
# s3://autogluon-ts-bench/data/covid_deaths/csv/test.csv | https://autogluon.s3-us-west-2.amazonaws.com/datasets/CovidTimeSeries/test.csv
target: ConfirmedCases  # target | ConfirmedCases
type: timeseries
prediction_length: 30
What is the length unit? 30 entries? Days? Hours? If this is a number of entries, then please rename it to `num_predictions` to avoid confusion. Otherwise, please allow a unit:

prediction_length: 30d  # provide predictions over the next 30 days; accept d (days), m (months), y (years)...
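If a unit suffix were allowed, parsing it could stay trivial; a sketch with a hypothetical helper (not existing amlb code):

import re

def parse_prediction_length(value):
    # accept a plain count ("30") or a unit-suffixed value ("30d", "6m", "1y");
    # a missing unit means "number of entries"
    m = re.fullmatch(r"(\d+)\s*([dmy]?)", str(value).strip())
    if not m:
        raise ValueError(f"invalid prediction_length: {value!r}")
    return int(m.group(1)), m.group(2) or None

parse_prediction_length("30")   # -> (30, None)
parse_prediction_length("30d")  # -> (30, 'd')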
It is the number of predictions per sequence. So `num_predictions_per_id` sounds good?
As predictions is a more general term, I suppose `forecast_range_in_steps` is sufficiently explicit?
amlb/results.py
Outdated
if dataset.type == DatasetType.timeseries:
    if quantiles is not None:
        quantiles = quantiles.reset_index(drop=True)
        df = pd.concat([df, quantiles], axis=1)

    period_length = 1  # TODO: This period length could be adapted to the Dataset, but then we need to pass this information as well. As of now this works.

    # we aim to calculate the mean period error from the past for each sequence: 1/N sum_{i=1}^N |x(t_i) - x(t_i - T)|
    # 1. retrieve item_ids for each sequence/item
    item_ids, inverse_item_ids = np.unique(dataset.test.X[dataset.id_column].squeeze().to_numpy(), return_index=False, return_inverse=True)
    # 2. capture sequences in a list
    y_past = [dataset.test.y.squeeze().to_numpy()[inverse_item_ids == i][:-dataset.prediction_length] for i in range(len(item_ids))]
    # 3. calculate period error per sequence
    y_past_period_error = [np.abs(y_past_item[period_length:] - y_past_item[:-period_length]).mean() for y_past_item in y_past]
    # 4. repeat period error for each sequence, to save one for each element
    y_past_period_error_rep = np.repeat(y_past_period_error, dataset.prediction_length)
    df = df.assign(y_past_period_error=y_past_period_error_rep)
I'd rather not have this here; this looks like a lot of calculations + assumptions (apparently you can't have time series without an `id_column`) for a method that is just supposed to save predictions in a standard format. Even more so as this `y_past_period_error` seems to be useful only for the `mase` metric; therefore, either you compute it with the metric or you compute it before (in the AG framework integration).
For now, I'd move your computations to the `__init__.py` or `exec.py` file, and simply ensure that we can customize the result by adding optional columns (in this case, this includes both `quantiles` and your additional results).
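For context, this is roughly how `y_past_period_error` would enter a MASE computation; a sketch under the assumption that the stored value is the mean absolute error of the naive (one-period-lag) forecast on each sequence's past:

import numpy as np

def mase(y_true, y_pred, y_past_period_error):
    # MASE scales the forecast MAE by the in-sample naive (period) error,
    # so a value of 1.0 means "no better than the naive forecast"
    mae = np.abs(np.asarray(y_true) - np.asarray(y_pred)).mean()
    return mae / np.mean(y_past_period_error)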
Suggestion: change the signature to
def save_predictions(dataset: Dataset, output_file: str,
                     predictions: Union[A, DF, S] = None, truth: Union[A, DF, S] = None,
                     probabilities: Union[A, DF] = None, probabilities_labels: Union[list, A] = None,
                     optional_columns: Union[A, DF] = None,
                     target_is_encoded: bool = False,
                     preview: bool = True):
and automatically concatenate the `optional_columns` to the predictions if provided. For now, you should be able to generate those in `exec.py`.
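A sketch of what the caller side in `exec.py` could then look like; `quantile_predictions` and the dummy values are assumptions for illustration:

import numpy as np
import pandas as pd

# stand-ins for values computed in the AutoGluon integration
quantile_predictions = pd.DataFrame(np.random.rand(4, 2), columns=['0.1', '0.9'])
y_past_period_error_rep = np.repeat([0.5, 0.7], 2)

optional_columns = pd.concat(
    [quantile_predictions.reset_index(drop=True),
     pd.Series(y_past_period_error_rep, name='y_past_period_error')],
    axis=1)
# then: save_predictions(dataset, output_file, predictions=..., truth=...,
#                        optional_columns=optional_columns)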
got it
frameworks/AutoGluonTS/setup.sh
Outdated
@@ -0,0 +1,36 @@
#!/usr/bin/env bash
It is the same setup as for the default `AutoGluon`, right? Why create another framework then? It adds a lot of complexity regarding testing, distribution, docker images and so on... In the `AutoGluon.__init__.py` you could just fork the logic like this:
exec_file = "exec_ts.py" if dataset.type is DatasetType.timeseries else "exec.py"
return run_in_venv(__file__, exec_file,
                   input_data=data, dataset=dataset, config=config)
true
One difference, @limpbot: I install MXNet in addition to the other normal dependencies for TimeSeries, since it isn't a default install. We can simply install MXNet by default for now; hopefully it won't cause issues.
@sebhrusen The one concern is that if AutoGluon becomes too monolithic of an install, we may want to consider having separate install logic for the different submodules that are unrelated to each other (for example, timeseries doesn't need the vision and text modules, and tabular doesn't need the timeseries module). Probably not needed now, but something to keep in mind, since AutoGluon covers more data types/domains than most AutoML systems and that comes with many dependencies.
@Innixma I understand the concern.
For better encapsulation, and to allow you to reuse code easily in benchmarks when using different submodules, I'd still advise keeping a single `AutoGluon` folder. You can then provide different flavors of the setup just in the framework definition, using the `setup_env` syntax:
AutoGluon_TimeSeries:
  extends: AutoGluon
  setup_env:
    MODULE: timeseries
    VAR: string
This makes the two variables `MODULE` and `VAR` directly available in `setup.sh` (right after the call to `. ${HERE}/../shared/setup.sh ${HERE} true`) and allows you to customize the setup: you may already be using this for your dev environment.
Also, thanks to the definitions hierarchy (`extends: AutoGluon`), maybe we can later tweak the results to make it appear as just `AutoGluon`, or we can add a notion of group, whatever... This may not be perfect when you switch frequently between definitions, but for now, I'd like to keep the framework folders to a minimum.
I agree with you that we probably need to start thinking about distinguishing the setup/exec for different kinds of tasks. Ideally it should be smooth and not even require the additional definition above: for a given type of task, the framework should be able to tell early if it can handle it; if not, it could try to apply some additional setup before replying, and if it's ready to handle it, then it continues as before. All of this is much easier to change if there's already one single folder in the first place.
I can create an issue for this; I don't have much time to dedicate to AMLB lately, but this may change in a couple of months.
Sounds good to me. How can I access the variables `MODULE` and `VAR`?
got it :)
resources/frameworks.yaml
Outdated
####################################
### TimeSeries AutoML frameworks ###
####################################

AutoGluonTS:
  version: "stable"
  description: |
    AutoGluon-TimeSeries
  project: https://auto.gluon.ai
Ideally, we don't want to have a new framework if it supports only a specific kind of dataset; see my comment above.
If we start to have one framework for regression, one for classification, one for time series, one for anomaly detection and so on... then it becomes hard to compare "AutoML" frameworks.
got it
amlb/benchmark.py
Outdated
if self._task_def.dataset['type'] == 'timeseries' and self._task_def.dataset['timestamp_column'] is None:
    log.warning("Warning: For timeseries task setting undefined timestamp column to `timestamp`.")
    self._task_def.dataset['timestamp_column'] = "timestamp"
self._dataset = Benchmark.data_loader.load(DataSourceType.file, dataset=self._task_def.dataset, fold=self.fold, timestamp_column=self._task_def.dataset['timestamp_column'])
if self._dataset.type == DatasetType.timeseries:
    if self._task_def.dataset['id_column'] is None:
        log.warning("Warning: For timeseries task setting undefined itemid column to `item_id`.")
        self._task_def.dataset['id_column'] = "item_id"
    if self._task_def.dataset['prediction_length'] is None:
        log.warning("Warning: For timeseries task setting undefined prediction length to `1`.")
        self._task_def.dataset['prediction_length'] = "1"
    self._dataset.timestamp_column = self._task_def.dataset['timestamp_column']
    self._dataset.id_column = self._task_def.dataset['id_column']
    self._dataset.prediction_length = self._task_def.dataset['prediction_length']
Looks like most of this logic could reside in the loading logic itself, as this is dealing with information available in `self._task_def.dataset`, which is directly available to the file loader.
I'd move the logic to `dataset/file.py` for now to minimize the scope of changes.
Okay, so you want me to extend the FileDataset or the CsvDataset?
I think you can extract this logic into a dedicated method in `file.py` for clarity (it's just mutating `dataset` after all), and if you just support CSV right now, then please apply it only there.
So I added it to a dedicated method in `file.py` inside the `FileLoader` class.
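For the record, a sketch of what such a dedicated method could look like; the name and the exact attribute handling are assumptions, the actual implementation lives in `amlb/datasets/file.py`:

def _extend_dataset_with_timeseries_config(dataset, dataset_config):
    # mutate the loaded dataset with the time-series attributes from the task
    # definition, falling back to the defaults discussed above
    dataset.timestamp_column = dataset_config.get('timestamp_column') or 'timestamp'
    dataset.id_column = dataset_config.get('id_column') or 'item_id'
    dataset.forecast_range_in_steps = int(dataset_config.get('forecast_range_in_steps') or 1)
    return dataset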
amlb/datasets/file.py
Outdated
@@ -30,7 +30,7 @@
     def __init__(self, cache_dir=None):
         self._cache_dir = cache_dir if cache_dir else tempfile.mkdtemp(prefix='amlb_cache')

     @profile(logger=log)
-    def load(self, dataset, fold=0):
+    def load(self, dataset, fold=0, timestamp_column=None):
You obtained this new column using `timestamp_column=self._task_def.dataset['timestamp_column']`, so you already have the information in the `dataset` object.
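In other words, something along these lines should be enough (a sketch; `dataset` is the task's dataset definition handed to the loader):

def load(self, dataset, fold=0):
    # no extra parameter needed: the column name already travels with the
    # dataset definition coming from self._task_def.dataset
    timestamp_column = getattr(dataset, 'timestamp_column', None)
    ...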
true
Thanks @sebhrusen for the detailed review! @limpbot would you like to have a go at addressing some of the comments? Feel free to send a PR to my branch as you did in your prior update.
* fixed loading test & train, changed pred.-l. 5->30
* ignore launch.json of vscode
* ensuring timestamp parsing
* pass config, save pred, add results
* remove unused code
* add readability, remove slice from timer
* ensure autogluonts has required info
* add comments for readability
* setting defaults for timeseries task
* remove outer context manipulation
* corrected spelling error for quantiles
* adding mape, correct available metrics
* beautify config options
* fixed config for public access
* no outer context manipulation, add dataset subdir
* add more datasets
* include error raising for too large pred. length.
* mergin AutoGluonTS framework folder into AutoGluon
* renaming ts.yaml to timeseries.yaml, plus ext.
* removing presets, correct latest config for AGTS
* move dataset timeseries ext to datasets/file.py
* dont bypass test mode
* move quantiles and y_past_period_error to opt_cols
* remove whitespaces
* deleting merge artifacts
* delete merge artifacts
* renaming prediction_length to forecast_range_in_steps
* use public dataset, reduced range to maximum
* fix format string works
* fix key error bug, remove magic time limit
I merged @limpbot's changes into this branch via his PR: Innixma#7. @sebhrusen The branch should be ready for a 2nd round of review.
The changes to the core logic are much smaller now, which is what I mostly care about for this feature in its current state and scope, as I don't want to prevent you from moving forward.
I think it will be interesting for us (cc: @PGijsbers) to start thinking about supporting new kinds of tasks, and to see how we can integrate this smoothly (mixins after restructuring the code?). Maybe even some kind of plugin logic (I have a PoC PR allowing the user to plug custom code in various places; mainly thinking about data loading, result metrics, and whatever the framework may need).
If you have any ideas on your side, feel free to make suggestions in https://github.com/openml/automlbenchmark/discussions or contribute directly.
frameworks/AutoGluon/setup.sh
Outdated
if [[ ${MODULE} == "timeseries" ]]; then
    PY -c "from autogluon.tabular.version import __version__; print(__version__)" >> "${HERE}/.setup/installed"
    # TODO: GPU version install
    PIP install "mxnet<2.0"
else
    PY -c "from autogluon.timeseries.version import __version__; print(__version__)" >> "${HERE}/.setup/installed"
fi
I think you swapped `tabular.version` and `timeseries.version` here. Can the versions actually be different?
Oh yes, that got mixed up. I don't think the versions should ever differ, but to be safe I will correct it in a future pull request. Thank you for the reviews and merge, @sebhrusen!
Default autogluon setup looks broken: see https://github.com/openml/automlbenchmark/actions/runs/3199648554/jobs/5225661120. Apparently, since the forecasting/timeseries module is currently always installed, the mxnet dependency is always required.
I can merge only once the default setup works.
Okay, thanks for the pointer; it's most likely because of the mixed-up version call. I am taking a look at it.
Addressed minor comments, and fixed version call for tabular and timeseries modularities (#8)
* swapped timeseries and tabular to set version
* make warning message more explicit
* remove outer context manipulation
* split timeseries / tabular into functions
Thanks @sebhrusen for the detailed review! @limpbot has addressed some final comments in the latest update, which should also fix the autogluon.tabular error you mentioned.
I missed the "mention" ping (I just thought it was a "subscribed" notification), sorry I didn't check earlier. Definitely, I want to first wait for the JMLR reviews and finish "that part of the project", but creating a more flexible environment for people to add new types of tasks would be a great next thing that invites more people to use (and extend) the benchmark tool. Thanks Innixma and Limpbot for your contribution 🎉
* Add AutoGluon TimeSeries Prototype
* AutoMLBenchmark TimeSeries Prototype. (#6)
* Update readme
* Autogluon timeseries, addressed comments by sebhrusen (#7)
* Addressed minor comments, and fixed version call for tabular and timeseries modularities (#8)

Co-authored-by: Leo <LeonhardSommer96@gmail.com>
[Don't merge this PR]
This PR is a proof of concept of time series data and framework support in AutoMLBenchmark.
To run, follow the instructions in the newly added frameworks/AutoGluonTS/README.md.