
[PoC] AutoGluon TimeSeries Prototype #494

Merged: 5 commits merged on Oct 10, 2022

Conversation

@Innixma (Collaborator) commented Sep 13, 2022:

[Don't merge this PR]

This PR is a proof of concept of time series data and framework support in AutoMLBenchmark.

To run, follow the instructions in the newly added frameworks/AutoGluonTS/README.md.

@Innixma (Collaborator, Author) commented Sep 13, 2022:

@sebhrusen @PGijsbers

Some questions I have:

  1. [Solved in AutoMLBenchmark TimeSeries Prototype, Innixma/automlbenchmark#6] Is there a way to specify information such as prediction_length=5 on a per-dataset basis? prediction_length is the look-ahead requirement for prediction and dictates the difficulty of the task. I'm wondering if I can specify it as part of the YAML definition of the dataset in ts.yaml. The same applies to a couple of other settings, such as timestamp_column="Date" and item_id="name".

  2. [Solved in AutoMLBenchmark TimeSeries Prototype, Innixma/automlbenchmark#6] How can I update and specify the logic that does the final scoring based on predictions and ground truth? It needs to be altered to work with time series, and it may take a different form, for example when the metric requires quantile predictions in order to be calculated.
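For illustration, a per-dataset definition along the lines of question 1 might look like the sketch below. The exact schema is what this PR is trying to settle, so the field names and the train-file URL are tentative placeholders (only the test-file URL appears later in this thread):

```yaml
# hypothetical entry in resources/benchmarks/ts.yaml
- name: covid
  dataset:
    test: https://autogluon.s3-us-west-2.amazonaws.com/datasets/CovidTimeSeries/test.csv
    target: ConfirmedCases
    type: timeseries
    prediction_length: 30        # look-ahead horizon, per question 1
    timestamp_column: Date       # illustrative field name
    id_column: name              # illustrative field name
```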

@PGijsbers PGijsbers marked this pull request as draft September 14, 2022 09:58
@PGijsbers (Collaborator): @sebhrusen I'd appreciate it if you could have a look; I have very limited availability due to a paper deadline.

@sebhrusen (Collaborator): @PGijsbers I'm on holiday right now; I'll look at it when I'm back next week.

@sebhrusen sebhrusen self-requested a review September 19, 2022 15:38
* fixed loading test & train, changed pred.-l. 5->30

* ignore launch.json of vscode

* ensuring timestamp parsing

* pass config, save pred, add results

* remove unused code

* add readability, remove slice from timer

* ensure autogluonts has required info

* add comments for readability

* setting defaults for timeseries task

* remove outer context manipulation

* corrected spelling error for quantiles

* adding mape, correct available metrics

* beautify config options

* fixed config for public access
@Innixma (Collaborator, Author) commented Sep 21, 2022:

Update: Several of the TODO / FIXME comments have been addressed by @limpbot in Innixma#6

@Innixma (Collaborator, Author) commented Sep 21, 2022:

Code example:

python3 runbenchmark.py autogluonts ts test

Log output:

Running benchmark `autogluonts` on `ts` framework in `local` mode.
Loading frameworks definitions from ['/Users/neerick/workspace/code/automlbenchmark/resources/frameworks.yaml'].
Loading benchmark constraint definitions from ['/Users/neerick/workspace/code/automlbenchmark/resources/constraints.yaml'].
Loading benchmark definitions from /Users/neerick/workspace/code/automlbenchmark/resources/benchmarks/ts.yaml.
[MONITORING] [local.ts.test.covid.0.AutoGluonTS] CPU Utilization: 21.7%
[MONITORING] [local.ts.test.covid.0.AutoGluonTS] Memory Usage: 64.3%

-----------------------------------------------
Starting job local.ts.test.covid.0.AutoGluonTS.
[MONITORING] [local.ts.test.covid.0.AutoGluonTS] Disk Usage: 55.8%
Assigning 4 cores (total=12) for new task covid.
Assigning 3803 MB (total=16384 MB) for new covid task.
Using training set /Users/neerick/.openml/train.csv with test set /Users/neerick/.openml/test.csv.
Running task covid on framework AutoGluonTS with config:
TaskConfig({'framework': 'AutoGluonTS', 'framework_params': {}, 'framework_version': '0.5.2', 'type': 'timeseries', 'name': 'covid', 'fold': 0, 'metric': 'mase', 'metrics': ['mase', 'mape', 'smape', 'rmse', 'mse', 'nrmse', 'wape', 'ncrps'], 'seed': 949238273, 'job_timeout_seconds': 1200, 'max_runtime_seconds': 600, 'cores': 4, 'max_mem_size_mb': 3803, 'min_vol_size_mb': -1, 'input_dir': '/Users/neerick/.openml', 'output_dir': '/Users/neerick/workspace/code/tmp_amlb/results/autogluonts.ts.test.local.20220921T162514', 'output_predictions_file': '/Users/neerick/workspace/code/tmp_amlb/results/autogluonts.ts.test.local.20220921T162514/predictions/covid/0/predictions.csv', 'ext': {}, 'type_': 'timeseries', 'output_metadata_file': '/Users/neerick/workspace/code/tmp_amlb/results/autogluonts.ts.test.local.20220921T162514/predictions/covid/0/metadata.json'})
Running cmd `/Users/neerick/workspace/code/automlbenchmark/frameworks/AutoGluonTS/venv/bin/python -W ignore /Users/neerick/workspace/code/automlbenchmark/frameworks/AutoGluonTS/exec.py`

**** AutoGluon TimeSeries [v0.5.2] ****

Warning: path already exists! This predictor may overwrite an existing predictor! path="/var/folders/cn/t0r03w5d3nldq9n5h65wd29c0000gs/T/tmpsbe45bwi/"
Learner random seed set to 0
================ TimeSeriesPredictor ================
TimeSeriesPredictor.fit() called
Fitting with arguments:
{'evaluation_metric': 'MASE',
 'hyperparameter_tune_kwargs': None,
 'hyperparameters': 'default',
 'prediction_length': 30,
 'target_column': 'ConfirmedCases',
 'time_limit': 600}
Provided training data set with 22536 rows, 313 items. Average time series length is 72.0.
Training artifacts will be saved to: /private/var/folders/cn/t0r03w5d3nldq9n5h65wd29c0000gs/T/tmpsbe45bwi
=====================================================
Validation data is None, will hold the last prediction_length 30 time steps out to use as validation set.
AutoGluon will save models to /var/folders/cn/t0r03w5d3nldq9n5h65wd29c0000gs/T/tmpsbe45bwi/

Starting training. Start time is 2022-09-21 09:25:33
Models that will be trained: ['AutoETS', 'ARIMA', 'SimpleFeedForward', 'DeepAR', 'Transformer']
Training timeseries model AutoETS. Training for up to 599.36s of the 599.36s of remaining time.
        -4261.6502    = Validation score (-MASE)
        7.06    s     = Training runtime
        23.90   s     = Validation (prediction) runtime
Training timeseries model ARIMA. Training for up to 568.20s of the 568.20s of remaining time.
[MONITORING] [local.ts.test.covid.0.AutoGluonTS] CPU Utilization: 22.6%
[MONITORING] [local.ts.test.covid.0.AutoGluonTS] Memory Usage: 64.1%
[MONITORING] [local.ts.test.covid.0.AutoGluonTS] Disk Usage: 55.8%
        -4291.2952    = Validation score (-MASE)
        36.87   s     = Training runtime
        49.88   s     = Validation (prediction) runtime
Training timeseries model SimpleFeedForward. Training for up to 480.49s of the 480.49s of remaining time.
[MONITORING] [local.ts.test.covid.0.AutoGluonTS] CPU Utilization: 29.3%
[MONITORING] [local.ts.test.covid.0.AutoGluonTS] Memory Usage: 66.6%
[MONITORING] [local.ts.test.covid.0.AutoGluonTS] Disk Usage: 55.8%
        -4319.9065    = Validation score (-MASE)
        100.00  s     = Training runtime
        2.43    s     = Validation (prediction) runtime
Training timeseries model DeepAR. Training for up to 378.04s of the 378.04s of remaining time.
[MONITORING] [local.ts.test.covid.0.AutoGluonTS] CPU Utilization: 19.6%
[MONITORING] [local.ts.test.covid.0.AutoGluonTS] Memory Usage: 66.3%
[MONITORING] [local.ts.test.covid.0.AutoGluonTS] Disk Usage: 55.8%
[MONITORING] [local.ts.test.covid.0.AutoGluonTS] CPU Utilization: 14.5%
[MONITORING] [local.ts.test.covid.0.AutoGluonTS] Memory Usage: 65.1%
[MONITORING] [local.ts.test.covid.0.AutoGluonTS] Disk Usage: 56.0%
[MONITORING] [local.ts.test.covid.0.AutoGluonTS] CPU Utilization: 12.8%
[MONITORING] [local.ts.test.covid.0.AutoGluonTS] Memory Usage: 65.2%
[MONITORING] [local.ts.test.covid.0.AutoGluonTS] Disk Usage: 56.0%
        -4332.0235    = Validation score (-MASE)
        380.97  s     = Training runtime
        10.45   s     = Validation (prediction) runtime
Stopping training due to lack of time remaining. Time left: -13.39 seconds
[MONITORING] [local.ts.test.covid.0.AutoGluonTS] CPU Utilization: 12.6%
[MONITORING] [local.ts.test.covid.0.AutoGluonTS] Memory Usage: 68.7%
[MONITORING] [local.ts.test.covid.0.AutoGluonTS] Disk Usage: 56.0%
Fitting simple weighted ensemble.
        -4261.6502    = Validation score (-MASE)
        138.62  s     = Training runtime
        23.90   s     = Validation (prediction) runtime
Training complete. Models trained: ['AutoETS', 'ARIMA', 'SimpleFeedForward', 'DeepAR', 'WeightedEnsemble']
Total runtime: 816.54 s
Best model: AutoETS
Best model score: -4261.6502
Model not specified in predict, will default to the model with the best validation score: AutoETS
Different set of items than those provided during training were provided for prediction. The model AutoETS will be re-trained on newly provided data
[MONITORING] [local.ts.test.covid.0.AutoGluonTS] CPU Utilization: 14.2%
[MONITORING] [local.ts.test.covid.0.AutoGluonTS] Memory Usage: 60.5%
[MONITORING] [local.ts.test.covid.0.AutoGluonTS] Disk Usage: 56.0%
                              mean        0.1  ...        0.8        0.9
item_id      timestamp                         ...                      
Afghanistan_ 2020-03-23  43.673204  40.929207  ...  45.475244  46.417202
             2020-03-24  47.477861  43.269943  ...  50.241288  51.685780
             2020-03-25  51.282519  45.705039  ...  54.945364  56.859998
             2020-03-26  55.087176  48.146160  ...  59.645483  62.028192
             2020-03-27  58.891833  50.563691  ...  64.361095  67.219975
...                            ...        ...  ...        ...        ...
Zimbabwe_    2020-04-17  16.572826   8.359642  ...  21.966592  24.786010
             2020-04-18  17.094855   8.458588  ...  22.766468  25.731121
             2020-04-19  17.616884   8.550552  ...  23.570930  26.683216
             2020-04-20  18.138913   8.635642  ...  24.379906  27.642183
             2020-04-21  18.660942   8.713965  ...  25.193326  28.607919

[9390 rows x 10 columns]
[43.67320426 47.47786141 51.28251855 ... 17.61688379 18.13891286
 18.66094192]
[40. 74. 84. ... 25. 25. 28.]
Additional data provided, testing on additional data. Resulting leaderboard will be sorted according to test score (`score_test`).
Different set of items than those provided during training were provided for prediction. The model AutoETS will be re-trained on newly provided data
Different set of items than those provided during training were provided for prediction. The model ARIMA will be re-trained on newly provided data
Different set of items than those provided during training were provided for prediction. The model AutoETS will be re-trained on newly provided data
[MONITORING] [local.ts.test.covid.0.AutoGluonTS] CPU Utilization: 12.4%
[MONITORING] [local.ts.test.covid.0.AutoGluonTS] Memory Usage: 62.7%
[MONITORING] [local.ts.test.covid.0.AutoGluonTS] Disk Usage: 56.0%
               model  score_test    score_val  pred_time_test  pred_time_val  fit_time_marginal  fit_order
0   WeightedEnsemble -444.037098 -4261.650234       29.051365      23.899191         138.624613          5
1            AutoETS -444.037098 -4261.650234       25.333331      23.899191           7.057499          1
2              ARIMA -475.878400 -4291.295201       51.673333      49.880237          36.868394          2
3  SimpleFeedForward -526.892250 -4319.906528        1.442273       2.432864          99.998205          3
4             DeepAR -591.905430 -4332.023525        9.755238      10.447713         380.970382          4
Terminating process psutil.Process(pid=25939, name='Python', status='running', started='09:27:32').
Killing process psutil.Process(pid=25939, name='Python', status='running', started='09:27:32').
Early stopping based on learning rate scheduler callback (min_lr was reached).
Traceback (most recent call last):
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/multiprocessing/resource_tracker.py", line 201, in main
    cache[rtype].remove(name)
KeyError: '/loky-25767-o7csihc6'



Predictions preview:
     predictions  truth        0.1        0.2         0.3         0.4         0.5         0.6         0.7         0.8         0.9  y_past_period_error
0     43.673204   40.0  40.929207  41.871165   42.550383   43.130749   43.673204   44.215659   44.796026   45.475244   46.417202             0.666667
1     47.477861   74.0  43.269943  44.714435   45.756015   46.646007   47.477861   48.309715   49.199707   50.241288   51.685780             0.666667
2     51.282519   84.0  45.705039  47.619673   49.000259   50.179919   51.282519   52.385118   53.564778   54.945364   56.859998             0.666667
3     55.087176   94.0  48.146160  50.528868   52.246968   53.715022   55.087176   56.459330   57.927383   59.645483   62.028192             0.666667
4     58.891833  110.0  50.563691  53.422571   55.484025   57.245461   58.891833   60.538205   62.299641   64.361095   67.219975             0.666667
5     62.696490  110.0  52.944972  56.292468   58.706248   60.768734   62.696490   64.624246   66.686732   69.100512   72.448007             0.666667
6     66.501147  120.0  55.284052  59.134650   61.911203   64.283664   66.501147   68.718630   71.091092   73.867644   77.718243             0.666667
7     70.305804  170.0  57.578101  61.947260   65.097731   67.789693   70.305804   72.821916   75.513877   78.664349   83.033508             0.666667
8     74.110461  174.0  59.825896  64.729494   68.265333   71.286577   74.110461   76.934346   79.955590   83.491429   88.395027             0.666667
9     77.915119  237.0  62.027091  67.481124   71.413866   74.774249   77.915119   81.055988   84.416371   88.349113   93.803146             0.666667
10    81.719776  273.0  64.181835  70.202250   74.543392   78.252739   81.719776   85.186813   88.896159   93.237302   99.257717             0.666667
11    85.524433  281.0  66.290563  72.893156   77.654089   81.722132   85.524433   89.326734   93.394776   98.155710  104.758302             0.666667
12    89.329090  299.0  68.353876  75.554236   80.746202   85.182546   89.329090   93.475634   97.911978  103.103944  110.304304             0.666667
13    93.133747  349.0  70.372463  78.185944   83.820014   88.634119   93.133747   97.633375  102.447480  108.081550  115.895032             0.666667
14    96.938404  367.0  72.347060  80.788763   86.875826   92.076996   96.938404  101.799813  107.000983  113.088045  121.529748             0.666667
15   100.743061  423.0  74.278423  83.363190   89.913946   95.511325  100.743061  105.974797  111.572177  118.122933  127.207700             0.666667
16   104.547719  444.0  76.167305  85.909718   92.934683   98.937257  104.547719  110.158180  116.160754  123.185719  132.928132             0.666667
17   108.352376  484.0  78.014453  88.428839   95.938344  102.354939  108.352376  114.349813  120.766408  128.275912  138.690298             0.666667
18   112.157033  521.0  79.820595  90.921030   98.925224  105.764514  112.157033  118.549552  125.388841  133.393036  144.493471             0.666667
19   115.961690  555.0  81.586437  93.386755  101.895615  109.166122  115.961690  122.757258  130.027765  138.536625  150.336943             0.666667

Predictions saved to `/Users/neerick/workspace/code/tmp_amlb/results/autogluonts.ts.test.local.20220921T162514/predictions/covid/0/predictions.csv`.
Loading metadata from `/Users/neerick/workspace/code/tmp_amlb/results/autogluonts.ts.test.local.20220921T162514/predictions/covid/0/metadata.json`.
fatal: not a git repository (or any of the parent directories): .git

Loading predictions from `/Users/neerick/workspace/code/tmp_amlb/results/autogluonts.ts.test.local.20220921T162514/predictions/covid/0/predictions.csv`.
Metric scores: { 'app_version': 'dev [NA, NA, NA]',
  'constraint': 'test',
  'duration': nan,
  'fold': 0,
  'framework': 'AutoGluonTS',
  'id': 'covid',
  'info': None,
  'mape': 0.47176599878084985,
  'mase': 444.03709806992947,
  'metric': 'neg_mase',
  'mode': 'local',
  'models_count': 5,
  'mse': 66512955.519554704,
  'ncrps': 3.7180833818575727,
  'nrmse': 1.8264137433841354,
  'params': '',
  'predict_duration': 24.022034168243408,
  'result': -444.03709806992947,
  'rmse': 8155.547530335085,
  'seed': 949238273,
  'smape': 0.6795078347334532,
  'task': 'covid',
  'training_duration': 817.2716138362885,
  'type': 'timeseries',
  'utc': '2022-09-21T16:41:38',
  'version': '0.5.2',
  'wape': 0.4395118445505013}
Job `local.ts.test.covid.0.AutoGluonTS` executed in 984.191 seconds.
All jobs executed in 984.232 seconds.
[MONITORING] [local.ts.test.covid.0.AutoGluonTS] CPU Utilization: 16.7%
[MONITORING] [local.ts.test.covid.0.AutoGluonTS] Memory Usage: 59.6%
[MONITORING] [local.ts.test.covid.0.AutoGluonTS] Disk Usage: 56.0%
Processing results for autogluonts.ts.test.local.20220921T162514
Scores saved to `/Users/neerick/workspace/code/tmp_amlb/results/autogluonts.ts.test.local.20220921T162514/scores/AutoGluonTS.benchmark_ts.csv`.
Scores saved to `/Users/neerick/workspace/code/tmp_amlb/results/autogluonts.ts.test.local.20220921T162514/scores/results.csv`.
Scores saved to `/Users/neerick/workspace/code/tmp_amlb/results/results.csv`.
Summing up scores for current run:
   id  task fold   framework constraint   result   metric  duration      seed
covid covid    0 AutoGluonTS       test -444.037 neg_mase     984.2 949238273

@Innixma (Collaborator, Author) commented Sep 29, 2022:

@sebhrusen Sorry to ping but would you be interested in reviewing this PR? A large chunk of the logic was written by @limpbot who is interning with us currently, and it would be great if he received feedback so as not to block his time-series benchmarking efforts.

@sebhrusen (Collaborator) commented Sep 30, 2022:

@Innixma I'm looking at it now and will make a full review before Monday.
Beyond implementation details and modularity, I mainly want to be sure that it is not designed first to satisfy AG's time-series implementation, and that it can be generalized to other implementations (it would be nice to have an alternative implementation). For now, to satisfy your needs, I'll mainly ensure that the changes are limited to data loading plus the AG integration as much as possible.

@Innixma (Collaborator, Author) commented Sep 30, 2022:

Sounds good! I agree that we should make sure the input/output/scoring definitions are generic and not AG-specific. Perhaps the AutoPyTorch-TimeSeries folks (@dengdifan) would be interested in reviewing, or in trying to add their AutoML system as a framework extension to this logic?

@sebhrusen (Collaborator) left a review comment:

Thanks for this contribution; this is a first round of feedback.
I'm good with most of the file-loading logic and the added metrics.
For the "middle" layers like benchmark and results transformations, I'd like to avoid changes as much as possible, as they look rather ad hoc.

Also, please use the existing AutoGluon framework instead of this new one; they don't seem different enough to require a completely separate setup.

Comment on lines 13 to 14:

    if hasattr(dataset, 'timestamp_column') is False:
        dataset.timestamp_column = None

@sebhrusen (Collaborator): For this one and the ones below, suggested change:

    if not hasattr(dataset, 'timestamp_column'):
        dataset.timestamp_column = None

amlb/results.py (outdated):

    @@ -255,7 +259,8 @@
    def save_predictions(dataset: Dataset, output_file: str,
                         predictions: Union[A, DF, S] = None, truth: Union[A, DF, S] = None,
                         probabilities: Union[A, DF] = None, probabilities_labels: Union[list, A] = None,
                         target_is_encoded: bool = False,
    -                    preview: bool = True):
    +                    preview: bool = True,
    +                    quantiles: Union[A, DF] = None):

@sebhrusen (Collaborator): Nitpick: let's try to group the params functionally; it makes them easier to read and understand. Here quantiles plays a role similar to probabilities.

Contributor: got it

Comment on lines +231 to +232:

    if 'y_past_period_error' in df.columns:
        return TimeSeriesResult(df)

@sebhrusen (Collaborator): Please don't bypass test mode by adding your own test block: the test-mode check should remain first and should also be applied for time series. I'm not asking you to add the test dataset to our workflow right now, but we will need to add it soon after your changes.

Contributor: got it

@@ -0,0 +1,15 @@ (new file)

    ---

@sebhrusen (Collaborator): Please rename the file to timeseries.yaml; explicit is good.

Contributor: got it

# s3://autogluon-ts-bench/data/covid_deaths/csv/test.csv | https://autogluon.s3-us-west-2.amazonaws.com/datasets/CovidTimeSeries/test.csv
target: ConfirmedCases # target | ConfirmedCases
type: timeseries
prediction_length: 30
@sebhrusen (Collaborator): What is the length unit? 30 entries? Days? Hours? If this is the number of entries, please rename it to num_predictions to avoid confusion; otherwise, please allow a unit:

    prediction_length: 30d  # provide predictions over the next 30 days; accept d (days), m (months), y (years)...

Contributor: It is the number of predictions per sequence, so 'num_predictions_per_id' sounds good?

Contributor: Since predictions is a more general term, I suppose forecast_range_in_steps is sufficiently explicit?

amlb/results.py (outdated)
Comment on lines 317 to 334:

    if dataset.type == DatasetType.timeseries:
        if quantiles is not None:
            quantiles = quantiles.reset_index(drop=True)
            df = pd.concat([df, quantiles], axis=1)

        period_length = 1  # TODO: this period length could be adapted to the dataset, but then we need to pass that information as well; as of now this works.

        # We aim to calculate the mean period error from the past for each sequence: 1/N sum_{i=1}^N |x(t_i) - x(t_i - T)|
        # 1. retrieve item_ids for each sequence/item
        item_ids, inverse_item_ids = np.unique(dataset.test.X[dataset.id_column].squeeze().to_numpy(), return_index=False, return_inverse=True)
        # 2. capture sequences in a list
        y_past = [dataset.test.y.squeeze().to_numpy()[inverse_item_ids == i][:-dataset.prediction_length] for i in range(len(item_ids))]
        # 3. calculate period error per sequence
        y_past_period_error = [np.abs(y_past_item[period_length:] - y_past_item[:-period_length]).mean() for y_past_item in y_past]
        # 4. repeat period error for each sequence, to save one for each element
        y_past_period_error_rep = np.repeat(y_past_period_error, dataset.prediction_length)
        df = df.assign(y_past_period_error=y_past_period_error_rep)

@sebhrusen (Collaborator): I'd rather not have this here. It is a lot of computation plus assumptions (apparently you can't have time series without an id_column) for a method that is just supposed to save predictions in a standard format. Moreover, this y_past_period_error seems to be useful only for the mase metric, so either compute it together with the metric or compute it beforehand (in the AG framework integration).

For now, I'd move your computations to the __init__.py or exec.py file, and simply ensure that the result can be customized by adding optional columns (in this case, both the quantiles and your additional results).

Suggestion: change the signature to

     def save_predictions(dataset: Dataset, output_file: str,
                          predictions: Union[A, DF, S] = None, truth: Union[A, DF, S] = None,
                          probabilities: Union[A, DF] = None, probabilities_labels: Union[list, A] = None,
                          optional_columns: Union[A, DF] = None,
                          target_is_encoded: bool = False,
                          preview: bool = True):

and automatically concatenate the optional_columns to the predictions if provided. For now, you should be able to generate those in exec.py.

Contributor: got it
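The refactoring suggested here, computing the per-item past period error in the framework integration and passing it along as an optional column, could be sketched roughly as follows. The function name, column names, and toy data are illustrative, not the benchmark's actual API:

```python
import numpy as np
import pandas as pd

def past_period_error_column(test_df: pd.DataFrame, id_column: str, target: str,
                             prediction_length: int, period_length: int = 1) -> np.ndarray:
    """Mean absolute period-to-period error of each item's history
    (1/N * sum_i |x(t_i) - x(t_i - T)|), repeated once per forecasted step,
    so it can be saved as one extra column alongside the predictions."""
    errors = []
    for _, group in test_df.groupby(id_column, sort=True):
        # drop the forecast horizon: only past values contribute to the scale
        history = group[target].to_numpy()[:-prediction_length]
        errors.append(np.abs(history[period_length:] - history[:-period_length]).mean())
    return np.repeat(errors, prediction_length)

# toy example: two items with four rows each, a horizon of one step
df = pd.DataFrame({
    "item_id": ["a"] * 4 + ["b"] * 4,
    "ConfirmedCases": [1.0, 2.0, 4.0, 7.0, 10.0, 10.0, 13.0, 13.0],
})
col = past_period_error_column(df, "item_id", "ConfirmedCases", prediction_length=1)
print(col)  # one value per forecasted row: [1.5 1.5]
```

This keeps save_predictions generic: the framework-specific exec.py computes the column and hands it over via something like the proposed optional_columns parameter.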

@@ -0,0 +1,36 @@ (new setup.sh)

@sebhrusen (Collaborator) commented Oct 3, 2022: It is the same setup as for the default AutoGluon, right? Why create another framework then? It adds a lot of complexity for testing, distribution, Docker images, and so on. In AutoGluon's __init__.py you could just fork the logic like this:

    exec_file = "exec_ts.py" if dataset.type is DatasetType.timeseries else "exec.py"
    return run_in_venv(__file__, exec_file,
                       input_data=data, dataset=dataset, config=config)

Contributor: true

@Innixma (Collaborator, Author): One difference, @limpbot: I install MXNet in addition to the other normal dependencies for TimeSeries, since it isn't a default install. We can simply install MXNet by default for now; hopefully it won't cause issues.

@sebhrusen The one concern is that if AutoGluon becomes too monolithic an install, we may want separate install logic for the submodules that are unrelated to each other (for example, timeseries doesn't need the vision and text modules, and tabular doesn't need the timeseries module). Probably not needed now, but something to keep in mind, since AutoGluon covers more data types/domains than most AutoML systems, and that comes with many dependencies.

@sebhrusen (Collaborator) commented Oct 4, 2022: @Innixma I understand the concern.
For better encapsulation, and to let you reuse code easily across benchmarks that use different submodules, I'd still advise keeping a single AutoGluon folder.
You can then provide different flavors of the setup directly in the framework definition using the setup_env syntax:

    AutoGluon_TimeSeries:
      extends: AutoGluon
      setup_env:
        MODULE: timeseries
        VAR: string

This makes the two variables MODULE and VAR directly available in setup.sh (right after the call to . ${HERE}/../shared/setup.sh ${HERE} true) and lets you customize the setup; you may already be using this for the dev environment.

Also, thanks to the definitions hierarchy (extends: AutoGluon), maybe we can later tweak the results to make them appear as just AutoGluon, or add a notion of group, or whatever works.

This may not be perfect if you switch frequently between definitions, but for now I'd like to keep the framework folders to a minimum.

I agree that we probably need to start thinking about distinguishing the setup/exec for different kinds of tasks. Ideally it should be smooth and not even require the additional definition above: for a given type of task, the framework should be able to tell early whether it can handle it; if not, it could try to apply some additional setup before replying, and once it is ready to handle it, it continues as before. All of this is much easier to change if there is already a single folder in the first place.
I can create an issue for this, although I don't have much time to dedicate to AMLB lately; that may change in a couple of months.

Contributor: Sounds good to me; how can I access the variables MODULE and VAR?

Contributor: got it :)
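As a rough sketch of how such setup_env variables typically reach the setup script: they are exported into the environment before setup.sh runs, so plain parameter expansion works. The defaulting pattern below is illustrative, not the actual AMLB setup.sh:

```shell
#!/usr/bin/env bash
HERE=$(dirname "$0")
# . ${HERE}/../shared/setup.sh ${HERE} true   # as in the existing framework setup scripts

# MODULE/VAR come from the framework definition's setup_env block;
# fall back to a default when the variable was not provided (illustrative).
MODULE=${MODULE:-autogluon}

echo "installing submodule: ${MODULE}"
```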

Comment on lines 194 to 202:

    ####################################
    ### TimeSeries AutoML frameworks ###
    ####################################

    AutoGluonTS:
      version: "stable"
      description: |
        AutoGluon-TimeSeries
      project: https://auto.gluon.ai

@sebhrusen (Collaborator): Ideally, we don't want a new framework entry that supports only one specific kind of dataset; see my comment above. If we start having one framework for regression, one for classification, one for time series, one for anomaly detection, and so on, it becomes hard to compare "AutoML" frameworks.

Contributor: got it

Comment on lines 492 to 505
if self._task_def.dataset['type'] == 'timeseries' and self._task_def.dataset['timestamp_column'] is None:
log.warning("Warning: For timeseries task setting undefined timestamp column to `timestamp`.")
self._task_def.dataset['timestamp_column'] = "timestamp"
self._dataset = Benchmark.data_loader.load(DataSourceType.file, dataset=self._task_def.dataset, fold=self.fold, timestamp_column=self._task_def.dataset['timestamp_column'])
if self._dataset.type == DatasetType.timeseries:
if self._task_def.dataset['id_column'] is None:
log.warning("Warning: For timeseries task setting undefined itemid column to `item_id`.")
self._task_def.dataset['id_column'] = "item_id"
if self._task_def.dataset['prediction_length'] is None:
log.warning("Warning: For timeseries task setting undefined prediction length to `1`.")
self._task_def.dataset['prediction_length'] = "1"
self._dataset.timestamp_column=self._task_def.dataset['timestamp_column']
self._dataset.id_column=self._task_def.dataset['id_column']
self._dataset.prediction_length=self._task_def.dataset['prediction_length']
Copy link
Collaborator

@sebhrusen sebhrusen Oct 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like most of this logic could reside in the loading logic itself as this is dealing with information available in self._task_def.dataset which is directly available to the file loader.
I'd move the logic to dataset/file.py for now to minimize scope of changes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, so you want me to extend the FileDataset or the CsvDataset?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can extract this logic into a dedicated method in file.py for clarity (it's just mutating the dataset after all), and if you only support CSV right now, then please apply it only there.

Contributor

so I added it to a dedicated method in file.py inside the FileLoader class.

@@ -30,7 +30,7 @@ def __init__(self, cache_dir=None):
         self._cache_dir = cache_dir if cache_dir else tempfile.mkdtemp(prefix='amlb_cache')

     @profile(logger=log)
-    def load(self, dataset, fold=0):
+    def load(self, dataset, fold=0, timestamp_column=None):
Collaborator

you obtained this new column using

timestamp_column=self._task_def.dataset['timestamp_column']

so you already have the information in the dataset object

Contributor

true

@Innixma
Collaborator Author

Innixma commented Oct 3, 2022

Thanks @sebhrusen for the detailed review!

@limpbot would you like to have a go at addressing some of the comments? Feel free to send a PR to my branch as you did in your prior update.

* fixed loading test & train, changed pred.-l. 5->30

* ignore launch.json of vscode

* ensuring timestamp parsing

* pass config, save pred, add results

* remove unused code

* add readability, remove slice from timer

* ensure autogluonts has required info

* add comments for readability

* setting defaults for timeseries task

* remove outer context manipulation

* corrected spelling error for quantiles

* adding mape, correct available metrics

* beautify config options

* fixed config for public access

* no outer context manipulation, add dataset subdir

* add more datasets

* include error raising for too large pred. length.

* mergin AutoGluonTS framework folder into AutoGluon

* renaming ts.yaml to timeseries.yaml, plus ext.

* removing presets, correct latest config for AGTS

* move dataset timeseries ext to datasets/file.py

* dont bypass test mode

* move quantiles and y_past_period_error to opt_cols

* remove whitespaces

* deleting merge artifacts

* delete merge artifacts

* renaming prediction_length to forecast_range_in_steps

* use public dataset, reduced range to maximum

* fix format string works

* fix key error bug, remove magic time limit
@Innixma
Collaborator Author

Innixma commented Oct 6, 2022

I merged @limpbot's changes into this branch via his PR: Innixma#7

@sebhrusen The branch should be ready for 2nd round of review.

Collaborator
@sebhrusen sebhrusen left a comment

The changes to the core logic are much smaller now, which is what I mostly care about for this feature in its current state and scope as I don't want to prevent you from moving forward.

I think it will be interesting for us (cc: @PGijsbers) to start thinking about supporting new kinds of tasks, and to see how we can integrate this smoothly (mixins after restructuring the code?). Maybe even some kind of plugin logic: I have a PoC PR allowing the user to plug custom code in various places, mainly for data loading, result metrics, and whatever the framework may need.
If you have any ideas on your side, feel free to make suggestions at https://github.com/openml/automlbenchmark/discussions or contribute directly.

Thanks a lot for this @limpbot and @Innixma

Comment on lines 40 to 46
if [[ ${MODULE} == "timeseries" ]]; then
    PY -c "from autogluon.tabular.version import __version__; print(__version__)" >> "${HERE}/.setup/installed"
    # TODO: GPU version install
    PIP install "mxnet<2.0"
else
    PY -c "from autogluon.timeseries.version import __version__; print(__version__)" >> "${HERE}/.setup/installed"
fi
Collaborator

I think you swapped tabular.version and timeseries.version here.
Can the versions actually be different?

Contributor

Oh yes, that got mixed up. I don't think the versions should ever differ, but to be safe I will correct it in a future pull request. Thank you for the reviews and merge @sebhrusen!

Collaborator
@sebhrusen sebhrusen Oct 7, 2022

default autogluon setup looks broken: see https://github.com/openml/automlbenchmark/actions/runs/3199648554/jobs/5225661120

apparently, since the forecasting/timeseries module is currently always installed, the mxnet dependency is always required

Collaborator

I can merge only once the default setup works

Contributor

okay thanks for the pointer, most likely because of the mixed-up version call. I am taking a look at it.
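The swap discussed above could be corrected along the lines of the sketch below. This is a simplified, hypothetical rendering: `PY`/`PIP` are amlb setup helpers in the real script, so they are replaced here by plain variables that only record what each branch would do.

```shell
# Sketch of the corrected branch: record the version of the sub-package
# matching MODULE, and only pull in mxnet for the timeseries setup.
MODULE="timeseries"
if [[ ${MODULE} == "timeseries" ]]; then
    VERSION_PKG="autogluon.timeseries"   # was autogluon.tabular in the swapped version
    EXTRA_DEPS="mxnet<2.0"               # TODO in the PR: GPU version install
else
    VERSION_PKG="autogluon.tabular"
    EXTRA_DEPS=""
fi
echo "version source: ${VERSION_PKG}"
echo "extra deps: ${EXTRA_DEPS}"
```

The later commit "split timeseries / tabular into functions" goes one step further and isolates each branch into its own setup function.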

@sebhrusen sebhrusen marked this pull request as ready for review October 7, 2022 10:43
…series modularities (#8)

* fixed loading test & train, changed pred.-l. 5->30

* ignore launch.json of vscode

* ensuring timestamp parsing

* pass config, save pred, add results

* remove unused code

* add readability, remove slice from timer

* ensure autogluonts has required info

* add comments for readability

* setting defaults for timeseries task

* remove outer context manipulation

* corrected spelling error for quantiles

* adding mape, correct available metrics

* beautify config options

* fixed config for public access

* no outer context manipulation, add dataset subdir

* add more datasets

* include error raising for too large pred. length.

* mergin AutoGluonTS framework folder into AutoGluon

* renaming ts.yaml to timeseries.yaml, plus ext.

* removing presets, correct latest config for AGTS

* move dataset timeseries ext to datasets/file.py

* dont bypass test mode

* move quantiles and y_past_period_error to opt_cols

* remove whitespaces

* deleting merge artifacts

* delete merge artifacts

* renaming prediction_length to forecast_range_in_steps

* use public dataset, reduced range to maximum

* fix format string works

* fix key error bug, remove magic time limit

* swapped timeseries and tabular to set version

* make warning message more explicit

* remove outer context manipulation

* split timeseries / tabular into functions
@Innixma
Collaborator Author

Innixma commented Oct 7, 2022

Thanks @sebhrusen for the detailed review! @limpbot has addressed some final comments in the latest update, which should also fix the autogluon.tabular error you mentioned.

@sebhrusen sebhrusen merged commit 4029472 into openml:master Oct 10, 2022
@PGijsbers
Collaborator

I think it will be interesting for us (cc: @PGijsbers) to start thinking about supporting new kind of tasks

I missed the "mention" ping (I just thought it said "subscribed"), sorry I didn't check earlier. Definitely; I want to first wait for the JMLR reviews and finish "that part of the project", but creating a more flexible environment for people to add new types of tasks would be a great next step that invites more people to use (and extend) the benchmark tool.

Thanks Innixma and Limpbot for your contribution 🎉

limpbot added a commit to limpbot/automlbenchmark that referenced this pull request Nov 15, 2022
* Add AutoGluon TimeSeries Prototype

* AutoMLBenchmark TimeSeries Prototype. (#6)

* fixed loading test & train, changed pred.-l. 5->30

* ignore launch.json of vscode

* ensuring timestamp parsing

* pass config, save pred, add results

* remove unused code

* add readability, remove slice from timer

* ensure autogluonts has required info

* add comments for readability

* setting defaults for timeseries task

* remove outer context manipulation

* corrected spelling error for quantiles

* adding mape, correct available metrics

* beautify config options

* fixed config for public access

* Update readme

* Autogluon timeseries, addressed comments by sebhrusen (#7)

* fixed loading test & train, changed pred.-l. 5->30

* ignore launch.json of vscode

* ensuring timestamp parsing

* pass config, save pred, add results

* remove unused code

* add readability, remove slice from timer

* ensure autogluonts has required info

* add comments for readability

* setting defaults for timeseries task

* remove outer context manipulation

* corrected spelling error for quantiles

* adding mape, correct available metrics

* beautify config options

* fixed config for public access

* no outer context manipulation, add dataset subdir

* add more datasets

* include error raising for too large pred. length.

* mergin AutoGluonTS framework folder into AutoGluon

* renaming ts.yaml to timeseries.yaml, plus ext.

* removing presets, correct latest config for AGTS

* move dataset timeseries ext to datasets/file.py

* dont bypass test mode

* move quantiles and y_past_period_error to opt_cols

* remove whitespaces

* deleting merge artifacts

* delete merge artifacts

* renaming prediction_length to forecast_range_in_steps

* use public dataset, reduced range to maximum

* fix format string works

* fix key error bug, remove magic time limit

* Addressed minor comments, and fixed version call for tabular and timeseries modularities (#8)

* fixed loading test & train, changed pred.-l. 5->30

* ignore launch.json of vscode

* ensuring timestamp parsing

* pass config, save pred, add results

* remove unused code

* add readability, remove slice from timer

* ensure autogluonts has required info

* add comments for readability

* setting defaults for timeseries task

* remove outer context manipulation

* corrected spelling error for quantiles

* adding mape, correct available metrics

* beautify config options

* fixed config for public access

* no outer context manipulation, add dataset subdir

* add more datasets

* include error raising for too large pred. length.

* mergin AutoGluonTS framework folder into AutoGluon

* renaming ts.yaml to timeseries.yaml, plus ext.

* removing presets, correct latest config for AGTS

* move dataset timeseries ext to datasets/file.py

* dont bypass test mode

* move quantiles and y_past_period_error to opt_cols

* remove whitespaces

* deleting merge artifacts

* delete merge artifacts

* renaming prediction_length to forecast_range_in_steps

* use public dataset, reduced range to maximum

* fix format string works

* fix key error bug, remove magic time limit

* swapped timeseries and tabular to set version

* make warning message more explicit

* remove outer context manipulation

* split timeseries / tabular into functions

Co-authored-by: Leo <LeonhardSommer96@gmail.com>
PGijsbers added a commit that referenced this pull request Jun 20, 2023
* Add a workflow to tag latest `v*` release as `stable` (#399)

Currently limited to alphabetical ordering, which means that any one number in the version cannot exceed one digit.

* Bump auto-sklearn to 0.14.0 (#400)

* Update version to 2.0

* Revert "Update version to 2.0"

This reverts commit 9e0791a.

* Fix/docker tag (#404)

* Add the version tag to the image name if present

* Fix casing for MLNet framework definition

* Sync stable-v2 and master (#407)

* Update version to 2.0.2

* Revert version change

* Add support for the OpenML test server (#423)

* Add support for the OpenML test server

* change domain from openmltestserver to test.openml

* update error message

* Apply suggestions from code review

Co-authored-by: seb. <sebastien@h2o.ai>

* fix syntax error due to online merging

Co-authored-by: seb. <sebastien@h2o.ai>

* Switch from release:created to release:published (#429)

* Added support for dataset files stored on s3 (#420)

* s3 functionality

* Update amlb/datasets/fileutils.py

Co-authored-by: Pieter Gijsbers <p.gijsbers@tue.nl>

* OOD

* add s3n

* move boto3 import

Co-authored-by: Weisu Yin <weisuyin96@gmail.com>
Co-authored-by: Pieter Gijsbers <p.gijsbers@tue.nl>

* Respect TMP, TMPDIR, TEMP (#442)

* Respect tmpdir

* Fixed submodule

* feat: retain environment vars for framework venv

* minor fix on compatibility (#454)

Co-authored-by: Qingyun Wu <qxw5138@psu.edu>

* Ignore decoding errors on Windows (#459)

By default it can use cp1252 decoding which sometimes raises an error
and halts the process.

* Fix a typo (#462)

will used -> will be used

* Merge back stable-v2 to master (#472)

* Add `stable` tag workflow, bump auto-sklearn (#401)

* Add a workflow to tag latest `v*` release as `stable` (#399)

Currently limited to alphabetical ordering, which means that any one number in the version cannot exceed one digit.

* Bump auto-sklearn to 0.14.0 (#400)

* Fix/docker tag (#404)

* Add the version tag to the image name if present

* Fix casing for MLNet framework definition

* Changed latest from master to main

* Update version to 2.0.1

* Improv/aws meta (#413)

* Add volume meta data to aws meta info

* Add constraints for v2 benchmark (#415)

* Add constraints for v2 benchmark

For ease of reproducibility, we want to include our experimental setup
in the constraints file. For our experiments we increase the volume size
to 100gb and require gp3 volumes (general purpose SSD).

* Update version to 2.0.2

* Fix AWS random cancel issue (#422)

* let the job runner handle the rescheduling logic to ensure that the job is always can't be acted upon by current worker after being rescheduled

* remove commented code

* Add a GAMA configuration intended for benchmarking (#426)

Made the previous version abstract to avoid accidentally running the
wrong version of GAMA for the benchmark.

* Unsparsify target variables for (Tuned)RF (#425)

* Unsparsify target variables for (Tuned)RF

Sparse targets are not supported in scikit-learn 0.24.2, and are used
with tasks 360932 and 360933 (QSAR) in the benchmark.

* cosmetic change to make de/serialization easier to debug

Co-authored-by: Sebastien Poirier <sebastien@h2o.ai>

* ensure that openml is configured when loading the tasks (#427)

* Expect a possible `NoSuchProcess` error (#428)

Since it's entirely possible that the processes were already
terminating, but only completed termination between the process.children
call and the proc.terminate/kill calls.

* Reset version for versioning workflow

* Update version to 2.0.3

* ensure that the docker images can be built from linux (#437)

* Avoid querying terminated instance with CloudWatch (#438)

* fixes #432 add precision to runtimes in results.csv (#433)

* fixes #432 add precision to runtimes in results.csv

* Update amlb/results.py

Co-authored-by: seb. <sebastien@h2o.ai>

Co-authored-by: seb. <sebastien@h2o.ai>

* Iteratively build the forest to honor constraints (#439)

* Iteratively build the forest to honor constraints

In particular depending on the dataset size either memory or time
constraints can become a problem which makes it unreliable as a
baseline. Gradually growing the forest sidesteps both issues.

* Make iterative fit default, parameterize execution

* Step_size as script parameter, safer check if done

When final_forest_size is not an exact multiple of step_size,
randomforest should still terminate. Additionally step_size is escaped
with an underscore as it is not a RandomForestEstimator hyperparameter.

* Iterative fit for TunedRandomForest to meet memory and time constraints (#441)

* Iterative fit to meet memory and time constraints

Specifically for each value of `max_features` to try, an equal time
budget is alloted, with one additional budget being reserved for the
final fit. This does mean that different `max_features` can lead to
different number of trees, but it keeps it simple.

* Abort tuning when close to total time budget

The first fit of each iterative fit for a `max_features` value was not
guarded, which can lead to exceeding the total time budget. This adds a
check before the first fit to estimate whether the budget will be
exceeded, and if so aborting further tuning and continue with the final
fit.

* Make k_folds configurable

* Add scikit-learn code with explanation

* Modify cross_validate, allow 1 estimator per split

This is useful when we maintain a warm_started model for each individual
split.

* Use custom cv function to allow warm-start

By default estimators are cloned in any scikit-learn cross_validate
function (which stops warm-start) and it is not possible to specify a
specific estimator-object per fold (which stops warm-start). The added
custom_validate module makes changes to the scikit-learn code to allow
warm-starting to work in conjunction with the cross-validate
functionality. For more info see scikit-learn#22044 and
scikit-learn#22087.

* Add parameter to set tune time, rest is for fit

The previous iteration where the final fit was treated as an equivalent
budget to any other optimization sometimes left too little time to train
the final forest, in particular when the last fit took longer than
expected. This would often lead to very small forests for the final
model. The new system guarantees roughly 10% of budget for the final
forest, guaranteeing a better final fit.

* Revert version to _dev_version to prepare release (#444)

* Update version to 2.0.4

* Signal to encode predictions as proba now works (#447)

In a previous iteration it was encoded as a numpy file, but now it's
serialized to JSON which means that results.probabilities is simply a
string if imputation is required.

* Monkeypatch openml to keep whitespace in features (#446)

Technically monkeypatch xmltodict function used by openml when reading the features xml

* fixe for mlr3automl (#443)

* Reset version for Github workflow (#448)

* Update version to 2.0.5

* Update mlr3automl to latest

Was supposed to be included with #443

* Update MLR3 (#461)

* Reset version for version bump

* Updatet version because GA failed

* Issue 416: fixing versioning workflow for releases and merges to master (#468)

* change workflow to correctly modify the app version on releases and when forcing merged version back to master

* protect main branch from accidental releases

* fix stress test

Co-authored-by: PGijsbers <p.gijsbers@tue.nl>
Co-authored-by: eddiebergman <eddiebergmanhs@gmail.com>
Co-authored-by: github-actions <github-actions@github.com>
Co-authored-by: Erin LeDell <erin@h2o.ai>
Co-authored-by: Stefan Coors <stefan.coors@gmx.net>

* useless workflow reintroduced during merge (#475)

* tag all AWS entities (#469)

* fixed parsing of int targets when loading file in CSV format (#467)

* Avoid root owned files from docker (#464)

* New site (#479)

* First draft of new website

* Add framework descriptions, papers and logos

* Update footer with Github link

* Remove under construction banner

* Add redirect from old page to new one

* Update page title

* Add text links to new paper to be added later

* Move static site to /docs

* Whitelist documentation images

* Remove temporary work directory

* Add documentation images

* Place holder for mobile

* Move old notebooks and visualizations

To make sure they are not confusing for new users, as these will no longer work out-of-the-box.
New notebooks will be added soon but I don't have the files available right now.

* Tell github this is not Jekyll

* Update minimal responsiveness (#480)

* Make results responsive (hacky)

* Make Frameworks page more responsive

* Make Home more responsive

* Bare minimum mobile navbar

* Make sure phones report fake width

* Link to arxiv paper (#481)

* Update to support AutoGluon v0.4 (#455)

* Update to support AutoGluon v0.4

* Address comments

* Updated setup.py for `hyperoptsklearn` as it no longer uses PyPi (also now accepts shas) (#410)

* Updated hyper opt not to use PyPi and accept shas

* case-sensitive PIP command in setup

Co-authored-by: Sebastien Poirier <sebastien@h2o.ai>

* AutoGluon TimeSeries Support (first version) (#494)

* Add AutoGluon TimeSeries Prototype

* AutoMLBenchmark TimeSeries Prototype. (#6)

* fixed loading test & train, changed pred.-l. 5->30

* ignore launch.json of vscode

* ensuring timestamp parsing

* pass config, save pred, add results

* remove unused code

* add readability, remove slice from timer

* ensure autogluonts has required info

* add comments for readability

* setting defaults for timeseries task

* remove outer context manipulation

* corrected spelling error for quantiles

* adding mape, correct available metrics

* beautify config options

* fixed config for public access

* Update readme

* Autogluon timeseries, addressed comments by sebhrusen (#7)

* fixed loading test & train, changed pred.-l. 5->30

* ignore launch.json of vscode

* ensuring timestamp parsing

* pass config, save pred, add results

* remove unused code

* add readability, remove slice from timer

* ensure autogluonts has required info

* add comments for readability

* setting defaults for timeseries task

* remove outer context manipulation

* corrected spelling error for quantiles

* adding mape, correct available metrics

* beautify config options

* fixed config for public access

* no outer context manipulation, add dataset subdir

* add more datasets

* include error raising for too large pred. length.

* mergin AutoGluonTS framework folder into AutoGluon

* renaming ts.yaml to timeseries.yaml, plus ext.

* removing presets, correct latest config for AGTS

* move dataset timeseries ext to datasets/file.py

* dont bypass test mode

* move quantiles and y_past_period_error to opt_cols

* remove whitespaces

* deleting merge artifacts

* delete merge artifacts

* renaming prediction_length to forecast_range_in_steps

* use public dataset, reduced range to maximum

* fix format string works

* fix key error bug, remove magic time limit

* Addressed minor comments, and fixed version call for tabular and timeseries modularities (#8)

* fixed loading test & train, changed pred.-l. 5->30

* ignore launch.json of vscode

* ensuring timestamp parsing

* pass config, save pred, add results

* remove unused code

* add readability, remove slice from timer

* ensure autogluonts has required info

* add comments for readability

* setting defaults for timeseries task

* remove outer context manipulation

* corrected spelling error for quantiles

* adding mape, correct available metrics

* beautify config options

* fixed config for public access

* no outer context manipulation, add dataset subdir

* add more datasets

* include error raising for too large pred. length.

* mergin AutoGluonTS framework folder into AutoGluon

* renaming ts.yaml to timeseries.yaml, plus ext.

* removing presets, correct latest config for AGTS

* move dataset timeseries ext to datasets/file.py

* dont bypass test mode

* move quantiles and y_past_period_error to opt_cols

* remove whitespaces

* deleting merge artifacts

* delete merge artifacts

* renaming prediction_length to forecast_range_in_steps

* use public dataset, reduced range to maximum

* fix format string works

* fix key error bug, remove magic time limit

* swapped timeseries and tabular to set version

* make warning message more explicit

* remove outer context manipulation

* split timeseries / tabular into functions

Co-authored-by: Leo <LeonhardSommer96@gmail.com>

* Add workflow to manually run `runbenchmark.py` on Github Actions (#516)

* Add workflow for manually running a test benchmark

* Use built-in context for getting the branch

* Add more info to step names

* Add ability to specify options

* Fixed user and sudo under docker (#495)

* Fixed user and sudo under docker

* Reverted format

* Update docker.py

* Addressing #497

#497

* Keep wget quiet

* Use :, . is deprecated

Co-authored-by: seb. <sebastien@h2o.ai>

* Set username and userid in Dockerfile generation

* Install HDF5 to Docker for tables

* Avoid using unix-specific workarounds on Windows

* Re-enable caching for building docker images

---------

Co-authored-by: seb. <sebastien@h2o.ai>
Co-authored-by: PGijsbers <p.gijsbers@tue.nl>

* [no-ci] Fix broken link (#514)

* Remove autoxgboost, add `removed` field for frameworks (#519)

* Add redirect for dataset page (#521)

* Upgrade Python version and dependencies (#520)

* Remove usage of np.float alias and just use float

* Bump to Py3.9

* Update requirements for March 2023, Py3.9

* Pin packaging, since LegacyVersion was removed.

Also remove scipy pin, since later autosklearn needs higher scipy.

* Install packages to ranger/lib

* Set secret PAT used when installing with R remotes

Specifically for mlr3automl integration

* Update usage for oct 21 release

* Disable custom installed packages

* Remove installation of reqiurements altogether

* Insert oboe example

* Add monkeypatch

* Make error matrix numpy array

* Upgrade to Ubuntu 22.04 from 18.04

* Update pip cache to look at 3.9 directory

* Add Github PAT to run_all_frameworks script

* bump github action versions

* Adding tarfile member sanitization to extractall() (#508)

* Included lightautoml in frameworks_stable (#412)

* Included lightautoml in frameworks_stable

* Added MLNet to frameworks_latest

* Added mlr3 to both stable and latest

* copy/paste fix

* Remove travis file (#529)

* Remove travis file since it is not used

* Update readme to reflect Python 3.9 support

* Add github action workflow to replace old travis file

* Add job id, improve name

* Fix bug where task inference would lead to KeyError

* Update type data for new openml/pandas

Probably ought to remove the specific check if we don't enforce it.

* Write numeric categories as str, see renatopp/liac-arff/issues/126

* [Open for review] Store results after each job completion (#526)

* ensure that results are solved progressively in all situations instead of only when all jobs are completed

* rename config flag

* don't forget to cleanup job runner exec thread

* Improve type hints

* Adding file lock on global results file (#453)

* adding file lock on global results file

* fix imports

* fix amlb.utils export

* cosmetic

* clranup util imports (also magic strings) + remove ruamel dependency in subprocesses

---------

Co-authored-by: Sebastien Poirier <sebastien@h2o.ai>

* Update the requirements files to exclude yaml and include filelock

The remainder of dependencies are not re-generated to avoid
additional changes in the PR.

* Add missing import

* Add fallback for when job is not started

* Return an empty dataframe if dataframe is empty

This avoids a bug where an empty dataframe is indexed.

* Inform the user result summary is not available in AWS mode

As results are processed in a different manner (files are directly
copied over from S3). This avoids a bug where a benchmark
results.csv file tries to be accessed.

* Separate scoreboard generation to two lines instead

Which makes it easier to tell which part of the generation generates
an error, if any.

* re-enable logging

* Provide a warning and return early if no process output is detected

This avoids potentially crashing if the logging is configured incorrectly.
In the future, we should expand this to first check how logging is
configured in order to see whether or not the issue should be reported
and possibly give a more detailed warning if it is likely the cause
of an error.

---------

Co-authored-by: Sebastien Poirier <sebastien@h2o.ai>
Co-authored-by: seb <sebastien.poirier@h2o.ai>

* maint: upgrade AMI to Ubuntu 22.04 #512 (#525)

* Add `flaml_benchmark` (#528)

* dont discard setup_args if it already is a list

* Add flaml and flaml_benchmark

It is not added to latest since install from latest seems to be broken

* Set up alternative way for benchmark mode of flaml

This is only temporarily allowed - we expect an easily configurable
algorithm, instead of having to carefully install specific
dependencies.

* limit install, since >2 incompatible

* Measure inference time (#532)

Add the option to measure inference time (disabled by default) for most frameworks.
For those frameworks, inference time is measured capturing both the data loading and the inference.
This is done to make things more equal between the different frameworks (as some _need_ to read the file if they don't operate in Python). Inference time is measured multiple times for different batch sizes (configurable). By default, the median is reported in the results file (as it is less sensitive to e.g. cold starts) but all measured inference times are stored in the predictions folder of a run.
For Python frameworks, inference time for in-memory single row predictions is also measured.
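The measurement approach described in that commit can be sketched as follows. The helper names and signatures are assumptions for illustration, not the actual amlb API: time predictions for several batch sizes, repeat each measurement, and report the median so cold-starts don't dominate.

```python
# Sketch (hypothetical names) of median inference-time measurement
# across batch sizes, repeated to dampen cold-start effects.
import statistics
import time


def measure_inference_times(predict_fn, make_batch, batch_sizes, repeats=5):
    """Return {batch_size: [seconds, ...]} for `repeats` timed calls each."""
    timings = {}
    for size in batch_sizes:
        batch = make_batch(size)
        measurements = []
        for _ in range(repeats):
            start = time.perf_counter()
            predict_fn(batch)
            measurements.append(time.perf_counter() - start)
        timings[size] = measurements
    return timings


def median_inference_times(timings):
    """Collapse raw measurements to one median per batch size (the reported value)."""
    return {size: statistics.median(ts) for size, ts in timings.items()}
```

All raw measurements would still be kept alongside the predictions, with only the median summarized in the results file.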

* Upload to OpenML (#523)

Adds a script that allows uploading run results to openml.
Additional metadata is stored in the task information to be able to provide a complete description for openml upload.
Additional parameters are added to `run_benchmark` to allow runs to automatically be tagged, and to connect to the test server.
Also fixes TPOT integration for newer versions, where if a model has no `predict_proba` an `AttributeError` is raised instead of a `RuntimeError`.

* Fix a race condition of checking vs adding results (#535)

Specifically, adding results was queued in a job executor, while
checking results was directly called by the worker threads.
If the worker thread checks before the executor had added results,
it is possible to get into a deadlock condition. The deadlock
arises from the fact that the `stop` condition is never called
and the main thread will continue to wait for its END_Q signal.

* Add scikit_safe inference time measurement files (#537)

* Add scikit_safe inference time measurement files

These files have categorical values numerically encoded and missing
values imputed, which makes them usable for any scikit-learn algo.

* Only generate inference measurement files if enabled

* Optionally limit inference time measurements by dataset size (#538)

* Add versions 2023 q2 (#539)

* Fix versions for June 2023 benchmark

* Add 2023Q2 framework tag

* Use encoded values for inference

* Add us-east-2 AMI

* Run docker as root on AWS

* Add option to add build options for docker build command

* Remove 'infer_speed' artifact as it is not supported in main repo

* Fix pandas 2 not compatible with autosklearn 2 see askl#1672

---------

Co-authored-by: github-actions <github-actions@github.com>
Co-authored-by: Matthias Feurer <feurerm@informatik.uni-freiburg.de>
Co-authored-by: seb. <sebastien@h2o.ai>
Co-authored-by: Weisu Yin <weisy@amazon.com>
Co-authored-by: Weisu Yin <weisuyin96@gmail.com>
Co-authored-by: Eddie Bergman <eddiebergmanhs@gmail.com>
Co-authored-by: Qingyun Wu <qw2ky@virginia.edu>
Co-authored-by: Qingyun Wu <qxw5138@psu.edu>
Co-authored-by: Robinnibor <robinksskss@gmail.com>
Co-authored-by: Erin LeDell <erin@h2o.ai>
Co-authored-by: Stefan Coors <stefan.coors@gmx.net>
Co-authored-by: Alan Silva <3899850+alanwilter@users.noreply.github.com>
Co-authored-by: Nick Erickson <neerick@amazon.com>
Co-authored-by: Leo <LeonhardSommer96@gmail.com>
Co-authored-by: TrellixVulnTeam <112716341+TrellixVulnTeam@users.noreply.github.com>
Co-authored-by: seb <sebastien.poirier@h2o.ai>