MCO rebase to main (#183)

* Benchmarks (#165)
* add benchmarks
* fix readme
* fix readme
* bolt font for arguments
* Translate comments in the iOpt section into English for the documentation (#164)
* Translate comments in the iOpt section into English for the documentation
* Translate comments in the iOpt section into English for the documentation (with corrections)
* Address remarks on the translation of iOpt comments
* Address remarks on the translation of iOpt comments (2)
* Translate comments from the problems section
* Update .readthedocs.yml for current requirements of Read the Docs
* Update .readthedocs.yml
* Update .readthedocs.yml
* Update conf.py for English language
* I supplemented the documentation with a paragraph about the work of the framework with the optimal selection of two real and one discrete parameters. Corrected the problem code for finding real and discrete parameters. (#167)
* Fixed a bug in the calculator destructor (#168)
* Fixed a bug in the calculator destructor
* Fixed problem with process pool destruction
* The design of the example is brought to a single sample
* Correct target score (#170)
* I supplemented the documentation with a paragraph about the work of the framework with the optimal selection of two real and one discrete parameters. Corrected the problem code for finding real and discrete parameters.
* correct target score
* Grid search (#169)
* Fixed a bug in the calculator destructor
* Fixed problem with process pool destruction
* The design of the example is brought to a single sample
* Added grid search
* Fixed search
* Corrected comments
* Fix problem with pool
* Removed unnecessary field
* Corrected comments
* Grid-based plotting of known points + bug fixes (#172)
* add graph by points
* fix double axes bug
* add lines layers by points
---------
Co-authored-by: Marina Usova <usova@itmm.unn.com>
* Asynchronous parallel scheme (#166)
* async initial
* async up
* async second
* pep8
* pep8
* switch multiprocessing to multiprocess (part of pathos)
* revert gkls example
* revert requirements.txt
* move async implementation from async_parallel_process to async_calculator
* pep8
* redundant code removed
* test for async parallel process
* gkls async example add multiprocess to requirements
* gkls async example
* async initial
* async up
* async second
* pep8
* pep8
* switch multiprocessing to multiprocess (part of pathos)
* revert gkls example
* revert requirements.txt
* move async implementation from async_parallel_process to async_calculator
* pep8
* redundant code removed
* test for async parallel process
* gkls async example add multiprocess to requirements
* gkls async example
* semi-fix for iter-tasks
* Fixed test for asynchronous parallel circuit (#173)
* Fixed a bug in the calculator destructor
* Fixed problem with process pool destruction
* The design of the example is brought to a single sample
* Fixed test for asynchronous parallel circuit
* add characteristic in save progress (#176)
* add characteristic in save progress
* add sol time&accuracy
* Additions for saving to JSON (#177)
* add characteristic in save progress
* add sol time&accuracy
* add Task, Parameters and creation_time for sd_item in save_progress
* change save&load
* Update method.py
* meaningless change
* add _init_ in loadProgress
* The output of the optimal solution in problems with restrictions has … (#180)
* The output of the optimal solution in problems with restrictions has been corrected Parallel index calculator is working properly
* Update Stronginc3_example.py
* Fix bug with original optimum using (#181)
* fix bug with original optimum using
* add var for number of constraints and fix objective function value
---------
Co-authored-by: Marina Usova <usova@itmm.unn.com>
* Corrected documentation of examples (#182)
* I supplemented the documentation with a paragraph about the work of the framework with the optimal selection of two real and one discrete parameters. Corrected the problem code for finding real and discrete parameters.
* correct target score
* Append new examples. Correct documentation
* Corrected documentation of examples
* The calculator is used in trial calculation
* Added work with the calculator
* Implement solving of MCO problems (#163)
* 1. Added class interfaces for multi-criteria optimization
* rename classes
* Working initial version of mco (#179)
* mco test problem & optim task
* mco test problem & optim task 2
* added mco_process, fixed convolution and added to mco to solverFactory
* mco test problem & optim task 3
* reverted optim task. shouldn't have touched in first place
* fixed bug
* mco test problem & optim task 4
* mco test problem & optim task 5
* mco test problem & optim task 6
* new problem&test, update method
* working ver
* mb work
* delete comment, add task, evolvent for init lambdas, other refac
* delete comment, add start_lambdas&is_scaling, add init_lambdas
* fix with comments
* fix with comments 1
---------
Co-authored-by: MADZEROPIE <ask_ii1@mail.ru>
* The calculator is used in trial calculation
* Added work with the calculator
* Corrected to match the updated interface
* Added example with MCO Test1
* Fixed calculator factory
---------
Co-authored-by: dyonichhh <36537172+RodionovDenis@users.noreply.github.com>
Co-authored-by: Anton A. Shtanyuk <ashtanyuk@gmail.com>
Co-authored-by: Alexander Sysoyev <sysoyev@vmk.unn.ru>
Co-authored-by: Karchkov Denis <karchkov.denis@mail.ru>
Co-authored-by: UsovaMA <oppabang@mail.ru>
Co-authored-by: Marina Usova <usova@itmm.unn.com>
Co-authored-by: oleg-w570 <73493289+oleg-w570@users.noreply.github.com>
Co-authored-by: Yanina Kolt <43132462+YaniKolt@users.noreply.github.com>
Co-authored-by: kozinove <evgeniy.kozinov@gmail.com>
Co-authored-by: MADZEROPIE <ask_ii1@mail.ru>
aimclub · Mar 5, 2024 · 589e30f
1 parent f0ffb22
commit 589e30f
Showing 134 changed files with 40,582 additions and 971 deletions.
diff --git a/.gitignore b/.gitignore
@@ -129,3 +129,8 @@ dmypy.json
 # Pyre type checker
 .pyre/
 
+# vs code
+.vscode
+
+# datasets
+benchmarks/data/datasets
diff --git a/.readthedocs.yml b/.readthedocs.yml
@@ -4,7 +4,11 @@
 
 # Required
 version: 2
-
+build:
+  os: ubuntu-22.04
+  tools:
+    python: "3.8"
+
 # Build documentation in the docs/ directory with Sphinx
 sphinx:
   configuration: docs/source/conf.py
@@ -15,6 +19,6 @@ sphinx:
 #  - pdf
 
 python:
-  version: 3.8
+#  version: 3.8
   install:
     - requirements: docs/requirements.txt
diff --git a/benchmarks/README.md b/benchmarks/README.md
@@ -0,0 +1,35 @@
+# Reproduction of results
+
+Install modules:
+
+     pip install -U -r requirements.txt
+
+Downloading datasets:
+
+     python data/loader.py
+
+Running the experiment:
+
+     python runner.py --dataset {dataset name} --method {method name} --max-iter {number of iterations} --dir {directory for results} --trials {number of trials} --n-jobs {the number of worker processes to use}
+
+`runner.py` script parameters:
+
+1. **--dataset** – one or more from the list:
+
+     (`balance`, `bank-marketing`, `banknote`, `breast-cancer`, `car-evaluation`, `cnae9`, `credit-approval`,
+      `digits`, `ecoli`, `parkinsons`, `semeion`, `statlog-segmentation`, `wilt`, `zoo`)
+
+2. **--method** – one of `svc`, `xgb`, or `mlp`
+3. **--max-iter** – number of iterations
+4. **--dir** – directory in which tables with results will be saved (by default this will be the `result` folder)
+5. **--trials** – the number of trials in non-deterministic algorithms (`hyperopt`, `optuna`)
+6. **--n-jobs** – the number of worker processes to use
+
+
+## Launch example
+
+This runs the `svc` method on the `breast-cancer` and `zoo` datasets with a maximum of `200` iterations, `10` trials for the non-deterministic algorithms, and `12` worker processes.
+
+     python runner.py --dataset breast-cancer zoo --method svc --max-iter 200 --trials 10 --n-jobs 12
+
+Once completed, the script will create two tables with the resulting metrics (`result/metrics.csv`) and times (`result/times.csv`). If the algorithm is non-deterministic, the table contains the mean with standard deviation.
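Review note on the README flags: the sketch below rebuilds a parser with the same six options that `benchmarks/argparser.py` in this commit declares and feeds it the launch example. It is an illustration, not the project's code; `build_parser` is a hypothetical helper name. It shows that `--dataset` collects several names and that argparse stores `--max-iter` and `--n-jobs` under the attribute names `max_iter` and `n_jobs`, so the hyphenated spelling is the one to type on the command line.

```python
from argparse import ArgumentParser


def build_parser() -> ArgumentParser:
    """Hypothetical helper: the same options as the parser added in this commit."""
    parser = ArgumentParser()
    parser.add_argument('--max-iter', type=int)
    parser.add_argument('--dataset', nargs='*')
    parser.add_argument('--method')
    parser.add_argument('--dir', default='result')
    parser.add_argument('--trials', type=int, default=1)
    parser.add_argument('--n-jobs', type=int, default=1)
    return parser


# The launch example, passed as an explicit argument list instead of sys.argv.
args = build_parser().parse_args([
    '--dataset', 'breast-cancer', 'zoo',
    '--method', 'svc',
    '--max-iter', '200',
    '--trials', '10',
    '--n-jobs', '12',
])

# Hyphens in option names become underscores in attribute names.
print(args.dataset, args.method, args.max_iter, args.trials, args.n_jobs, args.dir)
```

Because `--trials` is declared with `type=int`, a value such as `10,` would be rejected by argparse with a usage error.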
diff --git a/benchmarks/argparser.py b/benchmarks/argparser.py
@@ -0,0 +1,122 @@
+import data
+
+from argparse import ArgumentParser
+from dataclasses import dataclass, field
+
+from hyperparams import Hyperparameter, Numerical, Categorial
+from sklearn.svm import SVC
+from xgboost import XGBClassifier
+from sklearn.neural_network import MLPClassifier
+from functools import partial
+
+
+METHOD_TO_HYPERPARAMS = {
+    SVC: {
+        'gamma': Numerical('float', 1e-9, 1e-6, is_log_scale=True),
+        'C': Numerical('int', 1, 1e10, is_log_scale=True),
+        'kernel': Categorial('poly', 'rbf', 'sigmoid')
+    },
+
+    XGBClassifier: {
+        'n_estimators': Numerical('int', 10, 200),
+        'max_depth': Numerical('int', 5, 20),
+        'min_child_weight': Numerical('int', 1, 10),
+        'gamma': Numerical('float', 0.01, 0.6),
+        'subsample': Numerical('float', 0.05, 0.95),
+        'colsample_bytree': Numerical('float', 0.05, 0.95),
+        'learning_rate': Numerical('float', 0.001, 0.1, is_log_scale=True)
+    },
+
+    MLPClassifier: {
+        'hidden_layer_sizes': Numerical('int', 2, 150),
+        'activation': Categorial('identity', 'logistic', 'tanh', 'relu'),
+        'solver': Categorial('lbfgs', 'sgd', 'adam'),
+        'alpha': Numerical('float', 1e-9, 1e-1, is_log_scale=True)
+    }
+}
+
+
+NAME_TO_DATASET = {
+    'balance': data.Balance,
+    'bank-marketing': data.BankMarketing,
+    'banknote': data.Banknote,
+    'breast-cancer': data.BreastCancer,
+    'car-evaluation': data.CarEvaluation,
+    'cnae9': data.CNAE9,
+    'credit-approval': data.CreditApproval,
+    'digits': data.Digits,
+    'ecoli': data.Ecoli,
+    'parkinsons': data.Parkinsons,
+    'semeion': data.Semeion,
+    'statlog-segmentation': data.StatlogSegmentation,
+    'wilt': data.Wilt,
+    'zoo': data.Zoo
+}
+
+
+@dataclass
+class ConsoleArgument:
+    max_iter: int
+    estimator: SVC | XGBClassifier | MLPClassifier
+    dataset: data.Dataset
+    hyperparams: Hyperparameter = field(init=False)
+    dir: str
+    trials: int
+    n_jobs: int
+
+    def __post_init__(self):
+        estimator = self.estimator
+        if isinstance(estimator, partial):
+            estimator = estimator.func
+        self.hyperparams = METHOD_TO_HYPERPARAMS[estimator]
+
+
+def get_estimator(name: str) -> SVC | XGBClassifier | MLPClassifier:
+    if name == 'svc':
+        return partial(SVC, max_iter=1000)
+    elif name == 'xgb':
+        return partial(XGBClassifier, n_jobs=1)
+    elif name == 'mlp':
+        return MLPClassifier
+    raise ValueError(f'Estimator "{name}" is not supported')
+
+
+def get_datasets(names: list[str]) -> list:
+    try:
+        result = []
+        for x in names:
+            result.append(NAME_TO_DATASET[x])
+        return result
+    except KeyError:
+        raise ValueError(f'Dataset "{x}" is not supported')
+
+
+def parse_arguments():
+    """
+    --max-iter:
+        int, positive
+    --dataset:
+        one or more dataset names, see the keys of the NAME_TO_DATASET dict
+    --method:
+        one of svc, xgb, or mlp
+    --dir:
+        name of the dir to save the results (result by default)
+
+    """
+    parser = ArgumentParser()
+    parser.add_argument('--max-iter', type=int)
+    parser.add_argument('--dataset', nargs='*')
+    parser.add_argument('--method')
+    parser.add_argument('--dir', default='result')
+    parser.add_argument('--trials', type=int, default=1)
+    parser.add_argument('--n-jobs', type=int, default=1)
+
+    args = parser.parse_args()
+    assert args.max_iter > 0, 'Max iter must be positive'
+    assert args.trials > 0, 'Trials must be positive'
+    assert args.n_jobs > 0, 'n_jobs must be positive'
+
+    return ConsoleArgument(args.max_iter,
+                           get_estimator(args.method),
+                           get_datasets(args.dataset),
+                           args.dir, args.trials, args.n_jobs)
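Review note on `ConsoleArgument.__post_init__`: `get_estimator` may return either a class or a `functools.partial` wrapping one, and the hyperparameter table is keyed by the class itself. The sketch below isolates that lookup under stated assumptions: `FakeSVC`, `HYPERPARAMS`, `resolve_class`, and `Argument` are hypothetical stand-ins so the example runs without scikit-learn or the project's `hyperparams` module.

```python
from dataclasses import dataclass, field
from functools import partial


class FakeSVC:
    """Hypothetical stand-in for an estimator class such as sklearn.svm.SVC."""

    def __init__(self, max_iter=-1, **params):
        self.max_iter = max_iter
        self.params = params


# Hypothetical search-space table keyed by estimator class.
HYPERPARAMS = {FakeSVC: {'kernel': ('poly', 'rbf', 'sigmoid')}}


def resolve_class(estimator):
    """Return the underlying class, unwrapping functools.partial if present."""
    return estimator.func if isinstance(estimator, partial) else estimator


@dataclass
class Argument:
    estimator: object
    # init=False keeps this field out of __init__, so later fields
    # without defaults are still allowed.
    hyperparams: dict = field(init=False)
    trials: int

    def __post_init__(self):
        self.hyperparams = HYPERPARAMS[resolve_class(self.estimator)]


wrapped = Argument(partial(FakeSVC, max_iter=1000), 3)
plain = Argument(FakeSVC, 1)

# Calling the partial still builds the estimator with the preset argument.
model = wrapped.estimator(kernel='rbf')
print(model.max_iter, model.params, wrapped.hyperparams is plain.hyperparams)
```

Keying the table by class rather than by name means a `partial` has to be unwrapped before the lookup; otherwise the lookup raises `KeyError`, because the `partial` object is not a key in the table.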
diff --git a/benchmarks/data/__init__.py b/benchmarks/data/__init__.py
@@ -0,0 +1,18 @@
+from .loader import (Dataset,
+                     BreastCancer,
+                     Digits,
+                     BankMarketing,
+                     CNAE9,
+                     StatlogSegmentation,
+                     Semeion,
+                     Ecoli,
+                     CreditApproval,
+                     Balance,
+                     Parkinsons,
+                     Zoo,
+                     Banknote,
+                     CarEvaluation,
+                     Wilt)
+
+__all__ = ['Dataset', 'BreastCancer', 'Digits', 'BankMarketing', 'CNAE9', 'StatlogSegmentation', 'Semeion',
+           'Ecoli', 'CreditApproval', 'Balance', 'Parkinsons', 'Zoo', 'Banknote', 'CarEvaluation', 'Wilt']
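Review note on `__all__`: Python's `from package import *` expects `__all__` to be a sequence of name strings. The sketch below demonstrates this with a throwaway in-memory module; `demo_all_pkg` and `star_import` are hypothetical names used only for the illustration and do not exist in the repository.

```python
import sys
import types


def star_import(use_strings: bool) -> dict:
    """Star-import a throwaway module whose __all__ holds strings or objects."""
    module = types.ModuleType('demo_all_pkg')  # hypothetical module name

    class Dataset:
        pass

    module.Dataset = Dataset
    module.__all__ = ['Dataset'] if use_strings else [Dataset]

    sys.modules['demo_all_pkg'] = module
    namespace = {}
    try:
        # exec runs the statement at module level, where star imports are legal.
        exec('from demo_all_pkg import *', namespace)
    finally:
        del sys.modules['demo_all_pkg']
    return namespace


# Names given as strings are exported as expected.
print('Dataset' in star_import(use_strings=True))

# Class objects in __all__ make the star import fail.
try:
    star_import(use_strings=False)
except TypeError as err:
    print('TypeError:', err)
```

Explicit imports such as `from data import Dataset` do not consult `__all__`, so the difference only shows up with star imports and with tools that read `__all__`.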