Doe categorical second attempt (#259)
* setting up bofire

* simple relaxed categorical example in jupyter notebook

* adding exhaustive search for optimization problems with binary variables and writing tests

* showcasing how to use binary variables

* changes in jupyter notebooks

* adding constraint mapper

* fixing bug with dtype

* started change to one-hot-encoding

* allowing for multiple groups

* starting with branch-and-bound

* rough sketch of bnb

* asserting that design variables is actually a 1d array

* asserting that design variables is actually a 1d array

* using the right argument for optimality criteria

* finishing first bab implementation

* changing __str__

* resetting test for valid solution

* printing information of branching

* allowing for relaxed discrete variables, and fixing bug with sequence of column keys in dataframe

* fixing docstring

* skipping non-valid-designs and branching for discrete values

* catching case for list length 1

* cringe... renaming function

* removing redundant code

* bug fix/ removing code with no effect

* renaming variable to make intention clear

* start including new doe strategies in api

* adding categorical groups to the domain

* changing to categorical groups from domain

* correcting docstring

* start writing tests for categorical and discrete variables

* catching branches in exhaustive search which do not fulfill constraints

* simple tests for bab and exhaustive search

* adding optional print statements for information about the optimization process

* adding optional print statements for information about the optimization process

* bug fix, allowing arbitrary objective function and model_type

* bug fix, fixed and partially fixed experiments can now be combined

* skipping branches with already fixed experiments

* skipping branches with already fixed experiments in exhaustive search

* removing unused code from binary vars and giving warning when unsuitable attributes are passed

* adding documentation

* adding documentation

* adding documentation

* adding documentation

* bug fix, now also testing if the solution for discrete variables is also valid

* bug fix, now also testing if the solution for discrete variables is also valid

* adding documentation

* bug fix, where error occurred if fixed_experiments was None

* adjusting tolerances for testing for valid solution

* adjusting tolerances for testing for valid solution

* raising error when too many (partially) fixed experiments are provided

* bug fix, sorting (partially) fixed experiments and initial guess if provided

* adapting branch-and-bound to new fixed experiments usage

* bug fix, allowing to fix candidates with .tell

* allowing to partially fix experiments with .tell and with all strategies

* allowing to use either equality or inequality and changing rhs

* reverting, we can only do exactly 1

* bug fix, if partially_fixed_experiments is None, and adding time information to verbose output

* adding error for using discrete var in exhaustive search, bug fix where order of variables led to fixing the wrong variables, bug fix where partially fixed experiments is None

* renaming

* adding information about how many branches have been explored

* added NChooseKGroup_with_quantity (helper function) and mapping from discrete domain to relaxable domain

* bug fix, NChooseKGroup_with_quantity (helper function) and allowing to choose between current categorical variables (and NonLinearConstraints) or relaxable categorical variables (and LinearConstraints)

* making some arguments optional

* fixing optional arguments

* adding documentation

* Update documentation bofire/data_models/constraints/nonlinear.py

Co-authored-by: Johannes P. Dürholt <johannespeter.duerholt@evonik.com>

* refactoring RelaxableBinaryInput, RelaxableDiscreteInput, they are not accessible through the public api anymore.

* refactoring generate_mixture_constraint and deleting unused functions

* allowing to use the old strategy to solve nchoosek constraints

* reversing accidental commit

* refactoring functions

* removing check for initial guess, as we can also allow non-valid initial points

* adding initial guess based on design of previous branch

* bug fix

* deleting old example

* merge main

* fixing typing

* looser tolerances and pruning branches where ipopt does not satisfy constraints

* skipping fixations where ipopt does not satisfy constraints

* reverting commit, where I skip the is_fulfilled test

* typing

* adding not implemented error

* typing

* fixing tests

* Delete .gitattributes

* Delete .idea directory

* typing

* removing outdated tests, and fixing existing ones

* evaluating with d_optimality requires a 1D array

* adding test for categorical and discrete doe with nchoosek

* typing

* typing

* adding random seed

* adapting test

* adapting (partially) fixed experiments

* typing

* ignore typing

* ignore typing

* started beautiful fix

* reverting beautiful fix

* quick fix

* Update test_doe.py

* fix test

* going back to beautiful fix

* removing Relaxable Features

* deleting unnecessary check

* adding set candidates and test

* typing

* typing

* bug fix

* refactoring

* removing unnecessary line

* merging tutorials from main

* merging tutorials from main

---------

Co-authored-by: GuenesUI <ufuk-ilkay.guenes@basf.com>
Co-authored-by: Johannes P. Dürholt <johannespeter.duerholt@evonik.com>
3 people authored Aug 16, 2023
1 parent 3e481f5 commit 6810c60
Showing 14 changed files with 1,626 additions and 60 deletions.
1 change: 1 addition & 0 deletions bofire/data_models/constraints/nchoosek.py
@@ -93,6 +93,7 @@ def is_fulfilled(self, experiments: pd.DataFrame, tol: float = 1e-6) -> pd.Series:
         Returns:
             bool: True if fulfilled else False.
         """
+
         cols = self.features
         sums = (np.abs(experiments[cols]) > tol).sum(axis=1)
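For context, the fulfillment check in this hunk counts, per experiment, how many of the constrained features are active (non-zero beyond the tolerance). A minimal standalone sketch of that logic, with min_count/max_count standing in for the constraint's bound attributes (those names are assumed here, not taken from the diff):

import numpy as np
import pandas as pd

def nchoosek_fulfilled(experiments: pd.DataFrame, features, min_count, max_count, tol=1e-6):
    # Count, per row, how many of the constrained features are non-zero beyond the tolerance
    # (same expression as in the hunk above), then check the count lies within the allowed range.
    sums = (np.abs(experiments[features]) > tol).sum(axis=1)
    return (sums >= min_count) & (sums <= max_count)

experiments = pd.DataFrame({"x1": [0.0, 0.5], "x2": [0.3, 0.4], "x3": [0.2, 0.0]})
print(nchoosek_fulfilled(experiments, ["x1", "x2", "x3"], min_count=1, max_count=2))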
4 changes: 2 additions & 2 deletions bofire/data_models/constraints/nonlinear.py
@@ -73,7 +73,7 @@ def jacobian(self, experiments: pd.DataFrame) -> pd.DataFrame:


 class NonlinearEqualityConstraint(NonlinearConstraint):
-    """Nonlinear inequality constraint of the form 'expression <= 0'.
+    """Nonlinear equality constraint of the form 'expression == 0'.

     Attributes:
         expression: Mathematical expression that can be evaluated by `pandas.eval`.
@@ -91,7 +91,7 @@ def __str__(self):


 class NonlinearInequalityConstraint(NonlinearConstraint):
-    """Linear inequality constraint of the form 'expression == 0'.
+    """Nonlinear inequality constraint of the form 'expression <= 0'.

     Attributes:
         expression: Mathematical expression that can be evaluated by `pandas.eval`.
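The corrected docstrings reflect the convention both classes describe: expressions are evaluated with `pandas.eval`, equality constraints require the expression to be 0, and inequality constraints require it to be <= 0. A small illustrative sketch of that convention (the column names and expressions are made up for the example):

import pandas as pd

experiments = pd.DataFrame({"x1": [0.3, 0.5], "x2": [0.7, 0.2]})

# NonlinearEqualityConstraint semantics: 'expression == 0' must hold (up to a tolerance).
eq_residual = experiments.eval("x1 + x2 - 1")
print(eq_residual.abs() < 1e-6)   # fulfilled where the residual is (close to) zero

# NonlinearInequalityConstraint semantics: 'expression <= 0' must hold.
ineq_residual = experiments.eval("x1**2 + x2**2 - 1")
print(ineq_residual <= 1e-6)      # fulfilled where the expression is non-positive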
2 changes: 1 addition & 1 deletion bofire/data_models/domain/domain.py
@@ -179,7 +179,7 @@ def validate_linear_constraints(cls, v, values):
         # gather continuous inputs in dictionary
         continuous_inputs_dict = {}
         for f in values["inputs"]:
-            if type(f) is ContinuousInput:
+            if isinstance(f, ContinuousInput):
                 continuous_inputs_dict[f.key] = f

         # check if non continuous input features appear in linear constraints
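The switch from `type(f) is ContinuousInput` to `isinstance(f, ContinuousInput)` matters because the exact-type check rejects subclasses; with `isinstance`, any feature derived from `ContinuousInput` is also collected. A toy sketch of the difference (the subclass name is hypothetical):

class ContinuousInput:
    pass

class RelaxedBinaryInput(ContinuousInput):  # hypothetical subclass for illustration
    pass

f = RelaxedBinaryInput()
print(type(f) is ContinuousInput)      # False: exact type check misses the subclass
print(isinstance(f, ContinuousInput))  # True: subclasses count as continuous inputs too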
9 changes: 6 additions & 3 deletions bofire/data_models/strategies/doe.py
@@ -2,8 +2,6 @@

 from bofire.data_models.constraints.api import Constraint
 from bofire.data_models.features.api import (
-    CategoricalInput,
-    DiscreteInput,
     Feature,
     MolecularInput,
 )
@@ -22,14 +20,19 @@ class DoEStrategy(Strategy):
         ],
         str,
     ]
+    optimization_strategy: Literal[
+        "default", "exhaustive", "branch-and-bound", "partially-random", "relaxed"
+    ] = "default"
+
+    verbose: bool = False

     @classmethod
     def is_constraint_implemented(cls, my_type: Type[Constraint]) -> bool:
         return True

     @classmethod
     def is_feature_implemented(cls, my_type: Type[Feature]) -> bool:
-        if my_type in [CategoricalInput, DiscreteInput, MolecularInput]:
+        if my_type in [MolecularInput]:
             return False
         return True
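With the new fields, choosing how categorical/discrete inputs are handled is a matter of setting `optimization_strategy` on the data model. A hedged usage sketch; everything outside this diff (`domain`, the `formula` field name, `strategies.map`, `ask`) is assumed from typical BoFire usage rather than prescribed here:

import bofire.strategies.api as strategies
from bofire.data_models.strategies.api import DoEStrategy

# `domain` is assumed to be an existing bofire Domain containing the
# categorical/discrete inputs of the design problem.
data_model = DoEStrategy(
    domain=domain,
    formula="linear",
    optimization_strategy="branch-and-bound",  # or "exhaustive", "partially-random", "relaxed", "default"
    verbose=True,  # print progress of the branch-and-bound / exhaustive search
)
strategy = strategies.map(data_model)
candidates = strategy.ask(candidate_count=12)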
227 changes: 227 additions & 0 deletions bofire/strategies/doe/branch_and_bound.py
@@ -0,0 +1,227 @@
from __future__ import annotations

from functools import total_ordering
from queue import PriorityQueue
from typing import Dict, List, Optional, Tuple

import numpy as np
import pandas as pd

from bofire.data_models.constraints.api import ConstraintNotFulfilledError
from bofire.data_models.features.api import ContinuousInput
from bofire.strategies.doe.design import find_local_max_ipopt
from bofire.strategies.doe.objective import get_objective_class
from bofire.strategies.doe.utils import get_formula_from_string
from bofire.strategies.doe.utils_categorical_discrete import equal_count_split


@total_ordering
class NodeExperiment:
    def __init__(
        self,
        partially_fixed_experiments: pd.DataFrame,
        design_matrix: pd.DataFrame,
        value: float,
        categorical_groups: Optional[List[List[ContinuousInput]]] = None,
        discrete_vars: Optional[Dict[str, Tuple[ContinuousInput, List[float]]]] = None,
    ):
        """
        Args:
            partially_fixed_experiments: dataframe containing (some) fixed variables for experiments.
            design_matrix: optimal design given the fixed and partially fixed experiments.
            value: value of the objective function evaluated with the design_matrix.
            categorical_groups: represents the different groups of the categorical variables.
            discrete_vars: dict of discrete variables and the corresponding valid values in the optimization problem.
        """
        self.partially_fixed_experiments = partially_fixed_experiments
        self.design_matrix = design_matrix
        self.value = value
        if categorical_groups is not None:
            self.categorical_groups = categorical_groups
        else:
            self.categorical_groups = []
        if discrete_vars is not None:
            self.discrete_vars = discrete_vars
        else:
            self.discrete_vars = {}

    def get_next_fixed_experiments(self) -> List[pd.DataFrame]:
        """
        Based on the current partially_fixed_experiments DataFrame, the next branches are determined. One more
        variable will be fixed than before.

        Returns: list of the next possible branches where exactly one more variable is fixed.
        """
        # branching for the binary/categorical variables
        for group in self.categorical_groups:
            for row_index, _exp in self.partially_fixed_experiments.iterrows():
                if (
                    self.partially_fixed_experiments.iloc[row_index][group[0].key]
                    is None
                ):
                    current_keys = [elem.key for elem in group]
                    allowed_fixations = np.eye(len(group))
                    branches = [
                        self.partially_fixed_experiments.copy()
                        for i in range(len(allowed_fixations))
                    ]
                    for k, elem in enumerate(branches):
                        elem.loc[row_index, current_keys] = allowed_fixations[k]
                    return branches

        # branching for the discrete variables
        for key, (var, values) in self.discrete_vars.items():
            for row_index, _exp in self.partially_fixed_experiments.iterrows():
                current_fixation = self.partially_fixed_experiments.iloc[row_index][key]
                first_fixation, second_fixation = None, None
                if current_fixation is None:
                    lower_split, upper_split = equal_count_split(
                        values, var.lower_bound, var.upper_bound
                    )
                    first_fixation = (var.lower_bound, lower_split)
                    second_fixation = (upper_split, var.upper_bound)

                elif current_fixation[0] != current_fixation[1]:
                    lower_split, upper_split = equal_count_split(
                        values, current_fixation[0], current_fixation[1]
                    )
                    first_fixation = (current_fixation[0], lower_split)
                    second_fixation = (upper_split, current_fixation[1])

                if first_fixation is not None:
                    first_branch = self.partially_fixed_experiments.copy()
                    second_branch = self.partially_fixed_experiments.copy()

                    first_branch.loc[row_index, key] = first_fixation
                    second_branch.loc[row_index, key] = second_fixation

                    return [first_branch, second_branch]

        return []

    # nodes are ordered by objective value, so a PriorityQueue yields the most
    # promising node (smallest value) first
    def __eq__(self, other: NodeExperiment) -> bool:
        return self.value == other.value

    def __ne__(self, other: NodeExperiment) -> bool:
        return self.value != other.value

    def __lt__(self, other: NodeExperiment) -> bool:
        return self.value < other.value

    def __str__(self):
        return (
            "\n ================ Branch-and-Bound Node ================ \n"
            + f"objective value: {self.value} \n"
            + f"design matrix: \n{self.design_matrix.round(4)} \n"
            + f"current fixations: \n{self.partially_fixed_experiments.round(4)} \n"
        )


def is_valid(node: NodeExperiment, tolerance: float = 1e-2) -> bool:
    """
    Test whether a design is a valid solution, i.e. whether the binary and discrete variables take valid values.

    Args:
        node: the current node of the branch to be tested.
        tolerance: absolute tolerance between valid values and values in the design.

    Returns: True if the design is valid, else False.
    """
    categorical_vars = [var for group in node.categorical_groups for var in group]
    design_matrix = node.design_matrix
    for var in categorical_vars:
        value = design_matrix.get(var.key)
        if not (
            np.logical_or(
                np.isclose(value, 0, atol=tolerance),
                np.isclose(value, 1, atol=tolerance),
            ).all()
        ):
            return False

    discrete_vars = node.discrete_vars
    for _key, (var, values) in discrete_vars.items():
        value = design_matrix.get(var.key)
        if False in [True in np.isclose(v, values, atol=tolerance) for v in value]:  # type: ignore
            return False
    return True


def bnb(
    priority_queue: PriorityQueue,
    verbose: bool = False,
    num_explored: int = 0,
    **kwargs,
) -> NodeExperiment:
    """
    Branch-and-bound algorithm for solving optimization problems containing binary and discrete variables.

    Args:
        priority_queue (PriorityQueue): initial nodes of the branching tree.
        verbose (bool): if true, print information during the optimization process.
        num_explored: keeps track of how many branches have been explored.
        **kwargs: parameters for the actual optimization / find_local_max_ipopt.

    Returns: a branching node containing the best design found.
    """
    if priority_queue.empty():
        raise RuntimeError("Queue empty before feasible solution was found")

    domain = kwargs["domain"]
    n_experiments = kwargs["n_experiments"]

    # get objective function
    model_formula = get_formula_from_string(
        model_type=kwargs["model_type"], rhs_only=True, domain=domain
    )
    objective_class = get_objective_class(kwargs["objective"])
    objective_class = objective_class(
        domain=domain, model=model_formula, n_experiments=n_experiments
    )

    pre_size = priority_queue.qsize()
    current_branch = priority_queue.get()
    # test if the current solution is already valid
    if is_valid(current_branch):
        return current_branch

    # branch the current solution into sub-problems
    next_branches = current_branch.get_next_fixed_experiments()

    if verbose:
        print(
            f"current length of branching queue (+ new branches): {pre_size} + {len(next_branches)}, currently "
            f"explored branches: {num_explored}, current best value: {current_branch.value}"
        )
    # solve the branched problems
    for _i, branch in enumerate(next_branches):
        kwargs["sampling"] = current_branch.design_matrix
        try:
            design = find_local_max_ipopt(partially_fixed_experiments=branch, **kwargs)
            value = objective_class.evaluate(design.to_numpy().flatten())
            new_node = NodeExperiment(
                branch,
                design,
                value,
                current_branch.categorical_groups,
                current_branch.discrete_vars,
            )
            domain.validate_candidates(
                candidates=design.apply(lambda x: np.round(x, 8)),
                only_inputs=True,
                tol=1e-4,
                raise_validation_error=True,
            )

            priority_queue.put(new_node)
        except ConstraintNotFulfilledError:
            if verbose:
                print("skipping branch because of not fulfilling constraints")

    return bnb(
        priority_queue,
        verbose=verbose,
        num_explored=num_explored + len(next_branches),
        **kwargs,
    )
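A rough, hedged sketch of how this module might be driven directly: the root node fixes nothing, its design comes from the fully relaxed problem, and the queue is then processed by `bnb`. Everything outside this file (`domain`, `categorical_groups`, `discrete_vars`, the `model_type`/`objective` identifiers) is assumed to exist and is not prescribed by this diff:

from queue import PriorityQueue

import numpy as np
import pandas as pd

from bofire.strategies.doe.branch_and_bound import NodeExperiment, bnb
from bofire.strategies.doe.design import find_local_max_ipopt

n_experiments = 8

# Relaxed problem: nothing fixed yet; its optimum gives the root design matrix.
relaxed_design = find_local_max_ipopt(
    domain=domain, model_type="linear", n_experiments=n_experiments
)

root = NodeExperiment(
    # one row per experiment, all entries None, i.e. no variable fixed yet
    partially_fixed_experiments=pd.DataFrame(
        [[None] * len(domain.inputs.get_keys())] * n_experiments,
        columns=domain.inputs.get_keys(),
    ),
    design_matrix=relaxed_design,
    value=np.inf,  # only node in the queue at this point, so its priority value is irrelevant
    categorical_groups=categorical_groups,
    discrete_vars=discrete_vars,
)

queue = PriorityQueue()
queue.put(root)

best = bnb(
    queue,
    verbose=True,
    domain=domain,
    model_type="linear",
    n_experiments=n_experiments,
    objective="d-optimality",  # assumed identifier; use whatever get_objective_class expects
)
print(best)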