Doe categorical second attempt (#259)
* setting up bofire

* simple relaxed categorical example in jupyter notebook

* adding exhaustive search for optimization problems with binary variables and writing tests

* showcasing how to use binary variables

* changes in jupyter notebooks

* adding constraint mapper

* fixing bug with dtype

* started change to one-hot-encoding

* allowing for multiple groups

* starting with branch-and-bound

* rough sketch of bnb

* asserting that design variables is actually a 1d array

* asserting that design variables is actually a 1d array

* using the right argument for optimality criteria

* finishing first bab implementation

* changing __str__

* resetting test for valid solution

* printing information of branching

* allowing for relaxed discrete variables, and fixing bug with sequence of column keys in dataframe

* fixing docstring

* skipping non-valid-designs and branching for discrete values

* catching case for list length 1

* cringe... renaming function

* removing redundant code

* bug fix/ removing code with no effect

* renaming variable to make intention clear

* start including new doe strategies in api

* adding categorical groups to the domain

* changing to categorical groups from domain

* correcting docstring

* start writing tests for categorical and discrete variables

* catching branches in exhaustive search which do not fulfill constraints

* simple tests for bab and exhaustive search

* adding optional print statements for information about the optimization process

* adding optional print statements for information about the optimization process

* bug fix, allowing arbitrary objective function and model_type

* bug fix, fixed and partially fixed experiments can now be combined

* skipping branches with already fixed experiments

* skipping branches with already fixed experiments in exhaustive search

* removing unused code from binary vars and giving warning when unsuitable attributes are passed

* adding documentation

* adding documentation

* adding documentation

* adding documentation

* bug fix, now also testing if the solution for discrete variables is also valid

* bug fix, now also testing if the solution for discrete variables is also valid

* adding documentation

* bug fix, where error occurred if fixed_experiments was None

* adjusting tolerances for testing for valid solution

* adjusting tolerances for testing for valid solution

* raising error when too many (partially) fixed experiments are provided

* bug fix, sorting (partially) fixed experiments and initial guess if provided

* adapting branch-and-bound to new fixed experiments usage

* bug fix, allowing to fix candidates with .tell

* allowing to partially fix experiments with .tell and with all strategies

* allowing to use either equality or inequality and changing rhs

* reverting, we can only do exactly 1

* bug fix, if partially_fixed_experiments is None, and adding time information to verbose output

* adding error for using discrete var in exhaustive search, bug fix where order of variables led to fixing the wrong variables, bug fix where partially fixed experiments is None

* renaming

* adding information about how many branches have been explored

* added NChooseKGroup_with_quantity (helper function) and mapping from discrete domain to relaxable domain

* bug fix, NChooseKGroup_with_quantity (helper function) and allowing to choose between current categorical variables (and NonLinearConstraints) or relaxable categorical variables (and LinearConstraints)

* making some arguments optional

* fixing optional arguments

* adding documentation

* Update documentation bofire/data_models/constraints/nonlinear.py

Co-authored-by: Johannes P. Dürholt <johannespeter.duerholt@evonik.com>

* refactoring RelaxableBinaryInput, RelaxableDiscreteInput, they are not accessible through the public api anymore.

* refactoring generate_mixture_constraint and deleting unused functions

* allowing to use the old strategy to solve nchoosek constraints

* reversing accidental commit

* refactoring functions

* removing check for initial guess, as we can also allow non-valid initial points

* adding initial guess based on design of previous branch

* bug fix

* deleting old example

* merge main

* fixing typing

* looser tolerances and pruning branches where ipopt does not satisfy constraints

* skipping fixations where ipopt does not satisfy constraints

* reverting commit, where I skip the is_fulfilled test

* typing

* adding not implemented error

* typing

* fixing tests

* Delete .gitattributes

* Delete .idea directory

* typing

* removing outdated tests, and fixing existing ones

* evaluating with d_optimality requires a 1D array

* adding test for categorical and discrete doe with nchoosek

* typing

* typing

* adding random seed

* adapting test

* adapting (partially) fixed experiments

* typing

* ignore typing

* ignore typing

* started beautiful fix

* reverting beautiful fix

* quick fix

* Update test_doe.py

* fix test

* going back to beautiful fix

* removing Relaxable Features

* deleting unnecessary check

* adding set candidates and test

* typing

* typing

* bug fix

* refactoring

* removing unnecessary line

* merging tutorials from main

* merging tutorials from main

---------

Co-authored-by: GuenesUI <ufuk-ilkay.guenes@basf.com>
Co-authored-by: Johannes P. Dürholt <johannespeter.duerholt@evonik.com>
3 people authored Aug 16, 2023
1 parent 3e481f5 commit 6810c60
Showing 14 changed files with 1,626 additions and 60 deletions.
1 change: 1 addition & 0 deletions bofire/data_models/constraints/nchoosek.py
@@ -93,6 +93,7 @@ def is_fulfilled(self, experiments: pd.DataFrame, tol: float = 1e-6) -> pd.Series:
         Returns:
             bool: True if fulfilled else False.
         """
+
         cols = self.features
         sums = (np.abs(experiments[cols]) > tol).sum(axis=1)
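For context, the fulfillment check in this hunk counts, per experiment, how many of the constrained features are active (non-zero beyond the tolerance). A minimal standalone sketch of that logic, with min_count/max_count standing in for the constraint's bound attributes (those names are assumed here, not taken from the diff):

import numpy as np
import pandas as pd

def nchoosek_fulfilled(experiments: pd.DataFrame, features, min_count, max_count, tol=1e-6):
    # Count, per row, how many of the constrained features are non-zero beyond the tolerance
    # (same expression as in the hunk above), then check the count lies within the allowed range.
    sums = (np.abs(experiments[features]) > tol).sum(axis=1)
    return (sums >= min_count) & (sums <= max_count)

experiments = pd.DataFrame({"x1": [0.0, 0.5], "x2": [0.3, 0.4], "x3": [0.2, 0.0]})
print(nchoosek_fulfilled(experiments, ["x1", "x2", "x3"], min_count=1, max_count=2))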
4 changes: 2 additions & 2 deletions bofire/data_models/constraints/nonlinear.py
@@ -73,7 +73,7 @@ def jacobian(self, experiments: pd.DataFrame) -> pd.DataFrame:


 class NonlinearEqualityConstraint(NonlinearConstraint):
-    """Nonlinear inequality constraint of the form 'expression <= 0'.
+    """Nonlinear equality constraint of the form 'expression == 0'.

     Attributes:
         expression: Mathematical expression that can be evaluated by `pandas.eval`.
@@ -91,7 +91,7 @@ def __str__(self):


 class NonlinearInequalityConstraint(NonlinearConstraint):
-    """Linear inequality constraint of the form 'expression == 0'.
+    """Nonlinear inequality constraint of the form 'expression <= 0'.

     Attributes:
         expression: Mathematical expression that can be evaluated by `pandas.eval`.
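The corrected docstrings reflect the convention both classes describe: expressions are evaluated with `pandas.eval`, equality constraints require the expression to be 0, and inequality constraints require it to be <= 0. A small illustrative sketch of that convention (the column names and expressions are made up for the example):

import pandas as pd

experiments = pd.DataFrame({"x1": [0.3, 0.5], "x2": [0.7, 0.2]})

# NonlinearEqualityConstraint semantics: 'expression == 0' must hold (up to a tolerance).
eq_residual = experiments.eval("x1 + x2 - 1")
print(eq_residual.abs() < 1e-6)   # fulfilled where the residual is (close to) zero

# NonlinearInequalityConstraint semantics: 'expression <= 0' must hold.
ineq_residual = experiments.eval("x1**2 + x2**2 - 1")
print(ineq_residual <= 1e-6)      # fulfilled where the expression is non-positive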
2 changes: 1 addition & 1 deletion bofire/data_models/domain/domain.py
@@ -179,7 +179,7 @@ def validate_linear_constraints(cls, v, values):
         # gather continuous inputs in dictionary
         continuous_inputs_dict = {}
         for f in values["inputs"]:
-            if type(f) is ContinuousInput:
+            if isinstance(f, ContinuousInput):
                 continuous_inputs_dict[f.key] = f

         # check if non continuous input features appear in linear constraints
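The switch from `type(f) is ContinuousInput` to `isinstance(f, ContinuousInput)` matters because the exact-type check rejects subclasses; with `isinstance`, any feature derived from `ContinuousInput` is also collected. A toy sketch of the difference (the subclass name is hypothetical):

class ContinuousInput:
    pass

class RelaxedBinaryInput(ContinuousInput):  # hypothetical subclass for illustration
    pass

f = RelaxedBinaryInput()
print(type(f) is ContinuousInput)      # False: exact type check misses the subclass
print(isinstance(f, ContinuousInput))  # True: subclasses count as continuous inputs too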
9 changes: 6 additions & 3 deletions bofire/data_models/strategies/doe.py
@@ -2,8 +2,6 @@

 from bofire.data_models.constraints.api import Constraint
 from bofire.data_models.features.api import (
-    CategoricalInput,
-    DiscreteInput,
     Feature,
     MolecularInput,
 )
@@ -22,14 +20,19 @@ class DoEStrategy(Strategy):
         ],
         str,
     ]
+    optimization_strategy: Literal[
+        "default", "exhaustive", "branch-and-bound", "partially-random", "relaxed"
+    ] = "default"
+
+    verbose: bool = False

     @classmethod
     def is_constraint_implemented(cls, my_type: Type[Constraint]) -> bool:
         return True

     @classmethod
     def is_feature_implemented(cls, my_type: Type[Feature]) -> bool:
-        if my_type in [CategoricalInput, DiscreteInput, MolecularInput]:
+        if my_type in [MolecularInput]:
             return False
         return True
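With the new fields, choosing how categorical/discrete inputs are handled is a matter of setting `optimization_strategy` on the data model. A hedged usage sketch; everything outside this diff (`domain`, the `formula` field name, `strategies.map`, `ask`) is assumed from typical BoFire usage rather than prescribed here:

import bofire.strategies.api as strategies
from bofire.data_models.strategies.api import DoEStrategy

# `domain` is assumed to be an existing bofire Domain containing the
# categorical/discrete inputs of the design problem.
data_model = DoEStrategy(
    domain=domain,
    formula="linear",
    optimization_strategy="branch-and-bound",  # or "exhaustive", "partially-random", "relaxed", "default"
    verbose=True,  # print progress of the branch-and-bound / exhaustive search
)
strategy = strategies.map(data_model)
candidates = strategy.ask(candidate_count=12)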
227 changes: 227 additions & 0 deletions bofire/strategies/doe/branch_and_bound.py
@@ -0,0 +1,227 @@
from __future__ import annotations

from functools import total_ordering
from queue import PriorityQueue
from typing import Dict, List, Optional, Tuple

import numpy as np
import pandas as pd

from bofire.data_models.constraints.api import ConstraintNotFulfilledError
from bofire.data_models.features.api import ContinuousInput
from bofire.strategies.doe.design import find_local_max_ipopt
from bofire.strategies.doe.objective import get_objective_class
from bofire.strategies.doe.utils import get_formula_from_string
from bofire.strategies.doe.utils_categorical_discrete import equal_count_split


@total_ordering
class NodeExperiment:
    def __init__(
        self,
        partially_fixed_experiments: pd.DataFrame,
        design_matrix: pd.DataFrame,
        value: float,
        categorical_groups: Optional[List[List[ContinuousInput]]] = None,
        discrete_vars: Optional[Dict[str, Tuple[ContinuousInput, List[float]]]] = None,
    ):
        """
        Args:
            partially_fixed_experiments: dataframe containing (some) fixed variables for experiments.
            design_matrix: optimal design given the fixed and partially fixed experiments.
            value: value of the objective function evaluated with the design_matrix.
            categorical_groups: represents the different groups of the categorical variables.
            discrete_vars: dict of discrete variables and the corresponding valid values in the optimization problem.
        """
        self.partially_fixed_experiments = partially_fixed_experiments
        self.design_matrix = design_matrix
        self.value = value
        if categorical_groups is not None:
            self.categorical_groups = categorical_groups
        else:
            self.categorical_groups = []
        if discrete_vars is not None:
            self.discrete_vars = discrete_vars
        else:
            self.discrete_vars = {}

    def get_next_fixed_experiments(self) -> List[pd.DataFrame]:
        """
        Based on the current partially_fixed_experiments DataFrame, the next branches are determined. One more
        variable will be fixed than before.

        Returns: list of the next possible branches where exactly one more variable is fixed.
        """
        # branching for the binary/categorical variables
        for group in self.categorical_groups:
            for row_index, _exp in self.partially_fixed_experiments.iterrows():
                if (
                    self.partially_fixed_experiments.iloc[row_index][group[0].key]
                    is None
                ):
                    current_keys = [elem.key for elem in group]
                    allowed_fixations = np.eye(len(group))
                    branches = [
                        self.partially_fixed_experiments.copy()
                        for i in range(len(allowed_fixations))
                    ]
                    for k, elem in enumerate(branches):
                        elem.loc[row_index, current_keys] = allowed_fixations[k]
                    return branches

        # branching for the discrete variables
        for key, (var, values) in self.discrete_vars.items():
            for row_index, _exp in self.partially_fixed_experiments.iterrows():
                current_fixation = self.partially_fixed_experiments.iloc[row_index][key]
                first_fixation, second_fixation = None, None
                if current_fixation is None:
                    lower_split, upper_split = equal_count_split(
                        values, var.lower_bound, var.upper_bound
                    )
                    first_fixation = (var.lower_bound, lower_split)
                    second_fixation = (upper_split, var.upper_bound)

                elif current_fixation[0] != current_fixation[1]:
                    lower_split, upper_split = equal_count_split(
                        values, current_fixation[0], current_fixation[1]
                    )
                    first_fixation = (current_fixation[0], lower_split)
                    second_fixation = (upper_split, current_fixation[1])

                if first_fixation is not None:
                    first_branch = self.partially_fixed_experiments.copy()
                    second_branch = self.partially_fixed_experiments.copy()

                    first_branch.loc[row_index, key] = first_fixation
                    second_branch.loc[row_index, key] = second_fixation

                    return [first_branch, second_branch]

        return []

    # nodes are ordered by objective value, so a PriorityQueue yields the most
    # promising node (smallest value) first
    def __eq__(self, other: NodeExperiment) -> bool:
        return self.value == other.value

    def __ne__(self, other: NodeExperiment) -> bool:
        return self.value != other.value

    def __lt__(self, other: NodeExperiment) -> bool:
        return self.value < other.value

    def __str__(self):
        return (
            "\n ================ Branch-and-Bound Node ================ \n"
            + f"objective value: {self.value} \n"
            + f"design matrix: \n{self.design_matrix.round(4)} \n"
            + f"current fixations: \n{self.partially_fixed_experiments.round(4)} \n"
        )


def is_valid(node: NodeExperiment, tolerance: float = 1e-2) -> bool:
    """
    Test whether a design is a valid solution, i.e. whether the binary and discrete variables take valid values.

    Args:
        node: the current node of the branch to be tested.
        tolerance: absolute tolerance between valid values and values in the design.

    Returns: True if the design is valid, else False.
    """
    categorical_vars = [var for group in node.categorical_groups for var in group]
    design_matrix = node.design_matrix
    for var in categorical_vars:
        value = design_matrix.get(var.key)
        if not (
            np.logical_or(
                np.isclose(value, 0, atol=tolerance),
                np.isclose(value, 1, atol=tolerance),
            ).all()
        ):
            return False

    discrete_vars = node.discrete_vars
    for _key, (var, values) in discrete_vars.items():
        value = design_matrix.get(var.key)
        if False in [True in np.isclose(v, values, atol=tolerance) for v in value]:  # type: ignore
            return False
    return True


def bnb(
    priority_queue: PriorityQueue,
    verbose: bool = False,
    num_explored: int = 0,
    **kwargs,
) -> NodeExperiment:
    """
    Branch-and-bound algorithm for solving optimization problems containing binary and discrete variables.

    Args:
        priority_queue (PriorityQueue): initial nodes of the branching tree.
        verbose (bool): if true, print information during the optimization process.
        num_explored: keeps track of how many branches have been explored.
        **kwargs: parameters for the actual optimization / find_local_max_ipopt.

    Returns: a branching node containing the best design found.
    """
    if priority_queue.empty():
        raise RuntimeError("Queue empty before feasible solution was found")

    domain = kwargs["domain"]
    n_experiments = kwargs["n_experiments"]

    # get objective function
    model_formula = get_formula_from_string(
        model_type=kwargs["model_type"], rhs_only=True, domain=domain
    )
    objective_class = get_objective_class(kwargs["objective"])
    objective_class = objective_class(
        domain=domain, model=model_formula, n_experiments=n_experiments
    )

    pre_size = priority_queue.qsize()
    current_branch = priority_queue.get()
    # test if the current solution is already valid
    if is_valid(current_branch):
        return current_branch

    # branch the current solution into sub-problems
    next_branches = current_branch.get_next_fixed_experiments()

    if verbose:
        print(
            f"current length of branching queue (+ new branches): {pre_size} + {len(next_branches)}, currently "
            f"explored branches: {num_explored}, current best value: {current_branch.value}"
        )
    # solve the branched problems
    for _i, branch in enumerate(next_branches):
        kwargs["sampling"] = current_branch.design_matrix
        try:
            design = find_local_max_ipopt(partially_fixed_experiments=branch, **kwargs)
            value = objective_class.evaluate(design.to_numpy().flatten())
            new_node = NodeExperiment(
                branch,
                design,
                value,
                current_branch.categorical_groups,
                current_branch.discrete_vars,
            )
            domain.validate_candidates(
                candidates=design.apply(lambda x: np.round(x, 8)),
                only_inputs=True,
                tol=1e-4,
                raise_validation_error=True,
            )

            priority_queue.put(new_node)
        except ConstraintNotFulfilledError:
            if verbose:
                print("skipping branch because of not fulfilling constraints")

    return bnb(
        priority_queue,
        verbose=verbose,
        num_explored=num_explored + len(next_branches),
        **kwargs,
    )
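A rough, hedged sketch of how this module might be driven directly: the root node fixes nothing, its design comes from the fully relaxed problem, and the queue is then processed by `bnb`. Everything outside this file (`domain`, `categorical_groups`, `discrete_vars`, the `model_type`/`objective` identifiers) is assumed to exist and is not prescribed by this diff:

from queue import PriorityQueue

import numpy as np
import pandas as pd

from bofire.strategies.doe.branch_and_bound import NodeExperiment, bnb
from bofire.strategies.doe.design import find_local_max_ipopt

n_experiments = 8

# Relaxed problem: nothing fixed yet; its optimum gives the root design matrix.
relaxed_design = find_local_max_ipopt(
    domain=domain, model_type="linear", n_experiments=n_experiments
)

root = NodeExperiment(
    # one row per experiment, all entries None, i.e. no variable fixed yet
    partially_fixed_experiments=pd.DataFrame(
        [[None] * len(domain.inputs.get_keys())] * n_experiments,
        columns=domain.inputs.get_keys(),
    ),
    design_matrix=relaxed_design,
    value=np.inf,  # only node in the queue at this point, so its priority value is irrelevant
    categorical_groups=categorical_groups,
    discrete_vars=discrete_vars,
)

queue = PriorityQueue()
queue.put(root)

best = bnb(
    queue,
    verbose=True,
    domain=domain,
    model_type="linear",
    n_experiments=n_experiments,
    objective="d-optimality",  # assumed identifier; use whatever get_objective_class expects
)
print(best)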