Greedy Policy for MOCO

Official PyTorch implementation of "Training Greedy Policy for Proposal Batch Selection in Expensive Multi-Objective Combinatorial Optimization", published at ICML'24. This code is built on top of the LaMBO and MOGFN repositories.

Paper


Abstract. Active learning is increasingly adopted for expensive multi-objective combinatorial optimization problems, but it involves a challenging subset selection problem: optimizing the batch acquisition score that quantifies the goodness of a batch for evaluation. Due to the excessively large search space of the subset selection problem, prior methods optimize the batch acquisition on the latent space, which has discrepancies with the actual space, or optimize individual acquisition scores without considering the dependencies among candidates in a batch, instead of directly optimizing the batch acquisition. To manage the vast search space, a simple and effective approach is the greedy method, which decomposes the problem into smaller subproblems, yet it has difficulty in parallelization since each subproblem depends on the outcome of the previous ones. To this end, we introduce a novel greedy-style subset selection algorithm that optimizes batch acquisition directly on the combinatorial space by sequential greedy sampling from the greedy policy, specifically trained to address all greedy subproblems concurrently. Notably, our experiments on the red fluorescent proteins design task show that our proposed method achieves the baseline performance in 1.69x fewer queries, demonstrating its efficiency.
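The core procedure sketched in the abstract, building a batch by sequentially sampling the next element conditioned on the partially constructed batch, can be illustrated with a minimal Python sketch. Note that batch_acquisition, sample_next, and greedy_select below are hypothetical placeholders for illustration only and are not functions from this repository.

# Minimal sketch of greedy-style batch construction (illustrative only).
# `batch_acquisition` and `sample_next` are hypothetical stand-ins, not
# functions from this repository.

def batch_acquisition(batch):
    # Hypothetical batch score; in the paper this role is played by a
    # batch acquisition function evaluated on the whole proposal batch.
    return len(set(batch))  # dummy score that rewards diverse batches

def sample_next(batch, pool):
    # Stand-in for sampling from the trained greedy policy conditioned on
    # the partially constructed batch.
    return max(pool, key=lambda x: batch_acquisition(batch + [x]))

def greedy_select(pool, n):
    # Build a batch of cardinality n by sequential greedy selection.
    batch = []
    for _ in range(n):
        remaining = [c for c in pool if c not in batch]
        batch.append(sample_next(batch, remaining))
    return batch

print(greedy_select(["AAB", "ABA", "BBA", "ABB", "BAB"], 3))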

Installation

Requirements: Anaconda, cudatoolkit 11.3

  1. Install the conda environment.
  • Run the install script:
    source install.sh

Note. You may ignore the version conflict related to protobuf. A quick sanity check of the installed environment is sketched after this list.

  2. (Optional) If you want to conduct single-round experiments on the DNA aptamers, please follow the instructions in the NUPACK Docs to install the NUPACK library (version 4.0.1). You may log in via this link to agree to the NUPACK license.

  3. (Optional) If you want to conduct multi-round experiments on the red fluorescent proteins (RFPs), please follow the instructions in LaMBO to install the FoldX executable and supporting assets.
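As a generic sanity check (not a script shipped with this repository), you can verify that the installed environment sees PyTorch and a CUDA device after running conda activate setbench:

# Generic sanity check of the installed environment (not part of the repo).
# Run inside the activated environment: conda activate setbench
import torch

print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))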

1. Run Single-Round Experiments

Arguments

  • task: name of task [regex_2, regex, regex_4, nupack_energy_pins_pairs]

  • tokenizer: name of tokenizer [protein, aptamer]

  • algorithm: name of algorithm [setrl, moreinforce, greedyrl, rand_hill, rand_sample]

  • max_size: n, the cardinality constraint for the actual batch construction

  • train_steps: N_u = 4000, the number of update steps

  • batch_size: N_e = 128, the number of episodes per step

  • n_set_samples: N_t = 4, the frequency of updating behavior

  • train_max_size: n_train = 64, the cardinality constraint for training

  • pi_lr: eta, the learning rate [1e-4, 1e-5, 1e-6]

  • random_action_prob: the random action probability [0, 0.05]

Commands

Below are the commands to reproduce our experimental results in Table 1 of the paper.

To start, activate the installed environment for the single-round experiments by running conda activate setbench. For clarity, we present the commands for the (4 Bigrams, n=16) case of Table 1.

Method          Hypervolume Indicator (4 Bigrams, n=16)
Ours            0.077 (0.000)
PC-RL (TS)      0.054 (0.003)
PC-RL (WS)      0.016 (0.005)
Greedy + RL     0.062 (0.005)
Greedy + HC     0.014 (0.008)
Greedy + RS     0.000 (0.000)

Ours

python run_scripts/run_single.py algorithm.state_save_path=$DIR2SAVE task=regex_4 tokenizer=protein algorithm=setrl algorithm.pi_lr=1e-4 algorithm.random_action_prob=0.0 algorithm.max_size=16 trial_id=$trial_id

Baselines

Preference-conditioned reinforcement learning (PC-RL) w/ Tchebycheff scalarization (TS)

python run_scripts/run_single.py algorithm.state_save_path=$DIR2SAVE task=regex_4 tokenizer=protein algorithm=moreinforce algorithm.reward_type=tchebycheff algorithm.pi_lr=1e-4 algorithm.random_action_prob=0.0 algorithm.max_size=16 algorithm.simplex_bins=20 trial_id=$trial_id

Preference-conditioned reinforcement learning (PC-RL) w/ weight scalarization (WS)

python run_scripts/run_single.py algorithm.state_save_path=$DIR2SAVE task=regex_4 tokenizer=protein algorithm=moreinforce algorithm.reward_type=convex algorithm.pi_lr=1e-5 algorithm.random_action_prob=0.0 algorithm.max_size=16 algorithm.simplex_bins=20 trial_id=$trial_id

Greedy + reinforcement learning (RL)

python run_scripts/run_single.py algorithm.state_save_path=$DIR2SAVE task=regex_4 tokenizer=protein algorithm=greedyrl algorithm.pi_lr=1e-5 algorithm.random_action_prob=0.0 algorithm.max_size=16 trial_id=$trial_id

Greedy + hill climbing (HC)

python run_scripts/run_single.py algorithm.state_save_path=$DIR2SAVE task=regex_4 tokenizer=protein algorithm=rand_hill algorithm.max_size=16 trial_id=$trial_id

Greedy + random sampling (RS)

python run_scripts/run_single.py algorithm.state_save_path=$DIR2SAVE task=regex_4 tokenizer=protein algorithm=rand_sample algorithm.max_size=16 trial_id=$trial_id

After running the commands above for $trial_id in [0, 1, ..., 9], you can use the following example script to aggregate the results.

python agg_scripts/agg_single_example.py

If you use the Slurm workload manager, you may run sbatch run_scripts/run_single_example.sbatch to launch the commands above for all 10 trials at once. If you do not use Slurm, a plain driver loop like the sketch below works as well.
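The following is a minimal driver sketch, not a script shipped with this repository, that runs the Ours command for all 10 trials sequentially; the save directory and the override list are placeholders you should adapt to the method you want to reproduce.

# Minimal driver sketch (not part of the repository): run the "Ours"
# single-round command for trial_id = 0..9 sequentially.
import subprocess

DIR2SAVE = "./results/single_round"  # hypothetical save directory

for trial_id in range(10):
    subprocess.run([
        "python", "run_scripts/run_single.py",
        f"algorithm.state_save_path={DIR2SAVE}",
        "task=regex_4", "tokenizer=protein", "algorithm=setrl",
        "algorithm.pi_lr=1e-4", "algorithm.random_action_prob=0.0",
        "algorithm.max_size=16", f"trial_id={trial_id}",
    ], check=True)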

Note. For a fair comparison, we fix the number of queries to the deterministic surrogate model across all methods. We consider (pi_lr, random_action_prob) in [1e-3, 1e-4, 1e-5] x [0.0, 0.05] for hyperparameter tuning; a sketch of this grid is shown below. Please refer to Appendix B.3 for the detailed experimental settings.
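A minimal sketch of enumerating the hyperparameter grid from the note above; the printed strings are override pairs in the same key=value format used by the commands in this section.

# Enumerate the (pi_lr, random_action_prob) tuning grid from the note above.
from itertools import product

for pi_lr, rap in product([1e-3, 1e-4, 1e-5], [0.0, 0.05]):
    print(f"algorithm.pi_lr={pi_lr} algorithm.random_action_prob={rap}")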

Note. For a faster runtime of Ours, you may set n_set_samples to a higher value, though this may result in lower performance.

Note. Since a set-conditioned RL agent or a preference-conditioned RL agent can be reused to generate subsets of various cardinalities, experiments for multiple cardinalities can be conducted at once for the Ours and PC-RL methods. You may set algorithm.max_size=-1 to run the experiments for n = 4, 16, 64, 256 at once.

2. Run Multi-Round Experiments


Below are the commands to reproduce our experimental results in Figures 2, 3, and 4 of the paper.

Arguments

  • task: name of task [regex, proxy_rfp]

  • tokenizer: name of tokenizer [protein]

  • algorithm: name of algorithm [setrl, lambo, mb_genetic, mf_genetic]

  • acquisition: name of acquisition [nehvi, ehvi, ucbhvi]

  • optimizer.use_mlm_logits: Set True for Ours w/ MLM [True, False]

Commands

To start, activate the installed environment by running conda activate setbench.

Ours

Ours w/ MLM

python run_scripts/run_active_learning.py task=$task tokenizer=protein acquisition=nehvi optimizer=setrl optimizer.use_mlm_logits=True trial_id=$trial_id

Ours w/o MLM

python run_scripts/run_active_learning.py task=$task tokenizer=protein acquisition=nehvi optimizer=setrl optimizer.use_mlm_logits=False trial_id=$trial_id

Baselines

LaMBO

python run_scripts/run_active_learning.py task=$task tokenizer=protein acquisition=nehvi optimizer=lambo trial_id=$trial_id 

Model-based genetic algorithm (MBGA)

python run_scripts/run_active_learning.py task=$task tokenizer=protein acquisition=nehvi optimizer=mb_genetic optimizer/algorithm=soga trial_id=$trial_id

NSGA-II

python run_scripts/run_active_learning.py task=$task tokenizer=protein optimizer=mf_genetic optimizer/algorithm=nsga2 trial_id=$trial_id

We conducted each experiment for 10 trials.
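For reference, here is a minimal sketch, not a script shipped with this repository, of how the $task and $trial_id placeholders in the commands above resolve for a single run of Ours w/ MLM:

# Minimal sketch (not part of the repository): one multi-round run of
# Ours w/ MLM with the $task and $trial_id placeholders filled in.
import subprocess

task = "proxy_rfp"  # one of [regex, proxy_rfp]
trial_id = 0        # we ran trial_id = 0..9

subprocess.run([
    "python", "run_scripts/run_active_learning.py",
    f"task={task}", "tokenizer=protein", "acquisition=nehvi",
    "optimizer=setrl", "optimizer.use_mlm_logits=True",
    f"trial_id={trial_id}",
], check=True)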

Machine Information

Below is the information about the machine the authors used.

  • OS: Ubuntu 22.04
  • CUDA Driver Version: 530.30.02
  • gcc: 11.4.0
  • nvcc(CUDA): 12.1
  • CPU: AMD EPYC 7402 24-Core Processor
  • GPU: NVIDIA GeForce RTX 3090

Citation

% Our paper
@inproceedings{lee24greedy,
  title={Training Greedy Policy for Proposal Batch Selection in Expensive Multi-Objective Combinatorial Optimization},
  author={Lee, Deokjae and Song, Hyun Oh and Cho, Kyunghyun},
  booktitle={International Conference on Machine Learning (ICML)},
  year={2024}
}

% MOGFN
@inproceedings{jain2023multi,
  title={Multi-objective gflownets},
  author={Jain, Moksh and Raparthy, Sharath Chandra and Hern{\'a}ndez-Garc{\'\i}a, Alex and Rector-Brooks, Jarrid and Bengio, Yoshua and Miret, Santiago and Bengio, Emmanuel},
  booktitle={International Conference on Machine Learning},
  pages={14631--14653},
  year={2023},
  organization={PMLR}
}

% LaMBO
@inproceedings{stanton2022accelerating,
  title={Accelerating bayesian optimization for biological sequence design with denoising autoencoders},
  author={Stanton, Samuel and Maddox, Wesley and Gruver, Nate and Maffettone, Phillip and Delaney, Emily and Greenside, Peyton and Wilson, Andrew Gordon},
  booktitle={International Conference on Machine Learning},
  pages={20459--20478},
  year={2022},
  organization={PMLR}
}
