This repository implements the Learning to Cut chapter in my Master's thesis on Combinatorial Optimization with Graph Reinforcement Learning.
You will find here:
- RL environments for playing with cut selection in SCIP:
  - Adapting SCIP cut selection parameters to individual ILP instances.
  - Adapting SCIP cut selection parameters to every LP round.
  - Direct cut selection, i.e., explicitly choosing the cuts that will be applied to the LP in each LP round.
- Ape-X DQN framework for:
  - distributed RL training on your PC or on clusters using the `slurm` scheduler (e.g. Compute Canada).
  - development with debug capabilities of remote actors.
  - extensive evaluation of multiple models/instances/seeds etc.
- For Compute Canada users:
  - Scripts and guidelines for running multiple whole-node experiments, exploiting the massive compute power provided by the various clusters.
- Complementary experiments showing the room for improvement on MAXCUT and MVC.
I'm working on integrating the RL part of this project into Ecole,
a fancy gym-like SCIP RL environment (which currently supports branching rules, among others).
Nevertheless, if you are interested in direct control of every line in SCIP,
this project will provide you with a good starting point and many workarounds.
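The cut-selection environments follow a gym-like interaction pattern. Here is a minimal sketch of what a direct-selection episode could look like; `DummyCutEnv` and all names below are illustrative stand-ins, not the repo's actual API:

```python
# Hypothetical sketch of a direct cut-selection episode; DummyCutEnv stands
# in for the repo's SCIP environment and is NOT its actual interface.
class DummyCutEnv:
    def __init__(self, n_rounds=3):
        self.n_rounds = n_rounds

    def reset(self):
        self.round = 0
        self.obs = {"cut_scores": [0.3, 0.9, 0.1]}  # fake candidate-cut features
        return self.obs

    def step(self, selected):
        # "apply" the selected cuts and advance one LP round
        reward = sum(self.obs["cut_scores"][i] for i in selected)
        self.round += 1
        done = self.round >= self.n_rounds
        return self.obs, reward, done, {}

env = DummyCutEnv()
obs, done, total = env.reset(), False, 0.0
while not done:
    # toy policy: select every candidate cut whose score exceeds 0.5
    action = [i for i, s in enumerate(obs["cut_scores"]) if s > 0.5]
    obs, reward, done, info = env.step(action)
    total += reward
print(total)  # 3 rounds, one selected cut of score 0.9 per round
```

In the real environments the observation encodes the LP state and candidate cuts, and the policy is a learned DQN rather than a threshold rule.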
The following instructions were tested successfully on Ubuntu 18.04 with gcc 7.5.0 and cuda 10.2.
- Add `export SCIPOPTDIR=/path/to/SCIP/installation/dir` to your `~/.bashrc`.
- Open a terminal and install SCIP:
```
$ git clone https://github.com/avrech/scipoptsuite-6.0.2-avrech.git
$ cd scipoptsuite-6.0.2-avrech
$ cmake -Bbuild -H. -DCMAKE_BUILD_TYPE=Debug -DCMAKE_INSTALL_PREFIX=$SCIPOPTDIR
$ cmake --build build
$ cd build
$ make install
```
- Create a virtual environment and install this repo:
```
$ virtualenv --python=python3 venv
$ source venv/bin/activate
$ git clone https://github.com/avrech/learning2cut.git
$ pip install -r learning2cut/requirements.txt
$ pip install -e learning2cut
```
- Install PySCIPOpt:
```
$ git clone https://github.com/ds4dm/PySCIPOpt.git
$ cd PySCIPOpt
$ git checkout ml-cutting-planes
$ pip install --debug_option='--debug' .
```
- Sign up to Weights & Biases, and follow the instructions to connect your device to your `wandb` account.
Custom `torch_geometric` installation instructions according to your cuda version can be found here.
Do not install the virtualenv with the provided `requirements.txt`!
The installation on Compute Canada is tricky,
because packages like torch, torch-geometric, pyarrow and more require
certain gcc and cuda versions, and probably changing the software environment to an older one.
Here is a recipe that worked on Graham and Niagara as of writing these lines.
- `pyarrow` cannot be installed directly. Instead, load it by `module load arrow`.
- `torch_geometric` is compiled for specific `torch` and `cuda` versions. For available `torch` versions contact CC support.
- The following setup was tested successfully on Graham:
```
$ module load StdEnv/2018.3 gcc/7.3.0 python/3.7.4 arrow
$ virtualenv env && source env/bin/activate
(env) ~ $ pip install torch==1.4.0 torch_geometric torchvision torch-scatter torch-sparse torch-cluster torch-spline-conv -U --force
(env) ~ $ python -c "import pyarrow; import torch_geometric; import torch_cluster; import torch_cluster.graclus_cpu"
```
- The virtualenv worked with `module load NiaEnv/2018a` inside the sbatch scripts.
Inside `learning2cut/data` run:
```
python generate_data.py --experiment_configfile ../experiments/cut_selection_dqn/configs/exp5.yaml --data_configfile <mvc/maxcut>_data_config.yaml --datadir <datadir> --mp ray --nworkers <nworkers> --quiet
```
Or on Niagara (recommended):
```
sbatch sbatch_niagara_maxcut.sh
sbatch sbatch_niagara_mvc.sh
```
`generate_data.py` does the following:
- Randomizes `barabasi-albert` and `erdos-renyi` graphs for MAXCUT and MVC respectively.
- For each graph, solves MAXCUT/MVC with B&C with a time limit of 1 hour.
- Saves stats of the solving process for three baselines: `default`, `15_random` and `15_most_violated`.
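The two random-graph models can be sketched with the standard library alone. This is not the repo's actual generator (which lives in `data/generate_data.py` and drives the full B&C solves); it only illustrates the sampling schemes named above:

```python
import random

def erdos_renyi(n, p, seed=None):
    """Sample an Erdos-Renyi G(n, p) graph as an edge list."""
    rng = random.Random(seed)
    return [(u, v) for u in range(n) for v in range(u + 1, n) if rng.random() < p]

def barabasi_albert(n, m, seed=None):
    """Sample a Barabasi-Albert graph: each new node attaches to m existing
    nodes with probability proportional to their current degree."""
    rng = random.Random(seed)
    edges = []
    targets = list(range(m))  # the first new node attaches to nodes 0..m-1
    repeated = []             # each node appears here once per unit of degree
    for v in range(m, n):
        edges.extend((v, t) for t in targets)
        repeated.extend(targets)
        repeated.extend([v] * m)
        # preferential attachment: draw m distinct targets weighted by degree
        targets = []
        while len(targets) < m:
            t = rng.choice(repeated)
            if t not in targets:
                targets.append(t)
    return edges
```

Each new node in the BA model contributes exactly `m` edges, so a graph on `n` nodes has `(n - m) * m` edges.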
This experiment requires massive computation power. The code was built to run on Niagara.
Inside `learning2cut/experiments/room4improvement` run:
```
python run_experiment.py --datadir $SCRATCH/learning2cut/data --rootdir $SCRATCH/room4improvement
```
The script will save the first instance of each validation set generated by `data/generate_data.py` into a `data.pkl` file for the whole experiment.
On Niagara, inside `learning2cut/experiments/room4improvement` run:
```
python run_scip_tuned.py --rootdir $SCRATCH/room4improvement --nnodes 20 --ncpus_per_node 80
```
Jobs for finding the `scip_tuned` policy will be submitted. After all jobs have finished, run the same command line again to finalize the results.
In case something went wrong in the first run, the script can be invoked repeatedly until it finishes the work.
On Niagara, inside `learning2cut/experiments/room4improvement` run:
```
python run_scip_adaptive.py --rootdir $SCRATCH/room4improvement --nnodes 20 --ncpus_per_node 80
```
Running this command line 2K times will generate an adaptive policy for K LP rounds.
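The "run it 2K times" instruction can be wrapped in a trivial driver. This is a hypothetical dry run that only prints the commands instead of submitting them:

```python
# Hypothetical driver: build the 2K identical invocations of
# run_scip_adaptive.py needed for an adaptive policy over K LP rounds.
K = 3  # desired number of LP rounds (example value)
cmd = ("python run_scip_adaptive.py "
       "--rootdir $SCRATCH/room4improvement --nnodes 20 --ncpus_per_node 80")
invocations = [cmd] * (2 * K)
for c in invocations:
    print(c)  # replace the print with an actual submission to run for real
```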
On Niagara, inside `learning2cut/experiments/room4improvement` run:
```
python run_scip_tuned_avg.py --rootdir $SCRATCH/room4improvement --datadir $SCRATCH/learning2cut/data --nnodes 20 --ncpus_per_node 60
```
After all jobs have finished, run the same command again to finalize the results.
To compare all baselines in terms of solving time,
run `run_experiment.py` again, pointing to the rootdir where the
individual_tuning, average_tuning and individual_adaptive_tuning results are stored.
The script will test all baselines on the local machine one by one,
without multiprocessing.
Results will be saved to csv and png files.
In this experiment we investigate two RL formulations for tuning SCIP:
combinatorial contextual multi-armed bandit (CCMAB) and MDP.
The CCMAB problem is actually a 1-step MDP, and is solved using the same
DQN algorithm.
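To see why the bandit view fits the same machinery, note that a contextual bandit episode is just an MDP episode that terminates after one action. The following is a generic epsilon-greedy toy, not the repo's DQN; the contexts, actions, and rewards are invented for illustration:

```python
import random

# Toy CCMAB-as-1-step-MDP: observe a context, pick one action, receive an
# immediate terminal reward. Q-values are tabular here; the repo uses a DQN.
def select_action(q, context, actions, eps, rng):
    """Epsilon-greedy action selection over per-(context, action) values."""
    if rng.random() < eps:
        return rng.choice(actions)
    return max(actions, key=lambda a: q.get((context, a), 0.0))

rng = random.Random(0)
q = {}
for episode in range(200):
    ctx = rng.choice(["easy", "hard"])  # instance features play the context role
    a = select_action(q, ctx, [0, 1], eps=0.1, rng=rng)
    # invented reward: action 1 is better on "hard" contexts, action 0 on "easy"
    reward = 1.0 if (a == 1) == (ctx == "hard") else 0.0
    key = (ctx, a)
    q[key] = q.get(key, 0.0) + 0.1 * (reward - q.get(key, 0.0))
```

After training, the learned values prefer the correct action per context, which is exactly the behaviour a 1-step DQN would converge to.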
To experiment with these two formulations on Compute Canada, enter
`experiments/scip_tuning_dqn` and run `sbatch_scip_tuning.py`.
This script allows submitting multiple whole-node jobs on Graham or Niagara.
Inside this script, edit the parameter grid you want to search.
This can be useful both for basic hparam tuning
and for running multiple training jobs with different seeds, so that the robustness
of the RL algorithm can be evaluated.
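A parameter grid of this kind expands into one job per combination; a small sketch with `itertools.product` (the parameter names below are illustrative, not the script's actual search space):

```python
from itertools import product

# Hypothetical grid - sbatch_scip_tuning.py submits one job per combination.
grid = {
    "lr": [1e-4, 1e-3],
    "seed": [0, 1, 2],
}
# Expand the grid into a list of per-job parameter dicts.
jobs = [dict(zip(grid, values)) for values in product(*grid.values())]
print(len(jobs))  # 2 learning rates x 3 seeds = 6 jobs
```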
The following command line will submit multiple jobs, saving their output to
a path depending on the cluster you use.
Niagara example:
```
$ python sbatch_scip_tuning.py --cluster niagara --tag reported_runs --hours 12
```
Synchronize the `wandb` run dirs to the wandb cloud once the jobs have finished:
```
$ python ../sync_wandb_runs.py --dir $SCRATCH/learning2cut/scip_tuning/results/reported_runs/outfiles/
```
Select `run_id`s to test and evaluate them on the test sets:
```
$ python sbatch_scip_tuning.py --cluster niagara --hours 3 --test --run_ids 130m0n5o 3n3uxetn baseline --tag reported_runs --num_test_nodes 10 && python sbatch_scip_tuning.py --cluster niagara --hours 3 --test --run_ids 130m0n5o 3n3uxetn baseline --tag reported_runs --num_test_nodes 10 --test_args use_cycles=False,set_aggressive_separation=False
```
This will execute extensive tests on tons of cpus, saving their results in `test`
directories inside each `run_dir`.
Run again after all jobs have completed:
```
$ python sbatch_scip_tuning.py --cluster niagara --hours 3 --test --run_ids 130m0n5o 3n3uxetn baseline --tag reported_runs --num_test_nodes 10 && python sbatch_scip_tuning.py --cluster niagara --hours 3 --test --run_ids 130m0n5o 3n3uxetn baseline --tag reported_runs --num_test_nodes 10 --test_args use_cycles=False,set_aggressive_separation=False
```
and analyze the results with:
```
$ python analyze_scip_tuning_results.py --savedir $SCRATCH/learning2cut/scip_tuning/results/reported_runs --mdp_run_id 3n3uxetn --ccmab_run_id 130m0n5o
$ python analyze_scip_tuning_results.py --savedir $SCRATCH/learning2cut/scip_tuning/results/reported_runs --mdp_run_id 3n3uxetn --ccmab_run_id 130m0n5o --test_args use_cycles=False,set_aggressive_separation=False
```
This will summarize and write the root-only and branch-and-cut results to `.csv` files.
There are two run files: `run_single_thread_dqn.py`
for single-thread training, and `run_cut_selection_dqn.py`
for distributed training.
The distributed version is also useful for debugging and development, as each actor can run independently of the others.
`run_cut_selection_dqn.py` allows debugging and updating the code of a specific actor while the entire system keeps running.
Run:
```
$ python run_cut_selection_dqn.py --rootdir /path/to/save/results --configfile /path/to/config/file --use-gpu
```
Example config files can be found at `learning2cut/experiments/dqn/configs`. These files conveniently pack parameters for training.
All parameters can also be controlled from the command line; command-line args override the config file settings.
Each run is assigned a random 8-character `run_id`,
which can be used for resuming and for viewing results on the `wandb` dashboard.
To resume a run, add `--resume --run_id <run_id>`
to the command-line arguments.
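For intuition, an 8-character alphanumeric id like the ones shown later (e.g. `130m0n5o`) could be generated as follows; this is purely illustrative, not the repo's actual id code:

```python
import random
import string

def make_run_id(rng=random):
    """Illustrative: 8 random characters drawn from lowercase letters + digits."""
    return "".join(rng.choices(string.ascii_lowercase + string.digits, k=8))

run_id = make_run_id()
print(run_id)  # e.g. 'a3k9x0mz'
```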
Actors can be restarted (for updating code) without restarting the entire system. Useful cases:
- updating the tester code with additional tests/logs without shutting down the replay server.
- fixing bugs and restarting an actor after crashing.
Restart options are:
- Restarting the entire system: add `--restart` to the resuming command line. This will restart all crashed actors.
- Restarting specific actors: add `--restart --restart-actors <list of actors>`. The list of actors can include any combination of `apex`, `replay_server`, `learner`, `tester` and `worker_<worker_id>` (`worker_id` running from 1 to `num_workers`).
- Forcing restart when the target actors are still running: add `--force-restart` to the arguments above.
Example: in order to debug a remote actor, run:
```
python run_cut_selection_dqn.py --resume --run_id <run_id> --restart [--restart-actors <list of actors>] --debug-actor <actor_name>
```
This will restart the debugged actor's main loop in the debugger, so one can step into the actor code while the rest of the remote actors keep running.
Inside `learning2cut/experiments/cut_selection_dqn` run:
```
python cycles_variability.py --logdir results/cycles_variability [--simple_cycle_only --chordless_only --enable_chordality_check] --record_cycles
```
`cycles_variability.py` will solve each graph in `validset_20_30` and `validset_50_60` 10 times with seeds ranging from 0 to 9. In each separation round it will save the cycles generated along with other related stats.
The script will pickle a dictionary of the following structure:
```
{dataset: [{seed: stats for seed in range(10)} for graph in dataset] for dataset in [validset_20_30, validset_50_60]}
```
The `recorded_cycles` are stored in `stats` alongside the `dualbound`, `lp_iterations` etc. A cycle is stored as a dictionary with items:
- `edges`: a list of the edges in the cycle
- `F`: a list of an odd number of cut edges
- `C_minus_F`: a list of the rest of the edges
- `is_simple`: True if the cycle is a simple cycle, else False
- `is_chordless`: True if the cycle has no chords, else False
- `applied`: True if the cycle was selected into the LP, else False
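A concrete record matching this schema might look as follows; the edge values are invented for illustration, and the two checks at the end encode the schema's invariants (`F` has odd size, and `F` together with `C_minus_F` partitions the cycle edges):

```python
# Illustrative recorded-cycle dict (hypothetical values, documented schema).
cycle = {
    "edges": [(0, 1), (1, 2), (2, 0)],
    "F": [(0, 1)],                  # odd number of cut edges
    "C_minus_F": [(1, 2), (2, 0)],  # the remaining cycle edges
    "is_simple": True,
    "is_chordless": True,
    "applied": False,
}

# Consistency checks implied by the schema above:
assert len(cycle["F"]) % 2 == 1
assert set(cycle["F"]) | set(cycle["C_minus_F"]) == set(cycle["edges"])
```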
| Done | Exp | Train Set | Behaviour | Loss | SCIP Seed | Goal | Results |
|---|---|---|---|---|---|---|---|
| ☑ | 1 | Fixed graph | Demo | Demo | Fixed | Perfect overfitting | here |
| ☑ | 2 | Fixed graph | Demo | Demo | Random | Generalization across seeds | here |
| ☑ | 3 | Random | Demo | Demo | Random | Generalization across graphs | here |
| ☐ | 4 | Random | Demo | Demo+DQN | Random | See convergence to "interesting" policy | here |
| ☐ | 5 | Random | Demo+DQN | Demo+DQN | Random | Improving over SCIP | here |
This work was done during my Master's studies at the Technion
under the supervision of Prof. Shie Mannor and Prof. Tamir Hazan.
The research was done in collaboration with @gasse and @akazachk from Polytechnique Montréal.
I thank @dchetelat for his helpful comments in the initial steps of the project,
and in particular for improving the mathematical formulation of the MAXCUT problem.
I also thank @cyoon1729 for providing me with an initial Ape-X DQN implementation.