This repository is the official implementation of Contextual Combinatorial Volatile Bandits via Gaussian Processes, submitted to Machine Learning.
To install requirements:
pip install -r requirements.txt
We use the gpflow library for all GP-related computations, and gpflow uses TensorFlow. Our code uses the TIM+ algorithm, for which you must link the C++ TIM+ code to Python; follow here for linking instructions. Once the library has been generated, place it both in the root directory where main.py is and inside the tim_plus directory.
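For example, assuming the generated library file is named timplus.so (the actual file name depends on your build), you would place it as follows:

cp /path/to/build/timplus.so .
cp /path/to/build/timplus.so tim_plus/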
We ran a total of three simulations. None of the algorithms that we implement and test does offline learning, so there is no 'training' to be done. However, to make the simulations repeatable and to improve speed, we first generate the arm contexts, rewards, and other setup-related information and save them as HDF5 (Simulation I) or pickled DataFrames (Simulations II & III). Links to the generated datasets that we used are provided at the bottom of this README. By default, running the script (main.py) re-generates new datasets and runs the simulations on them.
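As a quick way to sanity-check the generated files, the sketch below (assuming the h5py and pandas packages are installed; the internal layout of the files is not documented here) lists the contents of the HDF5 dataset and previews one of the pickled DataFrames:

```python
import h5py
import pandas as pd

# Simulation I setup data is stored as HDF5 (file name from the download section below).
with h5py.File("movielens_simulation.hdf5", "r") as f:
    f.visit(print)  # print the name of every group/dataset in the file

# Simulations II & III setup data are stored as pickled DataFrames.
df = pd.read_pickle("fs_tky_simulation_df_uni")
print(df.head())
```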
To run Simulation I, provide the argument sim_1 to the main.py script. For example, to re-generate this simulation's datasets and run the simulation on them, use
python main.py sim_1
and to run Simulation I on the pre-generated datasets, which must be in the root directory, use
python main.py sim_1 --use_saved_dataset
To run Simulation II, use the argument sim_2. You can provide the --use_saved_dataset argument to use pre-generated and saved datasets.
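For example, to run Simulation II on the pre-generated datasets:
python main.py sim_2 --use_saved_dataset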
To run Simulation III, use the argument sim_3. You can provide the --use_saved_dataset argument to use pre-generated and saved datasets.
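For example, to run Simulation III on the pre-generated datasets:
python main.py sim_3 --use_saved_dataset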
After the script has run the simulations, it will automatically plot the reward and regret curves and save them as PDFs. If you then want to re-generate the plots without running the whole simulation again, you can pass the --only_plot argument to the main.py script.
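For example, to re-generate the Simulation I plots without re-running it:
python main.py sim_1 --only_plot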
You can download the generated datasets that we ran the simulations with below:
The HDF5 dataset file used for the MovieLens simulation (Simulation I) can be downloaded here. Make sure to place the 'movielens_simulation.hdf5' file in the root directory where main.py is.
A zip file of the pickled DataFrames used for the Foursquare simulation (Simulation II) can be downloaded here. Make sure to extract both 'fs_tky_simulation_df_uni' and 'fs_tky_simulation_df_nuni' and place them in the root directory where main.py is.
We use the Wolfram Engine to learn the distribution of the TKY dataset's locations; thus, to generate the dataset, you must have the Wolfram Engine installed. It can be installed for free here. Moreover, you must download the exported LearnedDistribution file available here and set its absolute path in fs_problem_model.py. Note that if you download the saved datasets ('fs_tky_simulation_df_uni' and 'fs_tky_simulation_df_nuni'), you will NOT need the Wolfram Engine: it is only needed to generate the datasets.
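As an illustrative sketch of the required change (the variable and file names below are hypothetical placeholders; check fs_problem_model.py for the actual ones):

```python
# In fs_problem_model.py: point this at the downloaded LearnedDistribution file.
# The variable name and path below are illustrative placeholders.
LEARNED_DISTRIBUTION_PATH = "/absolute/path/to/learned_distribution_file"
```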
A zip file of the pickled DataFrames used for the varying arm codependence simulation (Simulation III) can be downloaded here. Make sure to extract all five files, each corresponding to a different kernel lengthscale, and place them in the root directory where main.py is.
Our algorithm outperforms the current state-of-the-art combinatorial contextual volatile multi-armed bandit (CCV-MAB) algorithm, ACC-UCB. The figure below shows the time-averaged reward of the sparse version of our algorithm (SO'CLOK-UCB) and of ACC-UCB in a movie-recommendation simulation (Simulation I) using the MovieLens dataset. Notice that even with just 2 inducing points, we manage to outperform ACC-UCB. See Section 5 of the paper for a detailed description of the setup and an in-depth analysis.