DBGD

A Python implementation of Dueling Bandit Gradient Descent (DBGD).

DBGD is a list-wise online learning-to-rank approach that treats ranker optimization as a dueling bandits problem and learns from users' implicit feedback.

Note that DBGD is normally used in an online setting
(i.e. models are trained on users' real-time implicit feedback collected through an interleaved list),
but this implementation targets an offline setting
(i.e. models are trained on offline training data, not on real-time user feedback).
In principle, DBGD should be used in an online setting, not an offline one.

For details about DBGD, see the following paper:

Interactively optimizing information retrieval systems as a dueling bandits problem, Yue+, ICML'09.
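
As a rough illustration of the update described in that paper, the following minimal sketch performs one DBGD step: it perturbs the current weight by `delta` along a random unit direction, lets the current and candidate rankers "duel", and moves by `gamma` along that direction only if the candidate wins. The names `sample_unit_vector`, `dbgd_step` and `candidate_wins` are hypothetical and are not part of this repository; in this offline implementation the duel would presumably be decided by comparing a metric such as MAP or MRR, which is an assumption here.

```python
import numpy as np


def sample_unit_vector(dim, rng):
    """Draw a direction uniformly from the unit sphere (hypothetical helper)."""
    v = rng.randn(dim)
    return v / np.linalg.norm(v)


def dbgd_step(w, delta, gamma, candidate_wins, rng):
    """One DBGD update (sketch).

    candidate_wins(current, candidate) is an assumed callback that returns
    True when the candidate ranker beats the current one in a duel.
    """
    u = sample_unit_vector(w.shape[0], rng)
    candidate = w + delta * u      # exploratory ranker
    if candidate_wins(w, candidate):
        return w + gamma * u       # small step toward the winner
    return w                       # keep the current ranker
```

With a toy duel such as "the candidate wins if it is closer to some target vector", repeated calls to `dbgd_step` never move the weight farther from that target, as long as `gamma` does not exceed `delta`.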

Weights are updated using Iterative Parameter Mixture, which is commonly used to train supervised online-learning algorithms, such as Perceptron and Passive-Aggressive, on offline data.

However, I'm not sure that Iterative Parameter Mixture works well for reinforcement learning.
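
To make the idea concrete, here is a minimal sketch of Iterative Parameter Mixture under simplifying assumptions: the data is split into shards, each shard trains for one epoch starting from the shared weight, and the resulting weights are averaged before the next epoch. `mix_parameters`, `iterative_parameter_mixture` and `train_one_epoch` are hypothetical names, not this repository's API, and the shards are processed sequentially here rather than in parallel.

```python
import numpy as np


def mix_parameters(shard_weights, mixture_coeffs=None):
    """Combine per-shard weight vectors; uniform averaging by default."""
    stacked = np.vstack(shard_weights)
    if mixture_coeffs is None:
        n = len(shard_weights)
        mixture_coeffs = np.full(n, 1.0 / n)
    return np.dot(mixture_coeffs, stacked)


def iterative_parameter_mixture(w, shards, train_one_epoch, epochs):
    """Sketch of the mixing loop.

    train_one_epoch(w, shard) is an assumed callback that returns the weight
    vector obtained after one pass over a single shard.
    """
    for _ in range(epochs):
        # every shard starts from the same mixed weight
        shard_weights = [train_one_epoch(w.copy(), shard) for shard in shards]
        # mix the shard results into one weight for the next epoch
        w = mix_parameters(shard_weights)
    return w
```

In a parallel version, the list comprehension over `shards` is the part that would be distributed across worker processes.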

Requirements

python 2.7
tqdm
numpy
scipy
joblib

Example

Training

from updater import Updater
from weight import Weight

# maximum number of epochs
epochs = 100

# maximum number of features
max_feature_num = 5

# exploration parameter
delta = 1.0

# exploitation parameter
ganma = 0.1

# number of parallel processes
parallel_num = 6

# metric that you want to optimize
# you can choose MAP or MRR
metric = "MAP"

# make training data
# x_train is a dict holding the feature vectors of each query
#	- key: qid
#	- value: feature vectors as a scipy.sparse.csr_matrix
# y_train is a dict holding the relevance labels (e.g. 5-point ratings or binary) that correspond to each feature vector
#	- key: qid
#	- value: relevance label vector
x_train, y_train = make_data()

weight = Weight(max_feature_num)

updater = Updater(delta=delta, ganma=ganma, process_num=parallel_num, metric=metric)

for _ in xrange(epochs):
	# update weight using DBGD
	updater.update(x_train, y_train, weight)
	# dump weight parameter
	weight.dump_weight("./models/dbgd")
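
The training snippet above calls `make_data()` without defining it. As a hedged illustration of the data layout described in its comments, the following sketch builds random toy data. `make_toy_data` is a hypothetical helper, and the layout of one matrix row per candidate document with binary labels is an assumption, not something this README specifies.

```python
import numpy as np
from scipy.sparse import csr_matrix


def make_toy_data(num_queries=3, docs_per_query=4, max_feature_num=5, seed=0):
    """Build random x/y dicts keyed by qid (illustrative only)."""
    rng = np.random.RandomState(seed)
    x, y = {}, {}
    for qid in range(num_queries):
        # assumed layout: one row per candidate document, one column per feature
        dense = rng.rand(docs_per_query, max_feature_num)
        x[qid] = csr_matrix(dense)
        # assumed binary relevance labels, one per row of x[qid]
        y[qid] = rng.randint(0, 2, size=docs_per_query)
    return x, y
```

Real data would instead be read from your own training files, with the same qid keys in both dicts.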

Testing

from weight import Weight
from predictor import Predictor

# make test data
x_test, y_test = make_data()

# load trained weight parameters from the model file
# the second argument is the epoch whose saved weights you want to load
weight = Weight()
weight.load_weight("./models/dbgd", 30)

predictor = Predictor()

# get result rankings for x_test
for qid, features in x_test.items():
	labels = y_test[qid]
	# ranking is a list of (true_label, case_id, score) tuples sorted by descending score
	ranking = predictor.predict_and_ranks(features, labels, weight.get_weight())
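
Given a ranking in the (true_label, case_id, score) form described above, per-query metrics can be computed directly from the label order. The sketch below is a minimal illustration; `average_precision` and `reciprocal_rank` are hypothetical helpers rather than functions from this repository, and treating any label greater than 0 as relevant is an assumption.

```python
def average_precision(ranking):
    """Average precision of one ranking of (true_label, case_id, score) tuples."""
    hits = 0
    precision_sum = 0.0
    for rank, (label, _case_id, _score) in enumerate(ranking, start=1):
        if label > 0:  # assumption: label > 0 means relevant
            hits += 1
            precision_sum += hits / float(rank)
    return precision_sum / hits if hits else 0.0


def reciprocal_rank(ranking):
    """Reciprocal rank of the first relevant item, or 0.0 if there is none."""
    for rank, (label, _case_id, _score) in enumerate(ranking, start=1):
        if label > 0:
            return 1.0 / rank
    return 0.0
```

Averaging these values over all qids in `x_test` gives MAP and MRR respectively.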
