Implementations and examples of common offline policy evaluation methods in Python. For more information on offline policy evaluation see this tutorial.
```
pip install offline-evaluation
```
```python
import pandas as pd

from ope.methods import doubly_robust
```
Get some historical logs generated by a previous policy (`action_prob` is the probability the logging policy assigned to the logged action):
```python
df = pd.DataFrame([
    {"context": {"p_fraud": 0.08}, "action": "blocked", "action_prob": 0.90, "reward": 0},
    {"context": {"p_fraud": 0.03}, "action": "allowed", "action_prob": 0.90, "reward": 20},
    {"context": {"p_fraud": 0.02}, "action": "allowed", "action_prob": 0.90, "reward": 10},
    {"context": {"p_fraud": 0.01}, "action": "allowed", "action_prob": 0.90, "reward": 20},
    {"context": {"p_fraud": 0.09}, "action": "allowed", "action_prob": 0.10, "reward": -20},
    {"context": {"p_fraud": 0.40}, "action": "allowed", "action_prob": 0.10, "reward": -10},
])
```
Define a function that computes `P(action | context)` under the new policy:
```python
def action_probabilities(context):
    epsilon = 0.10
    if context["p_fraud"] > 0.10:
        return {"allowed": epsilon, "blocked": 1 - epsilon}
    return {"allowed": 1 - epsilon, "blocked": epsilon}
```
Conduct the evaluation:
```python
doubly_robust.evaluate(df, action_probabilities)
> {'expected_reward_logging_policy': 3.33, 'expected_reward_new_policy': -28.47}
```
These estimates suggest the new policy would perform substantially worse than the logging policy. Instead of A/B testing this new policy online, it would be better to test some other policies offline first.
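For intuition, a doubly robust estimate combines a direct (reward-model) term with an inverse-propensity-weighted correction on the logged rewards; the logging-policy value above matches the mean of the logged rewards (20/6 ≈ 3.33). Below is a minimal sketch of that estimator, not the library's implementation, and the per-action mean reward model used here is a stand-in assumption for whatever reward model the library fits:

```python
def dr_estimate(df, action_probabilities, reward_model):
    """Minimal doubly robust sketch: direct-method term plus an
    importance-weighted correction based on the logged action."""
    total = 0.0
    for _, row in df.iterrows():
        new_probs = action_probabilities(row["context"])
        # Direct-method term: expected model reward under the new policy.
        dm = sum(p * reward_model(row["context"], a) for a, p in new_probs.items())
        # IPS correction: reweight the residual of the logged reward.
        weight = new_probs[row["action"]] / row["action_prob"]
        total += dm + weight * (row["reward"] - reward_model(row["context"], row["action"]))
    return total / len(df)

# Stand-in reward model (an assumption): predict the per-action mean logged reward.
action_means = df.groupby("action")["reward"].mean().to_dict()
dr_estimate(df, action_probabilities, lambda context, action: action_means.get(action, 0.0))
```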
See examples for more detailed tutorials.

Supported methods:
- Inverse propensity scoring (a minimal sketch of this estimator follows the list below)
- Direct method
- Doubly robust (paper)
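As a reference point, here is a minimal sketch of the inverse propensity scoring estimator mentioned above (not the library's code): it reweights each logged reward by the ratio of the new policy's probability of the logged action to the logging policy's probability.

```python
def ips_estimate(df, action_probabilities):
    """Inverse propensity scoring: average of logged rewards reweighted
    by new-policy probability / logging-policy probability."""
    weights = df.apply(
        lambda row: action_probabilities(row["context"])[row["action"]] / row["action_prob"],
        axis=1,
    )
    return (weights * df["reward"]).mean()

ips_estimate(df, action_probabilities)
```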