Off-policy evaluation
80 papers with code • 0 benchmarks • 0 datasets
Off-policy Evaluation (OPE), or offline evaluation in general, estimates the performance of hypothetical policies using only offline log data. It is particularly useful in applications where online interaction is high-stakes or expensive, such as precision medicine and recommender systems.
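As a concrete illustration, below is a minimal NumPy sketch of the standard inverse propensity scoring (IPS) estimator, which reweights logged rewards by the ratio of target-policy to logging-policy propensities. All variable names and the toy data are hypothetical, not taken from any of the papers listed here.

```python
import numpy as np

def ips_estimate(rewards, logged_actions, behavior_propensities, target_policy_probs):
    """Inverse propensity scoring (IPS) estimate of a target policy's value.

    rewards               : (n,) rewards observed under the logging policy
    logged_actions        : (n,) actions chosen by the logging policy
    behavior_propensities : (n,) probability the logging policy assigned to each logged action
    target_policy_probs   : (n, n_actions) target policy's action distribution per context
    """
    n = len(rewards)
    # Probability the target policy would have taken the logged action
    target_propensities = target_policy_probs[np.arange(n), logged_actions]
    # Importance weights correct for the mismatch between the two policies
    weights = target_propensities / behavior_propensities
    return np.mean(weights * rewards)

# Toy logged data: 5 rounds, 3 actions, uniform logging policy
rewards = np.array([1.0, 0.0, 1.0, 0.0, 1.0])
logged_actions = np.array([0, 2, 1, 0, 2])
behavior_propensities = np.full(5, 1 / 3)
target_policy_probs = np.tile([0.6, 0.3, 0.1], (5, 1))
print(ips_estimate(rewards, logged_actions, behavior_propensities, target_policy_probs))
```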
Benchmarks
These leaderboards are used to track progress in Off-policy evaluation
Libraries
Use these libraries to find Off-policy evaluation models and implementations
Most implemented papers
Open Bandit Dataset and Pipeline: Towards Realistic and Reproducible Off-Policy Evaluation
Our dataset is unique in that it contains a set of multiple logged bandit datasets collected by running different policies on the same platform.
Benchmarks for Deep Off-Policy Evaluation
Off-policy evaluation (OPE) holds the promise of being able to leverage large, offline datasets for both evaluating and selecting complex policies for decision making.
Off-Policy Evaluation for Large Action Spaces via Embeddings
Unfortunately, when the number of actions is large, existing OPE estimators, most of which are based on inverse propensity score weighting, degrade severely and can suffer from extreme bias and variance.
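To see why, one can compare vanilla IPS weights with weights marginalized over a lower-dimensional action embedding (the idea behind this paper's marginalized estimator). The sketch below is a simplified, hypothetical illustration assuming a categorical embedding that is a deterministic function of the action and context-independent policies.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_actions, n_embeds = 1000, 500, 10

# Each action deterministically maps to one of a few embedding categories
action_to_embed = rng.integers(n_embeds, size=n_actions)

behavior_probs = np.full(n_actions, 1 / n_actions)   # uniform logging policy
target_probs = rng.dirichlet(np.ones(n_actions))     # some target policy

logged_actions = rng.integers(n_actions, size=n)
logged_embeds = action_to_embed[logged_actions]

# Vanilla IPS weights: ratio over individual actions -> huge variance when actions are many
ips_weights = target_probs[logged_actions] / behavior_probs[logged_actions]

# Marginalized weights: ratio over the embedding observed for the logged action
target_embed_marginal = np.bincount(action_to_embed, weights=target_probs, minlength=n_embeds)
behavior_embed_marginal = np.bincount(action_to_embed, weights=behavior_probs, minlength=n_embeds)
mips_weights = target_embed_marginal[logged_embeds] / behavior_embed_marginal[logged_embeds]

print("IPS weight variance :", ips_weights.var())
print("Marginalized variance:", mips_weights.var())
```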
Optimal and Adaptive Off-policy Evaluation in Contextual Bandits
We study the off-policy evaluation problem (estimating the value of a target policy using data collected by another policy) under the contextual bandit model.
Robust Generalization despite Distribution Shift via Minimum Discriminating Information
Training models that perform well under distribution shifts is a central challenge in machine learning.
Evaluating the Robustness of Off-Policy Evaluation
Unfortunately, identifying a reliable estimator from results reported in research papers is often difficult because the current experimental procedure evaluates and compares the estimators' performance on a narrow set of hyperparameters and evaluation policies.
Doubly Robust Off-Policy Evaluation for Ranking Policies under the Cascade Behavior Model
We show that the proposed estimator is unbiased in more cases compared to existing estimators that make stronger assumptions.
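The paper's estimator targets ranking policies under a cascade click model; for intuition, here is a hedged sketch of the standard single-action doubly robust (DR) estimator it builds on, reusing the argument conventions of the IPS sketch above. The reward model `q_hat` is an assumed input, not part of the paper's code.

```python
import numpy as np

def dr_estimate(rewards, logged_actions, behavior_propensities,
                target_policy_probs, q_hat):
    """Doubly robust (DR) estimate of a target policy's value.

    q_hat : (n, n_actions) model of the expected reward for each context-action pair
    Other arguments follow the IPS sketch above.
    """
    n = len(rewards)
    # Direct-method term: expected reward of the target policy under the reward model
    direct = np.sum(target_policy_probs * q_hat, axis=1)
    # Importance-weighted correction using the reward model's residual on logged actions
    weights = target_policy_probs[np.arange(n), logged_actions] / behavior_propensities
    correction = weights * (rewards - q_hat[np.arange(n), logged_actions])
    return np.mean(direct + correction)
```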
Balanced Off-Policy Evaluation for Personalized Pricing
We consider a personalized pricing problem in which we have data consisting of feature information, historical pricing decisions, and binary realized demand.
Off-policy evaluation for slate recommendation
This paper studies the evaluation of policies that recommend an ordered set of items (e.g., a ranking) based on some context, a common scenario in web search, ads, and recommendation.
Importance Sampling Policy Evaluation with an Estimated Behavior Policy
We find that this estimator often lowers the mean squared error of off-policy evaluation compared to importance sampling with the true behavior policy or with a behavior policy estimated from a separate data set.
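A minimal sketch of the idea, assuming discrete actions indexed 0..K-1 and a logistic-regression model of the behavior policy fit on the log itself; the function and variable names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ips_with_estimated_behavior(contexts, logged_actions, rewards, target_policy_probs):
    """IPS where the logging policy's propensities are estimated from the log itself."""
    n = len(rewards)
    # Fit a model of the behavior policy: P(action | context) from the logged data
    behavior_model = LogisticRegression(max_iter=1000).fit(contexts, logged_actions)
    proba = behavior_model.predict_proba(contexts)
    # predict_proba columns follow behavior_model.classes_; map logged actions to those columns
    col_of_action = {a: j for j, a in enumerate(behavior_model.classes_)}
    cols = np.array([col_of_action[a] for a in logged_actions])
    estimated_propensities = proba[np.arange(n), cols]
    # Standard IPS, but with estimated rather than true behavior propensities
    weights = target_policy_probs[np.arange(n), logged_actions] / estimated_propensities
    return np.mean(weights * rewards)
```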