Action Centered Contextual Bandits

Authors

Kristjan Greenewald
Ambuj Tewari
Predrag Klasnja
Susan Murphy

Published on

11/09/2017

Categories

Deep Learning, Reinforcement Learning

Contextual bandits have become popular as they offer a middle ground between very simple approaches based on multi-armed bandits and very complex approaches using the full power of reinforcement learning. They have demonstrated success in web applications and have a rich body of associated theoretical guarantees. Linear models are well understood theoretically and preferred by practitioners because they are not only easily interpretable but also simple to implement and debug. Furthermore, if the linear model is true, we get very strong performance guarantees. Unfortunately, in emerging applications in mobile health, the time-invariant linear model assumption is untenable. We provide an extension of the linear model for contextual bandits that has two parts: baseline reward and treatment effect. We allow the former to be complex but keep the latter simple. We argue that this model is plausible for mobile health applications. At the same time, it leads to algorithms with strong performance guarantees as in the linear model setting, while still allowing for complex nonlinear baseline modeling. Our theory is supported by experiments on data gathered in a recently concluded mobile health study.
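
The two-part structure described above (a complex baseline reward plus a simple treatment effect) can be made concrete with a small simulation. The sketch below is not the authors' algorithm. It assumes a binary action randomized with a known probability `pi`, a hypothetical nonlinear, time-varying baseline, and a two-dimensional linear treatment effect `theta`; every name and numerical value is illustrative. Under those assumptions, weighting each reward by the centered action `(a - pi)` cancels the baseline in expectation, so the linear treatment effect can be estimated without modelling the baseline at all.

```python
import math
import random


def simulate(n=50000, pi=0.5, theta=(0.5, -1.0), seed=0):
    """Simulate reward = baseline + action * (s . theta) + noise.

    The baseline is an arbitrary nonlinear, time-varying function chosen
    only for illustration; it does not depend on the action.
    """
    rng = random.Random(seed)
    rows = []
    for t in range(n):
        x = rng.uniform(-1.0, 1.0)
        s = (1.0, x)                        # treatment-effect features
        a = 1 if rng.random() < pi else 0   # randomized binary action
        baseline = 2.0 * math.sin(5.0 * x) + math.sin(t / 1000.0)
        effect = a * (theta[0] * s[0] + theta[1] * s[1])
        r = baseline + effect + rng.gauss(0.0, 0.1)
        rows.append((s, a, r))
    return rows


def estimate_theta(rows, pi=0.5):
    """Estimate a 2-d theta from action-centered rewards.

    Uses E[(a - pi) * r * s | s] = pi * (1 - pi) * s s^T theta, which holds
    because the baseline is multiplied by E[a - pi] = 0. Solves the
    resulting 2x2 linear system M theta = b with Cramer's rule.
    """
    m00 = m01 = m11 = b0 = b1 = 0.0
    w = pi * (1.0 - pi)
    for s, a, r in rows:
        m00 += w * s[0] * s[0]
        m01 += w * s[0] * s[1]
        m11 += w * s[1] * s[1]
        c = (a - pi) * r                    # centered action times reward
        b0 += c * s[0]
        b1 += c * s[1]
    det = m00 * m11 - m01 * m01
    return ((m11 * b0 - m01 * b1) / det, (m00 * b1 - m01 * b0) / det)


if __name__ == "__main__":
    data = simulate()
    print("estimated theta:", estimate_theta(data))
```

Running the script prints an estimate that should land near the simulated `theta` even though the estimator never sees the baseline function. The price is variance: the larger the baseline, the noisier the centered estimate, which is why the sketch uses many samples.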

Please cite our work using the BibTeX below.

@misc{greenewald2017action,
    title={Action Centered Contextual Bandits},
    author={Kristjan Greenewald and Ambuj Tewari and Predrag Klasnja and Susan Murphy},
    year={2017},
    eprint={1711.03596},
    archivePrefix={arXiv},
    primaryClass={stat.ME}
}