infomercial
Contains code for "Curiosity eliminates the exploration-exploitation dilemma", bioRxiv 671362v8 (2020).
In this paper we present an alternative interpretation of the classic but intractable exploration-exploitation dilemma. We prove that the key to finding a tractable solution is unintuitive: explore without considering reward value.
The exploration-exploitation dilemma is summarized by a simple question: “Should I exploit an available reward, or explore to try out a new uncertain action?” Unfortunately, it’s been proven that this dilemma, when stated as a mathematical problem, is intractable and so can’t be solved directly. This fundamentally limits our ability to predict optimal naturalistic behavior during foraging and exploration, and to optimally drive learning in artificial agents.
To overcome this field-wide limitation, we took a fresh look at the dilemma. Our approach was simple: when one mathematical problem can’t be solved, it often helps to find a related problem that can be, and to use that to make progress on both.
We show, for the first time, that nearly any dilemma problem can also be viewed as a competition between exploiting known rewards and exploring to learn for its own sake. We prove this competition can be perfectly solved using the famous win-stay, lose-switch strategy from game theory. To ensure this solution is as broad as possible, we also derived a new universal theory of information value which complements, but is independent of, Shannon’s Information Theory.
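As a rough illustration of the competition described above, here is a minimal sketch of a win-stay, lose-switch style rule on a toy bandit. It is not the code from this repo or the paper's exact algorithm. The names `choose_mode`, `run_bandit`, and `boredom`, the use of the change in a value estimate as a stand-in for information value, and the "try the least-sampled arm" exploration rule are all assumptions made for this example.

```python
import random


def choose_mode(info_value, reward_value, boredom=0.0):
    """Pick the mode whose most recent payoff was larger.

    Hypothetical rule: explore only while the last information value,
    minus a small `boredom` threshold, beats the last reward value.
    Ties go to exploitation.
    """
    if info_value - boredom > reward_value:
        return "explore"
    return "exploit"


def run_bandit(probs, steps=200, lr=0.1, boredom=0.01, seed=0):
    """Run the rule on a Bernoulli bandit with payout probabilities `probs`."""
    rng = random.Random(seed)
    estimates = [0.0] * len(probs)  # running value estimate per arm
    counts = [0] * len(probs)       # number of pulls per arm
    info_value, reward_value = 1.0, 0.0  # assumption: start out curious
    modes = []
    for _ in range(steps):
        mode = choose_mode(info_value, reward_value, boredom)
        if mode == "explore":
            # Exploration ignores reward: try the least-sampled arm.
            arm = min(range(len(probs)), key=lambda a: counts[a])
        else:
            # Exploitation: pull the arm with the best current estimate.
            arm = max(range(len(probs)), key=lambda a: estimates[a])
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        old = estimates[arm]
        estimates[arm] += lr * (reward - old)
        counts[arm] += 1
        # Stand-in for information value: how much the estimate changed.
        info_value = abs(estimates[arm] - old)
        reward_value = reward
        modes.append(mode)
    return estimates, modes


if __name__ == "__main__":
    estimates, modes = run_bandit([0.2, 0.8])
    print("final estimates:", estimates)
    print("explore steps:", modes.count("explore"))
```

The point of the sketch is that the explore branch never looks at reward, and the two modes are compared only through their most recent payoffs.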
Dependencies:
- A standard Anaconda install
- PyTorch (>= 4.1)

To install, clone this repo and then run, from the repo's top-level directory:
pip install -e .
All experiments can be (re)run from the top-level Makefile in the infomercial repo.
For analysis see ./notebooks/
For paper figures see ./figures/