Definition of the debate game
Here is the debate game as posed in “AI safety via debate” [1]:
We will initially consider a question-answering setting … We have a set of questions $Q$, answers $A$, and debate statements $S$. The simplest version of debate has two agents competing to convince a human judge:
- A question $q \in Q$ is shown to both agents.
- The two agents state their answers $a_0, a_1 \in A$ (which may be the same).
- The two agents take turns making statements $s_0, s_1, \dotsc, s_{n-1} \in S$.
- The judge sees the debate $(q, a, s)$ and decides which agent wins.
- The game is zero sum: each agent maximizes their probability of winning.
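The steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the paper: the names `Agent`, `play_debate`, `payoffs`, and `toy_judge` are hypothetical, questions, answers, and statements are modelled as plain strings, and the sketch assumes strict alternation with agent 0 speaking first (the definition only says the agents "take turns"). The $+1$/$-1$ payoff is one convenient way to encode the zero-sum condition.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical stand-ins for the abstract sets Q, A, and S.
Question = str
Answer = str
Statement = str
Answers = Tuple[Answer, Answer]


@dataclass
class Agent:
    # Maps a question q to this agent's answer a_i.
    answer: Callable[[Question], Answer]
    # Maps (q, (a_0, a_1), statements so far) to the next statement.
    speak: Callable[[Question, Answers, List[Statement]], Statement]


# The judge sees (q, a, s) and returns the index (0 or 1) of the winner.
Judge = Callable[[Question, Answers, List[Statement]], int]


def play_debate(q: Question, agents: Tuple[Agent, Agent], judge: Judge, n: int):
    """Run one debate with n statements and return (winner, a, s)."""
    # Both agents see the question and state their answers.
    a = (agents[0].answer(q), agents[1].answer(q))
    # Agents take turns; assumption: agent 0 makes the even-numbered statements.
    s: List[Statement] = []
    for i in range(n):
        s.append(agents[i % 2].speak(q, a, s))
    # The judge sees the whole debate and picks a winner.
    winner = judge(q, a, s)
    return winner, a, s


def payoffs(winner: int) -> Tuple[int, int]:
    """One zero-sum encoding: +1 for the winner, -1 for the loser."""
    return (1, -1) if winner == 0 else (-1, 1)


# Toy usage: a judge that already knows the answer, purely for illustration.
honest = Agent(answer=lambda q: "4", speak=lambda q, a, s: "2 + 2 = 4")
liar = Agent(answer=lambda q: "5", speak=lambda q, a, s: "trust me")


def toy_judge(q: Question, a: Answers, s: List[Statement]) -> int:
    return 0 if a[0] == "4" else 1


winner, a, s = play_debate("What is 2 + 2?", (honest, liar), toy_judge, n=4)
```

In this toy run the judge simply checks the first answer, which a real human judge could not do in general; the point of the sketch is only the order of play and the information each party sees.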
The next article in the sequence is Reasons to worry about AI safety via debate.
Endnotes
1. G. Irving, P. Christiano, and D. Amodei, “AI safety via debate.” arXiv, Oct. 22, 2018. doi: 10.48550/arXiv.1805.00899.