Definition of the debate game

Here is the debate game as posed in “AI safety via debate” [1]:

We will initially consider a question-answering setting … We have a set of questions $Q$, answers $A$, and debate statements $S$. The simplest version of debate has two agents competing to convince a human judge:

  1. A question $q \in Q$ is shown to both agents.
  2. The two agents state their answers $a_0, a_1 \in A$ (which may be the same).
  3. The two agents take turns making statements $s_0, s_1, \dotsc, s_{n-1} \in S$.
  4. The judge sees the debate $(q, a, s)$ and decides which agent wins.
  5. The game is zero sum: each agent maximizes their probability of winning.
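
The five steps above can be sketched as a short Python loop. This is only an illustrative sketch, not code from the paper: the names `Agent`, `play_debate`, `make_toy_agent`, and `toy_judge` are hypothetical, questions, answers, and statements are modelled as plain strings, agent 0 is assumed to speak first with strict alternation, and the zero-sum payoff is assumed to be +1 for the winner and -1 for the loser.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical types for illustration: q, answers, and statements are strings.
Answers = Tuple[str, str]


@dataclass
class Agent:
    # Maps a question q to this agent's answer a_i.
    answer: Callable[[str], str]
    # Maps (q, answers, statements so far) to this agent's next statement.
    speak: Callable[[str, Answers, List[str]], str]


# The judge maps the full debate (q, a, s) to the index of the winning agent.
Judge = Callable[[str, Answers, List[str]], int]


def play_debate(q: str, agents: Tuple[Agent, Agent], judge: Judge, n: int):
    """Play one debate with n statements and return (a, s, winner, payoffs)."""
    # Steps 1-2: both agents see q and state their answers.
    a = (agents[0].answer(q), agents[1].answer(q))
    # Step 3: agents take turns (assumption: agent 0 goes first).
    s: List[str] = []
    for t in range(n):
        s.append(agents[t % 2].speak(q, a, list(s)))
    # Step 4: the judge sees (q, a, s) and picks a winner.
    winner = judge(q, a, s)
    # Step 5: zero sum (assumption: +1 for the winner, -1 for the loser).
    payoffs = (1, -1) if winner == 0 else (-1, 1)
    return a, s, winner, payoffs


def make_toy_agent(index: int, fixed_answer: str) -> Agent:
    """A toy agent that always gives the same answer and restates it."""
    return Agent(
        answer=lambda q: fixed_answer,
        speak=lambda q, a, s: f"agent {index}, statement {len(s)}: my answer is {a[index]}",
    )


def toy_judge(q: str, a: Answers, s: List[str]) -> int:
    """A toy judge that favours agent 0 exactly when agent 0 answered '4'."""
    return 0 if a[0] == "4" else 1


agents = (make_toy_agent(0, "4"), make_toy_agent(1, "5"))
a, s, winner, payoffs = play_debate("What is 2 + 2?", agents, toy_judge, n=4)
```

In this toy run the judge is a fixed rule rather than a human, and the agents do not adapt to each other. In the setting quoted above, each agent would instead choose its answer and statements to maximize its probability of being picked by the judge.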

The next article in the sequence is Reasons to worry about AI safety via debate.

Endnotes

[1] G. Irving, P. Christiano, and D. Amodei, “AI safety via debate.” arXiv, Oct. 22, 2018. doi: 10.48550/arXiv.1805.00899.