
Definition of the debate game

Here is the debate game as posed in "AI safety via debate" [1]:

We will initially consider a question-answering setting … We have a set of questions $Q$, answers $A$, and debate statements $S$. The simplest version of debate has two agents competing to convince a human judge:

  1. A question $q \in Q$ is shown to both agents.
  2. The two agents state their answers $a_0, a_1 \in A$ (which may be the same).
  3. The two agents take turns making statements $s_0, s_1, \dotsc, s_{n-1} \in S$.
  4. The judge sees the debate $(q, a, s)$ and decides which agent wins.
  5. The game is zero sum: each agent maximizes their probability of winning.
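
The five steps above can be sketched in Python. This is a minimal illustration, not code from the paper: the names `Transcript`, `Debater`, `ScriptedDebater`, `play_debate`, and `longest_statement_judge` are all hypothetical. It also makes three assumptions the definition leaves open: agent 0 speaks first, the number of statements $n$ is fixed in advance, and the zero-sum outcome is encoded as payoffs of $+1$ for the winner and $-1$ for the loser. The toy judge is only a stand-in for the human judge.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Protocol, Tuple


@dataclass
class Transcript:
    """The debate (q, a, s) that the judge sees."""
    question: str
    answers: Tuple[str, str]
    statements: List[str] = field(default_factory=list)


class Debater(Protocol):
    """Hypothetical agent interface, assumed for this sketch."""
    def answer(self, question: str) -> str: ...
    def statement(self, transcript: Transcript) -> str: ...


# A judge maps a finished transcript to the index of the winning agent (0 or 1).
Judge = Callable[[Transcript], int]


def play_debate(question: str, agents: Tuple[Debater, Debater],
                judge: Judge, n_statements: int) -> Tuple[Transcript, Tuple[int, int]]:
    # Steps 1-2: both agents see the question and state their answers.
    answers = (agents[0].answer(question), agents[1].answer(question))
    transcript = Transcript(question, answers)
    # Step 3: agents alternate statements (assumption: agent 0 goes first).
    for i in range(n_statements):
        transcript.statements.append(agents[i % 2].statement(transcript))
    # Step 4: the judge sees the whole debate and picks a winner.
    winner = judge(transcript)
    if winner not in (0, 1):
        raise ValueError("judge must return 0 or 1")
    # Step 5: zero sum (assumption: encoded as +1 for the winner, -1 for the loser).
    payoffs = (1, -1) if winner == 0 else (-1, 1)
    return transcript, payoffs


class ScriptedDebater:
    """Toy agent that gives a fixed answer and cycles through canned statements."""
    def __init__(self, answer: str, lines: List[str]):
        self._answer = answer
        self._lines = list(lines)
        self._next = 0

    def answer(self, question: str) -> str:
        return self._answer

    def statement(self, transcript: Transcript) -> str:
        line = self._lines[self._next % len(self._lines)]
        self._next += 1
        return line


def longest_statement_judge(transcript: Transcript) -> int:
    """Toy judge: the agent with the longest single statement wins; ties go to agent 0."""
    longest = [0, 0]
    for i, s in enumerate(transcript.statements):
        longest[i % 2] = max(longest[i % 2], len(s))
    return 0 if longest[0] >= longest[1] else 1
```

A possible usage, with two scripted agents debating a trivial question:

```python
alice = ScriptedDebater("4", ["2 + 2 = 4 by counting"])
bob = ScriptedDebater("5", ["no"])
transcript, payoffs = play_debate("What is 2 + 2?", (alice, bob),
                                  longest_statement_judge, n_statements=4)
```

Because the payoffs always sum to zero, maximizing expected payoff under this encoding is equivalent to maximizing the probability of winning, which is the objective stated in step 5.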

