Reasons to worry about AI safety via debate


The previous article in the sequence is Definition of the debate game.

From Section 5, “Reasons to worry”,1 of AI safety via debate2:

We turn next to several reasons debate could fail as an approach to AI alignment. These include questions about

  • training target (whether humans are sufficient judges to align debate),
  • capability (whether debate makes agents weaker),
  • our ability to find strong play in practice using ML algorithms, and
  • theoretical and security concerns.

Reasons debate could fail? Is that all? I’m far more pessimistic. I’ll go further and make this claim:

Independent of AI alignment, debate is a deeply flawed truth-seeking technique.

So, before proposing debate as a method to help with AI alignment, it seems wiser to start with a simpler goal. I will phrase it as a question:

What interaction patterns are useful for truth-seeking?

I unpack this question in Interaction patterns for truth-seeking, the next article in this sequence.

Endnotes

1. I reformatted the first two sentences of Section 5 to include bullet points, for clarity.

2. G. Irving, P. Christiano, and D. Amodei, “AI safety via debate.” arXiv, Oct. 22, 2018. doi: 10.48550/arXiv.1805.00899.