Downsides of debate
The previous article in the sequence is Interaction patterns for truth-seeking.
Long before reading AI safety via debate,1 I looked unfavorably upon debate. I did (and still do) view structured debate as a suboptimal mechanism for truth-seeking, for many reasons:
- How can one get the incentives right when debates are judged to have one winner, often determined by some kind of score-keeping? I don’t see a good mapping between choosing a winner and seeking the truth. The authors allow for agents to agree, but how often does this occur? I need to think more about the incentives for agents.
- Seeking the truth isn’t always about finding the right answer. Some questions need to be reframed. Some questions need to be thrown out altogether.
- Seeking the truth is a process. If you think like a Bayesian, truth-seeking is a process of weighing evidence to update one’s mental model (see the first sketch after this list). One may never complete the journey; one simply runs out of data or time, or has more pressing questions to answer.
- How are the debate participants calibrated? Many debates seem to treat the participants as experts. This is a large motivator for the use of debate in AI safety via debate: the human judge doesn’t have enough expertise and thus needs to rely on sparring debaters. This is fine as long as there are some mechanisms to calibrate and error-correct the participants. In my view, such calibration and error-correction are at the very crux of the viability of AI safety via debate. I suspect the authors may agree, but they don’t state this crux as clearly as they should, nor do they openly admit just how uncertain it is!
- I may have missed something in the paper, but I find it jarring that the human judge waits until the end to weigh in. Why not involve the person during the interaction?
- For a debate to be pragmatically useful, the participating agents must serve the needs of the person. When they are uncertain about the human’s goals, they should ask. I don’t see mention of such question-asking in the paper, though I may have overlooked it.
- Preference elicitation is hard, even theoretically! Sparring agents need to ask smart questions to understand the human. This involves substantive questions (e.g. personal preferences about potential options) as well as procedural ones (e.g. personal preferences about how much time should be allotted to investigating the decision space).
- When I look at the linear turn-taking of the debate game, I’m struck by how particular it is. Why did this interaction pattern get privileged? What other patterns did the authors consider?
- Did the paper place any emphasis on probabilistic decision-making? I don’t recall seeing any. Why not? It is clear that agents have differing information sources (and, perhaps, reasoning processes) which drive their conclusions. Shouldn’t each agent’s conclusions be probabilistic? If so, why throw away this information? Declaring a winner reduces a probabilistic estimate to a single bit: win or lose (see the second sketch after this list).
- Does the model allow for convergence of beliefs? To some degree: the proposed model allows agents to agree. But we should be clear about what Bayesian agreement would look like: a situation where agents use mutually shared information to update their priors (see the third sketch after this list). Is this happening among debating agents?
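To make the “process” point concrete, here is a minimal sketch of Bayesian updating on a toy question: two hypotheses about a coin, with the belief revised one observation at a time. The hypotheses, probabilities, and observations are all hypothetical; the point is only that the update loop stops when the data runs out, not when a winner is declared.

```python
# A minimal sketch of truth-seeking as incremental Bayesian updating.
# Two hypotheses about a coin: fair (P(heads)=0.5) vs. biased (P(heads)=0.8).
# The belief is updated one observation at a time.

def update(prior_fair: float, heads: bool) -> float:
    """Return the posterior probability of the 'fair coin' hypothesis."""
    p_fair, p_biased = (0.5, 0.8) if heads else (0.5, 0.2)
    evidence = prior_fair * p_fair + (1 - prior_fair) * p_biased
    return prior_fair * p_fair / evidence

belief = 0.5  # start undecided between the two hypotheses
observations = [True, True, False, True, True, True]  # hypothetical flips
for heads in observations:
    belief = update(belief, heads)
    print(f"P(fair) = {belief:.3f}")
```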
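On the one-bit point, the sketch below compares the Shannon entropy of a hypothetical probabilistic conclusion over several candidate answers with the entropy of a win/lose verdict. The numbers are made up; the comparison only illustrates how much a binary outcome can discard.

```python
# A rough sketch of the information discarded when a probabilistic
# conclusion is collapsed to a binary verdict. All numbers are hypothetical.
import math

def entropy(dist):
    """Shannon entropy (in bits) of a discrete distribution."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

# An agent's probabilistic conclusion over four candidate answers.
agent_estimate = [0.55, 0.25, 0.15, 0.05]

# The verdict keeps only whether the agent's top answer won or lost.
verdict = [agent_estimate[0], 1 - agent_estimate[0]]

print(f"probabilistic estimate: {entropy(agent_estimate):.2f} bits")
print(f"win/lose verdict:       {entropy(verdict):.2f} bits (never more than 1)")
```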
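And on Bayesian agreement, the toy sketch below has two agents start from a common prior, each holding private evidence expressed as likelihood ratios. They reach the same posterior only once the evidence is actually pooled, which is the kind of convergence I would want to see among debating agents. Again, all numbers are hypothetical.

```python
# A toy sketch of Bayesian agreement: two agents share a prior, hold
# different private evidence, and converge only after sharing it.

def posterior(prior: float, likelihood_ratios) -> float:
    """Update P(H) on independent pieces of evidence, each given as
    the likelihood ratio P(e|H) / P(e|not H)."""
    odds = prior / (1 - prior)
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1 + odds)

prior = 0.5
evidence_a = [3.0, 2.0]   # agent A's private likelihood ratios
evidence_b = [0.5, 4.0]   # agent B's private likelihood ratios

print(f"A alone: {posterior(prior, evidence_a):.3f}")
print(f"B alone: {posterior(prior, evidence_b):.3f}")
# After sharing, both condition on the pooled evidence and agree.
print(f"shared:  {posterior(prior, evidence_a + evidence_b):.3f}")
```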
I offer an alternative in Better than debate, the next article in this sequence.
Endnotes
1. G. Irving, P. Christiano, and D. Amodei, “AI safety via debate.” arXiv, Oct. 22, 2018. doi: 10.48550/arXiv.1805.00899.