- The title got me, I'll admit it—except that the benchmark is a game where the models are told to lie.
- Disclaimer: I work at Kradle.
They were never told to lie: one AI is given more information than the others, and the goal of the experiment is to understand how they're gonna leverage that advantage.
Indeed, the selfish (optimal?) strategy is to lie, yet some decide to tell the truth anyway. That's why it's an interesting benchmark! More info in the research article: https://kradle.ai/research/four-bridges (released before Fable)
- Had to Google this to learn more. For those who are interested: https://kradle.ai/research/four-bridges
- It's unclear to me whether they were actually told to lie or just told to survive / convince others. Either way it is somewhat coerced, but I think there is still a difference.
- The optimal/selfish strategy is indeed to lie, but they're never pushed in that direction. Some AIs decide to reveal the information, some decide to say nothing, some actively lie and push others to their death...
- I find it deeply funny, and I suppose a bit expected, that a Grok model appears at face value to be optimized for supposed truth-telling.
And to keep the e-mob off my back, I don't endorse Elon Musk.