- Hah, since it's open in another tab: Talk Isn’t Always Cheap: Understanding Failure Modes in Multi-Agent Debate @ https://arxiv.org/html/2509.05396v2
- More tasks get "done" while rework is sky high and overall throughput to production drops.
First, I'd like to thank all the people working on testing and doing the lord's work.
Anyway, this isn't even a pattern unique to LLM use. We've all seen this exact same thing when more devs are added to a project running late, when teams are siloed, when work is outsourced to contractors, etc.
- I'm curious, LLMs have been around for a while now...
How many of you would say you need LLMs now for work? Not that you want it because it's nice to have, but rather you would literally not be able to do your job at all because you don't have an LLM to use?
If your company said "We're not paying for LLMs anymore.", would you begrudgingly pay for or host your own LLM that complies with company policies, or just go back to writing everything by hand?
I feel like companies could definitely just push the cost of LLMs back onto the engineers themselves (much like how people have to pay for their own gas to go to work), and engineers would have no choice but to either buy their own subscriptions or be very good at writing code by hand just to stay competitive.
This kind of shift is coming, partly because the costs of LLMs are unsustainable for companies, but also because it sounds like the kind of diabolical idea some upper management thinks they can get away with, as peer pressure will naturally do its thing. Paying for your own token usage is a small price to pay for job security, isn't it?
- I'm an embedded systems developer. I have almost fully "outsourced" the Python code for frontend pc software that interacts with my firmware.
I deliberately continue to write all my firmware by hand, and will occasionally consult AI for review. I never use AI to write prose for me.
Python is better represented in training data, writing bench software was a bit boring, and this way I get to spend more time where I have (and continue to build) domain knowledge.
Agentic Opus is a nice to have and I get to explore the frontier tech, but if (or when) it's taken away, a self hosted coding model would be fine - I'd just have to dust off my Python skills and it would take longer.
- People are still figuring things out, there's a lot of wasted tokens, etc.
This is like complaining a student isn't as productive as a senior engineer.
I think we as an industry haven't even graduated to junior level when it comes to figuring out how to use AI to improve things.
- This is discussed in the article, and I think the author makes pretty reasonable arguments for why, by its nature, the reliability of LLM usage will not improve. They also discuss what I agree is the more effective way to use an LLM: as a feedback and refinement tool, not a decision maker.