The unbearable cheapness of open weight models

181 points by ddxv 1 day ago | 169 comments

Tuna-Fish
> What worries me about this is that Anthropic and OpenAI seem to have backed themselves into a corner of high costs. Can they reasonably decrease their prices by 20-50x to compete with DeepSeek or Xiaomi’s Mimo?
They have high prices, not high costs. They will obviously keep prices as high as they can for as long as they can, while keeping demand up. Once demand starts to fall, so will the prices.
> Are these models cheap because they are open weight and having hundreds of people stress test running them on different hardware helped to lower the cost? Or is it that they are being provided as loss leaders to drive the prices down?
Neither. They are cheap because they have neither technical edge nor brand power to keep the prices high, and so have to ask commodity prices for them.
People somehow still don't get it, despite everyone who studies the economics of it telling them: Inference is dirt cheap. Training is expensive, inference is cheap, and getting cheaper.
- amanaplanacanal
  You can't just define training as "not a cost". Without training they have nothing to sell.
- manwithopinions
  So why are they losing so much money?
  Money is made on the subset of inference that is charged at cost + margin via their APIs. API usage is so high because customers are still finding their feet, trying to understand how to measure the value they get from their spend, erring on the side of spend.
  Yes, in a world of unmeasured value and tokenmaxxing, inference is profitable on SOTA models because all capacity is being consumed at all times, driving down marginal costs, but what about a world in which capacity isn’t constrained? There are still huge fixed costs.
  Even the most optimistic leaks with the current high prices put the margin on API token inference at around 50%. How can SOTA models ever come close to competing on price? Price always matters. Offering the best model with the most brand recognition does not exempt OpenAI from the basic rules of business.
  Historically, software has been such a successful business because the margins are incredible, 95%+ in many cases, driven by direct measurable value to customers that dwarfs the cost. A 50% margin at a time when your customers are falling over themselves to spend as much money as they can is not a good sign, it is a very bad sign, it leaves no room to ever achieve traditional technology margins, and inevitably leads to very weak margins.
  Inference needs to become an order of magnitude cheaper than the value it delivers to ever have a chance of delivering on this wildly profitable vision. The cheap model providers have a much better chance of achieving that.
  Outside of coding, almost every business case for AI doesn’t need above human intelligence, it doesn’t even need human intelligence, or half a human intelligence, a business can extract a lot of value from a machine that has a fraction of a human’s intelligence. Most human work does not use our intelligence, it is rote, a monkey could do it, and that’s where AI will be used most. Who is going to pay $10 per million tokens when they could pay $0.10 to get the same outcomes?
  Tuna-Fish
  > So why are they losing so much money?
  Mostly training. Claude didn't just get to be so good at coding by magic, it was suddenly so good because they did truly staggering amounts of RLHF and RLAIF on it. They are still doing that today, on any tasks they can figure out how to evaluate it on. This is capex for them.
  Their margins on inference are >90% today for tokens they sell (plans are hard to count, but still profitable). Based on what we know of its size and architecture, running Opus is not more than 2x more expensive than running Deepseek v4 pro, for which tokens are available at under 10% of the cost of Opus. Again, the reason their margins are 50% is that they are spending so much on things that are not inference, not that inference is expensive.
  > The cheap model providers have a much better chance of achieving that.
  Anthropic can do it with a push of a button, once they calculate that it will provide them better profit than current pricing.
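  A rough way to check the 2x and 10% figures above is to turn them into a floor on the implied margin. This is only a sketch: the function name is made up for illustration, and it assumes the cheaper model is sold at or above its own serving cost, which the thread does not establish.

```python
def implied_margin_floor(cost_ratio: float, price_ratio: float) -> float:
    """Lower bound on the expensive model's gross margin.

    cost_ratio:  expensive model's serving cost / cheap model's serving cost
    price_ratio: cheap model's price / expensive model's price
    Assumption (not from the thread): the cheap model is priced at or
    above its own serving cost.
    """
    # cost_big <= cost_ratio * cost_cheap
    #          <= cost_ratio * price_cheap
    #          =  cost_ratio * price_ratio * price_big
    cost_share_of_price = cost_ratio * price_ratio
    return 1.0 - cost_share_of_price


# Figures quoted in the comment: at most 2x the serving cost,
# cheap tokens at under 10% of the expensive model's price.
floor = implied_margin_floor(cost_ratio=2.0, price_ratio=0.10)
print(f"implied margin floor: {floor:.0%}")
```

  Under these assumptions the floor is about 80%; getting above 90% would also require the cheaper provider's price to sit well above its own serving cost.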
  vrighter
  you forgot that to not have a knowledge cutoff and fall behind, you need to always be training new models. It matters jack shit if inference is cheap, if you are forced to do training anyway to stay "competitive"
  tliltocatl
  Depends on the application, of course. For "google replacement" they are trying to sell it as - it's absolutely essential, but even then it works like crap. For coding… Maybe it's not so essential? Yea, it would not know about latest libraries, but maybe that's not much of a problem? Then, of course, it is a big question if LLM code generation is worth it at all.
  mikae1
  How much does the cutoff matter when models can JFGI?
  overfeed
  The AI companies are attempting to displace Google for that sweet ad revenue. Relying on another company's search index means they can squeeze you later for a bigger slice of the pie.
  CSSer
  I think people miss this because these companies exist in a space that is new in tech, and that means lots of competition through PR and marketing. When that happens, it’s easy to feel like a company is telling you about everything they’ve been working on or are openly talking about what gives them their edge when in fact the opposite is often true.
  manwithopinions
  > Their margins on inference are >90% today for tokens they sell (plans are hard to count, but still profitable).
  That doesn’t make any sense, it doesn’t add up. Have you seen how much money they’re raising and burning? We know that training does not cost tens of billions.
  Brockman said OpenAI expects to spend $50 billion on compute this year. OpenAI’s revenue run rate is less than $50 billion for this year! For 90% margins to be possible on inference, you are suggesting that less than $5 billion of that compute spend is inference and over $45 billion of that compute is training.
  Anthropic have been desperately trying to juggle capacity by shaping user behavior through peak time usage limits because they are struggling with capacity for inference.
  Plan based usage is widely acknowledged to be subsidized, you are probably the only person on earth suggesting that plans are profitable.
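  The arithmetic behind the $5 billion / $45 billion split can be spelled out as a quick sketch. The $50 billion figures are the ones quoted above; treating all revenue as inference revenue is a simplifying assumption made only for illustration, and the function name is hypothetical.

```python
def max_inference_spend(revenue: float, margin: float) -> float:
    """Largest inference cost compatible with a given gross margin on that revenue."""
    return revenue * (1.0 - margin)


compute_budget = 50e9  # total compute spend quoted in the comment
revenue = 50e9         # upper bound on the revenue run rate quoted in the comment

# Assumption: every dollar of revenue comes from inference.
inference_cap = max_inference_spend(revenue, margin=0.90)
training_floor = compute_budget - inference_cap

print(f"inference spend at most ${inference_cap / 1e9:.1f}B")
print(f"training spend at least ${training_floor / 1e9:.1f}B")
```

  Lowering `margin` in the call shows how quickly the implied training share shrinks if inference margins are closer to the 50% figure mentioned earlier in the thread.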
  winternewt
  Capacity for inference isn't a cost issue, it's an availability issue. There just isn't enough hardware out there.
  hakfoo
  > Outside of coding, almost every business case for AI doesn’t need above human intelligence, it doesn’t even need human intelligence, or half a human intelligence, a business can extract a lot of value from a machine that has a fraction of a human’s intelligence.
  What the business world actually needed isn't intelligence, it's VBA with a bit of polish on it.
  Yeah, people want tools to distill reports, and puff nonsense into bigger nonsense, but to a remarkable degree, this doesn't require an LLM. In fact, the alternatives might be preferable by offering more consistency/repeatability and efficiency (I am so sick of watching an LLM babble for 5 minutes on something a regex would do in 5 seconds).
  The genius of the LLM industry is that it avoids "programming anxiety" by hiding it behind a friendly-ish UI and not calling it programming. It's another in the string of innovations like "hide the file system" and "removing user programmability" so we can sell it back to you.
  dada216
  I have deployed very successful LLM based software that reads sales people emails and inserts orders, or stuff akin to orders, in the rest of the systems. Can you write me a regex that parses a messy human email thread and produces a clean JSON with all the order details? It's been working for a year with less than 3% error rate, better than what the humans themselves were doing.
  chairmansteve
  Interesting. A web form completed by the sales people would not have sufficed? I guess it's the backend interfaces to multiple systems that's the difficult part?
  Onavo
  They are overly bloated organizations too. The human costs are tremendously high. Good luck finding ML engineers paid 7-8 figs USD in Asia. Same quality engineers, different market.
- Schiendelman
  I get it! And I appreciate people like you pointing out the business side of LLMs.
  Also, these open weight models are significantly lower quality than the high end coding models, and for some reason a lot of people think they're exactly the same. Maybe engineers who only dabble in LLM usage aren't doing enough complex work to notice...?
- uejfiweun
  What are you even talking about? Everyone knows that Anthropic is drastically subsidizing their plans. It's actually the exact opposite of what you're talking about. The costs are extremely high and the prices are actually what's being subsidized and cheap right now.
  Tuna-Fish
  This is an example of common knowledge that is wrong. People look at their cash burn, assume that they spend this to subsidize inference, and get bonkers answers. Inference is not their largest expense.
  Inference is cheap. Anthropic is only drastically subsidizing their plans if you count their training expenses as part of their costs.
  dathanb82
  If "inference is cheap," why is OpenAI spending a ton getting Broadcom to design custom AI chips that make inference cheaper? Reports suggest their custom silicon isn't all that good for training, it's all to make inference more efficient. That shouldn't be necessary if inference is already quite cheap.
  petra
  A large part of the market will be ad based. For that, having the lowest cost inference is useful.
  Also, for agents doing R&D, cheaper tokens allow doing more, which is always good.
  therobots927
  Are you an anthropic insider or something? Because if you are you should delete this comment. If you aren’t then you don’t know what the hell you’re talking about.
  rcxdude
  For one point, you can look at the costs of similarly sized open-source models from inference providers (which are only making money on the markup on the compute), and compare with anthropic's prices. There's a pretty big price difference there and it would be hard to believe that anthropic's models are that much more expensive to run than those models.
  xquce
  Surely the same can be said for the people saying the opposite?
  therobots927
  I didn’t make a claim. The parent explicitly said it was a misconception that inference is not profitable.
  No one knows if it’s profitable or not so we’re left to speculate.
  Bnjoroge
  You can make a fairly decent assumption by calculating the margin on serving glm 5.2, and adding say 30% extra costs and it still leaves a healthy margin
  therobots927
  Where are you getting 30% from
  Bnjoroge
  It was a rough heuristic for how much more opus/5.5 would presumably cost if you extrapolate from glm5.2 prices. In any case input tokens for 5.4/4.6 are 70-100% more expensive, cached about 3-100%, and output tokens anywhere from 60%-240% more as per all their current api pricing. I highly doubt 5.4/4.6 are that much more expensive to serve given how cheap and commoditized inference has become, and how comparable they are performance-wise.
  Bnjoroge
  I don't think that’s accurate. I mean look at how much more expensive frontier closed source models are vs something like glm 5.2 which is just about as good. Serving glm is really cheap, and high margin. Obviously no one knows just how much their inference costs, but if we assume that opus/gpt are maybe 15-20% more parameters than glm 5.2, then it makes no sense for them to charge so much more than glm.
Jackobrien
The giants knew this was coming, and soon 95% of AI tasks will be able to be done by open models (coding, research, cowork style work). So why pay a premium? Why use them at all? This leaves the labs with two options:
1) push the frontier in a way only massive scale can, and cash in on it (mythos level cyber security, recursive training, frontier science work). There’s big money for never before possible capabilities.
2) own the app layer with their edge in reputation and powered by their infrastructure. Be apple where everyone else is Linux. Do design, coding, research, SMBs, legal, finance, healthcare and more (they are doing all of this).
Will it be enough to justify a Google level valuation? We’ll see how fast they can push it.
- fredley
  3) Buy all the RAM, increasing the barrier to entry to push back the tide a bit, in time for a juicy IPO.
  clickety_clack
  4) Make it illegal to use anything but regulated models.
  rectang
  License the training corpus and encourage copyright suits against outputs from models trained on unlicensed corpora.
  amanaplanacanal
  This won't work if the courts decide that training is fair use, which certainly seems the direction they are going.
  rectang
  Output is a separate issue from training. Courts will never decide that an identical copy spit out by an LLM is non-infringing simply because it went through an LLM stage. Copyright laundering is wishful thinking by tech folks.
  dada216
  Then they will leave the huge advantage in cost to the competition, I mean their customers' competitors. Hard to fathom how US companies will not want to use the cheaper option when EU and Asian companies can.
  vlian2088
  pretty much what altman and amodei mean when they say 'safety'.
  stackedinserter
  Why illegal, just pass these 3000 pages FAA-level certification, export controls and KYC. We're free country, after all!
  forshaper
  a: If making it illegal fails, make it a Federal procurement requirement to use regulated models. Come up with an audit standard that only fits regulated models. Watch the preference trickle down.
  samuelknight
  Buying all the RAM can't work forever. Scarcity increases prices, high prices increase supply, improves RAM R&D budgets, and forces users to find ways to economize around low RAM availability.
  OkayPhysicist
  It doesn't need to work forever. You just need to delay your competitors long enough that you can IPO to great fanfare, and then leave retail investors holding the bag. Founders and big investors get to cash out, everyone else gets screwed.
  thrwaway55
  I doubt that works today. Look at SpaceX: the fanfare lasted 3 days before most of the insiders could offload to the retail bag holders. That AI company had the benefit of being attached to the largest technical moat.
  The existing AI companies can't even prevent their moat from being distilled by the Chinese token reselling industry.
  picofarad
  This is what it feels like they went with.
- CuriouslyC
  #1 isn't going to happen because we're actually data limited, not compute limited. You can throw all the compute in the world at bad data and it won't make a difference, but an undertrained model with perfect training data will absolutely slay.
  #2 isn't going to happen, because these labs have shown they have limited app/design sense, and they also lack the industry connections and domain wisdom to execute.
  The way things are actually going to go is that these labs will set up partnerships with huge biotech/engineering/etc firms, and do custom training/inference on specific tasks that promise to be wildly profitable with them, then take royalties on the creation in perpetuity. Why sell inference when you can partner with Pfizer to make a version of Ozempic that also makes people freaky jacked, or partner with Bechtel to make a radically safer, more efficient nuclear power plant?
  Schiendelman
  I don't think "data limited" is true anymore outside of very specialized cases (for instance: https://arxiv.org/abs/2510.01631). As weird as it sounds, training improves a lot with synthetic data.
  You do need business development to create those relationships. Saying they "have limited ___" mostly means they "haven't yet hired people who are good at ___". That's been changing already; the Claude app is steadily improving and handling more use cases simply through understanding which tools to use, Anthropic is building more relationships to create more tools, and all the frontier model companies are building relationships with companies that have specialized data and want specialized solutions.
  I think we're also seeing the frontier model companies offer partners their own ability to run RL on their own data, and then retrain new models on the same data. That's going to make those relationships VERY sticky in ways that won't be obvious from the outside.
  nyrikki
  Can you point me to the parts in that paper that meet those claims, I am reading something different and want to know what I am missing.
  This study seems to show that there are places where synthetic data, especially related to common crawl.
  > Pure synthetic data remains non-advantageous over CC; notably, models trained on pure rephrased synthetic data will underperform those trained on CC at larger models.
  But the tradeoffs seem to be different at large scale.
  > Overall, these model scaling results suggest synthetic data appears comparably less favorable for pre-training larger LMs relative to its utility in data scaling scenarios. Despite outperforming training on CC, larger models are not as tolerant to a higher ratio synthetic data as larger data budgets. This observation aligns with practices where synthetic data is effective for smaller LMs or specific pre-training phases, but less predominantly used for the largest models.
  How I am reading it is there are places where it is useful:
  > Notably, any mixture involving synthetic data, or pure synthetic data (except pure QA), is projected to achieve a lower irreducible loss than training only on CommonCrawl.
  But it also seems that on textbook scale synthetic data, they did show model collapse vs rephrased data.
  > These results contribute mixed evidence on “model collapse" during large-scale single-round (n=1) model training on synthetic data–training on rephrased synthetic data shows no degradation in performance in foreseeable scales whereas training on mixtures of textbook-style pure-generated synthetic data shows patterns predicted by “model collapse".
  IMHO there are some very specific areas where we aren't "data limited", like math, but as your reference states "Our work demystifies synthetic data in pre-training, validates its conditional benefits, and offers practical guidance."
  Note the cost of 30% of the total dataset being synthetic, where the model starts amplifying the generator's biases, leading to a permanent degradation in downstream zero-shot capability on unseen out-of-domain natural tasks.
  My takeaway is there is nuance where synthetic data is an amplifier and where it is a problem, and in my mind that paper demonstrates it will not solve the data problem in general.
  nomel
  > we're actually data limited
  Correction: public text data limited.
  There's a ridiculous amount of proprietary text and non-text data out there that much of society is run on.
  dominotw
  what is 'bad data' and 'perfect data' according to you?
  CuriouslyC
  Worst possible bad data is where the data is orthogonal to the task, so increasing the data never provides information on the task. Perfect data is where the data exactly encapsulates the task being trained.
- AnthonyMouse
  > Be apple where everyone else is Linux.
  Apple and Linux barely even compete in the same markets. Linux runs on the servers and embedded devices, Apple on the smartphones. Android is technically Linux but not in the "is a good analogy for open weight models" sense because Android is so deeply under the thumb of Google. The main place Linux and Apple actually compete is for PCs and laptops, and that's the market where the thing with 65% market share is Microsoft.
  Gud
  Apple tried to make servers(they were awesome btw) but lost to Linux.
  Linux is on more phones than iOS.
  pseudosaid
  you're missing the point entirely and opted to entertain your own framework
  AnthonyMouse
  It's meaningless to suggest doing what Apple does when faced with Linux when the vast majority of Apple's business isn't competing with Linux. The majority of Apple's revenue is from hardware when Linux is software -- that can run on Apple's hardware.
- ed_elliott_asc
  Won’t all they need to do is say “best in class, latest models, fastest” and wine and dine a few execs and those enterprise deals will be signed?
  In this case the people tasked with using the product won’t actually mind.
  NitpickLawyer
  No one is getting fired for using SotA.
  saltcured
  Well, getting laid off during the bankruptcy spiral is a form of firing.
  But that is months away, so not my problem?
  spwa4
  If the price difference is 2x? Sure.
  If the price difference is 50x? No way.
  RobotToaster
  Tell that to Oracle
  brainwad
  So long as the benefit:cost ratio is still sufficiently high, I don't think anyone gets fired for not scrimping. Better to encourage positive EV behaviour by your employees than to scare them away by firing them for not being perfectly optimal.
  ThunderSizzle
  The CEO won't get in trouble, but the employee who can't justify a bad result/prompt?
  dualvariable
  Laughs in 2005-era VMWare and EMC...
  watwut
  Accenture says "yeah totally CEOs will pay a lot for literal nothing"
  actionfromafar
  Yes, exactly that. Be Azure and Office 365 and Sharepoint and AWS where everyone else is Debian Stable on a USB thumbdrive.
  fragmede
  Office 365? Ew, Google docs, please.
- christkv
  You forgot
  3. Try to get the government to "certify models" to cause regulatory capture which is what both Anthropic and OpenAI have been pushing. No certification no use in business.
- orwin
  Mythos was outperformed by small, specific local models in multiple oss projects.
  RugnirViking
  i'd love to hear about this! do you have examples?
  kyleomalley
  It might be kind of overlooked when people read about the big scary results from mythos; the real breakthrough was probably just as much the application of the (very decent) model through a well engineered wrapper (harness). Other models including codex or glm result in significant findings as well.
  Harness example: https://github.com/evilsocket/audit
  orwin
  https://aisle.com/blog/aisle-discovers-6-new-cves-in-curl-in...
- sofixa
  > own the app layer with their edge in reputation and powered by their infrastructure. Be apple where everyone else is Linux. Do design, coding, research, SMBs, legal, finance, healthcare and more (they are doing all of this).
  The problem with this is that there are incumbents in all those spaces doing their own AI agents / platforms, and they're the ones choosing the models they use internally and they sell to their own customers. The margins and the possibility to fine-tune using open weight models, as well as the guarantee they'll keep running at predictable costs (no US orders yanking access), make them a very appealing option.
  And if you're a company that needs an AI powered legal software, would you buy it from OpenAI/Anthropic, or from someone who you've already bought legal software from before and has the domain knowledge?
- ForHackernews
  Google already owns the app layer, and hardware, and they are a frontier-level AI research firm.
  I don't see how Anthropic or OpenAI survives being eaten by DeepSeek et al from the bottom of the stack and Google from the top.
  dubbie99
  The only reason people use google apps is because they are cheap and reliable. The user experience is awful. Have you ever tried to find a document you had open yesterday in drive?
  nickthegreek
  You just go to https://drive.google.com/drive/u/0/recent
  hobo_mark
  Uh? Recently and frequently opened documents always show up on the first screen as soon as I open the app or website.
  PunchyHamster
  I used their enterprise chat the other week coz one of the clients used it
  It is truly amazing how bad it is. Made me miss using MS Teams. No software should make anyone miss using MS Teams
  dualvariable
  Anthropic is at least renting their datacenters, not owning, so all the capital accounting bullshit is getting laundered by someone else, who will wind up holding that bag.
  And Anthropic is currently cornering the enterprise coding market, and they were smart to avoid video. Under current economic conditions they're a lot closer to being profitable than anyone else, and they can take advantage of crashing prices for compute if we hit a datacenter-buildout-glut.
arthurofbabylon
Let's imagine that Anthropic/OpenAI fail to manufacture scarcity by villainizing Open Weight models (a sincere probability). What is left for these corporations to prop up their prices, or any margin at all? I expect scaffolding around tool use, supporting bespoke implementation and driving risk down for institutional adoption. (They might even build an insurance tool to protect accountants/lawyers from errors in compounded probabilism!)
A question for economists... It seems plainly clear to me that information and information processing is commodifying (for the first time in human history?). Without the age-old bottlenecks at the top of the value chain, capital will surely flow downwards, right?
- AnthonyMouse
  > It seems plainly clear to me that information and information processing is commodifying (for the first time in human history?). Without the age-old bottlenecks at the top of the value chain, capital will surely flow downwards, right?
  Isn't this the thing people have said about every new technology since the printing press? And it has been mostly true, but it has also been the case that the incumbents have fought hard to lock things back up again. Newspapers and radio stations buy each other up, the open web gets locked inside Facebook (which, 30 years ago, people were already worried about with AOL), people have computers in their pockets they can't run their own programs on anymore.
  Interests are going to want to lock the new information thing behind a gate so they can charge a toll and censor what they don't like, same as it ever was. You don't win by default, you have to fight to stop them.
  arthurofbabylon
  I don’t think that comparing LLMs to the printing press (and radio, film, TV, etc) is an apt analogy, and I don’t think that people have said the same things about the two technologies; the prior technological changes in information dealt with distribution, while this one deals with processing and production.
  Recall the notion of a bottleneck, and this distinction will become clear. Those prior technological changes never inverted a bottleneck, and this one does.
  AnthonyMouse
  > the prior technological changes in information dealt with distribution, while this one deals with processing and production.
  Computers and the internet did a lot to make production easier in addition to distribution. Anyone today can use a photo editor to superimpose text over an image in any font in seconds like it's child's play. That used to require knowledge of calligraphy. Film production used to require very expensive equipment that everyone now has built into their phone.
  > Those prior technological changes never inverted a bottleneck, and this one does.
  Before the printing press, copying books had to be done by hand. If you wanted a million copies of something made you had to be the church or a government. Today there are independent pundits who get a million impressions on their shitposts, and that's with consolidated platforms being largely against them.
  We still have an entire edifice (copyright) which is structured around copying requiring a sufficiently centralized apparatus to serve as a useful chokepoint for imposing restrictions and collecting royalties, which is correspondingly under increasing distress as
- ddxv
  OpenAI, though they seem to have backtracked on it lately, have been slowly pushing forward with their launch of ads, which would be a supplemental way to support cheaper use of their models. This is currently not as great a fit as the modern day banner ads, but it will be interesting to see where they go with that.
linzhangrun
It would not be surprising if GPT and Claude get cheaper too as inference gets cheaper. Two years ago, o1 was the strongest model and cost much more than Fable, while being nowhere near as smart as a Qwen 3.6 35B that you can now run on a DGX Spark without much trouble.
- an0malous
  > It would not be surprising if GPT and Claude get cheaper too as inference gets cheaper
  No, because the biggest factor in their current price is VC subsidization, which has likely peaked if OpenAI is now serving ads and Anthropic has increased their API pricing.
- ddxv
  True, outside of the dark tactics I imagined in the article, they will have to compete at lower costs. It's just that the current iteration does not feel cost competitive yet.
- tsss
  Probably they will, unless Claude and GPT become luxury brands like Gucci. Currently it makes no sense for them to invest into efficiency. They need to put everything into competing for the top spot as long as they still have a shot.
beepdyboop
I don’t get it. So many here are saying open weight models will kill the frontier labs. But open source and similar have tried to beat private companies everywhere all the time, and people still buy the best products even if great open source alternatives are available. Why wouldn’t this be the case for AI too?
- drudolph914
  I feel like this comment is just engagement farming, but I'll bite anyways
  there is a larger appetite for something like open source AI mostly b/c of price. we all know these labs have not figured out their pricing model, and we're all holding our breath out of fear of what the prices could be.
  also, if you consider that the only toll to knowledge work before was personal time, and now you need to pay $100s a month just to keep up with the baseline speed, it makes sense people are looking for something that gets them back to a workflow where the price to do work is near $0.00.
  I think for a smaller group though, it's more to do with a certain combination of principles. Some people don't want censorship, others want ownership, some want the knowledge of working on LLMs to not be gatekept.
  dominotw
  ppl buy iphones over cheap android phones. android phones can do everything that an iphone does (and better)
  05
  Which cheap (or expensive) phone does better log/raw video than iPhone 17 pro max?
- an0malous
  The closest example I can think of is using a proprietary hosted database versus a self hosted open source option, like Oracle vs Postgres. OpenAI and Anthropic are each individually privately valued over $1T and Oracle is currently valued at half that. They’re not worthless, but they’re severely overvalued.
  aldonius
  Oracle's not a good comparison though, because it's getting valued as an AI compute provider as well as for its traditional business. At the end of 2021 Oracle's stock was about $100 (all-time-high in nominal terms at the time), its current ATH was October last year at $292.
- tuatoru
  Yes. Many industries are zero-sum-ish in nature, have winner-take-all dynamics or reputational costs for cheaping out. Financial trading. Big law. Military. National Security. Big insurance. Management consulting. Advertising.
  For others even a small edge can be important. Pharma and Biochem research. Research in general. Any industry where there are major reputational risks.
  It may not make sense to use the most expensive model to replace your payroll clerk, but there are plenty of use cases for the best available.
arikrahman
With cache hit rates being effectively free, harnesses like Reasonix have let me do a month of work for less than 2 dollars. It's not even the subsidies making it cheap, American providers like Digital Ocean or Cloudflare host the same model with similar pricing.
- Scaevolus
  Cloudflare's Deepseek V4 Pro prices are 4x Deepseek's for input and output tokens, and 100x for cached input tokens. Cached input pricing is crucial for agents, because their tool use produces multi-turn conversations.
  arikrahman
  Cache hit is less than a cent with Deepseek Flash and 3 cents with Cloudflare, it's free vs almost free. Where are you finding the statistics on Deepseek Pro? I don't see Cloudflare as a provider on openrouter for Pro, only flash.
- pjc50
  How does caching help here? How much repetition is there in queries?
  jcparkyn
  Agent loops (particularly coding agents) have a huge amount of repetition, because the entire context is included in every model request. So long as it's at the start of the input and doesn't change, it will be able to hit the KV cache (assuming the model provider actually has the prefix in cache).
  This only works because prompt caching is done by matching prefixes, not the entire input.
  arikrahman
  Their blob explains it best, although this is a link to an older version design: https://github.com/esengine/DeepSeek-Reasonix/blob/v1/docs/A...
  From my understanding if previous tokens are frozen and guaranteed to be immutable you can leverage that.
  AnthonyMouse
  It probably depends on what you're doing, but imagine you're something in the shape of a search engine. How many user queries are unique vs. the same thing someone else searched for an hour ago?
  crazylogger
  In a typical agent loop your N-th LLM request naturally becomes the prefix for the (N+1)-th request. As the thread grows longer, the cache hit rate converges to 100%, and unit pricing for cached tokens is 10-100x cheaper.
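  A minimal sketch of the prefix-caching argument above, assuming a toy agent loop where every request is the system prompt plus a fixed number of new tokens per turn, and where the whole previous request is served from cache. The function names, token counts, and the discount factor are illustrative assumptions, not any provider's real pricing or API.

```python
def request_tokens(turn: int, system_tokens: int, tokens_per_turn: int) -> int:
    """Input size of the turn-th request (1-based): system prompt plus all turns so far."""
    return system_tokens + turn * tokens_per_turn


def cached_tokens(turn: int, system_tokens: int, tokens_per_turn: int) -> int:
    """Tokens reusable from cache: the previous request, which is a prefix of this one."""
    if turn == 1:
        return 0  # nothing has been sent yet, so nothing can be cached
    return request_tokens(turn - 1, system_tokens, tokens_per_turn)


def hit_rate(turn: int, system_tokens: int, tokens_per_turn: int) -> float:
    """Fraction of the turn-th request's input tokens that hit the prefix cache."""
    total = request_tokens(turn, system_tokens, tokens_per_turn)
    return cached_tokens(turn, system_tokens, tokens_per_turn) / total


def loop_input_cost(turns: int, system_tokens: int, tokens_per_turn: int,
                    price: float, cached_discount: float) -> float:
    """Total input cost of the loop; cached tokens are billed at price / cached_discount."""
    cost = 0.0
    for turn in range(1, turns + 1):
        total = request_tokens(turn, system_tokens, tokens_per_turn)
        cached = cached_tokens(turn, system_tokens, tokens_per_turn)
        cost += (total - cached) * price + cached * price / cached_discount
    return cost


if __name__ == "__main__":
    # Hypothetical numbers: 2000-token system prompt, 500 new tokens per turn.
    for turn in (1, 2, 10, 50):
        print(turn, round(hit_rate(turn, 2000, 500), 3))
```

  The hit rate approaches but never reaches 100% in this model, because each request always carries some new tokens; the point is only that the uncached share shrinks as the thread grows.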
- ForHackernews
  I think this is very likely and something that everyone seems to be missing when valuing these AI firms. AI is not the new industrial revolution, it's the new cloud VM: a very useful commodity software offering.
  antonvs
  The parallels to the Industrial Revolution are so close that we even have a new generation of Luddites. (Not saying they don’t have some valid points; so did the original group.)
  The reason it’s like the Industrial Revolution is simply that there’s no question it’s going to completely transform jobs. It can make a very similar difference to the difference between a craftsman and a factory worker. The latter is massively more productive.
drillsteps5
Open weights models are cheap in the context of the article (when you run inference in the cloud) because they are free. When I pay for inference on a DeepSeek open weights model, I only pay the inference service provider for compute/memory/storage/network throughput. The model itself is free; the developer isn't getting a dime.
Developing these things is NOT free, there's a lot of labor, hardware, compute/memory/storage/network that goes into that. Who's paying for all this? Chinese govt? Developers themselves? What's the revenue model here?
I absolutely LOVE the ability to either run them locally or access inference providers on the cheap, but I'm having a hard time understanding the financial side of this.
odie5533
This is what concerns me about how AI giants are planning to make money. Their product has already been commoditized at prices which for them are still subsidized to grab market share. Unless the giants invent a technological leap, their prices are going to be dragged down by open weight models and I don't see how they'll turn a profit.
- Jimega36
  Reach AGI to leapfrog whoever is behind. Burn everything to get there faster.
  holtkam2
  If you used a time machine to go back to 2021 and showed someone the best open source LLMs from 2026 they would surely say “yeah that’s AGI”
  odie5533
  If Anthropic announced AGI tomorrow, how much better would that model be than Fable 5? It's looking like the road to AGI is gradual and moat-less. Models seem capable of improving other models, and even without illegal distillations many are nipping at the heels of Anthropic.
  InsideOutSanta
  Yeah, I think we're learning that we overestimated the relevance of recursive self-improvement in a singularity/intelligence takeoff scenario. We thought that once an AI could start improving itself, it would cause an exponential, self-reinforcing intelligence explosion.
  Turns out that scaling up compute is much more important and also limits the upper end of intelligence.
  AnthonyMouse
  The bigger mistake is assuming it would be better at everything all at once.
  Suppose it can do 80% of what the 20th percentile human can do. That's a huge advance and very useful, but it means there are still things it's not very good at. If any of those things is (or becomes) a bottleneck, you're not getting the hockey stick graph.
  ForHackernews
  What is an "illegal" distillation? Terms of service are not laws, and clearly copyright laws are no barriers to developing AI models.
  dogwalker5000
  If AGI = Data from Star Trek, it would be a huge leap. Frankly, anything less I wouldn’t consider as AGI.
  IncreasePosts
  Why would the creator of AGI sell it to anyone, when they could keep it to themselves and corner dozens of markets?
  jorisw
  'Reach AGI', the same way SpaceX will put data centers in orbit. A pipe dream.
  ben_w
  I'm currently writing a blog post about data centres in orbit, and my current conclusion is that even though they can build one, they definitely can't put 1 million up there and would have better things to do if they could.
  AGI? Too loosely defined. They lack a lot of competences which humans recognise when we see them but find it hard to put into words; on the other hand what they can do they already do faster than any human (and have greater breadth than any single human, but this usually doesn't matter because "coder" and "economist" and "translator" gets solved in human teams by hiring three people).
  I do not think current ML has the tools to solve for quality. But we know it's possible for a really mediocre intelligence to make human level intelligence, because evolution made us, so for me the question of AGI is more a practical one: is it affordable?
  (I also think not at the present time, but that's an "I think" not "I am analyzing it carefully").
  trick-or-treat
  Maybe you missed the part where starlink / orbiting datacenters don't really have to even make money as long as they partially fund rocket launch tests.
  Or maybe you don't take Elon seriously when he talks about Mars.
  ben_w
  > Maybe you missed the part where starlink / orbiting datacenters don't really have to even make money as long as they partially fund rocket launch tests.
  I am only dismissing the orbital data centres, I do see a future for Starlink. One with competition, but a future nonetheless.
  I'm old enough to remember the dot.com bubble and "we lose money on each unit and make up for it in scale":
  If they don't make sense, they don't help. Putting a single one in space, or even a handful, is physically possible! But even optimistic Alphabet researchers (and Alphabet owns more of SpaceX than the entire IPO) say this only makes sense at $200/kg, whereas early Starship launch costs, while they sort out reusability, will be at best $400/kg, and the researchers don't expect $200/kg until the mid-2030s even with a high launch rate:
  If the learning rate is sustained—which would require ∼180 Starship launches/year—launch prices could fall to <$200/kg by ∼2035
  - section 2.4, https://arxiv.org/abs/2511.19468
  At $200/kg, and using the payload estimates elsewhere in the paper (the learning rate is based on mass rather than launch count), they'd need to launch 370,000 tons (4.4 ibid); even at the "good enough" cost, $200/kg, they'd need to spend $200/kg * 3.7e8 kg = $7.4e10. That's a hell of an R&D spend for the next 10 years of a company whose lifetime revenue (not profit) is reportedly $4.6e10.
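  The arithmetic above can be checked in a few lines. This is only a sketch that reuses the figures quoted in this comment (370,000 tons, $200/kg, $400/kg, and the reported $4.6e10 lifetime revenue) and assumes metric tons of 1,000 kg; the variable names are made up for illustration.

```python
KG_PER_TONNE = 1_000  # assumption: the paper's "tons" are metric tons

mass_tonnes = 370_000                  # launch mass quoted from the paper
mass_kg = mass_tonnes * KG_PER_TONNE   # 3.7e8 kg

target_price_per_kg = 200   # the "good enough" launch price
early_price_per_kg = 400    # the best-case early Starship price

spend_at_target = mass_kg * target_price_per_kg  # 7.4e10 dollars
spend_at_early = mass_kg * early_price_per_kg    # 1.48e11 dollars

reported_lifetime_revenue = 4.6e10  # "reportedly", per the comment above

# How many lifetimes of revenue the launch bill alone would consume.
ratio_at_target = spend_at_target / reported_lifetime_revenue

if __name__ == "__main__":
    print(f"launch spend at $200/kg: ${spend_at_target:.2e}")
    print(f"launch spend at $400/kg: ${spend_at_early:.2e}")
    print(f"multiple of lifetime revenue at $200/kg: {ratio_at_target:.2f}")
```

  Even at the target price, the launch bill alone comes to roughly 1.6 times the reported lifetime revenue, before any hardware is paid for.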
  My current draft has a few thousand words of additional problems, plus a bunch of things which I mention only to say why they are not, and some more where I say the research has yet to be done.
  > Or maybe you don't take Elon seriously when he talks about Mars.
  Used to, not any more. Has been too slow with Starship even before the fact that iteration with hardware is necessarily slowed down by a 2-year gap between launch windows.
  There's not even been any news about demonstration models of either Mars-rated or Starship-rated Sabatier processors, which would be an easy win and also win points for both environmentalism and energy independence viz. Iran/Hormuz.
  trick-or-treat
  Ok so you're ignoring the entire thing. Sigh.
  ben_w
  On the contrary: I've paid a lot of attention, causing me to look at it closely and determine it is a terrible idea worthy of an illustrated 5,000 word blog post explaining exactly how terrible.
  If you build the DC satellites as currently specified, you're strictly better off not launching them. That's how bad the idea is.
  NitpickLawyer
  > will put data centers in orbit. A pipe dream.
  Cheap access to space was once a pipe dream.
  Reusable boosters were once a pipe dream.
  A new player beating Boeing to the ISS was once a pipe dream.
  LEO constellations were once a pipe dream.
  Launching thousands of satellites was once a pipe dream.
  You should know that a) they are already running "AI" chips on their current sats, and b) they are already producing kW of power on orbit and have ~10k sats on orbit. You can watch Scott Manley's video on it, where he does some rough calculations and explains the overall architecture. There is nothing stopping them from doing this, from an engineering perspective. Whether it makes commercial sense is another question, but 5-10-20 years in the future things might change there as well.
  InsideOutSanta
  I don't think people's argument is that it's impossible to put data centers into space. The argument is that the downsides (radiation, cooling, maintenance, power) are so severe that it is pointless to do it at scale.
  NitpickLawyer
  Go back to the megathreads when this came up. Even here on HN. Plenty of people used the argument that it can't be done, for various reasons.
  And my point was that at one point or another there were many "downsides" for all the tech that SpaceX already has. Reusable boosters were seen as "uneconomical" and "pointless unless they can fly 10 times" by industry experts. They're now flying 30+ times per booster.
  LEO constellations were similarly "full of downsides" plus "all the companies that tried it went bankrupt in the 90s", so "it's pointless". And so on.
  InsideOutSanta
  Reusable boosters have clear upsides, though.
  Pretty much everything about data centers in space is worse than having them on Earth. Apart from niche use cases, the only reason you'd talk about data centers in space is if you had a company with rocket ships and needed a story to tie your rocket ships to the current AI craze.
  grebc
  And you had a lot of stock to sell to bagholders.
  dogwalker5000
  Yet spacex is losing money … only StarLink is profitable.
  ben_w
  > You can watch Scott Manley's video on it, where he does some rough calculations and explains the overall architecture.
  I'm currently writing a blog post, and there's one big thing everyone, including Scott Manley, missed.
  Once I realised it, I wondered what took me so long to spot this issue.
  jgord
  care to share the one glaring obstacle ?
  slightly related .. I saw a talk on DCs in space, and it said median Earth orbit had a latency of 500ms .. but back of envelope seems to be : 15,000km above Earth would have around 100ms latency, comparable to internet ping times.
  Not an expert, feel free to weigh in.
  ben_w
  > care to share the one glaring obstacle ?
  I'm still working on the blog, but as a quickie: it's the lesson of the Datasaurus dozen, that sometimes you need to look at the actual distribution rather than statistics.
  Here's what the safety exclusion zone around a million of them in orbit looks like, if arranged something like the current plan: https://raw.githubusercontent.com/BenWheatley/blog/refs/head...
  There's no (safe) gaps. Plenty of physical space, but the safety margin eats it all up. Nothing else is allowed to use those orbital shells or anything between them.
  Also, this is what happens if you put them all in a single orbit at the same altitude:
  https://raw.githubusercontent.com/BenWheatley/blog/refs/head...
  > slightly related .. I saw a talk on DCs in space, and it said median Earth orbit had a latency of 500ms .. but back of envelope seems to be : 15,000km above Earth would have around 100ms latency, comparable to internet ping times.
  500ms means ~150,000 km of travel distance. Treating that as the round trip from origin to destination and back again, the one-way distance is 75,000 km, so if it's via a single satellite bounce then the average distance to the satellite would be 37,500 km: [You]-37.5Mm-[Satellite]-37.5Mm-[Them]-37.5Mm-[Satellite]-37.5Mm-[You].
  I think they must be assuming all comms are via geostationary satellites. In some talks, this is what the speaker actually meant, though they may not have been clear about it; other times, there are talks from people who copied the former but perhaps didn't understand.
  For DCs in space, even in GEO, it would be half the distance because you're communicating with the satellite itself not with someone else somewhere else on the ground.
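  A back-of-envelope sketch of the light-travel-time reasoning above. It assumes signals move at the vacuum speed of light, the satellite is directly overhead, and there is no processing or routing delay; the helper names are made up for illustration, and 35,786 km is the standard geostationary altitude.

```python
C_KM_PER_S = 299_792.458   # speed of light in vacuum
GEO_ALTITUDE_KM = 35_786   # geostationary altitude above the equator


def path_km(latency_ms: float) -> float:
    """Total distance light covers in the given time."""
    return C_KM_PER_S * latency_ms / 1000


def latency_ms(distance_km: float) -> float:
    """Time for light to cover the given distance."""
    return distance_km / C_KM_PER_S * 1000


def bounce_altitude_km(round_trip_ms: float) -> float:
    """Satellite distance if a ground-to-ground round trip uses one bounce each way (4 legs)."""
    return path_km(round_trip_ms) / 4


def satellite_dc_round_trip_ms(altitude_km: float) -> float:
    """Round trip to a data centre on the satellite itself (2 legs: up and back down)."""
    return latency_ms(2 * altitude_km)


if __name__ == "__main__":
    print(round(bounce_altitude_km(500)))                        # distance implied by a 500 ms ping
    print(round(satellite_dc_round_trip_ms(GEO_ALTITUDE_KM)))    # DC in GEO, directly overhead
    print(round(satellite_dc_round_trip_ms(15_000)))             # DC at 15,000 km altitude
```

  Under these assumptions a 500 ms round trip implies a satellite distance close to GEO, and talking to a data centre on the satellite halves the path compared with relaying to another ground station.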
  amanaplanacanal
  My gut says another obstacle is maintenance. How long can a datacenter on the ground run without maintenance? How will this be affordable in orbit?
  ben_w
  People already talk about that, so I wouldn't be adding much new. That said, I had already put in a bit about the cost of launching.
  TL;DR: Alphabet researchers (and Alphabet owns more of SpaceX than the entire IPO, so if anything they're biased towards optimism) reckon it will take SpaceX launching about 370,000 tons to orbit before they've even figured out how to get the costs down to the point where it makes sense to put these in orbit.
  general1465
  Microsoft tried to put datacenters into the ocean [1] and then shelved the idea, because even though you have fewer failures, you still have failures, and somebody has to go there and fix them. Which turns out to be a problem.
  And in the ocean you don't have to solve for radiation or cooling.
  [1] https://www.tomshardware.com/desktops/servers/microsoft-shel...
  IncreasePosts
  If just Elon was talking about data centers in space, you could take it with a grain of salt. But there are enough other serious players talking about it, like Google and Blue Origin, that it should be pretty clear it can't just be dismissed with "you didn't think about cooling!"
  NitpickLawyer
  Yeah, and there's already been tech demonstrators for this. Starcloud-1 launched in '25 (on a F9) and demoed a CotS H100 in a ~60kg bus w/ 1kW of power. They ran inference on a "gemini" model (probably something small) and trained a GPT2 version LLM as a tech demonstrator.
  ForHackernews
  Google also wanted to deliver internet from balloons and put everyone's real name on their YouTube comments. Not all their ideas are winners.
  chpatrick
  I think it's such a vague term. If you showed someone in 2010 what we have now they would say it's science fiction.
Zak
One issue I keep seeing with cost comparisons is that they compare API rates while a substantial fraction of users are on subscription plans.
It's more expensive to use GLM 5.2 paying z.ai or Opencode Zen API rates than it is to use Opus on a subscription plan. Both of those providers offer subscriptions priced favorably relative to their API rates, but only in what are effectively trial sizes.
- cherryteastain
  Enterprise plans don't have the equivalent of the subsidized-usage-included Claude Max/ChatGPT Pro plans anymore. The revenue generated and total amount of tokens used by individuals is probably a tiny fraction of tokens billed at API pricing.
- 1matin
  And that means either:
  1. They overprice their APIs to make their subscriptions look reasonable
  2. They burn money with their subscriptions
  Zak
  Could be a little of each, plus a third option: subscription users don't always consume their entire quota.
anax32
Open weight and local hosting is far, far cheaper. In every respect. Even support is cheaper, over time.
However, it's difficult to sell this to businesses who want contracts and KPIs, not staff and commitments.
Regulated industries will favour the closed sources, either by choice or mandate. The interesting question is whether they will have better models, or worse models. History says they will receive a worse service, but continue anyway.
- tuatoru
  Cheaper until you factor in security and liability, which are going to get increasingly salient over time.
- general1465
  > Regulated industries will favour the closed sources, either by choice or mandate
  Until your country appears on the naughty list of the US administration because your local politician did something that mildly inconvenienced a US oligarch
bmnbmnbmn
One of the purposes of open weight models is to create a moat. If there were no open models available, I think we'd see many more and better models coming from Europe by now. Right now, any startup wanting to build and sell a model needs to be substantially better than the open models, which has become increasingly difficult and expensive.
- tuatoru
  Europe has Mistral.
  You and readers may be interested in Europe 2031
  1. https://europe2031.ai/
MoonWalk
I'd appreciate an explanation of what "open weight model" means. Is it a "weight model" that is open, or a model with open weights (so should be "open-weight model"), or is it weights that can be applied to a model?
Are weights separable from a model? And if not, what is the point of saying "open-weight model" instead of just "open model?"
To the newcomer, it's hard to determine what the components of an AI system are from the throwing-around of these terms.
- drillsteps5
  As another commenter said, a "model" is a file (or a group of files; there are multiple formats available, and the GGUF format, for example, is all in one file). You download it to the hardware of your choice (e.g. your own desktop with an NVIDIA GPU). You run an inference engine (llama.cpp, Ollama, LM Studio, etc.), tell it where the downloaded model is, and it runs inference (so you can start chatting with it, or run agents).
  "Open weights model" means the developer made the model available for everyone for free. You can download it from huggingface.co for example and do whatever you want with it.
  Why "open weights" and not "open source"? Because the "source code" for an LLM would include things like training data, training methodologies and tools, so that you could do the training and produce the model (files) yourself. That would be like compiling from source code. That is not done with these models: the training process is the company's know-how, and they only share the end result.
  It's more analogous to "freeware" which is what we traditionally call freely distributed binary executable files. But people started calling them "open weights" instead and the term stuck.
- philipkglass
  A completely open model is one like the Allen Institute's Olmo model series:
  https://allenai.org/olmo
  The trained weights are open, the training software is open, and the data that goes into training the model is open.
  Not many models are fully open.
  An open weights model is one that has freely available trained weights, and maybe fine-tuning tools, but it lacks the original training data (and usually lacks the training software). These are the most commonly used local models, like Google's Gemma series, Meta's Llama, or Alibaba's Qwen.
  MoonWalk
  So you can apply different weights to those "non-open" models?
  Also, I've read a bunch of descriptions of AI components, but none of them has said what the weights are applied to in the model. I guess that every model contains a dictionary of words and phrases, and the weights map relationships between them?
  All the descriptions simply talk about weights being applied to "input," but neglect to say what that input is compared to. If a user submits a query, are the words in the query weighed against the words in the model?
  Can you recommend a primer on this whole process?
  philipkglass
  The weights are just numbers. I don't know what technical background you have in other areas of computing, but I think that this is a good, short introduction that doesn't assume too much:
  https://www.3blue1brown.com/lessons/mini-llm/
  To quote part of it: "Training a model can be thought of as tuning the dials on a really big machine. The way that a language model behaves is entirely determined by these many different continuous values, usually called parameters or weights."
  Longer and slightly more technical, "Intro to Large Language Models" by Andrej Karpathy:
  https://www.youtube.com/watch?v=zjkBMFhNj_g
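  To make "the weights are just numbers" concrete, here is a deliberately tiny sketch. It is not how a real LLM is built (there is no attention, and the numbers are hand-picked rather than trained); the three-word vocabulary and both tables are hypothetical. It only shows that the input is turned into numbers, multiplied by the weights, and converted into next-token probabilities, with no dictionary of phrases being compared against.

```python
import math

VOCAB = ["the", "cat", "sat"]

# Hypothetical weights. A real model has billions of these, learned in training.
EMBED = [          # one 2-number vector per input token
    [1.0, 0.0],    # "the"
    [0.0, 1.0],    # "cat"
    [1.0, 1.0],    # "sat"
]
OUTPUT = [         # one row of weights per candidate next token
    [0.1, 0.2],    # score for "the"
    [1.5, 0.2],    # score for "cat"
    [0.3, 2.0],    # score for "sat"
]


def next_token_probs(token: str) -> dict[str, float]:
    """Turn one input token into a probability for each possible next token."""
    x = EMBED[VOCAB.index(token)]  # look up the token's vector
    # Multiply the weights by the input vector to get one score per token.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in OUTPUT]
    # Softmax: convert scores into probabilities that sum to 1.
    peak = max(logits)
    exps = [math.exp(score - peak) for score in logits]
    total = sum(exps)
    return {tok: e / total for tok, e in zip(VOCAB, exps)}


def most_likely_next(token: str) -> str:
    """Pick the next token with the highest probability."""
    probs = next_token_probs(token)
    return max(probs, key=probs.get)


if __name__ == "__main__":
    print(most_likely_next("the"), most_likely_next("cat"))
```

  Changing any number in EMBED or OUTPUT changes the predictions, which is all that "different weights" means; training is the process of finding numbers that give useful predictions.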
- adrian_b
  An "open weights" model is one where you can download all the data and the code that you need to run inference with that model on your own hardware (typically from Huggingface.co).
  That data includes not only the "weights" but also various files with required information, e.g. the tokenizer, the chat template, files that describe the structure of the "weights", e.g. number of layers, the number of "experts", routing information, etc. All this information may be distributed in many files (e.g. *.safetensors files with weights, *.json files etc.) or it may be aggregated in a single container file (with the .gguf extension).
  You can see an example of the files included in a very simple open weights LLM here:
  https://huggingface.co/google/gemma-4-12B-it/tree/main
  Bigger LLMs have many more files, especially many more *.safetensors files, which contain the "weights". The "weights", i.e. matrices of numbers that are used in the computational algorithm that generates the output tokens, constitute the bulk of the data needed to run a model, i.e. from a few gigabytes to a couple of terabytes, which is why the term "open-weights" is used, but in fact by this term it is understood that all data needed for running inference is open.
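  As an illustration of what sits inside a *.safetensors file, the sketch below builds a toy one in memory and reads it back using only the standard library. It relies on the published layout (an 8-byte little-endian header length, a JSON header giving each tensor's dtype, shape and byte offsets, then the raw numbers). The tensor name `layer0.weight` and the helper functions are made up for the example; real tooling would use the `safetensors` library instead.

```python
import json
import struct


def build_toy_safetensors(tensors: dict) -> bytes:
    """tensors maps name -> (shape, flat list of floats); returns bytes in safetensors layout."""
    header = {}
    data = b""
    for name, (shape, values) in tensors.items():
        raw = struct.pack(f"<{len(values)}f", *values)  # little-endian float32
        header[name] = {
            "dtype": "F32",
            "shape": shape,
            "data_offsets": [len(data), len(data) + len(raw)],
        }
        data += raw
    header_bytes = json.dumps(header).encode("utf-8")
    # 8-byte unsigned little-endian header length, then the header, then the data.
    return struct.pack("<Q", len(header_bytes)) + header_bytes + data


def read_header(blob: bytes) -> dict:
    """Return the JSON header describing every tensor in the file."""
    (header_len,) = struct.unpack("<Q", blob[:8])
    return json.loads(blob[8:8 + header_len])


def read_tensor(blob: bytes, name: str) -> list:
    """Return one tensor's values as a flat list of floats."""
    (header_len,) = struct.unpack("<Q", blob[:8])
    begin, end = read_header(blob)[name]["data_offsets"]
    start = 8 + header_len  # offsets are relative to the end of the header
    raw = blob[start + begin:start + end]
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))


if __name__ == "__main__":
    blob = build_toy_safetensors({"layer0.weight": ([2, 2], [1.0, 2.0, 3.0, 4.0])})
    print(read_header(blob))
    print(read_tensor(blob, "layer0.weight"))
```

  The header is the part that "describes the structure of the weights"; everything after it is just the numbers themselves.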
  For an open weights LLM, you do not have access to the data set used for training the model or to the algorithms that have been used during the training of that model.
  You can still do some fine-tuning of the model, using your own training methods and your own additional training data. To facilitate this, several open weights models offer not only a model version that can be used for inference to implement a chat application or an agentic workflow, but also a "base" or "raw" version that is not suitable for being used directly for inference but which is suitable for you to do a post-training/fine-tuning, to create a model more appropriate for your particular needs.
  An "open weights" model is sufficient for most of the potential LLM users, because training a model is something that requires expertise, expensive hardware and a lot of time, so few would be able to do it even when given access to the necessary data.
my-next-account
I wonder whether Oracle is going to go bankrupt because of this
- worldsayshi
  Why Oracle?
  InsideOutSanta
  They're extremely exposed to a market crash due to their huge debt-funded compute contracts.
  Having said that, while one can always hope, I would assume that Oracle is one of these companies that will be bailed out or find a way to survive.
  cyanydeez
  Oracle is licking so much boot, you'd also need the republican fascist party to completely fall apart.
leroman
The token economics for closed-source models are different: they are optimizing for 200 USD worth of tokens per software engineer per month, and they will increase the per-token price as models or harnesses become more optimized.
surgical_fire
One thing it doesn't even mention is how good those models are. Ever since I moved to DeepSeek I have had zero regrets. It performs exceptionally well. I honestly prefer it to ChatGPT (or Claude, which I use at work).
I never used Fable, maybe it is that much better. DeepSeek has no problems with the workloads I give it though - if it only keeps marginally improving with each interaction I don't see myself needing to come back.
dwaite
What are a few good places to go to learn more about open weight models, both running hosted and running locally?
- drillsteps5
  Aside from googling "how to download and run open weights model", check out the localllama (yes, 3 Ls) subreddit. Huggingface.co is where many of them are published.
  There are many providers that run open weights models and give you access. Many decent open weights models cannot be run on consumer-grade hardware (DeepSeek, GLM, many others).
juancn
90% of my model use is on local open-weights models.
The things that I need to automate do not need frontier models. Heck, even a gemma-4-12B-it-qat-UD-Q4_K_XL can deal with a lot of complexity if properly guided (it can run on 16GB of unified memory, for example on a base model Macbook Air).
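A rough sketch of why a 12B model at a 4-bit quantization can fit in 16GB. It counts the weights only; the KV cache, activations, and the OS need room on top. The ~4.5 bits per weight for a Q4_K-style quantization is an assumed average, not a measured figure for this particular file, and the function names are made up for illustration.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    total_bits = params_billion * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9


def fits(params_billion: float, bits_per_weight: float, memory_gb: float) -> bool:
    """True if the weights alone fit in the given memory, ignoring all runtime overhead."""
    return weight_memory_gb(params_billion, bits_per_weight) < memory_gb


if __name__ == "__main__":
    # Assumed averages: ~4.5 bits/weight for a 4-bit K-quant, 16 bits/weight for fp16/bf16.
    for label, bits in (("4-bit quant (assumed ~4.5 bpw)", 4.5), ("16-bit", 16.0)):
        print(label, round(weight_memory_gb(12, bits), 2), "GB")
```

Under those assumptions the quantized weights take under half of a 16GB machine, while the same model at 16 bits per weight would not fit at all.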
I've been using it to translate JavaScript to a custom scripting language in a product I work on, just by providing a system prompt and an MCP tool that calls the target compiler to check for errors.
Sometimes it converges faster than Opus 4.6 (I've tried) because it doesn't over-think stuff.
If it were a person I would say it knows less, but it's still smart.
I mean, you don't need the most powerful tool at all times. We treat AI as one-size-fits-all, and once cost gets in the way, it will matter.
CuriouslyC
The government is going to ban foreign models and foreign inference providers, without question. The US govt is going to dig its dirty little fingers into OAI/Anthropic/Oracle/(probably)SpaceX and end up taking some stock for a sovereign wealth fund (probably timed to prop up flagging share prices, and with the promise of sweet government grift down the line), and at that point the bans will be framed as protecting that investment.
dist-epoch
It's so refreshing to read a short, to-the-point article that has not been extruded into 10 pages with LLMs.
isoprophlex
Aren't these open models so cheap because they're (partially) chinese gov. sponsored, and because they're stealing and redistributing the IP that comes in?
- amanaplanacanal
  Whose IP do you think they are stealing? According to US courts, training is fair use. And even if it wasn't, they are distilling output from other models, which isn't copyrightable, again according to US courts.
- grebc
  And the American ones are stealing and redistributing the IP of every single person who authored anything on the internet at some point.
- titanomachy
  Maybe, but there's tons of providers available, so you can pick one that you trust not to steal your IP (or run it yourself, if you're rich and paranoid enough).
- blamestross
  Well I can't speak to the chinese gov part, but ALL the models are IP laundering systems. I'd rather IP get laundered into open source.
- jrm4
  Technically correct, the worst kind of correct :)
tuatoru
Deepseek's price looks unsustainable. Ant have said their operating margin is 70%. A leaner company could maybe raise that to 90%.
Most of the cost of supplying inference compute is depreciation of the GPUs. Maybe Deepseek is anticipating a 50 year life for theirs.
snootypoot
i agree with his statement that the big companies and the string pullers in government are inching toward banning open models. allowing the plebs unrestricted access to things seems against the wishes of the "you will own nothing and be happy" / "you will rent everything on the cloud and subscribe to your appliances" crowd such as blackrock and so on.
anyone who disagrees is not seeing the forest, only the trees.
- amanaplanacanal
  I don't see how they could ban them in the US. Code is speech, and the first amendment still mostly holds. They might try, but I don't see the courts upholding it.
kittikitti
Even if open weight models were vastly more expensive, I would still prefer them. I don't know where my data is going and whether they're lying about the model when I make an API call. They can ban you from their API for any reason. Anthropic recently pulled their frontier models. There are numerous compliance concerns. The list goes on and on.