Knowledge Distillation of Black-Box Large Language Models (2024)

118 points by babelfish 19 hours ago | 23 comments

potus_kushner
probably more interesting (from 01/2026) https://arxiv.org/pdf/2511.10643 "Black-Box On-Policy Distillation of Large Language Models". they got a qwen 2.5 14B model trained to GPT5 level using the described technique "Generative Adversarial Distillation (GAD)".
phantompeace
Considering the very small difference between just SFT on the student model as compared to SFT + DPO on a proxy, doesn't it make sense to concentrate on ensuring the SFT dataset is perfect rather than worry about DPO etc? And just train directly on the student model?
dmezzetti
Well-Read Students Learn Better: On the Importance of Pre-training Compact Models
Related paper that's a good read: https://arxiv.org/abs/1908.08962
Alifatisk
Why is this published again? Is this a reference to recent events?
- babelfish
  I just saw some post about it on Threads and found it interesting so decided to share!
  tough
  My best guess is this is a reference to the recent accusations from Anthropic of Chinese labs "distilling" on their models
  swingboy
  And it’s a paper from Alibaba researchers, the company/lab that Anthropic called out by name.
  adrian_b
  I do not find the Anthropic allegations believable.
  All the results presented in these distillation papers are for very small models.
  In order to gain anything, Alibaba or others would need today to use the Anthropic models to improve LLMs at least one hundred times bigger than those tested in these papers.
  I assume that the number of queries to the teacher LLM grows superlinearly with the size of the student model, which would mean that billions of queries would be needed. Even for a linear growth, at least hundreds of millions of queries would be needed.
  I do not see how any Claude account could make so many queries without being detected. Even if the queries were distributed over thousands of accounts, it would still be easy for Anthropic to stop any such attempts.
StreamCtx
“Relevant to anyone building failure-attribution systems for agent pipelines — black-box distillation techniques here could feed into causal attribution models without needing white-box access to the underlying model.”
- adrian_b
  That is easy when you can control the teacher model yourself and you want to transfer its capabilities to a smaller model.
  If the teacher model is run by an external entity, e.g. Anthropic or OpenAI, then the number of queries to the blackbox model that is required is so great that it should be easy for the owner of the teacher LLM to detect and stop any such attempts.
duendefm
The Chinese are really going strong on destroying the American AI economy bubble. Honestly, despite the fact that I'm totally pro USA and anti China, I think we should help them crash the American AI bubble. They are controlling everything and we can't even buy a new computer nowadays while getting no benefit from this. I wish some influential programmers stimulated coders everywhere to skip Claude and Chatgpt subscriptions for Chinese ones, at scale. If we programmers united we could help this bubble burst, I'm sure.
- Roark66
  If we programmers united we'd have a Claude Code alternative that didn't suck :-)
  But we're not far.
  My requirements:
  - a terminal app without an advanced TUI, not written like "a browser running in a terminal" or a game. There is no need to overcomplicate it.
  - ability to manage prompts per model, compress context using alternate models, and minimise token costs better, like the Minion mini harness from YouTube's Sentdex (in fact I'm building on top of his as we speak).
  - support for agent work fanout
  - support for MCP, but switchable off/on as needed (I use a single MCP aggregator anyway so MCP tool use doesn't eat my context)
  - support for LSP/tree-sitter, again switchable when needed.
  - support for the OpenAI API, written simply enough that other ones like DeepInfra are easy to add.
  Nice to have:
  - some sort of "prompt library" that would store tweaked versions of prompts for different models, so the harness adjusts as needed depending on which model we call.
  That's it.
- laichzeit0
  The US government will do the job of destroying the American AI economy through their export controls.
- addedGone
  "anti China", why so? have you lived there?
- anax32
  The US "product machine" is so strong. They really know how to do frictionless signup and vendor lock-in on the corporate side.
- nozzlegear
  > skip Claude and Chatgpt subscriptions for Chinese ones, at scale. If we programmers united we could help this bubble burst, I'm sure.
  I'm doing my part!
- cynicalsecurity
  [flagged]
  anon373839
  Dario, is that you? Is Anthropic’s next ploy to seek support via the culture wars?
  girvo
  Why would I care about Christian morals? In fact from what I can see of the US, you don’t have them either.
  duendefm
  Nvidia, Anthropic and OpenAI are controlling everything, and nothing is improving for everyone, quite the opposite. So I just hope they crash to the ground.
  gmerc
  lol Christian Morals. Epstein and his best buddy running the show tells you all about this
  big-and-small
  What Epstein and buddies were doing was very... Christian...
  Virgin Mary was very young in the events you know.
  LNSY
  "They don't have Christian morals" -- does that mean they don't commit genocide and fuck kids? Because that sounds like a point for them
linolevan
Can we note that this is a 2024 paper in the title?