• Probably more interesting (from 01/2026): https://arxiv.org/pdf/2511.10643 "Black-Box On-Policy Distillation of Large Language Models". They got a Qwen 2.5 14B model trained to GPT-5 level using the described technique, "Generative Adversarial Distillation (GAD)".
  • Considering the very small difference between just SFT on the student model and SFT + DPO on a proxy, doesn't it make sense to concentrate on ensuring the SFT dataset is perfect rather than worry about DPO etc.? And just train directly on the student model?
  • Well-Read Students Learn Better: On the Importance of Pre-training Compact Models

    Related paper that's a good read: https://arxiv.org/abs/1908.08962

  • Why is this published again? Is this a reference to recent events?
    • I just saw some post about it on Threads and found it interesting so decided to share!
      • My best guess is this is a reference to the recent accusations from Anthropic of Chinese labs "distilling" on their models
        • And it’s a paper from Alibaba researchers, the company/lab that Anthropic called out by name.
          • I do not find the Anthropic allegations believable.

            All the results presented in these distillation papers are for very small models.

            In order to gain anything, Alibaba or others would need today to use the Anthropic models to improve LLMs at least one hundred times bigger than those tested in these papers.

            I assume that the number of queries to the teacher LLM grows superlinearly with the size of the student model, which would mean that billions of queries would be needed. Even with linear growth, at least hundreds of millions of queries would be needed.

            I do not see how any Claude account could make so many queries without being detected. Even if the queries were distributed over thousands of accounts, it would still be easy for Anthropic to stop any such attempts.
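The scaling argument above can be sketched as a quick back-of-the-envelope calculation. This is only an illustration: the baseline of one million queries for a small student model and the exponent of 1.5 for the superlinear case are assumptions chosen for the example, not figures from the paper or from the comment. Only the factor of one hundred comes from the comment itself.

```python
def estimated_queries(base_queries: float, size_ratio: float, exponent: float) -> float:
    """Scale a baseline query count by size_ratio raised to the given exponent."""
    return base_queries * size_ratio ** exponent


# Hypothetical baseline: queries needed to distill into a small student model.
BASE_QUERIES = 1_000_000
# From the comment: target models "at least one hundred times bigger".
SIZE_RATIO = 100

# Exponent 1.0 models linear growth; 1.5 is an assumed superlinear exponent.
linear = estimated_queries(BASE_QUERIES, SIZE_RATIO, 1.0)
superlinear = estimated_queries(BASE_QUERIES, SIZE_RATIO, 1.5)

print(f"linear growth:      {linear:,.0f} queries")
print(f"superlinear growth: {superlinear:,.0f} queries")
```

Under these assumed inputs the linear case lands in the hundreds of millions and the superlinear case around a billion, which is the order of magnitude the comment has in mind. A different baseline shifts both numbers proportionally.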

  • “Relevant to anyone building failure-attribution systems for agent pipelines — black-box distillation techniques here could feed into causal attribution models without needing white-box access to the underlying model.”
    • That is easy when you can control the teacher model yourself and you want to transfer its capabilities to a smaller model.

      If the teacher model is run by an external entity, e.g. Anthropic or OpenAI, then the number of queries to the black-box model that would be required is so great that it should be easy for the owner of the teacher LLM to detect and stop any such attempts.

  • The Chinese are really going strong on destroying the American AI economy bubble. Honestly, despite the fact that I'm totally pro USA and anti China, I think we should help them crash the American AI bubble. They are controlling everything and we can't even buy a new computer nowadays while getting no benefit from this. I wish some influential programmers encouraged coders everywhere to skip Claude and ChatGPT subscriptions for Chinese ones, at scale. If we programmers united we could help this bubble burst, I'm sure.
    • If we programmers united, we'd have a Claude Code alternative that didn't suck :-)

      But we're not far.

      My requirements:
      - a terminal app without an advanced TUI, not written like "a browser running in a terminal" or a game. There is no need to overcomplicate it.
      - the ability to manage prompts per model, compress context using alternate models, and minimise token costs better, like YouTuber Sentdex's Minion mini harness (in fact I'm building on top of his as we speak).
      - support for agent work fanout
      - support for MCP, but switchable off/on as needed (I use a single MCP aggregator anyway so MCP tool use doesn't eat my context)
      - support for LSP/tree-sitter, again switchable when needed.
      - support for the OpenAI API, written simply enough that other providers like DeepInfra are easy to add.

      Nice to have:
      - some sort of "prompt library" that stores tweaked versions of prompts for different models, so the harness adjusts itself depending on which model we call.

      That's it.
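The "prompt library" idea in the list above could look roughly like the following. This is a minimal sketch, not taken from any existing harness: the `PromptLibrary` class, the prompt name `system`, and the model name `small-local-model` are all hypothetical and exist only for illustration. It shows one possible design, where each model can override a named prompt and anything without an override falls back to a shared default.

```python
from dataclasses import dataclass, field


@dataclass
class PromptLibrary:
    """Hypothetical per-model prompt store with a shared default fallback."""

    # Prompt name -> default text used for any model without an override.
    defaults: dict[str, str]
    # Model name -> {prompt name -> tweaked text for that model}.
    overrides: dict[str, dict[str, str]] = field(default_factory=dict)

    def get(self, model: str, name: str) -> str:
        """Return the model-specific prompt if one exists, else the default."""
        return self.overrides.get(model, {}).get(name, self.defaults[name])


library = PromptLibrary(
    defaults={"system": "You are a coding assistant working in a terminal."},
    overrides={
        # Hypothetical model name; a smaller model gets a terser prompt.
        "small-local-model": {"system": "You are a coding assistant. Answer briefly."},
    },
)

print(library.get("small-local-model", "system"))
print(library.get("some-other-model", "system"))
```

Keeping the overrides as plain data means the same structure could be loaded from a config file, so adding a new model would not require code changes.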

    • The US government will do the job of destroying the American AI economy through their export controls.
    • "Anti China", why so? Have you lived there?
    • The US "product machine" is so strong. They really know how to do frictionless signup and vendor lock-in on the corporate side.
    • > skip Claude and ChatGPT subscriptions for Chinese ones, at scale. If we programmers united we could help this bubble burst, I'm sure.

      I'm doing my part!

    • [flagged]
      • Dario, is that you? Is Anthropic’s next ploy to seek support via the culture wars?
      • Why would I care about Christian morals? In fact from what I can see of the US, you don’t have them either.
      • Nvidia, Anthropic and OpenAI are controlling everything, and nothing is improving for everyone, quite the opposite. So I just hope they crash to the ground.
      • lol Christian Morals. Epstein and his best buddy running the show tells you all about this
        • What Epstein and buddies were doing was very... Christian...

          Virgin Mary was very young in the events you know.

      • "They don't have Christian morals" -- does that mean they don't commit genocide and fuck kids? Because that sounds like a point for them
  • Can we note that this is a 2024 paper in the title?