• I think many developers worth their salt will argue the same. Cloud is, and always has been, a shortcut around buying your own hardware. Local models will get better and smaller. Qwen3-coder-next runs on a Spark and is as capable as Sonnet 4.5. Bonsai released a 1-bit model yesterday.

    I also like the freedom of not having to ration a daily allowance of tokens.

    • Qwen3-coder-next is way worse than Sonnet 4.5. Also, despite the lack of "coder" in the name, Qwen3.5 is much better at coding than Qwen3-coder-next, so you might want to check that out.
    • A Spark is still not cheap hardware, though. It is only cost-effective for heavy users (I am not one).
  • MBP M5 Max. 128GB RAM. oMLX. unsloth-Qwen3-Coder-Next-mlx-8bit. opencode with the telemetry stripped out. This seems to be the sweet spot for my local dev right now. It helps me not accidentally blow through $100 in Claude tokens in a day when exploring different performance tradeoffs in the backend of my $DAYJOB codebase.
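
    In case it's useful: this stack works with any client, since the local server exposes an OpenAI-compatible chat-completions endpoint (mlx_lm.server does; I'd assume oMLX does too). A minimal sketch in Python; the port, path, and model id are placeholders you'd swap for whatever your server actually reports:

      import json, urllib.request

      # Placeholders: adjust the port and model id to match your local server.
      payload = {
          "model": "unsloth-Qwen3-Coder-Next-mlx-8bit",
          "messages": [{"role": "user", "content": "Review this function for races."}],
          "max_tokens": 512,
      }
      req = urllib.request.Request(
          "http://localhost:8080/v1/chat/completions",
          data=json.dumps(payload).encode(),
          headers={"Content-Type": "application/json"},
      )
      with urllib.request.urlopen(req) as resp:
          print(json.loads(resp.read())["choices"][0]["message"]["content"])
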
    • > opencode with the telemetry stripped out

      Care to share? This happens to be important to me and I’m sure I’m not the only one (as evidenced by GitHub issues).

      Did you also change the other questionable behaviors?

      • Not really, no. The last thing GitHub needs is yet another vibe-coded fork of a mostly vibe-coded app in the first place.

        If I ever get around to vibe-rewriting it in Go, I might share that.

  • I'd like a local LLM too, but they're expensive (consider the opportunity cost of a GPU that sits idle most of the time), and they produce heat and noise in places I'm trying to keep cool and quiet.

    I'd like a private jet too, alas.

  • I love local-first. I'm finding that a 120B MoE hits the sweet spot for local hosting. Right now that takes a $2K Strix Halo, a $4K GB10 machine, or a $5K Mac Pro. Two years from now I think hardware will take us back to the ~$2K range with good performance.
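
    Quick back-of-envelope on why 128GB-class machines are the entry point (a sketch counting weights only; KV cache and runtime overhead are ignored):

      # Rough weight footprint for a 120B-parameter model at common quantizations.
      # Assumption: footprint ~= params * bits / 8; KV cache and overhead ignored.
      params = 120e9
      for bits in (16, 8, 4):
          print(f"{bits}-bit: ~{params * bits / 8 / 1e9:.0f} GB")
      # 16-bit: ~240 GB -> doesn't fit
      # 8-bit:  ~120 GB -> no headroom on a 128GB box
      # 4-bit:  ~60 GB  -> fits, with room left for KV cache and the OS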

    I love my dual-GPU setup (2x AMD Radeon R9700, 64GB VRAM) but it uses 5x the electricity of my GX10 (GB10 chip inside), and since layers are landing in system memory, my TPS is half that of the GX10.

    Now, a dense model like Devstral2 24B slaps on the dual-GPU setup. I just haven't gotten as much out of it as I have out of the 120B MoEs.
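
    The system-memory hit is mostly a bandwidth story: during decode you stream every active parameter once per token, so tokens/sec is roughly bandwidth divided by bytes touched per token. A sketch with illustrative numbers (the active-parameter count and bandwidths are ballpark assumptions, not measurements from my machines):

      # Decode TPS ceiling ~= memory bandwidth / bytes read per token.
      # For a MoE, only the active experts' weights are touched each token.
      active_params = 6e9                    # assumed active params for a ~120B MoE
      bytes_per_token = active_params * 0.5  # 4-bit weights

      # Ballpark bandwidth figures, for illustration only.
      for name, gbs in [("discrete GPU VRAM", 640),
                        ("GB10 unified memory", 273),
                        ("dual-channel DDR5", 90)]:
          print(f"{name} (~{gbs} GB/s): ~{gbs * 1e9 / bytes_per_token:.0f} tok/s ceiling")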