180 points by seahorseemoji 2 days ago | 73 comments
  • Superpowers feels like 20 years ago when people would be sharing and debating their incredibly elaborate .vimrc files, which totally made them super productive. Meanwhile, I tried to stick to stock configuration as much as possible (mostly for portability / ssh reasons). In a similar vein, these days some of my colleagues are sharing all their skills and prompt tricks and stuff, and I try to just use barebones Claude Code as much as possible, and I feel like it keeps getting better and better and all these prompt shenanigans are just not worth it.
    • The main advantage of skills is defining a process that is at least vaguely consistent across different executions for a given task, and plugging in some of the common pitfalls an LLM might fall into for some of those executions.

      But to me, both the process and the pitfalls are going to be heavily specific to the individual or team, and to the work they are doing... It's something that evolves over time as you bump into repeated rough edges.

      Taking someone else's skills and blindly applying them to my situation feels odd. I don't know what rough edges those skills were made to address, so I have no reason to believe they would fit my specific needs, initially, any better than the baseline LLM.

    • At least for me, writing the code // using the IDE // prompting the LLM to do the thing is not the hard part. The hard part is always understanding the actual problem // underlying assumptions // actual customer need and then architecting the right solution to that. Actually implementing the solution is the easy part, and LLMs have made that now even easier.

      But they've not really helped interpret customer requirements when they give you logically inconsistent / unimplementable business processes that need major re-vamping before they can be coded. To some extent they can help de-code poorly worded emails sent by some exec while golfing or in a meeting. But they still can't conjure information out from nothing. Nor are they that good at helping to play the political game when you have team X and team Y depending on new feature Z, but feature Z requires completely changing how either team does process Ab but neither will even admit that their processes aren't compatible with each other.

    • The only skill I genuinely find useful is the /strategic-compact from ecc (everything Claude Code) but you can get the same effect by saying something like "I am going to compact the session, give the next Claude session a summary of what was added or changed and what the next step is". Then you can just use /compact <message>. Other than that, just barebone too. ECC is shipped with something like 300 skills, maybe I should just delete it!
    • Same here but yesterday I decided to give this a goal. It started a workflow that spawned like 100 sub agents to research the best o11y product I can use. It burnt through all my max plan (before it could start making any changes), something I could never do simply by using claude code yet. It's not for me.
    • Agreed. It seems like a large and growing artifice sitting on a foundation that is constantly shifting.
    • Agreed. I often take a look at the skills and maybe take something from there to create a more minimal version that does just enough for my needs and nothing more. YAGNI is definitely the principle to be followed here.

      I'm sure all these people on Reddit that talk about having 5 Claude Max 20x plans and hitting the weekly limits on them all have a ton of these loaded.

    • 100% Also good to note that Cherny and Steipete have said in interviews that they keep it simple and do not use any of these shenanigans.
    • Ultracode is pretty good. I’m not basically using the grill-with-docs from Matt and default plan mode on Cc or codex. It’s just enough.
      • Is it usable on Pro plan? Seems like it would burn my rate limits immediately
      • Did you mean "I'm basically" (!not)?
    • I am currently slowly exploring subagent based feature factories people have been talking about. I looked at superpowers and it just felt like a lot. The process seems to be trying to match the complexity of a human team workflow. Seems like the wrong angle to take for this. I am getting a lot out of my simple subagent team by just clearly defining their roles and restricting what they can access. Although experimenting with orchestration policies is required. This feels like tokenmaxxing for marginal improvements.
  • For what it's worth, I really enjoy superpowers. In particular, it does a great job with TDD that stops the model from jumping to conclusions, and I've been able to get it, even with Opus, to execute on much longer specs quite well.
  • Neither the article nor the corporate blog post explains what Superpowers is. Seems to be an opinionated collection of skills for dev work

    https://github.com/obra/superpowers

    • The GitHub description is a pretty good summary: "An agentic skills framework & software development methodology that works."

      Here's what that methodology looks like: https://github.com/obra/superpowers#the-basic-workflow

    • Not really - it's essentially a workflow.

      The steps, described [here](https://github.com/obra/superpowers#the-basic-workflow), are: brainstorming → using-git-worktrees → writing-plans → subagent-driven-development or executing-plans → test-driven-development → requesting-code-review → finishing-a-development-branch.

      The principles, described [here](https://github.com/obra/superpowers#philosophy), are: Write tests first, always; Process over guessing; Simplicity as primary goal; Verify before declaring success.

      Install it, take a complex task, and instruct the agent to implement it; it's easier to watch it in action than to describe it.

      In my own experience, the advantage is that it's a very systematic workflow - investigation of requirements, breakdown in simpler steps, and TDD development, among the other aspects.

  • I feel like all these fat skills on top of agents will become stale very quickly. Unless it is for a very specific workflow with a need for deterministic outputs, I just don't see them having high value.

    If I do need such workflows I just use plan mode, and it is 90% sufficient. I created a skill that hooks on top of plan mode because of its shortcomings, but I'm pretty sure even this will become obsolete soon as models improve.

    https://github.com/oliver-im/jidoka

  • I used Superpowers for a few weeks. I ran into a couple issues:

    * I wish I could turn it on selectively. Many of my requests do not require the "verification before completion" and TDD ceremony. For example, agents using stock Superpowers will go so far as to grep a file every time you ask to add something to it to verify that the edit really landed.

    * While I like speccing out/designing a project before implementation (nothing new in that regard), I don't like how precisely superpowers plans out the implementation in the /writing-plans skill. It tells future agents exactly what files to edit. There are two big issues with this:

      * We need to manage context rot. If one LLM session is responsible for writing out the entire plan, we aren't solving context rot. Not only is the "smart window" of context exhausted by the time the agent is planning, eg, step 7 out of 15, but it's also dragging forward all the possibly bad ideas it had earlier. It would be better if steps were planned independently.
    
      * Implementation is an iterative process. You find things out as you go. Your assumptions turned out to be wrong, you realize APIs don't behave the way you thought they did, etc. This is why writing out a precise plan ahead of time is an issue – it's written without this iteration.
    
    IMO, the strongest part of Superpowers is /subagent-driven-development. Yes, it's SUPER slow. For a laugh, you can ask it to make a change you know can be done in one line. It'll do it in one line, but it takes literally an hour with all the verification. But that's sort of the point. It is _very_ deliberate. For each step, it reviews the step for both compliance and code quality, then has another agent implement the fixes, _and then it reviews the fixes again_. It does this for every step (not at the end of the project). While this might seem like overkill, it leads to code which complies with the spec far better.
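
The per-step loop described here can be sketched roughly as follows. This is a minimal Python sketch, not how Superpowers is actually implemented: `run_step`, `Review`, and the three callables are hypothetical names, and the two kinds of review mentioned (compliance and code quality) are folded into a single `review` callable for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Review:
    """Findings from one review pass; an empty list means the step passed."""
    issues: List[str] = field(default_factory=list)


def run_step(
    step: str,
    implement: Callable[[str], str],
    review: Callable[[str, str], Review],
    fix: Callable[[str, List[str]], str],
    max_rounds: int = 3,
) -> Tuple[str, int]:
    """Implement one plan step, then review and fix until a review comes back clean.

    Returns the final work product and the number of review passes it took.
    """
    work = implement(step)
    for round_no in range(1, max_rounds + 1):
        result = review(step, work)
        if not result.issues:
            return work, round_no
        # Fixes are handed to a separate agent, then reviewed again next round.
        work = fix(work, result.issues)
    raise RuntimeError(f"step {step!r} still failing review after {max_rounds} rounds")


# Stub "agents" so the sketch runs without any LLM calls (all hypothetical).
def implement(step: str) -> str:
    return "draft"


def review(step: str, work: str) -> Review:
    return Review([] if work.endswith("+fixed") else ["missing test"])


def fix(work: str, issues: List[str]) -> str:
    return work + "+fixed"


final, passes = run_step("add login form", implement, review, fix)
```

Because the loop runs once per step rather than once per project, every step pays for at least one review pass, which is where the slowness comes from.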

    Instead of writing a super detailed spec, I think I'd like /writing-plans to come up with appropriate "units" of work (sometimes called slices) and to brainstorm with the user regarding implementation, but to leave it looser than "edit this exact file in this exact way". That should leave a lot more leeway to implementation agents but still give the review agents something to check compliance against.

    • I think you can write a quick script to toggle the disable-model-invocation to turn off auto invocation.
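
A rough sketch of what such a toggle could look like in Python. It assumes each skill is a `SKILL.md` file that starts with a `---`-delimited YAML frontmatter block, and that `disable-model-invocation` is a plain `true`/`false` key in it; check both against your own install. `set_model_invocation` is a made-up helper name.

```python
import re

FIELD = "disable-model-invocation"


def set_model_invocation(skill_md: str, disabled: bool) -> str:
    """Return the skill file text with FIELD set to true/false in its frontmatter."""
    match = re.match(r"\A---\n(.*?\n)---\n", skill_md, flags=re.DOTALL)
    if match is None:
        raise ValueError("no YAML frontmatter found")
    frontmatter = match.group(1)
    line = f"{FIELD}: {'true' if disabled else 'false'}\n"
    existing = re.compile(rf"^{re.escape(FIELD)}:.*\n", flags=re.MULTILINE)
    if existing.search(frontmatter):
        # Key already present: overwrite its value in place.
        frontmatter = existing.sub(line, frontmatter, count=1)
    else:
        # Key absent: append it at the end of the frontmatter.
        frontmatter += line
    return f"---\n{frontmatter}---\n{skill_md[match.end():]}"


original = "---\nname: tdd\ndescription: Write tests first\n---\nBody text\n"
off = set_model_invocation(original, disabled=True)
on = set_model_invocation(off, disabled=False)
```

To apply it across a plugin, you would read each skill file, pass the text through the function, and write it back; where those files live depends on how the plugin was installed.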
  • Where I $work, someone used Superpowers to pull off two big projects that before AI had always been left untouched because of the effort and time required. One was about unifying lots of duplicated (but not exactly) libraries, and the other was converting the bespoke shell scripts used throughout our deployment pipeline to Ansible.

    When I used it though, I only found it burning too many tokens to do too little. I guess Superpowers is useful only in hands that know how to manipulate it.

    • This is true, one has to step back and learn a new methodology. It's basically pulling our brains up and staying at the high level, and letting the Superpowers workflow do all the heavy lifting. And learning to trust that.

      Similar to Addy Osmani's Agent Skills and Matt Pocock's skills.

      Great way to build larger projects!

  • The screenshot of ol' claude closed code with that ascii table tells it all: Vapor AIware.

    As if it really would work like that. The noise added by the verbosity alone is not taken care of enough, and this entire thing belongs on the great pile of ai vaporware.

  • How does Superpowers compare with Matt Pocock's skills[1]? I only tried the latter, and to be honest, I had positive results without burning a quadrillion tokens.

    [1] https://www.youtube.com/watch?v=-QFHIoCo-Ko

    • I heckin love his /grill-me skill. Terse, to the point, and delivers outsized results.

      Gonna take a moment to share my own generic "retro" prompt, which has found many areas of improvement IME.

      > Let's conclude with a retro. Did you run into any issues during this session that you think could be improved? Any failed tool calls, confusing docs/prompts, or tricky wording that took you effort to figure out, etc? Any final thoughts that you want to raise? Anything minor you didn't mention? Help make this codebase easier for the next agent to work in.

      It's somewhat doc-focused since I'm currently working on fairly dense design docs... but you can easily customize it for your own needs.

      This prompt reveals how absolutely _ass_ the Claude Code harness is (so many stupid tool call failures), but not much I can do about that.

      • > This prompt reveals how absolutely _ass_ the Claude Code harness is (so many stupid tool call failures)

        I've just started using Claude Code this month after months of Claude in VSCode + GitHub Copilot (and a bit of dabbling with AWS Kiro), and I'm actually impressed by how seemingly polished Claude Code is.

        I think Copilot in VSCode broke far more in my months of (ab)using it.

      • Side note: Matt has a new skill /grill-with-docs, which he recommends as the one to use for coding. Regular /grill-me he doesn't recommend for coding anymore.
    • Apparently Matt has an upgraded version, /grill-with-docs

      https://www.youtube.com/watch?v=6BB6exR8Zd8

    • Is this available in written form or even as a GitHub repository?
  • I can't believe that a bunch of Markdown files now comes with a "Commercial Services" section. It feels like an elaborate GitHub Karma farm. Everything has to be commercialized and advertised.
  • I gave Superpowers 5.x a whirl for a week, and aside from consuming a stupid amount of tokens, it did materially worse across all my personal benchmarks and general day-to-day development compared to plain Codex/Claude. I'm convinced it's either some 4D ploy by the AI cartels to set tokens ablaze, or it only provides Superpowers to those without any power to begin with. Rating: 1/5 Pinocchios. Would not recommend.
    • I'm a certified Superpowers hater. It's just not necessary with the modern models and fills up the context windows with garbage and adds an insane number of turns for no benefit.

      I had similar prompts back when the models were terrible at instruction-following, so it was actually useful to fill up their context with a mass of instructions so they'd be less likely to forget rules.

      Now I've got a few small slash commands or pasted prompts that work perfectly every time as the models follow them exactly.

      • Likewise. I use Reasonix with planning-with-files and that's been better than even openspec. Cache hits with the help of Reasonix on Deepseek make requests practically free. And that's unsubsidized, with American providers like Digital Ocean or cloudflare.
      • Yeah I like the idea, but I'd rather just not use most of these plugins, superpowers included. Code seems to include the best tricks in itself anyway at a fast pace.
    • 6.x feels much more efficient with respect to token usage to be fair.

      I picked up superpowers back when it first started gaining traction; the first iteration felt like an “oh shit” moment for me, then the sheen quickly wore off. Higher spend, slower throughput and mediocre results made me eventually drop it and go back to plan mode, which had improved significantly during that time.

      Coming back, 6.x does feel different and I’m back on the superpowers train. I’m finding it great at taking discrete tasks from beginning to end with very little hand holding.

      I run every session with a /goal as well: “Spec + Plan is written and you have implemented the plan without my involvement. You have validated that the implementation is complete and ready to merge”

      It’s also great in situations where you may need to complete a plan over multiple sessions, because you get a whole ton of state with superpowers that new sessions can pickup on.

    • This. I found superpowers a huge token guzzler. And the more generic a skill is, the worse it seemed to perform. I have found that skills are something you need to build yourself and for your needs and most importantly be willing to throw away. One team I know blindly checked this into every repo they had. They also had the highest cost per PR across all teams in our org (of about 60 eng teams). AI has already given people superpowers. How sad is it that they now need to be told how to just chat and prompt and use AI effectively as a pair programmer :(
      • This. I found the original superpowers was a great start, but rewrote all those skills to fit my workflow, and iterated on that.

        Writing skills is the new writing code.

    • What I like about superpowers is that for my workflow, I spend most of my time brainstorming with the claude session. Using its brainstorm mode helps to keep it from shifting to just writing code. That's basically never what I want until I actually want it. Once we've locked in a design, whatever, I don't care.

      But when I don't trigger brainstorm mode, even using the built in plan mode, it's just never as in depth of a brainstorm partner for me.

    • It sounded like it might not hurt and seemed endorsed by Claude and codex because they both had plugins for it by default BUT I ripped it out after I kept seeing Claude/codex TDD things like when I asked them to make a pydantic model immutable. I’d end up with unit tests testing that my immutably configured pydantic model was immutable or tests that setting foo=bar was actually foo=bar in app config.
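
To make the complaint concrete, here is a small Python sketch of that kind of test. The commenter's code isn't shown, so this swaps in a stdlib frozen dataclass as a stand-in for the pydantic model; `AppConfig`, its `foo` field, and the two check functions are invented for illustration.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class AppConfig:
    """Stand-in for an immutably configured model (hypothetical)."""
    foo: str = "bar"


def check_config_is_immutable() -> bool:
    """Asserts that a frozen class is frozen: this exercises the library, not our code."""
    config = AppConfig()
    try:
        config.foo = "baz"  # type: ignore[misc]
    except FrozenInstanceError:
        return True
    return False


def check_foo_is_bar() -> bool:
    """Asserts that setting foo="bar" yields foo == "bar": equally tautological."""
    return AppConfig(foo="bar").foo == "bar"


immutable_ok = check_config_is_immutable()
value_ok = check_foo_is_bar()
```

Both checks pass, and neither could fail unless the dataclass machinery itself broke, which is why they read as ceremony rather than coverage.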
    • I haven't used superpowers yet, but it seems a major focus of this release was to reduce clocktime as well as token spend.

      From TFA (well, blog):

      > The long and the short of it it is that across about 36 hours of work and what would have been $650 of unsubsidized token spend, our Anthropic eval benchmarks were looking like we'd reduced wall-clock runtime for Superpowers builds by 50% and token spend by 60%.

    • I found it slowed me down significantly at first, and produced more verbose code. After a few weeks of using it, I think I've gotten used to it (sometimes I explicitly bypass it, but it's good enough to know which skill to use).

      Yeah on the token consumption, I'll be doing something small at work, and it'll consume a lot of tokens.

    • And the fact that this article’s story is basically “I prompted Fable with a goal and went to sleep and the model got it done” is telling me that the latest models have gotten past the need for Superpowers… even the creators of superpowers are just using a simple /goal!
    • I only use superpowers when I want to stick with composer2.5(fast) for everything. If I use it with other models it's terrible. With composer2.5 it is slightly better, though not much.
    • Same here. I tried something similar to Superpowers and it went completely overboard for a small bug fix - writing a TDD, generating artifacts, etc.
    •   The model 
          |
        The harness
          |
        The harness of the harness?!!
      
      This structure doesn't make any sense to me. It's like adding a half leather half shag wrap to your steering wheel. Not to mention the harness itself is updated almost multiple times daily. I'm sure these framework authors are keeping tabs on the harness of the harness performance for every release of the harness.
      • It's harnesses all the way down.
  • So far SDD (Spec Driven Development) with openspec has hit the right balance for me: the workflow is not too heavy, while execution still churns out good results as long as the spec is done well.
  • How do I know if this is worthwhile without any benchmarks against 'not using Superpowers'?
    • As a long time user, I'd recommend checking out https://github.com/obra/superpowers#the-basic-workflow. If that resonates with you, try using it to develop a few features or capabilities.

      It works very well for the way that I work (interactively and iteratively, not "one-shot"), and it helps me do better work in less time. Superpowers is one of the few skill/agent suites I use for all software development projects.

      If you like building skill/agents, the posts at https://blog.fsck.com/ are a great resource for learning how to do it well. The effectiveness of my project Axiom (a skill/agent suite for Apple OS developers) has benefited enormously from the knowledge that Superpowers' creator Jesse Vincent has been kind enough to share.

      TLDR: You owe it to yourself to try it.

  • My use case is I have them installed and let Claude decide when to use them. Looks like for my recent sessions it has been using superpowers:test-driven-development 5%, and superpowers:subagent-driven... 1%. I haven't really been working on new projects this past week though which seems to be where they fire off the most, in particular the "writing a plan" one.
  • The cool thing about superpowers is it's built using evals rather than just vibes.
  • I think I’ve mostly found superpowers helpful, especially for TDD. It’s cool from this blog (and their GitHub) that they’ve verified it in various ways too. One issue is that I’ll sometimes see it wanting to write a whole doc for a follow on feature with a similar structure and have to tell it to just use the last one again.

    The most annoying thing is that it always pauses before implementing to ask if I want it to use Subagents (which it always recommends) or not.

    • This is the point where I usually start a new session with a fresh context window and invoke the sub-agent development skill and just point to the spec+plan.

      I’d be curious to see if there was a way to automatically do this for me in Claude Code.

  • I’m a big fan of the Compound Engineering plugin from Every. As an amateur developer it helps me brainstorm, plan and implement apps very well.

    https://github.com/everyinc/compound-engineering-plugin

  • I've loved Superpowers right along. I think a lot of what it does has been ingested into Claude Code proper now so I'll be interested to see if this release actually changes things up.
  • Superpowers is pretty much convincing LLMs they can do better. It almost never works that way.
  • Anyone have an opinion comparing this to GSD?
  • seems like this would perform better with cheap open-source models on OpenCode compared to proprietary models like Claude Code or Codex.
  • I'm honestly surprised at all the people here commenting that superpowers didn't work out for them.

    For me personally, it was a game changer when I first began using it and now it simply is as much a part of my workflow as any say, using git (yeah it has its warts but way way more value).

    Also, the latest (version 6) is noticeably token efficient as claimed.

    Did the people who found it underwhelming not try starting with the brainstorming skill first?

    • I feel the same way. I've used superpowers since I found it during the initial Ralph hysteria and love it. Every task I do starts with brainstorming and it always produces great results, even coordinating across multiple repos. Having the plan to read and comment on ahead of time is great, although admittedly maybe that is built in to the major harnesses now and I just don't know about it. Always feel uneasy kicking off a task without having used superpowers.
      • Same here, the brainstorming and research phases are good, and the spec part as well. I run side adversarial reviews all the time with other independent agents and feed sp the reviews; they are tools to avoid gliding over the surface. The dev process is longer, but the result, when done in a sound architecture with documented practices, is quite good even though it’s slow. Very happy with the tool so far.
  • I just don't find skill workflows generic enough.
  • This is great in concept but what prevents me from using it is TDD. I don't want to waste tokens on producing code that doesn't ship to the end user. Design by Contract is a far superior approach. If you've never heard of Design by Contract I don't blame you, our culture really failed to bring it mainstream. But I swear by it and it gives me real superpowers. Maybe I should fork this and gut the TDD part and replace it.
    • Nothing about Superpowers forces you to use TDD, brainstorm first, etc. It’s not rigid about the workflow.
    • What programming language are you using for Design by Contract?
      • I'd like to know, too!

        DbC is an actual superpower. Coupled with a gradual type system, especially one that provides type refinements (not sure if that's a Racket-specific[1] or generic term), DbC covers a wide variety of problems and either eliminates them or makes debugging them a lot easier. The problem is that only two/three languages are built around DbC (Eiffel, Racket, Ada/SPARK). There are a few others (e.g., Clojure, Raku, Scala) that provide some degree of support, but their capabilities are incredibly basic compared to what, for example, Racket offers. And for mainstream programming languages, there are libraries, but it's a coin toss whether authors even understand the idea (I once asked in a ticket for some Python contract library about contracts for callables and was met with "what?" - as if specifying range constraints on ints was all DbC was about).

        Unfortunately, Racket is tiny, barely a blip in the training data. In theory, you could probably get agents to a new level of reliability by making them write Racket; in practice, though, you'll burn a lot more tokens on every single edit, because the agent will need to rediscover how to do things in Racket much more often than in Python.

        I had some hopes that LLMs and agents based on them would be an opportunity for less popular, but technically advanced languages. So far, it doesn't seem like it's happening; the ridiculous per-token API prices mean that you need a really good agent harness for your language - and what niche PL has resources to focus on building one?

        [1] https://docs.racket-lang.org/ts-reference/Experimental_Featu...
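
For readers who haven't seen contracts on callables, here is a minimal Python sketch of the idea. It is hand-rolled for illustration and far simpler than what Racket offers; it does not reproduce any particular contract library's API. `contract`, `ContractError`, `returns_non_negative`, and `sum_mapped` are all made-up names.

```python
from functools import wraps
from typing import Any, Callable


class ContractError(AssertionError):
    """Raised when a precondition or postcondition does not hold."""


def contract(pre: Callable[..., bool], post: Callable[[Any], bool]):
    """Wrap a function so `pre` is checked on its arguments and `post` on its result."""
    def decorate(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            if not pre(*args, **kwargs):
                raise ContractError(f"precondition of {fn.__name__} violated")
            result = fn(*args, **kwargs)
            if not post(result):
                raise ContractError(f"postcondition of {fn.__name__} violated")
            return result
        return wrapper
    return decorate


def returns_non_negative(fn: Callable[[int], int]) -> Callable[[int], int]:
    """Contract on a callable argument: it cannot be checked up front,
    so wrap it and check each call it makes instead."""
    return contract(pre=lambda x: isinstance(x, int), post=lambda r: r >= 0)(fn)


@contract(pre=lambda f, xs: callable(f) and len(xs) > 0, post=lambda total: total >= 0)
def sum_mapped(f, xs):
    checked = returns_non_negative(f)  # the callable now carries its own contract
    return sum(checked(x) for x in xs)


ok = sum_mapped(lambda x: x * x, [1, 2, 3])

try:
    sum_mapped(lambda x: -x, [1, 2, 3])
    violated = False
except ContractError:
    violated = True
```

The range check on `xs` is the easy part. The callable contract is the interesting one: the violation is detected at the moment the passed-in function misbehaves, not when `sum_mapped` returns.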

  • All these prompt and skill based git repos are sus... nothing is benchmarked - it's all so subjective and unproven and breaks with model updates - everyone and his uncle has a 'secret sauce skill' - that just proves to me the subjectivity of this endeavor.
  • To be blunt I can't take this product seriously when they don't even run benchmarks. Your prompts make Claude better? Cool: prove it. Methods to evaluate LLM performance exist, they're called evals/benchmarks, and every company that is serious about AI runs them when they release a new version. (Of course benchmarks have their own issues, but squabbling over which benchmark is best and what issues there are is step 2 in being a Serious AI Company and step 1 is running them at all!) The fact that the only proof they have that 6 is better than five is a hacky table in a screenshot from Fable is, honestly, concerning.
  • I thought this would be about https://github.com/obra/superpowers