- I read the GitHub repo, but I still don't quite understand.
What exactly is the advantage of doing this vs just running a prompt in my existing coding agent?
Why is this a harness/project rather than, for example, just a skill?
I'm confident there's a good reason; I just don't see it yet.
- Totally fair question. If you only want one agent to sanity-check one doc change, a skill/prompt is probably enough.
We actually aren’t rebuilding a harness here; it’s Pi with several LLM options to select from. The reason this is a project is that the useful workflow is more like a docs test suite: run realistic user tasks across multiple models, isolate each run in a greenfield sandbox, keep the transcripts/results, and make failures reproducible in CI.
You could ask an existing coding agent to spawn subagents for every task/model pair, but once that matrix grows, running hundreds of subagents on your computer gets messy. It’s also the wrong isolation boundary: for docs testing, you usually want the agent to start from a clean environment with access only to the docs/product surface you’re testing, not your whole working tree or local setup.
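To make the "docs test suite" shape concrete, here is a rough sketch of the loop, not the project's actual code. Everything named here (`run_matrix`, `fake_agent`, the `run-NNNN.json` layout) is hypothetical, and a temp directory only stands in for a real sandbox; it gives a clean working directory, not true isolation.

```python
import itertools
import json
import shutil
import tempfile
from pathlib import Path
from typing import Callable

# Hypothetical agent signature: (model, task, sandbox_dir) -> (passed, transcript).
# A real version would call out to the agent; this is only the shape of the loop.
AgentFn = Callable[[str, str, Path], tuple[bool, str]]


def run_matrix(docs_dir: Path, tasks: list[str], models: list[str],
               agent: AgentFn, results_dir: Path) -> list[dict]:
    """Run every task/model pair in a fresh directory and keep each transcript."""
    results_dir.mkdir(parents=True, exist_ok=True)
    results = []
    for i, (task, model) in enumerate(itertools.product(tasks, models)):
        # Fresh directory per run: the agent sees only a copy of the docs,
        # never the caller's working tree.
        with tempfile.TemporaryDirectory() as tmp:
            sandbox = Path(tmp)
            shutil.copytree(docs_dir, sandbox / "docs")
            passed, transcript = agent(model, task, sandbox)
        record = {"task": task, "model": model,
                  "passed": passed, "transcript": transcript}
        # One JSON file per run so a failure can be inspected or replayed later.
        (results_dir / f"run-{i:04d}.json").write_text(json.dumps(record, indent=2))
        results.append(record)
    return results


def fake_agent(model: str, task: str, sandbox: Path) -> tuple[bool, str]:
    """Stand-in agent: passes only if the sandbox contains nothing but docs."""
    seen = sorted(p.name for p in sandbox.iterdir())
    return seen == ["docs"], f"{model} attempted {task!r}; sandbox contains {seen}"
```

With 2 tasks and 2 models that is 4 isolated runs and 4 result files; the same loop scales to the full matrix in CI, which is the part that gets awkward as ad hoc subagents.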
- Nice! I want to use this for my product at ngram.com. Btw, I also created a sample teaser video: https://www.ngram.com/watch/dari-explainer-video-brief-d7991.... Feel free to use it on your social media.
- Cool approach. Actually letting agents test the docs makes debugging way more practical than just reading them.