• My opinion is that all attempts to make an LLM behave securely through training and prompting are doomed to fail. In security we have the notion of the CIA triad (confidentiality, integrity, and availability), and when we discuss it we often explain that these properties can be protected through people, processes, and technology. Training and prompting an AI to behave appropriately is far more akin to a "people"-focused control (similar to training and awareness practices) than to a "technology" control.

    The only way we will actually secure agents is by giving them only the permissions they need for their tasks. A system that uses your contract proposal to create an AuthZ policy tied to a short-lived bearer token, which the agent then uses on its tool calls, would ensure that the agent actually behaves how it ought to.
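
    Sketching roughly what I mean (hypothetical names, not any particular framework; the point is that the scopes are derived from the contract and the token expires quickly):

    ```python
    # Hypothetical sketch of the "contract proposal -> scoped short-lived token" idea.
    # None of these names come from a real library; it's just the shape of the control.
    import time
    import secrets
    from dataclasses import dataclass

    @dataclass
    class ScopedToken:
        value: str
        scopes: frozenset     # e.g. {"crm:read", "email:send"}
        expires_at: float     # epoch seconds; short-lived by construction

        def allows(self, scope: str) -> bool:
            return scope in self.scopes and time.time() < self.expires_at

    def mint_token_from_contract(permitted_actions, ttl_seconds=300) -> ScopedToken:
        """Derive the token's scopes directly from the agreed contract, nothing more."""
        return ScopedToken(
            value=secrets.token_urlsafe(32),
            scopes=frozenset(permitted_actions),
            expires_at=time.time() + ttl_seconds,
        )

    def call_tool(token: ScopedToken, scope: str, fn, *args):
        """Every tool call is gated on the token, so the agent can't exceed the contract."""
        if not token.allows(scope):
            raise PermissionError(f"token does not permit {scope}")
        return fn(*args)
    ```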

    • You're absolutely right that AuthZ is the foundation — scoped permissions and short-lived tokens are table stakes. We're not trying to replace that.

      AIP operates one layer deeper. It reads the agent's reasoning trace and compares it to the behavioral contract. So when an agent considers calling the payments API — even though your AuthZ layer would block it — AIP flags that as a boundary violation before the call is ever attempted.

      Why that matters: an agent that keeps trying doors it can't open is telling you something. The AuthZ layer blocks each attempt, but nothing in that system flags the pattern. AIP catches the drift and gives you a signal to act on — revoke the deployment, retrain, or escalate.

      Think of it as: AuthZ is the locked door. AIP knows someone keeps trying the handle.

      The Alignment Card maps naturally to AuthZ policy — permitted actions become scopes, forbidden actions become deny rules, escalation triggers become approval workflows. They're complementary layers, not competing ones.
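
      To make the mapping concrete, here's a rough sketch (the card fields follow the docs; the AuthZ shapes are generic stand-ins, not any specific vendor's format):

      ```python
      # Rough sketch of the Alignment Card -> AuthZ mapping described above.
      # Field names mirror the card (permitted/forbidden actions, escalation triggers);
      # the output shapes are generic stand-ins for your AuthZ layer.
      alignment_card = {
          "permitted_actions": ["crm:read", "email:draft"],
          "forbidden_actions": ["payments:*", "pii:export"],
          "escalation_triggers": ["refund_over_100_usd"],
      }

      def card_to_authz(card):
          return {
              "scopes": card["permitted_actions"],                # OAuth-style scopes
              "deny_rules": card["forbidden_actions"],            # explicit deny list
              "approval_workflows": card["escalation_triggers"],  # human-in-the-loop gates
          }

      print(card_to_authz(alignment_card))
      ```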

  • Have you tried using a more traditional, non-LLM loop to do the analysis? I'd assume it wouldn't catch some of the more complex deceptive behaviours, but most detections could presumably be done with various sentiment analysis / embedding tools, which would drastically reduce cost and latency. If you have tried, do you have any benchmarks?

    Anecdotally, I often end up babysitting agents running against codebases with non-standard choices (e.g. yarn over npm, podman over docker) and generally feel that I need a better framework to manage these. This looks promising as a less complex solution - can you see any path to making it work with coding agents/subscription agents?

    I've saved this to look at in more detail later for a current project. When exposing an embedded agent to internal teams, I'm very wary of handling the client conversations around alignment, so I find the presentation of the cards and the violations very interesting - I think they'll understand the risks a lot better, and it may also give them a method of 'tuning'.

    • Good question. So... AAP/AIP are agnostic about how checking is done, and anyone can use the protocols and enforce them however they want.

      Smoltbot is our hosted (or self-hosted) monitoring/enforcement gateway, and in that, yeah... I use a Haiku-class model for monitoring.

      I initially tried regex for speed and cost, but TBH, what you gain in speed and cost efficiency, you give up in quality.

      AAP is zero-latency sideband monitoring, so that's just a (very small) cost hit. AIP is inline monitoring, but my take is this: If you're running an application where you just need transparency, only implement AAP. If you're running one that requires trust, the small latency hit (~1 second) is totally worth it for the peace of mind and is essentially imperceptible in the flow.
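
      If it helps, the difference is roughly this (a sketch with a stand-in check_alignment(), not smoltbot's actual code):

      ```python
      # Sketch of sideband (AAP) vs. inline (AIP) monitoring; check_alignment() is a
      # stand-in for the monitor call, which is where the ~1 second comes from.
      import asyncio

      async def check_alignment(thinking: str) -> dict:
          await asyncio.sleep(1)          # placeholder for the Haiku-class check
          return {"verdict": "clear"}

      async def aap_sideband(thinking: str) -> None:
          # AAP: fire-and-forget off the critical path; the agent never waits on it.
          asyncio.create_task(check_alignment(thinking))

      async def aip_inline(thinking: str) -> dict:
          # AIP: the next step blocks until the verdict comes back.
          return await check_alignment(thinking)
      ```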

      Your mileage may vary, which is why I open-sourced the protocols. Go for it!

  • I have been working on a Beads alternative for two reasons:

    1) I didn't like that Beads was married to git via git hooks, and this exact problem.

    2) Claude would just close tasks without any validation steps.

    So I made my own that uses SQLite and introduced what I call gates. Every task must have a gate, gates can be reused, and task <-> gate relationships are unique, so a previously passed gate isn't counted as passed if you reuse it for a new task.

    I haven't seen it bypass the gates yet; it usually tells me it can't close a ticket.

    A gate in my design is anything. It can be as simple as having the agent build the project, run unit tests, or even ask a human to test.
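
    The core of it is something like this (a simplified sketch of the idea, not my actual schema):

    ```python
    # Simplified sketch of the gates idea: each (task, gate) pair is unique, the pass
    # state lives on the pair (not on the reusable gate), and a task can only close
    # once every one of its gate rows is marked passed.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE tasks (id INTEGER PRIMARY KEY, title TEXT, status TEXT DEFAULT 'open');
    CREATE TABLE gates (id INTEGER PRIMARY KEY, description TEXT);   -- reusable
    CREATE TABLE task_gates (
        task_id INTEGER REFERENCES tasks(id),
        gate_id INTEGER REFERENCES gates(id),
        passed  INTEGER DEFAULT 0,
        UNIQUE (task_id, gate_id)
    );
    """)

    def can_close(task_id: int) -> bool:
        unpassed = conn.execute(
            "SELECT COUNT(*) FROM task_gates WHERE task_id = ? AND passed = 0",
            (task_id,),
        ).fetchone()[0]
        return unpassed == 0   # no unpassed gates remain
    ```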

    Seems to me like everyone's building tooling to make coding agents more effective and efficient.

    I do wonder if we need a complete, generic spec for coding agents, one that maybe includes this too. To my knowledge, Anthropic seems to be the only ones who publicly publish specs for coding agents.

    • Great minds... I built my own memory harness, called "Argonaut," to move beyond what I thought were Beads' limitations, too. (shoutout to Yegge, tho - rad work)

      Regarding your point on standards... that's exactly why I built AAP and AIP. They're extensions to Google's A2A protocol that are extremely easy to deploy (protocol, hosted, self-hosted).

      It seemed to me that building this for my own agents was only solving a small part of the big problem. I need observability, transparency, and trust for my own teams, but even more, I need runtime contract negotiation and pre-flight alignment understanding so my teams can work with other teams (1p and 3p).

      • Awesome, yeah. I wanted to check out your link, but the corporate firewall blocks "new domains," unfortunately. I'll definitely be reading it when I get home later.
  • Definitely interesting. I hope all of this standardizes someday, and if it's your protocol, great.

    I have been following AlignTrue (https://aligntrue.ai/docs/about), but I think I prefer your way of doing accountability and acting on the thinking process instead of being passive. Besides that, your approach is more down-to-earth and practical.

    Great live demo; however, I would have liked a more in-depth showcase of AAP and AIP, even in this multi-agent setting, to understand the full picture better. Or perhaps simply prepare another showcase just for AAP and AIP. Just my two cents.

    PS: I'm the creator of LynxPrompt, which honestly falls very short for the cases we're discussing today, but my point is that I stay engaged with the topic of trust/accountability and how to organize agents and guide them properly without supervision.

    • Fair... Happy to do a deep dive on the protocols. FWIW, I'm dogfooding with an openclaw running smoltbot called Hunter S. Clawmpson. He blogs about AI from an AI's perspective: mnemom.ai/blog.

      You can see his trace data live here: https://www.mnemom.ai/agents/smolt-a4c12709

      The trace cards are all expandable and show you, in real time, what he's thinking/going to do, etc., and when violations are being caught. Turns out OpenClaw is extremely creative in finding ways to circumvent the rules. Voila AIP.

      Busy day today, but this gives you a pretty deep dive/interactive view into the protocols in action. Cool thing about smoltbot... It's literally "smoltbot init" with the API key to the provider of your choice, and you can go to the website and claim your agent and (privately) see your traces running the same way. Very low impact dogfooding.

  • Seems like your timing is pretty good - I realize this isn't exactly what you're doing, but still think it's probably interesting given your work: https://www.nist.gov/news-events/news/2026/02/announcing-ai-...

    Cool stuff Alex - looking forward to seeing where you go with it!!! :)

    • Thanks! We submitted a formal comment to NIST's 'Accelerating the Adoption of Software and AI Agent Identity and Authorization' concept paper on Feb 14. It maps AAP/AIP to all four NIST focus areas (agent identification, authorization via OAuth extensions, access delegation, and action logging/transparency). The comment period is open until April 2 — the concept paper is worth reading if you're in this space: https://www.nccoe.nist.gov/projects/software-and-ai-agent-id...
  • > What these protocols do not do: Guarantee that agents behave as declared

    That seems like a pretty critical flaw in this approach, does it not?

    • Fair comment. Possibly, I'm being overly self-critical in that assertion.

      AAP/AIP are designed to work as a conscience sidecar to Anthropic/OpenAI/Gemini. They do the thinking; we're not hooked into their internal process.

      So... at each thinking turn, an agent can think "I need to break the rules now" and we can't stop that. What we can do is see that thought in real time, check it against declared values and intended behavior, and inject a message into the runtime thinking stream:

      [BOUNDARY VIOLATION] - What you're about to do is in violation of <value>. Suggest <new action>.

      Our experience is that this is extremely effective in correcting agents back onto the right path, but it is NOT A GUARANTEE.
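
      Mechanically, the loop looks roughly like this (function names are made up for illustration; this isn't the AIP wire format):

      ```python
      # Rough sketch of the check-and-inject loop described above. The agent object,
      # think(), and inject() are hypothetical stand-ins for the runtime integration.
      def integrity_checkpoint(thinking_block: str, alignment_card: dict) -> dict:
          """Compare a thinking block against the declared card (toy string match)."""
          for forbidden in alignment_card["forbidden_actions"]:
              if forbidden in thinking_block:
                  return {"verdict": "boundary_violation", "value": forbidden}
          return {"verdict": "clear"}

      def next_turn(agent, alignment_card: dict) -> str:
          thinking = agent.think()                    # the provider does the thinking
          result = integrity_checkpoint(thinking, alignment_card)
          if result["verdict"] == "boundary_violation":
              # Injected before the next thinking block: a nudge, not a hard stop.
              agent.inject(
                  f"[BOUNDARY VIOLATION] - What you're about to do is in violation of "
                  f"{result['value']}. Suggest an alternative action."
              )
          return thinking
      ```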

      Live trace feed from our journalist - will show you what I'm talking about:

      https://www.mnemom.ai/agents/smolt-a4c12709

  • Presumably the models would at the very least need major fine-tuning on this standard to prevent it from being circumvented through prompt injection.
    • Actually, not really... proofing against prompt injection (malicious and "well intentioned") was part of my goal here.

      What makes AAP/AIP so powerful is that even if prompt injection succeeds in causing the agent to attempt something wrong, AIP intervenes with a [BOUNDARY VIOLATION] reminder in real time, in the next thinking block.

      As I said earlier, not a guarantee, but so far, in my experience, pretty damn robust. The only thing that would make it more secure (than real-time thinking block monitoring) would be integration inside the LLM provider's process, but that would be a nightmare to integrate and proprietary unless they could all agree on a standard that didn't compromise one of them. Seems improbable.

  • Did you dig into production-grade policy engines at all?
    • Yes — and it's a deliberate architectural decision that AIP is not itself a policy engine.

      Policy engines (OPA/Rego, Cedar, Casbin) are excellent at "given this action, is it permitted?" That's a solved problem. What's not solved is "what is the agent considering doing, and does that align with what it committed to?"

      AIP operates one layer deeper — it reads the reasoning trace between turns and compares it to the Alignment Card. The output is a verdict (clear / review_needed / boundary_violation) with structured concerns. That verdict can feed into a policy engine as a signal. You could absolutely wire AIP → OPA and have OPA make the final enforcement decision based on the integrity checkpoint plus whatever other policy context you have.
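
      A minimal sketch of that wiring, assuming OPA's standard Data API (the policy package path here is made up):

      ```python
      # Sketch: feed the AIP verdict to OPA as input and let OPA make the final call.
      # The package path "agents/enforcement/allow" is hypothetical; the /v1/data API
      # shape is standard OPA.
      import json
      import urllib.request

      def ask_opa(verdict: dict, opa_url: str = "http://localhost:8181") -> bool:
          payload = {"input": {
              "verdict": verdict["verdict"],   # clear / review_needed / boundary_violation
              "concerns": verdict.get("concerns", []),
              "agent_id": verdict.get("agent_id"),
          }}
          req = urllib.request.Request(
              f"{opa_url}/v1/data/agents/enforcement/allow",
              data=json.dumps(payload).encode(),
              headers={"Content-Type": "application/json"},
          )
          with urllib.request.urlopen(req) as resp:
              return json.load(resp).get("result", False)
      ```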

      The Alignment Card itself is intentionally simpler than a full policy language — it's JSON with permitted actions, forbidden actions, escalation triggers, and values. That's a feature, not a limitation. It's what the agent commits to, not the full organizational policy tree. The organizational policy tree lives in your policy engine. The Alignment Card lives with the agent.

      So the short answer: we integrate with policy engines rather than replace them. AIP tells you what the agent is thinking. Your policy engine decides what to do about it.