- We've been ramping up our gen AI usage for the last ~month at Upsolve, and it's becoming a huge pain. There are already a million solutions for observability out there, but I like that this one is open source and can detect hallucinations.
Thanks for open sourcing and sharing, excited to try this out!!
- Yeah thanks for the feedback.
We think we stand out from our competitors in the space because we built first for the enterprise case, with consideration for things like data governance, acceptable use, data privacy, and information security, so it can be deployed and managed easily and reliably in customer-managed environments.
A lot of the products today have similar evaluations and metrics, but they either offer only a SaaS solution or require some onerous integration into your application stack.
Because we started with the enterprise first, our goal was to get to value as quickly and easily as possible (to avoid shoulder-surfing over Zoom calls because we don't have access to the service), and we think this plays out well in our product.
- Love this. More transparency + better tooling is exactly what AI needs right now. Excited to give it a try.
- Interesting, AI needs much better guardrails and monitoring!
- Cool, I'm running a few GenAI automations, but they're rather unsupervised. So I'm gonna try it and check how they're doing.
- Very excited to be trying this out! The examples look very useful, and I'm excited to tie it in with other open source solutions.
- Thanks for sharing! This looks perfect for teams getting started with monitoring for all model types -- excited to try it out!
- Yoo! Hopefully no more "oops our AI just leaked the system prompt" moments thanks to these guardrails!
- Looks great! How does the system detect hallucinations?
- Yeah, great question.
We based our hallucination detection on "groundedness" on a claim-by-claim basis, which evaluates whether each part of the LLM response can be traced back to the provided context (e.g., message history, tool calls, retrieved context from a vector DB, etc.).
We split the response into multiple claims, determine whether each claim needs to be evaluated (e.g., that it isn't just boilerplate), and then check whether the claim is supported by the context.
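To make the flow concrete, here's a rough sketch of those three steps in Python. It is not our actual implementation: the function names, the boilerplate prefixes, and the 0.8 threshold are all made up for illustration, and the token-overlap check is only a cheap stand-in for a real groundedness judgment (which would typically use an LLM or an entailment model).

```python
import re

# Hypothetical boilerplate markers, for illustration only.
BOILERPLATE_PREFIXES = ("sure", "i hope this helps", "let me know")


def split_claims(response: str) -> list[str]:
    """Step 1: naive sentence split, standing in for real claim extraction."""
    parts = re.split(r"(?<=[.!?])\s+", response.strip())
    return [p.strip() for p in parts if p.strip()]


def needs_evaluation(claim: str) -> bool:
    """Step 2: skip claims that look like conversational boilerplate."""
    return not claim.lower().startswith(BOILERPLATE_PREFIXES)


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens of a piece of text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def is_grounded(claim: str, context: list[str], threshold: float = 0.8) -> bool:
    """Step 3: token-overlap stand-in for a real groundedness check.

    The threshold is an arbitrary assumption, not a tuned value.
    """
    claim_tokens = tokens(claim)
    if not claim_tokens:
        return True
    context_tokens = set().union(*(tokens(c) for c in context))
    overlap = len(claim_tokens & context_tokens) / len(claim_tokens)
    return overlap >= threshold


def find_ungrounded_claims(response: str, context: list[str]) -> list[str]:
    """Return the claims that need evaluation but are not supported by context."""
    return [
        claim
        for claim in split_claims(response)
        if needs_evaluation(claim) and not is_grounded(claim, context)
    ]


context = ["The invoice total is 42 dollars and it is due on Friday."]
response = (
    "Sure, happy to help! The invoice total is 42 dollars. "
    "The customer lives in Paris."
)
flagged = find_ungrounded_claims(response, context)
print(flagged)  # ['The customer lives in Paris.']
```

In this toy example, the greeting is skipped as boilerplate, the invoice claim is fully covered by the context, and the Paris claim is flagged because nothing in the context backs it up.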
- Excited to get hands on with this. I've had too many sleepless nights trying to figure out how to track when my agents were hallucinating.
- Very cool!