5 points by jaxline506 2 days ago | 3 comments
- What were the personas [0] trained on?
- Most "AI ad testing" is GPT sentiment scoring with a wrapper. We built something architecturally different and the Super Bowl felt like the right moment to show it publicly.

The core issue is that LLMs model language. They predict what a person might say about something, which is not the same as modeling how a person will respond. That requires thought-to-behavior mappings, not next-token prediction. We call our scoring framework FEEL-THINK-ACT: emotional activation, cognitive framing, and behavioral prediction, in that order.

To check we were not just building a different flavor of hallucination, we benchmarked against the OASIS dataset (Harvard/Illinois). Our MAE on emotional response is 0.02-0.15 vs 1.0-2.5+ for GPT/Claude, emotional accuracy is 98% vs ~78% for base models, and consistency across runs is 96% vs 72% for competitors. Every output also carries a confidence score and a hallucination risk score because we did not want to hide uncertainty behind a clean number.

On the simulation side we do not model a single synthetic person. We run the target population 10,000 times with built-in variation and return a distribution. The tails and the variance are the insight.

For the Super Bowl we scored all 101 units, pre-game through post-game, covering memorability, brand clarity, audio-off performance, and cultural resonance by platform and audience segment, in under 4 hours. Live context came in via our 615 Environmental Intelligence system, which folds news cycles and cultural signals in at run time, so the scores reflect the world as it was when the ads aired.

The API is OpenAI-compatible. If you are already building on OpenAI it is a base URL swap:
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_MAVERA_KEY",
    base_url="https://app.mavera.io/api/v1",
)
response = client.chat.completions.create(
    model="mavera",
    messages=[{"role": "user", "content": "Score this ad copy for emotional resonance."}],
    extra_body={"persona_id": "YOUR_PERSONA_ID"},
)
print(response.choices[0].message.content)
Free tier, no enterprise contract, no demo call. Full methodology and scores at superbowl.mavera.io, API docs and free key at docs.mavera.io. Happy to dig into the OASIS benchmarking, simulation architecture, or 615 in the comments.
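For anyone who wants the "run the population many times and return a distribution" idea in concrete form, here is a toy sketch. It is not the actual simulation engine: the Gaussian draw, the function names `simulate_population` and `summarize`, and every parameter value are assumptions invented for illustration. It only shows why a distribution summary (spread and tails) carries more information than one score from one synthetic person.

```python
import random
import statistics


def simulate_population(n_runs=10_000, base_score=0.6, spread=0.15, seed=0):
    """Hypothetical stand-in for a population simulation.

    Draws one response score per simulated respondent from a Gaussian
    (an assumption for this sketch) and clamps it to the range [0, 1].
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same result
    scores = []
    for _ in range(n_runs):
        raw = rng.gauss(base_score, spread)
        scores.append(min(1.0, max(0.0, raw)))
    return scores


def summarize(scores):
    """Return the mean, the spread, and the tail percentiles of the scores."""
    ordered = sorted(scores)
    n = len(ordered)

    def percentile(p):
        # Nearest-rank style lookup, clamped to the last valid index.
        return ordered[min(n - 1, int(p * n))]

    return {
        "mean": statistics.fmean(ordered),
        "stdev": statistics.pstdev(ordered),
        "p05": percentile(0.05),
        "p50": percentile(0.50),
        "p95": percentile(0.95),
    }


scores = simulate_population()
summary = summarize(scores)
print(summary)
```

The gap between `p05` and `p95` and the size of `stdev` are the kind of signal a single-persona score cannot give you.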
- How did the predicted responses compare to the actual responses for the Super Bowl ads?
I noticed the benchmark was mentioned, but it's too jargon-filled for me to follow.