Regression testing for AI agents. Golden baselines, CI/CD, LangGraph, CrewAI, OpenAI, Claude.
The open-source behavior regression gate for AI agents. Think Playwright, but for tool-calling and multi-turn AI agents. Your agent can still return and be wrong. A model or provider update can change tool choice, skip a clarification, or degrade output quality without changing your code or breaking a health check. EvalView catches those silent regressions before users do — and gives you the loop…
Verification confirms publisher identity (repo ownership), not code safety. The security scan covers known CVEs and suspicious install scripts — it cannot prove the absence of malicious code.
The open-source behavior regression gate for AI agents. Think Playwright, but for tool-calling and multi-turn AI agents. Your agent can still return and be wrong. A model or provider update can change tool choice, skip a clarification, or degrade output quality without changing your code or breaking a health check. EvalView catches those silent regressions before users do — and gives you the loop to investigate them, grade the confidence, and broadcast the verdict to your team. You don't need…