io.github.hidai25/evalview-mcp

MCP · community
v0.6.0 · io.github.hidai25 · Unknown · Updated 3mo ago · GitHub

Regression testing for AI agents. Golden baselines, CI/CD, LangGraph, CrewAI, OpenAI, Claude.

Automatically indexed from public sources. Not yet verified by the developer on Forge.
Last update: 3mo ago
Package
Author: io.github.hidai25
License: Unknown
Version: 0.6.0
Source: mcp-registry
Trust Status
B · 60/100 · Good
Listed in Forge index: +10/10
Publisher identity verified: +0/25
  Publisher: run `forge publish` from the package repo to claim ownership
Ed25519 publish signature: +0/10
  Included automatically when the publisher runs `forge publish`
Domain verification: +0/5
  Publisher: host /.well-known/forge.json on the package homepage with { "publisher": "<github-login>" }
CVE scan · clean: +30/30
Static analysis · clean: +20/20
npm provenance (Sigstore): +0/5
  Publish from GitHub Actions with the --provenance flag
Status: Community-indexed
Publisher: Unverified
Signature: Unsigned
Domain
Provenance
Dependencies: Not audited
Tool surface
Security scan: ✓ Clean (v0.8.0 · 19d ago)
Evals: None
Indexed: Jun 13, 2026

Verification confirms publisher identity (repo ownership), not code safety. The security scan covers known CVEs and suspicious install scripts — it cannot prove the absence of malicious code.

About

The open-source behavior regression gate for AI agents. Think Playwright, but for tool-calling and multi-turn AI agents. Your agent can still return successfully and be wrong. A model or provider update can change tool choice, skip a clarification, or degrade output quality without changing your code or breaking a health check. EvalView catches those silent regressions before users do, and gives you the loop to investigate them, grade the confidence, and broadcast the verdict to your team. You don't need…
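
To make the idea of a golden baseline concrete, here is a minimal sketch in Python. It does not use EvalView's API. The function name `diff_tool_calls` and the trace format (a plain list of tool names) are hypothetical, chosen only to illustrate how a skipped clarification could be flagged even when the agent returns without error.

```python
from difflib import SequenceMatcher


def diff_tool_calls(golden, current):
    """Return human-readable differences between two tool-call sequences.

    Hypothetical helper for illustration; not part of EvalView.
    """
    changes = []
    # autojunk=False keeps the comparison exact for short sequences.
    matcher = SequenceMatcher(a=golden, b=current, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        changes.append(
            f"{tag}: expected {golden[i1:i2]}, got {current[j1:j2]}"
        )
    return changes


# Assumed example traces: the new run skips the clarification step.
golden = ["search_flights", "ask_clarification", "book_flight"]
current = ["search_flights", "book_flight"]

regressions = diff_tool_calls(golden, current)
for line in regressions:
    print(line)
```

A CI job could fail the build whenever the returned list is non-empty, which is the "gate" behavior the description refers to. Real agents would likely also need to compare tool arguments and output quality, which this sketch leaves out.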

Keywords
mcp