io.github.ahmedEid1/forgejudge

MCP · community
v0.1.1 · io.github.ahmedEid1 · Unknown · Updated 1mo ago · GitHub

Open eval leaderboard + CI gate for autonomous coding agents (solve, score, trace).

Automatically indexed from public sources. Not yet verified by the developer on Forge.
Last update: 1mo ago

Package
Author: io.github.ahmedEid1
License: Unknown
Version: 0.1.1
Source: mcp-registry

Trust Status: B (60/100, Good)
Listed in Forge index: +10/10
Publisher identity verified: +0/25
    Publisher: run `forge publish` from the package repo to claim ownership
Ed25519 publish signature: +0/10
    Included automatically when the publisher runs `forge publish`
Domain verification: +0/5
    Publisher: host /.well-known/forge.json on the package homepage with { "publisher": "<github-login>" }
CVE scan · clean: +30/30
Static analysis · clean: +20/20
npm provenance (Sigstore): +0/5
    Publish from GitHub Actions with the --provenance flag
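
The breakdown above is a plain sum, and a short Python sketch makes the arithmetic explicit. The labels and point values are copied from the listing; the names `ITEMS`, `earned_total`, and `open_gaps` are hypothetical helpers introduced here for illustration and are not part of Forge.

```python
# Trust-score items as (label, points earned, points available),
# copied from the listing. The helper names are illustrative only.
ITEMS = [
    ("Listed in Forge index", 10, 10),
    ("Publisher identity verified", 0, 25),
    ("Ed25519 publish signature", 0, 10),
    ("Domain verification", 0, 5),
    ("CVE scan clean", 30, 30),
    ("Static analysis clean", 20, 20),
    ("npm provenance (Sigstore)", 0, 5),
]


def earned_total(items):
    """Sum the points actually earned across all items."""
    return sum(earned for _label, earned, _available in items)


def open_gaps(items):
    """Return (label, missing points) for every item not fully earned."""
    return [
        (label, available - earned)
        for label, earned, available in items
        if earned < available
    ]


if __name__ == "__main__":
    print("earned:", earned_total(ITEMS))
    for label, missing in open_gaps(ITEMS):
        print(f"gap: {label} (+{missing} available)")
```

The earned points add up to the 60 shown under Trust Status. The listing does not say how the per-item maxima map onto the /100 scale, so the sketch makes no claim about that.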
Status: Community-indexed
Publisher: Unverified
Signature: Unsigned
Domain
Provenance
Dependencies: Not audited
Tool surface
Security scan: ✓ Clean (v0.1.0 · 19d ago)
Evals: None
Indexed: Jun 13, 2026

Verification confirms publisher identity (repo ownership), not code safety. The security scan covers known CVEs and suspicious install scripts — it cannot prove the absence of malicious code.
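
For the domain-verification item listed under Trust Status, the publisher is asked to host /.well-known/forge.json containing a `publisher` key. A minimal sketch of generating that file is given here, assuming the homepage is a static site served from a local directory. The function name `write_forge_json`, the `site_root` argument, and the example login are hypothetical; only the path and the JSON key come from the listing.

```python
import json
from pathlib import Path


def write_forge_json(site_root, github_login):
    """Write <site_root>/.well-known/forge.json with the publisher login.

    Assumes the directory at site_root is what the package homepage serves.
    """
    target = Path(site_root) / ".well-known" / "forge.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    payload = {"publisher": github_login}
    target.write_text(json.dumps(payload, indent=2) + "\n", encoding="utf-8")
    return target


if __name__ == "__main__":
    # "example-login" is a placeholder, not the real publisher login.
    path = write_forge_json("public", "example-login")
    print("wrote", path)
```

Whether Forge expects any fields beyond `publisher` is not stated in the listing, so the sketch writes only that key.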

About

An open, always-on leaderboard and CI gate for autonomous coding agents: every patch runs in a sandbox, every run has a public trace, and every regression fails the build.

Live leaderboard: forgejudge.ahmedhobeishy.tech · playground · methodology · model swap · MCP registry

Current numbers (hidden-test = the agent never sees the failing test; $0 free tier; same harness, swap the model; 18 tasks × 3 seeds = 54 runs/model, 162 total):

| Model | pass@1 | pass@3 | The score rises with the better…
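
The run counts and the two metrics quoted above can be sketched in Python. The listing does not say how forgejudge computes pass@1 and pass@3, so this assumes a common empirical reading: pass@1 is the fraction of individual runs that pass, and pass@3 is the fraction of tasks solved by at least one of the three seeds. The function `pass_rates` and the demo data are hypothetical and are not leaderboard results.

```python
from collections import defaultdict

# Figures quoted in the listing.
TASKS = 18
SEEDS = 3
RUNS_PER_MODEL = TASKS * SEEDS          # 18 x 3 = 54
TOTAL_RUNS = 162
# Implied model count, assuming every model ran the full task set.
MODELS = TOTAL_RUNS // RUNS_PER_MODEL


def pass_rates(runs):
    """Compute (pass@1, pass@3) for one model.

    runs: iterable of (task_id, seed, passed) tuples.
    Assumed definitions: pass@1 = share of runs that passed;
    pass@3 = share of tasks with at least one passing seed.
    """
    by_task = defaultdict(list)
    for task_id, _seed, passed in runs:
        by_task[task_id].append(bool(passed))
    total_runs = sum(len(results) for results in by_task.values())
    pass_at_1 = sum(sum(results) for results in by_task.values()) / total_runs
    pass_at_3 = sum(any(results) for results in by_task.values()) / len(by_task)
    return pass_at_1, pass_at_3


if __name__ == "__main__":
    # Made-up data: two tasks, three seeds each.
    demo = [
        ("task-a", 0, True), ("task-a", 1, False), ("task-a", 2, False),
        ("task-b", 0, False), ("task-b", 1, False), ("task-b", 2, False),
    ]
    print(RUNS_PER_MODEL, MODELS)
    print(pass_rates(demo))
```

With three seeds per task, pass@3 can never be lower than pass@1 under these definitions, since any task with a passing run counts as solved.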

Keywords
mcp