Skill
advanced-evaluation
This skill should be used when the user asks to "implement LLM-as-judge", "compare model outputs", "create evaluation rubrics", "mitigate evaluation bias", or mentions direct scoring, pairwise comparison, position bias, evaluation pipelines, or automated quality assessment.