Community-maintained evaluators for agentevals -- the agent evaluation framework built on Google ADK.
Evaluators are standalone scoring programs that evaluate agent traces. They read `EvalInput` JSON from stdin and write `EvalResult` JSON to stdout. This repository is the official index of community-contributed evaluators.
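As a rough illustration of that stdin/stdout exchange (the field names below are assumptions for illustration, except `score`, `per_invocation_scores`, and `status`, which appear elsewhere in this document — see the protocol reference for the authoritative schema), an evaluator might receive:

```json
{
  "invocations": [
    {
      "user_content": "What's the weather in Paris?",
      "final_response": "It's 18°C and sunny.",
      "tool_calls": [{"name": "get_weather", "args": {"city": "Paris"}}]
    }
  ]
}
```

and print a result such as:

```json
{"score": 1.0, "per_invocation_scores": [1.0], "status": "ok"}
```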
## Using an evaluator

List the available community evaluators:

```shell
agentevals evaluator list --source github
```

Add a `type: remote` entry to your `eval_config.yaml`:

```yaml
metrics:
  - tool_trajectory_avg_score
  - name: response_quality
    type: remote
    source: github
    ref: evaluators/response_quality/response_quality.py
    threshold: 0.7
    config:
      min_response_length: 20
  - name: tool_coverage
    type: remote
    source: github
    ref: evaluators/tool_coverage/tool_coverage.py
    threshold: 1.0
    config:
      min_tool_calls: 1
```

Then run as usual:

```shell
agentevals run traces/my_trace.json \
  --config eval_config.yaml \
  --eval-set eval_set.json
```

The evaluator is downloaded automatically and cached in `~/.cache/agentevals/evaluators/`.
## Creating an evaluator

Scaffold a new evaluator with the CLI:

```shell
pip install agentevals
agentevals evaluator init my_evaluator
```

This creates a directory ready to be added to this repo:

```
my_evaluator/
├── my_evaluator.py   # your scoring logic
└── evaluator.yaml    # metadata manifest
```
Edit `my_evaluator.py`. Your function receives an `EvalInput` with the agent's invocations and returns an `EvalResult` with a score between 0.0 and 1.0:

```python
from agentevals_grader_sdk import grader, EvalInput, EvalResult

@grader
def my_evaluator(input: EvalInput) -> EvalResult:
    scores = []
    for inv in input.invocations:
        # Your scoring logic here
        scores.append(1.0)
    return EvalResult(
        score=sum(scores) / len(scores) if scores else 0.0,
        per_invocation_scores=scores,
    )

if __name__ == "__main__":
    my_evaluator.run()
```

Install the SDK standalone with `pip install agentevals-grader-sdk` (no heavy dependencies).
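To make the stub's scoring loop concrete, here is a sketch of a response-length grader in the same shape. It is hedged: `final_response` is an assumed invocation field (check the SDK for real field names), and the `EvalInput`/`EvalResult` classes below are local stand-ins so the snippet runs without the SDK installed — in a real evaluator you would import them from `agentevals_grader_sdk` and use the `@grader` decorator as above.

```python
from dataclasses import dataclass, field
from typing import List

# Local stand-ins for the SDK types, so this sketch is self-contained.
@dataclass
class Invocation:
    final_response: str = ""  # assumed field name

@dataclass
class EvalInput:
    invocations: List[Invocation] = field(default_factory=list)

@dataclass
class EvalResult:
    score: float = 0.0
    per_invocation_scores: List[float] = field(default_factory=list)

def response_length_grader(input: EvalInput, min_response_length: int = 20) -> EvalResult:
    """Score each invocation 1.0 if the final response is long enough, else 0.0."""
    scores = [
        1.0 if len(inv.final_response) >= min_response_length else 0.0
        for inv in input.invocations
    ]
    return EvalResult(
        score=sum(scores) / len(scores) if scores else 0.0,
        per_invocation_scores=scores,
    )
```

The `min_response_length` parameter mirrors the `config` block shown in the `eval_config.yaml` example above; how config values reach your function is SDK-defined, so treat the keyword argument here as an assumption.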
Edit `evaluator.yaml` with a description, tags, and your name:

```yaml
name: my_evaluator
description: What this evaluator checks
language: python
entrypoint: my_evaluator.py
tags: [quality, tools]
author: your-github-username
```

Run the validation script to catch issues before submitting:
```shell
pip install pyyaml agentevals-grader-sdk
python scripts/validate_evaluator.py evaluators/my_evaluator
```

This checks:

- **Manifest schema** -- required fields, entrypoint exists, name matches directory
- **Syntax and imports** -- compiles cleanly, uses the `@grader` decorator
- **Smoke run** -- runs the evaluator with synthetic input and validates the `EvalResult` output (correct types for `score`, `details`, `status`, etc.)
You can also test with a full eval run by pointing your config at the local file:

```yaml
metrics:
  - name: my_evaluator
    type: code
    path: ./evaluators/my_evaluator/my_evaluator.py
    threshold: 0.5
```

```shell
agentevals run traces/sample.json --config eval_config.yaml --eval-set eval_set.json
```

## Contributing

1. Fork this repository
2. Copy your evaluator directory into `evaluators/`:

   ```
   evaluators/
   ├── my_evaluator/
   │   ├── evaluator.yaml
   │   └── my_evaluator.py
   ├── response_quality/
   │   └── ...
   └── tool_coverage/
       └── ...
   ```

3. Open a PR against `main`
CI will automatically validate your evaluator (manifest, syntax, and smoke run). Once merged, a separate workflow regenerates `index.yaml`, and your evaluator becomes available to everyone via `agentevals evaluator list`.
## Other languages

Evaluators can be written in any language that reads JSON from stdin and writes JSON to stdout.

| Language   | Extension | SDK available |
|---|---|---|
| Python     | `.py` | `pip install agentevals-grader-sdk` |
| JavaScript | `.js` | No SDK yet -- just read stdin, write stdout |
| TypeScript | `.ts` | No SDK yet -- just read stdin, write stdout |
See the custom evaluators documentation for the full protocol reference.
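For languages without an SDK, the raw protocol is just "parse JSON from stdin, print JSON to stdout". A minimal sketch, written in Python for illustration (in real Python evaluators you would use the SDK instead), assuming a `tool_calls` field on each invocation — check the protocol reference for the actual schema:

```python
import json
import sys

def evaluate(eval_input: dict) -> dict:
    """Score each invocation 1.0 if it made at least one tool call, else 0.0.

    'tool_calls' is an assumed field name used for illustration.
    """
    scores = [
        1.0 if inv.get("tool_calls") else 0.0
        for inv in eval_input.get("invocations", [])
    ]
    return {
        "score": sum(scores) / len(scores) if scores else 0.0,
        "per_invocation_scores": scores,
    }

if __name__ == "__main__":
    # The protocol: EvalInput JSON on stdin, EvalResult JSON on stdout.
    print(json.dumps(evaluate(json.load(sys.stdin))))
```

A JavaScript or TypeScript evaluator would follow the same shape: buffer stdin, `JSON.parse` it, and `console.log(JSON.stringify(result))`.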