AI Evaluation Platform

Agon

Submit your API endpoints. Beat the benchmarks. Climb the leaderboard. Prove your AI builds work.

How It Works

Three Steps to Victory

1

Read the Challenge

Each level presents test cases with input queries and assets. Study the expected outputs and understand the scoring criteria.

2

Submit Your API

Deploy an endpoint that accepts queries and returns answers. Submit its URL and our evaluation engine runs it against every test case.

3

Climb the Ranks

Your API is scored on cosine similarity, Jaccard accuracy, and latency. Clear each level to unlock the next; top the leaderboard to win.
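As a sketch of step 2, the endpoint can be as simple as a small HTTP server that reads a query and returns an answer. The JSON field names (`query`, `answer`) and port below are illustrative assumptions, not the platform's confirmed contract; check the challenge spec for the real request/response shape.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def answer(query: str) -> str:
    # Placeholder logic -- swap in your actual AI system here.
    return f"echo: {query}"

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON body of the incoming evaluation request.
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        # Respond with a JSON answer (field names are assumed here).
        reply = json.dumps({"answer": answer(body.get("query", ""))}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(reply)

# To serve locally before submitting your URL:
# HTTPServer(("0.0.0.0", 8000), Handler).serve_forever()
```

Once deployed behind a public URL, that URL is what you submit for evaluation.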

Scoring

What We Measure

Similarity

Semantic closeness between your API's response and the expected output. Higher means more accurate.
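In code, cosine similarity is the angle-based closeness of two vectors. The sketch below assumes your response and the expected output have already been embedded as vectors; the platform's actual embedding model is not specified here.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy embeddings -- real scoring would embed both texts first.
response_vec = [0.2, 0.7, 0.1]
expected_vec = [0.25, 0.65, 0.05]
print(cosine_similarity(response_vec, expected_vec))
```

Identical directions score 1.0; unrelated (orthogonal) vectors score 0.0.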

Accuracy

Token-level overlap (Jaccard index) between your output and the ground truth, rewarding exact wording.
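A minimal sketch of token-level Jaccard overlap, assuming simple whitespace tokenization (the platform's actual tokenizer and normalization rules may differ):

```python
def jaccard(pred: str, truth: str) -> float:
    """Jaccard index of word sets: |intersection| / |union|."""
    p, t = set(pred.lower().split()), set(truth.lower().split())
    if not p and not t:
        return 1.0  # two empty outputs count as a perfect match
    return len(p & t) / len(p | t)

# 4 shared tokens out of 6 distinct tokens overall.
print(jaccard("the cat sat on the mat", "the cat lay on the mat"))
```

A score of 1.0 means the word sets match exactly; paraphrases that change wording score lower even when the meaning is close, which is why this metric is paired with similarity.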

Latency

Response time of your API in milliseconds. Faster responses rank higher at equal accuracy.

Ready to Compete?

Deploy your AI, submit your endpoint, and prove it works.

Ship it.
Score it.