Benchmark explorer

Benchmarks

Select one benchmark, then compare base model families within that benchmark only. Scores are source-linked evidence rows, not a universal leaderboard.

150 Benchmarks

1,304 Model routes

12,935 Route rows

This page intentionally avoids cross-benchmark ranking. Pick a benchmark first; the chart and table below compare only rows that carry the selected benchmark label. Rows may come from official model cards, launch posts, papers, or benchmark operators.
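The selection rule above (filter to one benchmark label, then compare families inside it) can be sketched as a filter followed by a group-by. This is a minimal sketch, not the site's actual code: the `RouteRow` shape and the helper names `rows_for_benchmark` and `group_by_family` are hypothetical, loosely mirroring the table columns, and the example values are toy placeholders rather than real results.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RouteRow:
    # Hypothetical row shape; field names are assumptions based on the table columns.
    benchmark: str
    family: str
    score: float
    metric: str
    source: str


def rows_for_benchmark(rows, benchmark):
    """Keep only rows whose benchmark label matches exactly; no cross-benchmark merging."""
    return [r for r in rows if r.benchmark == benchmark]


def group_by_family(rows):
    """Group already-filtered rows by base model family, preserving input order."""
    groups = defaultdict(list)
    for r in rows:
        groups[r.family].append(r)
    return dict(groups)


if __name__ == "__main__":
    # Toy placeholder rows, not real benchmark results.
    rows = [
        RouteRow("MMLU", "family-a", 0.5, "accuracy", "example"),
        RouteRow("HumanEval", "family-a", 0.9, "pass@1", "example"),
        RouteRow("MMLU", "family-b", 0.7, "accuracy", "example"),
    ]
    selected = rows_for_benchmark(rows, "MMLU")
    for family, family_rows in group_by_family(selected).items():
        print(family, [r.score for r in family_rows])
```

Filtering before grouping is the point of the design: a family's HumanEval row never lands next to its MMLU row, so scores on different scales are never placed side by side.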

SWE-bench Verified, MMLU, GPQA Diamond, Aider Polyglot, Mistral 7B comparison table, HumanEval, Artificial Analysis Coding Index, Artificial Analysis Intelligence Index

Benchmark results

Results are grouped by the selected benchmark.

Family	Score	Metric	Category	Scope	Routes	Source