Benchmark explorer

Benchmarks

Select one benchmark, then compare base model families within that benchmark only. Scores are source-linked evidence rows, not a universal leaderboard.

150 Benchmarks
1,304 Model routes
12,935 Route rows

This page intentionally avoids cross-benchmark ranking. Pick a benchmark first; the chart and table below only compare rows with that same benchmark label. Rows may come from official model cards, launch posts, papers, or benchmark operators.
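The selection rule above can be sketched in a few lines of Python. This is only an illustration, not the page's actual implementation: the `RouteRow` type, its field names, the `rows_for_benchmark` helper, and every sample value are hypothetical, chosen to show rows being filtered to a single benchmark label before any comparison is made.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RouteRow:
    # Hypothetical row shape; the real data model is not documented here.
    family: str
    benchmark: str
    score: float
    source: str


def rows_for_benchmark(rows, benchmark):
    """Keep only rows with the selected benchmark label, highest score first."""
    selected = [row for row in rows if row.benchmark == benchmark]
    return sorted(selected, key=lambda row: row.score, reverse=True)


# Placeholder rows: families, scores, and sources are invented for illustration.
SAMPLE_ROWS = [
    RouteRow("family-a", "MMLU", 71.0, "model card"),
    RouteRow("family-b", "MMLU", 68.5, "paper"),
    RouteRow("family-a", "HumanEval", 40.2, "launch post"),
]

for row in rows_for_benchmark(SAMPLE_ROWS, "MMLU"):
    print(row.family, row.score, row.source)
```

Because the filter runs before the sort, a score from one benchmark is never ordered against a score from another, which is the property the page describes.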

SWE-bench Verified
MMLU
GPQA Diamond
Aider Polyglot
Mistral 7B comparison table
HumanEval
Artificial Analysis Coding Index
Artificial Analysis Intelligence Index

Benchmark results

Results are grouped by the selected benchmark.

Family | Score | Metric | Category | Scope | Routes | Source