Live Leaderboard
Compare AI Models
Benchmarks updated daily.
| Rank | Model | Overall | Reasoning | Coding | Language | Speed (tok/s) | Cost per 1M tokens | Context window |
|---|---|---|---|---|---|---|---|---|
Model Spotlight
Top picks for common use cases
Best for Coding
GPT-Omega Pro
Dominates HumanEval with a 96.4 score. Exceptional at multi-file refactoring and test generation.
HumanEval: 96.4
Best for Reasoning
Claude-X Sonnet
Leads on multi-step logical reasoning and complex instruction following. Preferred for agentic workflows.
MMLU: 93.1
Best Value
Llama-4 Turbo
Open-source powerhouse. Near-frontier performance at a fraction of the cost. Ideal for high-volume workloads.
Value Score: 91.0