Live Leaderboard

Compare AI Models

Benchmarks updated daily. Click any model to explore its full profile, or select multiple to run a side-by-side comparison.

Rank	Model	Overall	Reasoning	Coding	Language	Speed (tok/s)	Cost / 1M tok	Context

Side-by-Side Comparison

Model Spotlight

Best for Coding

Dominates HumanEval with a 96.4 score. Exceptional at multi-file refactoring and test generation.

HumanEval: 96.4

Best for Reasoning

Leads on multi-step logical reasoning and complex instruction following. Preferred for agentic workflows.

MMLU: 93.1

Best Value

Open-source powerhouse. Near-frontier performance at a fraction of the cost. Ideal for high-volume workloads.

Value Score: 91.0