Time horizon of software tasks
that various AI models can solve in 50% of cases
Human task duration at which a fitted logistic regression predicts a 50% or 80% success probability for the model
50% success
80% success
Estimated ⚠
Source: METR – Time Horizon 1.1 · ⚠ GPT-5.4 and Claude Sonnet 4.6: no official METR value
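
How the 50% and 80% horizons follow from a logistic fit can be sketched by inverting the fitted curve. The sketch assumes the model has the form P(success) = sigmoid(intercept + slope * log2(minutes)), where minutes is the human task duration. The function name `time_horizon` and the coefficient values are hypothetical illustrations, not METR's code or fitted values.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def sigmoid(x: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def time_horizon(intercept: float, slope: float, p: float) -> float:
    """Human task duration (minutes) at which predicted success equals p.

    Assumed model: P(success) = sigmoid(intercept + slope * log2(minutes)).
    Solving for minutes gives 2 ** ((logit(p) - intercept) / slope).
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if slope == 0.0:
        raise ValueError("slope must be non-zero to invert the curve")
    return 2.0 ** ((logit(p) - intercept) / slope)


# Hypothetical coefficients for illustration only (not taken from METR).
# A negative slope means success becomes less likely as tasks get longer.
intercept, slope = 3.0, -0.5

h50 = time_horizon(intercept, slope, 0.5)  # 64.0 minutes for these coefficients
h80 = time_horizon(intercept, slope, 0.8)  # shorter than h50

print(f"50% horizon: {h50:.1f} min")
print(f"80% horizon: {h80:.1f} min")
```

With a negative slope the 80% horizon is always shorter than the 50% horizon, which is why the chart shows two separate series for the same model.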