The October version of Claude 3.5 lifts the SOTA (set by its June version) by 7 points.
onekq-ai/WebApp1K-models-leaderboard
Closed-source models are widening the gap again.
Note: Our frontier leaderboard now uses double test scenarios because the single-scenario test suite has been saturated.