
Michael Nuñez / VentureBeat:
Datacurve releases the DeepSWE coding benchmark, a 113-task test across 91 open-source repositories: GPT-5.5 leads at 70%, GPT-5.4 got 56%, and Opus 4.7 got 54% — For months, the leading AI coding benchmarks have told enterprise buyers a comforting but misleading story: the top models are all roughly the same.

Source: TechMeme
Source Link: https://www.techmeme.com/260527/p13#a260527p13