DEV Community

# benchmarks

Posts

Turn the camera away, and the AI's world freezes

Comments
3 min read
Reliable, and still wrong

Comments
3 min read
Put AI agents in charge of a Civilization game and they reach for the nukes

Comments
3 min read
We Benchmarked BrassCoders Against a Frontier Model

Comments
5 min read
Claude Fable 5 Scores 95% on SWE-bench, Then Hands Off to Opus 4.8

Comments
3 min read
An LLM benchmark is only useful for as long as it's hard

2
Comments
10 min read
An AMD GPU Beat My Mac on Llama 8B. The Same GPU Lost on Phi-3.

Comments
5 min read
NLTK vs Compiled Regex: Tokenizing 100 MB of Text in .NET

Comments
3 min read
Why AI Benchmarks Fail Real Hermes Agent Workflows

Comments
10 min read
pypdf vs PdfPig: Text Extraction at Scale

Comments
2 min read
NetworkX vs CSR + TensorPrimitives: PageRank on 28M Edges

Comments
3 min read
Cross-Machine Memory Query: About 20 Milliseconds, Most Days

1
Comments
9 min read
Single-Prompt Safety Scores Are Measuring the Wrong Thing

Comments
3 min read
textdistance vs ArrayPool: Edit Distance Without the Allocations

1
Comments 1
3 min read
Four Chinese Labs Rewrote the Open-Weights Leaderboard in 18 Days

Comments
3 min read