DEV Community

# benchmark

Posts

We ran 6.2 billion COBOL validation passes. Zero errors. Here's what we learned.

2 min read
ARC-AGI V3 Explained: The New AI Benchmark That Breaks Every Agent

3 min read
GPT-5.1 scored 26%. Gemini 3 Flash scored 74%. Same prompt, same tools.

8 min read
AI Gateways Are Not I/O-Bound Proxies I Benchmarked 5 of Them to Prove It

9 min read
I Tried Speculative Decoding on RTX 4060 8GB — Every Config Was Slower Than Baseline

8 min read
FTS vs Hybrid Memory Search: A Real-World Benchmark

4 min read
AI Research Monthly: Feb-Mar 2026 — 25 Findings With Hard Data (Full Pipeline Edition)

43 min read
New Benchmark for Open-Source Agents: What is Claw-Eval? How Step 3.5 Flash Secured the #2 Spot

5 min read
I Built an Auto-Updating Archive of Every AI Arena Leaderboard

2 min read
DGX Spark Inference Performance: Local LLM vs Cloud Benchmarks (2026)

5 min read
Running Qwen2.5-32B on RTX 4060 8GB — Beating M4 at 10.8 t/s with llama.cpp

7 min read
Benchmarking the Model Is the Wrong Abstraction

4 min read
2.78 TFLOPS on a Fanless MacBook Air? Benchmarking Apple's M4 with MLX

4 min read
Zillow Scraping in 2026: Anti-Bot Defenses, API Alternatives, and Benchmark Results

10 min read
Google Maps Scraping API Benchmark 2026: Which Tool Extracts Business Data Fastest?

7 min read