DEV Community
#benchmarks

The cheapest and fastest way to generate an image
Konstantin Komelin
May 17
#ai #benchmarks #nanobanana #vercel
1 min read

Beyond Scores: A Critical Review of Benchmark Reports for Evaluating Large Language Models
Ismail zamareh
May 16
#llmevaluation #benchmarks #machinelearning #productiondeployment
7 min read

What you measure depends on where you draw the boundary
Arkadiusz Przychocki
May 14
#java #performance #benchmarks #saga
2 reactions
1 comment
9 min read

I benchmarked 10 LLMs on slopsquatting — up to 87% installed fake packages
Vincenzo Rubino
Apr 24
#ai #security #webdev #benchmarks
1 reaction
9 min read

DeepSeek V4 Released: Open-Source 1.6T MoE, 1M Context, Apache 2.0 — and It's Already on the API
Owen
Apr 24
#ai #deepseek #opensource #benchmarks
6 min read

GPT-5.5 Released: First Fully Retrained Base Model Since GPT-4.5, 1M Context, $5/$30 Pricing
Owen
Apr 24
#ai #openai #gpt #benchmarks
6 min read

GPT-5.5 Is Out — What the Numbers Actually Say
김이더
Apr 24
#ai #openai #gpt #benchmarks
4 min read

How to Choose the Right AI Model for the Right Job
Shafiq Ur Rehman
Apr 21
#ai #benchmarks #modelselection
13 min read

How I took LongMemEval oracle from 62% to 82.8% without touching the retriever
t49qnsx7qt-kpanks
Apr 21
#ai #llm #benchmarks #memory
3 min read

What ground truth caught that unit tests missed: 3 real bugs in 9 flagship lint rules
Ofri Peretz
May 14
#staticanalysis #eslint #testing #benchmarks
7 min read

What Is Agent Evaluation? How EClaw Arena Benchmarks AI Agents Across 12 Dimensions
EClawbot Official
Apr 15
#ai #agents #benchmarks #evaluation
3 min read

Sonnet 4.6 vs Haiku 4.5 vs Opus 4.6: I Tested 3 Claude Models on 10 Real Tasks
James AI
Apr 15
#ai #llm #claude #benchmarks
3 min read

Why Merged LoRA Barely Changes Inference Time
Natnael Alemseged
May 5
#machinelearning #llm #benchmarks #ai
1 reaction
6 min read

The YC President Endorsed an AI Memory System With Fake Benchmarks. He Also Shipped His Own. We Read the Code.
Penfield
Apr 11
#ai #aimemory #benchmarks #yc
3 min read

Proposal: A Real Benchmark for Long-Term AI Memory Systems
Penfield
Apr 10
#ai #aimemory #benchmarks
3 min read