DEV Community

# reliability

General discussions on building and maintaining reliable software systems.

Posts

SQEval v1.16.0: Circuit-Breaker AI Failover & Real-Time Token Dashboard — Backed by 500k Benchmark Iterations

Comments
10 min read
Critical Flaws in Long-Term Memory Benchmarks: Addressing Unreliable and Uninterpretable Results

Comments
15 min read
When Your Agent Slowly Eats All the Memory

Comments
2 min read
The Watchdog That Bit Itself: When Health Checks Create the Failures They Detect

1
Comments
2 min read
When Your Sub-Agent Finishes But Nobody Hears It

2
Comments
4 min read
Node.js Circuit Breaker Pattern in Production: Prevent Cascading Failures with Opossum

Comments
8 min read
SRE Explained: Because 'It Works on My Machine' is Not an SLO 🎯

3
Comments
9 min read
How to Build a Self-Healing AI Agent System That Recovers From Failures Automatically

Comments
2 min read
Downsizing Without Downtime: An SRE's Guide to Safe Cost Optimization

Comments
13 min read
The Pre-Flight Checklist: 9 Things to Analyze Before Cutting Any AWS Cost

Comments
14 min read
Retry Contract in Distributed Systems

Comments
3 min read
The Blind Spot Problem: When Your Agent Reports Success But Processes Nothing

Comments
2 min read
When Discord Takes Down Your Entire Agent Fleet

1
Comments
2 min read
Graceful Degradation Patterns: Keep Your Backend Running When Dependencies Fail (2026)

Comments
11 min read
Retry Patterns That Work: Exponential Backoff, Jitter, and Dead Letter Queues (2026)

Comments
2 min read