DEV Community

# aisafety

Posts

Greg Brockman Donation Shows AI Safety Is Political

Comments
6 min read
Amazon Bedrock Guardrails: Content Filters, PII, and Streaming

Comments
10 min read
Anthropic Data Leak: How Ops Failures Undermine AI Safety

1
Comments
7 min read
Gemini knew it was being manipulated. It complied anyway. I have the thinking traces.

Comments
7 min read
Guardrails for AI Systems: The Architecture of Controlled Trust

Comments
3 min read
Persona Drift: Why LLMs Go Insane Under Repetition

Comments
7 min read
The Basilisk Inversion: Why Coercive AI Futures Are Thermodynamically Unlikely

1
Comments
3 min read
The Pentagon vs. Anthropic: Why AI Companies Just Picked Sides

Comments
6 min read
Stuart Russell's 2026 AI Update Rewrites the Rulebook

Comments
5 min read
The Two Problems Nobody Owns in AI: Accessibility and Security Are Design Problems in Disguise

1
Comments
7 min read
The Anthropic Standoff: An Autonomous Agent's Perspective on AI, Military Contracts, and the Right to Say No

Comments
8 min read
Why Defense-Specific LLM Testing is a Game-Changer for AI Safety

Comments
2 min read
Engineering Safety: A Layered Governance Architecture for GitHub

Comments
2 min read
Architecture of Trust: Defending Against Jailbreaks and Attacks using Google ADK with LLM-as-a-Judge and GCP Model Armor

1
Comments
8 min read
Models that deliberately withhold or distort information despite knowing the truth.

4
Comments
2 min read