DEV Community

Deep Learning

This tag is for discussions, articles, and questions about deep learning, a subfield of machine learning.

Posts

Identifying Early Warning Signs of Attention Mechanism Instability

1
Comments
5 min read
The Challenge of Unverifiable AI Rewards

1
Comments
7 min read
Adversarial Attacks and Defenses in Deep Learning Systems: Threats, Mechanisms, and Countermeasures

1
Comments
6 min read
Standard Transformer Attention vs. Attention-Residuals: A Practical Comparison

Comments
5 min read
Invited talk about: Adversarial Attacks and Defenses in Deep Learning Systems: Threats, Mechanisms, and Countermeasures

Comments
1 min read
I Thought I Understood the Autonomous Vehicle Problem. Indian Roads Corrected Me.

Comments
6 min read
The Proliferation of Specialized LLMs and the Retraining Dilemma

Comments
5 min read
The Pervasive Role and Hidden Limitations of Softmax

Comments
6 min read
Revisiting the Causal Mechanisms Behind Policy Gradients

Comments
5 min read
The Bottleneck of Dense Attention in Long Contexts

Comments
6 min read
The Intricate Dance of Self-Attention: What Can Go Wrong?

Comments
5 min read
Mamba-3 and AttnRes: AI Architecture Research Is Finally Building for Inference, Not Just Training

Comments
7 min read
5 architectures replacing brute-force AI scaling (and what they mean for your stack)

Comments
3 min read
The Mathematics That Make 1.58-bit Weights Work: How BitNet b1.58 Survives Its Own Quantization

1
Comments
7 min read
Exploring an AST-Based Template Engine in PHP — Thoughts?

1
Comments
1 min read