DEV Community

# transformers

Posts

MoE Architectures Keep Solving the Wrong Problem
3 min read
Chapter 12: Inference - Generating New Text
9 min read
Chapter 11: The Full GPT - Assembling the Model
10 min read
Chapter 9: Single-Head Attention - Tokens Looking at Each Other
9 min read
Chapter 8: RMS Normalisation and Residual Connections
4 min read
Beating Eager TurboQuant Was Not Enough: Why Dense GPU Attention Still Won
8 min read
Chapter 7: The Training Loop and Adam Optimiser
7 min read
Chapter 6: Embeddings, the Forward Pass, and the Loss Function
7 min read
Mamba vs. Transformers: Architecture Comparison
5 min read
Without google's transformers, there is no GPT-ishs
6 min read
Chapter 5: Linear Transformation and Softmax
4 min read
Chapter 4: The Bigram Model - Simplest Possible Language Model
5 min read
Chapter 3: The Tokenizer - Text to Numbers and Back
2 min read
Chapter 2: Backward - Automatic Gradient Computation
7 min read
Chapter 1: The Value Class - Recording the Forward Pass
10 min read