AI Transformer (Smalltalk)

from blog matt.sh
system> You know the Transformer architecture? Let’s implement a full Transformer architecture, complete with a training loop, in Smalltalk Squeak. Since this is for teaching, the code should have nice detailed comments about why every section of code matters and how it relates to the Transformer architecture, including self-attention, multiple layers, and generating embeddings.

assistant> Before we start with the implementation of the Transformer architecture, let’s...