AI Transformer (LISP)

From the blog matt.sh
Attempt 1

system> You know the Transformer architecture? Let’s implement a full Transformer architecture, complete with a training loop, in Common LISP! Since this is for teaching, the code should have nice, detailed comments explaining why every section of code matters and how it relates to the Transformer architecture, including self-attention, multiple layers, and generating embeddings.

assistant> First, let’s start by defining the necessary...