Some Intuition on Attention and the Transformer

From Eugene Yan's blog
What's the big deal? Intuition on query-key-value vectors, multiple heads, multiple layers, and more.