
Single-head attention

Creator: Francois Fleuret (original)

Flow diagram of single-head attention illustrating the equation $\displaystyle \mathrm{Attention}(Q, K, V) = \mathrm{softmax}_\text{row} \left( \frac{Q K^\top}{\sqrt{d}} \right) V$ with border colors to indicate tensor dimensions.
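The equation in the description can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the code behind the diagram: the function names `softmax_row` and `attention` are hypothetical, matrices are assumed to be lists of rows, and $d$ is assumed to be the shared width of the query and key rows.

```python
import math


def softmax_row(row):
    """Softmax over a single row, shifted by the row max for numerical stability."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Single-head attention: softmax_row(Q K^T / sqrt(d)) V.

    Q is n x d, K is m x d, V is m x d_v (lists of rows); the result is n x d_v.
    """
    d = len(K[0])
    scale = 1.0 / math.sqrt(d)
    # scores[i][j] = <Q[i], K[j]> / sqrt(d)
    scores = [
        [scale * sum(q * k for q, k in zip(q_row, k_row)) for k_row in K]
        for q_row in Q
    ]
    # Normalize each row of scores so that its weights sum to 1.
    weights = [softmax_row(row) for row in scores]
    # out[i] = sum_j weights[i][j] * V[j]
    return [
        [sum(w * v_row[c] for w, v_row in zip(w_row, V)) for c in range(len(V[0]))]
        for w_row in weights
    ]
```

With an all-zero query every key receives the same weight, so the output row is the plain average of the rows of `V`. In practice an array library would replace the nested list comprehensions; the loops are spelled out here only to keep the three steps (scaled scores, row-wise softmax, weighted sum of values) visible.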


  Download

PNG, PDF, SVG

  Code

  single-head-attention.typ (167 lines)

  single-head-attention.tex (75 lines)