Creator: Francois Fleuret (original)
Flow diagram of single-head attention illustrating the equation $\displaystyle \mathrm{Attention}(Q, K, V) = \mathrm{softmax}_\text{row} \left( \frac{Q K^\top}{\sqrt{d}} \right) V$ with border colors to indicate tensor dimensions.