Creator: Francois Fleuret (original)
Tags: machine learning, attention mechanism, attention is all you need, transformer, cetz, tikz
Flow diagram of single-head attention illustrating the equation $\displaystyle \mathrm{Attention}(Q, K, V) = \mathrm{softmax}_\text{row} \left( \frac{Q K^\top}{\sqrt{d}} \right) V$, with border colors to indicate tensor dimensions.
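As a rough companion to the diagram, the equation can be sketched in plain Python using only the standard library. The helper names (`softmax_row`, `matmul`, `transpose`, `attention`) and the toy matrices are assumptions made for illustration and are not part of the original figure. Here `d` is taken to be the shared query/key dimension.

```python
import math

def softmax_row(row):
    """Numerically stable softmax over one row (a list of floats)."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def attention(q, k, v):
    """Single-head attention: softmax_row(Q K^T / sqrt(d)) V.

    Returns the attention weights and the output so both can be inspected.
    """
    d = len(q[0])  # assumed: shared query/key dimension
    scores = matmul(q, transpose(k))                       # (n_q x n_k)
    scaled = [[s / math.sqrt(d) for s in row] for row in scores]
    weights = [softmax_row(row) for row in scaled]         # each row sums to 1
    return weights, matmul(weights, v)                     # output is (n_q x d_v)

# Toy shapes chosen for illustration: 2 queries, 3 keys/values, d = 2, d_v = 2
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
weights, out = attention(Q, K, V)
```

The softmax is applied per row, so each query gets its own probability distribution over the keys, and each output row is the corresponding weighted average of the rows of `V`.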