How are embeddings multiplied with weight matrices

Let’s walk through exactly how $W_Q$ (the query weight matrix) is multiplied with the input embedding, using a small example with embedding size = 5.


🧠 Context: What is $W_Q$?

In a Transformer, we compute the Query (Q) vector for each token like this: $Q = X \cdot W_Q$

Where:

  • $X$ is the input embedding matrix (token representations)
  • $W_Q$ is a learnable weight matrix that projects from embedding size to query size (often the same size)
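
Spelled out entry by entry, the product is an ordinary row-by-column sum. The token calculations in this walkthrough never print the matrices they use, so the values of $X$ and $W_Q$ below are an assumption read back from that arithmetic (three tokens, embedding size 5, query size 4), not something the text states directly:

```latex
Q_{ij} = \sum_{k=1}^{5} X_{ik}\,(W_Q)_{kj}

X = \begin{bmatrix}
1 & 0 & 0 & 1 & 1 \\
0 & 1 & 1 & 0 & 0 \\
1 & 1 & 0 & 0 & 1
\end{bmatrix},
\qquad
W_Q = \begin{bmatrix}
0.1 & 0.2 & 0.3 & 0.4 \\
0   & 0.5 & 0   & 0.5 \\
0.6 & 0   & 0.6 & 0   \\
0.1 & 0.1 & 0.1 & 0.1 \\
0.2 & 0.2 & 0.2 & 0.2
\end{bmatrix}
```

Because each embedding here contains only 0s and 1s, every query vector is simply the sum of the rows of $W_Q$ selected by the 1s.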


🔸 Token 1:

Q[0] = [1, 0, 0, 1, 1] • W_Q
= [1×0.1 + 0 + 0 + 1×0.1 + 1×0.2,
1×0.2 + 0 + 0 + 1×0.1 + 1×0.2,
1×0.3 + 0 + 0 + 1×0.1 + 1×0.2,
1×0.4 + 0 + 0 + 1×0.1 + 1×0.2]
= [0.4, 0.5, 0.6, 0.7]

🔸 Token 2:

Q[1] = [0, 1, 1, 0, 0] • W_Q
= [0+0+0.6+0+0, 0+0.5+0+0+0, 0+0+0.6+0+0, 0+0.5+0+0+0]
= [0.6, 0.5, 0.6, 0.5]

🔸 Token 3:

Q[2] = [1, 1, 0, 0, 1] • W_Q
= [1×0.1 + 1×0 + 0 + 0 + 1×0.2,
1×0.2 + 1×0.5 + 0 + 0 + 1×0.2,
1×0.3 + 1×0 + 0 + 0 + 1×0.2,
1×0.4 + 1×0.5 + 0 + 0 + 1×0.2]
= [0.3, 0.9, 0.5, 1.1]
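
The three hand calculations above can be checked with a few lines of plain Python. This is a minimal sketch, not a reference implementation: the entries of `X` and `W_Q` are inferred from the arithmetic shown for each token (the matrices are not listed explicitly), and `project` is a hypothetical helper name chosen for illustration.

```python
# Values inferred from the worked arithmetic above (an assumption; the
# walkthrough does not list the matrices explicitly).
X = [
    [1, 0, 0, 1, 1],  # token 1
    [0, 1, 1, 0, 0],  # token 2
    [1, 1, 0, 0, 1],  # token 3
]
W_Q = [
    [0.1, 0.2, 0.3, 0.4],
    [0.0, 0.5, 0.0, 0.5],
    [0.6, 0.0, 0.6, 0.0],
    [0.1, 0.1, 0.1, 0.1],
    [0.2, 0.2, 0.2, 0.2],
]


def project(X, W):
    """Multiply X (seq_len x embed_size) by W (embed_size x head_size)."""
    embed_size = len(W)
    head_size = len(W[0])
    for row in X:
        if len(row) != embed_size:
            raise ValueError("each embedding must have embed_size entries")
    # Entry (i, j) is the dot product of row i of X with column j of W.
    return [
        [sum(row[k] * W[k][j] for k in range(embed_size)) for j in range(head_size)]
        for row in X
    ]


Q = project(X, W_Q)
for i, q in enumerate(Q):
    # Round only for display, to hide floating-point noise.
    print(f"Q[{i}] =", [round(v, 2) for v in q])
# Q[0] = [0.4, 0.5, 0.6, 0.7]
# Q[1] = [0.6, 0.5, 0.6, 0.5]
# Q[2] = [0.3, 0.9, 0.5, 1.1]
```

In practice a library matrix multiply would replace the nested loops; the explicit sums are kept here so each entry lines up with the hand arithmetic.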


Summary

  • Input embedding $X$ shape: $\text{seq\_len} \times \text{embed\_size}$ (here 3 × 5)
  • Weight $W_Q$: $\text{embed\_size} \times \text{head\_size}$ (here 5 × 4)
  • Output $Q = X \cdot W_Q$: $\text{seq\_len} \times \text{head\_size}$ (here 3 × 4)
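
The shape rule in this summary can be sketched as a small check: the inner dimensions must agree, and the outer ones give the shape of Q. The function name `projection_shape` is hypothetical, and the example sizes (3 tokens, embedding size 5, head size 4) are taken from the walkthrough above.

```python
def projection_shape(x_shape, w_shape):
    """Return the shape of X @ W_Q, or raise if the inner sizes differ."""
    seq_len, embed_x = x_shape
    embed_w, head_size = w_shape
    if embed_x != embed_w:
        raise ValueError(
            f"embed_size mismatch: X has {embed_x} columns, W_Q has {embed_w} rows"
        )
    return (seq_len, head_size)


print(projection_shape((3, 5), (5, 4)))  # (3, 4)
```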
