How a word’s embeddings change as context varies
Let’s break down how a model like BERT, or one of OpenAI’s embedding models, produces different embeddings for the word “bank” in different contexts.
🎯 The Goal:
We want the vector for “bank” to reflect the meaning it has in each context:
- “bank” in “river bank” should mean land near water
- “bank” in “bank loan” should mean a financial institution
🧠 Step-by-Step: How Different Embeddings Are Calculated
1. Tokenization
The sentence is split into tokens (words or subwords).
Example:
Sentence 1: "He sat by the river bank."
Sentence 2: "She went to the bank to deposit money."
Tokenized into:
["He", "sat", "by", "the", "river", "bank", "."]
["She", "went", "to", "the", "bank", "to", "deposit", "money", "."]
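The split shown here can be reproduced with a very small tokenizer. This is only a sketch: `simple_tokenize` is a hypothetical helper, not BERT's tokenizer. BERT's actual WordPiece tokenizer also breaks rare words into subword pieces, adds special tokens such as [CLS] and [SEP], and lowercases the text in the "uncased" models.

```python
import re

def simple_tokenize(text):
    """Split text into word tokens and single punctuation marks."""
    # \w+ matches a run of letters/digits; [^\w\s] matches one punctuation character.
    return re.findall(r"\w+|[^\w\s]", text)

tokens_1 = simple_tokenize("He sat by the river bank.")
tokens_2 = simple_tokenize("She went to the bank to deposit money.")

print(tokens_1)
print(tokens_2)
```

Both sentences contain the identical token "bank", which is why the next step starts from the same vector for it.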
2. Embedding Layer
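Each token gets an initial vector from a learned lookup table (fixed after pre-training). At this stage:
- “bank” gets the same token vector in both sentences ✅ (models like BERT also add a position vector, but nothing from the surrounding words has been mixed in yet)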
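A lookup table is easy to picture as a dictionary. The sketch below assumes a made-up 3-dimensional table (`embedding_table`, `embed`, and every number in it are hypothetical) and checks that “bank” comes out identical in both sentences before any context is applied.

```python
# Hypothetical 3-dimensional embedding table; the values are made up for illustration.
embedding_table = {
    "river": [0.1, 0.9, 0.0],
    "bank": [0.5, 0.5, 0.0],
    "money": [0.9, 0.1, 0.0],
}
UNKNOWN = [0.0, 0.0, 0.0]  # fallback for tokens missing from this toy table

def embed(tokens):
    """Look up one static vector per token; context plays no role here."""
    return [embedding_table.get(tok.lower(), UNKNOWN) for tok in tokens]

sent_1 = ["He", "sat", "by", "the", "river", "bank", "."]
sent_2 = ["She", "went", "to", "the", "bank", "to", "deposit", "money", "."]

vecs_1 = embed(sent_1)
vecs_2 = embed(sent_2)

bank_1 = vecs_1[sent_1.index("bank")]
bank_2 = vecs_2[sent_2.index("bank")]
print(bank_1 == bank_2)  # True: same vector regardless of the sentence
```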
BUT THEN…
3. Transformer Layers (Contextualization via Self-Attention)
Here’s where the magic happens.
Each token’s vector is passed through a stack of transformer layers (12 in BERT-base, 24 in BERT-large). These layers use a mechanism called self-attention, which lets each token “look at” every other token in the sentence.
So:
- In Sentence 1, “bank” can attend strongly to “river”
- In Sentence 2, “bank” can attend strongly to “deposit”, “money”, “went”
As a result:
- The vector for “bank” becomes contextualized: it is rebuilt as a weighted mix of the vectors around it, so it changes with the other words in the sentence.
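To see why the same starting vector ends up different, here is a minimal sketch of one self-attention step. It makes strong simplifying assumptions: a single head, no learned query/key/value weight matrices (the input vectors are used directly), no residual connections or feed-forward layers, and made-up 3-dimensional vectors. A real transformer layer has all of those pieces, so treat this as an illustration of the mixing idea only.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Single-head scaled dot-product attention with Q = K = V = input."""
    dim = len(vectors[0])
    outputs = []
    for query in vectors:
        # Score the query token against every token in the sentence.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        # New vector = attention-weighted average of all token vectors.
        mixed = [
            sum(w * vec[i] for w, vec in zip(weights, vectors))
            for i in range(dim)
        ]
        outputs.append(mixed)
    return outputs

bank = [0.5, 0.5, 0.0]                    # same starting vector in both contexts
river_context = [[0.1, 0.9, 0.0], bank]   # toy sentence: "river", "bank"
money_context = [bank, [0.9, 0.1, 0.0]]   # toy sentence: "bank", "money"

bank_in_river = self_attention(river_context)[1]
bank_in_money = self_attention(money_context)[0]

print(bank_in_river)
print(bank_in_money)
```

In this toy setup “bank” is pulled toward “river” in one sentence and toward “money” in the other, even though it started from the identical vector.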
4. Output Vectors
After all transformer layers:
- The vector for each token has been modified by context
- So “bank” now has a different vector in each sentence
Example (simplified 3D vectors with illustrative values; BERT-base vectors actually have 768 dimensions):
Token | Final Embedding (in river sentence) | Final Embedding (in loan sentence) |
---|---|---|
bank | [0.4, 0.8, -0.2] | [0.9, -0.3, 0.1] |
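One common way to quantify how far apart two such vectors are is cosine similarity (1 means same direction, 0 means unrelated directions). The sketch below applies it to the two illustrative vectors from the table; `cosine_similarity` is a helper written here for the example, and the numbers are the simplified ones above, not real model output.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

bank_river = [0.4, 0.8, -0.2]   # "bank" in the river sentence (from the table)
bank_loan = [0.9, -0.3, 0.1]    # "bank" in the loan sentence (from the table)

similarity = cosine_similarity(bank_river, bank_loan)
print(round(similarity, 3))  # 0.114
```

A value this close to 0 says the two “bank” vectors point in quite different directions, which is the behaviour we wanted.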
5. (Optional) Pooling for Sentence Embeddings
If you’re embedding the whole sentence (not individual words), a final pooling step gives a single vector for the sentence that captures its overall meaning. Two common choices are averaging the token vectors (mean pooling) or taking the [CLS] token’s vector.
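Both pooling options fit in a few lines. This sketch assumes made-up final-layer vectors for a toy input “[CLS] river bank”; `mean_pool` and `cls_pool` are hypothetical helper names. Real pipelines often also skip padding tokens when averaging, which is left out here.

```python
def mean_pool(token_vectors):
    """Average the token vectors dimension by dimension."""
    dim = len(token_vectors[0])
    count = len(token_vectors)
    return [sum(vec[i] for vec in token_vectors) / count for i in range(dim)]

def cls_pool(token_vectors):
    """Use the first token's vector, assuming [CLS] sits at position 0."""
    return token_vectors[0]

# Hypothetical final-layer vectors; the values are made up for illustration.
token_vectors = [
    [0.2, 0.6, 0.1],    # [CLS]
    [0.1, 0.9, 0.0],    # river
    [0.3, 0.6, -0.1],   # bank
]

sentence_mean = mean_pool(token_vectors)
sentence_cls = cls_pool(token_vectors)

print(sentence_mean)
print(sentence_cls)
```

Either way, the result is one vector with the same number of dimensions as a single token vector.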
📌 Summary:
Step | What happens |
---|---|
Tokenization | Text is split into words/subwords |
Embedding Lookup | Initial word vectors (same for “bank”) |
Self-Attention | “bank” learns from nearby words (“river” vs “money”) |
Final Output | “bank” gets a different vector in each sentence |