
How a word’s embeddings change as context varies

Let’s break down how a model like BERT (or an OpenAI embedding model) produces different embeddings for the word “bank” in different contexts.


🎯 The Goal:

We want:

  • “bank” in “river bank” to mean land near water
  • “bank” in “loan bank” to mean financial institution

🧠 Step-by-Step: How Different Embeddings Are Calculated

1. Tokenization

Sentence is split into tokens (words or subwords).

Example:

Sentence 1: "He sat by the river bank."
Sentence 2: "She went to the bank to deposit money."

Tokenized into:

["He", "sat", "by", "the", "river", "bank", "."]
["She", "went", "to", "the", "bank", "to", "deposit", "money", "."]
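A minimal sketch of this step (real models use subword schemes such as WordPiece or BPE; this toy tokenizer just splits on whitespace and punctuation):

```python
import re

def tokenize(sentence):
    # Toy tokenizer: keep runs of word characters as tokens,
    # and punctuation as its own token.
    return re.findall(r"\w+|[^\w\s]", sentence)

print(tokenize("He sat by the river bank."))
# → ['He', 'sat', 'by', 'the', 'river', 'bank', '.']
```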

2. Embedding Layer

Each token gets an initial vector (pre-trained lookup). At this stage:

  • “bank” is the same vector in both sentences ✅
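The lookup at this stage is context-free, which can be sketched as a plain dictionary (the vectors here are made up for illustration; real tables are learned during pre-training):

```python
import numpy as np

# Toy static embedding table; real models learn these vectors.
embedding_table = {
    "river": np.array([0.1, 0.9, 0.0]),
    "bank":  np.array([0.5, 0.5, 0.5]),
    "money": np.array([0.8, 0.1, 0.2]),
}

# The lookup ignores context: "bank" maps to the same vector
# regardless of which sentence it came from.
v1 = embedding_table["bank"]  # from "river bank"
v2 = embedding_table["bank"]  # from "loan bank"
assert np.array_equal(v1, v2)
```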

BUT THEN…


3. Transformer Layers (Contextualization via Self-Attention)

Here’s where the magic happens.

Each word’s vector is passed through multiple transformer layers (12 in BERT-base, 24 in BERT-large). These layers use a mechanism called self-attention that lets each token “look at” other tokens in the sentence.

So:

  • In Sentence 1, “bank” attends to “river”
  • In Sentence 2, “bank” attends to “deposit”, “money”, “went”

As a result:

  • The vector for “bank” becomes contextualized — i.e., it changes based on nearby words.
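This mixing can be sketched with a single toy self-attention step (identity projections, made-up 3D vectors): the same “bank” vector goes in, but different neighbours produce different outputs.

```python
import numpy as np

def self_attention(X):
    # Minimal single-head self-attention with identity Q/K/V projections:
    # each row attends to every row, weighted by a softmax of dot products.
    scores = X @ X.T / np.sqrt(X.shape[1])
    weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    return weights @ X

bank  = np.array([0.5, 0.5, 0.5])
river = np.array([0.1, 0.9, 0.0])
money = np.array([0.8, 0.1, 0.2])

# Same "bank" vector in, different context -> different "bank" vector out.
out_river = self_attention(np.stack([river, bank]))[1]
out_money = self_attention(np.stack([bank, money]))[0]
assert not np.allclose(out_river, out_money)
```

Real transformers add learned query/key/value projections, multiple heads, and feed-forward layers on top of this, but the core idea is the same: the output for “bank” is a context-weighted blend of its neighbours.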

4. Output Vectors

After all transformer layers:

  • The vector for each token has been modified by context
  • So “bank” now has a different vector in each sentence

Example (simplified 3D vectors):

| Token | Final embedding (river sentence) | Final embedding (loan sentence) |
| --- | --- | --- |
| bank | [0.4, 0.8, -0.2] | [0.9, -0.3, 0.1] |
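You can quantify how far apart the two “bank” vectors above have drifted with cosine similarity (1.0 means identical direction, values near 0 mean unrelated):

```python
import numpy as np

a = np.array([0.4, 0.8, -0.2])   # "bank" in the river sentence
b = np.array([0.9, -0.3, 0.1])   # "bank" in the loan sentence

# Cosine similarity: dot product divided by the product of the norms.
cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
print(round(cos, 2))
# → 0.11
```

A similarity this low says the two contextualized vectors point in very different directions, exactly what we want for the two senses of “bank”.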

5. (Optional) Pooling for Sentence Embeddings

If you’re embedding the whole sentence (not individual words), a final mean-pooling step over all token vectors, or the special [CLS] token, gives a single vector for the sentence that captures its overall meaning.
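Mean pooling is just an average over the token axis; a sketch with made-up token vectors:

```python
import numpy as np

# Toy contextualized token vectors for one sentence (one row per token).
token_vectors = np.array([
    [0.4, 0.8, -0.2],
    [0.9, -0.3, 0.1],
    [0.2, 0.2, 0.6],
])

# Mean pooling: average across tokens -> one vector for the sentence.
sentence_vector = token_vectors.mean(axis=0)
print(sentence_vector.round(2))
```

The [CLS] alternative simply takes the first token’s output vector instead of averaging.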


📌 Summary:

| Step | What happens |
| --- | --- |
| Tokenization | Words/subwords split |
| Embedding Lookup | Initial word vectors (same for “bank”) |
| Self-Attention | “bank” learns from nearby words (“river” vs “money”) |
| Final Output | “bank” gets a different vector in each sentence |
