Mar Java Mit Java Blog

How are embeddings multiplied with weight matrices

Let’s walk through exactly how W_Q (the query weight matrix) is multiplied with the input embedding, using a small example with embedding size = 5. 🧠 Context: What is W_Q? In a Transformer, we...

How does attention change context and embeddings

🟦 1. What is a word embedding? Let’s say a model reads this sentence: “The cat sat on the mat.” Each word like “cat”, “mat”, “sat” is turned into a vector — imagine this...

How LLMs Work: A Deep Dive into Text Prediction

Large Language Models (LLMs) like OpenAI’s GPT, Google’s Gemini, or Meta’s LLaMA have become incredibly powerful at understanding and generating human-like text. But how do they actually work under the hood? What happens when...

How to Build an MCP Client Using the Java SDK

The Model Context Protocol (MCP) allows language models to interface with tools and external systems in a structured, programmable way. If you’re building intelligent apps or agents that need dynamic tool invocation, MCP gives...

Compressing Vector DB data

Question: If a system has 100 million user-product interactions, and if each order is mapped to a vector of 768 dimensions, how much space will the system require to store all vectors? Answer: 100 million interactions...