Understanding Embeddings: Converting Text into Searchable Vectors

Introduction

Embeddings are a way to represent text (or other data) as dense numerical vectors that capture semantic meaning. These vectors allow us to perform tasks like similarity searches, clustering, and machine learning efficiently. In this guide, we’ll cover:

  • How to generate embeddings using OpenAI
  • Storing embeddings in Redis
  • Retrieving and using them for similarity searches
  • Passing the retrieved documents to OpenAI to generate a response
  • Java code examples for end-to-end implementation
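
To make "similarity" concrete before we call any API: two embeddings are commonly compared with cosine similarity, which measures how closely two vectors point in the same direction. The sketch below is a minimal illustration in plain Java. The class name and the three-dimensional toy vectors are made up for this example; real embeddings have hundreds or thousands of dimensions.

```java
/** Illustrative helper (not part of the original guide): cosine similarity of two vectors. */
final class CosineSimilarityDemo {

    /** Returns dot(a, b) / (|a| * |b|): 1.0 for the same direction, 0.0 for orthogonal vectors. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            throw new IllegalArgumentException("Cosine similarity is undefined for a zero vector");
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Toy vectors invented for illustration; they are not real model output.
        double[] cat = {0.9, 0.1, 0.0};
        double[] kitten = {0.8, 0.2, 0.1};
        double[] invoice = {0.0, 0.1, 0.9};
        System.out.println("cat vs kitten:  " + cosine(cat, kitten));
        System.out.println("cat vs invoice: " + cosine(cat, invoice));
    }
}
```

Texts with related meaning should score closer to 1.0 than unrelated ones, which is the property the similarity search later in this guide relies on.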

Generating Embeddings with OpenAI

Before storing text as vectors, we need to generate embeddings. OpenAI provides an API to convert text into embeddings:

Java Code for Fetching Embeddings

import java.net.HttpURLConnection;
import java.net.URL;
import java.io.OutputStream;
import java.io.InputStreamReader;
import java.io.BufferedReader;
import org.json.JSONObject;

public class OpenAIEmbeddingFetcher {
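    private static final String API_URL = "https://api.openai.com/v1/embeddings";
    // Placeholder only: load the real key from an environment variable or a secrets manager.
    private static final String API_KEY = "your_api_key_here";

    // Returns the raw JSON response body; the vector itself is at data[0].embedding.
    public static String getEmbedding(String text) throws Exception {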
        URL url = new URL(API_URL);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setRequestProperty("Authorization", "Bearer " + API_KEY);
        conn.setDoOutput(true);

        String jsonInput = new JSONObject()
            .put("model", "text-embedding-ada-002")
            .put("input", text)
            .toString();

        try (OutputStream os = conn.getOutputStream()) {
            byte[] input = jsonInput.getBytes("utf-8");
            os.write(input, 0, input.length);
        }

        StringBuilder response = new StringBuilder();
        try (BufferedReader br = new BufferedReader(new InputStreamReader(conn.getInputStream(), "utf-8"))) {
            String responseLine;
            while ((responseLine = br.readLine()) != null) {
                response.append(responseLine.trim());
            }
        }
        return response.toString();
    }
}
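
The API_KEY constant above is a placeholder, and a real key should not be committed to source control. One common alternative is to read it from the environment. The sketch below assumes the variable is named OPENAI_API_KEY; the class and method names are hypothetical and not part of the original code.

```java
import java.util.Map;

/** Illustrative helper: looks up the API key in a map of environment variables. */
final class ApiKeyResolver {

    /** Returns the trimmed key, or throws if OPENAI_API_KEY is missing or blank. */
    static String resolveApiKey(Map<String, String> environment) {
        String key = environment.get("OPENAI_API_KEY");
        if (key == null || key.trim().isEmpty()) {
            throw new IllegalStateException("Set the OPENAI_API_KEY environment variable");
        }
        return key.trim();
    }

    public static void main(String[] args) {
        try {
            String key = resolveApiKey(System.getenv());
            // Never print the key itself; its length is enough to confirm it was loaded.
            System.out.println("Loaded an API key of length " + key.length());
        } catch (IllegalStateException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

Taking the environment as a Map parameter keeps the lookup easy to unit-test; in the fetcher, the result of resolveApiKey(System.getenv()) would replace the hard-coded constant.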

Storing Embeddings in Redis

Redis, with its RediSearch module, allows vector storage and similarity searches. We store embeddings using the HSET command.

Java Code for Storing Vectors in Redis

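import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import redis.clients.jedis.Jedis;
import org.json.JSONArray;
import org.json.JSONObject;

public class RedisVectorStorage {
    private static final String REDIS_HOST = "localhost";
    private static final int REDIS_PORT = 6379;

    public static void storeVector(String key, String text, String embeddingResponse) {
        try (Jedis jedis = new Jedis(REDIS_HOST, REDIS_PORT)) {
            JSONObject responseJson = new JSONObject(embeddingResponse);
            JSONArray embeddingArray = responseJson.getJSONArray("data").getJSONObject(0).getJSONArray("embedding");

            // RediSearch reads a hash vector field as raw FLOAT32 bytes, not as JSON text,
            // so pack the values into a little-endian byte array before storing them.
            ByteBuffer blob = ByteBuffer.allocate(embeddingArray.length() * Float.BYTES)
                    .order(ByteOrder.LITTLE_ENDIAN);
            for (int i = 0; i < embeddingArray.length(); i++) {
                blob.putFloat((float) embeddingArray.getDouble(i));
            }

            jedis.hset(key, "text", text);
            jedis.hset(key.getBytes(StandardCharsets.UTF_8), "vector".getBytes(StandardCharsets.UTF_8), blob.array());
        }
    }
}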
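
Storing the hash is not enough for similarity search: RediSearch only searches fields that an index declares, and the search section refers to an index named vectorIndex. The sketch below assembles the arguments of an FT.CREATE command that would define such an index. It assumes keys share the hypothetical prefix doc:, that vectors are stored as FLOAT32 bytes, and that the model is text-embedding-ada-002, which returns 1536 dimensions. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch: builds the arguments of a RediSearch FT.CREATE command. */
final class VectorIndexCommand {

    /**
     * @param indexName  name later passed to FT.SEARCH ("vectorIndex" in this guide)
     * @param keyPrefix  assumed key prefix of the hashes to index, e.g. "doc:"
     * @param dimensions vector length; 1536 for text-embedding-ada-002
     */
    static List<String> buildCreateArgs(String indexName, String keyPrefix, int dimensions) {
        if (dimensions <= 0) {
            throw new IllegalArgumentException("dimensions must be positive");
        }
        List<String> command = new ArrayList<>();
        command.add("FT.CREATE");
        command.add(indexName);
        // Index Redis hashes whose keys start with keyPrefix.
        command.addAll(List.of("ON", "HASH", "PREFIX", "1", keyPrefix));
        command.add("SCHEMA");
        command.addAll(List.of("text", "TEXT"));
        // "6" is the number of attribute tokens that follow (three name/value pairs).
        command.addAll(List.of("vector", "VECTOR", "HNSW", "6",
                "TYPE", "FLOAT32",
                "DIM", Integer.toString(dimensions),
                "DISTANCE_METRIC", "COSINE"));
        return command;
    }

    public static void main(String[] args) {
        // Prints a line that can be pasted into redis-cli.
        System.out.println(String.join(" ", buildCreateArgs("vectorIndex", "doc:", 1536)));
    }
}
```

HNSW gives approximate nearest-neighbour search; FLAT is the exact, brute-force alternative. The DIM value has to match the embedding model, so it needs updating if we switch models.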

Retrieving Similar Documents

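To find similar embeddings, we run a K-nearest-neighbour (KNN) query with FT.SEARCH. This requires the RediSearch module and an index (here called vectorIndex) that declares vector as a VECTOR field.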

Java Code for Searching Similar Embeddings in Redis

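import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import org.json.JSONArray;
import redis.clients.jedis.JedisPooled;
import redis.clients.jedis.search.Query;

public class RedisVectorSearch {
    private static final String REDIS_HOST = "localhost";
    private static final int REDIS_PORT = 6379;

    // `vector` is the query embedding as a JSON array string, e.g. "[0.12, -0.03, ...]".
    public static void findSimilar(String vector) {
        JSONArray values = new JSONArray(vector);
        // The KNN parameter must be raw little-endian FLOAT32 bytes, matching the indexed field.
        ByteBuffer blob = ByteBuffer.allocate(values.length() * Float.BYTES).order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i < values.length(); i++) {
            blob.putFloat((float) values.getDouble(i));
        }
        // In Jedis 4.x the search commands live on JedisPooled (UnifiedJedis), not on Jedis.
        try (JedisPooled jedis = new JedisPooled(REDIS_HOST, REDIS_PORT)) {
            // Ask for the 10 nearest neighbours of the query vector; KNN syntax needs DIALECT 2.
            Query query = new Query("*=>[KNN 10 @vector $BLOB]").addParam("BLOB", blob.array()).dialect(2);
            System.out.println(jedis.ftSearch("vectorIndex", query));
        }
    }
}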
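
To see what a KNN search computes, it can help to have a tiny in-memory reference to compare against on small datasets. The sketch below ranks stored vectors by cosine distance (1 minus cosine similarity) and returns the closest keys. It is a brute-force illustration with hypothetical names and toy data, not how Redis implements the search.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Illustrative in-memory reference for a KNN search; compares the query with every vector. */
final class BruteForceKnn {

    /** Cosine distance: 0 means same direction, larger values mean less similar. */
    static double cosineDistance(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        return 1.0 - dot / Math.sqrt(normA * normB);
    }

    /** Returns the keys of the k stored vectors closest to the query, nearest first. */
    static List<String> nearest(Map<String, float[]> stored, float[] query, int k) {
        List<Map.Entry<String, Double>> scored = new ArrayList<>();
        for (Map.Entry<String, float[]> entry : stored.entrySet()) {
            double distance = cosineDistance(entry.getValue(), query);
            scored.add(new AbstractMap.SimpleEntry<String, Double>(entry.getKey(), distance));
        }
        // Sort ascending by distance so the most similar vectors come first.
        scored.sort(Map.Entry.comparingByValue());
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < Math.min(k, scored.size()); i++) {
            keys.add(scored.get(i).getKey());
        }
        return keys;
    }

    public static void main(String[] args) {
        // Toy vectors invented for illustration.
        Map<String, float[]> stored = Map.of(
                "doc:cats", new float[]{0.9f, 0.1f, 0.0f},
                "doc:dogs", new float[]{0.7f, 0.3f, 0.1f},
                "doc:taxes", new float[]{0.0f, 0.1f, 0.9f});
        System.out.println(nearest(stored, new float[]{1.0f, 0.0f, 0.0f}, 2));
    }
}
```

This scan costs time proportional to the number of stored vectors per query, which is the work a vector index is meant to reduce.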

Generating a Response with OpenAI

Now that we have the most relevant documents, we pass them to OpenAI to generate a response.


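Java Code for Generating a Chat Response

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import org.json.JSONArray;
import org.json.JSONObject;
import redis.clients.jedis.search.Document;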

public class OpenAIChat {
    private static final String CHAT_ENDPOINT = "https://api.openai.com/v1/chat/completions";
    private static final String API_KEY = "your-openai-api-key";

    public static String getChatResponse(String query, List<Document> retrievedDocs) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        StringBuilder contextBuilder = new StringBuilder("Context:\n");
        for (Document doc : retrievedDocs) {
            contextBuilder.append(doc.getString("text")).append("\n");
        }

        JSONObject request = new JSONObject();
        request.put("model", "gpt-4-turbo");
        request.put("messages", new JSONArray()
                .put(new JSONObject().put("role", "system").put("content", "You are an expert AI assistant."))
                .put(new JSONObject().put("role", "user").put("content", contextBuilder.toString() + " Query: " + query)));

        HttpRequest httpRequest = HttpRequest.newBuilder()
                .uri(new URI(CHAT_ENDPOINT))
                .header("Authorization", "Bearer " + API_KEY)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(request.toString()))
                .build();

        HttpResponse<String> response = client.send(httpRequest, HttpResponse.BodyHandlers.ofString());
        JSONObject jsonResponse = new JSONObject(response.body());

        return jsonResponse.getJSONArray("choices").getJSONObject(0).getJSONObject("message").getString("content");
    }
}

Conclusion

By generating OpenAI embeddings, storing them in Redis, and running similarity searches, we can build applications for semantic search, recommendations, and clustering, and pass the retrieved text to a chat model as context. Because Redis serves the search from an in-memory vector index, the same pattern can be applied to large text collections.

Next Steps

  • Experiment with different embedding models.
  • Optimize vector storage with Redis indices.
  • Integrate embeddings into applications for intelligent search.

Now, you can convert text into embeddings, store them, and search for relevant results efficiently!
