Machine Learning & AI · Intermediate

Embeddings & Vector Search: The Foundation of Semantic AI

Understand how text embeddings capture meaning, how vector databases enable semantic search, and how to build similarity systems that power recommendations, RAG, and search.

18 min read
March 12, 2026
Embeddings · Vector Search · FAISS · Semantic Search · Python

What Are Embeddings?

An embedding is a dense vector representation of data (text, images, audio) in which similar items end up close together in vector space. Unlike keyword search, which matches surface tokens, embeddings capture semantic meaning — 'king' is close to 'monarch', 'happy' is close to 'joyful'. Modern embedding models map text to vectors of roughly 768 to 3,072 dimensions using transformer encoders.

Generating Embeddings

python
# Using sentence-transformers (open source, runs locally)
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dim, fast

sentences = [
    "The cat sat on the mat",
    "A feline rested on the rug",
    "Python is a programming language",
    "JavaScript runs in the browser",
]

embeddings = model.encode(sentences)
print(f"Shape: {embeddings.shape}")  # (4, 384)

# Compute cosine similarity
def cosine_sim(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Semantically similar sentences are close
print(cosine_sim(embeddings[0], embeddings[1]))  # ~0.75 (cat/feline)
print(cosine_sim(embeddings[0], embeddings[2]))  # ~0.12 (cat/Python)
print(cosine_sim(embeddings[2], embeddings[3]))  # ~0.52 (programming)
python
# Using OpenAI embeddings (API-based, higher quality)
from openai import OpenAI

client = OpenAI()

response = client.embeddings.create(
    model="text-embedding-3-small",     # 1536-dim
    input=["The cat sat on the mat", "A feline rested on the rug"],
)

embedding_1 = response.data[0].embedding  # List of 1536 floats
embedding_2 = response.data[1].embedding
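The API returns each embedding as a plain Python list, so it drops straight into the same cosine math as the sentence-transformers example above. A minimal sketch, using small dummy vectors as stand-ins for real 1536-dim embeddings:

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two 1-D vectors (lists or arrays)."""
    a, b = np.asarray(a, dtype="float64"), np.asarray(b, dtype="float64")
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Dummy stand-ins for response.data[i].embedding (real ones are 1536-dim)
embedding_1 = [0.1, 0.3, 0.5]
embedding_2 = [0.2, 0.25, 0.55]

print(f"{cosine_sim(embedding_1, embedding_2):.3f}")
```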

Vector Search with FAISS

FAISS (Facebook AI Similarity Search) enables fast nearest-neighbor search over millions of vectors. It supports exact search (IndexFlatL2), approximate search (IndexIVFFlat, for speed), and product quantization (IndexIVFPQ, for memory). For most applications, an IVF index with ~100 centroids gives 95%+ recall at a 10-100x speedup.
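To build intuition for the speed/recall trade-off, here is a toy numpy sketch of both strategies: exact search scans every vector (what IndexFlatL2 does), while the IVF idea assigns vectors to cells around centroids and scans only the nprobe cells closest to the query. Real FAISS trains centroids with k-means; this sketch just picks random data points as centroids:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n = 32, 5_000
data = rng.random((n, dim)).astype("float32")
query = rng.random(dim).astype("float32")

# Exact search: compute every squared-L2 distance, keep the 5 smallest
dists = ((data - query) ** 2).sum(axis=1)
exact_top5 = np.argsort(dists)[:5]

# Toy IVF: partition vectors into nlist cells around (randomly chosen) centroids
nlist, nprobe = 20, 5
centroids = data[rng.choice(n, nlist, replace=False)]
assignments = np.argmin(
    ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2), axis=1
)

# At query time, scan only the nprobe cells whose centroids are nearest the query
nearest_cells = np.argsort(((centroids - query) ** 2).sum(axis=1))[:nprobe]
candidates = np.where(np.isin(assignments, nearest_cells))[0]
cand_dists = ((data[candidates] - query) ** 2).sum(axis=1)
approx_top5 = candidates[np.argsort(cand_dists)[:5]]

print(f"scanned {len(candidates)} of {n} vectors")
```

The approximate result can miss a true neighbor whose cell wasn't probed; raising nprobe buys recall back at the cost of speed, which is exactly the knob the FAISS code exposes.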

python
import faiss
import numpy as np

# Create and populate index
dimension = 384
num_vectors = 100_000

# Generate random data (in practice, these are your embeddings)
data = np.random.random((num_vectors, dimension)).astype("float32")

# Exact search — brute force, perfect recall
index_flat = faiss.IndexFlatL2(dimension)
index_flat.add(data)

# Query
query = np.random.random((1, dimension)).astype("float32")
distances, indices = index_flat.search(query, k=5)
print(f"5 nearest neighbors: {indices[0]}")

# Approximate search — much faster at scale
nlist = 100  # Number of Voronoi cells
quantizer = faiss.IndexFlatL2(dimension)
index_ivf = faiss.IndexIVFFlat(quantizer, dimension, nlist)
index_ivf.train(data)
index_ivf.add(data)
index_ivf.nprobe = 10  # Search 10 cells (trade-off: speed vs recall)

distances, indices = index_ivf.search(query, k=5)

Embedding model selection: all-MiniLM-L6-v2 for speed (384-dim); text-embedding-3-small for quality (1536-dim); BGE-large or E5-large for state-of-the-art retrieval; multilingual-e5-large for multilingual text. Always normalize embeddings and use cosine similarity.
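The "normalize, then use cosine" advice works because for unit-length vectors the inner product equals cosine similarity, so a fast inner-product index gives cosine ranking for free. A quick numpy check of that identity:

```python
import numpy as np

rng = np.random.default_rng(42)
a = rng.random(64)
b = rng.random(64)

# L2-normalize both vectors to unit length
a_n = a / np.linalg.norm(a)
b_n = b / np.linalg.norm(b)

cosine = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
inner = float(np.dot(a_n, b_n))  # plain inner product of the normalized vectors

print(abs(cosine - inner))  # agrees to floating-point precision
```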

Building a Semantic Search Engine

python
import faiss
from sentence_transformers import SentenceTransformer


class SemanticSearch:
    """Simple semantic search engine using embeddings + FAISS."""

    def __init__(self, model_name: str = "all-MiniLM-L6-v2"):
        self.model = SentenceTransformer(model_name)
        self.index = None
        self.documents = []

    def index_documents(self, docs: list[str]):
        """Embed and index a list of documents."""
        self.documents = docs
        embeddings = self.model.encode(docs, normalize_embeddings=True)
        dimension = embeddings.shape[1]

        self.index = faiss.IndexFlatIP(dimension)  # Inner product = cosine (normalized)
        self.index.add(embeddings.astype("float32"))

    def search(self, query: str, top_k: int = 5) -> list[tuple[str, float]]:
        """Return top-k most similar documents with scores."""
        query_embedding = self.model.encode(
            [query], normalize_embeddings=True
        ).astype("float32")

        scores, indices = self.index.search(query_embedding, top_k)
        return [(self.documents[i], scores[0][j]) for j, i in enumerate(indices[0])]


# Usage
search = SemanticSearch()
search.index_documents([
    "Binary search runs in O(log n) time",
    "Quick sort has O(n log n) average case",
    "Hash tables provide O(1) average lookup",
    "BFS finds shortest paths in unweighted graphs",
    "Dynamic programming solves overlapping subproblems",
])

results = search.search("fastest way to find an element")
for doc, score in results:
    print(f"[{score:.3f}] {doc}")

Vector databases (Pinecone, Weaviate, Qdrant, ChromaDB) add metadata filtering, persistence, and distributed scaling on top of core vector search. For < 1M vectors, FAISS in-memory is usually sufficient. For larger scale, use a dedicated vector DB.
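Metadata filtering is the main capability plain FAISS lacks, and the core idea is easy to sketch in numpy: restrict to rows whose metadata matches, then rank only those by similarity. The `lang` field here is a made-up example, and the vectors are random stand-ins for real embeddings:

```python
import numpy as np

rng = np.random.default_rng(7)
vectors = rng.random((6, 8)).astype("float32")
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)  # unit-length rows
metadata = [{"lang": "en"}, {"lang": "de"}, {"lang": "en"},
            {"lang": "en"}, {"lang": "de"}, {"lang": "en"}]

def filtered_search(query, lang, top_k=2):
    """Pre-filter by metadata, then rank the survivors by cosine similarity."""
    query = query / np.linalg.norm(query)
    keep = np.array([i for i, m in enumerate(metadata) if m["lang"] == lang])
    scores = vectors[keep] @ query  # inner product = cosine for unit vectors
    order = np.argsort(scores)[::-1][:top_k]
    return [(int(keep[i]), float(scores[i])) for i in order]

results = filtered_search(rng.random(8).astype("float32"), "en")
print(results)  # [(doc_index, score), ...] — only "en" documents survive
```

Dedicated vector databases implement the same semantics far more efficiently (bitmap pre-filters, filtered graph traversal), but this pre-filter-then-rank pattern is the underlying contract.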