Part II · Chapter 7 · platforms · build vs buy

Elasticsearch vs Pinecone vs Weaviate: No Platform Wins on Everything

Every search platform now advertises hybrid support. The implementations behind those APIs differ, and so does which platform is right for your team.

March 9, 2026 · 6 min read

Intercom's engineering team evaluated Pinecone, Milvus, Qdrant, and Weaviate for a deployment serving 300 million embeddings, and picked Elasticsearch anyway (Bhatt, 2025). Not because Elasticsearch had the best vector search in isolation, but because the team already understood it. That outcome is the point of this piece. With the vector database market reaching roughly 2.55 billion USD in 2025 (Global Market Insights, 2025) and every major platform converging on the same feature labels between 2022 and 2025, the useful comparison is no longer which product supports hybrid retrieval. It is how each one implements it, and what that implementation quietly forbids.

Same API label, different machines underneath

Weaviate shipped hybrid search in version 1.17 in 2022, Elasticsearch added a technical preview in 8.9 in 2023, Qdrant's Query API arrived in 1.10 in 2024, Milvus added native BM25 in 2.5 in late 2024, and Redis added FT.HYBRID in 8.4 in 2025. The surfaces converged. The guts did not.

Elasticsearch combines mature BM25, analyzers, and synonym dictionaries with vector retrieval grafted onto Lucene's segmented architecture, which adds per-segment latency overhead on vector queries. Its Retriever API fuses multiple retrievers via Reciprocal Rank Fusion (RRF), with a linear retriever and weighted RRF added in later releases.
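RRF itself is simple enough to state in a few lines. The sketch below is a generic, platform-neutral implementation of the standard formula (each document scores the sum of 1/(k + rank) over the ranked lists it appears in); it is not Elasticsearch's Retriever API, and the function name and default k = 60 are illustrative.

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: each document scores sum(1 / (k + rank))
    across every ranked list it appears in (ranks are 1-based)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["d1", "d2", "d3"]
dense_hits = ["d3", "d1", "d4"]
print(rrf_fuse([bm25_hits, dense_hits]))  # → ['d1', 'd3', 'd2', 'd4']
```

Because RRF consumes only ranks, it sidesteps the score-scale mismatch between BM25 and vector similarity, which is why it became the default fusion method across platforms.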

Pinecone takes a structurally different route. Sparse and dense vectors live in a single index and are combined by a convex function, score(d) = α · s_dense + (1 − α) · s_sparse, with alpha controlling the blend. The architecture is clean, but the constraints are load-bearing: the $in and $nin metadata operators are limited to 10,000 values, metadata is capped at 40KB per vector, and cross-namespace queries are not supported. Teams that need per-vector payloads larger than 40KB end up running a second lookup against another database.
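The convex blend can be sketched directly from the formula above. This is a generic illustration, not Pinecone's internals: the function name is hypothetical, and it assumes the dense and sparse scores have already been put on comparable scales (documents missing from one list are treated as scoring zero there).

```python
def convex_blend(dense_scores, sparse_scores, alpha=0.8):
    """score(d) = alpha * s_dense + (1 - alpha) * s_sparse.
    Scores are dicts mapping doc_id -> score; a missing doc scores 0."""
    docs = set(dense_scores) | set(sparse_scores)
    return {
        d: alpha * dense_scores.get(d, 0.0)
           + (1 - alpha) * sparse_scores.get(d, 0.0)
        for d in docs
    }

fused = convex_blend({"a": 1.0, "b": 0.5}, {"b": 1.0, "c": 1.0}, alpha=0.8)
print(fused)  # a: 0.8, b: 0.6, c: 0.2
```

Note the asymmetry with RRF: here alpha is an explicit, tunable knob, but the blend is only meaningful if the two score distributions are comparable in the first place.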

Weaviate runs BM25F keyword search and vector search in parallel, then merges results through relativeScoreFusion (the default since 1.24, which normalizes scores to 0 to 1) or rankedFusion (rank-based, similar to RRF). An alpha parameter controls the blend. Multi-tenancy is native: each tenant is a separate shard that can be activated, deactivated, or offloaded to cold storage individually.

Qdrant and Milvus both moved past "pure vector with filters bolted on." Qdrant's Query API (v1.10, 2024) is compositional: named vectors in one collection, multi-stage prefetch, and fusion via RRF or Distribution-Based Score Fusion. A sparse prefetch feeding a dense rescorer is a single request. Milvus v2.5 added native BM25 as a built-in function and supports multi-vector hybrid through AnnSearchRequest per vector field, fused by WeightedRanker or RRFRanker. Milvus's distinctive feature is its distributed architecture, five node types plus etcd and object storage, which is why it shows up most often in billion-scale reference deployments and also why one production team hit an etcd leader election storm at 200 million vectors.
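The prefetch-then-rescore pattern Qdrant collapses into one request looks like this when written out by hand. This is a conceptual sketch, not the Qdrant client API: `sparse_search` and `dense_embed` are placeholder callables standing in for whatever retriever and embedding model you use.

```python
import numpy as np

def prefetch_then_rescore(sparse_search, dense_embed, query, prefetch_k=100, k=10):
    """Two-stage retrieval: a cheap sparse search produces a candidate
    pool, then a dense model rescores only those candidates."""
    candidates = sparse_search(query, prefetch_k)  # [(doc_id, text), ...]
    q = dense_embed(query)
    rescored = [
        (doc_id, float(np.dot(q, dense_embed(text))))  # cosine if unit-norm
        for doc_id, text in candidates
    ]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored[:k]

# Toy stand-ins so the sketch runs end to end.
def toy_sparse(query, k):
    return [("a", "x"), ("b", "y")][:k]

def toy_embed(text):
    return np.array([1.0, 0.0]) if "x" in text else np.array([0.0, 1.0])

print(prefetch_then_rescore(toy_sparse, toy_embed, "x", prefetch_k=2, k=1))
# → [('a', 1.0)]
```

The point of pushing this into a single server-side request is that the candidate texts and vectors never cross the network between stages.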

The real axis is build versus buy

Feature tables hide the decision that actually matters: how much of the pipeline you want to own. At one end, pgvector gives you vector similarity inside PostgreSQL and leaves the BM25 layer, fusion logic, and query routing to you. In the middle, self-hosted Elasticsearch, OpenSearch, Milvus, Qdrant, and Weaviate provide hybrid APIs but hand you the cluster. At the far end, Pinecone Serverless, Weaviate Cloud, Qdrant Cloud, Elastic Cloud, Azure AI Search, and Vertex AI run the infrastructure and hand you opinionated defaults.

The Intercom choice is one data point on that axis. A team with deep Elasticsearch operations experience and a 200TB existing cluster will rationally refuse to split that expertise across two systems for a marginal relevance gain. A team with no Lucene experience and a strong Kubernetes practice will rationally make the opposite call.

The unresolved question

Three specifics from the chapter complicate any clean recommendation. First, filtered search degrades on every platform: pre-filtering can perform worse than brute force once filter selectivity exceeds 90%, and post-filtering discards most of the nearest neighbors when filters are restrictive. Second, scores across BM25, cosine, and dot product live on different scales, and a study on the BEIR benchmarks found that convex combination of normalized scores outperformed RRF in both in-domain and out-of-domain settings (Bruch, 2023), which puts the field's default fusion choice under real pressure. Third, the build-versus-buy crossover is sharp but moving: managed services are cheaper than self-hosting at small scale, and the break-even toward self-hosting sits near 60 to 80 million queries per month or roughly 100 million vectors with high query volume.
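The post-filtering problem in the first point is easy to see with a toy simulation. Assuming the filter is independent of similarity, an ANN search that returns the top k neighbors and then filters keeps only about k times the filter's pass rate; the numbers below are illustrative, not from the chapter.

```python
import random

random.seed(0)

def post_filter_survivors(k=100, pass_rate=0.05, trials=1000):
    """Post-filtering: fetch the top-k nearest neighbors, then drop
    those failing the filter. Returns the average number surviving."""
    total = 0
    for _ in range(trials):
        total += sum(random.random() < pass_rate for _ in range(k))
    return total / trials

print(post_filter_survivors())  # typically close to 5 survivors of 100
```

A user who asked for 100 results under a filter that matches 5% of the corpus gets roughly five, which is why restrictive filters push platforms toward pre-filtering despite its own degradation at high selectivity.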

So the question a feature comparison cannot answer is the one that decides the project: at what projected scale, filter selectivity, and fusion sensitivity does your current platform choice start actively costing you, and will that happen before or after you have the team to migrate?

Related chapter

Chapter 7: Choosing Your Search Platform

Every search platform now claims hybrid support, but the implementations behind those APIs differ substantially in architecture, fusion methods, filtering behavior, and cost structure. This chapter provides a vendor-neutral comparison of how each major platform implements hybrid search and decision frameworks organized by use case, scale, and team capability.



Laszlo Csontos

Author of Designing Hybrid Search Systems. Works on search and retrieval systems, and writes about the engineering trade-offs involved in combining keyword and vector search.