Enterprise Search Access Control: Decide at Index Time, Not Query Time
Enforcing document-level access control inside a vector search index is an architectural decision, not a runtime filter. Getting it wrong leaks data in subtle ways.
At 2% permission selectivity, a top-10 vector search returns an expected 0.2 accessible documents. That single number captures why enterprise knowledge search is harder than it looks: a typical employee is authorized to see about 2% of the corpus, and the expectation E[accessible] = K * s means that requesting K = 10 nearest neighbors and then dropping anything the user cannot open leaves roughly one accessible result every five queries. Access control is therefore an architectural decision that must be made at index time, not bolted on as a query-time filter.
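The expectation above is trivial to compute but worth pinning down. A minimal sketch, assuming accessibility is independent of relevance rank (the function name is ours, not from any library):

```python
def expected_accessible(k: int, selectivity: float) -> float:
    """Expected number of accessible documents in a top-k result set,
    assuming accessibility is independent of relevance rank."""
    return k * selectivity

# At s = 0.02, a top-10 fetch yields 0.2 accessible results on average,
# i.e. about one usable hit every five queries.
print(expected_accessible(10, 0.02))
```

The independence assumption is generous to the post-filter: if relevant documents cluster in restricted repositories, the real yield is lower still.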
Why this is different from e-commerce
E-commerce search largely assumes every user can see every product. Enterprise search assumes the opposite: any given query must return results filtered to what that specific user is allowed to see. And the access model is not a small number of tiers. It is often a per-document ACL drawn from Active Directory, SharePoint permissions, Slack channel membership, Git repository access, and a dozen other systems.
The scale of the problem is not small. Productivity drains, including searching for information, consume roughly 25% of working time (APQC, 2021), and 62% of knowledge workers report struggling with too much time spent searching for information (Microsoft, 2023). The average organization now deploys 101 distinct applications (Okta, 2025), each a potential content silo that enterprise search must reach, and each with its own permission model.
Why the obvious approaches both fail
A naive approach applies ACLs as a post-filter on the retrieval results: retrieve top-K from the vector index, then drop documents the user cannot see. The problem is the expected-value calculation above. With K = 10 and selectivity s = 0.02, you get 0.2 accessible results on average. To reliably return 10 accessible results you would need to over-fetch by a factor of 1/s, which is K = 500 at 2% selectivity and K = 1,000 at 1%. Even then, the formula describes an expectation, not a lower bound, so the result set can still come back thin or empty while relevant accessible documents sit just beyond the fetched window. The user sees poor results and has no way to know why.
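Because 10 is only the mean, even fetching K = 1/s times the target leaves a real chance of a thin result set. A back-of-the-envelope check under the same independence assumption, modeling each fetched candidate as a Bernoulli(s) draw (function name hypothetical):

```python
from math import comb

def prob_too_few(k: int, s: float, need: int) -> float:
    """Probability that fewer than `need` of the k fetched candidates
    are accessible, with each candidate an independent Bernoulli(s)."""
    return sum(comb(k, i) * s**i * (1 - s)**(k - i) for i in range(need))

# Over-fetching to K = 500 at s = 0.02 makes 10 the *mean* yield, so the
# result set still falls short of 10 almost half the time.
print(prob_too_few(500, 0.02, 10))
```

Pushing the shortfall probability down to a few percent requires over-fetching well past 1/s, which compounds the latency and bandwidth cost of every query.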
The apparent fix is to move the filter earlier: restrict the candidate set to permitted documents before the ANN search begins. This is called pre-filtering, and it avoids the empty-result problem by construction. But it pays for that guarantee with a performance cliff. Azure AI Search benchmarks show pre-filtering running roughly 7 times slower than post-filtering at 1 million vectors when the filter passes under 2% of documents (Microsoft, 2025). At 1 billion vectors, the same 7x penalty kicks in at 10% selectivity. A vector index tuned for unfiltered traversal does not gracefully absorb a highly restrictive predicate; the filter changes both the size and the distribution of qualified data, and graph connectivity degrades with it.
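The two filter placements can be made concrete with a toy in-memory sketch. A sorted list stands in for the ANN index here, which is exactly the simplification that hides pre-filtering's real cost: on a graph index, shrinking the candidate set degrades traversal, which is what the benchmarks above measure. All names and data are illustrative:

```python
def post_filter(score, allowed, corpus, k):
    """Rank first, filter after: cheap, but yields ~k * s documents."""
    top_k = sorted(corpus, key=score)[:k]
    return [doc for doc in top_k if doc in allowed]

def pre_filter(score, allowed, corpus, k):
    """Filter first, rank after: full yield whenever enough accessible
    documents exist, but the index traverses a far sparser set."""
    candidates = [doc for doc in corpus if doc in allowed]
    return sorted(candidates, key=score)[:k]

corpus = list(range(100))       # doc ids; lower id = better score
allowed = {7, 42, 97}           # 3% selectivity for this user
score = lambda doc: doc
print(post_filter(score, allowed, corpus, 10))   # [7]  (thin result set)
print(pre_filter(score, allowed, corpus, 10))    # [7, 42, 97]  (full yield)
```

Even this toy shows the asymmetry: the post-filter silently returns one document, while the pre-filter returns everything the user may see.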
Neither knob is free
So the two intuitive choices are a result set that is almost always too small (post-filtering) and a latency curve that collapses exactly where enterprise permissions live (pre-filtering). This is the shape of the problem, not a tuning issue. Filter selectivity in enterprise ACLs sits in the 1% to 5% band for most users, which is precisely where both approaches are at their worst: post-filtering's expected yield drops toward zero, and pre-filtering's latency penalty reaches its peak.
A third axis sometimes cited is partitioning the index by broad permission group and applying fine-grained filtering within each partition. Research on RBAC-aware partitioning reports up to 13.5x lower query latency than row-level post-filtering, with only a 1.24x memory increase and a 90.4% reduction in additional memory versus dedicated per-role indexes (Zhong, 2025). That helps when groups are coarse and relatively stable. It helps much less when every document has its own ACL and users cross dozens of groups, because the partition count starts to chase the ACL count.
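The partitioned layout is easiest to see in skeleton form: route the query to the user's coarse-group partitions, then apply the residual per-document ACL inside each one. This sketch omits ranking entirely and the class and field names are ours, not from the cited paper:

```python
from collections import defaultdict

class PartitionedIndex:
    """RBAC-aware layout: one partition per coarse group, with the
    fine-grained per-document ACL checked inside the partition."""

    def __init__(self):
        self._partitions = defaultdict(list)   # group -> [(doc_id, acl)]

    def add(self, doc_id: str, group: str, acl: set) -> None:
        self._partitions[group].append((doc_id, acl))

    def search(self, user: str, user_groups: set, k: int) -> list:
        # Only the user's partitions are scanned; the residual ACL check
        # is a cheap in-partition post-filter over a much smaller set.
        hits = [doc_id
                for group in user_groups
                for doc_id, acl in self._partitions.get(group, [])
                if user in acl]
        return hits[:k]

idx = PartitionedIndex()
idx.add("wiki-1", "eng", {"alice", "bob"})
idx.add("hr-9", "hr", {"carol"})
idx.add("wiki-2", "eng", {"alice"})
print(idx.search("alice", {"eng"}, 10))   # ['wiki-1', 'wiki-2']
```

The failure mode described above is visible in the data structure: when every document's ACL is unique and users span dozens of groups, `_partitions` degenerates toward one partition per ACL and the memory and fan-out savings evaporate.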
Consequences for the system
Two design constraints follow regardless of which filtering strategy a team lands on. First, the connector infrastructure that ingests from source systems must also capture and keep current the ACL graph for each document. Stale ACLs, not algorithmic leakage, are the most common cause of enterprise search leaks. Second, the evaluation and monitoring stack needs to understand access control. Zero-result rate and recall must be computed per-user or per-role, not in aggregate, because a user who never sees relevant documents is indistinguishable in aggregate metrics from a user whose permissions legitimately exclude them.
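The second constraint is cheap to satisfy once the query log carries a role field. A sketch under an assumed log schema of (role, result_count) pairs; the schema and function name are hypothetical:

```python
from collections import defaultdict

def zero_result_rate_by_role(query_log):
    """Zero-result rate broken out per role, where `query_log` is an
    iterable of (role, result_count) pairs (a hypothetical schema)."""
    totals, zeros = defaultdict(int), defaultdict(int)
    for role, result_count in query_log:
        totals[role] += 1
        if result_count == 0:
            zeros[role] += 1
    return {role: zeros[role] / totals[role] for role in totals}

log = [("engineer", 8), ("engineer", 0), ("contractor", 0), ("contractor", 0)]
print(zero_result_rate_by_role(log))
# {'engineer': 0.5, 'contractor': 1.0}
```

A contractor role sitting at a 100% zero-result rate is a signal the aggregate number would bury: it may be a legitimate permission boundary, or it may be a broken ACL sync, and only the per-role view surfaces the question.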
Pre-filtering cripples latency, post-filtering empties the result set, and partition-per-ACL explodes operationally. Chapter 20 lays out which of these you actually want, and why the answer depends on ACL cardinality.
Related chapter
Chapter 20: Enterprise Knowledge Search
No other hybrid search domain presents the same operational surface area as enterprise knowledge: content lives in dozens of source systems that were built in isolation and never intended to interoperate. This chapter addresses retrieval over heterogeneous document types, enforcing per-document access control inside vector indexes, meeting regulatory and audit requirements, and building the connector and ingestion plumbing that makes the content reachable in the first place.
Laszlo Csontos
Author of Designing Hybrid Search Systems. Works on search and retrieval systems, and writes about the engineering trade-offs involved in combining keyword and vector search.