What Is Cross-Encoder Reranking? Full Attention on Query and Document

By SearchMinistry Media

A cross-encoder is a neural relevance model that takes a query and document concatenated as a single input, applies full transformer attention across all tokens in both texts, and outputs a scalar relevance score. Cross-encoder reranking applies this model to a small candidate set retrieved by a faster first-stage system to reorder results with high precision.

How Cross-Encoders Score Relevance

Cross-encoders format the input as [CLS] query [SEP] document [SEP]. Every query token attends to every document token across all transformer layers. This full attention lets the model resolve paraphrase, negation, and fine-grained term relationships that bi-encoders miss when compressing each text into a single vector. The [CLS] token's final representation passes through a linear head to produce the relevance score.
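As a rough illustration, the input layout can be sketched in a few lines of Python. The tokenizer here is a deliberately naive whitespace split (a real cross-encoder uses a subword tokenizer such as WordPiece); only the [CLS]/[SEP] sequence structure is the point.

```python
# Toy sketch of cross-encoder input formatting. The whitespace "tokenizer"
# is a stand-in for a real subword tokenizer; the [CLS]/[SEP] layout is
# what an actual BERT-style cross-encoder receives.

def format_pair(query: str, document: str) -> list[str]:
    """Build the [CLS] query [SEP] document [SEP] token sequence."""
    return [
        "[CLS]",
        *query.lower().split(),
        "[SEP]",
        *document.lower().split(),
        "[SEP]",
    ]

tokens = format_pair("what is reranking", "Reranking reorders retrieved candidates.")
print(tokens)
```

In a real model, self-attention runs over this whole sequence, so every query token can attend to every document token; the final hidden state at position 0 (the [CLS] slot) feeds the scoring head.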

Two-Stage Retrieval Architecture

Cross-encoder reranking is deployed in a two-stage pipeline. Stage one uses a fast retriever (bi-encoder ANN search, BM25, or hybrid RRF) to fetch 50-200 candidate documents with high recall. Stage two runs the cross-encoder over every candidate to reorder them by precise relevance score. Only the top 5-20 from the reranked list are passed to the language model for answer generation.
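The pipeline shape can be sketched with toy scorers. Both scoring functions below are hypothetical stand-ins: the cheap lexical-overlap score plays the role of the first-stage retriever (BM25 or bi-encoder), and the exact-phrase bonus plays the role of the cross-encoder; a production system would replace both with real models.

```python
# Minimal two-stage retrieval sketch. Stage one: cheap, recall-oriented
# scoring over the whole corpus. Stage two: a more precise (here, toy)
# scorer applied only to the small candidate set.

def stage_one_score(query: str, doc: str) -> float:
    # Crude lexical overlap, standing in for BM25 / bi-encoder similarity.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q)

def cross_encoder_score(query: str, doc: str) -> float:
    # Stand-in for a transformer relevance head: overlap plus a bonus
    # when the document contains the query phrasing verbatim.
    bonus = 1.0 if query.lower() in doc.lower() else 0.0
    return stage_one_score(query, doc) + bonus

corpus = [
    "Reranking is covered elsewhere.",
    "what is reranking? Reranking reorders a candidate list by relevance.",
    "Vector search retrieves candidates quickly.",
]
query = "what is reranking"

# Stage one: keep the top-k candidates from the fast scorer (high recall).
candidates = sorted(corpus, key=lambda d: stage_one_score(query, d), reverse=True)[:2]
# Stage two: rerank only those candidates with the precise scorer.
reranked = sorted(candidates, key=lambda d: cross_encoder_score(query, d), reverse=True)
print(reranked[0])
```

The cost asymmetry is the design point: the expensive per-pair scorer only ever sees the handful of candidates the cheap scorer surfaced, which is why cross-encoders are affordable at stage two but not over a full corpus.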

SEO Implications

Cross-encoder reranking rewards direct answer presence. A document containing the exact answer in clear prose scores higher than one containing related information requiring inference. Answer-first writing, precise terminology, and focused passage content improve cross-encoder scores. Content that survives the reranking stage is what appears in AI-generated answers. Optimising for the reranker means producing content that answers the question directly and unambiguously within a single retrievable chunk.