Retrieval Techniques
Apr 23, 2024
May 22, 2025

Representative Retrieval Techniques

| Technique | Core Concept | Typical Use Cases |
| --- | --- | --- |
| Boolean Retrieval | Exact matching using Boolean logic (AND, OR, NOT) on term presence. | Legal, medical, or expert systems requiring strict control. |
| Vector Space Model (VSM) | Represents documents and queries as weighted term vectors (e.g., using TF-IDF). | Classic academic retrieval systems; early search engines. |
| BM25 | An enhanced probabilistic ranking model that adjusts for term frequency and document length. | Widely used in modern search engines; strong baseline for ranking. |
| Neural Retrieval (Neural IR) | Leverages deep learning (e.g., BERT, DPR) to model semantic similarity beyond exact words. | QA systems, semantic search, large-scale web search. |
| Hybrid Retrieval | Combines traditional (e.g., BM25) and neural methods for improved relevance and recall. | State-of-the-art search engines and intelligent assistants. |
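To make the BM25 row more concrete, here is a minimal sketch of how term frequency and document length enter the score. It assumes the common Okapi BM25 formulation with a smoothed IDF, whitespace tokenization, and the conventional defaults `k1=1.5` and `b=0.75`; the function name `bm25_scores` is hypothetical, and production engines differ in the details.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document in `docs` against `query` (assumed Okapi BM25 variant)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a higher weight (smoothed IDF, always positive).
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Long documents are penalized through the b * len / avg_len factor.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    "Norbert lives in Maryland",
    "Lisa lives in California",
    "Norbert and Lisa are linguists",
    "Maryland and California are states",
]
scores = bm25_scores("Norbert Maryland", docs)
```

With these four sentences, the first document matches both query terms and should receive the highest score, while the second matches neither and scores zero.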
 
Comparison of Retrieval Methods
| Method | Matching Strategy | Ranking Support | Semantic Understanding | Complexity |
| --- | --- | --- | --- | --- |
| Boolean Retrieval | Exact term matching | ❌ No | ❌ No | Low (Simple) |
| Vector Space Model | TF-IDF + Cosine similarity | ✅ Yes | ❌ No | Medium |
| BM25 | Enhanced TF-IDF scoring | ✅ Yes | ❌ No (but high effectiveness) | Medium |
| Neural Retrieval | Semantic embeddings (e.g., BERT) | ✅ Yes | ✅ Yes | High (Complex) |
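As an illustration of the "TF-IDF + Cosine similarity" strategy in the Vector Space Model row, the sketch below weights each term by raw count times `log(N / df)` and ranks documents by the cosine of the angle to the query vector. The weighting scheme, the whitespace tokenizer, and the function name `tfidf_cosine` are assumptions made for this example; real systems usually add smoothing and normalization variants.

```python
import math
from collections import Counter

def tfidf_cosine(query, docs):
    """Return the cosine similarity between `query` and each document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    # Plain log(N / df) weighting; an assumption, not the only TF-IDF variant.
    idf = {term: math.log(n_docs / count) for term, count in df.items()}

    def vectorize(tokens):
        # Sparse vector as a dict; terms unseen in the collection are dropped.
        return {t: c * idf[t] for t, c in Counter(tokens).items() if t in idf}

    def cosine(u, v):
        dot = sum(w * v.get(t, 0.0) for t, w in u.items())
        norm_u = math.sqrt(sum(w * w for w in u.values()))
        norm_v = math.sqrt(sum(w * w for w in v.values()))
        return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

    q_vec = vectorize(query.lower().split())
    return [cosine(q_vec, vectorize(toks)) for toks in tokenized]

docs = [
    "Norbert lives in Maryland",
    "Lisa lives in California",
    "Norbert and Lisa are linguists",
    "Maryland and California are states",
]
sims = tfidf_cosine("Lisa linguists", docs)
best = max(range(len(docs)), key=lambda i: sims[i])
```

Unlike Boolean retrieval, every document gets a graded score, so the second document (which mentions Lisa but not linguists) still ranks above the documents that share no terms with the query.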
Key Takeaways
  • Boolean Retrieval is fast and deterministic, but it offers no ranking and no fuzzy (partial) matching.
  • VSM and BM25 introduced ranking and weighting, significantly improving user experience.
  • Neural IR adds semantic understanding, allowing queries like "Who wrote Hamlet?" to return documents mentioning Shakespeare.
  • Hybrid Retrieval is now best practice, blending the precision of BM25 with the flexibility of neural models.
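One simple way to blend a lexical ranking with a neural one is reciprocal rank fusion (RRF), sketched below. This is only one of several fusion strategies and is not necessarily what any particular engine uses; the document ids, both input rankings, and the constant `k=60` are hypothetical values chosen for illustration.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one list, best first."""
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); high ranks contribute more.
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score, breaking ties by document id for determinism.
    return sorted(fused, key=lambda d: (-fused[d], d))

bm25_ranking = ["d3", "d1", "d2"]    # hypothetical lexical (BM25) results
neural_ranking = ["d1", "d4", "d3"]  # hypothetical semantic (neural) results

fused = reciprocal_rank_fusion([bm25_ranking, neural_ranking])
print(fused)  # ['d1', 'd3', 'd4', 'd2']
```

Documents that appear near the top of both lists (here `d1` and `d3`) rise above documents that only one retriever found, which is the intuition behind combining the two approaches.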
 

Boolean Retrieval Overview

Boolean retrieval uses binary vectors to represent the presence or absence of terms in documents: each position is 1 if the term occurs in that document and 0 if it does not. Queries are answered by combining these vectors with the Boolean operators AND, OR, and NOT.
Boolean Logic
The AND operation returns 1 only if both inputs are 1.
The OR operation returns 1 if either input is 1.
The NOT operation flips the bit, turning 1 into 0 and 0 into 1.
Applied position by position, these operations combine term vectors and identify the documents that satisfy complex query conditions.
Term-Document Matrix Example
Consider four documents:
  • Norbert lives in Maryland
  • Lisa lives in California
  • Norbert and Lisa are linguists
  • Maryland and California are states
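A Boolean term-document matrix would look like this (one row per term, one column per document, reconstructed here from the sentences above):

| Term | Doc 1 | Doc 2 | Doc 3 | Doc 4 |
| --- | --- | --- | --- | --- |
| Norbert | 1 | 0 | 1 | 0 |
| lives | 1 | 1 | 0 | 0 |
| in | 1 | 1 | 0 | 0 |
| Maryland | 1 | 0 | 0 | 1 |
| Lisa | 0 | 1 | 1 | 0 |
| California | 0 | 1 | 0 | 1 |
| and | 0 | 0 | 1 | 1 |
| are | 0 | 0 | 1 | 1 |
| linguists | 0 | 0 | 1 | 0 |
| states | 0 | 0 | 0 | 1 |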
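The matrix for the documents listed above can also be built programmatically. The following is a minimal sketch that assumes case-sensitive whitespace tokenization and orders terms by first appearance; the helper name `term_document_matrix` is hypothetical.

```python
docs = [
    "Norbert lives in Maryland",
    "Lisa lives in California",
    "Norbert and Lisa are linguists",
    "Maryland and California are states",
]

def term_document_matrix(docs):
    """Map each term to a list of 0/1 flags, one flag per document."""
    tokenized = [set(doc.split()) for doc in docs]
    # Collect the vocabulary in order of first appearance.
    vocab = []
    for doc in docs:
        for term in doc.split():
            if term not in vocab:
                vocab.append(term)
    return {
        term: [1 if term in toks else 0 for toks in tokenized]
        for term in vocab
    }

matrix = term_document_matrix(docs)
for term, row in matrix.items():
    print(f"{term:<12}", *row)
```

Each row is the bit vector for one term, so `matrix["Norbert"]` is `[1, 0, 1, 0]` because Norbert appears in Documents 1 and 3 only.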
 
Boolean Query Processing
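Consider the query:
(Norbert AND NOT California) OR (Lisa AND linguists)
Each query term corresponds to a bit vector over Documents 1 to 4:
Norbert = 1010
California = 0101
Lisa = 0110
linguists = 0010
We first compute NOT California:
NOT 0101 = 1010
Then apply Norbert AND NOT California:
1010 AND 1010 = 1010
Then, Lisa AND linguists:
0110 AND 0010 = 0010
Finally, OR the two intermediate results:
1010 OR 0010 = 1010
Result: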
Documents 1 and 3 match the query.
  • Doc 1 because it contains Norbert but not California
  • Doc 3 because it contains both Lisa and linguists
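The evaluation above can be reproduced in a few lines. This sketch assumes the same case-sensitive whitespace tokenization as the example sentences and represents each term vector as a list of 0/1 values; the helper names (`incidence`, `vec_and`, `vec_or`, `vec_not`) are hypothetical.

```python
docs = [
    "Norbert lives in Maryland",
    "Lisa lives in California",
    "Norbert and Lisa are linguists",
    "Maryland and California are states",
]

def incidence(term):
    """Bit vector for `term`: 1 where the document contains it, else 0."""
    return [int(term in doc.split()) for doc in docs]

def vec_and(u, v):
    return [a & b for a, b in zip(u, v)]

def vec_or(u, v):
    return [a | b for a, b in zip(u, v)]

def vec_not(u):
    return [1 - a for a in u]

# (Norbert AND NOT California) OR (Lisa AND linguists)
left = vec_and(incidence("Norbert"), vec_not(incidence("California")))
right = vec_and(incidence("Lisa"), incidence("linguists"))
result = vec_or(left, right)

# Convert the result vector back to 1-based document numbers.
matching = [i + 1 for i, bit in enumerate(result) if bit]
print(result, matching)  # [1, 0, 1, 0] [1, 3]
```

Real systems typically store these vectors as sorted postings lists or compressed bitmaps rather than Python lists, but the logic is the same.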
 