Lazy loaded image
Technology
Lazy loaded imageDifferent data structures for inverted index postings lists
Words 701Read Time 2 min
Apr 23, 2024
Apr 23, 2025
type
status
date
slug
summary
tags
category
icon
password

Goal: Boolean AND Query

You want to process the query:
Brutus AND Calpurnia
This means:
Find all documents that contain both the term "Brutus" and the term "Calpurnia".
Given the postings lists:
Now, let’s walk through how this query would be processed using different data structures:

① Using a Linked List

Data Structure:

Each postings list is a singly linked list like this:

Query Algorithm:

Use two pointers to traverse both lists:

Step-by-Step Execution:

  • Compare 1 and 2 → 1 < 2 → move p1
  • Compare 2 and 2 → match → add to result → move both
  • Compare 4 and 31 → move p1
  • Compare 11 and 31 → move p1
  • Compare 31 and 31 → match → add to result → move both
  • p2 is now 54, p1 is 45 → keep going until one ends

Final Result:

Summary:

  • Easy to insert new nodes
  • Slower search (must follow next pointers linearly)
  • No fast skipping

② Using a Sorted Array

Data Structure:

Postings are stored in sorted arrays:

Query Algorithm:

Use two index variables:

Step-by-Step Execution:

  • Compare 1 and 2 → move i
  • Compare 2 and 2 → match → add to result → move both
  • Compare 4 and 31 → move i
  • Compare 11 and 31 → move i
  • Compare 31 and 31 → match → add to result → move both
  • Calpurnia now at 54, Brutus at 45 → keep comparing
  • End when one list ends

Final Result:

Summary:

  • Faster access (index-based)
  • Works well with CPU caching
  • Insertions are expensive (requires re-sorting)

③ Using a Skip List (Skip Array)

Data Structure:

Like a sorted array, but with skip pointers to let us “jump ahead”:
Skip pointers are like express lanes: if a value is too small, you can jump past it instead of stepping through every element.

Query Algorithm (with skip logic):

Note:
  • skips1 and skips2 are dictionaries where keys are current indices and values are the skip destinations.

Final Result:

Same as before:
But with fewer comparisons if the postings lists are long.

Summary:

  • Best query speed (especially for long lists)
  • Used in real search engines (like Lucene)
  • More complex to implement and maintain

Comparison Table:

Structure
Insert Speed
Query Speed
Memory Use
Use Case
Linked List
Easy
Slow
Low
Teaching, simple systems
Sorted Array
Hard
Medium
Higher
Foundation of IR systems
Skip List/Array
Balanced
Fast
Higher
Production search engines
 
上一篇
Inverted Index: A Foundation for Search Engines
下一篇
Retrieval Techniques