type
status
date
slug
summary
tags
category
icon
password
Goal: Boolean AND Query
You want to process the query:
Brutus AND Calpurnia
This means:
Find all documents that contain both the term "Brutus" and the term "Calpurnia".
Given the postings lists:
Now, let’s walk through how this query would be processed using different data structures:
① Using a Linked List
Data Structure:
Each postings list is a singly linked list like this:
Query Algorithm:
Use two pointers to traverse both lists:
Step-by-Step Execution:
- Compare 1 and 2 → 1 < 2 → move p1
- Compare 2 and 2 → match → add to result → move both
- Compare 4 and 31 → move p1
- Compare 11 and 31 → move p1
- Compare 31 and 31 → match → add to result → move both
- p2 is now 54, p1 is 45 → keep going until one ends
Final Result:
Summary:
- Easy to insert new nodes
- Slower search (must follow next pointers linearly)
- No fast skipping
② Using a Sorted Array
Data Structure:
Postings are stored in sorted arrays:
Query Algorithm:
Use two index variables:
Step-by-Step Execution:
- Compare 1 and 2 → move i
- Compare 2 and 2 → match → add to result → move both
- Compare 4 and 31 → move i
- Compare 11 and 31 → move i
- Compare 31 and 31 → match → add to result → move both
- Calpurnia now at 54, Brutus at 45 → keep comparing
- End when one list ends
Final Result:
Summary:
- Faster access (index-based)
- Works well with CPU caching
- Insertions are expensive (requires re-sorting)
③ Using a Skip List (Skip Array)
Data Structure:
Like a sorted array, but with skip pointers to let us “jump ahead”:
Skip pointers are like express lanes: if a value is too small, you can jump past it instead of stepping through every element.
Query Algorithm (with skip logic):
Note:
skips1
andskips2
are dictionaries where keys are current indices and values are the skip destinations.
Final Result:
Same as before:
But with fewer comparisons if the postings lists are long.
Summary:
- Best query speed (especially for long lists)
- Used in real search engines (like Lucene)
- More complex to implement and maintain
Comparison Table:
Structure | Insert Speed | Query Speed | Memory Use | Use Case |
Linked List | Easy | Slow | Low | Teaching, simple systems |
Sorted Array | Hard | Medium | Higher | Foundation of IR systems |
Skip List/Array | Balanced | Fast | Higher | Production search engines |
- Author:Entropyobserver
- URL:https://tangly1024.com/article/1ded698f-3512-80ba-9616-f9e44213fd81
- Copyright:All articles in this blog, except for special statements, adopt BY-NC-SA agreement. Please indicate the source!