Chapter 3 Storage and Retrieval
-
Hash indexes
-
Append only logs are good becasue appending and merging are sequential, hence faster than random writes.
-
Concurrency and crash recovery is much easier if the files are append-only
-
If we only append, we risk running out of storage
-
One strategy is to divide data into segments, close segment when it reaches a certain size and perform compaction on these segments
-
Compaction means throwing away duplicate keys in the log.
-
Sorted String tables and Log-structured Merge Tree
-
Keep sorted key-value files in storage
-
Keep an index of some of the keys in memory.
-
on retrieval, if the key is in memory, just get it from memory
-
if not in memory, we get the closest and then start scanning storage from there.
-
The index can be a balanced binary search tree (red-black or AVL) so that we insert randomly but can access in sorted order easily.
-
B-Trees
-
If an index stores the row value for the key inside the index (as opposed to storing a reference to the value), it's called a clustered index.
-
If the index stores only some columns of the row inside the index, it's a covering index.