Skip to content
lalinsky edited this page Apr 30, 2011 · 3 revisions

The high-level structure is based on how Lucene manages its index. The index consists of multiple segments, which are regularly merged together. All the segments are write-once data structures, i.e. they are never modified. New fingerprints are added into new segments and over the time they get merged into larger segments. Since fingerprints are never modified, we do not have to handle updates. There is a separate segment file with deleted fingerprints. There are ignored in search results and removed from the index in the process of segment merging.

Segments File

  • SegmentsFileNextSegmentNumber, SegmentCount, <SegmentName, SegmentSize>SegmentCount
  • NextSegmentNumber, SegmentCount, SegSizeVInt32
  • SegmentNameString

This contains the list of segment files with the actual index data. There are SegmentCount segments in the index. The next segment added to the index will have number NextSegmentNumber.

Per-segment Files

Index

  • SegmentIndexFileBlockSize, KeyCount, KeyDeltaKeyCount
  • BlockSize, IndexInterval, KeyCount, KeyDeltaVInt32

This is a list of blocks in the data file. KeyDelta is the difference between the current key and the previous one. Each key in this file corresponds to a block in the data file (the key in this file is the first key of the block). The block corresponding to the ith key can be found at the offset i * BlockSize in the data file.

Data

  • SegmentDataFileBlockKeyCount
  • BlockBlockKeyCount, <KeyDelta, ValueDelta>BlockKeyCount
  • BlockKeyCountInt16
  • KeyDelta, ValueDeltaVInt32

Each Block has exactly BlockSize bytes. If the actual data is smaller, the rest of the block is zero-padded. The number of blocks in the file is the same as KeyCount for the zero-level index.

KeyDelta is the difference between the key and the previous key. The first KeyDelta is always 0, you should use the index to determine the key. ValueDelta is the difference between the value and the previous value within the same key.

Clone this wiki locally