Skip to content

Mv verify compactions

Matthew Von-Maszewski edited this page Jun 21, 2013 · 7 revisions

Status

  • merged to "master" June 20, 2013
  • developer tested code checked-in June 18, 2013
  • development started June 17, 2013

History / Context

In Riak 1.2 we added code to have leveldb automatically shunt corrupted blocks to the lost/BLOCKS.bad file during a compaction. This was to keep the compactions from going into an infinite loop over an issue that A) read repair and AAE could fix behind the scenes and B) took up a bunch of customer support / engineering time to help customers fix manually.

Unfortunately, we did not realize that only one of two corruption tests was actually active during a compaction. There is a CRC test that applies to all blocks, including file metadata. Compression logic has a hash test that applies only to compressed data blocks. The CRC test was not active by default. Sadly, leveldb makes limited defensive tests beyond the CRC. A corrupted disk file could readily result in a leveldb / Riak crash ... unless the bad block happened to be detected by the compression hash test.

Google's answer to this problem is the paranoid_checks option, which defaults to false. Unfortunately setting this to true activates not only the compaction CRC test but also a CRC test of the recovery log. A CRC failure in the recovery log after a crash is expected, and utilized by the existing code logic to enable automated recovery upon next start up. paranoid_checks option will actually stop the automatic recovery if set to true. This second behavior is undesired.

Branch description

Clone this wiki locally