Active Anti Entropy
Active Anti Entropy (AAE) is a fairly simple idea, inspired by Riak.
At a basic level, each Cask node periodically goes through the files that it has stored, looks at each, makes sure no corruption has happened, and makes sure that the file is replicated to enough other nodes in the cluster.
Without AAE, consider a cluster with 10 nodes where every file is replicated to 3 different nodes. If one of those nodes catches fire, you won't have permanently lost any data, but every file that had a replica stored on that node is now down to only two replicas in the cluster. If you lose another node, you still won't have permanently lost any data, but some portion of the files on the remaining eight nodes will be down to only a single copy. Losing a third node at that point all but guarantees permanent data loss.
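The arithmetic in this example can be checked with a small sketch. It assumes a hypothetical, perfectly even placement (one file for each possible 3-node combination, 120 files in total); how Cask actually places files is not described here, and the names `placement` and `replica_histogram` are made up for illustration.

```python
import itertools
from collections import Counter

NODES = list(range(10))   # the 10-node cluster from the example
REPLICAS = 3              # each file lives on 3 distinct nodes

# Assumed placement: one file per possible 3-node combination (120 files).
placement = {
    file_id: set(combo)
    for file_id, combo in enumerate(itertools.combinations(NODES, REPLICAS))
}

def replica_histogram(placement, dead_nodes):
    """Map 'surviving replica count' -> 'number of files with that count'."""
    dead = set(dead_nodes)
    return Counter(len(nodes - dead) for nodes in placement.values())

after_one = replica_histogram(placement, [0])          # 36 files at 2 replicas, 84 at 3
after_two = replica_histogram(placement, [0, 1])       # 8 files at a single copy
after_three = replica_histogram(placement, [0, 1, 2])  # 1 file with no copies left

for label, hist in [("1 node lost", after_one),
                    ("2 nodes lost", after_two),
                    ("3 nodes lost", after_three)]:
    print(label, dict(sorted(hist.items())))
```

With only one file per combination, exactly one file is lost after the third failure. In a cluster holding many files, it becomes very likely that at least one file had all three of its replicas on the three failed nodes.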
AAE runs in the background and attempts to reverse the damage from lost nodes and from other dangers, such as hard drive corruption on a node, that might otherwise go undetected until it's too late.
In that 10-node cluster with AAE, after the first node dies, the nine remaining nodes will, over some period of time (determined by the AAE interval, the number of files you have stored, etc.), visit each of the files that have only 2 replicas left in the cluster, notice the problem, and save a replica to another node. If a second node fails after that, some files will again be down to two replicas, but none will be reduced to a single copy.
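As a rough illustration of the difference, the sketch below compares the worst-case replica count after two failures, with and without a repair step between them. The placement and the `repair` function are assumptions made for the example: `repair` simply tops each file up using the lowest-numbered live nodes, which is not how a real cluster would choose targets.

```python
import itertools

NODES = set(range(10))
REPLICAS = 3

def even_placement():
    """Assumed placement: one file per possible 3-node combination."""
    combos = itertools.combinations(sorted(NODES), REPLICAS)
    return {file_id: set(combo) for file_id, combo in enumerate(combos)}

def repair(placement, live_nodes):
    """Stand-in for a completed AAE pass: restore REPLICAS copies per file."""
    for nodes in placement.values():
        nodes.intersection_update(live_nodes)      # forget replicas on dead nodes
        for candidate in sorted(live_nodes - nodes):
            if len(nodes) >= REPLICAS:
                break
            nodes.add(candidate)                   # copy the file to another node

def min_live_replicas(placement, live_nodes):
    """Smallest number of surviving replicas across all files."""
    return min(len(nodes & live_nodes) for nodes in placement.values())

without_aae = even_placement()
with_aae = even_placement()

live = NODES - {0}          # first node dies
repair(with_aae, live)      # AAE has time to finish before the next failure
live = live - {1}           # second node dies

print("without AAE:", min_live_replicas(without_aae, live))  # 1
print("with AAE:   ", min_live_replicas(with_aae, live))     # 2
```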
Obviously, a cluster can only sustain so much loss. If you keep losing nodes and don't replace them, eventually your remaining nodes will fill up and you will lose data. AAE just works to prevent data loss at earlier stages. If you lose a bunch of nodes in less time than AAE can repair things, you also might lose data.
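To get a feel for how long that repair window might be, a back-of-the-envelope estimate is the number of files on a node multiplied by the time spent per file. The function below is a hypothetical helper, not part of Cask, and the numbers are invented for illustration.

```python
def full_pass_seconds(num_files, interval_seconds, check_seconds=0.0):
    """Rough time for one node to visit every file it stores once.

    interval_seconds: the wait between files (the AAE interval).
    check_seconds:    assumed average cost of hashing and querying per file.
    """
    return num_files * (interval_seconds + check_seconds)

# Invented example: 100,000 files on a node, 5 seconds between files.
seconds = full_pass_seconds(100_000, 5)
print(f"{seconds / 86400:.1f} days per full pass")
```

If nodes fail faster than a pass of that length can complete, some files may lose further replicas before AAE reaches them.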
If a cluster grows, AAE also helps populate the new node and shift data around to create balance.
AAE running in the background does have some costs. It goes through every file the node has stored, calculates a SHA1 hash of the content, and compares it to the key that the file was stored under. If they differ, it knows that the file is corrupted and will attempt to fetch a clean copy from other nodes. If there's no corruption, or the corruption is successfully repaired, it then calculates which other nodes ought to have a copy of the file, queries those nodes, and sends them copies if they don't have it. Finally, if it determines that all N copies of the file actually ought to be on nodes other than this one (this will happen for some files when a new node is added to the cluster) and verifies that those nodes have good copies, it deletes the local one to free up space.
This whole process obviously consumes CPU cycles and network bandwidth. You will want to configure the AAE interval (how many seconds it waits between each file it checks) to balance those costs against the data loss prevention and cluster balancing that it provides. If you lose a node or add a new node, for example, it might be a good idea to temporarily decrease the AAE interval on the other nodes to speed up the cluster's recovery and rebalancing. You can then slow it back down while the cluster stays relatively stable.
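The per-file steps described above can be sketched roughly as follows. This is an in-memory model, not Cask's code: each node is a plain dictionary of key to bytes, and `owners`, `check_file`, and `aae_pass` are hypothetical names. In particular, the placement rule in `owners` (rank nodes by a hash of node name plus key) is an assumption standing in for whatever rule the cluster really uses.

```python
import hashlib
import time

N = 3  # assumed replication factor

def sha1_key(content: bytes) -> str:
    """Files are stored under the SHA1 hex digest of their content."""
    return hashlib.sha1(content).hexdigest()

def owners(key, node_names, n=N):
    """Assumed placement rule: the n nodes ranked first by SHA1(name + key)."""
    def rank(name):
        return hashlib.sha1((name + key).encode()).hexdigest()
    return sorted(node_names, key=rank)[:n]

def is_good(store, key):
    """True if the store holds the key and the content still hashes to it."""
    return key in store and sha1_key(store[key]) == key

def check_file(local, key, cluster):
    """One AAE visit to `key` on node `local`; cluster maps name -> {key: bytes}."""
    store = cluster[local]
    # 1. Verify the local copy; on a mismatch, fetch a clean copy from a peer.
    if not is_good(store, key):
        for name, peer in cluster.items():
            if name != local and is_good(peer, key):
                store[key] = peer[key]
                break
        else:
            return "unrepaired"  # no clean copy found anywhere
    # 2. Send copies to the nodes that ought to have the file but don't.
    targets = owners(key, list(cluster))
    for name in targets:
        if name != local and key not in cluster[name]:
            cluster[name][key] = store[key]
    # 3. If this node is not an owner and all owners hold good copies,
    #    delete the local copy to free up space.
    if local not in targets and all(is_good(cluster[t], key) for t in targets):
        del store[key]
        return "moved"
    return "ok"

def aae_pass(local, cluster, interval_seconds):
    """Visit every file on one node, waiting the AAE interval between files."""
    for key in list(cluster[local]):  # snapshot: check_file may delete keys
        check_file(local, key, cluster)
        time.sleep(interval_seconds)
```

Lowering `interval_seconds` makes a pass finish sooner at the cost of more hashing and more queries per unit of time, which is the trade-off the interval setting controls.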