Raft Introduction
Raft is a distributed consensus algorithm designed to be easily understood. It solves the problem of getting multiple servers to agree on a shared state even in face of failures. There are two types of failures a server may undergo, a crash failure or a Byzantine failure. A crash failure occurs when a server abruptly stops and does not resume. Byzantine failures are failures in which absolutely no conditions are imposed. In a Byzantine failure, a server can inconsistently appear both failed and functioning to failure-detection systems; so it is more difficult to handle. This failure takes its name from an allegory, the “Byzantine generals problem”.
Byzantine Generals Problem
Consider a number of generals who are attacking a fortress. The generals must decide as a group whether to attack or retreat; some may prefer to attack, while others prefer to retreat. The important thing is that all generals agree on a common decision, for a halfhearted attck by a few generals would become a rout, and would be worse than either a coordinated attack or a coordinated retreat.
If all generals attack in coordination, the battle is won (left). If two generals falsely declare that they intend to attack, but instead retreat, the battle is lost (right).
The problem is complicated by the presence of treacherous generals who may cast a vote selectively. For instance, if nine generals are voting, four of whom support attacking while four others are in favor of retreat, the ninth general may send a vote of retreat to those generals in favor of retreat, and a vote of attack to others. Those who received a retreat vote from the ninth general will retreat, while the rest will attack. The problem is complicated further by the generals being physically seperated and having to send their votes via messengers who may fail to deliver votes or may forge false votes.
Resolution: Consensus
Byzantine fault tolerance can be achieved if the loyal (non-faulty) generals have a majority agreement on their strategy.