-
Notifications
You must be signed in to change notification settings - Fork 392
Home
Riak Core is the framework upon which Riak, Basho's distributed Key/Value datastore, is built. This Wiki is meant to help you understand the riak core and its source code.
Core is written in Erlang, using its OTP framework. You will need to understand Erlang and OTP to understand how Riak Core works. The Erlang/OTP in a Nutshell page is a good starting point.
For an up to date list of the applications that riak_core depends on, see the rebar configuration file
Information about existing partitions and the virtual nodes responsible to handle them is stored in a data structure called the Ring. The number of partitions in the ring is decided at installation time and stays fixed. There is current work being done to allow it to change size. We also store data about buckets in it. This might change in the future as it limits scalability. Information about ongoing ownership transitions are also stored in the ring, as well as the designated Claimant node. The ring structure is defined in riak_core_ring (it's the chstate_v2 record). Also see riak_core_ring_manager.
Each virtual node (or vnode) is a worker responsible for a given partition in the ring. A riak_core node will host any number of vnodes, depending on the size of the ring and the number of nodes in the cluster. Basically, around (size of ring) / (number of nodes). What vnodes do will vary per application. In general, they handle commands that need to be executed for the partition they own. See more about their API in the vnode page
Buckets are nothing but namespaces with configuration properties. Your application may use only one, or whatever number works best. See riak_core_bucket
Changes to the ring need to be propagated throughout the cluster. This is what the gossip protocol is for. The main module involved is riak_core_gossip.
Ownership of a partition may be transferred from one virtual node to another under certain failure scenarios to guarantee high availability. If a node goes down unexpectedly, the partitions owned by the virtual nodes it contained will be temporarily handled by virtual nodes in other physical nodes. If the original node comes back up, ownership will eventually be transferred back to the original owners, also called primary virtual nodes. The virtual nodes that took over ownership in that scenario are called secondary virtual nodes. The process by which this ownership is negotiated and any relevant data is transferred to accomplish that is what we call a handoff. Transfer of ownership may also occur when adding or removing physical nodes to the cluster. See all the gory details in the handoff page
At any given point, one node from the cluster is assigned the task of handling ownership changes when other nodes are added or removed from the cluster. This process also involves keeping the cluster nicely balanced with partitions well spread out among all nodes.
At the top, we have the riak_core_sup module. Underneath that supervisor, we have the following process hierarchy:
- riak_core_sysmon_minder
- riak_core_vnode_sup
- riak_core_vnode
- [riak_core_vnode_worker_pool](https://github.com/basho/riak_core/blob/master/src/riak_core_vnode_worker_pool.erl)
- [riak_core_vnode_worker](https://github.com/basho/riak_core/blob/master/src/riak_core_vnode_worker.erl)
- ... (more workers)
- ... (more vnodes and their workers)
- riak_core_eventhandler_sup
- riak_core_ring_events
- riak_core_ring_manager
- riak_core_vnode_proxy_sup
- riak_core_vnode_proxy
- ... (more vnode proxies)
- riak_core_node_watcher_events
- riak_core_node_watcher
- riak_core_vnode_manager
- riak_core_capability
- riak_core_handoff_sup
- riak_core_handoff_receiver_sup
- riak_core_handoff_sender_sup
- riak_core_handoff_listener_sup
- riak_core_handoff_manager
- riak_core_gossip
- riak_core_claimant
- riak_core_stat_sup
- folsom_sup
- riak_core_stats_sup
- riak_core_stat_calc_sup
For more detailed information about the modules and functions in the riak_core source code, go to the documentation generated from the source using Erlang's edoc: