Riak is a distributed key-value store that maximizes data availability by storing data across multiple servers. It is an open-source NoSQL database that offers high availability, fault tolerance, and good scalability. Riak is written in Erlang and supports automatic data replication and data distribution via consistent hashing, which purportedly gives it a near-linear performance increase as you add capacity. Riak supports tunable consistency, depending on its configuration.
What sets Riak apart is that it does not use the typical master/slave model: it is masterless, and any node can serve reads and writes. Riak stores key-value pairs in buckets, and the bucket and key are hashed together, mapping the result onto a 160-bit integer space (the ring). Riak divides this space into partitions, each managed by a virtual node (vnode), and the physical nodes divide the virtual nodes among themselves. Replication is implicit: when you write data, it is stored on the partition the key hashes to and copied to the partitions that follow it on the ring, so that N partitions in total hold the value (N is the n_val bucket property, which defaults to 3). Riak is based on Amazon's Dynamo paper, so it shares many design features with Dynamo.
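The partitioning scheme described above can be sketched in a few lines of Python. This is a simplified model, not Riak's code. It assumes the following:

- The bucket and key are joined with `/` before hashing. Riak itself hashes an Erlang-encoded form of the bucket/key pair.
- Partition *i* owns the *i*-th equal slice of the 160-bit ring.
- Virtual nodes are handed to physical nodes round-robin.

`key_hash`, `preference_list`, and `owner_nodes` are hypothetical helper names.

```python
import hashlib

RING_TOP = 2 ** 160  # SHA-1 digests are 160 bits, so hashes fall in [0, 2**160)


def key_hash(bucket: str, key: str) -> int:
    """Hash a bucket/key pair onto the 160-bit ring (simplified encoding, an assumption)."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def preference_list(bucket: str, key: str, ring_size: int = 64, n_val: int = 3) -> list[int]:
    """Return the n_val consecutive partitions that hold replicas of the key."""
    slice_width = RING_TOP // ring_size           # each partition owns an equal slice
    first = key_hash(bucket, key) // slice_width  # partition whose slice contains the hash
    return [(first + i) % ring_size for i in range(n_val)]  # wrap around the ring


def owner_nodes(partitions: list[int], nodes: list[str]) -> list[str]:
    """Map partitions (vnodes) to physical nodes, assuming a round-robin claim."""
    return [nodes[p % len(nodes)] for p in partitions]


if __name__ == "__main__":
    nodes = ["riak@10.0.0.1", "riak@10.0.0.2", "riak@10.0.0.3", "riak@10.0.0.4"]
    parts = preference_list("customers", "alice")
    print(parts, owner_nodes(parts, nodes))
```

Under a round-robin claim, consecutive partitions belong to different physical nodes. As a result, the three replicas land on three different machines whenever the cluster has at least three nodes.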
By default, Riak is schemaless and does not enable any data types for you. To use the data types listed below, you first have to create and then activate a bucket type for each of them with the riak-admin CLI tool:
riak-admin bucket-type create <bucket-type-name> '{"props": {"datatype": "<data-type>"}}'
riak-admin bucket-type activate <bucket-type-name>
For example, to enable the map data type under a bucket type named maps, you would run:
riak-admin bucket-type create maps '{"props": {"datatype": "map"}}'
riak-admin bucket-type activate maps
You can read more about bucket types here.
In this section we give a short summary of each data type that Riak supports out of the box. All examples assume the Python driver and an existing client object. You can read more about each data type here.
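A bucket is essentially a flat namespace in Riak: the same key name can be reused as long as each copy lives in a separate bucket.
import riak
client = riak.RiakClient()  # Connect to a local node using the client's default settings
# Get a bucket named 'customers' in the default bucket type
customers = client.bucket('customers')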
Flags are essentially boolean values, and they can only be stored within a map.
map.flags['enterprise_customer'].disable()        # Set the flag to false (enable() sets it to true)
map.store()                                       # Write the change to Riak
map.reload().flags['enterprise_customer'].value   # Fetch the latest value from Riak
Registers are named binary values, such as strings. Like flags, they can only be used inside a map.
map.registers['first_name'].assign('User')
map.registers['phone_number'].assign('5555555555')
map.store()   # Write both register values to Riak
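Riak registers resolve concurrent writes with last-write-wins semantics. The sketch below models that rule with a hypothetical `LWWRegister` class: each assignment carries a timestamp, and merging two replicas keeps the value with the newer timestamp. This sketch breaks ties by comparing the values so that the merge is deterministic. That tie-breaker is an assumption, not taken from Riak's source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LWWRegister:
    """Last-write-wins register: a value tagged with the time it was written."""
    value: str = ""
    timestamp: int = 0

    def assign(self, value: str, timestamp: int) -> "LWWRegister":
        # A local write wins only if it is newer than the current state.
        return self.merge(LWWRegister(value, timestamp))

    def merge(self, other: "LWWRegister") -> "LWWRegister":
        # Keep the newer write; on equal timestamps, compare values
        # (assumed tie-breaker) so every replica picks the same winner.
        return max(self, other, key=lambda r: (r.timestamp, r.value))
```

Because `merge` picks the same winner regardless of argument order, two replicas that exchange their states converge on the same value.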
Counters can be stored at the bucket level and used on their own. Their values can be positive, zero, or negative integers.
bucket = client.bucket_type('counters').bucket(bucket_name)  # Bucket in a type whose datatype is 'counter'
counter = bucket.new(key)    # Create the counter
counter.increment()          # Increment by 1, or pass an integer, e.g. increment(5)
counter.decrement(2)         # Decrement by 1 (no argument) or by an integer
counter.store()              # Write to the server
counter.value                # Read the value
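Counters can be incremented and decremented on different nodes at the same time, so Riak implements them as a convergent replicated data type (CRDT), specifically a PN-counter. The sketch below is a textbook PN-counter, not Riak's Erlang code. Each replica, identified by a hypothetical actor ID, tracks its own increment and decrement totals. Merging takes the per-actor maximum, so replicas converge no matter the order in which they merge.

```python
from collections import defaultdict


class PNCounter:
    """Textbook PN-counter: per-actor totals of increments (p) and decrements (n)."""

    def __init__(self):
        self.p = defaultdict(int)  # actor -> total incremented by that actor
        self.n = defaultdict(int)  # actor -> total decremented by that actor

    def increment(self, actor, amount=1):
        if amount < 0:
            raise ValueError("amount must be non-negative")
        self.p[actor] += amount

    def decrement(self, actor, amount=1):
        if amount < 0:
            raise ValueError("amount must be non-negative")
        self.n[actor] += amount

    @property
    def value(self):
        return sum(self.p.values()) - sum(self.n.values())

    def merge(self, other):
        """Return a new counter combining two replicas; argument order does not matter."""
        merged = PNCounter()
        for mine, theirs, target in ((self.p, other.p, merged.p), (self.n, other.n, merged.n)):
            for actor in mine.keys() | theirs.keys():
                # Each actor's totals only grow, so the maximum is the latest state.
                target[actor] = max(mine.get(actor, 0), theirs.get(actor, 0))
        return merged
```

For example, if one replica adds 5 while another adds 1 and subtracts 2, both orders of merging yield a value of 4.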
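Sets can be stored at the bucket level on their own or embedded in a map. A set is a collection of unique binary values; duplicate additions are ignored by Riak.
cities_set = bucket.new(key)   # In an existing bucket whose type has datatype 'set'
# or get a bucket in the 'sets' bucket type first
travel = client.bucket_type('sets').bucket('travel')
cities_set = travel.new('cities')
cities_set.add('Toronto')      # Add a member (adding it again has no effect)
cities_set.store()             # Write to Riak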
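Riak sets are also CRDTs, so concurrent adds and removes on different nodes must merge predictably. The following is a simplified observed-remove set (OR-Set), assumed here for illustration. Every add is tagged with a unique ID, and a remove deletes only the tags the remover has seen. As a result, an add that happens concurrently with a remove survives the merge. Riak's own implementation is an optimized variant of this idea and differs in its details. The class and method names below are hypothetical.

```python
import uuid


class ORSet:
    """Simplified observed-remove set: adds are tagged, removes delete observed tags."""

    def __init__(self):
        self.adds = {}        # element -> set of unique add tags
        self.removed = set()  # tags that were observed and then removed

    def add(self, element):
        self.adds.setdefault(element, set()).add(uuid.uuid4().hex)

    def remove(self, element):
        # Only the tags this replica has observed are removed.
        self.removed |= self.adds.get(element, set())

    @property
    def value(self):
        # An element is present if at least one of its add tags was not removed.
        return {e for e, tags in self.adds.items() if tags - self.removed}

    def merge(self, other):
        merged = ORSet()
        for source in (self.adds, other.adds):
            for element, tags in source.items():
                merged.adds.setdefault(element, set()).update(tags)
        merged.removed = self.removed | other.removed
        return merged
```

Adding the same element twice still yields a single member, matching the "duplicates are ignored" behavior described above.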
Counters, flags, registers, sets, and even other maps can be embedded in a map, which makes the map the most flexible of Riak's data types.
customers = client.bucket_type('map_bucket').bucket('customers')
map = customers.new(key)   # Create a new map under the given key
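A map merges field by field. Each field is identified by its name together with its data type, and two replicas of the same field are merged with that type's own rule. The sketch below makes several simplifying assumptions:

- The map holds only registers (last-write-wins) and counters (here a grow-only per-actor counter).
- Field removal is ignored, although a real Riak map must also handle it.

`merge_maps` and the `(name, type)` field layout are hypothetical.

```python
def merge_register(a, b):
    # Registers are (timestamp, value) pairs; the newer write wins.
    return max(a, b)


def merge_counter(a, b):
    # Counters are {actor: total} dicts; keep each actor's highest total.
    return {actor: max(a.get(actor, 0), b.get(actor, 0)) for actor in a.keys() | b.keys()}


MERGERS = {"register": merge_register, "counter": merge_counter}


def merge_maps(left, right):
    """Merge two maps whose fields are keyed by (field_name, field_type)."""
    merged = dict(left)
    for field, value in right.items():
        _, field_type = field
        if field in merged:
            # The same field exists on both replicas: use that type's merge rule.
            merged[field] = MERGERS[field_type](merged[field], value)
        else:
            merged[field] = value
    return merged
```

Keying fields by name and type means a register called first_name and a counter called first_name are separate fields that never collide.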
A HyperLogLog (HLL) is a data structure that estimates, to within about 2%, the number of distinct entries in an input. Riak's HLL data type is its representation of the HyperLogLog algorithm.
bucket_type = client.bucket_type('hlls')
bucket = bucket_type.bucket('my_hlls')
hll = bucket.new(key)
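To see why a HyperLogLog can count distinct items in a small, fixed amount of memory, here is a minimal textbook HyperLogLog in Python. It is an illustration only. The precision (`p = 14`, i.e. 16,384 registers), the use of the first 64 bits of SHA-1 as the hash, and the small-range correction are choices made for this sketch. They do not describe Riak's internal HLL implementation. At this precision the typical relative error is about 1.04/sqrt(16384), roughly 0.8%.

```python
import hashlib
import math


class HyperLogLog:
    def __init__(self, p: int = 14):
        self.p = p
        self.m = 1 << p                             # number of registers
        self.registers = [0] * self.m
        self.alpha = 0.7213 / (1 + 1.079 / self.m)  # bias constant for m >= 128

    def add(self, item: str) -> None:
        # 64-bit hash taken from the first 8 bytes of SHA-1.
        h = int.from_bytes(hashlib.sha1(item.encode("utf-8")).digest()[:8], "big")
        width = 64 - self.p
        index = h >> width                    # first p bits choose a register
        rest = h & ((1 << width) - 1)         # remaining bits
        rank = width - rest.bit_length() + 1  # position of the leftmost 1-bit
        self.registers[index] = max(self.registers[index], rank)

    def count(self) -> int:
        raw = self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: linear counting over empty registers.
            return round(self.m * math.log(self.m / zeros))
        return round(raw)

    def merge(self, other: "HyperLogLog") -> None:
        # The union of two sketches is their register-wise maximum.
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
```

Adding an item that was already seen cannot raise any register, which is why repeated entries do not inflate the estimate.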
The installation process below is paraphrased from the general installation page. For Ubuntu-specific instructions, look here.
Assuming that you are installing on the Ubuntu 16.04 operating system, run the following commands:
curl -s https://packagecloud.io/install/repositories/basho/riak/script.deb.sh | sudo bash
sudo apt-get install riak
- Riak is written almost exclusively in Erlang and runs on an Erlang virtual machine. Before building and starting a cluster, you should make some Erlang-VM-related changes to your configuration files. In your riak.conf file, add the following two lines:
erlang.schedulers.force_wakeup_interval = 500
erlang.schedulers.compaction_of_load = false
- Before using the cluster, you need to set the ring size. The ring size is the number of data partitions that make up the cluster, and it affects the cluster's scalability and performance. This must be done before the cluster receives any data. Change the ring size by uncommenting the ring_size parameter in riak.conf and setting it to the desired value, which must be a power of 2. For example:
ring_size = 64
- Other available configuration options are here.
- Configuring a Riak cluster involves instructing each node to listen on a non-local interface (i.e. not 127.0.0.1/localhost) and then joining all of the nodes together to participate in the cluster. Most configuration changes are applied to the riak.conf file located in your /riak/etc directory.
- Configuring the first node. First, stop the Riak node if it is currently running:
riak stop
- Second, select an IP address and port for the interface you use.
If you are using the protocol buffers interface:
listener.protobuf.internal = <IP Address>:<port number>
If you are using the HTTP interface:
listener.http.internal = <IP Address>:<port number>
- Next, name your node. Every node in Riak has a name associated with it; the default name is riak@127.0.0.1. You can change it in the riak.conf file:
nodename = riak@<IP Address>
- Now that your node is properly configured, start it:
riak start
- If the Riak node has been previously started, you must use the
following command to change the node name and update the node’s
ring file:
riak-admin cluster replace <old nodename> <new nodename>
- As with all cluster changes, review the planned changes and then commit them to finalize them:
riak-admin cluster plan
riak-admin cluster commit
- Repeat the above steps for a second host on the same network, giving the second node a different host/port and node name. Then start the second node.
- Use the following command to join the second node to the first
node, thereby creating an initial Riak cluster:
riak-admin cluster join <first node’s nodename>
- Next, plan and commit the changes:
riak-admin cluster plan
riak-admin cluster commit
- After the last command, you should see the message Cluster changes committed. You can use the same approach to add more nodes to the cluster.
- You can use the riak-admin command to check the cluster's members from the shell:
bin/riak-admin status | grep ring_members
- To see the full status of the node you are running, just use:
riak-admin status
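Because the per-node steps above repeat for every machine, it can help to generate them. The hypothetical helper below is only a sketch. Given the IP addresses of your hosts, it produces two things:

- The riak.conf lines discussed above: the scheduler settings, ring_size, nodename, and the HTTP and protocol buffers listeners on Riak's default ports 8098 and 8087.
- The riak-admin commands that join each extra node to the first one and then plan and commit the staged changes.

It also rejects a ring_size that is not a power of 2. The IP addresses in the example are placeholders.

```python
def is_power_of_two(n: int) -> bool:
    return n > 0 and n & (n - 1) == 0


def node_config(ip: str, ring_size: int = 64) -> str:
    """Return riak.conf lines for one node (hypothetical helper)."""
    if not is_power_of_two(ring_size):
        raise ValueError(f"ring_size must be a power of 2, got {ring_size}")
    return "\n".join([
        "erlang.schedulers.force_wakeup_interval = 500",
        "erlang.schedulers.compaction_of_load = false",
        f"ring_size = {ring_size}",
        f"nodename = riak@{ip}",
        f"listener.http.internal = {ip}:8098",
        f"listener.protobuf.internal = {ip}:8087",
    ])


def join_commands(ips: list[str]) -> list[tuple[str, str]]:
    """Return (node, command) pairs that build a cluster around the first node."""
    first = f"riak@{ips[0]}"
    # Each additional node runs the join command against the first node.
    steps = [(f"riak@{ip}", f"riak-admin cluster join {first}") for ip in ips[1:]]
    # Staged joins take effect only after they are planned and committed.
    steps.append((first, "riak-admin cluster plan"))
    steps.append((first, "riak-admin cluster commit"))
    return steps


if __name__ == "__main__":
    hosts = ["192.168.1.10", "192.168.1.11", "192.168.1.12"]
    print(node_config(hosts[0]))
    for node, command in join_commands(hosts):
        print(f"on {node}: {command}")
```

This version stages all joins first and runs plan and commit once at the end, which reviews every staged join together. Running plan and commit after each join, as in the steps above, also works.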
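To remove a node from the cluster, you can run the riak-admin command from any other node in the cluster:
riak-admin cluster leave <nodename of the leaving node>
As with other cluster changes, the leave takes effect only after you run riak-admin cluster plan and riak-admin cluster commit.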
Riak supports a feature called Riak Search, which integrates with Apache Solr. Riak Search is off by default. To enable it, add the following line to riak.conf:
search = on
All of the Riak Search configuration settings can be found in the default riak.conf file. Setting search = on is required, but the other search settings are optional. You can explore them further here.
A simple command retrieves a list of all the configuration settings currently applied on the node:
riak config effective
For detailed information about a particular configuration variable, use the riak config describe <variable> command. For example:
riak config describe ring_size
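To check settings from a script rather than by eye, you could parse the output of riak config effective. The sketch below assumes that the output uses the same key = value line format as riak.conf. Verify this against your own node's output before relying on it. `parse_conf`, `check_settings`, and the sample text in the test are hypothetical.

```python
def parse_conf(text: str) -> dict[str, str]:
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first '=' only
        settings[key.strip()] = value.strip()
    return settings


def check_settings(settings: dict[str, str]) -> list[str]:
    """Return warnings for a few settings discussed in this guide."""
    problems = []
    ring = int(settings.get("ring_size", "0"))
    if ring <= 0 or ring & (ring - 1):
        problems.append(f"ring_size should be a power of 2, got {ring}")
    if settings.get("search") != "on":
        problems.append("Riak Search is off (search = on is required to use it)")
    return problems
```

On a node, you could feed it the command's output, for example via `subprocess.run(["riak", "config", "effective"], capture_output=True, text=True).stdout`.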
Riak has a chkconfig command that checks whether the syntax of your configuration files is correct. It outputs config is OK if your configuration files are syntactically sound:
riak chkconfig
Riak supports a command that you can use to debug your configuration:
riak config generate -l debug
Now that you have read through a quick introduction and learned how to set up your own Riak cluster, head over to our Warm Up Activity.
- https://github.com/basho/riak
- https://en.wikipedia.org/wiki/Riak
- https://www.techopedia.com/definition/26740/riak
- https://github.com/course-book/basho_docs