In some setups the most recent data (e.g. logs) is queried and written far more often than older data, which is mostly read. Daily Logstash log indices are a typical example of a hot / warm setup.
A hot / warm architecture relies on custom node attributes, as used for Rack / Shard Allocation Awareness, combined with shard allocation filtering on the index level.
The Docker Compose environment described in the Setup chapter defines three Elasticsearch nodes in the docker-compose.yml.
docker-compose up elasticsearch01 elasticsearch02 elasticsearch03 cerebro
Once all containers have started, check the Elasticsearch cluster via Cerebro.
Each container includes specific rack attributes that Elasticsearch uses to distribute shards onto different nodes. The attribute node.attr.data is a custom attribute (named data in this case) that we can use to attach labels / values to a node. This setting is best configured in the elasticsearch.yml. The docker-compose.yml of the Docker Compose environment already assigns these settings to all Elasticsearch nodes.
- container elasticsearch01 has value hot
- containers elasticsearch02 and elasticsearch03 have value warm
In a bigger cluster, more than one node should have the attribute value hot, especially if the indices have replicas; otherwise the replica shards cannot be assigned to any node. The node values hot and warm have no inherent meaning to Elasticsearch.
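To confirm which attribute values the nodes actually carry, the cat nodeattrs API can be queried. The following is a minimal sketch; it assumes the cluster is reachable at http://localhost:9200 as in the other commands of this chapter. The function name show_node_attrs and the ES_URL variable are made up for this example.

```shell
# Assumption: Elasticsearch listens on localhost:9200; override ES_URL if not.
ES_URL="${ES_URL:-http://localhost:9200}"

# List the attributes of every node (hypothetical helper name).
show_node_attrs() {
  curl -s "${ES_URL}/_cat/nodeattrs?v&h=node,attr,value"
}

# Usage: show_node_attrs
```

Among other attributes, the output should contain one data row per node, with the value hot or warm.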
One way to create indices on hot nodes is to specify the index routing allocation parameter in the index settings (see the Elasticsearch documentation). For example, to create a new index (hot-warm-index) on a hot node, add the index.routing.allocation.require.data property to the index settings.
✅ To create a new index named hot-warm-index on a hot node, run the following command:
curl -X PUT 'http://localhost:9200/hot-warm-index' -H 'Content-Type: application/json' -d '{
  "settings": {
    "index.routing.allocation.require.data": "hot",
    "index.number_of_shards": 2,
    "index.number_of_replicas": 0
  }
}'
This creates a new index with two primary shards and no replicas. Since elasticsearch01 is the only hot node in this environment, both shards should be located on that node.
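Besides Cerebro, the shard placement can also be checked on the command line with the cat shards API. This is a small sketch under the same assumption as the commands above (cluster at http://localhost:9200); show_shards and ES_URL are illustrative names, not part of the environment.

```shell
# Assumption: Elasticsearch listens on localhost:9200; override ES_URL if not.
ES_URL="${ES_URL:-http://localhost:9200}"

# Print one line per shard of the given index, including the node it is on.
show_shards() {
  curl -s "${ES_URL}/_cat/shards/$1?v&h=index,shard,prirep,state,node"
}

# Usage: show_shards hot-warm-index
```

The node column should show the hot node for both shards.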
Alternatively, this allocation setting can be defined in an index template, so that all new indices matching the template are automatically created on the hot node. Once an index and its shards are no longer considered hot, the index can be moved over to warm nodes, e.g. daily log indices after a few days.
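A template carrying the allocation setting could look roughly like the sketch below. It assumes an Elasticsearch version that accepts the legacy _template endpoint (newer versions also offer composable templates with a different body layout), and the template name logstash-hot as well as the helper names are hypothetical.

```shell
# Assumption: Elasticsearch listens on localhost:9200; override ES_URL if not.
ES_URL="${ES_URL:-http://localhost:9200}"

# Build the template body: indices matching the pattern ($1) are pinned to
# nodes whose data attribute equals the given value ($2).
template_body() {
  printf '{"index_patterns": ["%s"], "settings": {"index.routing.allocation.require.data": "%s"}}' "$1" "$2"
}

# Register the template under the name $1 via the legacy _template endpoint.
put_template() {
  curl -s -X PUT "${ES_URL}/_template/$1" \
    -H 'Content-Type: application/json' \
    -d "$(template_body "$2" "$3")"
}

# Usage (hypothetical template name): put_template logstash-hot 'logstash-*' hot
```

With such a template in place, a new logstash-* index would start on a hot node without the setting being repeated in every create request.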
Use the Update Index Settings API to change the allocation routing setting from hot to warm. This is a dynamic index setting, so it can be modified on an existing index and not only at creation time.
To set the allocation attribute from hot to warm, update the settings of the index hot-warm-index with the following command:
✅ Update the allocation routing setting to warm for index hot-warm-index.
curl -X PUT 'http://localhost:9200/hot-warm-index/_settings' -H 'Content-Type: application/json' -d '{
  "index.routing.allocation.require.data": "warm"
}'
It may take a few minutes until the shards have moved from the hot node to a warm node, depending on shard size and the allocation and recovery settings of the cluster. Afterwards, all shards of this index should be located on nodes with the attribute value warm.
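Instead of refreshing Cerebro, the cluster health API can be asked to wait until no shard is relocating anymore. The sketch below assumes the cluster at http://localhost:9200; wait_for_relocation and ES_URL are names chosen for this example.

```shell
# Assumption: Elasticsearch listens on localhost:9200; override ES_URL if not.
ES_URL="${ES_URL:-http://localhost:9200}"

# Block until no shard is relocating, or until the timeout ($1, default 60s)
# expires; the response reports which of the two happened.
wait_for_relocation() {
  curl -s "${ES_URL}/_cluster/health?wait_for_no_relocating_shards=true&timeout=${1:-60s}&pretty"
}

# Usage: wait_for_relocation 120s
```

In the response, "timed_out": false together with "relocating_shards": 0 indicates that no relocation is in progress.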
The same applies to any other phase that might be introduced, e.g. cold. A typical setup with log data may contain the following phases: hot, warm, delete. The delete phase marks the end of the index lifetime: the index is deleted, e.g. after 30 days or any other interval. A delete_indices action with the Elasticsearch Curator could handle this case.
To automate the process of transferring shards from hot to warm, or of deleting existing indices, the Elasticsearch Curator CLI can be used. See the Curator chapter on how to set it up.
✅ Start the Curator and connect to it
docker-compose run curator /bin/ash
Inside the container, the /config folder contains the Curator configuration file config.yml with the Elasticsearch connection settings. The action file index_rotate.yml can also be found there and is used in this example.
To check the output of the Curator in a dry run, execute the following command in the curator container:
✅ Run the Curator in dry-run mode
curator --dry-run --config config.yml index_rotate.yml
This outputs the actions the Curator would process, but does not apply them yet.
In order to see the Curator in action, create a couple of new Logstash indices with the prefix logstash- and a date. The current action configuration in index_rotate.yml keeps one daily logstash-* prefixed index on the hot node; older indices are moved to the warm nodes. A second action (2) uses the force merge action to merge the segments of indices older than 2 days, which reduces the number of segments of each index.
Create a few indices using the index pattern logstash-YYYY.MM.DD. Try to choose current dates to see the effect, because the actions select indices by their age. For example, the following commands create the indices logstash-2020.11.24, logstash-2020.11.25 and logstash-2020.11.26:
✅ Create the following logstash-* prefixed indices
curl -X PUT 'http://localhost:9200/logstash-2020.11.26' -H 'Content-Type: application/json' -d '{
  "settings": {
    "index.routing.allocation.require.data": "hot",
    "index.number_of_shards": 2
  }
}'
curl -X PUT 'http://localhost:9200/logstash-2020.11.25' -H 'Content-Type: application/json' -d '{
  "settings": {
    "index.routing.allocation.require.data": "hot",
    "index.number_of_shards": 2
  }
}'
curl -X PUT 'http://localhost:9200/logstash-2020.11.24' -H 'Content-Type: application/json' -d '{
  "settings": {
    "index.routing.allocation.require.data": "hot",
    "index.number_of_shards": 2
  }
}'
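Fixed dates like the ones above age quickly, so the index names can also be derived from the current date. The following sketch assumes GNU date (its -d option is not available in BSD / macOS date) and the cluster at http://localhost:9200; the helper names are made up for this example.

```shell
# Assumption: Elasticsearch listens on localhost:9200; override ES_URL if not.
ES_URL="${ES_URL:-http://localhost:9200}"

# Print the daily index name for N days ago (UTC), e.g. logstash-2020.11.26.
# Assumption: GNU date is installed.
logstash_index_name() {
  date -u -d "$1 days ago" +logstash-%Y.%m.%d
}

# Create hot indices for today and the two previous days, using the same
# settings as the commands above.
create_recent_indices() {
  for days_ago in 0 1 2; do
    curl -s -X PUT "${ES_URL}/$(logstash_index_name "$days_ago")" \
      -H 'Content-Type: application/json' \
      -d '{"settings": {"index.routing.allocation.require.data": "hot", "index.number_of_shards": 2}}'
    echo
  done
}

# Usage: create_recent_indices
```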
Check the output of the Curator in dry-run mode again; it only lists the planned actions and does not apply anything yet. Once the changes look good, run the Curator again without the dry-run option.
✅ Run the Curator to apply all changes
/usr/local/bin/curator --config config.yml index_rotate.yml
This should move indices older than one day from hot to warm; the moved indices also have their segments merged into a single segment.
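The effect of the force merge can be inspected with the cat segments API. This is a sketch under the same assumption as before (cluster at http://localhost:9200), with show_segments and ES_URL as illustrative names. Indices without documents may not list any segments, so the output is mainly informative once some documents have been indexed.

```shell
# Assumption: Elasticsearch listens on localhost:9200; override ES_URL if not.
ES_URL="${ES_URL:-http://localhost:9200}"

# List the Lucene segments of every shard of the matching indices.
show_segments() {
  curl -s "${ES_URL}/_cat/segments/$1?v&h=index,shard,prirep,segment,docs.count"
}

# Usage: show_segments 'logstash-*'
```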